How to copy files as fast as possible?



You can try rsync. From the rsync man page (man rsync):

The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report that accompanies this package.


You could try HPN-SSH (High Performance SSH/SCP): http://www.psc.edu/index.php/hpn-ssh or http://hpnssh.sourceforge.net/

HPN-SSH is a set of patches for OpenSSH (which includes scp) that tunes various TCP and internal buffers for better throughput. It also offers a "none" cipher ("None Cipher Switching") that disables encryption of the transferred data. This may help you too, provided you don't send the data over public networks.

Both compression and encryption consume CPU time. On 10 Gbit Ethernet it can be faster to send a file uncompressed than to wait for the CPU to compress and encrypt it.
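When does compression pay off? The rough model below is a simplification that ignores encryption and disk limits. With compression, the original data can flow no faster than the compressor itself, and no faster than the link speed multiplied by the compression ratio. Without compression, it flows at the link speed. The function name and the 400 MB/s compressor speed are hypothetical example values, not measurements.

```shell
#!/bin/sh
# Simplified model (an assumption, not a benchmark):
#   effective_rate LINK COMP RATIO
#   LINK  = network speed in MB/s
#   COMP  = compressor throughput in MB/s of input
#   RATIO = original size / compressed size
# Prints the achievable rate in MB/s of original data: min(COMP, LINK * RATIO).
effective_rate() {
    awk -v link="$1" -v comp="$2" -v ratio="$3" 'BEGIN {
        wire = link * ratio
        printf "%.0f\n", (comp < wire ? comp : wire)
    }'
}

# 10 Gbit/s is about 1250 MB/s: a 400 MB/s compressor (hypothetical) is the
# bottleneck, so sending uncompressed at ~1250 MB/s would be faster.
effective_rate 1250 400 2
# On a ~110 MB/s (1 Gbit/s) link the same compressor doubles the useful rate.
effective_rate 110 400 2
```

With these inputs the two calls print 400 and 220. That matches the rule of thumb in the answer: compression helps when the CPU is fast, the data is compressible and the network is slow.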

You can profile your setup:

  • Measure the network bandwidth between the machines using iperf or netperf. Compare the result with what the hardware should deliver (network card capabilities, switches). With a good setup you should get more than 80-90 percent of the rated speed.
  • Calculate the data volume and the time needed to transfer that much data at the speed measured by iperf or netperf. Compare this with the actual transfer time: is there a large difference?
    • If your CPU is fast, the data is compressible and the network is slow, compression will help you.
  • Take a look at top, vmstat and iostat.
    • Are any CPU cores 100% loaded (run top and press 1 to see individual cores)?
    • Are there too many interrupts (the in column) in vmstat 1? What about context switches (cs)?
    • What file read speed does iostat 1 show? Are your HDDs fast enough to read the data on the sender and to write it on the receiver?
  • You can try full-system profiling with perf top or perf record -a. Is a lot of CPU time spent in scp, or in the Linux network stack? If you can install dtrace or ktap, also try off-CPU profiling.
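The "calculate data volume and time" step can be scripted roughly as below. The iperf3 and GNU du commands in the comments are one possible way to get the inputs. The function name ideal_seconds and the 110 MB/s figure are hypothetical examples.

```shell
#!/bin/sh
# 1. Measure raw bandwidth, e.g. with iperf3:
#      receiver: iperf3 -s
#      sender:   iperf3 -c receiver-host
# 2. Measure the data volume in bytes, e.g. with GNU du:  du -sb /path/to/data
# 3. Compare the ideal time below with the time your copy actually takes.

# ideal_seconds BYTES MB_PER_SEC -> seconds needed at the measured bandwidth
ideal_seconds() {
    awk -v bytes="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", bytes / (rate * 1000000) }'
}

# 600 GB at a hypothetical ~110 MB/s (roughly Gigabit Ethernet): 5455 s
ideal_seconds 600000000000 110
```

If the real copy takes much longer than this ideal time, the bottleneck is likely elsewhere (CPU, disks, or tool overhead), and the top/vmstat/iostat checks above should show where.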


You have 1.5 GB * 400 = 600 GB of data. Unrelated to the answer: the machine setup looks wrong if you need to transfer this much data. You probably should have generated this data on machine A in the first place.

Transferring 600 GB in 2 hours is a rate of about 85 MB/s. This means you have probably reached the transfer limit of either your disk drives or (almost) the network. I believe you won't be able to transfer faster with any other command.
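The rate estimate can be checked with shell arithmetic. The ~85 figure matches binary units (1 GB = 1024 MB). In decimal units (600,000 MB / 7200 s) it is about 83 MB/s. The Gigabit Ethernet comparison in the comments assumes such a link, since the link speed is not stated.

```shell
#!/bin/sh
# 400 runs of 1.5 GB each, copied in 2 hours.
total_mb=$(( 400 * 1536 ))               # 1.5 GB taken as 1536 MiB -> 614400 MiB
seconds=$(( 2 * 3600 ))                  # 7200 s
echo "$(( total_mb / seconds )) MiB/s"   # integer division: prints "85 MiB/s"
# For comparison: a 1 Gbit/s link carries at most 1000 / 8 = 125 MB/s
# before protocol overhead, so ~85 MB/s is already a large fraction of it.
```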

If the machines are close to each other, the fastest copying method I can think of is to physically remove the drives from machines B and C, install them in machine A, and then copy the data locally without going over the network. The total time is the time to move the drives around plus the disk transfer time. I'm afraid, however, that the copy won't be much faster than 85 MB/s.

The network transfer command that I believe would be the fastest is netcat, because it has no encryption overhead. Additionally, if the files are not media files, you should compress them with a compressor that runs faster than 85 MB/s. I know that lzop and lz4 are both faster than this rate. So my command line for transferring a single directory would be (BSD netcat syntax):

On machine A (the receiver):

$ nc -l 2000 | lzop -d | tar xf -

On machine B or C (this can also be started from machine A via ssh):

$ tar cf - directory | lzop | nc machineA 2000

Remove the compressor when transferring media files, which are already compressed.
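The same pipeline can be wrapped in two small shell functions so the compressor is easy to swap. This is a sketch with hypothetical names (pack, unpack, COMPRESSOR). It assumes the chosen tool reads stdin, writes stdout and decompresses with -d, as lzop and gzip do. gzip works for a local sanity check but is much slower than lzop or lz4, so it is not a good choice for the real transfer.

```shell
#!/bin/sh
# Compressor used by pack/unpack; defaults to lzop as in the answer.
# Left unquoted below on purpose so it may carry flags, e.g. COMPRESSOR="lzop -1".
COMPRESSOR=${COMPRESSOR:-lzop}

# pack DIR: write a compressed tar stream of DIR to stdout
pack() {
    tar cf - "$1" | $COMPRESSOR
}

# unpack DEST: read a compressed tar stream from stdin and extract it into DEST
unpack() {
    $COMPRESSOR -d | tar xf - -C "$1"
}

# Over the network (BSD netcat syntax, port 2000 as in the answer):
#   machine A:  nc -l 2000 | unpack /destination
#   machine B:  pack directory | nc machineA 2000
# Started from machine A via ssh (plain commands, since the functions
# only exist in the local shell):
#   ssh machineB 'tar cf - directory | lzop | nc machineA 2000'
```

Piping pack straight into unpack on one machine is a quick way to confirm the pipeline round-trips correctly before involving the network.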

The commands to organize your directory structure are irrelevant in terms of speed, so I didn't bother to write them here, but you can reuse your own code.

This is the fastest method I can think of, but, again, I don't believe this command will be much faster than what you already have.