What's the best way to sync large amounts of data around the world?


Have you tried Unison?

I've had good results with it. It's basically a smarter rsync, which may be what you want. There is a listing comparing file-syncing tools here.
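
If it helps, here is a minimal sketch of driving Unison non-interactively from Python. The paths and the SSH root are hypothetical, and the flags assume a stock Unison install (-batch suppresses prompts, -times propagates modification times):

    import subprocess

    # Hypothetical roots: a local tree and the same tree on a remote site over SSH.
    local_root = "/srv/data"
    remote_root = "ssh://sync@remote.example.com//srv/data"

    # -batch runs without interactive prompts; -times propagates modification times.
    subprocess.run(["unison", local_root, remote_root, "-batch", "-times"], check=True)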


Sounds like a job for BitTorrent.

For each new file at each site, create a BitTorrent seed file and put it into a centralized, web-accessible directory.

Each site then downloads all files via BitTorrent. This gets you bandwidth sharing and automatic reuse of local copies.

The actual recipe will depend on your needs. For example, you can create one BitTorrent seed for each file on each host, and set the modification time of the seed file to be the same as the modification time of the file itself. Since you'll be doing this daily (hourly?), it's better to use something like "make" to (re-)create seed files only for new or updated files.
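
A rough Python sketch of that make-style step, assuming the mktorrent command is available (its -a flag sets the announce URL, -o the output file); the directory names are hypothetical:

    import os
    import subprocess

    def regenerate_seeds(data_dir, seed_dir, announce_url):
        """Recreate a .torrent seed only when the source file is newer than
        (or missing from) its existing seed -- make-style behaviour."""
        for name in os.listdir(data_dir):
            src = os.path.join(data_dir, name)
            if not os.path.isfile(src):
                continue
            seed = os.path.join(seed_dir, name + ".torrent")
            src_mtime = os.path.getmtime(src)
            # Seed already up to date: nothing to do.
            if os.path.exists(seed) and os.path.getmtime(seed) >= src_mtime:
                continue
            # Remove any stale seed so mktorrent can write a fresh one.
            if os.path.exists(seed):
                os.remove(seed)
            subprocess.run(["mktorrent", "-a", announce_url, "-o", seed, src], check=True)
            # Stamp the seed with the source file's mtime so "overwrite only
            # if newer" copies in the next step behave as described.
            os.utime(seed, (src_mtime, src_mtime))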

Then you copy all seed files from all hosts to the centralized location ("tracker dir") with the option "overwrite only if newer". This gets you a set of torrent seeds for the newest copies of all files.
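
That "overwrite only if newer" copy could be rsync's --update, or, as a small Python sketch:

    import os
    import shutil

    def copy_if_newer(src, dst):
        """Overwrite dst only when src is strictly newer, or dst is missing.
        copy2 preserves the mtime so later comparisons keep working."""
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            shutil.copy2(src, dst)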

Then each host downloads all seed files (again, with the "overwrite only if newer" setting) and starts a BitTorrent download on each of them. This will download or re-download all the new/updated files.
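
The per-host download step might look like this, assuming aria2c as the BitTorrent client (any client that takes .torrent files on the command line would do):

    import glob
    import os
    import subprocess

    def download_all(seed_dir, data_dir):
        """Hand every collected seed to the BitTorrent client. Payloads already
        present in data_dir are hash-checked rather than fetched again."""
        torrents = sorted(glob.glob(os.path.join(seed_dir, "*.torrent")))
        if torrents:
            # -d sets the download directory; --check-integrity re-verifies
            # any payload that is already on disk.
            subprocess.run(
                ["aria2c", "--check-integrity=true", "-d", data_dir] + torrents,
                check=True,
            )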

Rinse and repeat, daily.

BTW, there will be no "downloading from itself", as you said in the comment. If a file is already present on the local host, its checksum will be verified and no downloading will take place.


How about something along the lines of Red Hat's Global File System (GFS), so that the whole structure is split across every site onto multiple devices, rather than having it all replicated at each location?

Or perhaps a commercial network storage system such as one from LeftHand Networks (disclaimer: I have no idea of the cost, and I haven't used them).