DistCp fault tolerance between two remote clusters
DistCp uses MapReduce to effect its distribution, error handling and recovery, and reporting.
Please see Update and Overwrite in the DistCp guide.
You can use the -overwrite option to avoid duplicates, and the -update option is worth looking at as well. If the network connection fails, you can re-initiate the copy with the -overwrite option once the connection has recovered.
See the examples of -update and -overwrite in the guide linked above.
Here is the link for the refactored DistCp: https://hadoop.apache.org/docs/r2.7.2/hadoop-distcp/DistCp.html
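To make the re-run behaviour concrete, here is a minimal sketch of how a copy could be resumed after a network failure. The NameNode hostnames and paths are placeholders, not values from the question:

```bash
# Initial copy between the two remote clusters (hostnames/paths are hypothetical)
hadoop distcp hdfs://source-nn:8020/data/logs hdfs://target-nn:8020/data/logs

# If the network drops partway through, re-run with -update:
# files already present on the target with matching size/checksum are skipped,
# so only the missing or differing files are copied again.
hadoop distcp -update hdfs://source-nn:8020/data/logs hdfs://target-nn:8020/data/logs

# Alternatively, -overwrite unconditionally rewrites files that already exist on the
# target, which avoids keeping partially written duplicates from the failed run.
hadoop distcp -overwrite hdfs://source-nn:8020/data/logs hdfs://target-nn:8020/data/logs
```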
As "@RamPrasad G" mentioned, I guess you have no option other than redo the distcp in case of network failure.
Some good reads:
Hadoop distcp network failures with WebHDFS: http://www.ghostar.org/2015/08/hadoop-distcp-network-failures-with-webhdfs/
Distcp between two HA Cluster: http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/
Transferring Data to/from Altiscale via S3 using DistCp: https://documentation.altiscale.com/transferring-data-using-distcp (this page has a link to a shell script with retry, which could be helpful to you; a rough sketch of the same idea follows below)
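The linked script is not reproduced here; below is an untested sketch of the same retry idea, assuming placeholder paths, retry count, and sleep interval that you would replace with your own:

```bash
#!/usr/bin/env bash
# Minimal retry wrapper around distcp (paths, retry count and sleep are placeholders).
SRC="hdfs://source-nn:8020/data/logs"
DST="hdfs://target-nn:8020/data/logs"
MAX_RETRIES=5

for attempt in $(seq 1 "$MAX_RETRIES"); do
  echo "distcp attempt ${attempt}/${MAX_RETRIES}"
  # -update makes re-runs incremental: already-copied files are skipped.
  if hadoop distcp -update "$SRC" "$DST"; then
    echo "distcp succeeded"
    exit 0
  fi
  echo "distcp failed, retrying in 60s..."
  sleep 60
done

echo "distcp failed after ${MAX_RETRIES} attempts" >&2
exit 1
```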
Note: Thanks to original authors.