How to get the file diff between two S3 buckets?
List only the object keys in each bucket, sort them, and diff the two listings:
aws s3 ls s3://bucket-1 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_1_files
aws s3 ls s3://bucket-2 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_2_files
diff bucket_1_files bucket_2_files
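Once the two sorted listings exist, `comm` may be easier to read than `diff`, because it can print only the keys that are unique to one side. A minimal sketch, assuming the file names `bucket_1_files` and `bucket_2_files` from the commands above; the function name `only_in` is made up for illustration:

```shell
#!/bin/sh
# only_in LEFT RIGHT: print lines that appear in LEFT but not in RIGHT.
# Both files must already be sorted (the listings above are piped through sort).
only_in() {
    # -2 hides lines unique to RIGHT, -3 hides lines common to both files
    comm -23 "$1" "$2"
}

# Example usage with the listing files produced above:
#   only_in bucket_1_files bucket_2_files   # keys missing from bucket-2
#   only_in bucket_2_files bucket_1_files   # keys missing from bucket-1
```

Note that this compares key names only; two objects with the same key but different contents will not show up.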
You can use the sync command with the --dryrun option to compare the buckets instead of syncing them.
aws s3 sync s3://bucket s3://bucket2 --dryrun
You can, of course, also use it to compare a local directory with a bucket.
aws s3 sync . s3://bucket2 --dryrun
Inspired by @George's comment, you can use this to extract the list of paths (it needs GNU awk, since the three-argument match() is a gawk extension):
aws s3 sync s3://<main-bucket> s3://<second-bucket> --dryrun | awk 'match($3,"^(s3://[^/]+/)(.*)",a) {print a[2]}'
Or, for local paths:
aws s3 sync <local-path> s3://darsak2.public --dryrun | awk 'match($3,"^(\\./)?(.*)",a) {print a[2]}'
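The awk commands above take only the third whitespace-separated field, so a key containing spaces would be cut short. A possible alternative is sketched below with plain `sed`. It assumes each dry-run line has the form `(dryrun) <action>: <source> to s3://<destination>`, which is how the output looks for the copy and upload cases shown here; the function name `extract_keys` is hypothetical.

```shell
#!/bin/sh
# extract_keys: read `aws s3 sync --dryrun` output on stdin and print the
# source path of each line without its s3://<bucket>/ or ./ prefix.
# Assumes lines look like: (dryrun) <action>: <source> to s3://<destination>
extract_keys() {
    sed -e 's/^(dryrun) [a-z]*: //' \
        -e 's# to s3://.*$##' \
        -e 's#^s3://[^/]*/##' \
        -e 's#^\./##'
}

# Example usage:
#   aws s3 sync s3://<main-bucket> s3://<second-bucket> --dryrun | extract_keys
#   aws s3 sync <local-path> s3://<second-bucket> --dryrun | extract_keys
```

This sketch would still mis-split a key that itself contains the text ` to s3://`, and it does not cover downloads to a local directory, where the destination is not an S3 URL.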