
How to get the file diff between two S3 buckets?


To compare only the file names, list each bucket, drop the date, time, and size columns, and diff the sorted results:

aws s3 ls s3://bucket-1 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_1_files
aws s3 ls s3://bucket-2 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_2_files
diff bucket_1_files bucket_2_files
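One caveat with the awk step above: reassigning fields makes awk rebuild the line, so runs of spaces inside a key get collapsed. Below is a minimal sketch of an alternative that strips the three leading columns with sed and then uses comm to show what is missing on each side. The function names list_keys and only_in_first are made up for illustration, and the sketch assumes the usual "date time size key" layout of aws s3 ls --recursive output.

```shell
# Strip the date, time and size columns from `aws s3 ls --recursive` output
# (read from stdin), leaving only the object key. Spaces inside the key are
# preserved. The result is sorted in the C locale so comm can consume it.
list_keys() {
  sed -E 's/^[^ ]+ +[^ ]+ +[0-9]+ //' | LC_ALL=C sort
}

# Print the lines that appear in the first sorted file but not in the second.
only_in_first() {
  LC_ALL=C comm -23 "$1" "$2"
}

# Usage (bucket names are placeholders):
#   aws s3 ls s3://bucket-1 --recursive | list_keys > bucket_1_files
#   aws s3 ls s3://bucket-2 --recursive | list_keys > bucket_2_files
#   only_in_first bucket_1_files bucket_2_files   # keys missing from bucket-2
#   only_in_first bucket_2_files bucket_1_files   # keys missing from bucket-1
```

Compared with a plain diff, this gives two separate lists, which is often easier to feed into a follow-up copy or delete step.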


You can use the sync command with the --dryrun option to compare instead of syncing: it prints the operations it would perform without actually running them.

aws s3 sync s3://bucket s3://bucket2 --dryrun

You can, of course, also use it to compare a local directory with a bucket.

aws s3 sync . s3://bucket2 --dryrun
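Note that sync only reports what it would transfer from the source to the destination, so objects that exist only in the destination do not show up unless you also pass --delete (still with --dryrun, so nothing is removed). For a large listing, a per-operation count can be a useful first look. The sketch below assumes each dry-run line starts with "(dryrun)" followed by the operation name and a colon; summarize_dryrun is a hypothetical helper name.

```shell
# Count the operations reported by `aws s3 sync ... --dryrun` (read from stdin).
# Assumes lines of the form "(dryrun) <operation>: <source> [to <destination>]".
summarize_dryrun() {
  awk '$1 == "(dryrun)" { sub(/:$/, "", $2); count[$2]++ }
       END { for (op in count) print op, count[op] }' | LC_ALL=C sort
}

# Usage (bucket names are placeholders):
#   aws s3 sync s3://bucket s3://bucket2 --dryrun --delete | summarize_dryrun
```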


Inspired by @George's comment:

You can use this to extract the list of paths (the three-argument form of match() requires GNU awk):

aws s3 sync s3://<main-bucket> s3://<second-bucket> --dryrun | awk 'match($3,"^(s3://[^/]+/)(.*)",a) {print a[2]}'

Or, for local paths (the dot is escaped so that only a literal leading ./ is stripped):

aws s3 sync <local-path> s3://darsak2.public --dryrun | awk 'match($3,"^(\\./)?(.*)",a) {print a[2]}'
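The awk one-liners above read only the third whitespace-separated field, so a key containing spaces would be cut at the first space. As a hedged alternative for the bucket-to-bucket case, here is a sketch that uses sed instead and does not need GNU awk. It assumes dry-run lines of the form "(dryrun) copy: s3://source/key to s3://dest/key"; dryrun_source_keys is a made-up name.

```shell
# Extract the source keys from `aws s3 sync s3://A s3://B --dryrun` output
# (read from stdin). Lines without a " to s3://" part, such as deletes,
# are skipped. Spaces inside keys are kept.
dryrun_source_keys() {
  sed -nE 's#^\(dryrun\) [a-z]+: s3://[^/]+/(.*) to s3://.*$#\1#p'
}

# Usage (bucket names are placeholders):
#   aws s3 sync s3://<main-bucket> s3://<second-bucket> --dryrun | dryrun_source_keys
```

A key that itself contains the text " to s3://" would still confuse this pattern, so treat it as a convenience for typical key names rather than a complete parser.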