AWS CLI - Delete multiple files from S3 and exclude certain directories
During preparation for machine learning, I came across a situation where some of my test runs had generated thousands of files in my training S3 bucket, and I needed to delete some but not all of them.
In my situation I had a folder called "output" that I absolutely could not delete under any circumstances.
Sure, I could have used the S3 web UI, but with well over 999 files it would have taken me hours to delete them manually.
Enter AWS CLI.
With one quick command, I can instruct the CLI to delete EVERYTHING in my S3 bucket, but exclude the "output" folder and any subfolders within it.
It is important to note that the order of your --include and --exclude filters matters: the further right a filter appears in the command, the higher its precedence.
The command to delete everything except my folder was as follows:
aws s3 rm s3://ml-dataset --recursive --include "*" --exclude "output/*" --exclude "output/*/*" --dryrun
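To make the "later filter wins" rule concrete, here is a minimal local sketch that mimics it with plain shell pattern matching. It never contacts AWS and is not the CLI's actual implementation: the decide function and the "include:PATTERN" / "exclude:PATTERN" notation are hypothetical names made up for this illustration, and the object keys are invented examples.

```shell
#!/bin/sh
# Local simulation only: walks the filters left to right and lets the
# last matching one decide. Every key starts out included (so "delete").
# Usage: decide KEY FILTER...   where FILTER is include:PATTERN or exclude:PATTERN
decide() (
  key=$1; shift
  verdict=delete
  for filter in "$@"; do
    pattern=${filter#*:}          # text after the first colon
    case $key in
      $pattern)                   # unquoted, so "*" is a wildcard and also matches "/"
        case $filter in
          include:*) verdict=delete ;;
          exclude:*) verdict=keep ;;
        esac ;;
    esac
  done
  echo "$verdict"
)

# Same filter order as the command above
decide "logs/run1.txt" 'include:*' 'exclude:output/*' 'exclude:output/*/*'
decide "output/model/weights.bin" 'include:*' 'exclude:output/*' 'exclude:output/*/*'

# Moving the include to the far right overrides the exclude
decide "output/model/weights.bin" 'exclude:output/*' 'include:*'
```

In this sketch the first call prints "delete", the second "keep", and the third "delete", which is why the --exclude filters for "output" need to sit to the right of --include "*".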
Note - I have included the --dryrun flag in the command above. This is highly recommended while you test the command against your own buckets: it doesn't actually delete anything, it only shows you which files would be affected.
Once you're happy with the output, simply remove the --dryrun flag and your command will delete those files.
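If you want the dry run to be hard to skip, one option is a small wrapper along these lines. This is only a sketch: preview_then_delete and its --confirm argument are hypothetical names of my own, not part of the AWS CLI, and the filters are simply copied from the command in this post.

```shell
#!/bin/sh
# Hypothetical helper: always runs the dry run first, and only performs
# the real delete when called with --confirm as the second argument.
# Usage: preview_then_delete BUCKET_URI [--confirm]
preview_then_delete() {
  bucket=$1
  confirm=${2:-}
  # Build the shared argument list once (same filters as in this post)
  set -- s3 rm "$bucket" --recursive --include "*" --exclude "output/*" --exclude "output/*/*"

  # Step 1: preview what would be removed; stop if the preview fails
  aws "$@" --dryrun || return 1

  # Step 2: delete for real only when explicitly asked to
  if [ "$confirm" = "--confirm" ]; then
    aws "$@"
  else
    echo "Dry run only. Re-run with --confirm to delete."
  fi
}

# Example calls (commented out so that sourcing this file deletes nothing):
# preview_then_delete s3://ml-dataset
# preview_then_delete s3://ml-dataset --confirm
```

The design choice here is simply that the destructive call is never the default; you still need to read the dry-run output yourself before passing --confirm.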