Study notes for Amazon S3.
The Simple Storage Service was Amazon’s first cloud service offering, so it looks quite a bit different from the other services on AWS. For example, its REST API uses XML instead of JSON, and buckets get their own subdomain instead of being another part of the URL path. The bucket ACLs are another oddity (more XML), probably because IAM would not be released for several more years! Reading the Timeline of Amazon Web Services on Wikipedia is a great way to see how AWS has evolved.
S3 Buckets are Regional(-ish)
One confusing part of S3 is that buckets are tied to a Region, but their names must be unique across the entire partition!
This means that most simple bucket names will already be taken, and you’ll have to resort to appending random characters or prefixing the bucket name with your AWS Account ID. The latter might be a bad idea, since learning your Account ID is often an attacker’s first step toward a targeted attack on your account.
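A minimal sketch of the random-suffix approach with boto3, where the base name my-app-logs and the Region are my own hypothetical choices:

```python
# Sketch (boto3): create a bucket whose name is made unique with a random
# suffix. "my-app-logs" is a hypothetical base name.
import uuid

import boto3

s3 = boto3.client("s3", region_name="us-west-2")

bucket_name = f"my-app-logs-{uuid.uuid4().hex[:8]}"  # e.g. my-app-logs-3f9a1c2d

# Outside us-east-1, S3 requires an explicit LocationConstraint.
s3.create_bucket(
    Bucket=bucket_name,
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)
print(f"Created bucket: {bucket_name}")
```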
Bucket Deletion is a Pain in Many Ways
The other annoying result of partition-unique bucket names is that you will have to wait up to an hour after deleting a bucket to create a new bucket with the same name.
This led me to do some searching on how to override the default qualifier used by the AWS CDK after I deleted the CdkStack created from the bootstrap (whoops). Thankfully, I figured out how to fix that.
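For anyone hitting the same wall: the CDK lets you point a stack at a bootstrap stack that uses a non-default qualifier. A sketch with the CDK’s Python bindings, where the qualifier value myqual1 is hypothetical and must match whatever you passed to `cdk bootstrap --qualifier`:

```python
# Sketch (aws-cdk-lib, Python): use a synthesizer with a non-default
# bootstrap qualifier. "myqual1" is a hypothetical value; it must match the
# qualifier used when running `cdk bootstrap --qualifier myqual1`.
import aws_cdk as cdk

app = cdk.App()

stack = cdk.Stack(
    app,
    "MyStack",
    synthesizer=cdk.DefaultStackSynthesizer(qualifier="myqual1"),
)

app.synth()
```

You can also set the qualifier app-wide via the @aws-cdk/core:bootstrapQualifier context key in cdk.json instead of constructing a synthesizer per stack.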
Deleting a bucket through the AWS Console is blocked unless the bucket is empty. Sadly, there is no such protection if you use the AWS CLI’s `aws s3 rb --force` command. Deleting buckets accidentally (or maliciously) seems to happen more often than you’d think, so do yourself a favor and do one or more of the following to protect your valuable S3 data (a sketch for enabling versioning follows the list):
- Enable Versioning
- Enable MFA Delete
- Create backups with AWS Backup
- Use cross-region replication
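Of those, versioning is the quickest win. A sketch with boto3 for enabling it on an existing bucket (the bucket name is hypothetical):

```python
# Sketch (boto3): turn on versioning so deleted or overwritten objects can be
# recovered. "my-app-logs-3f9a1c2d" is a hypothetical bucket name.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="my-app-logs-3f9a1c2d",
    VersioningConfiguration={"Status": "Enabled"},
)
```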
Incomplete Multipart Uploads
When uploading large objects to an S3 bucket, you’ll want to use a multipart upload; AWS recommends one for objects over 100 MB, and a single PUT maxes out at 5 GB anyway. A multipart upload lets you send the object in chunks (parts) until the entire object is uploaded.
The process of actually performing a multipart upload (initiate, upload each part, then complete) is a little involved, but thankfully the AWS SDKs abstract it away.
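For example, boto3’s high-level transfer manager does the initiate/upload-parts/complete dance for you. A sketch, with the file path, bucket name, and part sizes all hypothetical choices:

```python
# Sketch (boto3): the high-level transfer manager performs the multipart
# upload (CreateMultipartUpload, UploadPart, CompleteMultipartUpload) for you.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
    multipart_chunksize=16 * 1024 * 1024,   # upload in 16 MB parts
)

s3.upload_file(
    Filename="/tmp/big-archive.tar.gz",
    Bucket="my-app-logs-3f9a1c2d",
    Key="backups/big-archive.tar.gz",
    Config=config,
)
```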
However, if that process is interrupted, say your EC2 Spot Instance was reclaimed mid-upload, then you may be left with an incomplete multipart upload.
These incomplete multipart uploads still take up storage space that you are charged for!
Thankfully, you can configure an S3 Lifecycle rule to abort and remove incomplete multipart uploads after a certain number of days.
If you are uploading objects to your S3 bucket in an automated fashion, then I recommend setting up that lifecycle rule.
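A sketch of that rule with boto3; the bucket name and the seven-day window are hypothetical choices:

```python
# Sketch (boto3): a lifecycle rule that aborts incomplete multipart uploads
# after 7 days, freeing the storage consumed by orphaned parts.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-app-logs-3f9a1c2d",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```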
Recommended Reading
Other cool stuff I found while learning about S3:
- Daniel Grzelak writes about Things you wish you didn’t need to know about S3
- An article about how S3 works under the hood
- Some people at cloudonaut have written twice about keeping clients from uploading viruses into your S3 buckets