Study notes for Amazon RDS and DynamoDB.
Databases are the core component of any modern web application, and must be kept secure and backed up.
Although Amazon’s training videos will tell you that NoSQL, schemaless databases are the future, just remember that PostgreSQL can also store JSON objects in tables (via its JSON and JSONB column types) and perform on par with all those NoSQL DBs. That being said, DynamoDB’s serverless key-value tables do have a purpose, but just don’t abandon ship for NoSQL right away, ok?
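To make that concrete, here is a minimal sketch of JSON-in-Postgres, assuming a local database and the psycopg2 driver (the `events` table and its fields are made up for illustration):

```python
# Sketch: storing and querying JSON documents in plain PostgreSQL.
import json

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed local connection
with conn, conn.cursor() as cur:
    # JSONB stores indexable, binary-encoded JSON inside a relational table.
    cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, payload jsonb)")
    cur.execute(
        "INSERT INTO events (payload) VALUES (%s)",
        [json.dumps({"type": "signup", "plan": "pro"})],
    )
    # The ->> operator extracts a JSON field as text, so you can filter on it like a column.
    cur.execute("SELECT payload FROM events WHERE payload->>'type' = %s", ["signup"])
    print(cur.fetchall())
conn.close()
```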
If you do go the NoSQL route, please, for the sake of all your data scientists and engineers, do NOT go schemaless.
As a rule of thumb, if you have engineers that know and love MongoDB, let them use MongoDB (or AWS DocumentDB). If you have engineers that love MySQL, let them use MySQL. I don’t know if there is really that large of a gap in performance between different database vendors, so let your team work with what they’re best at.
Databases Need IPs Too
If you do not choose a serverless database such as Aurora Serverless or DynamoDB, then your RDS instance will need to have an IP address assigned to it. If it’s in a public subnet, then you will be incurring a public IPv4 address cost too. Keep that in mind when calculating costs.
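If you want a quick audit of which instances are publicly accessible (and therefore holding a billable public IPv4 address), a small boto3 sketch could look like this, assuming credentials and region are already configured in your environment:

```python
# Sketch: flag RDS instances that are publicly accessible, using boto3.
import boto3

rds = boto3.client("rds")

paginator = rds.get_paginator("describe_db_instances")
for page in paginator.paginate():
    for db in page["DBInstances"]:
        if db.get("PubliclyAccessible"):
            # These instances sit in a public subnet with a public IPv4 address.
            print(f"{db['DBInstanceIdentifier']} is publicly accessible")
```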
Use RDS Proxy
Databases and database clients kinda suck at handling connections. For on-premises deployments, many people use PgBouncer as a proxy in front of their PostgreSQL database to pool connections for faster connection times and less resource waste. AWS has a similar tool called RDS Proxy.
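Your application code barely changes: you point your client at the proxy endpoint instead of the database itself. Here is a rough sketch using boto3’s IAM auth token helper with psycopg2; the proxy endpoint, user, and database names are all placeholders:

```python
# Sketch: connecting to PostgreSQL through an RDS Proxy endpoint with IAM auth.
import boto3
import psycopg2

PROXY_HOST = "my-proxy.proxy-abc123.us-east-1.rds.amazonaws.com"  # hypothetical endpoint

# generate_db_auth_token creates a short-lived password signed with your IAM credentials.
token = boto3.client("rds").generate_db_auth_token(
    DBHostname=PROXY_HOST, Port=5432, DBUsername="app_user"
)

conn = psycopg2.connect(
    host=PROXY_HOST,
    port=5432,
    user="app_user",
    password=token,
    dbname="mydb",
    sslmode="require",  # IAM database authentication requires SSL
)
```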
Database Migration Service (DMS) is Tricky
One of the services you are introduced to in the AWS trainings is the Database Migration Service (DMS), which is designed to make migrating your on-premises databases to the cloud easier. It features a wide variety of source and target database types, and is marketed so heavily by AWS that you would think it’s a solid choice.
Wrong. It’s an OK choice, but you don’t have to go far on the internet to find horror stories. Exercise caution when using DMS. Run some test migrations first, with representative data schemas and approximately the same scale of data that you will be migrating.
For example, the folks at Smily posted about their DMS migration experience going from PostgreSQL to PostgreSQL. It doesn’t get simpler than that, yet they still encountered issues. I recommend you give their article a read. In it, they point out a specific issue they hit: if a VARCHAR column didn’t have a size limit set, it would automatically be assigned a size limit of 8,000 characters, which they figured out from a Qlik webpage, oddly enough. That being said, if you have giant text strings, maybe it’s time to convert that column to a TEXT datatype instead of a VARCHAR.
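It might be worth scanning your source database for unbounded VARCHAR columns before DMS ever touches it. A small sketch against PostgreSQL’s information_schema (the connection string is a placeholder):

```python
# Sketch: list VARCHAR columns with no declared length limit in a PostgreSQL database.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed source database
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT table_schema, table_name, column_name
        FROM information_schema.columns
        WHERE data_type = 'character varying'
          AND character_maximum_length IS NULL
          AND table_schema NOT IN ('pg_catalog', 'information_schema')
        """
    )
    for schema, table, column in cur.fetchall():
        print(f"{schema}.{table}.{column} is an unbounded VARCHAR")
conn.close()
```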
Also remember to thoroughly check your migrated database before using it live. This Redditor performed a DMS migration, and it seemed successful, but many rows of data were missing in the middle of the database tables. Entries at the ends of the tables were fine, but somehow the middle went missing. Some better testing and validation would have caught this.
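A basic post-migration sanity check is comparing per-table row counts between source and target, which would have flagged those missing middle rows. A rough sketch (the DSNs and table list are placeholders, and counts alone won’t catch corrupted values, only missing rows):

```python
# Sketch: compare row counts between the source and migrated PostgreSQL databases.
import psycopg2

TABLES = ["users", "orders", "payments"]  # hypothetical tables to verify

source = psycopg2.connect("host=onprem-db dbname=mydb user=myuser")
target = psycopg2.connect("host=my-rds-endpoint dbname=mydb user=myuser")

for table in TABLES:
    counts = []
    for conn in (source, target):
        with conn.cursor() as cur:
            cur.execute(f"SELECT count(*) FROM {table}")  # table names come from a trusted list
            counts.append(cur.fetchone()[0])
    status = "OK" if counts[0] == counts[1] else "MISMATCH"
    print(f"{table}: source={counts[0]} target={counts[1]} {status}")
```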
DynamoDB Change Events Work with AppSync Very Well
In a stroke of genius, AWS has created a very impressive integration between DynamoDB and AWS AppSync.
Instead of client devices performing a dreaded HTTP polling loop, AWS AppSync can send change notifications to your client application using a GraphQL subscription over a WebSocket connection. It has worked great for me, and was relatively easy to set up. If you’re still doing periodic request polling in 2024, please switch to WebSockets or at least long polling.
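For the Python-inclined, the gql library ships an AppSync WebSocket transport. A hedged sketch of a subscriber follows; the endpoint, API key, and the onUpdateItem subscription are all hypothetical and depend on your own GraphQL schema:

```python
# Sketch: subscribing to AppSync change events over WebSockets with the gql library.
import asyncio
from urllib.parse import urlparse

from gql import Client, gql
from gql.transport.appsync_auth import AppSyncApiKeyAuthentication
from gql.transport.appsync_websockets import AppSyncWebsocketsTransport

URL = "https://example123.appsync-api.us-east-1.amazonaws.com/graphql"  # hypothetical endpoint
API_KEY = "da2-xxxxxxxxxxxx"  # hypothetical API key

async def main():
    auth = AppSyncApiKeyAuthentication(host=urlparse(URL).netloc, api_key=API_KEY)
    transport = AppSyncWebsocketsTransport(url=URL, auth=auth)
    async with Client(transport=transport) as session:
        # Hypothetical subscription; the real fields come from your AppSync schema.
        subscription = gql("subscription OnUpdate { onUpdateItem { id status } }")
        async for event in session.subscribe(subscription):
            print(event)  # fires whenever a change is pushed over the WebSocket

asyncio.run(main())
```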