AWS has a multitude of database options. Their use cases overlap, and the differences aren't always presented in the cleanest way. In this piece I'll give my thoughts on which databases you should focus on, and I'll try to cover as many of them as I can.
Unfortunately, not every AWS database product is straightforward to use, or even worth using in some cases. On the other hand, the best databases from AWS are world class. If you want the short version of this whole thing, it's this: if you want a relational database, use Aurora, preferably the PostgreSQL flavor, though if your team has tons of MySQL experience, MySQL Aurora is fine as well. If you're looking for a key-value store, use DynamoDB. Caching with ElastiCache for Redis works just fine. For analytics you can consider Athena, which Amazon doesn't even classify as a database, but it's nice. Otherwise, for analytics you're probably better off with Snowflake, Databricks, or BigQuery, depending on what you're doing.
Now let's dive deeper into the main AWS databases, some of the more obscure ones, and what I like and don't like about each of them.
- RDS and Aurora
- Analytics Databases
- Key-value stores, DocumentDB, and caches
- The niche databases
Let's start talking about specific databases with RDS. RDS is an interesting product. It's perfectly good at what it does, but it's hard to recommend at this point because Aurora is better for most use cases.
There are some exceptions, though. For example, RDS is available on the free tier while Aurora is not. There are also tuning considerations: if you have queries that are heavily tuned for a specific engine, RDS may be faster. The final reason you may end up preferring RDS to Aurora is that you need an engine other than MySQL or PostgreSQL, since those are the only two Aurora supports.
In general, though, Aurora is vastly preferable due to its architecture, which separates compute from storage and significantly increases the resiliency of your data. It also enables things like running your database across regions, which is incredibly powerful. If you are developing a totally greenfield project, Aurora will usually end up being the right choice.
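One practical consequence of that architecture is that an Aurora cluster exposes a single writer endpoint plus a load-balanced reader endpoint for its replicas. A minimal sketch of routing traffic between them, with entirely made-up endpoint names:

```python
# Sketch: route read-only statements to Aurora's reader endpoint and
# everything else to the writer endpoint. Endpoint names are hypothetical;
# Aurora gives each cluster one writer (cluster) endpoint and one
# load-balanced reader endpoint.

WRITER_ENDPOINT = "mydb.cluster-abc123.us-east-1.rds.amazonaws.com"     # hypothetical
READER_ENDPOINT = "mydb.cluster-ro-abc123.us-east-1.rds.amazonaws.com"  # hypothetical

def pick_endpoint(sql: str) -> str:
    """Return the endpoint a statement should be sent to."""
    first_word = sql.lstrip().split(None, 1)[0].upper()
    read_only = first_word in ("SELECT", "SHOW", "EXPLAIN")
    return READER_ENDPOINT if read_only else WRITER_ENDPOINT
```

In a real application your database driver or connection pooler usually handles this split for you; the sketch just shows why the two endpoints exist.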
When it comes to databases for analytics, AWS is going to push you toward Redshift. Unfortunately, I don't particularly love Redshift. It's hard to set up and use in a coherent way, and it looks very lacking compared to offerings like Snowflake and BigQuery.
On the other hand, AWS Athena is pretty great. Funnily enough, AWS doesn't market it as a database, and it isn't really one: it lets you run SQL queries directly against your S3 buckets. Even so, it offers very flat query times over even pretty large datasets. It's not quite as powerful as a traditional analytics database, but if it fits your use case, I think it's a great choice.
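As a sketch of what using it looks like, here's a partition-pruned query against a hypothetical S3-backed table (the table, column, and bucket names are all made up); filtering on the partition column matters because Athena bills per byte of S3 data scanned:

```python
# Build a partition-pruned Athena query. Restricting on the partition
# column (a hypothetical dt= partition here) limits how much S3 data
# Athena scans, which is what you pay for.

def daily_events_query(day: str) -> str:
    """SQL counting events for one day from a hypothetical 'events' table."""
    return (
        "SELECT event_type, COUNT(*) AS n "
        "FROM events "
        f"WHERE dt = '{day}' "  # dt is the partition column in this example
        "GROUP BY event_type "
        "ORDER BY n DESC"
    )

sql = daily_events_query("2023-01-15")
# With boto3 you would then submit it, roughly like this (not executed here):
# boto3.client("athena").start_query_execution(
#     QueryString=sql,
#     ResultConfiguration={"OutputLocation": "s3://my-results-bucket/"},
# )
```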
NoSQL and Caches
Unlike with analytics, AWS offers some very strong options for document databases and caches. DynamoDB is an incredibly powerful platform to build on, although doing data modeling well with it is challenging. Data modeling with Dynamo is well explained in this talk. If your data can be modeled with Dynamo, I highly recommend it. When it's used right, the performance can be incredible, and resiliency is great thanks to how easy it is to run DynamoDB globally.
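The heart of that modeling work is usually single-table design: packing multiple entity types into one table by encoding them into composite partition and sort keys. A minimal sketch of the idea, with made-up entity names and key scheme:

```python
# Single-table design sketch: customers and their orders live in one table.
# The partition key (PK) groups an entity's items together; the sort key (SK)
# distinguishes and orders them, so one DynamoDB Query on a single PK can
# fetch a customer profile plus all of that customer's orders.

def customer_item(customer_id: str, name: str) -> dict:
    return {"PK": f"CUSTOMER#{customer_id}", "SK": "PROFILE", "name": name}

def order_item(customer_id: str, order_id: str, total_cents: int) -> dict:
    return {"PK": f"CUSTOMER#{customer_id}", "SK": f"ORDER#{order_id}", "total_cents": total_cents}

items = [
    customer_item("42", "Ada"),
    order_item("42", "2023-001", 1999),
    order_item("42", "2023-002", 550),
]
# All three items share PK "CUSTOMER#42", so a single Query with
# KeyConditionExpression="PK = :pk" would return the profile and every order.
```

The design choice here is that you model your access patterns up front rather than normalizing; that's the part the talk above goes into.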
On the other hand, DocumentDB is a much worse experience. It's mostly MongoDB compatible, but it's missing key features. DocumentDB aims for compatibility with MongoDB 4.0; it doesn't fully get there, and it's significantly behind mainline MongoDB, which is at 6.0 at the time of writing. If you absolutely need a MongoDB-compatible managed service running in AWS, then DocumentDB is your only choice. Most of the time, though, I'd encourage you to use either Atlas or DynamoDB.
For the final section, we'll look at some of the more niche and specialized databases AWS has. For the most part you'll probably never use these databases because their use cases are so specialized.
First up is Neptune, the only one of these I've actually used. It's AWS's graph database, and the problem here is actually the graph database part. Storing your data as a graph seems like a super cool idea, and it's often a structure that makes intuitive sense. The issue is that the number of cases where it's actually the right choice is far smaller than it would seem, and storing everything as a graph has some unfortunate performance implications. As far as graph databases go, Neptune isn't that bad; it's just a rarely needed kind of database.
I have very little to say about QLDB, Timestream, and Keyspaces. I'm including them here for the sake of completeness. If what you're doing fits one of their use cases, they're worth experimenting with. I've never heard anybody say much about them, unfortunately, so there may be rough edges, as there often are with less-used AWS services.
As you can see, there are a ton of options for building on AWS. Many of those options aren't as great as you'd hope. However, if I were building a greenfield application on AWS, I'd use the following:
- Aurora PostgreSQL if the relational model is a good fit
- DynamoDB if I need a key-value store
- ElastiCache for Redis as a caching layer in front of Aurora
- DAX if I need caching for DynamoDB, though Dynamo queries tend to be fast enough without it
- Snowflake for analytics, personally, though there are tons of good options out there
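For that caching layer, the usual pattern with ElastiCache Redis is cache-aside: check the cache first, fall back to Aurora on a miss, then populate the cache. A sketch using a plain dict as a stand-in for the Redis client, with made-up function and key names:

```python
# Cache-aside sketch. `cache` stands in for a Redis client; in production
# you'd use a Redis client's get/set (with a TTL) against an ElastiCache
# endpoint instead of an in-process dict.

cache: dict = {}

def fetch_from_aurora(user_id: str) -> str:
    """Stand-in for a SQL query against Aurora (hypothetical)."""
    return f"user-row-{user_id}"

def get_user(user_id: str) -> str:
    key = f"user:{user_id}"
    if key in cache:                       # cache hit: skip the database
        return cache[key]
    row = fetch_from_aurora(user_id)       # cache miss: query Aurora
    cache[key] = row                       # populate; Redis would also get a TTL
    return row
```

The same shape works for DAX in front of DynamoDB, except DAX does the read-through caching for you.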