
Data stores

Why

At the moment (Nov 19) we use Redis both as a Celery backend (i.e. for storing the results of Celery tasks) and as a data store where we track Copr builds, GitHub App installations and a whitelist of users. Redis is an in-memory database, i.e. the more data we store, the more memory we use. Because memory is (unlike disk) quite expensive, we want to move to a database which stores the data on disk. The task here is to decide between SQL and NoSQL, and then between deploying the database ourselves or using one in the cloud.

SQL

SQL databases store data in structured, interconnected tables. The question here is whether we actually need that structure to do some crazy queries. A big plus is that Celery supports SQLAlchemy as a built-in result backend, and Flask can use it too (via the Flask-SQLAlchemy extension). Of the databases SQLAlchemy supports, I'd shortlist SQLite and PostgreSQL (see also this and this):
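As a sketch of what "built-in" means in practice: Celery's database result backend takes an SQLAlchemy URL prefixed with `db+`, and Flask-SQLAlchemy reads its URL from the `SQLALCHEMY_DATABASE_URI` config key, so both could be derived from one URL. The helper name `backend_settings` and the example URLs below are hypothetical.

```python
def backend_settings(sqlalchemy_url: str) -> dict:
    """Derive Celery and Flask-SQLAlchemy settings from a single SQLAlchemy URL.

    Hypothetical helper: the function name and the example URLs are illustrative only.
    """
    if "://" not in sqlalchemy_url:
        raise ValueError(f"not an SQLAlchemy URL: {sqlalchemy_url!r}")
    return {
        # Celery's database (SQLAlchemy) result backend expects a "db+" prefix.
        "result_backend": "db+" + sqlalchemy_url,
        # Flask-SQLAlchemy reads the plain SQLAlchemy URL from this config key.
        "SQLALCHEMY_DATABASE_URI": sqlalchemy_url,
    }


if __name__ == "__main__":
    # Both candidates are configured the same way; only the URL differs.
    for url in ("sqlite:////data/app.db", "postgresql://user:password@db-host:5432/appdb"):
        print(backend_settings(url))
```

Switching between SQLite and PostgreSQL would then mostly be a matter of changing one URL.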

SQLite

Pros

Ultra-lightweight in setup, administration, and required resources. Very fast.

Cons

Because SQLite is a serverless database, it doesn't provide network access to its data: an application just uses the SQLite library to read and write a single database file. If several containers (in our case the service and multiple workers) need to access the database, they would need the file on a shared ReadWriteMany (RWX) volume, and SQLite's documentation warns that file locking on network filesystems can be unreliable.
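To illustrate the "no server, just a file" point: with Python's built-in `sqlite3` module, every process that wants the data simply opens the same file path, which is why several containers would need that file on a shared volume. The `copr_builds` table and its columns are hypothetical, and a temporary directory stands in for the shared volume.

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database file; there is no server to connect to."""
    conn = sqlite3.connect(path)
    # Hypothetical table, created on first use.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS copr_builds (build_id INTEGER PRIMARY KEY, status TEXT)"
    )
    return conn


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.db")  # stands in for a file on a shared RWX volume
        # The "service" writes through one connection...
        service = open_db(path)
        with service:  # commits the transaction on success
            service.execute("INSERT INTO copr_builds VALUES (?, ?)", (1, "pending"))
        # ...and a "worker" reads through another, sharing nothing but the file path.
        worker = open_db(path)
        rows = worker.execute("SELECT build_id, status FROM copr_builds").fetchall()
        service.close()
        worker.close()
        return rows


if __name__ == "__main__":
    print(demo())  # [(1, 'pending')]
```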

PostgreSQL

Pros

The JSON data types (`json` and `jsonb`) are a big plus: our data is already serialized as JSON, so we can store it that way and still run all kinds of queries over it.
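A rough sketch of what "store it as-is, query it later" could look like with PostgreSQL's `jsonb` type. The table name, the JSON field names and the `%s` placeholder style (used by drivers such as psycopg2) are assumptions; the snippet only prepares SQL and parameters and does not connect to a database.

```python
import json

# Hypothetical table: one row per record, the serialized payload kept as-is in a JSONB column.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS copr_builds (
    id   SERIAL PRIMARY KEY,
    data JSONB NOT NULL
)
"""

# ->> extracts a field as text; @> tests JSON containment (and can be served by a GIN index).
SELECT_BY_STATUS = "SELECT data FROM copr_builds WHERE data->>'status' = %s"
SELECT_CONTAINING = "SELECT data FROM copr_builds WHERE data @> %s::jsonb"


def insert_params(record: dict) -> tuple:
    """SQL and parameters for storing a JSON-serializable record unchanged."""
    return "INSERT INTO copr_builds (data) VALUES (%s::jsonb)", (json.dumps(record),)


def containment_params(fragment: dict) -> tuple:
    """SQL and parameters for 'rows whose JSON contains this fragment'."""
    return SELECT_CONTAINING, (json.dumps(fragment),)


if __name__ == "__main__":
    sql, params = insert_params({"build_id": 1, "status": "pending"})
    print(sql, params)  # with a real driver: cursor.execute(sql, params)
```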

Cons

Possibly overkill for our needs.

NoSQL

A typical example of a NoSQL (document-oriented) database is MongoDB.

Pros

Stores data in flexible, JSON-like documents, meaning fields can vary from document to document and data structure can be changed over time. It also supports queries and indexing.
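To make "fields can vary from document to document" concrete: below, documents in one collection have different fields, and an equality filter written in MongoDB's `{field: value}` style simply skips documents that lack the field. This is a toy in-memory illustration of those semantics with plain dicts, not MongoDB's query engine or the pymongo API; the document contents are made up.

```python
from typing import Any, Dict, Iterable, List

Document = Dict[str, Any]

# One "collection" whose documents do not share a schema (contents are made up).
collection: List[Document] = [
    {"_id": 1, "kind": "copr_build", "status": "pending"},
    {"_id": 2, "kind": "installation", "account": "some-org"},
    {"_id": 3, "kind": "copr_build", "status": "success", "chroots": ["fedora-rawhide-x86_64"]},
]


def find(docs: Iterable[Document], query: Document) -> List[Document]:
    """Return documents whose fields equal every key/value pair in `query`.

    Toy equivalent of a MongoDB equality filter such as {"status": "success"};
    for a non-null value, a document without the queried field just does not match.
    """
    return [d for d in docs if all(k in d and d[k] == v for k, v in query.items())]


if __name__ == "__main__":
    print([d["_id"] for d in find(collection, {"kind": "copr_build"})])  # [1, 3]
    print([d["_id"] for d in find(collection, {"status": "success"})])  # [3]
```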

Cons

Using MongoDB as a Celery backend is not that straightforward, but there are options: [1], [2], [3]

DBaaS

Should we deploy the database ourselves, or use a managed one from a cloud provider? That's the question.

Deploy in OpenShift

There are publicly available container images for both PostgreSQL and MongoDB.
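If we run PostgreSQL in OpenShift ourselves, the service and workers would reach it through its OpenShift service name. A minimal sketch of building the connection URL from environment variables; it assumes the app receives the same `POSTGRESQL_USER`, `POSTGRESQL_PASSWORD` and `POSTGRESQL_DATABASE` values the Software Collections (sclorg) PostgreSQL image is configured with, while `POSTGRESQL_HOST`, `POSTGRESQL_PORT` and the service name `postgres` are hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def postgres_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build an SQLAlchemy URL for a PostgreSQL pod deployed in OpenShift.

    POSTGRESQL_USER/PASSWORD/DATABASE mirror the sclorg image's variables (assumption);
    POSTGRESQL_HOST/PORT are hypothetical and default to a service named "postgres".
    """
    env = os.environ if env is None else env
    user = quote(env["POSTGRESQL_USER"], safe="")
    # Percent-encode characters such as '@' or '/' that would break the URL.
    password = quote(env["POSTGRESQL_PASSWORD"], safe="")
    host = env.get("POSTGRESQL_HOST", "postgres")
    port = env.get("POSTGRESQL_PORT", "5432")
    database = env["POSTGRESQL_DATABASE"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```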

AWS

PostgreSQL

MongoDB

Even though one can run MongoDB in AWS (see [1], [2]), Amazon seems to be pushing its MongoDB-compatible DocumentDB (see also Docs/Developer Guide). Given that we don't need full MongoDB compatibility, DocumentDB seems to be the preferred option (and the winner of today's battle).