
Data stores


At the moment (Nov 19) we use Redis as a Celery backend (i.e. for storing results of Celery tasks) as well as a data store where we track Copr builds, GitHub app installations and the whitelist of users. Redis is an in-memory DB, i.e. the more data we store, the more memory we use. Because memory is (unlike disk) quite expensive, we want to move to a DB which stores the data on disk. The task here is to decide between SQL and NoSQL, and then between our own deployment and a database in the cloud.


SQL databases store data in a structured way, as interconnected tables. The question here is whether we actually need structured tables to do some crazy queries. A big plus here is that Celery supports SQLAlchemy as a built-in backend and the same applies to Flask. From the databases which SQLAlchemy supports I'd select SQLite and PostgreSQL (see also this and this):



SQLite: ultra-lightweight in setup, administration, and required resources. Very fast.


Because SQLite is a serverless database, it doesn’t provide direct network access to its data. An application (as I understand it) just stores data in a file by using a SQLite library. If several containers (in our case the service and multiple workers) need to access the DB, they probably need to have the file on a shared (RWX) volume.
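To illustrate the "just a file" point, here is a minimal sketch using Python's standard `sqlite3` module. The table name `whitelist`, its columns and the account name are hypothetical and only serve the example. Two separate connections stand in for two containers that see the same file on a shared volume.

```python
import os
import sqlite3
import tempfile


def write_then_read(db_path):
    """Write through one connection, read through another one."""
    # First "container": creates the file (if missing) and writes a row.
    writer = sqlite3.connect(db_path)
    writer.execute(
        "CREATE TABLE IF NOT EXISTS whitelist "
        "(account TEXT PRIMARY KEY, status TEXT)"  # hypothetical schema
    )
    writer.execute(
        "INSERT OR REPLACE INTO whitelist VALUES (?, ?)",
        ("example-user", "approved"),
    )
    writer.commit()
    writer.close()

    # Second "container": needs nothing but the path to the same file.
    reader = sqlite3.connect(db_path)
    rows = reader.execute("SELECT account, status FROM whitelist").fetchall()
    reader.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_then_read(os.path.join(tmp, "packit.sqlite3")))
```

There is no host or port anywhere in this code, which is exactly why every container would need the file itself mounted.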



PostgreSQL: the JSON data type is a big plus, because our data is already serialized as JSON, so we can just store it that way and then do all kinds of queries over it.
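A rough sketch of what storing and querying such JSON could look like. Everything specific here is an assumption made for illustration: the table `copr_builds`, its `jsonb` column `data`, the `status` key, and a psycopg2-style DB-API cursor (which uses `%s` placeholders). The `->>` operator is PostgreSQL's "get JSON field as text".

```python
import json

# Hypothetical table: CREATE TABLE copr_builds (id serial PRIMARY KEY, data jsonb);
INSERT_SQL = "INSERT INTO copr_builds (data) VALUES (%s::jsonb)"
SELECT_SQL = "SELECT data FROM copr_builds WHERE data->>'status' = %s"


def store_build(cursor, build):
    """Store an already JSON-serializable dict as one jsonb value."""
    cursor.execute(INSERT_SQL, (json.dumps(build),))


def builds_with_status(cursor, status):
    """Return the stored documents whose 'status' key equals `status`."""
    cursor.execute(SELECT_SQL, (status,))
    return [row[0] for row in cursor.fetchall()]
```

With a real connection, the `cursor` argument would come from something like `psycopg2.connect(...).cursor()`; the functions themselves only rely on the generic `execute` and `fetchall` methods.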


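As for Celery's built-in SQLAlchemy backend mentioned for the SQL option, switching the result backend should mostly be a matter of configuration. A minimal sketch of a Celery configuration module follows; the module name, hostnames, credentials and paths are placeholders, not our actual setup.

```python
# celeryconfig.py (hypothetical name), loaded with
# app.config_from_object("celeryconfig")

# The "db+" prefix selects Celery's SQLAlchemy result backend;
# the rest is a regular SQLAlchemy connection URL (placeholder values).
result_backend = "db+postgresql://packit:secret@postgres:5432/packit"

# SQLite variant, with the file on a shared volume (placeholder path):
# result_backend = "db+sqlite:////var/lib/packit/results.sqlite"
```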


An example of a NoSQL (document-oriented) database is MongoDB.


It stores data in flexible, JSON-like documents, meaning fields can vary from document to document and the data structure can be changed over time. It also supports queries and indexing.


Using MongoDB as a Celery backend is not that straightforward, but there are options: [1], [2], [3]


Deploy ourselves or cloud-native? That's the question.

Deploy in OpenShift

There are publicly available container images for both PostgreSQL and MongoDB.




Even though one can use MongoDB in AWS (see [1], [2]), Amazon seems to be pushing its MongoDB-compatible DocumentDB (see also Docs/Developer Guide). Given that we don't need full MongoDB compatibility, DocumentDB seems to be the preferred option (and the winner of today's battle).