When embarking on a new backend project, our first instinct as developers is often to reach for a well-established relational database such as PostgreSQL or MySQL. However, other types of databases have been available for some time, and one of them is MongoDB. In this article, I would like to outline some advantages and disadvantages of using it on a project.
MongoDB is a document-oriented database. Data is organised as JSON-like documents (the equivalent of rows) made up of fields (the equivalent of columns), which are grouped into collections (the equivalent of tables). Documents are stored in BSON (binary-serialized JSON), which extends JSON with additional data types such as dates, ObjectIds, binary data, 64-bit integers and decimals. MongoDB also supports data validation based on the JSON Schema standard: when setting up a collection, you can provide a JSON Schema definition that incoming documents must satisfy. By design, however, MongoDB is schema-less: unless a validator says otherwise, each document within one collection can have its own set of fields. Additionally, it is built as a distributed database that can scale horizontally and across geographic regions for better performance.
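To make the schema-less model and JSON Schema validation more concrete, here is a minimal Python sketch. The users collection, its field names and the has_required_fields helper are hypothetical. The validator dict uses MongoDB's $jsonSchema syntax; attaching it through PyMongo's create_collection is shown only in a comment, because that needs a running server. The helper checks just the "required" keyword, a small subset of what the server would enforce.

```python
# Two documents destined for the same hypothetical "users" collection.
# Because MongoDB is schema-less by default, they do not share the same fields.
alice = {"_id": 1, "name": "Alice", "email": "alice@example.com"}
bob = {"_id": 2, "name": "Bob", "phone": "555-0100", "tags": ["beta"]}

# A validator in MongoDB's $jsonSchema syntax. With PyMongo it could be attached
# when creating the collection, e.g.:
#   db.create_collection("users", validator=user_validator)
# (not executed here, since it requires a live server).
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
        },
    }
}


def has_required_fields(doc, validator):
    """Hypothetical helper: checks only the "required" keyword of the schema,
    not types or any other JSON Schema rules."""
    required = validator["$jsonSchema"].get("required", [])
    return all(field in doc for field in required)


print(has_required_fields(alice, user_validator))  # True
print(has_required_fields(bob, user_validator))    # False: no "email" field
```

Without the validator, both documents would be accepted into the same collection; with it, the server would reject Bob's document.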
When considering a switch to a document-oriented database, there are a couple of points one has to weigh in order to make the best choice. The first is the lack of referential integrity (RI): explicitly defined, validated and enforced relations between different pieces of data in the database (e.g. foreign key constraints). RI helps keep the information consistent and provides an additional layer of validation beneath the application-level one. However, when the data set is large and diverse (documents with differing fields), RI might prove inflexible and inefficient in terms of storage size. The next consideration is transactional support. Until last month (June 2018), MongoDB guaranteed atomicity only for operations on a single document (the suboptimal workaround was to embed data that would otherwise live in several collections into one document). Since the release of version 4.0, we finally have multi-document ACID transactions (on replica sets), mirroring those we know from relational databases.
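Because MongoDB will not reject an order that points at a non-existent customer, that check has to live in application code. The sketch below assumes two hypothetical collections, modelled as in-memory Python lists, and a hypothetical insert_order helper; it illustrates the idea rather than how any particular driver or ODM implements it.

```python
# In-memory stand-ins for two hypothetical collections.
customers = [{"_id": 1, "name": "Alice"}]
orders = []


class MissingReferenceError(ValueError):
    """Raised when an order refers to a customer that does not exist."""


def insert_order(order):
    """Hypothetical helper: emulate a foreign key check in application code
    before inserting, since the database will not enforce it."""
    if not any(c["_id"] == order["customer_id"] for c in customers):
        raise MissingReferenceError(f"customer {order['customer_id']} not found")
    orders.append(order)


insert_order({"_id": 100, "customer_id": 1, "total": 25})
try:
    insert_order({"_id": 101, "customer_id": 42, "total": 10})
except MissingReferenceError as err:
    print(err)  # customer 42 not found
```

Note that the existence check and the insert are two separate steps, so against a real server another client could delete the customer in between; a foreign key constraint in a relational database closes that gap for you.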
Another aspect worth examining is support for grouping and joins. When delivering data from an API, there is often a need to preload related data (e.g. products and order lines when fetching a single order). In MongoDB, a plain query cannot join documents from another collection. However, the aggregation pipeline (filtering, grouping and processing documents in a sequence of stages) provides a $lookup stage that combines documents from multiple collections, similar to a left outer join. The structural design of the data matters too: sometimes it is better to embed selected documents instead of creating separate collections for them. On the other hand, embedding too much information in one document can slow down query processing (and a single document cannot exceed 16 MB). Lastly, it is important to make sure that there is an ODM library (Object-Document Mapper, the document-database counterpart of an ORM) for our programming language. Popular choices include Mongoose (for Node.js) and Mongoid (for Ruby on Rails).
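To illustrate the $lookup stage, the following Python sketch defines a pipeline in MongoDB's aggregation syntax that fetches one order and attaches its lines from a hypothetical order_lines collection. Running it for real would mean calling db.orders.aggregate(pipeline) with PyMongo against a live server, so the sketch instead uses a small, hypothetical run_pipeline function that emulates only equality $match and scalar-equality $lookup on in-memory lists.

```python
# Aggregation pipeline in MongoDB's syntax: fetch order 100 and attach its
# lines from a separate "order_lines" collection. With PyMongo this would be
# db.orders.aggregate(pipeline) against a live server (not done here).
pipeline = [
    {"$match": {"_id": 100}},
    {"$lookup": {
        "from": "order_lines",
        "localField": "_id",
        "foreignField": "order_id",
        "as": "lines",
    }},
]

# Hypothetical sample data standing in for the two collections.
orders = [{"_id": 100, "customer": "Alice"}, {"_id": 101, "customer": "Bob"}]
order_lines = [
    {"_id": 1, "order_id": 100, "product": "pen", "qty": 2},
    {"_id": 2, "order_id": 100, "product": "ink", "qty": 1},
    {"_id": 3, "order_id": 101, "product": "pad", "qty": 5},
]


def run_pipeline(docs, stages, collections):
    """Simplified local emulation of $match (field equality only) and
    $lookup (scalar equality only); real MongoDB semantics are richer."""
    for stage in stages:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs if all(d.get(k) == v for k, v in cond.items())]
        elif "$lookup" in stage:
            spec = stage["$lookup"]
            foreign = collections[spec["from"]]
            # Attach every foreign document whose foreignField equals localField.
            docs = [
                {**d, spec["as"]: [f for f in foreign
                                   if f.get(spec["foreignField"]) == d.get(spec["localField"])]}
                for d in docs
            ]
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return docs


result = run_pipeline(orders, pipeline, {"order_lines": order_lines})
print([line["product"] for line in result[0]["lines"]])  # ['pen', 'ink']
```

The result has the same shape an embedded design would store directly (an order document with a lines array); the trade-off is whether the lines are joined at read time or stored inside the order document at write time.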
In technology debates, the answer is rarely clear-cut, but it is often possible to identify key features or use cases that can guide us towards the right choice. When choosing the most appropriate database for your application, you need to carefully weigh the considerations outlined above and think about the structure of the data you will need to handle.