Chapter 2: Why ORM Can Be the Scaling Bottleneck?

ORM is widely used in Doordash’s Django monolithic service. Like many other technologies, it enabled the developer to focus on the business logic. However, it became a major scalability bottleneck as the service grew. In this chapter, I am going to talk about the bad side of ORM, and why we decided to get rid of ORM in our new microservice and use SQL queries instead.

How does ORM work?

Object–relational mapping or ORM, is a technology to map the programming model to the database model, so the developers can manipulate the data models only while the framework handles mapping these data manipulation to the DB queries.

The purpose is to help improve the developer efficiency. When designing a stateful system, we have to worry about a few things: how to design the domain model, how to design the database models, and how to map them to each other. Using the store as an example, the following is the domain model:

class store:
    long id
    long businessId
    string name

The correspondent DB model looks like this:

In order to access the store data, you have two write db queries and bind them:

class StoreRepo:
    def get_store_by_id(store_id):
        query = "select * from store where :store_id"
        return db_binder(query, store_id=store_id)

store = store_repo.get_store_by_id(123)
name = store.name

However, with ORM, you can register your store model as an ORM model. I am going to use Django ORM as an example:

from django.db import models
from demo_models import Business

class Store(models.Model):
    id = models.BigIntegerField()
    name = models.CharField(max_length=100)
    business = models.ForeignKey(Business)


    def __str__(self):
        return self.name
...
store = Store.objects.get(id=123)
store_name = store.name

# you can also modify the model 
store.name = “new name”
store.save()

ORM creates an abstraction layer of the database for the developers. It is powerful and easy to use, however, this is also the reason why it can be easily abused.

Since ORM is so easy to use, it creates an illusion that DB operation is as cheap as the in memory data model operation. However they are not, a DB operation is much more expensive. When the traffic volume is low, and the DB operation is simple, the user doesn’t see the difference. However, as the traffic grows, the domain model and the DB model become more complex, the ORM based solution can be extremely slow and difficult to maintain.

ORM Related Issues

In doordash’s monolithic service, there were many issues related to ORM.

Cross database instance foreign key

We had a few database instance back end, to make the data access easy, a cross DB foreign key extension was invented. It enables you to define a model in another db instance as the foreign key dependency. For example, the order cart data lives in the main DB, while the store model lives in the merchant DB, and you can access the merchant DB directly from the order cart data models.

This breaks the isolation between domains and creates many otherwise unnecessary dependencies when you have to extract the service out of the monolithic service. When you extract the service out of a monolithic, one of the steps is to remove the foreign key dependency by replacing the data access with API calls. The more FK dependency you have, the more clients you will have to migrate. I will discuss more about this issue in later chapters.

Unoptimized access patterns

ORM will translate the model operations into DB queries, however, sometimes they are not efficient, especially when it comes to table joins.

Models becomes very heavy

Another anti pattern we see is to add business operations at the DB model save path. For example, sending update notifications to other services. We see a tendency to mix the DB operation with the business logic, and since the DB operation is at the common path of all operations, users with different requirements end up creating branches in the DB models. And the models end up being very complex.

Maintenance Issue

And there is no way to prevent people from using some unoptimize operations, such as querying by un-indexed columns, entire table scans etc. Since the ORM model can be imported and used anywhere, it is hard to control the increase of access. We had to rely on ad hoc DB query profiling to capture the expensive queries. Again, this is not a sustainable model.

Use Explicit DB Queries

Our new microservice is a JVM service. When we designed the new service, we thought about using a java based SQL ORM, such as Hibernate. However, considering all the bad experience we had with ORM in the monolithic service, we decided to use explicit DB queries and the JDBI binder. We are able to reduce the number of different Db queries from thousands to less than two hundreds, and with the help of DB profilers such as PGAnalyze, we can easily tune the performance of any queries.

About This Blog Series

This post is part of the “How to split a monolithic service?” blog series, check the introduction for more details: How to Split A Monolithic Service? Introduction

Chapter 2: Why ORM Can Be the Scaling Bottleneck?

How does ORM work?

ORM Related Issues

Use Explicit DB Queries

About This Blog Series

Leave a Reply Cancel reply

Categories

Archives