By the end of this November, I will have been working at Doordash for four years. This is a major milestone and a good time to take a break and summarize what I have learned along the way, so I decided to write this blog series. I will focus on the good and bad practices for building a scalable service and for breaking down a monolithic service into microservices. I hope you are able to apply them to your own work.
The biggest challenges Doordash faced over the past few years were reliability and scalability, and the two are connected: because we couldn't scale the servers as fast as the business grew, we crashed very often. In the worst outage, Doordash lost millions of dollars, because on top of refunding customers, it had to pay merchants for food that had already been prepared and send customers extra apology credits.
While we knew that migrating to a microservice architecture would solve the problem, we didn't know how to get there. Very few of us had similar experience. Meanwhile, the existing monolithic service still needed lots of hand-holding, and we had to face many organizational and cultural challenges.
To understand the issue better, let's take a look at Doordash's product model. Until it recently stepped into grocery delivery, Doordash had always been a three-sided business connecting customers, merchants, and dashers. In a regular order cycle, the following events happen:
- The customer browses the store, creates an order cart, and submits the order.
- The order is sent to the merchant so that the merchant can prepare food.
- Meanwhile, a dispatch request is created, and the dispatch system matches the order with a dasher.
- The dasher goes to the merchant, picks up the food, delivers it to the customer, and marks the delivery as done.
My team was responsible for extracting the merchant-side data platform from the monolithic service. This platform served all merchant-side data to its clients, including all the consumer-side applications and the merchant onboarding and management applications, and it also streamed the data to the search service for indexing. We spent about two years extracting the service from the monolith and building brand-new microservices that applied everything we had learned in the monolithic world.
In the following chapters, I am not going to dive into the project directly. Instead, I will spend most of the time explaining the technical, cultural, and organizational challenges. Here is my plan:
- Chapter 1: decoupling the client and backend through a BFF (Backend for Frontend).
- Chapter 2: why can an ORM become the scaling bottleneck?
- Chapter 3: fighting short-term fires: how to scale the SQL database.
- Chapter 4: good and bad cache practices.
- Chapter 5: build your own CDC (change data capture) when a native one is not available.
- Chapter 6: two ways to split out a microservice; I wish I could freeze time.
- Chapter 7: understand your client’s behavior and design the right API for them.
- Chapter 8: organizational barriers during service extraction.
- Chapter 9: the customer is impatient: deliver your project incrementally.
- Chapter 10: application-level partitioning to scale the SQL database.
- Chapter 11: what I might have done differently.