Product and Tech
How we migrated all of our data models from plain SQL to dbt in just five days. What we learned along the way. Why it is important to be humble towards previous work in a fast-growing organization like Mentimeter. And how we had fun while doing it.
About the author: Ramtin Javanmardi is a Data Engineer in the Data Team.
As we have grown as a company, and more specifically as a data team, we have experienced some of the growing pains that come with that. Given that one of our core principles is 80/20 and scaling ourselves, we looked at the ever-evolving data space and found a promising tool that has gained quite a lot of traction: dbt. dbt is short for data build tool, and its main use is to transform data that is already in your data warehouse; it does not handle extraction or load.
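To make that transformation-only scope concrete, a dbt model is essentially just a SQL SELECT statement saved in a file, with Jinja functions such as `source()` resolving to real warehouse tables at compile time. A minimal sketch follows; the model, source, and column names are hypothetical, not our actual models:

```sql
-- models/stg_users.sql (hypothetical model name)
-- dbt compiles {{ source(...) }} into the fully qualified table name
-- configured in a sources .yml file, then materializes the result
-- as a view or table in the warehouse.
with app_users as (

    select * from {{ source('app', 'users') }}

)

select
    id as user_id,
    created_at,
    plan as subscription_plan
from app_users
```

Because each model is plain SQL under the hood, the files fit naturally into version control and code review, which is a big part of why the migration was tractable in the first place.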
So why did we feel compelled to migrate over 30 of our models and sunset the old ones? One major reason was that many of our important data models were written a long time ago, when the company was much smaller, and were never meant to be permanent. As such, they were hard for new team members to understand, and the style of the SQL was not ideal. More and more companies are experiencing this: as data usage matures, it becomes increasingly important that models are easy to understand and that the code itself is readable and maintainable.
A more technical reason for the migration was that the data landscape evolves fast, and we wanted to use Python not only for extraction and load but also for orchestrating our data jobs and reverse ETL jobs (a future post about how we do this is coming, keep an eye out!). dbt lends itself very well to this use case, since it is built to make version control and orchestration of transformations easy. Another reason was that as reliance on our data models grew, we wanted to be able to re-run specific models during the day if something went wrong, and the old implementation did not allow for this.
The last major problem we wanted to solve was spreading knowledge of more of our models across the entire data team and getting rapid adoption of dbt. Many of the major models used daily by our data scientists were written before any of them had joined Mentimeter, which makes it harder to understand the nuances and intricacies of those models.
We did the migration in three distinct steps.
The first step was to have a small group of people become familiar with dbt before introducing it to the rest of the team, to avoid ending up in a situation where the blind lead the blind. So the data platform team (DPT) took it upon ourselves to migrate a few of the models ahead of the big migration, so that we would get familiar with dbt and run into the common setup issues first.
The second step was the bulk of the work, involving the entire data team. We ran this part as a hackathon: everyone could tell us their favorite snacks, and the DPT went out the day before to get them and put everything in order so that it would be a fun environment. After the workday ended, we had a team activity where we went out for a proper dinner and could just relax. We (the DPT) also graded the models by difficulty so that everyone could pick their own level, or at least start with an easy one, get up to speed, and then pick a slightly harder one.
While all of this was going on, we stood ready to help with any setup issues and questions so that nobody got stuck. Since the first experience with a new set of tools matters, we really wanted to convey how smooth and flexible dbt is. The only hard requirement was that the migrated model and the original model had to produce the exact same output; we did not want to start optimizing and migrating at the same time.
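One way to check the "exact same output" requirement is a symmetric-difference query between the old and the new table: if it returns zero rows, the two outputs contain the same set of rows. A sketch, with hypothetical schema and table names rather than our actual ones:

```sql
-- Rows present in the old output but missing from the new,
-- together with rows present in the new but missing from the old.
-- An empty result means the two models agree.
(
    select * from analytics.old_model
    except
    select * from analytics.old_model_dbt
)
union all
(
    select * from analytics.old_model_dbt
    except
    select * from analytics.old_model
)
```

One caveat: EXCEPT compares distinct rows, so if exact duplicate rows matter for a given model, a count comparison per key is needed as well.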
The last step was that the DPT took care of all the PRs and the stitching together of the models: making them import the same references, making everything coherent, and doing some testing to make sure that everything worked as intended.
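The "stitching" largely comes down to having downstream models reference their upstream models through dbt's `ref()` function instead of hard-coded table names, which is also how dbt builds its dependency graph and knows what order to run things in. A sketch with hypothetical model names:

```sql
-- models/fct_daily_active_users.sql (hypothetical model name)
-- {{ ref(...) }} resolves to the table or view that the named
-- model materializes, and records the dependency in dbt's DAG.
select
    u.user_id,
    e.event_date,
    count(*) as events
from {{ ref('stg_events') }} as e
join {{ ref('stg_users') }} as u
    on u.user_id = e.user_id
group by
    u.user_id,
    e.event_date
```

Once every model imports its inputs this way, renaming or re-materializing an upstream model no longer requires hunting down hard-coded table names across the codebase.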
Big migrations of business-critical models do not have to be boring or stressful if the team as a whole makes it fun and frames the entire project as a learning experience. Now all of our data scientists have at least touched dbt and can use it to create new models, or at least understand how to ask for what they need much more clearly. The result is that the team can work more efficiently, and at the same time everyone has learned a framework that is becoming more popular in the data space every day!