We are splitting our database into Main and CI
Improving the performance and reliability of GitLab.com has always been a top priority for GitLab. While we continuously make iterative improvements to GitLab and our production architecture, we anticipate making a larger change to improve the scalability and reliability of GitLab.com: We are splitting our single PostgreSQL database into a main and a ci database. We believe this process, also known as functional decomposition, will increase GitLab’s database capacity by roughly 2x and allows GitLab.com to continue to scale. When will the split take place and what does this mean for users of GitLab.com? This change is planned to take place between Saturday, 2022-07-02, 05:00am UTC and Saturday, 2022-07-02, 09:00am UTC. The implementation of this change is anticipated to include a service downtime of up to 120 minutes between Saturday, 2022-07-02, 06:00am to 08:00am UTC. During this time you will experience complete service disruption of GitLab.com. We are taking downtime to ensure that the application works as expected following the split and to minimize the risk of any data integrity issues. Background GitLab.com’s database architecture uses a single PostgreSQL database cluster. This single cluster (let’s call it main), consists of a single primary and multiple read-only replicas and stores the data generated by all GitLab features. Database reads can be scaled horizontally through read-only replicas, but writes cannot because PostgreSQL does not support active-active replication natively. A large portion of all writes are generated by features related to Continuous Integration (CI). So, to scale GitLab.com’s database capacity, we are splitting the single PostgreSQL main cluster into two clusters: A Continuous Integration database cluster for all CI-related features (ci). A database cluster for all other features (main). At a high level, GitLab.com’s database architecture is changing like this: You can learn more through our direction page or by visiting our public epic: Decompose GitLab.com’s database to improve scalability. Impact Splitting our database into main and ci initially will only impact GitLab.com. To ensure consistency, we plan to enable decomposition for self-managed GitLab instances later. While this split is a significant architectural change that we believe will increase GitLab’s database capacity by roughly 2x, there are other benefits as well. Increased performance By running two separate database clusters, we believe we will increase the overall count of available database connections. This means we can serve more traffic. It also means that during peak hours there is more buffer, which reduces the likelihood of congestion that may cause performance and UX degradations. Another significant advantage is that we anticipate we will be able to tune the main and ci databases independently, allowing us to optimize these different workloads. Increased stability Splitting the database cluster into main and ci means that ci writes are shifting to the ci database cluster. We anticipate this will lead to reduced database saturation, which is a major cause of incidents. Consequently, we believe that the overall stability of GitLab.com may increase following the split. We believe increased stability means that development teams can spend more time working on generating value through new features and other improvements and less time guarding against potential issues. Shipping as fast as ever A primary objective of this project was to provide tooling to our development teams so that they can continue to develop features that use multiple databases. All of these tools, […]
