5 Tips for managing monorepos in GitLab
GitLab was founded 10 years ago on Git because it is the market-leading version control system. As Marc Andreessen pointed out in 2011, we see teams and code bases expanding at incredible rates, testing the limits of Git. Organizations working on enormous or monolithic repositories are experiencing significant performance slowdowns and added administrative complexity.
Why do organizations develop on monorepos?
Great question. While some might believe that monorepos are a no-no, there are valid reasons why companies, including Google or GitLab (that’s right! We operate a monolithic repository), choose to do so. The main benefits are:
- Monorepos can reduce silos between teams, streamlining collaboration on design, development, and operation of different services because everything is within the same repository.
- Monorepos help organizations standardize on tooling and processes. If a company is pursuing a DevOps transformation, a monorepo can help accelerate change management when it comes to new workflows or the rollout of new tools.
- Monorepos simplify dependency management because all packages can be updated in a single commit.
- Monorepos offer unified CI/CD and build processes. Having all services in a single repository means that you can set up one system of pipelines for everyone.
While we still have a ways to go before monorepos or monolithic repositories are as easy to manage as multi-repos in GitLab, we put together five tips and tricks to maintain velocity while developing on a monorepo in GitLab.
1. Use CODEOWNERS to streamline merge request approvals
CODEOWNERS files live in the repository and assign owners to portions of the code, so every change is routed to the people responsible for it. Investing time in a robust CODEOWNERS file, which you can then use to automate merge request approvals from the required people, will save developers time down the road.
You can then set your merge requests so they must be approved by Code Owners before merge. The Code Owners specified for the files changed in a merge request are notified automatically.
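To make the path-to-owner mapping concrete, here is a minimal sketch that parses a small CODEOWNERS file and looks up the owners of changed files. The paths and group names are hypothetical, and the matching is deliberately simplified (shell-style patterns, directory prefixes, last matching rule wins); GitLab's own matcher supports more syntax than this.

```python
from fnmatch import fnmatch

# Hypothetical CODEOWNERS content; paths and group names are invented.
CODEOWNERS = """
# Default owners for anything not matched below
* @platform-team
/services/payments/ @payments-team
/docs/ @tech-writers @platform-team
*.sql @database-team
"""

def parse_codeowners(text):
    """Return a list of (pattern, owners) rules in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules

def matches(pattern, path):
    """Simplified matching; not GitLab's full pattern syntax."""
    if pattern.startswith("/"):
        anchored = pattern.lstrip("/")
        if anchored.endswith("/"):
            return path.startswith(anchored)  # directory rule
        return fnmatch(path, anchored)
    # Unanchored pattern: try the file name, then the whole path
    return fnmatch(path.rsplit("/", 1)[-1], pattern) or fnmatch(path, pattern)

def owners_for(path, rules):
    """Return the owners from the last rule that matches the path."""
    owners = []
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners
    return owners

rules = parse_codeowners(CODEOWNERS)
for changed in ["services/payments/api.py", "docs/index.md", "README.md"]:
    print(changed, "->", owners_for(changed, rules))
```

Because later rules override earlier ones, a broad default such as `*` belongs at the top of the file, with more specific paths below it.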
2. Improve Git operation performance with Git LFS
A universal truth of Git is that managing large files is challenging. If you work in the gaming industry, I am sure you’ve been through the annoying process of trying to remove a binary file from the repository history after a well-meaning coworker committed it. This is where Git LFS comes in. Git LFS keeps the big files in a separate location and stores only small pointers to them in the repository, so they do not bloat the repository's size.
The GitLab server communicates with the Git LFS client over HTTPS. You can enable Git LFS for a project by toggling it in project settings. All files in Git LFS can be tracked in the GitLab interface. GitLab indicates what files are stored there with the LFS icon.
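On the client side, Git LFS decides which files to manage from entries in `.gitattributes`. As a small sketch, the snippet below builds the kind of entry that `git lfs track "<pattern>"` records for a pattern. The file patterns are hypothetical examples of binary assets; substitute the types your project actually commits.

```python
def lfs_attribute_line(pattern):
    """Build the .gitattributes entry that marks a pattern as LFS-managed."""
    return f"{pattern} filter=lfs diff=lfs merge=lfs -text"

# Hypothetical binary asset types for a game project
patterns = ["*.psd", "*.fbx", "assets/textures/**"]

gitattributes = "\n".join(lfs_attribute_line(p) for p in patterns) + "\n"
print(gitattributes, end="")
```

Committing `.gitattributes` to the repository means every contributor's client treats the same files as LFS objects.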
3. Reduce download time with partial clone operations
Partial clone is a performance optimization that allows Git to function without having a complete copy of the repository. The goal of this work is to allow Git to better handle extremely large repositories.
As we just talked about, storing large binary files in Git is normally discouraged, because every large file added is downloaded by everyone who clones or fetches changes thereafter. These downloads are slow and problematic, especially when working from a slow or unreliable internet connection.
Using partial clone with a file size filter solves this problem by excluding troublesome large files from clones and fetches. Git then downloads an excluded file only when an operation actually needs it.
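As an illustration, a file size filter is passed to `git clone` through the `--filter=blob:limit=<size>` option, which omits blobs of that size or larger. The sketch below only assembles the command; the repository URL and the 1 MB threshold are placeholders, not recommendations.

```python
def partial_clone_command(repo_url, max_blob_size="1m", destination=None):
    """Build a `git clone` command that leaves out blobs at or above max_blob_size."""
    command = ["git", "clone", f"--filter=blob:limit={max_blob_size}", repo_url]
    if destination:
        command.append(destination)
    return command

# Placeholder URL; replace with your own project.
cmd = partial_clone_command("https://gitlab.example.com/group/monorepo.git")
print(" ".join(cmd))
# To run it for real, pass the list to subprocess.run(cmd, check=True).
# This requires git on the PATH and a server that supports partial clone.
```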
4. Take advantage of parent-child pipelines
Parent-child pipelines are where one pipeline (the parent) triggers a set of downstream pipelines (the children) in the same project. Each child pipeline still executes its own jobs in stage order, but it does so without waiting for other pipelines to finish. Additionally, each child pipeline keeps its configuration in its own file, making that configuration easier to interpret and understand. For monorepos, using parent-child pipelines in conjunction with rules:changes runs a child pipeline only when the specified files change. This reduces the time wasted running pipelines across the entire repository.
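A minimal sketch of what such a parent configuration can look like: the snippet below renders a parent `.gitlab-ci.yml` with one trigger job per service, each guarded by `rules:changes`. The service names and the assumption that every service keeps its own `.gitlab-ci.yml` in its directory are hypothetical; adapt the paths to your repository layout.

```python
def parent_pipeline_yaml(services):
    """Render a parent .gitlab-ci.yml with one trigger job per service directory."""
    blocks = []
    for name in services:
        blocks.append(
            f"trigger-{name}:\n"
            f"  trigger:\n"
            f"    include: {name}/.gitlab-ci.yml\n"  # child pipeline config
            f"  rules:\n"
            f"    - changes:\n"
            f"        - {name}/**/*\n"               # run only if this directory changed
        )
    return "\n".join(blocks)

# Hypothetical service directories at the repository root
print(parent_pipeline_yaml(["frontend", "backend"]))
```

With this layout, a merge request that touches only `frontend/` starts the frontend child pipeline and leaves the backend one untouched.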
5. Use incremental backups to eliminate downtime
Incremental backups can be faster than full backups because they only pack changes since the last backup into the backup bundle for each repository. This is super useful when you are working on a large repository and only developing on certain parts of the code base at a time.
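To illustrate the idea of packing only the changes since the last backup, the sketch below uses plain `git bundle` rather than GitLab's backup tooling, so treat it as a conceptual example only. The bundle file names and the commit ID `abc123` are hypothetical.

```python
def bundle_command(bundle_path, ref, previous_tip=None):
    """Build a `git bundle create` command.

    With previous_tip set, only commits reachable from ref but not from
    previous_tip are packed, which is the idea behind an incremental backup.
    """
    revision = f"{previous_tip}..{ref}" if previous_tip else ref
    return ["git", "bundle", "create", bundle_path, revision]

# Full backup of the main branch
print(" ".join(bundle_command("full.bundle", "main")))

# Incremental backup: only what was added after the hypothetical commit abc123
print(" ".join(bundle_command("incremental.bundle", "main", previous_tip="abc123")))
```

The incremental bundle is only useful together with the earlier one, because it assumes the receiver already has everything up to the previous tip.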
Where we are headed
While these tips have helped many customers migrate from other version control systems to GitLab, we know there is still room for improvement. Over the next year, you will see us working on the following projects. We’d LOVE to hear from you, so share your thoughts, ideas, or simply 👍 on an issue to help prioritize things that will make your life easier.
“Monolithic repositories have a long way to go until they are as easy to manage as multi-repos in GitLab so check out these 5 tips to make this DevOps task easier.” – Sarah Waldner, Jackie Porter