An SA story about hyperscaling GitLab Runner workloads using Kubernetes
The following fictional story reflects a pattern that Solutions Architects at GitLab encounter frequently. In analyzing this story we intend to demonstrate three things: (a) why one should be thoughtful in leveraging Kubernetes for scaling, (b) how the unintended consequences of an automation approach can create a net productivity loss for an organization (a reversal of ROI), and (c) how solutions architecture perspectives can help surface anti-patterns, whether applied retrospectively or during a development process.

A DevOps transformation story snippet

Gild Investment Trust went through a DevOps transformation effort to build efficiency into their development process through automation with GitLab. Dakota, the application development director, knew that their current system handled about 80 pipelines with 600 total tasks and over 30,000 CI minutes, so scaled CI was clearly needed. Since development occurred primarily during European business hours, they were also interested in reducing compute costs outside of peak work hours. Cloud compute was attractive because of its pay-per-use model combined with elastic scaling.

Ingrid, the infrastructure engineer for developer productivity, was tasked with building out the shared GitLab Runner fleet to meet the needs of the development teams. At the beginning of the project she made a successful bid to leverage Kubernetes for scaling CI/CD, taking advantage of elastic scaling and high availability with the efficiency of containers. Ingrid had recently earned the Certified Kubernetes Administrator (CKA) certification and was eager to put her knowledge to practical use. She did some additional reading on applications running on Kubernetes and noted the strong emphasis on minimizing the resource profile of microservices to achieve efficiency in the form of compute density.
She defined runner containers with 2GB of memory and 750 millicores (about three quarters of a CPU) and had good results running some test CI pipelines. She also decided to leverage the Kubernetes Cluster Autoscaler, which uses overall cluster utilization and pending pod scheduling to automatically add and remove Kubernetes worker nodes, providing smooth elastic scaling in response to demand.

About three months into the proof-of-concept implementation, Sasha, a developer team lead, noted that many of their new job types were failing with strange error messages. The same jobs ran fine on quickly provisioned GitLab shell runners. Since the primary difference between the environments was the liberal allocation of machine resources on a shell runner, Sasha reasoned that the failures were likely due to the constrained CPU and memory resources of the Kubernetes pods.

To test this hypothesis, Ingrid decided to add a new pod definition. She found it difficult to discern which job types were failing due to CPU constraints, which due to memory constraints, and which due to a combination of both, and she knew working out the answer could consume a lot of her time. She decided instead to define a pod that was more liberal with both CPU and memory and make it selectable through runner tagging when certain CI jobs needed more resources. She created a GitLab Runner pod definition with 4GB of memory and 1750 millicores of CPU to cover the failing job types. Developers could then use these larger containers when the smaller ones failed by adding the […]
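A setup like the one in the story could be sketched with two GitLab Runner registrations using the Kubernetes executor, each with its own resource requests in `config.toml`. The runner names, URL, namespace, and token placeholders here are illustrative assumptions, not details from the story; only the CPU and memory figures come from the narrative:

```toml
# Hypothetical config.toml sketch: a small default runner and a larger,
# tag-selectable runner. Names, URL, and tokens are placeholders.
concurrent = 80

[[runners]]
  name = "k8s-default"          # default pods: 750m CPU, 2GB memory
  url = "https://gitlab.example.com/"
  token = "REDACTED"
  executor = "kubernetes"
  [runners.kubernetes]
    namespace = "gitlab-runner"
    cpu_request = "750m"
    memory_request = "2Gi"

[[runners]]
  name = "k8s-large"            # larger pods for jobs that fail on the default
  url = "https://gitlab.example.com/"
  token = "REDACTED"
  executor = "kubernetes"
  [runners.kubernetes]
    namespace = "gitlab-runner"
    cpu_request = "1750m"
    memory_request = "4Gi"
```

Assuming the larger runner were registered with a tag such as `high-resource` (the tag name is hypothetical), a job would opt into it from `.gitlab-ci.yml` like so:

```yaml
heavy-build:
  tags: [high-resource]   # route this job to the larger runner pods
  script:
    - make build
```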
