The GitHub Security Lab’s journey to disclosing 500 CVEs in open source projects
When I stepped onto the scale this morning, I remembered that there are some numbers that feel awkward to celebrate, while perhaps some others are worth celebrating! Recently, the GitHub Security Lab passed the milestone of 500 CVEs disclosed to open source projects. What’s a CVE? In short, it’s the record of a security vulnerability, under the CVE program, intended to inform impacted users. So, finding more vulnerabilities in open source shouldn’t be good news, right? Even as developer communities are getting better at keeping themselves secure, security issues may still slip through their defenses. This means that there will always be a need for security researchers, like the Security Lab, to discover and help fix them.
If you’re not familiar with the Security Lab, we’re a team of security experts who work with the broader open source community to help fix security issues in their projects, with the goal of improving the overall security posture of open source. Our core activity is to audit open source projects (not only the ones hosted on GitHub) and help their maintainers fix the vulnerabilities we find, for free. This research is foundational for our other activities, such as education, improvement of our open source static analysis rules, and tooling. And now we are celebrating more than 500 CVEs disclosed.
How did we get here?
The history of the Security Lab dates back to Semmle, the company that created CodeQL and was later acquired by GitHub. 2017 was a pivotal year, as we realized how powerful our product could be for finding security vulnerabilities. Unlike many other static analysis tools, CodeQL makes it possible to efficiently codify insecure patterns and respond quickly to new security threats at scale. To showcase this capability, Semmle created a small security research team that used CodeQL to search for vulnerabilities in open source projects, and a web portal named LGTM.com where all open source projects could run CodeQL for free and be alerted to potential security flaws directly within their pull requests. This approach grew into an important company objective: find and fix vulnerabilities at scale in open source. This was a way of giving back to the open source community, just as any software company should.
In September 2019, GitHub acquired Semmle, providing an ideal home for advancing the goal of improving open source security at scale. This led to the creation of the Security Lab, with a larger team and new initiatives, including curating the GitHub Advisory Database. The GitHub Advisory Database provides developers with the most accurate information about known security issues in their open source dependencies. GitHub also incorporated CodeQL as a foundation of code scanning and a core pillar of GitHub Advanced Security (GHAS), keeping it free for open source. Code scanning reached parity with LGTM.com in 2022.
We have also expanded beyond CodeQL and now use a variety of tools in our audit activities, such as fuzzing. But CodeQL remains one of the most effective tools in our toolbox, because it enables us to conduct variant analysis at scale, and allows us to share our knowledge of insecure patterns with the community, in the form of executable CodeQL queries.
The secret? Our maintainers-first approach
Not all reports get a CVE. CVE records are useful for informing downstream consumers, so when there is no downstream consumer, there is no need for a CVE. For example, a vulnerability in a CI workflow, or a vulnerability discovered in a development branch and fixed before it reached any release does not require a CVE. While we are credited for 500 CVEs, we have actually reported and helped fix over 1,000 vulnerabilities. But who’s counting, right?
That said, what matters most to us is our fix rate. Across the tens of thousands of reports in the GitHub Advisory Database, 80% on average are fixed by maintainers. However, the fix rate for vulnerabilities the Security Lab reported is much higher: 96% of our reports end up with a fix. This reflects the validity of our reports and our effective collaboration with maintainers. We want project maintainers to succeed, and because of that, we are flexible on the disclosure timeline (when it’s safe for the rest of the community), we provide fix suggestions, and we always help test the new release. Our report template is open source for all security researchers who would like to use it as an inspiration for their own reports.
Now, let’s take a look at some vulnerabilities that stand out!
Highlights from our first 500
CVE-2017-9805: Remote Code Execution vulnerability in Apache Struts
CVE-2018-4407: Kernel crash caused by out-of-bounds write in Apple’s ICMP packet-handling code
By exploiting an integer overflow in the XNU kernel’s networking code, a malicious TCP packet could trigger an out-of-bounds memory access, which would instantly crash the macOS kernel (video) and reboot any Mac or iOS device on the same network as the attacker, without user interaction. It even had a tweetable proof of concept (PoC).
GHSL-2020-204: Remote Code Execution in Corona Warn App Server
CVE-2021-3560: Privilege escalation with polkit
polkit is a system service installed by default on many Linux distributions, including popular distributions such as RHEL and Ubuntu. A race condition vulnerability in this service enabled an unprivileged local user to get a root shell on Linux systems. The bug was in error handling code, and could be triggered by disconnecting the client too early.
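To make the "bug in error handling code" idea concrete, here is a minimal toy model of this class of bug. It is not polkit's code: the `Bus` class, the function names, and the connection name `:1.96` are all hypothetical, and the sketch assumes the failure mode is that a failed identity lookup falls through to a default of UID 0 instead of being rejected.

```python
from typing import Dict, Optional

ROOT_UID = 0


class Bus:
    """Hypothetical stand-in for a message bus that knows each connection's UID."""

    def __init__(self) -> None:
        self._owners: Dict[str, int] = {}

    def connect(self, name: str, uid: int) -> None:
        self._owners[name] = uid

    def disconnect(self, name: str) -> None:
        self._owners.pop(name, None)

    def lookup_uid(self, name: str) -> Optional[int]:
        # Returns None when the connection no longer exists.
        return self._owners.get(name)


def is_authorized_buggy(bus: Bus, name: str) -> bool:
    uid = bus.lookup_uid(name)
    # BUG (assumed for illustration): the lookup failure is swallowed and
    # replaced with 0, which happens to be the UID of root.
    effective_uid = 0 if uid is None else uid
    return effective_uid == ROOT_UID


def is_authorized_fixed(bus: Bus, name: str) -> bool:
    uid = bus.lookup_uid(name)
    if uid is None:
        # Fail closed: a caller we cannot identify is never authorized.
        return False
    return uid == ROOT_UID


if __name__ == "__main__":
    bus = Bus()
    bus.connect(":1.96", uid=1000)   # unprivileged client sends a request
    bus.disconnect(":1.96")          # ...and disconnects before the check runs
    print(is_authorized_buggy(bus, ":1.96"))
    print(is_authorized_fixed(bus, ":1.96"))
```

In this model, the client that disconnects early is the one that gets treated as root by the buggy check, while the fixed check rejects any caller it cannot identify.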
CVE-2021-45046: Bypass of initial mitigations for Log4Shell
“Pwn request” vulnerabilities in implementations of GitHub Actions workflows
We noticed emerging insecure patterns in the implementation of GitHub Actions workflows and helped fix more than a hundred instances in open source projects. We also published guidelines and CodeQL queries to find these types of vulnerabilities, and an open source tool that helps users set the right permissions for the tokens used in these pipelines to limit the damage in case of an exploit. Because the vulnerabilities were in the implementation of CI/CD pipelines, the reports were not assigned CVEs: once they were fixed, no immediate action was needed from the projects’ users.
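As a rough illustration of what such an insecure pattern can look like, the sketch below flags one common shape of a "pwn request": a workflow triggered by `pull_request_target` (which runs with the base repository's privileges) that also checks out the pull request's head commit. This is a simplified text heuristic with a hypothetical function name, not the Security Lab's CodeQL queries, and it will miss variants and can flag commented-out lines.

```python
import re

# Privileged trigger: the workflow runs in the context of the base repository.
PRIVILEGED_TRIGGER = re.compile(r"\bpull_request_target\b")

# Checkout of untrusted code: the ref points at the pull request's head.
PR_HEAD_CHECKOUT = re.compile(
    r"ref:\s*\$\{\{\s*github\.event\.pull_request\.head\.(sha|ref)\s*\}\}"
)


def looks_like_pwn_request(workflow_text: str) -> bool:
    """Return True when both halves of the risky pattern appear in the text."""
    return (
        PRIVILEGED_TRIGGER.search(workflow_text) is not None
        and PR_HEAD_CHECKOUT.search(workflow_text) is not None
    )


# Two made-up workflow files used only to exercise the heuristic.
RISKY_WORKFLOW = """\
on: pull_request_target
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: make test
"""

SAFER_WORKFLOW = """\
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
"""

if __name__ == "__main__":
    print(looks_like_pwn_request(RISKY_WORKFLOW))
    print(looks_like_pwn_request(SAFER_WORKFLOW))
```

A real audit would parse the workflow and follow the data flow rather than match strings, which is where a query language like CodeQL is a better fit than a regular expression.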