CodeQL

ICYMI: improved C++ vulnerability coverage and CodeQL support for Lombok

In the ever-evolving software development landscape, static application security solutions face a unique challenge: as applications grow in complexity, they rely heavily on a diverse array of libraries, frameworks, and custom code. Ensuring the security of such intricate systems requires a meticulous approach, and not all solutions are created equal. The effectiveness of a static application security solution hinges on its ability to provide extensive vulnerability coverage and support for a wide range of languages and frameworks. Code scanning, for example, offers broad coverage for the most popular languages and frameworks and can scrutinize all parts of the codebase. This approach identifies a wide range of vulnerabilities, including those specific to certain technologies or development patterns. The result is a more thorough and reliable assessment of an organization’s security posture.

We’re always looking for ways to help you detect more vulnerabilities in your codebase, so today we’re highlighting two releases aimed at providing better coverage for both languages and frameworks: improved C++ vulnerability coverage and Lombok support.

Improved C++ vulnerability coverage

Detecting vulnerabilities in C++ code is uniquely challenging because of the language’s low-level memory manipulation, complexity, undefined behavior, platform discrepancies, and the absence of built-in memory safety features. Legacy code, concurrency issues, and dynamic memory allocation further compound this difficulty. Addressing these vulnerabilities takes precision: rigorous code reviews, extensive testing, and the adoption of secure coding practices.

CodeQL for C and C++ has recently gained increased support for detecting complex memory corruption vulnerabilities.
Broadly speaking, these vulnerabilities are all related to dereferencing pointers that should not be dereferenced at a given point in the code. For those interested in the technical details, below we’ll explore a couple of new kinds of vulnerabilities CodeQL can now detect.

An in-depth look at CodeQL’s new C++ vulnerability coverage

The default query suite can now detect double-free and use-after-free vulnerabilities using the queries cpp/double-free and cpp/use-after-free. These are classic memory corruption issues that C and C++ developers constantly have to keep in mind to avoid creating serious security incidents.

In addition, the default query suite now detects dereferences that look suspicious in general, using the query cpp/redundant-null-check-simple. Finding “suspicious dereferences” in general is very hard since there are so many ways to make a dereference “obviously” safe. The query gets around this problem by finding dereferences that are always performed regardless of the result of a null check, or where a null check is always performed after the dereference (which suggests that the pointer may, in fact, sometimes be null).

The security-extended suite has also gained much better support for reasoning about buffer overflows with two new queries, cpp/overrun-write and cpp/invalid-pointer-deref, which detect different kinds of pointer dereferences that may be out of bounds. Both queries perform a novel analysis that finds the size of an allocation by running two “parallel” dataflow analyses: one tracks the pointer and the other tracks the size of the allocation. Together they enable us to find places in the code where a pointer dereference is incorrectly guarded. Such “off by one” errors are very common, and we have confirmed that cpp/invalid-pointer-deref finds existing CVEs such as CVE-2018-14599 (https://www.cvedetails.com/cve/CVE-2018-14599/). This […]
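
To make the two patterns above concrete, here is a minimal C++ sketch of an off-by-one guard and of a null check that comes after the dereference, each next to a corrected version. The function names (write_at_buggy, write_at_fixed, first_byte_buggy, first_byte_fixed) are hypothetical and written only for illustration; they are not taken from the post or from any real codebase, and whether a given CodeQL query reports these exact functions has not been verified here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper with an off-by-one guard. The pointer `buf` and the
// allocation size `size` travel as separate values, so the check has to
// relate them correctly.
bool write_at_buggy(char* buf, std::size_t size, std::size_t index, char value) {
    assert(buf != nullptr);
    if (index <= size) {     // BUG: admits index == size, one past the end
        buf[index] = value;  // out-of-bounds write when index == size
        return true;
    }
    return false;
}

// Same helper with the guard corrected to a strict comparison.
bool write_at_fixed(char* buf, std::size_t size, std::size_t index, char value) {
    assert(buf != nullptr);
    if (index < size) {
        buf[index] = value;
        return true;
    }
    return false;
}

// Dereference followed by a null check: the check comes too late to help,
// and its presence hints that the author expected null to be possible.
int first_byte_buggy(const char* s) {
    int c = static_cast<unsigned char>(*s);  // dereference happens first
    if (s == nullptr) {                      // suspicious: checked after use
        return -1;
    }
    return c;
}

// Null check moved ahead of the dereference.
int first_byte_fixed(const char* s) {
    if (s == nullptr) {
        return -1;
    }
    return static_cast<unsigned char>(*s);
}
```

With an 8-byte buffer, write_at_fixed(buf, 8, 8, 'x') refuses the write, while write_at_buggy(buf, 8, 8, 'x') accepts it and touches the byte just past the allocation. Only the fixed variants are safe to call with boundary or null inputs.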


CodeQL team uses AI to power vulnerability detection in code

AI is fundamentally changing the technology and security landscape. At GitHub, we see AI as a way for developers to both speed up their development process and simultaneously write more secure code. For instance, GitHub Copilot includes a security filter that targets the most common vulnerable coding patterns in Python and JavaScript (including hardcoded credentials, SQL injections, and path injections) to prevent vulnerable suggestions from being made.

We’re also looking at ways security teams can use AI to enhance their organizations’ security posture, specifically leveraging prescriptive security intelligence to contextually assess, prioritize, visualize, and audit security posture in complex, interconnected, and hybrid environments. For example, our CodeQL team is responsible for creating models for frameworks and APIs to help CodeQL discover more vulnerabilities out of the box. Creating and testing these models is a time-consuming process, so we started thinking about ways to use AI to help speed things up. The results have been incredibly exciting; the team was able to leverage AI to optimize our modeling process and power the way we detect vulnerabilities in code.

How the CodeQL team discovered a new CVE using AI modeling

For CodeQL to produce results, we need to be able to recognize APIs as sources, sinks, or propagators of untrusted user data (also known as tainted data). The open source software (OSS) community has developed thousands of packages that potentially contain APIs that we need to recognize. Keeping up with these packages is critical because missing a source, a sink, or a taint propagator could lead to false negatives. Traditionally, we modeled the APIs manually, but this was incredibly time-consuming for our team given the thousands of OSS frameworks. In the last six months, we’ve started using Large Language Models (LLMs) to automatically model APIs for us.
This not only turbocharged our modeling efforts but also allowed CodeQL to recognize more sinks, reducing CodeQL’s false negative rate and helping it detect more vulnerabilities.

When we make improvements to CodeQL, we often test them using a technique called variant analysis, which is a way to identify new types of security vulnerabilities. We often use this technique to run CodeQL queries across thousands of repositories hosted on GitHub.com. We did exactly that, and ran queries that use the AI-generated models across the most impactful repositories on GitHub.com. This combination of AI-generated models and variant analysis led the team to discover a new CVE (CVE-2023-35947), a path traversal vulnerability in Gradle. For more information about the exact vulnerability, check out the entry on the Security Lab’s CodeQL Wall of Fame and the GitHub Advisory Database entry.

Learn more about multi-repository variant analysis

AI is fundamentally changing the way we secure our software. We will continue to strategically leverage AI to iterate and improve upon our security offerings with an eye towards bringing AI-powered security testing into your development workflows. The discovery of the CVE in Gradle is just one example of how GitHub’s security teams have been leveraging GitHub Advanced Security and AI to unlock incredible results. In March this year, we shipped Multi-Repository Variant Analysis (MRVA), which allows you to perform variant analysis at scale. If you’re looking to get started with CodeQL and code scanning on your repository, check out our documentation. As always, CodeQL is free to use on open source repositories. If […]
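
To illustrate why a missing source, sink, or propagator model leads to a false negative, here is a toy C++ sketch of taint tracking over a straight-line list of calls. It is not how CodeQL is implemented and does not use CodeQL’s model format; the Role, Call, and find_tainted_sinks names, and the API names readEntryName, resolvePath, and writeFile, are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Role an API can play in this toy taint model.
enum class Role { Source, Propagator, Sink };

// One call in a straight-line toy program: result = api(arg).
struct Call {
    std::string api;
    std::string arg;     // variable read by the call ("" if none)
    std::string result;  // variable written by the call ("" if none)
};

// Returns the sink APIs that receive tainted data, given the models known
// to the analysis. APIs without a model are skipped entirely.
std::vector<std::string> find_tainted_sinks(
    const std::vector<Call>& program,
    const std::map<std::string, Role>& models) {
    std::set<std::string> tainted;
    std::vector<std::string> alerts;
    for (const Call& call : program) {
        const auto model = models.find(call.api);
        if (model == models.end()) {
            continue;  // unmodeled API: its result is treated as clean
        }
        const bool arg_tainted =
            !call.arg.empty() && tainted.count(call.arg) > 0;
        switch (model->second) {
            case Role::Source:
                if (!call.result.empty()) tainted.insert(call.result);
                break;
            case Role::Propagator:
                if (arg_tainted && !call.result.empty()) {
                    tainted.insert(call.result);
                }
                break;
            case Role::Sink:
                if (arg_tainted) alerts.push_back(call.api);
                break;
        }
    }
    return alerts;
}

// Usage sketch: the same program analyzed with and without a propagator model.
void run_demo() {
    const std::vector<Call> program = {
        {"readEntryName", "", "name"},    // untrusted input enters here
        {"resolvePath", "name", "path"},  // taint should flow through
        {"writeFile", "path", ""},        // dangerous if `path` is tainted
    };
    std::map<std::string, Role> models = {
        {"readEntryName", Role::Source},
        {"resolvePath", Role::Propagator},
        {"writeFile", Role::Sink},
    };
    assert(find_tainted_sinks(program, models).size() == 1);

    // Drop the propagator model: the flow is lost and no alert is raised.
    models.erase("resolvePath");
    assert(find_tainted_sinks(program, models).empty());
}
```

In this sketch the full set of models yields one alert at writeFile, while removing the single propagator model silences it. That is the kind of gap broader API modeling is meant to close.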
