Check your overflows

This topic has concerned me forever, but an exploit I read about a month ago triggered me to finally write something.

The exploit itself is fascinating, so in case you haven’t read about it, I’ll leave the link here:

https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html

But what can be so fascinating about a vulnerability?, I can hear you say. We find exploits every day in every product, don’t we? To what I have to answer, “yes, we do”. And, while not the focus of this post, that’s exactly the problem.

TMS Software Delphi  Components

The exploit

This was an exploit in a JBIG2 library. JBIG2 is not designed to execute code, so it is not simple to make it execute code. It is not like a flaw in a PDF javascript implementation, where you get access to the javascript engine when you shouldn’t, and once you are in, you just execute commands. There is nothing to execute here. And yet…

The exploit starts simple, with an overflow bug. Basically you sum some numbers, say like:

MemoryToAllocate := Length(data1) + Length(data2); 
...
AllocatedMemory := AllocateMemory(MemoryToAllocate);
FillWithData(AllocatedMemory);

But what happens if I maliciously create a file where Length(data1) is near the maximum an integer can hold and then add the length of data2? In most programming languages, what will happen is that the integer will overflow, so 2147483647 + 1 = -2147483648

And of course, since I have control over both lengths, I can create a file that allocates, say 2 bytes, and then tries to write 100 bytes into the allocated memory, corrupting the stack.

And now it is where it gets interesting. We want to execute code, not to corrupt the stack, and as mentioned before JBIG2 doesn’t really execute any code. So how do we continue? Simple: We find out a way to emulate a NAND gate with the stack corruption, and then construct our own computer from those NAND gates, where we can execute any code we want. As the article says, it is not as performant as JavaScript, but it gets the job done. Now you can infect a phone just by sending it a message, without the user even having to click on anything.

 

The takeaway

In computing, we always tend to forget about border conditions. Usually, that works fine: I mean, what are the odds that an integer overflows in day-to-day images? The JBIG2 code was probably correct for virtually 100% of naturally occurring JBIG2 files. But when you add someone maliciously trying to break your code, the situation changes. No matter how small a hole you leave on your wall, someone will use it and build a 6-lane highway to enter your place. Give them a small overflow, and they will make a whole computer out of NAND gates.

And what can we do about it? First steps first: we should always stay vigilant, not just about how invalid data can break our program, but more important, how maliciously created, completely-unlikely-to-exist-in-the-real-world data can break our apps. 

WE NEED TO ALWAYS CONSIDER HOW SOMEONE COULD MALICIOUSLY ATTACK OUR CODE. IT IS NOT ENOUGH WITH JUST MAKING SURE IT WORKS IN THE EXPECTED OR REASONABLE CASES

But I firmly believe that staying vigilant is not enough. We are humans, after all, and humans make mistakes. So we need all the extra help we can get, and here is where the compiler comes in. Let’s see how a compiler could have helped us here:

C/C++:

Since the original article was about a C library, let’s start here.

In C and C++, we have to consider two different cases:

  • Unsigned integers. For unsigned integers, overflow is well-defined… and it just wraps around. In a typical “old-C” fashion, they make the incredibly-rare case the default (yes, sometimes you might want the integers to wrap around), and they make the usual case (integers don’t wrap around in mathematics), an impossible one. Just because it is normalized this way and there is no way to change it without breaking everything, my advice would be to avoid unsigned integers if you can. There are other reasons to avoid unsigned ints in C, so it is something to consider seriously.
  • Signed integers. Those are even eviler than unsigned integers. Signed integer overflow is undefined behavior, which means that if an integer overflows, the compiler can release the nasal demons on you. In practice, this usually means that you will get a base-2 complement like with unsigned ints, but don’t underestimate the incredibly powerful C/C++ optimizer. The optimizer might just remove whole parts of your code based on the assumption that the overflow can never happen, and then real demons will fly out of your nose.

The C/C++ situation is frustrating. Apparently, performance is the only thing that matters here, so we prefer a wrong result to a slow one.

Nobody seems to care. There were papers proposed to fix the undefined behavior of signed integers, and the proposals are mainly to make it defined to wrap around instead of acknowledging it is an error and throwing an exception.

There are three alternatives to check for overflows in C++, and none of them is good:

  1. Compile your code with UB Sanitizer on, and use signed integers. As signed integer overflow is undefined behavior, an overflow will abort your app, opening the door for a Denial-Of-Service attack. Of course, an exception would be best, but aborting is still better than continuing with a completely wrong value.
  2. Use a third-party library to have logically-behaving integers like boost safe-numerics or chromium base/numerics. This is the approach we are usingfor porting our FlexCel codebase to C++
  3. Check every integer operation manually and ensure there is no overflow. Make sure you don’t miss just a single one. Now I see you asking me: How can I know if “a*b” will overflow or not without doing a double multiplication or a division? This isn’t an important enough problem to be in the standard, but if you are using GCC or Clang, they do provide some compiler-specific builtins: https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins

Delphi

A long, long time ago, in turbo pascal times, if I remember correctly (and I can be wrong), overflow checking was on by default, as was range checking. So you had to manually disable overflow checking in the places where you didn’t want it with a {$Q-} directive. Range checking should never be turned off, but you could with the {$R} directive.

But of course, Delphi also caught the “must be fast even if incorrect” bug, so on the page above (at the time I am writing this), it says:

Enabling overflow checking slows down your program and makes it somewhat larger, so use {$Q+} only for debugging.

And yes, overflow checking is enabled by default in Debug builds but not on Release builds. My advice is, if you haven’t yet, stop reading right now and go and make sure overflow and range checking are also set on Release for all your apps.

About us, we’ve always shipped FlexCel for VCL with both overflow and range checking enabled because it is the right thing to do. If it makes us lose some artificial benchmark because of the slow down, we are happy to trade it for correct behavior.

C# 

Like Delphi, C# checks for overflow in Debug but not in Release. So again, my advice is to turn overflow checking in Release and use the checked keyword to disable the checks in any code that might need it. And again, this is what we do in FlexCel .NET.

Java

Java wraps around in overflows, and there is no simple flag or even library that I know of that you can use to change the behavior. Since Java 8, you have some built-in functions that you can use, like https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#addExact-int-int-

You have to remember never to write “a + b”, but “addExact(a, b)” instead. This is the approach we are using in our FlexCel Java port.

Rust

I am mentioning it here because it is supposed to be a language created for safety… and still, it has the irritating behavior of checking in Debug but wrapping in Release. So if you are using Rust, make sure to check in Release.


Some numbers

So, why is everybody using that silly wrap-on-overflow by default? They know it is not right, or they wouldn’t enable overflow checking in Debug.

The reasons usually boil down to two:

  • Some workloads would work fine with overflow anyway. And it is true. If we were all writing hash functions all of the time, this argument could even have some merit. 
  • The performance loss would be unacceptable. And, if you focus on some micro-benchmark which all it does is integer operations, you might find a performance difference. But regular code does typically much more than just adding integers. So, how about some actual real-world testing? How much does it matter in a typical application?

After all I have written, it would be pretty bad if turning overflow checking slowed down your app by 50%, wouldn’t it? I can’t guarantee it won’t, but I can do some tests here to check how it works in the FlexCel case.

The tests were conducted in a Ryzen 5950x 32GB RAM, fully parallelized. The tests create about 12,000 files of many types (xls, xlsx, PDF, SVG, HTML), and they load about the same number of documents into memory. All Excel files are recalculated, but while some documents have no formulas, some others have thousands of them. All in all, the tests stress a lot the hard disks (we use two different SSDs, one for reading and the other for writing, to improve performance), they stress a lot the memory (some huge files have to be loaded into memory), and they stress a lot the CPU with all the calculations and zipping and unzipping. 

This machine is used exclusively to test FlexCel continuously, so it is not designed for benchmarking. To have more “fair” results, we should, for example, fix the CPU frequency, so no “boost” happens, also control the thermal throttling, etc. These are not professionally-made benchmarks and they don’t try to be.

But while they might not be good benchmarks, they can give us an idea of the performance of actual code in real machines. Maybe more so than ultra-controlled microbenchmarks. After all, real machines don’t have their CPU frequency fixed either. 

I’ve run the tests three times for each case. No other application was running in the test server, and all tests ran in release mode from the command line. Here are the results. All times are in mm:ss

First Run Second Run Third run
Delphi 11 – 64 bit
Overflow checking ON Range checking ON 1:47 1:42 1:47
Overflow checking OFF – Range checking OFF 1:41 1:46  1:45
.NET 4.8 – 64 bit
Overflow checking ON 0:52 0:53 0:52
Overflow checking OFF 0:51 0:52 0:52
.NET 6 – 64 bit
Overflow checking ON 0:45 0:45 0:45
Overflow checking OFF 0:44 0:44 0:4

Ok, overflow checking is not 100% free, but this is a difference I am willing to live with. And note that even when the difference is minimal, it is not even fair. Because the code without overflow checking is not correct. Do you know what would be even faster than returning -2147483648 when adding 2147483647 + 1? Returning “4” for every integer operation. It would not only be much faster, but it would also allow the compiler to do some incredible optimizations. For example, it could realize that the loop:

for (int i = 1; i < 3; i++) {something()}

would execute just once and stop, so it could eliminate the loop. The possibilities are endless, and the only thing you have to admit is that n + m = 4 for any value of n and m.

As much as I love performance, you can't just trade correctness for performance. And to make the code without overflow checking correct, you would have to manually check whenever there is a possibility of overflow in every integer operation. Besides all the new range of possible bugs, because you can forget some check, all those new checks would make the code slower again, most likely removing any advantage you had by setting overflow checking off. The very small performance advantage from range checking off happens if you don't check for errors at all and just ship vulnerable code. If you actually start checking for overflows manually, the advantage that was minimal to start with, will likely vanish or even be negative. Yes, the compiler can be more efficient than us checking for overflows. It most cpu architectures it can check for the carry flag, instead of doing a division after the multiplication to check there was no overflow.

We've lost tens of percents of performance in spectre/meltdown limitations, and that is ok because we can't afford insecurity. But somehow, a latent bomb like integer overflow is never handled because we can't afford the performance losses.

I think the time is due to start treating overflows seriously. 

What could compilers do?

Delphi should switch back to overflow checking and range checking by default in both Release and Debug builds. C# should do the same. C++ should provide a standard way to have exceptions on overflow. So should Java.

Whenever you write code that needs wrap-on-overflow, only then mark that code with overflow off, and let's not make all code insecure because three lines in your code rely on wrapping behavior.

What could you do?

Turn on overflow checking and range checking in all your code. Tell your friends. Make your own tests in your apps with and without overflow checking, and publish the results. If you have some results (no matter if they would make a case for overflow checking off or on), please let me know, and I'll add those results to this page.

Is this worth it?

You might still wonder if we aren't making a mountain of a molehill. The exploit that started this post was incredible, with a full computer created out of logic gates, but it was a single one. If the situation was so bad, wouldn't we be doing something about it?

And I can't honestly answer this one. I don't understand how we haven't made anything serious about it.

But to end this post, I'd like to quote a sentence from the original article describing the vulnerability (emphasis mine):

The vulnerability is a classic integer overflow when collating referenced segments:

And I'd like to emphasize the word "classic" here. This isn't a one-time bug. This is a "classic" bug.

After reading about the exploit and deciding to write something about it, I've seen this other bug that is also an integer overflow. I was not doing in-depth research; this was something that I saw in ordinary day-to-day browsing:

https://arstechnica.com/information-technology/2022/01/booby-trapped-sites-delivered-potent-new-backdoor-trojan-to-macos-users/

If you look at it, the backdoor starts with this vulnerability:

https://googleprojectzero.blogspot.com/2020/09/jitsploitation-one.html

And the bug is::

This is because 2147483648 is not representable as a 32 bit signed integer, and so -INT_MIN causes an integer overflow and again results in INT_MIN.

A week later I saw this other one:

https://news.ycombinator.com/item?id=30304238

which links to

https://www.openwall.com/lists/oss-security/2022/02/10/1

And again, a "classic" integer overflow at the core.

Both during one month, without explicitly looking for exploits, just looking at what's in the news. And there was another one which I didn't save the link.

Conclusion

I am aware that integer overflows aren't the only bugs causing exploits and that fixing them won't eliminate or maybe even slow down vulnerabilities. However, if you are in the camp of "This solution doesn't solve 100% of the problem, ergo it is useless", then I can understand you letting overflow checking (and range checking) off. 

But we could eliminate a whole category of bugs if we just accepted that overflow-wrapping is insane for a default. Let the compiler do its job instead of insisting on doing it manually (or, more realistically, not doing it manually at all, and shipping code with vulnerabilities). In a world where we took overflow checking more seriously, CPUs and programming languages could optimize even more the checks, making the performance difference even smaller. And there would be one less door for the bad buys.