Heap out-of-bounds in decompression loops.
The bug class that keeps producing zero-clicks. Integer overflows, missing bounds checks, and why hardened codebases still fall over the same edge — fifteen years after we first started naming it.
If you build attack capabilities for a living, you learn to recognise the shape of bugs that keep producing zero-clicks. Decompression code is one of those shapes. Image parsers, archive readers, font loaders, network protocol decoders: anywhere an attacker-controlled byte stream gets expanded into a heap buffer, you find the same pattern of mistakes.
This post breaks down the bug class, walks through why hardened codebases still fall, and ends with a short checklist for anyone reviewing decompression-shaped code.
The shape of the bug
Most decompression routines look something like this: parse a header that describes an output size, allocate a buffer of that size, then run a loop that consumes input bytes and writes output bytes. The bug class is usually one of three things:
- Trusting attacker-controlled size fields. The header says the output is 1024 bytes, the loop writes 2048. Bounds checks are tied to the declared size, not to the writes that actually happen.
- Integer overflow in size calculations. output_size = width * height * bpp wraps to a small value, the buffer is undersized, and the loop runs to completion past the end.
- Off-by-one in copy primitives. The loop terminates on i < len but a final write happens after the test, or a memcpy uses one length where another was bounds-checked.
None of these are new. They are all in CVEs filed in 2010. They are all still in CVEs filed last quarter.
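As an illustration of the second item in the list above, here is a minimal sketch of an overflow-checked size calculation. The function name checked_image_size and its signature are hypothetical, chosen for this example; the idea is simply to test each multiplication against SIZE_MAX before performing it, so an undersized allocation is rejected rather than silently created.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes width * height * bpp into *out.
 * Returns false if any intermediate product would exceed SIZE_MAX. */
bool checked_image_size(uint32_t width, uint32_t height,
                        uint32_t bpp, size_t *out)
{
    size_t w = width, h = height, b = bpp;

    if (w != 0 && h > SIZE_MAX / w)
        return false;                /* w * h would wrap */
    size_t pixels = w * h;

    if (pixels != 0 && b > SIZE_MAX / pixels)
        return false;                /* pixels * b would wrap */

    *out = pixels * b;
    return true;
}
```

A caller would allocate only when the function returns true, and would typically also apply a sanity cap on the result, since a size that fits in size_t can still be far larger than any legitimate image.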
Why hardened codebases still fall
Modern projects ship with ASan, MSan, libFuzzer, OSS-Fuzz integration, and dependency scanning. The bugs still ship. Three reasons we keep seeing in engagements:
1. Fuzz harnesses don't exercise the path
Decompression loops are often guarded by header validation that fuzzers struggle to satisfy by accident. If your seed corpus doesn't contain valid headers with exotic size fields, the fuzzer spends its budget on the parser and never reaches the loop body.
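One way to get past the header gate, alongside a better seed corpus, is a harness that builds the valid part of the header itself and lets the fuzzer's bytes drive the size field and payload. The sketch below uses libFuzzer's LLVMFuzzerTestOneInput entry point; toy_decode and its 8-byte header layout ("TOYZ" magic plus a little-endian size) are hypothetical stand-ins for whatever decoder is actually under test.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical target. Assumed layout: 4-byte magic "TOYZ",
 * 4-byte little-endian output size, then the payload. */
int toy_decode(const uint8_t *in, size_t in_len)
{
    if (in_len < 8 || memcmp(in, "TOYZ", 4) != 0)
        return -1;                           /* header validation */

    uint32_t out_size = (uint32_t)in[4]
                      | (uint32_t)in[5] << 8
                      | (uint32_t)in[6] << 16
                      | (uint32_t)in[7] << 24;
    if (out_size > (1u << 20))
        return -1;                           /* sanity cap on declared size */

    uint8_t *out = malloc(out_size ? out_size : 1);
    if (out == NULL)
        return -1;

    size_t n = in_len - 8;
    if (n > out_size)
        n = out_size;                        /* bound writes by the allocation */
    memcpy(out, in + 8, n);
    free(out);
    return (int)n;
}

/* libFuzzer entry point: prepend a valid magic so every input reaches
 * the decode loop; the fuzzer controls the size field and the payload. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size < 4 || size > 4096)
        return 0;

    uint8_t *buf = malloc(size + 4);
    if (buf == NULL)
        return 0;
    memcpy(buf, "TOYZ", 4);                  /* always-valid magic */
    memcpy(buf + 4, data, size);             /* first 4 fuzz bytes become the size field */
    toy_decode(buf, size + 4);
    free(buf);
    return 0;
}
```

The trade-off is that a harness like this no longer tests the header validation itself, so it works best as a second fuzz target next to one that feeds raw input to the parser.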
2. The unsafe primitive is one layer down
The vulnerable write is inside a helper that takes a length parameter. The helper has been audited. The caller passes an unchecked value. Reviewers look at the helper, see the bounds check, conclude it's fine.
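A small sketch of that pattern, with hypothetical names throughout (struct outbuf, copy_checked, outbuf_append, hdr_declared_size): the helper's check is correct, but it is only as good as the capacity the caller hands it. One way to close the gap is to make the buffer carry its real allocation size, so the caller has nothing else to pass.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output buffer that carries its real allocation size. */
struct outbuf {
    uint8_t *data;
    size_t   cap;   /* bytes actually allocated */
    size_t   pos;   /* bytes written so far */
};

/* The "audited" helper: it does check, but only against dst_cap. */
int copy_checked(uint8_t *dst, size_t dst_cap,
                 const uint8_t *src, size_t len)
{
    if (len > dst_cap)
        return -1;
    memcpy(dst, src, len);
    return 0;
}

/* Risky caller pattern (shown as a comment, not compiled):
 *   copy_checked(out->data + out->pos, hdr_declared_size, src, len);
 * hdr_declared_size comes from the input, so the helper's check can pass
 * even when fewer bytes remain in the allocation. */

/* Safer caller: derive the capacity from the allocation, not the header. */
int outbuf_append(struct outbuf *out, const uint8_t *src, size_t len)
{
    if (out->pos > out->cap)
        return -1;                           /* inconsistent state */
    size_t remaining = out->cap - out->pos;
    if (copy_checked(out->data + out->pos, remaining, src, len) != 0)
        return -1;
    out->pos += len;
    return 0;
}
```

The review question this suggests is to trace every length argument back to its origin at each call site, rather than stopping at the helper.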
3. Mitigations are misaligned
Allocator hardening (Scudo, PartitionAlloc) makes some OOB writes unexploitable on their own, so they don't show up in crash telemetry. The bugs still ship to production, and when an exploit chain needs this specific primitive, it will find it.
What we look for
When auditing decompression-shaped code, we run through the same checklist:
// 1. Where does the output buffer size come from?
// → attacker-controlled, validated, or derived?
// 2. What's the relationship between
// declared_size, allocated_size, and actual_writes?
// → is the loop bounded by declared, allocated, or input?
// 3. Are there any integer ops in the size calculation?
// → mul, add, shift — anywhere a wrap can happen?
// 4. Is there a "copy backref" or "match length" primitive?
// → LZ-style references frequently miss bounds on
// the source pointer (read OOB) or dest (write OOB).
The fourth item is the one that keeps producing high-severity findings. LZ-family compressors (LZSS, LZ4, LZMA, DEFLATE) all have some form of back-reference: "copy N bytes from M bytes before the current position." Both N and M are attacker-controlled. Both need bounds checks, and reviewers often skip verifying them.
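To make the two checks concrete, here is a minimal sketch of a back-reference copy with both bounds enforced. The function lz_copy_match and its parameters are hypothetical and not taken from any particular compressor; distance plays the role of M and length the role of N.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LZ-style match copy: copy `length` bytes starting
 * `distance` bytes before out[*pos]. Returns 0 on success, or -1 if
 * either the source or the destination would leave the buffer. */
int lz_copy_match(uint8_t *out, size_t out_cap, size_t *pos,
                  size_t distance, size_t length)
{
    if (*pos > out_cap)
        return -1;                   /* inconsistent state */
    if (distance == 0 || distance > *pos)
        return -1;                   /* source would start before out[0] (read OOB) */
    if (length > out_cap - *pos)
        return -1;                   /* destination would pass the end (write OOB) */

    /* Byte-wise copy: source and destination may overlap when
     * distance < length, which is how LZ formats repeat a pattern. */
    for (size_t i = 0; i < length; i++)
        out[*pos + i] = out[*pos - distance + i];

    *pos += length;
    return 0;
}
```

The comparisons are written as subtractions on the trusted side (out_cap - *pos) rather than additions on the untrusted side (*pos + length), so the checks themselves cannot wrap.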
A checklist for reviewers
If you take one thing away from this post: when you see a loop that consumes bytes and writes bytes, write down the bounds invariant before you read the code. Then check whether the code maintains it on every iteration. If you can't write the invariant down, you have not understood the loop and you should not approve the PR.
We use this approach on every code-review engagement. It catches more bugs than tooling does, because most tooling doesn't ask the question.
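As a sketch of what writing the invariant down can look like, here is a toy run-length decoder with its invariant stated in a comment and asserted at the top of each iteration. The (count, value) pair format and the name rle_decode are hypothetical, chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RLE format: pairs of (count, value) bytes.
 * Invariant: at the top of every iteration, ip <= in_len and
 * op <= out_cap, and every write lands at out[op] with op < out_cap.
 * Returns bytes written, or -1 on truncated or oversized input. */
long rle_decode(const uint8_t *in, size_t in_len,
                uint8_t *out, size_t out_cap)
{
    size_t ip = 0, op = 0;

    while (ip < in_len) {
        assert(ip <= in_len && op <= out_cap);   /* the invariant */

        if (in_len - ip < 2)
            return -1;                           /* truncated pair */
        size_t  count = in[ip];
        uint8_t value = in[ip + 1];
        ip += 2;

        if (count > out_cap - op)
            return -1;                           /* run exceeds the allocation */
        for (size_t i = 0; i < count; i++)
            out[op++] = value;
    }
    return (long)op;
}
```

Note that the loop is bounded by out_cap, the allocated size, and never by a size declared in the input; that single choice is what the invariant is really checking.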
Want this kind of attention on your codebase?
This is roughly the depth we bring to application security and code-review engagements. If you ship parsers, decoders, or protocol implementations and you want a real adversary reading the loops, we'd be happy to talk.