Vulnerability Research

Heap Out-of-Bounds in Decompression Loops

The bug class that keeps producing zero-clicks.

Every media file you open triggers a decompression loop somewhere. A JPEG, a RAW photo, a lossless audio frame — they all share the same fundamental contract: a header declares the output geometry, an allocation is sized from that header, and then a bitstream-driven loop fills the buffer one element at a time.

When that contract breaks, you get a heap out-of-bounds write. And because media parsing happens automatically — thumbnails, previews, indexing — these bugs are zero-click by default. In August 2025, a spyware chain exploiting exactly this bug class was used to compromise targeted individuals through WhatsApp on iOS and macOS — no tap required.

I've audited decompression loops across image codecs, audio decoders, and container parsers on multiple platforms. The same structural patterns appear everywhere. This article maps the bug class and shows why it persists even in hardened codebases.

The Fundamental Tension

A decompression loop has two independent authorities over how much data gets written:

  1. The header — declares dimensions (width, height, channels, bit depth) used to compute allocation size
  2. The bitstream — contains markers, Huffman codes, or run-length tokens that control how many bytes actually get emitted

A correct implementation ensures these two authorities agree. A vulnerable one trusts the header for allocation and the bitstream for termination — or vice versa — without cross-checking.

This creates two distinct sub-patterns.

Pattern A: Integer Overflow in Allocation Math

The header dimensions are multiplied to compute a buffer size. If the multiplication overflows the register width, malloc receives a truncated (small) value while the decode loop writes the full (large) amount.

Consider this C, which compiles without warning on every major toolchain:

unsigned int width, height, bpp;      // from file header
size_t total = width * height * bpp;  // multiply runs in 32-bit unsigned arithmetic
void *buf = malloc(total);            // receives the possibly-wrapped value

On a 64-bit platform, the multiplication happens in 32-bit unsigned arithmetic before the implicit widening to size_t. A 65536×65536 image with 2 bytes per sample: the true product is 8,589,934,592 but the 32-bit multiply wraps to zero. malloc(0) succeeds. The decode loop writes 8 GB.

I've found this pattern across two OS-level image frameworks and two open-source media libraries. The C is well-defined — unsigned wrap is legal. The resulting binary is completely broken.

The Intermediate Truncation Variant

Sometimes the compiler emits a 64-bit multiply for the first operation, producing a correct intermediate. But then the result gets folded into a 32-bit accumulator downstream — an add, a store, a mov to a narrower register. The 64-bit multiply is a red herring. The overflow isn't in the multiply; it's in the operation two instructions later that silently discards the upper bits.

I spent two days convinced a codebase was safe because the initial multiply was 64-bit. Then I traced the result three instructions further and found it stored into a 32-bit register before reaching the allocation call. The multiply was correct. The dataflow after it was not. During audit, you can't stop at "the multiply is 64-bit, so it's safe." You need to trace the full path from the multiply to the malloc argument.

The Cascade

The most dangerous variant chains multiple 32-bit operations. Channels × width overflows. The result is shifted left for bytes-per-pixel — overflows again. Then multiplied by height — overflows a third time. Each truncation compounds. I've seen cases where the true allocation should be 4+ GB but the cascaded truncations produce a value under 256 KB. The decode loop writes a clean 4 GB past the buffer into whatever follows on the heap.

Pattern B: Missing Bounds Check in Decode Loop

Even with a correctly-sized buffer, the decode loop itself may not track how many bytes it has consumed from the input. Huffman decoders are the classic case:

while (pixels_remaining > 0) {
    code = read_bits(stream);            // input position never checked
    value = decode_huffman(code, table);
    *output++ = value;
    pixels_remaining--;
}

The loop tracks output position (pixels_remaining) but not input consumption. If the input stream is shorter than expected, read_bits reads past the buffer into adjacent heap data. The decoded "values" are heap bytes influenced by whatever the allocator placed nearby, which then get written to the output buffer.

This is an OOB read that becomes an attacker-influenced write — the attacker doesn't control the output position, but shapes the decoded values through heap layout.

I've found this in custom Huffman implementations that were hand-rolled for performance instead of delegating to a system library. The inner loop drops the bounds check because "the header guarantees enough data." The header, of course, is attacker-controlled.

Why This Persists in Hardened Codebases

Modern media parsing code has layers of defense: checked-arithmetic wrappers, 128-bit overflow detection, bounds-checked reads, compression whitelists. I've documented codebases with dozens of distinct safety checks across the parsing pipeline.

Yet the decompression loop bugs survive. Three reasons:

1. Defense Islands

The main parsing path — header validation, container atom processing, metadata extraction — gets hardened first because it's the obvious attack surface. The pixel decode path, buried inside vendor-specific codec subclasses, gets less attention.

I've seen base classes that correctly use 64-bit checked arithmetic for the primary buffer allocation, while subclass-specific intermediate allocations (Huffman workspace, wavelet coefficient buffers, tile scratch space) use raw unchecked 32-bit math. The developer hardened the contract they could see and missed the internal ones.

2. Custom vs. Library

When a codec delegates to a battle-tested library (libjpeg, zlib, libpng), the decompression loop has decades of hardening behind it. When an implementation rolls its own — custom lossless JPEG decoders, proprietary vendor compression, hand-tuned Huffman tables — the inner loop is fresh code with fresh bugs.

CVE-2025-43300 illustrates this directly. Apple's ImageIO framework delegates standard JPEG decoding to well-audited library code. But the RawCamera codec's lossless JPEG path is a fully custom implementation. Same file format family. Completely different security posture. The custom path is where the zero-day lived.

3. Broken Safety Nets

Some codebases add a cross-check: compare the header-derived allocation against a second independent computation as a sanity check. Smart idea. But if the two computations use different integer widths, the check itself is broken.

When one side of the comparison is 64-bit and the other is a sign-extended 32-bit value, there are input ranges where a negative narrow value sign-extends into a massive unsigned number — far larger than any realistic allocation size. The check passes unconditionally for exactly the inputs where it should fail.

The developer wrote a safety net. The compiler generated something that looks correct. But a type mismatch turns the safety check into a no-op for the one input class that matters.

The Exploitability Spectrum

Not all decompression OOBs are equal:

Controlled Heap Overflow → Code Execution. The allocation truncates to a small, allocator-friendly size. The decode loop writes sequentially past the buffer. The attacker controls the write values through the bitstream. Adjacent heap objects contain function pointers or vtable references. This is classic heap corruption.

Massive Overshoot → Denial of Service. The truncated allocation is small but the true write size is gigabytes, immediately hitting unmapped virtual memory. The crash is deterministic but the attacker can't land the write on a useful target. On mobile, this can produce a persistent crash loop — the OS re-launches the process, the file is still there, the crash repeats.

OOB Read → Information Disclosure. The decode loop reads past the input buffer but writes to a correctly-sized output buffer. Heap data leaks into the decoded output. Not standalone code execution, but a useful primitive in a chain.

The boundary between code execution and denial of service often comes down to a single parameter in the crafted file. The same root cause, the same code path, but one set of dimensions gives you a controllable 2 MB overshoot while another gives you an uncontrollable 4 GB fault.

This Bug Class in the Wild

In August 2025, Apple patched CVE-2025-43300 — a heap out-of-bounds write in ImageIO's RawCamera codec. A crafted DNG image with mismatched metadata (TIFF SamplesPerPixel disagreeing with the JPEG SOF3 component count) caused the allocation to be sized for one geometry while the decode loop wrote for another. Pattern A.

The bug was chained with a WhatsApp authorization flaw (CVE-2025-55177) for zero-click delivery. The crafted image reached ImageIO during thumbnail generation. No tap, no preview, no user decision — just a message arriving on the device. The attack targeted fewer than 200 individuals.

The underlying bug class — a metadata disagreement causing an allocation/decode mismatch in a vendor-specific codec — is the same pattern that keeps appearing across codebases. The specific codec was custom. The vulnerability pattern was not.

The delivery mechanism changes — iMessage, WhatsApp, email, AirDrop — but the code execution primitive is always the same: a media parser trusting two independent authorities that disagree.

What Defenders Should Know

If you maintain a media parsing codebase:

Use checked arithmetic for every dimension multiplication. __builtin_mul_overflow, SafeInt, or equivalent. Don't rely on C integer promotion to do the right thing — it won't.

Match integer widths in cross-checks. If one side of a comparison is 64-bit, the other must be too. A mixed-width comparison with sign extension is worse than no check at all, because it creates a false sense of safety.

Bound the decode loop from both directions. Check input bytes remaining AND output bytes remaining on every iteration. The output check alone doesn't prevent OOB reads. The input check alone doesn't prevent OOB writes.

Audit subclass allocations separately from the base class. The framework may provide checked arithmetic helpers. That doesn't mean every codec subclass uses them.

Block compression types you haven't audited. A whitelist of known-safe codecs is more valuable than trying to harden every possible decompression path. I've seen whitelists that exist specifically to block known-vulnerable legacy codecs — the blocked types have bugs; the whitelist is the mitigation.

Closing Thought

Decompression loops are one of the oldest bug classes in software security. We've been finding integer overflows in image parsers for decades. And yet, in 2026, I'm still finding them — not in abandoned legacy code, but in actively-maintained, hardened, security-reviewed codebases on shipping platforms.

The reason is structural. Every new codec, every new vendor-specific compression scheme, every new container format creates a fresh decompression loop. The defenses don't inherit. They have to be re-implemented in every new inner loop. And the inner loop is where developers optimize for performance and cut safety margins.

The bug class is old. The instances are new every year.