Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 77 additions & 0 deletions core/src/main/java/io/github/dfa1/vortex/core/IoBounds.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
package io.github.dfa1.vortex.core;

import java.lang.foreign.MemorySegment;

/// Bounds-checked access to untrusted [MemorySegment] regions.
///
/// The reader memory-maps and parses attacker-controlled binary input. Per the
/// [VortexException] contract, any malformed offset, length, or element count
/// must surface as a [VortexException] — never a raw JDK
/// [IndexOutOfBoundsException], [ArithmeticException], or
/// [NegativeArraySizeException]. Offsets and lengths drawn from parsed file
/// bytes flow through these helpers before they reach the JDK.
///
/// This covers the *parse* side only. A caller's random-access index into a
/// decoded array (`array.getInt(5)`) is a different contract: that is consumer
/// misuse and should throw [IndexOutOfBoundsException] via
/// [java.util.Objects#checkIndex(long, long)], not a `VortexException`.
///
/// See ADR 0003 (`docs/adr/0003-vortex-exception-sanitization.md`).
public final class IoBounds {

private IoBounds() {
}

/// Verifies that the range `[off, off + len)` lies within `[0, size]`.
///
/// @param off start offset into the region
/// @param len length of the range
/// @param size total size of the region the range must fit within
/// @throws VortexException if `off` or `len` is negative, or `off + len`
/// exceeds `size` (overflow-safe: checks `len > size - off`)
public static void checkRange(long off, long len, long size) {
if (off < 0 || len < 0 || len > size - off) {
throw new VortexException(
"slice out of bounds: off=" + off + " len=" + len + " size=" + size);
}
}

/// Bounds-checked [MemorySegment#asSlice(long, long)] — the canonical
/// replacement for a raw `asSlice` on an untrusted offset or length.
///
/// @param seg the segment to slice
/// @param off start offset into `seg`
/// @param len length of the slice
/// @return the slice `seg[off, off + len)`
/// @throws VortexException if the range falls outside `seg`
public static MemorySegment slice(MemorySegment seg, long off, long len) {
checkRange(off, len, seg.byteSize());
return seg.asSlice(off, len);
}

/// Narrows a `long` size or count to `int` for use as a [java.nio.ByteBuffer]
/// index or a Java array length. Replaces [Math#toIntExact(long)] (which
/// throws [ArithmeticException]) and guards the 2 GB `ByteBuffer` / array cap.
///
/// @param n the size or count, drawn from parsed file metadata
/// @return `n` as an `int`
/// @throws VortexException if `n` is negative or exceeds [Integer#MAX_VALUE]
public static int toIntSize(long n) {
if (n < 0 || n > Integer.MAX_VALUE) {
throw new VortexException("size exceeds 2 GB limit: " + n);
}
return (int) n;
}

/// Validates an element count before a `new T[count]` decode allocation.
///
/// Same guard as [#toIntSize(long)], named for the allocation call sites; a
/// per-encoding resource cap (ADR 0004) plugs in here later.
///
/// @param n the element count, drawn from parsed file metadata
/// @return `n` as an `int`
/// @throws VortexException if `n` is negative or exceeds [Integer#MAX_VALUE]
public static int checkCount(long n) {
return toIntSize(n);
}
}
135 changes: 135 additions & 0 deletions core/src/test/java/io/github/dfa1/vortex/core/IoBoundsTest.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
package io.github.dfa1.vortex.core;

import org.junit.jupiter.api.Nested;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;
import org.junit.jupiter.params.provider.ValueSource;

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

class IoBoundsTest {

@Nested
class CheckRange {

@ParameterizedTest
// off, len against a size-16 region — every in-bounds case, including the
// exact end (off+len == size) and a zero-length slice at the boundary.
@CsvSource({"0,16", "0,0", "16,0", "4,8", "15,1"})
void acceptsInBoundsRange(long off, long len) {
// Given a region of 16 bytes
// When / Then no exception
IoBounds.checkRange(off, len, 16);
}

@Test
void rejectsNegativeOffset() {
// Given / When / Then
assertThatThrownBy(() -> IoBounds.checkRange(-1, 4, 16))
.isInstanceOf(VortexException.class)
.hasMessageContaining("out of bounds");
}

@Test
void rejectsNegativeLength() {
// Given / When / Then
assertThatThrownBy(() -> IoBounds.checkRange(0, -1, 16))
.isInstanceOf(VortexException.class);
}

@Test
void rejectsRangePastEnd() {
// Given off+len = 17 > size 16
// When / Then
assertThatThrownBy(() -> IoBounds.checkRange(10, 7, 16))
.isInstanceOf(VortexException.class);
}

@Test
void rejectsOverflowingLengthWithoutWrapping() {
// Given a crafted huge length that would overflow off+len if added naively;
// the check uses len > size - off precisely to stay overflow-safe.
assertThatThrownBy(() -> IoBounds.checkRange(8, Long.MAX_VALUE, 16))
.isInstanceOf(VortexException.class);
}
}

@Nested
class Slice {

@Test
void returnsRequestedSubRegion() {
try (Arena arena = Arena.ofConfined()) {
// Given a 16-byte segment
MemorySegment seg = arena.allocate(16);

// When slicing the middle 8 bytes
MemorySegment result = IoBounds.slice(seg, 4, 8);

// Then the slice has the requested size
assertThat(result.byteSize()).isEqualTo(8);
}
}

@Test
void throwsVortexExceptionNotJdkOnOverflow() {
try (Arena arena = Arena.ofConfined()) {
// Given a 16-byte segment and an out-of-range request
MemorySegment seg = arena.allocate(16);

// When / Then the contract holds — VortexException, never IndexOutOfBoundsException
assertThatThrownBy(() -> IoBounds.slice(seg, 0, 32))
.isInstanceOf(VortexException.class);
}
}
}

@Nested
class ToIntSize {

@ParameterizedTest
@ValueSource(longs = {0, 1, 1024, Integer.MAX_VALUE})
void narrowsValuesWithinIntRange(long n) {
// Given / When / Then
assertThat(IoBounds.toIntSize(n)).isEqualTo((int) n);
}

@Test
void rejectsValueAboveIntMax() {
// Given a length one past the 2 GB ByteBuffer/array cap
assertThatThrownBy(() -> IoBounds.toIntSize(Integer.MAX_VALUE + 1L))
.isInstanceOf(VortexException.class)
.hasMessageContaining("2 GB");
}

@Test
void rejectsNegativeValue() {
// Given a length that read back negative (e.g. a u32 stored as signed)
assertThatThrownBy(() -> IoBounds.toIntSize(-1))
.isInstanceOf(VortexException.class);
}
}

@Nested
class CheckCount {

@Test
void delegatesToTheSameGuard() {
// Given a valid count
// When / Then it returns the narrowed value
assertThat(IoBounds.checkCount(42)).isEqualTo(42);
}

@Test
void rejectsOversizedCount() {
// Given a crafted huge element count for a new T[n] allocation
assertThatThrownBy(() -> IoBounds.checkCount(Long.MAX_VALUE))
.isInstanceOf(VortexException.class);
}
}
}
135 changes: 133 additions & 2 deletions docs/adr/0003-vortex-exception-sanitization.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
# ADR 0003: Structured sanitization of `VortexException` messages
# ADR 0003: `VortexException` contract — message sanitization and bounds typing

- **Status:** Accepted — implementation pending (see Phases below)
- **Date:** 2026-06-13
- **Date:** 2026-06-13 (bounds-typing scope added 2026-06-20)
- **Deciders:** project maintainer
- **Related:** [ADR 0001 — Split read and write runtimes](0001-split-read-and-write-runtimes.md),
[ADR 0004 — Resource caps and `ReadOptions`](0004-resource-caps-read-options.md),
[SECURITY.md](../../SECURITY.md)

## Context
Expand Down Expand Up @@ -40,6 +41,37 @@ typed `EncodingId` for the attribution field but accepts a free-form `String`
for the message body, which callers build via `+` or `.formatted()`. There is
no sanitization contract.

### Second axis — exception *type*, not just message

Message sanitization governs *what a `VortexException` says*. A separate,
orthogonal gap governs *whether a `VortexException` is thrown at all*. The
reader's contract (SECURITY.md) is: any malformed input throws
`VortexException`, never a raw JDK exception. But ~21 `MemorySegment.asSlice`
call sites take offsets/lengths straight from untrusted layout/footer
metadata and pass them to the JDK unguarded:

```java
// ScanIterator — fbStart/fbLen decoded from an attacker FlatBuffer
ByteBuffer fbBuf = seg.asSlice(fbStart, fbLen).asByteBuffer()... // raw IndexOutOfBoundsException on overflow
```

Only `PostscriptParser` guards its slices (a private `slice()` +
`checkBlobBounds()`); its own comment warns that every *other* scan-time
`asSlice` would throw `IndexOutOfBoundsException` and break the contract. The
same leak appears in three more shapes:

- `Math.toIntExact(storage.length())` (4 extension decoders) → raw
`ArithmeticException` on a > 2 GB declared length.
- `new byte[(int)(end - start)]` / `new long[(int) rowCount]` (VarBin, AlpRd,
Delta) → `NegativeArraySizeException` / `OutOfMemoryError` on crafted
non-monotonic offsets or huge counts.
- `ByteBuffer` is `int`-indexed (2 GB cap); a slice fed to `asByteBuffer()`
past that throws raw too.

These are the same `VortexException`-contract violation as a leaked ANSI
escape — just on the type axis instead of the content axis — so they belong
in the same ADR.

## Decision

**Pick Option A (enum error catalog) as the structural shape.** Add a
Expand Down Expand Up @@ -203,6 +235,73 @@ throw new VortexException(VortexError.UNKNOWN_LAYOUT_ENCODING, layout.encodingId
The message format `[UNKNOWN_LAYOUT_ENCODING] vortex.flat\x0a<injected>` is
machine-parseable, log-friendly, and injection-safe.

### Bounds typing: the `IoBounds` helper

A public static utility in `io.github.dfa1.vortex.core` (must be reachable by
core itself — `ProtoReader` has a site — plus reader, `reader.array`, and
`reader.decode`; `reader → core`, so core is the only home that covers all
layers). It wraps the untrusted-offset operations and throws `VortexException`
(via the `VortexError` catalog above) instead of the raw JDK exception:

```java
public final class IoBounds {
private IoBounds() {}

/// off/len must lie within [0, size]. Throws VortexException otherwise.
public static void checkRange(long off, long len, long size) {
if (off < 0 || len < 0 || len > size - off) {
throw new VortexException(VortexError.SEGMENT_INDEX_OUT_OF_RANGE, off, len, size);
}
}

/// Bounds-checked asSlice — the canonical replacement for raw seg.asSlice.
public static MemorySegment slice(MemorySegment seg, long off, long len) {
checkRange(off, len, seg.byteSize());
return seg.asSlice(off, len);
}

/// long → int for sizes/counts that index a ByteBuffer or back a Java array.
/// Replaces Math.toIntExact (ArithmeticException) and guards the 2 GB cap.
public static int toIntSize(long n) {
if (n < 0 || n > Integer.MAX_VALUE) {
throw new VortexException(VortexError.SEGMENT_INDEX_OUT_OF_RANGE, n);
}
return (int) n;
}

/// Element count for a `new T[n]` decode buffer; same guard as toIntSize,
/// named for the alloc-count call sites (the per-encoding cap from ADR 0004
/// plugs in here later).
public static int checkCount(long n) {
return toIntSize(n);
}
}
```

Why a static helper, not a `BoundedSegment` wrapper (the approach explored in
[PR #27](https://github.com/dfa1/vortex-java/pull/27)):

- The hot path is `MemorySegment` zero-copy slices; a wrapper type would have
to be unwrapped at every typed accessor or it taxes per-element reads. A
static call slices once, off the per-element path, and returns a plain
`MemorySegment` — no new type crosses module boundaries.
- It mirrors the `Sanitize` decision: one small pure primitive in `core`, not
a new abstraction. `Sanitize` cleans the message; `IoBounds` types the
throw. Symmetric.

#### The consumer-access carve-out

The ~14 per-element guards in `Lazy*` / `Materialized*` / `Generic` accessors
(`getInt(i)` etc.) throw `IndexOutOfBoundsException` and **stay that way** —
they are *consumer* random-access (`array.getInt(5)`), where IOOBE is the
correct JDK-idiomatic signal (cf. `List.get`), not a malformed-file event.
These must **not** be routed through `IoBounds`. They are instead collapsed
onto the JDK built-in `Objects.checkIndex(i, length)` (Java 16+) — stdlib, no
custom helper. The dividing line:

- offset/length/count from **parsed file bytes** → `IoBounds` → `VortexException`
- index from a **caller's accessor argument** → `Objects.checkIndex` → `IndexOutOfBoundsException`

## Migration phases

### Phase A — Foundation (~1.5 h)
Expand Down Expand Up @@ -244,6 +343,38 @@ approximate-fit existing ones.
prevent regression. Also flag `+` inside the VortexException args to
catch interpolation-before-sanitization.

### Phase E — Bounds typing via `IoBounds` (0.8.0)

Independent of A–D. The `VortexError` catalog (Phase A) is not built yet, so
`IoBounds` ships using the current `VortexException(String)` constructor with a
fixed, non-interpolated message (no attacker strings in the bounds messages —
only numeric offsets/lengths, which need no sanitization). When Phase A lands,
`IoBounds` migrates to `VortexError.SEGMENT_INDEX_OUT_OF_RANGE` mechanically
with every other site. Lands in 0.8.0 before the release, since variant decode
widens the parse surface.

1. Add `IoBounds` (`slice` / `checkRange` / `toIntSize` / `checkCount`) in
`core` with unit tests: negative offset, length overflow, off+len past end,
> 2 GB size, exact-boundary pass.
2. Route the ~21 raw `asSlice` sites through `IoBounds.slice`; fold
`PostscriptParser`'s private `slice()`/`checkBlobBounds` and `ProtoReader`'s
hand-rolled guard into it.
3. Replace `Math.toIntExact(...length())` (4 extension decoders) with
`IoBounds.toIntSize`; guard the `new T[(int) n]` alloc sites with
`IoBounds.checkCount`.
4. Collapse the ~14 consumer-access `getX(i)` guards onto
`Objects.checkIndex(i, length)` (separate commit — different error class,
no `IoBounds`).
5. Checkstyle `RegexpSingleline` rejecting raw `.asSlice(` in the
file-structure (`reader` root) and `reader.decode` packages — the layers
that slice on offsets parsed from file bytes. `reader.array` is excluded:
its only `asSlice` calls are inside `limited(rows)`, re-slicing an already
validated segment at offset 0 with `rows < length` (bounded by construction,
no untrusted offset), so wrapping them adds noise without closing a gap.
6. `BoundsTypingSecurityTest`: crafted file with out-of-range slice offset,
oversize declared length, and non-monotonic VarBin offsets each produce a
`VortexException`, never a raw JDK exception.

## Alternative considered

**Option B — Sealed `VortexException` hierarchy:** Make `VortexException`
Expand Down
Loading