Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -172,9 +172,10 @@ Lower is faster (nanoseconds per entity):
| miniplex | 12.15 |

`.each` is the ergonomic accessor path from the example above; `eachChunk` loops over
the raw storage arrays directly; `bindColumns` goes one step further and binds your
loop to those arrays once, up front — which is what lets it edge ahead of bitECS, so
long as you size the world before binding.
the raw storage arrays directly; `bindColumns` goes one step further and compiles a
specialized loop per archetype — which is what lets it beat bitECS, and it holds that
edge as the world grows (no pre-sizing required; it falls back to a plain loop where a
strict CSP or sandbox forbids dynamic compilation).

Worker-thread speedup on a compute-heavy simulation (8,192 entities, 512 physics
steps per frame, 60 frames), with every threaded run byte-identical to the
Expand Down
13 changes: 9 additions & 4 deletions bench/iterate.ts
Original file line number Diff line number Diff line change
Expand Up @@ -109,23 +109,28 @@ export function makeEcsiaPinnedIter(n: number): IterCase {
v.dy = 0.5
}
const q = world.query(write(Position), write(Velocity))
// The factory is SELF-CONTAINED (closes over nothing): per-frame dt arrives via the runner's ctx,
// hoisted to a local const before the loop. This is the shape the codegen path requires — each
// archetype's runner is recompiled into a specialized singleton, no post-growth penalty.
const run = q.bindColumns(
[Position, 'x'],
[Position, 'y'],
[Velocity, 'dx'],
[Velocity, 'dy'],
([px, py, dx, dy], meta) => () => {
([px, py, dx, dy], meta) => (ctx: { dt: number }) => {
const dt = ctx.dt
const count = meta.count
for (let i = 0; i < count; i++) {
px[i] = px[i]! + dx[i]! * DT
py[i] = py[i]! + dy[i]! * DT
px[i] = px[i]! + dx[i]! * dt
py[i] = py[i]! + dy[i]! * dt
}
},
)
const ctx = { dt: DT } // hoisted so step() allocates nothing
return {
name: 'ecsia-pinned',
step() {
run()
run(ctx)
},
sampleX() {
return (world.entity(first).read(Position) as { x: number }).x
Expand Down
125 changes: 125 additions & 0 deletions packages/core/src/query/codegen.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
// Pinned-loop codegen: the mechanism that lets `bindColumns` BEAT bitECS on the default iteration
// path, robustly. The win is a V8 specialization detail: TurboFan embeds typed arrays captured as
// closure CONSTANTS directly into optimized code (base pointer + length as immediates), but ONLY for
// a SINGLETON closure — the single closure produced by its enclosing function. The interpreted path
// invokes ONE user factory per archetype; the moment it produces a second runner (a 2nd matched
// archetype, or a re-invoke after column growth) V8 sees the factory making multiple closures and
// disables specialization for ALL of them — ~1.5 ns/entity, which LOSES to bitECS (~1.4). (Measured.)
//
// The fix: recompile the user's factory into a DISTINCT function object per archetype (per growth),
// via `new Function('return (' + factory.toString() + ')')()`. Each archetype's runner is then the
// singleton of its own freshly-minted factory → specialized → ~1.0 ns/entity, ~0.7× bitECS, with NO
// post-growth penalty. The recompile cost is paid only at bind / growth (rare), never per frame.
//
// SAFETY (this never produces a wrong result):
// - eval availability is probed once; under CSP (`script-src` without unsafe-eval) or a sandbox
// that blocks `new Function`, codegen is skipped and the interpreted factory call is used.
// - the user's factory MUST be self-contained — it may close over NOTHING from its outer scope
// (the recompiled copy only sees globals), so per-frame inputs come through the runner's `ctx`
// argument and fixed constants are defined inside the factory body. A factory that violates this
// fails the pre-flight below and silently falls back to interpreted.
// - PRE-FLIGHT VALIDATION: before a codegen runner is ever trusted, it is run once on a tiny scratch
// clone of the columns alongside the interpreted runner; only if their outputs match byte-for-byte
// is codegen used. A miscompile, a ReferenceError from an illegal closure, or any divergence →
// fall back to interpreted. Codegen is therefore a pure speed optimization gated on proven equality.
//
// SECURITY: the generated source is `'return (' + factory.toString() + ')'` — the user's OWN function
// source, never interpolated external/untrusted strings. No injection surface beyond the code the
// caller already wrote and passed.

import type { TypedArray } from '../memory/index.js'
import type { BoundColumnsMeta } from '@ecsia/schema'

/** A bindColumns factory: resolve the views into a persistent runner; deps arrive via the runner's ctx. */
export type PinnedFactory<Ctx = unknown> = (
views: readonly TypedArray[],
meta: BoundColumnsMeta,
) => (ctx: Ctx) => void

/** Probed once: can this runtime compile a function from source? (False under strict CSP / locked sandboxes.) */
export const CODEGEN_AVAILABLE: boolean = (() => {
try {
// eslint-disable-next-line no-new-func
return new Function('return 1')() === 1
} catch {
return false
}
})()

/** Recompile a factory into a distinct function object (its runner becomes a specialized singleton). */
function recompile<Ctx>(factory: PinnedFactory<Ctx>): PinnedFactory<Ctx> {
// Re-evaluate the factory's OWN source as a fresh function. No external strings are interpolated.
// eslint-disable-next-line no-new-func
return new Function('return (' + factory.toString() + ')')() as PinnedFactory<Ctx>
}

/** A probe ctx that hands back a stable non-zero number for ANY property read, so a hoisted
* `const dt = ctx.dt` yields deterministic, comparable arithmetic in the pre-flight (and never NaN). */
const PROBE_CTX = new Proxy(
{},
{
get: () => 1,
},
) as never

/**
* Build the runner for one archetype binding. Returns the codegen runner when eval is available AND
* it provably matches the interpreted runner on a scratch row; otherwise the interpreted runner.
* `strides` sizes the scratch clone (slots per row, per spec).
*/
export function buildPinnedRunner<Ctx>(
factory: PinnedFactory<Ctx>,
views: readonly TypedArray[],
meta: BoundColumnsMeta,
strides: readonly number[],
): (ctx: Ctx) => void {
const interpreted = factory(views, meta)
if (!CODEGEN_AVAILABLE) return interpreted
try {
// The real runner is the singleton of its OWN recompiled factory (specialized). The pre-flight
// validates a SEPARATELY-recompiled runner over scratch — same source, so faithful ⇒ the real
// runner is faithful too. (Reusing one recompiled factory for both would make it produce two
// closures and forfeit specialization — the very penalty codegen exists to avoid.)
const codegen = recompile(factory)(views, meta)
if (preflightMatches(factory, views, strides)) return codegen
Comment on lines +76 to +84

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Interpreted runner always eagerly created even on the codegen path

factory(views, meta) (line 76) is called with the real production views unconditionally, creating an interpreted runner even when codegen will succeed and the runner will be discarded. On the happy codegen path the factory is therefore called twice with real data (once for interpreted, once for the recompiled copy on line 83) plus twice more with scratch arrays inside preflightMatches. For a factory that is a pure closure factory this is harmless, but a factory with construction-time side effects (e.g. an initialisation log, a registration call) will fire those effects even when the codegen runner is ultimately used. The public-facing comments describe runner-invocation side effects but do not document that the factory-creation code also runs on a discard path.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/src/query/codegen.ts
Line: 76-84

Comment:
**Interpreted runner always eagerly created even on the codegen path**

`factory(views, meta)` (line 76) is called with the real production views unconditionally, creating an interpreted runner even when codegen will succeed and the runner will be discarded. On the happy codegen path the factory is therefore called twice with real data (once for `interpreted`, once for the recompiled copy on line 83) plus twice more with scratch arrays inside `preflightMatches`. For a factory that is a pure closure factory this is harmless, but a factory with construction-time side effects (e.g. an initialisation log, a registration call) will fire those effects even when the codegen runner is ultimately used. The public-facing comments describe runner-invocation side effects but do not document that the factory-creation code also runs on a discard path.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

} catch {
// recompile / factory-invoke threw (illegal closure, exotic source) → interpreted.
}
return interpreted
}

/**
* Run a recompiled runner and a fresh interpreted runner over IDENTICAL 1-row scratch clones of the
* columns with the probe ctx, and compare. Equal ⇒ a recompile of this factory is faithful (and the
* real runner, recompiled from the same source, is therefore faithful too). The real columns are
* never touched. NOTE: this INVOKES the user's runner once over the scratch — a runner with effects
* beyond its views (a global write, a ctx method call) fires/throws here; the contract is a pure SoA
* loop reading values off ctx. A throw (e.g. an illegal outer-scope closure → ReferenceError) counts
* as a mismatch and falls back to interpreted.
Comment on lines +91 to +98

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Pre-flight NOTE says "invokes the user's runner once" but it fires twice

The NOTE on line 95 reads "this INVOKES the user's runner once over the scratch," but preflightMatches actually invokes it twice — once for the recompiled version (scratchA, line 110) and once for the interpreted version (scratchB, line 111). A runner with observable side effects (the exact scenario the NOTE is warning about) would see those effects fire on both runs. The wording should say "twice" (or "once per version") to avoid a subtle expectation mismatch for users debugging unexpected pre-flight behaviour.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/src/query/codegen.ts
Line: 91-98

Comment:
**Pre-flight NOTE says "invokes the user's runner once" but it fires twice**

The NOTE on line 95 reads "this INVOKES the user's runner once over the scratch," but `preflightMatches` actually invokes it twice — once for the recompiled version (`scratchA`, line 110) and once for the interpreted version (`scratchB`, line 111). A runner with observable side effects (the exact scenario the NOTE is warning about) would see those effects fire on both runs. The wording should say "twice" (or "once per version") to avoid a subtle expectation mismatch for users debugging unexpected pre-flight behaviour.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

*/
function preflightMatches<Ctx>(
factory: PinnedFactory<Ctx>,
views: readonly TypedArray[],
strides: readonly number[],
): boolean {
try {
const rows = 1
const scratchA = views.map((v, i) => v.slice(0, rows * (strides[i] ?? 1)) as TypedArray)
const scratchB = views.map((v, i) => v.slice(0, rows * (strides[i] ?? 1)) as TypedArray)
const scratchMeta: BoundColumnsMeta = { count: rows, strides }
recompile(factory)(scratchA, scratchMeta)(PROBE_CTX)
factory(scratchB, scratchMeta)(PROBE_CTX)
for (let i = 0; i < scratchA.length; i++) {
const a = scratchA[i] as TypedArray
const b = scratchB[i] as TypedArray
for (let k = 0; k < a.length; k++) {
// Object.is, not !==, so identical NaN writes (e.g. row-0 data already NaN at re-bind) count
// as a MATCH — a faithful recompile reproduces the same writes, NaN included.
if (!Object.is(a[k], b[k])) return false
}
}
return true
} catch {
return false
}
}
79 changes: 46 additions & 33 deletions packages/core/src/query/live-query.ts
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@ import type { Column, TypedArray } from '../memory/index.js'
import { decodeEid } from '../memory/index.js'
import type { CompiledQuery, CompiledValueTerm, RowFilterTerm } from './compile.js'
import type { SparseSetU32 } from './sparse-set.js'
import { buildPinnedRunner } from './codegen.js'
import type { PinnedFactory } from './codegen.js'

const NO_HANDLE = 0xffffffff as EntityHandle

Expand Down Expand Up @@ -377,36 +379,44 @@ export class LiveQuery {
}

/**
* Pinned columns (the persistent-closure fast path). Where {@link eachChunk} re-resolves column
* views every call, `bindColumns` resolves each `[ComponentDef, fieldName]` spec ONCE per matched
* hot archetype and invokes `factory(views, meta)` to mint that archetype's runner — a persistent
* closure capturing the views as constants, which V8 context-specializes. The returned `run()`
* walks the bindings (matching `eachChunk` iteration order: cold archetypes never visited, empty
* ones skipped) with one cheap safety check per binding per call.
* Pinned columns the fastest iteration path, ~0.7× bitECS on the canonical bench (measured).
* Where {@link eachChunk} re-resolves column views every call, `bindColumns` resolves each
* `[ComponentDef, fieldName]` spec ONCE per matched hot archetype and mints that archetype's runner
* — a persistent closure capturing the views as constants, which V8 embeds into optimized code. The
* returned `run(ctx)` walks the bindings (matching `eachChunk` iteration order: cold archetypes
* never visited, empty ones skipped) with one cheap safety check per binding per call.
*
* The two load-bearing contract points:
* 1. The runner takes ZERO arguments (a `count` parameter measures 2× slower).
* 2. `meta` is identity-stable across rebinds; read `meta.count` (the archetype's live row count)
* inside the runner — population churn (spawn/despawn) never re-invokes the factory.
* Per-archetype runners are CODEGEN'D (each recompiled into a distinct function so it stays a
* specialized V8 singleton) where the runtime allows `new Function`; under strict CSP / a locked
* sandbox it transparently falls back to the interpreted factory call. The codegen path is gated on
* a pre-flight equality check against the interpreted runner, so it is a pure speed win that can
* never change results. Because codegen recompiles fresh on growth, there is **NO post-growth
* penalty** — the loop holds ~1.0 ns/entity even after columns re-back, with no pre-sizing required.
*
* Contract:
* 1. The factory must be **self-contained** — it may close over NOTHING from its outer scope (the
* recompiled copy only sees globals). Pass per-frame inputs through the runner's `ctx` argument
* (hoist them to a local const before the loop: `const dt = ctx.dt`); define fixed constants
* inside the factory body. A factory that closes over outer scope fails the pre-flight and falls
* back to interpreted (correct, just unspecialized).
* 2. `meta` is identity-stable; read `meta.count` (the live row count) inside the runner — population
* churn (spawn/despawn) never re-invokes the factory.
*
* ```ts
* const run = q.bindColumns(
* [Position, 'x'], [Velocity, 'dx'],
* ([px, dx], meta) => () => {
* const count = meta.count
* for (let i = 0; i < count; i++) px[i] += dx[i] * dt
* },
* const run = q.bindColumns<{ dt: number }>(
* [Position, 'x'], [Velocity, 'dx'],
* ([px, dx], meta) => (ctx) => {
* const dt = ctx.dt // hoist per-frame inputs out of the loop
* const count = meta.count
* for (let i = 0; i < count; i++) px[i] += dx[i] * dt
* },
* )
* run() // per frame
* run({ dt: 1 / 60 }) // per frame
* ```
*
* The factory is re-invoked ONLY when a bound column re-backs (growth replaced its view) or when
* the matched-archetype set changes (a new archetype matched, or a matched cold archetype was
* warm-promoted) — never on population change. V8 only context-specializes SINGLETON closures, so
* the first re-invocation after binding disables specialization for that binding permanently
* (~1.7 ns/entity vs ~1.0 steady-state on the canonical bench): pre-size to peak capacity (spawn
* or reserve rows BEFORE binding) so growth never lands in a hot phase — growth before the first
* bind is free.
* The factory is re-invoked ONLY when a bound column re-backs (growth) or the matched-archetype set
* changes (a new archetype matched, or a matched cold archetype was warm-promoted) — never on
* population change.
*
* Vec fields hand their raw view through: row `r` occupies `[r*stride, (r+1)*stride)` where the
* stride is the declared vec arity (`vec3()` → 3). Hardcode it, or read `meta.strides[specIndex]`
Expand All @@ -421,13 +431,13 @@ export class LiveQuery {
* observe the write. Structural changes during `run()` follow the `eachChunk` discipline: collect,
* then mutate after the loop (despawn swap-removes rows under the runner's feet).
*/
bindColumns(
bindColumns<Ctx = void>(
...args: [
...specs: ReadonlyArray<readonly [ComponentDef<Schema>, string]>,
factory: (views: readonly TypedArray[], meta: BoundColumnsMeta) => () => void,
factory: PinnedFactory<Ctx>,
]
): () => void {
const factory = args[args.length - 1] as (views: readonly TypedArray[], meta: BoundColumnsMeta) => () => void
): (ctx: Ctx) => void {
const factory = args[args.length - 1] as PinnedFactory<Ctx>
const specs = args.slice(0, -1) as ReadonlyArray<readonly [ComponentDef<Schema>, string]>
if (this.compiled.rowFilters.length !== 0) {
throw new Error('bindColumns: row-filtered queries are not supported (a pinned runner cannot skip rows); use each()')
Expand Down Expand Up @@ -469,8 +479,9 @@ export class LiveQuery {
readonly arch: Archetype
readonly cols: readonly Column[]
views: readonly TypedArray[]
runner: () => void
runner: (ctx: Ctx) => void
readonly meta: BoundColumnsMeta
readonly strides: readonly number[]
}

// Bindings are keyed by archetype id and PRESERVED across rebuilds: a rebuild only mints
Expand All @@ -485,8 +496,10 @@ export class LiveQuery {
let boundMatchCount = -1

const reinvoke = (b: PinnedBinding): void => {
// Re-build through codegen on growth: a FRESH recompiled factory keeps the new runner a
// specialized singleton (re-invoking the same factory would forfeit specialization).
b.views = b.cols.map((c) => c.view)
b.runner = factory(b.views, b.meta)
b.runner = buildPinnedRunner(factory, b.views, b.meta, b.strides)
}

const makeBinding = (arch: Archetype): PinnedBinding => {
Expand All @@ -512,7 +525,7 @@ export class LiveQuery {
},
strides,
}
return { arch, cols, views, runner: factory(views, meta), meta }
return { arch, cols, views, runner: buildPinnedRunner(factory, views, meta, strides), meta, strides }
}

// Builds into locals and commits at the end so a makeBinding throw (defensive backstop; the
Expand Down Expand Up @@ -542,7 +555,7 @@ export class LiveQuery {

rebuild()

return () => {
return (ctx: Ctx) => {
// Archetype-set check: matchingArchetypes is append-only (lastMatchTick is never bumped), so
// a length change is the complete new-match signal; the cold flags cover warm promotion.
if (this.matchingArchetypes.length !== boundMatchCount) {
Expand All @@ -569,7 +582,7 @@ export class LiveQuery {
break
}
}
if (b.arch.count !== 0) b.runner()
if (b.arch.count !== 0) b.runner(ctx)
}
}
}
Expand Down
20 changes: 15 additions & 5 deletions packages/core/test/bind-columns.property.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -78,11 +78,21 @@ function makeRig(pinned: boolean): Rig {
[Position, 'y'],
[Velocity, 'dx'],
[Velocity, 'dy'],
([px, py, dx, dy], meta) => () => {
const count = meta.count
for (let i = 0; i < count; i++) {
px[i] = (px[i] as number) + (dx[i] as number) * DT
py[i] = (py[i] as number) + (dy[i] as number) * DT
// SELF-CONTAINED (DT defined inside, closes over nothing) so the CODEGEN path is exercised —
// this property (codegen integrator byte-identical to .each under random spawn/despawn/grow)
// is the primary codegen correctness gate.
(vs, meta) => {
const px = vs[0] as Float32Array
const py = vs[1] as Float32Array
const dx = vs[2] as Float32Array
const dy = vs[3] as Float32Array
const dt = 1 / 60
return () => {
const count = meta.count
for (let i = 0; i < count; i++) {
px[i] = px[i]! + dx[i]! * dt
py[i] = py[i]! + dy[i]! * dt
}
}
},
)
Expand Down
Loading
Loading