feat(types,symbols): runtime type system + interned i31 field keys #179

Merged
kollhof merged 22 commits into main from i31-symbols
Jun 20, 2026
@kollhof kollhof commented Jun 20, 2026

Member

Summary

Lands the runtime type system and the symbol/field-key architecture as one line of work, plus the supporting fixes they exposed.

Types — runtime type/enum/union values

  • type: / enum: / union: declarations lower to first-class runtime $Type values ($RecType / $TupleType flavour subtypes), with construction, equality, and structural reads on typed instances.
  • A unified guard_apply mechanism dispatches typed / structural / predicate guards via br_cast, covering: typed rec and tuple guards, rec projection (downcast to a base type, dropping extra fields), union membership, enum case values + enum-level guards, and match-position guards.
  • fmt/repr render a typed instance as its bare structural payload.
  • Runtime rename: the WAT record type $Rec/$RecImpl becomes $Dict/$DictImpl (a record and a dict share one collection type).

Symbols — interned field keys as tagged i31

  • Static field-name keys are interned package-wide: one canonical id per distinct name, assigned at link.
  • A symbol is a compile-time constant tagged i31 word (id << 3) | 0b010 (past the bool words false = 0 / true = 1 in the shared i31 space) — a non-allocating immediate, not a heap struct. Identity is whole-word ref.eq; the word is its own hash. No per-key allocation and nothing for the GC to trace.
  • Identity ops treat the word opaquely (deep_eq via its ref.eq fallback, hash_i31 via its i31 arm); only renderers discriminate via an is_symbol predicate (dict key formatting, repr). The $Symbol struct type is removed.
  • Quoted string keys and computed keys keep their own keyspaces — {foo: 1, 'foo': 2} holds two distinct coexisting keys.
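
A minimal sketch of the word encoding described above, in Rust for illustration only. The `(id << 3) | 0b010` formula and the bool words come from this description. The 3-bit tag mask, the 28-bit id limit, and the function names are assumptions, not the actual runtime code.

```rust
/// Tag in the low bits of a symbol word, per `(id << 3) | 0b010`.
const SYMBOL_TAG: u32 = 0b010;
/// Assumption: the tag occupies exactly the low three bits.
const TAG_MASK: u32 = 0b111;
/// An i31 holds 31 bits, so a 3-bit shift leaves 28 bits for the id.
const MAX_SYMBOL_ID: u32 = (1 << 28) - 1;

/// Bool words share the i31 space: false = 0, true = 1.
const FALSE_WORD: u32 = 0;
const TRUE_WORD: u32 = 1;

/// Fold a linker-assigned id into its tagged word.
/// Returns None when the id would not fit in an i31.
fn symbol_word(id: u32) -> Option<u32> {
    if id > MAX_SYMBOL_ID {
        return None;
    }
    Some((id << 3) | SYMBOL_TAG)
}

/// Hypothetical renderer-side predicate: true only for symbol words.
fn is_symbol(word: u32) -> bool {
    (word & TAG_MASK) == SYMBOL_TAG
}

/// Recover the id from a symbol word, e.g. for a host-side name lookup.
fn symbol_id(word: u32) -> Option<u32> {
    if is_symbol(word) {
        Some(word >> 3)
    } else {
        None
    }
}
```

Under these assumptions the bool words 0 and 1 carry low bits `000` and `001`, so neither is mistaken for a symbol, and identity or hashing can use the whole word without decoding it.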

Supporting

  • JS interop rec_get resolves a string key to its symbol with a string-first / symbol-fallback navigation.
  • Dependency bump (wasmtime 45.0.2, cranelift 0.132.2).
  • DAP uses the null collector to sidestep a wasmtime DRC async-scan debug-assert.

Testing

  • Full suite green: 1366 lib + 42 CLI + JS interop integration, 0 failures.
  • New rt/symbols.test.fnk characterization fixture (field access, repr, distinct symbol/string keys, identity, bool branch/repr) pins symbol behaviour across the representation.
  • wasm snapshot expectations re-blessed for the i31 representation.

kollhof added 22 commits June 15, 2026 13:15
The HAMT-backed type was named $Rec, but it is the dynamic dict that
backs the record protocol, not the (static, typed) record itself.
Rename the runtime type to $Dict/$DictImpl and the cross-module export
to Dict; the rec_* protocol functions and @impl std/rec.fnk bindings
keep the rec_ name (they implement record operations). Surface
vocabulary (record) is unchanged.
Replace the panic! arm for type/enum/union with a uniform seed+accretion
lowering family, mirroring the rec/seq-literal pattern:

  type record   new_type   + type_set_field
  type tuple    new_type   + type_push
  type ..Base   new_type   + type_inherit
  union         new_union  + union_add
  enum          new_enum   + (new_type per member) + enum_add

An enum member is a type: with a tag, so member payloads reuse
lower_type_body. Generic declarations (type/enum/union T:) wrap the body
accretion in a fn over the type-params via the shared
wrap_decl_in_params_fn helper, so Spam u8 / Option u8 is application.

Covered by 15 cps_module snapshot tests in test_types.fnk. No WASM
lowering yet (the new builtins have no codegen arm).
WASM lowering for the type system: declarations now mint runtime
$Type values, instances construct, and both compare correctly.

Runtime (rt/types.wat): $Type + $Inst hierarchy ($Rec/$Tuple
instances wrapping $Dict/$List payloads). All 8 construction builtins
(new_type/type_set_field/type_push/type_inherit, new_union/union_add,
new_enum/enum_add) emit + run. Applying a $Type builds an instance --
apply.wat stays dumb (br_on_cast $Type, delegates wholesale to
type_apply); type-construction is invisible to the apply machinery.

Equality (protocols.wat eq/neq arms): type/enum = identity; union =
structural set-eq (members are a $Set, delegates to set eq -- order
independent); instances = nominal ($type ref-eq) + structural (payload
deep-eq). A typed instance is NOT equal to a bare collection with the
same contents (nominal typing). $Type gets a trivial hash (0, ref.eq
disambiguates -- cf. closure hash) so it can be a set member.
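
The equality rules above can be sketched roughly as follows. This is an illustrative Rust model, not the WAT in protocols.wat: `type_id` stands in for the `$type` ref-eq check, `BTreeMap` equality stands in for payload deep-eq, and all names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Payload = BTreeMap<String, i64>;

#[derive(Debug, Clone)]
enum Value {
    /// A bare record with no nominal type.
    Bare(Payload),
    /// A typed instance: a type identity plus a structural payload.
    Inst { type_id: u32, payload: Payload },
}

/// Instances compare nominally (same type) and structurally (same payload).
/// A typed instance never equals a bare collection.
fn values_eq(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Bare(x), Value::Bare(y)) => x == y,
        (
            Value::Inst { type_id: ta, payload: pa },
            Value::Inst { type_id: tb, payload: pb },
        ) => ta == tb && pa == pb,
        _ => false,
    }
}

/// Unions compare as member sets, so declaration order does not matter.
fn unions_eq(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> bool {
    a == b
}

/// Helper to build a payload from (name, value) pairs.
fn rec(fields: &[(&str, i64)]) -> Payload {
    fields.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}
```

In this model `Foo {bar: 1}` equals another `Foo {bar: 1}`, but equals neither `Other {bar: 1}` nor the bare `{bar: 1}`.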

Intrinsics: std/int.fnk migrated to compiled-source stdlib
(MIGRATED_STDLIB_FNK); `u8 = type _` is a contentless placeholder so
record/tuple fields resolve. Real $RuntimeType ctors + per-field
literal coercion are deferred.

Tests: rt/types.test.fnk (8 behaviour tests documenting the identity
model -- incl. the enum-nominal vs union-structural distinction),
aggregated by rt/all.test.fnk, gated by runner::runtime_native_test_suite_runs.
WAT codegen snapshots in passes/wasm/test_types.fnk.

Deferred (documented): instance reads (field access / destructure --
delegate to payload), per-field literal coercion via field-type ctors,
real intrinsic $RuntimeType, tuple-spread splice order verification,
function types (FFI).
A typed instance responds to all structural read syntax by unwrapping
its payload and delegating -- field access (foo.bar / foo.(0)),
destructure ({bar} = / [bar] =), and spread ({..foo} / [..foo]).

Mechanism: inst_payload unwraps $Rec->$Dict / $Tuple->$List; op_dot's
$Inst arm unwraps and re-dispatches; is_rec_like/is_seq_like accept
instances and succeed with the bare payload.

Reads STRIP the nominal type (a read yields a bare collection); identity
is conferred ONLY by a constructor. So {..foo} is bare -- write
Foo {..foo} to re-type. (Refines the syntax-model: spread always strips,
no single-spread-preserves special case; multi-spread is just a bare
merge.) Guarded/refutable destructure is a later phase.

6 new behaviour tests (rec + tuple: destructure, dot, spread).
Collapse the three pattern-guard paths (IsSeqLike/IsRecLike structural
wrapping, predicate MatchGuard, type/constructor heads) into one CPS
node: guard_apply(head, val, succ, fail). The head is a runtime value:
a structural protocol (rec_protocol/tuple_protocol), a predicate fn,
or a type. Sub-pattern args (is_foo {bar}) now destructure in the
guard's success path instead of being built as a value.

Runtime guard_apply (rt/protocols.wat) br-casts the head: $Type ->
is_instance (rt/types.wat, with $base-chain walk); $Closure ->
predicate call + bool-branch via make_guard_branch (rt/apply.wat).

WASM lowering routes rec_protocol/tuple_protocol heads back to the
existing is_rec_like/is_seq_like runtime guards (shim, flagged for
follow-up to make the protocols real type values).
…ate guards

guard_apply(ctx, guard, val, succ, fail) br-casts the guard and branches
to succ(val)/fail():
- i31 sentinel (0=rec, 1=tuple) -> is_rec_like/is_seq_like, for bare
  {...}/[...] patterns. Interim magic-i31 singletons emitted inline by
  lowering; to be unified with real protocol type values later.
- $Type -> is_instance with $base-chain walk (subtype satisfies a base
  guard).
- $Closure -> fink predicate fn; bool verdict via the branch cont.

Lowering emits the i31 sentinel for RecProtocol/TupleProtocol heads;
is_rec_like/is_seq_like are retained as the structural handlers.
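
A rough Rust sketch of the dispatch shape described above, assuming a toy value model. It covers the structural and predicate arms only; the `$Type` arm with its `$base`-chain walk is omitted. The enum and function names are illustrative, not the runtime's.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Rec(Vec<(String, i64)>),
    Seq(Vec<i64>),
    Int(i64),
}

enum Guard {
    /// Bare `{...}` pattern (the interim sentinel 0 in the text).
    RecProtocol,
    /// Bare `[...]` pattern (the interim sentinel 1 in the text).
    TupleProtocol,
    /// A predicate function; its bool verdict picks the branch.
    Predicate(fn(&Value) -> bool),
}

/// Dispatch on the guard kind, then continue with succ(val) or fail().
fn guard_apply<R>(
    guard: &Guard,
    val: Value,
    succ: impl FnOnce(Value) -> R,
    fail: impl FnOnce() -> R,
) -> R {
    let ok = match guard {
        Guard::RecProtocol => matches!(&val, Value::Rec(_)),
        Guard::TupleProtocol => matches!(&val, Value::Seq(_)),
        Guard::Predicate(p) => p(&val),
    };
    if ok {
        succ(val)
    } else {
        fail()
    }
}

/// Example predicate guard.
fn is_positive(v: &Value) -> bool {
    matches!(v, Value::Int(n) if *n > 0)
}
```

Passing both continuations keeps the guard a single node with one success and one failure edge, which is the property the CPS lowering relies on.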
Replace the flat $Type (carrying both $fields and $positionals, flavour
inferred from dict_size) with a $base-chain of flavour subtypes:
- $Type: shared core (mod_id, cps_id, base).
- $RecType: + $fields (full name->field-type Dict; HAMT structural sharing
  makes carrying base's fields + own cheap).
- $TupleType: + $positionals.
- $Union/$Enum: drop the unused $fields/$positionals they carried.

Construction is chain-build: new_type seeds a unit $Type; type_set_field/
type_push construct-or-accrete (wrap the current node into a Rec/TupleType, or
accrete if already that flavour); type_inherit runtime-br_casts the base to copy
its flavour + members (handles inherit-only, where flavour is only known from the
base value). type_apply discriminates by br_cast on $RecType/$TupleType instead
of dict_size.

Contained to rt/types.wat (the accretion ops are already cont-threaded, so the
construct-or-accrete reshape needs no CPS/lowering change). All type tests pass;
full suite green; no snapshot drift.
Guarding a typed instance against a base type now projects it to the
base's field set: `Foo foo = FooBar {bar: 1, spam: 2}` binds `foo` as
`Foo {bar: 1}`, dropping fields absent from the base.

The guard binding now sees the post-guard refined value. lower_pat_lhs
threads guard_apply's success result (v1) to the ident bind and any
sub-patterns: the bind is a MatchBind of v1 chained after the MatchGuard,
so it nests into the success continuation and is collected for the arm
body. Previously the bind captured the pre-guard subject, so projection
was lost and the binding preceded the guard decision.

Runtime: copy_by_keys (HAMT key-filter) in dict.wat, project_inst in
types.wat, and the $Type guard arm in protocols.wat build the projected
instance.
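
As an illustration of the projection step, here is a small Rust sketch. It borrows the names `copy_by_keys` and `project_inst` from the text, but the signatures, the `Inst` struct, and the use of `BTreeMap` in place of the HAMT are assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Payload = BTreeMap<String, i64>;

#[derive(Debug, Clone, PartialEq)]
struct Inst {
    type_name: String,
    payload: Payload,
}

/// Key filter: keep only the entries whose key appears in `keys`.
fn copy_by_keys(src: &Payload, keys: &BTreeSet<String>) -> Payload {
    src.iter()
        .filter(|(k, _)| keys.contains(*k))
        .map(|(k, v)| (k.clone(), *v))
        .collect()
}

/// Re-type an instance as its base, dropping fields the base lacks.
fn project_inst(inst: &Inst, base_name: &str, base_fields: &BTreeSet<String>) -> Inst {
    Inst {
        type_name: base_name.to_string(),
        payload: copy_by_keys(&inst.payload, base_fields),
    }
}
```

With base fields `{bar}`, projecting `FooBar {bar: 1, spam: 2}` yields `Foo {bar: 1}`, matching the example in the commit message.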
Cover `match val: Foo foo -> ...` for typed rec instances: bind on
match, destructure (`Foo {bar}`), type-based arm selection, and
downcast-to-base projection in match position. All pass against the
existing guard_apply $Type arm -- match position reuses the same
pattern-guard mechanism as bind position, no runtime change.
`SomeUnion x = val` / `match val: SomeUnion m -> ...` matches when val is
an instance of any union member. is_union_member (types.wat) walks the
instance's type + $base chain, probing each link against the member set
via the set's synchronous op_in (HAMT has) -- no member iteration. A
derived instance matches a union listing its base (IS-A walks the
instance's ancestry); a base instance does not match a union listing a
derived type (subtyping is one-directional).

A union classifies -- it has no fields of its own -- so the guard binds
`val` UNCHANGED (no projection), keeping its concrete member type so a
further match can discriminate. The new $Union arm in guard_apply
precedes the $Type arm since $Union <: $Type.
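
The membership walk can be sketched like this in Rust. The type table indexed by integer ids and the `Types` struct are hypothetical stand-ins for the runtime's `$Type` nodes and `$base` links; only the walk-and-probe shape is taken from the text.

```rust
use std::collections::HashSet;

/// Hypothetical type table: `bases[t]` is the base of type `t`, if any.
struct Types {
    bases: Vec<Option<usize>>,
}

impl Types {
    /// Walk the instance's type and its base chain, probing each link
    /// against the member set. The members themselves are never iterated.
    fn is_union_member(&self, inst_type: usize, members: &HashSet<usize>) -> bool {
        let mut link = Some(inst_type);
        while let Some(t) = link {
            if members.contains(&t) {
                return true;
            }
            link = self.bases[t];
        }
        false
    }
}
```

Because the walk only moves from a type toward its bases, a derived instance matches a union listing its base, while a base instance does not match a union listing only the derived type.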
…guards

Enum cases are proper types stored in the enum's $cases dict, so they
reuse all existing type machinery:

- `Opt.Some {val: 12}` constructs a case instance: a new $Enum arm in
  op_dot reads the case type from $cases (via enum_cases), then type_apply
  builds the instance. Equality and case guards (`Opt.Some s = ...`) work
  through the existing $Type machinery.
- `{Some, None} = Opt` destructures the case namespace: a new $Enum arm in
  is_rec_like unwraps to the $cases dict so rec_pop reads case types by name.
- `Opt o = Opt.Some {...}` / `match v: Opt _ -> ...` guard against the whole
  enum: enum_add links each case type's $base to the enum, so a case
  instance IS-A its enum and the existing $Type guard arm matches any case
  via the $base-walk (the enum analogue of union membership).

Enum-level match through a closure is blocked by a pre-existing
subtype-base-walk-through-closure bug (is_instance ref.eq breaks on a
closure-captured type; affects rec subtypes too); enum-level tests use the
inline form that proves the feature.
…re bug

Add a passing test for a rec subtype (`..Base`) guard matched through a
closure-captured type -- the base-walk works.

Skip the enum-level-match-through-closure test: `is_opt = fn val: match
val: Opt _` returns the wildcard arm for every case. Inline enum-level
match and rec-subtype-through-closure both work, so this is enum-specific
-- the closure-captured enum is not ref.eq to the value stored in each
case's $base. Under investigation.
…nt position

`f g.h 42` in argument position parsed as two flat args of the outer call
(`Apply(f, Member(g,h), 42)`) instead of one member application
(`Apply(f, Apply(Member(g,h), 42))`). parse_apply_no_block only collected
application args for an Ident head; a Member head fell through and returned
bare, letting the outer call grab the following tokens as separate args.
Add the Member-head case, mirroring the statement-level path in parse_apply.

This is the root cause behind `is_opt Opt.Some {val: 1}` failing: the enum
case constructor and its rec payload were passed as two arguments, so the
guard saw the bare case type, not a constructed instance. Fixing the parse
also unblocks match-arm member-access heads (`match v: Opt.Some s -> ...`).

Re-enables the enum-level-match and enum-case-match-with-destructure tests
(previously skipped/removed for this gap).
fmt_val and repr_val had no $Inst arm, so formatting a typed instance
(`Foo {bar: 1}`) trapped on the unreachable tail -- which made `equals`
trap when rendering a typed value in its failure message. Add an $Inst arm
to both: unwrap via inst_payload and re-dispatch, so a typed instance
renders as its bare payload (`{bar: 1}`).

A TODO in each marks the nominal-name rendering (`Foo {bar: 1}`): the
type's (mod_id, cps_id) resolves to the name host-side via the backtrace
reflection channel -- deferred.

With fmt working, the enum-instance tests drop the `a == b, true`
workaround and compare directly with equals/not_equals.
Add rt/symbols.wat: a $Symbol is the runtime identity of a source name
(a record field; later type/module/fn names), interned package-wide to one
canonical instance per id. Identity is ref.eq; the i32 id doubles as the
hash (dense, distinct -> distributes across hamt buckets, no string hashing).
The source name is strippable host-side metadata.

Wire the $Symbol arm into the hashing.wat dispatcher so a symbol is usable
as a dict key (dict keys are already (ref eq) -- no structural change).
Register the module in emit.rs.

Runtime-only groundwork: nothing emits $Symbol yet, so behavior is
unchanged. Next: lower.rs emits $Symbol for static record keys + a
per-fragment symbol table, with linker dedup-by-name giving canonical ids.

Representation is a struct for now; a tagged-i31 form is a later localized
swap (see design notes).
Symbols are compile-time interned by id: equality is i32.eq on $id, not
ref.eq. So the same id from any allocation is the same symbol, and the
compiler can emit `struct.new $Symbol (i32.const id)` inline at each use
site -- no global instance table, no init pass, no const-expr global. The
interning is purely the compile-time name->id assignment.
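
A minimal sketch of compile-time name->id interning, assuming a simple table that hands out dense ids in first-seen order. The `SymbolTable` type and its methods are hypothetical; the real assignment happens in the linker and may order ids differently.

```rust
use std::collections::HashMap;

/// Hypothetical interning table: one id per distinct name.
#[derive(Default)]
struct SymbolTable {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl SymbolTable {
    /// Return the existing id for `name`, or assign the next dense id.
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as u32;
        self.ids.insert(name.to_string(), id);
        self.names.push(name.to_string());
        id
    }

    /// Reverse lookup, the kind of metadata that could live host-side.
    fn name(&self, id: u32) -> Option<&str> {
        self.names.get(id as usize).map(String::as_str)
    }
}
```

Since two uses of the same name always receive the same id, comparing ids is enough to compare symbols, which is what lets each use site emit the id inline.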
Record field keys derived from identifiers (`{foo}`, `r.foo`, type fields,
enum members) lower to a new `Lit::Symbol` IR variant, rendered `ƒ'name'`.
Quoted keys (`{'foo bar': v}`) and computed keys (`{(x): v}`) stay value keys
(dict semantics) -- the symbol-vs-value split is decided at lowering by key
syntax, not by runtime coercion.

Register test_patterns_dict.fnk for the dict-key contract.
Wasm lowering boxes a Lit::Symbol field key to $Symbol (via box_symbol /
Operand::SymbolId, linker-resolved to a canonical id); string and computed
keys stay value keys. Routed at every key site: RecPut/RecPop, op_dot,
TypeSetField, EnumAdd, the import-rec, and module Pub -- so exports are
symbol-keyed and `{x} = import` destructures match.

The interop boundary keeps a name<->symbol table (no interface files yet):
symbols.wat owns symbol_table/symbol_names + str_to_symbol/symbol_to_str,
populated by register_symbol; interop's rec_get_by_bytes resolves a host
byte-name to its symbol before lookup. fink dict ops never coerce -- key kind
is fixed at compile time. The dict formatter reprs symbol and string keys via
repr_val, computed keys in parens.

Known issue: dap::read_stdin_under_dap_traps fails with a wasmtime GC stack-
roots panic during the stdin-trap unwind; under investigation.
JS proxy property access (`mod.foo`) and dict string-key access are
indistinguishable -- both reach the same get trap with a string. rec_get now
branches on key type: a $Str key tries the literal string first, then falls
back to its interned $Symbol; non-$Str keys are used verbatim. This lets JS
navigate symbol-keyed exports by name while still honouring genuine string
dict keys.
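
The string-first / symbol-fallback lookup can be sketched as follows. This is an illustrative Rust model: the `Key` enum, the `Rec` struct, and the name->id map are assumptions standing in for the runtime's `$Str` / symbol keys and the interop symbol table.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Key {
    /// A genuine string key, e.g. from `{'foo': 2}`.
    Str(String),
    /// An interned field-name key, e.g. from `{foo: 1}`.
    Symbol(u32),
}

struct Rec {
    entries: HashMap<Key, i64>,
}

/// String keys try the literal string first, then fall back to the
/// symbol interned under the same name. Other keys are used verbatim.
fn rec_get(rec: &Rec, symbols: &HashMap<String, u32>, key: &Key) -> Option<i64> {
    match key {
        Key::Str(s) => {
            if let Some(v) = rec.entries.get(key) {
                return Some(*v);
            }
            let id = symbols.get(s)?;
            rec.entries.get(&Key::Symbol(*id)).copied()
        }
        other => rec.entries.get(other).copied(),
    }
}
```

In this model a record holding both `foo` (symbol) and `'foo'` (string) answers a host lookup for `"foo"` with the string entry, while a record holding only the symbol key is still reachable by name.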
`cargo update`: wasmtime 45.0.0 -> 45.0.2 (latest), cranelift 0.132.0 ->
0.132.2, and assorted transitive bumps.
The default deferred-reference-counting collector hits a debug-assert in its
async stack scan (`on-stack gc ref ... not in over-approximated stack roots
set`) when a collection fires mid-run under the async/fiber execution that
guest_debug requires -- e.g. allocating while entering a host call. Symbol-key
lowering raised allocation enough to trip it (read-stdin-under-dap path). A
debug session runs short programs in a throwaway process, so the null collector
(never reclaims, never scans) is a sound choice; the non-debug runner keeps the
real collector.
A static field-name key is a compile-time constant: the linker assigns one
package-wide id per distinct name and folds it into a tagged i31 word
`(id << 3) | 0b010`, sitting past the two bool words (false = 0, true = 1) in
the shared i31 space. box_symbol emits `ref.i31 <word>` and the table
population passes the encoded word to register_symbol, so a field key is a
non-allocating immediate with whole-word ref.eq identity -- no per-key
`struct.new $Symbol` and nothing for the GC to trace.

Identity ops treat the word opaquely: deep_eq routes symbols through its ref.eq
fallback and hash_i31 through its i31 arm (the word is its own hash), so both
shed their symbol-specific arms. Only renderers discriminate, via a new
is_symbol predicate -- dict key formatting and repr (which peels a symbol off
before the bool i31 arm). The $Symbol struct type and new_symbol constructor
are removed.
@github-actions

📦 This PR will release v0.87.0 (minor) when merged.

@kollhof kollhof merged commit deacd82 into main Jun 20, 2026
14 checks passed
@kollhof kollhof deleted the i31-symbols branch June 20, 2026 20:15