feat(types,symbols): runtime type system + interned i31 field keys by kollhof · Pull Request #179 · fink-lang/fink

kollhof · 2026-06-20T15:39:22Z

Summary

Lands the runtime type system and the symbol/field-key architecture as one line of work, plus the supporting fixes they exposed.

Types — runtime type/enum/union values

type: / enum: / union: declarations lower to first-class runtime $Type values ($RecType / $TupleType flavour subtypes), with construction, equality, and structural reads on typed instances.
A unified guard_apply mechanism dispatches typed / structural / predicate guards via br_cast, covering: typed rec and tuple guards, rec projection (downcast to a base type, dropping extra fields), union membership, enum case values + enum-level guards, and match-position guards.
fmt/repr render a typed instance as its bare structural payload.
Runtime rename: the WAT record type $Rec/$RecImpl becomes $Dict/$DictImpl (a record and a dict share one collection type).

Symbols — interned field keys as tagged i31

Static field-name keys are interned package-wide: one canonical id per distinct name, assigned at link.
A symbol is a compile-time constant tagged i31 word (id << 3) | 0b010 (past the bool words false = 0 / true = 1 in the shared i31 space) — a non-allocating immediate, not a heap struct. Identity is whole-word ref.eq; the word is its own hash. No per-key allocation and nothing for the GC to trace.
Identity ops treat the word opaquely (deep_eq via its ref.eq fallback, hash_i31 via its i31 arm); only renderers discriminate via an is_symbol predicate (dict key formatting, repr). The $Symbol struct type is removed.
Quoted string keys and computed keys keep their own keyspaces — {foo: 1, 'foo': 2} holds two distinct coexisting keys.
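
The tagged-word layout above can be sketched in a few lines of Rust. This is a minimal illustration only: the constant and function names (`SYMBOL_TAG`, `symbol_word`, `symbol_id`) are hypothetical and not taken from the fink sources. Only the `(id << 3) | 0b010` layout, the bool words, and the `is_symbol` predicate name come from the description.

```rust
// Hypothetical sketch of the tagged i31 symbol word; names are illustrative.
const TAG_BITS: u32 = 3;
const TAG_MASK: u32 = (1 << TAG_BITS) - 1;
const SYMBOL_TAG: u32 = 0b010;

// Bool words share the i31 space and sit below every symbol word.
const FALSE_WORD: u32 = 0;
const TRUE_WORD: u32 = 1;

/// Fold an interned id into a symbol word: (id << 3) | 0b010.
fn symbol_word(id: u32) -> u32 {
    // An i31 carries 31 bits, so the id has to fit in the remaining 28.
    assert!(id < (1 << 28), "symbol id does not fit in an i31 payload");
    (id << TAG_BITS) | SYMBOL_TAG
}

/// Renderer-side predicate: does this i31 word carry the symbol tag?
fn is_symbol(word: u32) -> bool {
    (word & TAG_MASK) == SYMBOL_TAG
}

/// Recover the id, e.g. to look up the source name when rendering.
fn symbol_id(word: u32) -> Option<u32> {
    is_symbol(word).then(|| word >> TAG_BITS)
}
```

Under these assumptions `symbol_word(0)` is `2`, the first word past `true = 1`, and identity or hashing never needs to call `is_symbol` because comparing whole words is enough.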

Supporting

JS interop rec_get resolves a string key to its symbol with a string-first / symbol-fallback navigation.
Dependency bump (wasmtime 45.0.2, cranelift 0.132.2).
DAP uses the null collector to sidestep a wasmtime DRC async-scan debug-assert.

Testing

Full suite green: 1366 lib + 42 CLI + JS interop integration, 0 failures.
New rt/symbols.test.fnk characterization fixture (field access, repr, distinct symbol/string keys, identity, bool branch/repr) pins symbol behaviour across the representation.
wasm snapshot expectations re-blessed for the i31 representation.

@impl

The HAMT-backed type was named $Rec, but it is the dynamic dict that backs the record protocol, not the (static, typed) record itself. Rename the runtime type to $Dict/$DictImpl and the cross-module export to Dict; the rec_* protocol functions and @impl std/rec.fnk bindings keep the rec_ name (they implement record operations). Surface vocabulary (record) is unchanged.

Replace the panic! arm for type/enum/union with a uniform seed+accretion lowering family, mirroring the rec/seq-literal pattern:

  type record   new_type + type_set_field
  type tuple    new_type + type_push
  type ..Base   new_type + type_inherit
  union         new_union + union_add
  enum          new_enum + (new_type per member) + enum_add

An enum member is a type: with a tag, so member payloads reuse lower_type_body. Generic declarations (type/enum/union T:) wrap the body accretion in a fn over the type-params via the shared wrap_decl_in_params_fn helper, so Spam u8 / Option u8 is application.

Covered by 15 cps_module snapshot tests in test_types.fnk. No WASM lowering yet (the new builtins have no codegen arm).
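
A rough sketch of the seed+accretion shape, assuming a toy declaration AST. `TypeBody`, `Decl`, and `lower_decl` are hypothetical, and the per-member interleaving of `new_type` and `enum_add` is a guess; only the builtin names and `lower_type_body` come from the commit message.

```rust
// Hypothetical mini-AST; the real lowering works on fink's CPS IR.
enum TypeBody {
    Record(Vec<&'static str>), // field names
    Tuple(usize),              // number of positional items
    Inherit,                   // `..Base`
}

enum Decl {
    Type(TypeBody),
    Union(usize),        // number of members
    Enum(Vec<TypeBody>), // one payload body per member
}

/// Seed a type, then accrete one builtin call per member.
fn lower_type_body(body: &TypeBody, ops: &mut Vec<&'static str>) {
    ops.push("new_type"); // seed
    match body {
        TypeBody::Record(fields) => fields.iter().for_each(|_| ops.push("type_set_field")),
        TypeBody::Tuple(n) => (0..*n).for_each(|_| ops.push("type_push")),
        TypeBody::Inherit => ops.push("type_inherit"),
    }
}

/// Emit the builtin call sequence for one declaration.
fn lower_decl(decl: &Decl) -> Vec<&'static str> {
    let mut ops = Vec::new();
    match decl {
        Decl::Type(body) => lower_type_body(body, &mut ops),
        Decl::Union(members) => {
            ops.push("new_union");
            (0..*members).for_each(|_| ops.push("union_add"));
        }
        Decl::Enum(members) => {
            ops.push("new_enum");
            for body in members {
                // An enum member is itself a type, so its payload reuses the type path.
                lower_type_body(body, &mut ops);
                ops.push("enum_add");
            }
        }
    }
    ops
}
```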

WASM lowering for the type system: declarations now mint runtime $Type values, instances construct, and both compare correctly.

Runtime (rt/types.wat): $Type + $Inst hierarchy ($Rec/$Tuple instances wrapping $Dict/$List payloads). All 9 construction builtins (new_type/type_set_field/type_push/type_inherit, new_union/union_add, new_enum/enum_add) emit + run. Applying a $Type builds an instance -- apply.wat stays dumb (br_on_cast $Type, delegates wholesale to type_apply); type-construction is invisible to the apply machinery.

Equality (protocols.wat eq/neq arms): type/enum = identity; union = structural set-eq (members are a $Set, delegates to set eq -- order independent); instances = nominal ($type ref-eq) + structural (payload deep-eq). A typed instance is NOT equal to a bare collection with the same contents (nominal typing). $Type gets a trivial hash (0, ref.eq disambiguates -- cf. closure hash) so it can be a set member.

Intrinsics: std/int.fnk migrated to compiled-source stdlib (MIGRATED_STDLIB_FNK); `u8 = type _` is a contentless placeholder so record/tuple fields resolve. Real $RuntimeType ctors + per-field literal coercion are deferred.

Tests: rt/types.test.fnk (8 behaviour tests documenting the identity model -- incl. the enum-nominal vs union-structural distinction), aggregated by rt/all.test.fnk, gated by runner::runtime_native_test_suite_runs. WAT codegen snapshots in passes/wasm/test_types.fnk.

Deferred (documented): instance reads (field access / destructure -- delegate to payload), per-field literal coercion via field-type ctors, real intrinsic $RuntimeType, tuple-spread splice order verification, function types (FFI).
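
The identity model can be illustrated with a small Rust model, where `Rc::ptr_eq` stands in for `ref.eq`. Everything here (`Type`, `Inst`, `type_eq`, `inst_eq`, `union_eq`) is a hypothetical stand-in for the WasmGC structs and protocol arms, not the runtime's actual code.

```rust
use std::collections::BTreeSet;
use std::rc::Rc;

// Hypothetical model of the runtime values; the name is only a debug label.
#[allow(dead_code)]
#[derive(Debug)]
struct Type {
    name: &'static str,
}

#[derive(Debug, Clone)]
struct Inst {
    ty: Rc<Type>,
    payload: Vec<(&'static str, i64)>,
}

/// type/enum equality: identity only.
fn type_eq(a: &Rc<Type>, b: &Rc<Type>) -> bool {
    Rc::ptr_eq(a, b)
}

/// instance equality: nominal (same type object) plus structural (equal payload).
fn inst_eq(a: &Inst, b: &Inst) -> bool {
    Rc::ptr_eq(&a.ty, &b.ty) && a.payload == b.payload
}

/// Collect member identities so comparison ignores declaration order.
fn member_ids(members: &[Rc<Type>]) -> BTreeSet<usize> {
    members.iter().map(|t| Rc::as_ptr(t) as usize).collect()
}

/// union equality: set equality over member identities.
fn union_eq(a: &[Rc<Type>], b: &[Rc<Type>]) -> bool {
    member_ids(a) == member_ids(b)
}
```

Two separately declared types with the same name and fields stay distinct in this model, which is the nominal half of the rule; two unions listing the same members in a different order compare equal, which is the structural half.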

A typed instance responds to all structural read syntax by unwrapping its payload and delegating -- field access (foo.bar / foo.(0)), destructure ({bar} = / [bar] =), and spread ({..foo} / [..foo]). Mechanism: inst_payload unwraps $Rec->$Dict / $Tuple->$List; op_dot's $Inst arm unwraps and re-dispatches; is_rec_like/is_seq_like accept instances and succeed with the bare payload. Reads STRIP the nominal type (a read yields a bare collection); identity is conferred ONLY by a constructor. So {..foo} is bare -- write Foo {..foo} to re-type. (Refines the syntax-model: spread always strips, no single-spread-preserves special case; multi-spread is just a bare merge.) Guarded/refutable destructure is a later phase. 6 new behaviour tests (rec + tuple: destructure, dot, spread).
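
A minimal model of "reads strip, constructors confer", assuming a toy `Value` enum in which the nominal type is just a name. `Value`, `spread`, and `retype` are hypothetical; `inst_payload` and `op_dot` borrow their names from the runtime functions mentioned above but are not their real implementations.

```rust
use std::collections::BTreeMap;

// Hypothetical value model; the nominal type is reduced to a name here.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Dict(BTreeMap<String, Value>),            // bare record payload
    Inst { ty: String, payload: Box<Value> }, // typed instance
}

/// Unwrap a typed instance to its payload; bare values pass through.
fn inst_payload(v: &Value) -> &Value {
    match v {
        Value::Inst { payload, .. } => &**payload,
        other => other,
    }
}

/// Field read: unwrap, then dispatch on the bare collection.
fn op_dot(v: &Value, key: &str) -> Option<Value> {
    match inst_payload(v) {
        Value::Dict(fields) => fields.get(key).cloned(),
        _ => None,
    }
}

/// `{..foo}`: the result is always a bare collection.
fn spread(v: &Value) -> Value {
    inst_payload(v).clone()
}

/// `Foo {..foo}`: only a constructor re-attaches a nominal type.
fn retype(ty: &str, v: &Value) -> Value {
    Value::Inst { ty: ty.to_string(), payload: Box::new(spread(v)) }
}
```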

Collapse the three pattern-guard paths (IsSeqLike/IsRecLike structural wrapping, predicate MatchGuard, type/constructor heads) into one CPS node: guard_apply(head, val, succ, fail). The head is a runtime value - a structural protocol (rec_protocol/tuple_protocol), a predicate fn, or a type. Sub-pattern args (is_foo {bar}) now destructure in the guard's success path instead of being built as a value. Runtime guard_apply (rt/protocols.wat) br-casts the head: $Type -> is_instance (rt/types.wat, with $base-chain walk); $Closure -> predicate call + bool-branch via make_guard_branch (rt/apply.wat). WASM lowering routes rec_protocol/tuple_protocol heads back to the existing is_rec_like/is_seq_like runtime guards (shim, flagged for follow-up to make the protocols real type values).

…ate guards

guard_apply(ctx, guard, val, succ, fail) br-casts the guard and branches to succ(val)/fail():

- i31 sentinel (0=rec, 1=tuple) -> is_rec_like/is_seq_like, for bare {...}/[...] patterns. Interim magic-i31 singletons emitted inline by lowering; to be unified with real protocol type values later.
- $Type -> is_instance with $base-chain walk (subtype satisfies a base guard).
- $Closure -> fink predicate fn; bool verdict via the branch cont.

Lowering emits the i31 sentinel for RecProtocol/TupleProtocol heads; is_rec_like/is_seq_like are retained as the structural handlers.
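
The three-way dispatch can be modelled with a Rust enum standing in for `br_cast` on the guard value. This is a simplified sketch: `Value`, `Guard`, and `TypeNode` are hypothetical, the `ctx` argument is dropped, and the structural arms ignore that typed instances also satisfy `is_rec_like`/`is_seq_like`.

```rust
use std::rc::Rc;

// Hypothetical runtime model; a match on `Guard` plays the role of br_cast.
struct TypeNode {
    base: Option<Rc<TypeNode>>,
}

enum Value {
    Int(i64),
    Rec,                // bare {...}
    Tuple,              // bare [...]
    Inst(Rc<TypeNode>), // typed instance
}

enum Guard {
    RecProtocol,                   // i31 sentinel 0
    TupleProtocol,                 // i31 sentinel 1
    Type(Rc<TypeNode>),            // $Type
    Predicate(fn(&Value) -> bool), // $Closure
}

/// Walk the instance's type and its base chain, comparing by identity.
fn is_instance(val: &Value, ty: &Rc<TypeNode>) -> bool {
    let Value::Inst(inst_ty) = val else { return false };
    let mut link = Some(inst_ty);
    while let Some(t) = link {
        if Rc::ptr_eq(t, ty) {
            return true;
        }
        link = t.base.as_ref();
    }
    false
}

/// Branch to `succ(val)` or `fail()` depending on the guard's verdict.
fn guard_apply<R>(
    guard: &Guard,
    val: &Value,
    succ: impl FnOnce(&Value) -> R,
    fail: impl FnOnce() -> R,
) -> R {
    let ok = match guard {
        Guard::RecProtocol => matches!(val, Value::Rec),
        Guard::TupleProtocol => matches!(val, Value::Tuple),
        Guard::Type(ty) => is_instance(val, ty),
        Guard::Predicate(pred) => pred(val),
    };
    if ok { succ(val) } else { fail() }
}
```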

Replace the flat $Type (carrying both $fields and $positionals, flavour inferred from dict_size) with a $base-chain of flavour subtypes:

- $Type: shared core (mod_id, cps_id, base).
- $RecType: + $fields (full name->field-type Dict; HAMT structural sharing makes carrying base's fields + own cheap).
- $TupleType: + $positionals.
- $Union/$Enum: drop the unused $fields/$positionals they carried.

Construction is chain-build: new_type seeds a unit $Type; type_set_field/type_push construct-or-accrete (wrap the current node into a Rec/TupleType, or accrete if already that flavour); type_inherit runtime-br_casts the base to copy its flavour + members (handles inherit-only, where flavour is only known from the base value). type_apply discriminates by br_cast on $RecType/$TupleType instead of dict_size.

Contained to rt/types.wat (the accretion ops are already cont-threaded, so the construct-or-accrete reshape needs no CPS/lowering change). All type tests pass; full suite green; no snapshot drift.
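
Construct-or-accrete can be sketched with a Rust enum for the flavour. This is a simplification: the runtime uses WasmGC struct subtypes rather than an enum, field types are reduced to names, and `type_inherit` here only handles inheriting into a fresh seed. The `Flavour` and `Type` definitions are hypothetical; the function names mirror the builtins.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

// Hypothetical model: an enum stands in for the $RecType/$TupleType subtypes.
#[derive(Debug, Clone, PartialEq)]
enum Flavour {
    Unit,
    Rec(BTreeMap<String, String>), // field name -> field type name
    Tuple(Vec<String>),            // positional field type names
}

#[derive(Debug, Clone, PartialEq)]
struct Type {
    base: Option<Rc<Type>>,
    flavour: Flavour,
}

/// Seed: a unit type with no flavour yet.
fn new_type() -> Type {
    Type { base: None, flavour: Flavour::Unit }
}

/// Construct a record flavour from a unit seed, or accrete onto an existing one.
fn type_set_field(ty: Type, name: &str, field_ty: &str) -> Type {
    let Type { base, flavour } = ty;
    let mut fields = match flavour {
        Flavour::Unit => BTreeMap::new(), // construct
        Flavour::Rec(fields) => fields,   // accrete
        Flavour::Tuple(_) => panic!("cannot add a named field to a tuple type"),
    };
    fields.insert(name.to_string(), field_ty.to_string());
    Type { base, flavour: Flavour::Rec(fields) }
}

/// Same shape for positional members.
fn type_push(ty: Type, item_ty: &str) -> Type {
    let Type { base, flavour } = ty;
    let mut items = match flavour {
        Flavour::Unit => Vec::new(),
        Flavour::Tuple(items) => items,
        Flavour::Rec(_) => panic!("cannot push a positional onto a record type"),
    };
    items.push(item_ty.to_string());
    Type { base, flavour: Flavour::Tuple(items) }
}

/// Copy the base's flavour and members, and link the base into the chain.
fn type_inherit(seed: Type, base: &Rc<Type>) -> Type {
    assert!(seed.flavour == Flavour::Unit, "sketch only inherits into a fresh seed");
    Type { base: Some(Rc::clone(base)), flavour: base.flavour.clone() }
}
```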

Guarding a typed instance against a base type now projects it to the base's field set: `Foo foo = FooBar {bar: 1, spam: 2}` binds `foo` as `Foo {bar: 1}`, dropping fields absent from the base. The guard binding now sees the post-guard refined value. lower_pat_lhs threads guard_apply's success result (v1) to the ident bind and any sub-patterns: the bind is a MatchBind of v1 chained after the MatchGuard, so it nests into the success continuation and is collected for the arm body. Previously the bind captured the pre-guard subject, so projection was lost and the binding preceded the guard decision. Runtime: copy_by_keys (HAMT key-filter) in dict.wat, project_inst in types.wat, and the $Type guard arm in protocols.wat build the projected instance.
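
The projection step amounts to a key filter over the payload. A sketch, assuming a `BTreeMap` in place of the HAMT and type names in place of runtime type values; `Inst`, `RecType`, and `Dict` are hypothetical, while `copy_by_keys` and `project_inst` reuse the runtime function names.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins: BTreeMap for the HAMT dict, names for types.
type Dict = BTreeMap<String, i64>;

#[derive(Debug, Clone, PartialEq)]
struct Inst {
    ty: String,
    payload: Dict,
}

struct RecType {
    name: String,
    fields: Vec<String>,
}

/// Keep only the entries whose key is listed in `keys`.
fn copy_by_keys(payload: &Dict, keys: &[String]) -> Dict {
    payload
        .iter()
        .filter(|(k, _)| keys.contains(*k))
        .map(|(k, v)| (k.clone(), *v))
        .collect()
}

/// Downcast an instance to a base type, dropping fields the base lacks.
fn project_inst(inst: &Inst, base: &RecType) -> Inst {
    Inst {
        ty: base.name.clone(),
        payload: copy_by_keys(&inst.payload, &base.fields),
    }
}
```

With these definitions, projecting `FooBar {bar: 1, spam: 2}` to a base `Foo` that declares only `bar` yields `Foo {bar: 1}`, matching the example in the commit message.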

Cover `match val: Foo foo -> ...` for typed rec instances: bind on match, destructure (`Foo {bar}`), type-based arm selection, and downcast-to-base projection in match position. All pass against the existing guard_apply $Type arm -- match position reuses the same pattern-guard mechanism as bind position, no runtime change.

`SomeUnion x = val` / `match val: SomeUnion m -> ...` matches when val is an instance of any union member. is_union_member (types.wat) walks the instance's type + $base chain, probing each link against the member set via the set's synchronous op_in (HAMT has) -- no member iteration. A derived instance matches a union listing its base (IS-A walks the instance's ancestry); a base instance does not match a union listing a derived type (subtyping is one-directional). A union classifies -- it has no fields of its own -- so the guard binds `val` UNCHANGED (no projection), keeping its concrete member type so a further match can discriminate. The new $Union arm in guard_apply precedes the $Type arm since $Union <: $Type.
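
The membership test walks the instance's ancestry and probes the member set at each link, never the other way round. A sketch, assuming numeric type ids, a `HashMap` for base links, and an acyclic chain; `TypeId` and the `bases` table are hypothetical, and only the `is_union_member` name comes from the commit.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical: types are numeric ids, `bases` maps a type to its base.
type TypeId = u32;

/// True if the instance's type, or any type on its base chain, is a member.
/// Assumes the base chain is acyclic.
fn is_union_member(
    inst_ty: TypeId,
    bases: &HashMap<TypeId, TypeId>,
    members: &HashSet<TypeId>,
) -> bool {
    let mut link = Some(inst_ty);
    while let Some(ty) = link {
        // One set probe per link; the members themselves are never iterated.
        if members.contains(&ty) {
            return true;
        }
        link = bases.get(&ty).copied();
    }
    false
}
```

Because only the instance's own chain is walked, a derived instance matches a union listing its base, while a base instance does not match a union listing only the derived type.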

…guards

Enum cases are proper types stored in the enum's $cases dict, so they reuse all existing type machinery:

- `Opt.Some {val: 12}` constructs a case instance: a new $Enum arm in op_dot reads the case type from $cases (via enum_cases), then type_apply builds the instance. Equality and case guards (`Opt.Some s = ...`) work through the existing $Type machinery.
- `{Some, None} = Opt` destructures the case namespace: a new $Enum arm in is_rec_like unwraps to the $cases dict so rec_pop reads case types by name.
- `Opt o = Opt.Some {...}` / `match v: Opt _ -> ...` guard against the whole enum: enum_add links each case type's $base to the enum, so a case instance IS-A its enum and the existing $Type guard arm matches any case via the $base-walk (the enum analogue of union membership).

Enum-level match through a closure is blocked by a pre-existing subtype-base-walk-through-closure bug (is_instance ref.eq breaks on a closure-captured type; affects rec subtypes too); enum-level tests use the inline form that proves the feature.

…re bug

Add a passing test for a rec subtype (`..Base`) guard matched through a closure-captured type -- the base-walk works. Skip the enum-level-match-through-closure test: `is_opt = fn val: match val: Opt _` returns the wildcard arm for every case. Inline enum-level match and rec-subtype-through-closure both work, so this is enum-specific -- the closure-captured enum is not ref.eq to the value stored in each case's $base. Under investigation.

…nt position

`f g.h 42` in argument position parsed as two flat args of the outer call (`Apply(f, Member(g,h), 42)`) instead of one member application (`Apply(f, Apply(Member(g,h), 42))`). parse_apply_no_block only collected application args for an Ident head; a Member head fell through and returned bare, letting the outer call grab the following tokens as separate args. Add the Member-head case, mirroring the statement-level path in parse_apply.

This is the root cause behind `is_opt Opt.Some {val: 1}` failing: the enum case constructor and its rec payload were passed as two arguments, so the guard saw the bare case type, not a constructed instance. Fixing the parse also unblocks match-arm member-access heads (`match v: Opt.Some s -> ...`). Re-enables the enum-level-match and enum-case-match-with-destructure tests (previously skipped/removed for this gap).
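
The two parse shapes can be reproduced with a toy parser. This is not fink's parser: it splits on whitespace, supports only identifiers, one-level member access, and integers, and uses a hypothetical `member_heads_apply` flag to switch between the old and fixed behaviour.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Num(i64),
    Member(Box<Expr>, String),
    Apply(Box<Expr>, Vec<Expr>),
}

/// Parse one whitespace-delimited word: a number, `obj.field`, or an identifier.
fn parse_atom(word: &str) -> Expr {
    if let Ok(n) = word.parse::<i64>() {
        return Expr::Num(n);
    }
    match word.split_once('.') {
        Some((obj, field)) => Expr::Member(Box::new(Expr::Ident(obj.into())), field.into()),
        None => Expr::Ident(word.into()),
    }
}

/// Parse one argument and report how many words it consumed.
/// `member_heads_apply = false` models the old behaviour (only Ident heads apply).
fn parse_arg(words: &[&str], member_heads_apply: bool) -> (Expr, usize) {
    let head = parse_atom(words[0]);
    let applies = match &head {
        Expr::Ident(_) => true,
        Expr::Member(..) => member_heads_apply,
        _ => false,
    };
    if applies && words.len() > 1 {
        // The head takes the remaining words as its own arguments.
        let args = parse_args(&words[1..], member_heads_apply);
        (Expr::Apply(Box::new(head), args), words.len())
    } else {
        // The head is returned bare; the caller keeps consuming words.
        (head, 1)
    }
}

fn parse_args(mut words: &[&str], member_heads_apply: bool) -> Vec<Expr> {
    let mut args = Vec::new();
    while !words.is_empty() {
        let (arg, used) = parse_arg(words, member_heads_apply);
        args.push(arg);
        words = &words[used..];
    }
    args
}

/// Parse `head arg...` as an outer call; expects at least one word.
fn parse_call(src: &str, member_heads_apply: bool) -> Expr {
    let words: Vec<&str> = src.split_whitespace().collect();
    let head = parse_atom(words[0]);
    Expr::Apply(Box::new(head), parse_args(&words[1..], member_heads_apply))
}
```

Calling `parse_call("f g.h 42", false)` gives the flat two-argument shape, and passing `true` gives the nested member application.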

fmt_val and repr_val had no $Inst arm, so formatting a typed instance (`Foo {bar: 1}`) trapped on the unreachable tail -- which made `equals` trap when rendering a typed value in its failure message. Add an $Inst arm to both: unwrap via inst_payload and re-dispatch, so a typed instance renders as its bare payload (`{bar: 1}`). A TODO in each marks the nominal-name rendering (`Foo {bar: 1}`): the type's (mod_id, cps_id) resolves to the name host-side via the backtrace reflection channel -- deferred. With fmt working, the enum-instance tests drop the `a == b, true` workaround and compare directly with equals/not_equals.

Add rt/symbols.wat: a $Symbol is the runtime identity of a source name (a record field; later type/module/fn names), interned package-wide to one canonical instance per id. Identity is ref.eq; the i32 id doubles as the hash (dense, distinct -> distributes across hamt buckets, no string hashing). The source name is strippable host-side metadata. Wire the $Symbol arm into the hashing.wat dispatcher so a symbol is usable as a dict key (dict keys are already (ref eq) -- no structural change). Register the module in emit.rs. Runtime-only groundwork: nothing emits $Symbol yet, so behavior is unchanged. Next: lower.rs emits $Symbol for static record keys + a per-fragment symbol table, with linker dedup-by-name giving canonical ids. Representation is a struct for now; a tagged-i31 form is a later localized swap (see design notes).

Symbols are compile-time interned by id: equality is i32.eq on $id, not ref.eq. So the same id from any allocation is the same symbol, and the compiler can emit `struct.new $Symbol (i32.const id)` inline at each use site -- no global instance table, no init pass, no const-expr global. The interning is purely the compile-time name->id assignment.

Record field keys derived from identifiers (`{foo}`, `r.foo`, type fields, enum members) lower to a new `Lit::Symbol` IR variant, rendered `ƒ'name'`. Quoted keys (`{'foo bar': v}`) and computed keys (`{(x): v}`) stay value keys (dict semantics) -- the symbol-vs-value split is decided at lowering by key syntax, not by runtime coercion. Register test_patterns_dict.fnk for the dict-key contract.
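
The split is purely syntactic, which a small classifier can illustrate. The `Key` enum, `classify_key`, and `build_rec` below are hypothetical and operate on key source text; the real lowering works on the AST and emits `Lit::Symbol` for identifier-derived keys.

```rust
use std::collections::BTreeMap;

// Hypothetical classification of a key by its source syntax.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Key {
    Symbol(String),   // `foo`   -> symbol key
    Value(String),    // `'foo'` -> string value key
    Computed(String), // `(x)`   -> value key computed at runtime (expression text)
}

/// Decide the key kind from syntax alone; no runtime coercion is involved.
fn classify_key(src: &str) -> Key {
    if let Some(text) = src.strip_prefix('\'').and_then(|s| s.strip_suffix('\'')) {
        Key::Value(text.to_string())
    } else if let Some(expr) = src.strip_prefix('(').and_then(|s| s.strip_suffix(')')) {
        Key::Computed(expr.to_string())
    } else {
        Key::Symbol(src.to_string())
    }
}

/// Build a record from `(key source, value)` pairs.
fn build_rec(entries: &[(&str, i64)]) -> BTreeMap<Key, i64> {
    entries.iter().map(|(src, v)| (classify_key(src), *v)).collect()
}
```

Since the symbol `foo` and the string `'foo'` classify to different keys, a record built from both holds two entries.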

Wasm lowering boxes a Lit::Symbol field key to $Symbol (via box_symbol / Operand::SymbolId, linker-resolved to a canonical id); string and computed keys stay value keys. Routed at every key site: RecPut/RecPop, op_dot, TypeSetField, EnumAdd, the import-rec, and module Pub -- so exports are symbol-keyed and `{x} = import` destructures match.

The interop boundary keeps a name<->symbol table (no interface files yet): symbols.wat owns symbol_table/symbol_names + str_to_symbol/symbol_to_str, populated by register_symbol; interop's rec_get_by_bytes resolves a host byte-name to its symbol before lookup. fink dict ops never coerce -- key kind is fixed at compile time. The dict formatter reprs symbol and string keys via repr_val, computed keys in parens.

Known issue: dap::read_stdin_under_dap_traps fails with a wasmtime GC stack-roots panic during the stdin-trap unwind; under investigation.
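
The name<->symbol table is a pair of maps kept in sync. A sketch, assuming symbols are plain `u32` ids and using `HashMap` in place of the runtime's own structures; the `SymbolTable` struct and `resolve_bytes` are hypothetical, the other names mirror the symbols.wat functions.

```rust
use std::collections::HashMap;

// Hypothetical model of the interop-side table.
#[derive(Default)]
struct SymbolTable {
    symbol_table: HashMap<String, u32>, // name -> symbol id
    symbol_names: HashMap<u32, String>, // symbol id -> name
}

impl SymbolTable {
    /// Record one canonical id for a name, in both directions.
    fn register_symbol(&mut self, id: u32, name: &str) {
        self.symbol_table.insert(name.to_string(), id);
        self.symbol_names.insert(id, name.to_string());
    }

    fn str_to_symbol(&self, name: &str) -> Option<u32> {
        self.symbol_table.get(name).copied()
    }

    fn symbol_to_str(&self, id: u32) -> Option<&str> {
        self.symbol_names.get(&id).map(String::as_str)
    }

    /// Resolve a host byte-name to its symbol before a dict lookup.
    fn resolve_bytes(&self, bytes: &[u8]) -> Option<u32> {
        std::str::from_utf8(bytes).ok().and_then(|name| self.str_to_symbol(name))
    }
}
```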

JS proxy property access (`mod.foo`) and dict string-key access are indistinguishable -- both reach the same get trap with a string. rec_get now branches on key type: a $Str key tries the literal string first, then falls back to its interned $Symbol; non-$Str keys are used verbatim. This lets JS navigate symbol-keyed exports by name while still honouring genuine string dict keys.
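
The string-first / symbol-fallback lookup can be sketched as follows. The `Key` enum, the `HashMap`-based dict, and the `symbols` name table are hypothetical stand-ins; only the lookup order comes from the commit message.

```rust
use std::collections::HashMap;

// Hypothetical key model: strings, symbol ids, and one other key kind.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Key {
    Str(String),
    Symbol(u32),
    Int(i64),
}

/// Look up `key`; a string key tries the literal string first, then the
/// symbol interned under the same name. Other keys are used verbatim.
fn rec_get<'a>(
    dict: &'a HashMap<Key, i64>,
    symbols: &HashMap<String, u32>,
    key: &Key,
) -> Option<&'a i64> {
    match key {
        Key::Str(name) => dict.get(key).or_else(|| {
            let id = symbols.get(name)?;
            dict.get(&Key::Symbol(*id))
        }),
        other => dict.get(other),
    }
}
```

Trying the literal string first means a genuine string key always wins over a symbol of the same name, so JS access by name cannot shadow dict entries.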

`cargo update`: wasmtime 45.0.0 -> 45.0.2 (latest), cranelift 0.132.0 -> 0.132.2, and assorted transitive bumps.

The default deferred-reference-counting collector hits a debug-assert in its async stack scan (`on-stack gc ref ... not in over-approximated stack roots set`) when a collection fires mid-run under the async/fiber execution that guest_debug requires -- e.g. allocating while entering a host call. Symbol-key lowering raised allocation enough to trip it (read-stdin-under-dap path). A debug session runs short programs in a throwaway process, so the null collector (never reclaims, never scans) is a sound choice; the non-debug runner keeps the real collector.

A static field-name key is a compile-time constant: the linker assigns one package-wide id per distinct name and folds it into a tagged i31 word `(id << 3) | 0b010`, sitting past the two bool words (false = 0, true = 1) in the shared i31 space. box_symbol emits `ref.i31 <word>` and the table population passes the encoded word to register_symbol, so a field key is a non-allocating immediate with whole-word ref.eq identity -- no per-key `struct.new $Symbol` and nothing for the GC to trace. Identity ops treat the word opaquely: deep_eq routes symbols through its ref.eq fallback and hash_i31 through its i31 arm (the word is its own hash), so both shed their symbol-specific arms. Only renderers discriminate, via a new is_symbol predicate -- dict key formatting and repr (which peels a symbol off before the bool i31 arm). The $Symbol struct type and new_symbol constructor are removed.
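
Link-time interning followed by word encoding can be sketched like this. The `Interner` type and `link_fragments` are hypothetical, and ids are assumed to be handed out densely in first-seen order, which the commit does not state; the `(id << 3) | 0b010` fold is taken from the text.

```rust
use std::collections::HashMap;

// Hypothetical linker-side interner: one id per distinct name, package-wide.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
}

impl Interner {
    /// Return the existing id for `name`, or assign the next dense id.
    fn intern(&mut self, name: &str) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(name.to_string()).or_insert(next)
    }
}

/// Fold an id into the tagged i31 word described above.
fn encode(id: u32) -> u32 {
    (id << 3) | 0b010
}

/// Resolve every fragment's field names to encoded words, deduplicating by name.
fn link_fragments(fragments: &[Vec<&str>]) -> Vec<Vec<u32>> {
    let mut interner = Interner::default();
    fragments
        .iter()
        .map(|names| names.iter().map(|n| encode(interner.intern(n))).collect())
        .collect()
}
```

Two fragments that both mention `bar` end up with the same word for it, so comparing whole words across modules is enough to establish key identity.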

github-actions · 2026-06-20T15:39:39Z

📦 This PR will release v0.87.0 (minor) when merged.

kollhof added 22 commits June 15, 2026 13:15

chore(deps): update to latest compatible versions

b4da8a5


kollhof merged commit deacd82 into main Jun 20, 2026
14 checks passed

kollhof deleted the i31-symbols branch June 20, 2026 20:15
