feat(types,symbols): runtime type system + interned i31 field keys #179
Merged
The HAMT-backed type was named $Rec, but it is the dynamic dict that backs the record protocol, not the (static, typed) record itself. Rename the runtime type to $Dict/$DictImpl and the cross-module export to Dict; the rec_* protocol functions and @impl std/rec.fnk bindings keep the rec_ name (they implement record operations). Surface vocabulary (record) is unchanged.
Replace the panic! arm for type/enum/union with a uniform seed+accretion lowering family, mirroring the rec/seq-literal pattern:

  type record   new_type + type_set_field
  type tuple    new_type + type_push
  type ..Base   new_type + type_inherit
  union         new_union + union_add
  enum          new_enum + (new_type per member) + enum_add

An enum member is a `type:` with a tag, so member payloads reuse lower_type_body. Generic declarations (type/enum/union T:) wrap the body accretion in a fn over the type-params via the shared wrap_decl_in_params_fn helper, so Spam u8 / Option u8 is application.

Covered by 15 cps_module snapshot tests in test_types.fnk. No WASM lowering yet (the new builtins have no codegen arm).
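The seed+accretion shape can be sketched in Rust as below. This is an illustration only, not the compiler's lowering code: `Decl`, `Op`, and `lower_decl` are hypothetical names, and only the builtin names (new_type, type_set_field, type_push, new_union, union_add) come from the commit message.

```rust
/// Hypothetical surface forms of a declaration body (illustration only).
pub enum Decl {
    /// A record type with named fields.
    Record(Vec<&'static str>),
    /// A tuple type with N positional members.
    Tuple(usize),
    /// A union over named member types.
    Union(Vec<&'static str>),
}

/// One emitted builtin call; variant names mirror the builtins listed above.
#[derive(Debug, PartialEq)]
pub enum Op {
    NewType,
    TypeSetField(&'static str),
    TypePush(usize),
    NewUnion,
    UnionAdd(&'static str),
}

/// Lower a declaration to one seed call followed by one accretion call per member.
pub fn lower_decl(decl: &Decl) -> Vec<Op> {
    match decl {
        Decl::Record(fields) => {
            let mut ops = vec![Op::NewType];
            ops.extend(fields.iter().map(|f| Op::TypeSetField(*f)));
            ops
        }
        Decl::Tuple(n) => {
            let mut ops = vec![Op::NewType];
            ops.extend((0..*n).map(Op::TypePush));
            ops
        }
        Decl::Union(members) => {
            let mut ops = vec![Op::NewUnion];
            ops.extend(members.iter().map(|m| Op::UnionAdd(*m)));
            ops
        }
    }
}
```

Every flavour follows the same seed-then-accrete shape, which is what lets one lowering family replace the per-kind panic! arm.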
WASM lowering for the type system: declarations now mint runtime $Type values, instances construct, and both compare correctly.

Runtime (rt/types.wat): $Type + $Inst hierarchy ($Rec/$Tuple instances wrapping $Dict/$List payloads). All 9 construction builtins (new_type/type_set_field/type_push/type_inherit, new_union/union_add, new_enum/enum_add) emit + run. Applying a $Type builds an instance -- apply.wat stays dumb (br_on_cast $Type, delegates wholesale to type_apply); type-construction is invisible to the apply machinery.

Equality (protocols.wat eq/neq arms): type/enum = identity; union = structural set-eq (members are a $Set, delegates to set eq -- order independent); instances = nominal ($type ref-eq) + structural (payload deep-eq). A typed instance is NOT equal to a bare collection with the same contents (nominal typing). $Type gets a trivial hash (0, ref.eq disambiguates -- cf. closure hash) so it can be a set member.

Intrinsics: std/int.fnk migrated to compiled-source stdlib (MIGRATED_STDLIB_FNK); `u8 = type _` is a contentless placeholder so record/tuple fields resolve. Real $RuntimeType ctors + per-field literal coercion are deferred.

Tests: rt/types.test.fnk (8 behaviour tests documenting the identity model -- incl. the enum-nominal vs union-structural distinction), aggregated by rt/all.test.fnk, gated by runner::runtime_native_test_suite_runs. WAT codegen snapshots in passes/wasm/test_types.fnk.

Deferred (documented): instance reads (field access / destructure -- delegate to payload), per-field literal coercion via field-type ctors, real intrinsic $RuntimeType, tuple-spread splice order verification, function types (FFI).
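The equality rules above can be modelled in a few lines of Rust. This is a sketch under assumptions, not the protocols.wat arms: `Type`, `Inst`, and the three functions are hypothetical, `Rc::ptr_eq` stands in for ref.eq, and a slice scan stands in for the $Set comparison.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

/// A declared type; its identity is the allocation, not its contents.
pub struct Type {
    pub name: &'static str,
}

/// A typed instance: a nominal type reference plus a structural payload.
pub struct Inst {
    pub ty: Rc<Type>,
    pub payload: BTreeMap<&'static str, i64>,
}

/// Types compare by identity (pointer equality stands in for ref.eq).
pub fn type_eq(a: &Rc<Type>, b: &Rc<Type>) -> bool {
    Rc::ptr_eq(a, b)
}

/// Instances are equal when the type is the same object AND the payloads are deep-equal.
pub fn inst_eq(a: &Inst, b: &Inst) -> bool {
    type_eq(&a.ty, &b.ty) && a.payload == b.payload
}

/// Unions compare as sets of member types: same members, in any order.
pub fn union_eq(a: &[Rc<Type>], b: &[Rc<Type>]) -> bool {
    let subset = |x: &[Rc<Type>], y: &[Rc<Type>]| {
        x.iter().all(|m| y.iter().any(|n| Rc::ptr_eq(m, n)))
    };
    subset(a, b) && subset(b, a)
}
```

Two separately declared types with the same name are still distinct here, so instances of them never compare equal even when their payloads match.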
A typed instance responds to all structural read syntax by unwrapping
its payload and delegating -- field access (foo.bar / foo.(0)),
destructure ({bar} = / [bar] =), and spread ({..foo} / [..foo]).
Mechanism: inst_payload unwraps $Rec->$Dict / $Tuple->$List; op_dot's
$Inst arm unwraps and re-dispatches; is_rec_like/is_seq_like accept
instances and succeed with the bare payload.
Reads STRIP the nominal type (a read yields a bare collection); identity
is conferred ONLY by a constructor. So {..foo} is bare -- write
Foo {..foo} to re-type. (Refines the syntax-model: spread always strips,
no single-spread-preserves special case; multi-spread is just a bare
merge.) Guarded/refutable destructure is a later phase.
6 new behaviour tests (rec + tuple: destructure, dot, spread).
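A rough Rust model of "reads strip, constructors confer" follows. It is a sketch, not the runtime: `Value`, `dot`, `spread`, and `construct` are hypothetical, the nominal type is reduced to a name string, and only inst_payload borrows its name from the commit message.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    /// A bare record.
    Dict(BTreeMap<String, Value>),
    /// A typed record instance: nominal type (a name here, for simplicity) + payload.
    Rec { ty: String, payload: BTreeMap<String, Value> },
}

/// Unwrap an instance to its bare payload; bare dicts pass through unchanged.
pub fn inst_payload(v: &Value) -> Option<&BTreeMap<String, Value>> {
    match v {
        Value::Dict(d) => Some(d),
        Value::Rec { payload, .. } => Some(payload),
        Value::Int(_) => None,
    }
}

/// Field access (`foo.bar`): unwrap, then read from the payload.
pub fn dot(v: &Value, key: &str) -> Option<Value> {
    inst_payload(v)?.get(key).cloned()
}

/// Spread (`{..foo}`): always yields a bare dict, never a typed instance.
pub fn spread(v: &Value) -> Option<Value> {
    inst_payload(v).map(|d| Value::Dict(d.clone()))
}

/// Constructor (`Foo {..}`): the only operation that confers a nominal type.
pub fn construct(ty: &str, payload: BTreeMap<String, Value>) -> Value {
    Value::Rec { ty: ty.to_string(), payload }
}
```

In this model `spread(&foo)` is never equal to `foo` itself; re-typing takes an explicit `construct("Foo", ...)`, matching the `Foo {..foo}` idiom.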
Collapse the three pattern-guard paths (IsSeqLike/IsRecLike structural
wrapping, predicate MatchGuard, type/constructor heads) into one CPS
node: guard_apply(head, val, succ, fail). The head is a runtime value:
a structural protocol (rec_protocol/tuple_protocol), a predicate fn,
or a type. Sub-pattern args (is_foo {bar}) now destructure in the
guard's success path instead of being built as a value.
Runtime guard_apply (rt/protocols.wat) br-casts the head: $Type ->
is_instance (rt/types.wat, with $base-chain walk); $Closure ->
predicate call + bool-branch via make_guard_branch (rt/apply.wat).
WASM lowering routes rec_protocol/tuple_protocol heads back to the
existing is_rec_like/is_seq_like runtime guards (shim, flagged for
follow-up to make the protocols real type values).
…ate guards
guard_apply(ctx, guard, val, succ, fail) br-casts the guard and branches
to succ(val)/fail():
- i31 sentinel (0=rec, 1=tuple) -> is_rec_like/is_seq_like, for bare
{...}/[...] patterns. Interim magic-i31 singletons emitted inline by
lowering; to be unified with real protocol type values later.
- $Type -> is_instance with $base-chain walk (subtype satisfies a base
guard).
- $Closure -> fink predicate fn; bool verdict via the branch cont.
Lowering emits the i31 sentinel for RecProtocol/TupleProtocol heads;
is_rec_like/is_seq_like are retained as the structural handlers.
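The three-way dispatch can be sketched in Rust as below. This is a simplified model under stated assumptions: `Guard`, `Value`, and `is_positive` are hypothetical, an enum match stands in for the br-cast chain, and unwrapping of typed instances for the structural heads is omitted.

```rust
use std::rc::Rc;

pub struct Type {
    pub base: Option<Rc<Type>>,
}

pub enum Value {
    Int(i64),
    /// A bare record.
    Rec,
    /// A bare sequence.
    Seq,
    /// A typed instance (payload omitted in this sketch).
    Inst(Rc<Type>),
}

pub enum Guard {
    RecProtocol,
    TupleProtocol,
    Type(Rc<Type>),
    Pred(fn(&Value) -> bool),
}

/// Walk the instance type's base chain looking for `target` (subtype satisfies a base guard).
pub fn is_instance(ty: &Rc<Type>, target: &Rc<Type>) -> bool {
    let mut cur = Some(ty);
    while let Some(t) = cur {
        if Rc::ptr_eq(t, target) {
            return true;
        }
        cur = t.base.as_ref();
    }
    false
}

/// Example predicate head (hypothetical).
pub fn is_positive(v: &Value) -> bool {
    matches!(v, Value::Int(n) if *n > 0)
}

/// Dispatch on the guard kind, then branch to succ(val) or fail().
pub fn guard_apply<R>(
    guard: &Guard,
    val: &Value,
    succ: impl FnOnce(&Value) -> R,
    fail: impl FnOnce() -> R,
) -> R {
    let ok = match guard {
        Guard::RecProtocol => matches!(val, Value::Rec),
        Guard::TupleProtocol => matches!(val, Value::Seq),
        Guard::Type(target) => match val {
            Value::Inst(ty) => is_instance(ty, target),
            _ => false,
        },
        Guard::Pred(p) => p(val),
    };
    if ok { succ(val) } else { fail() }
}
```

Passing the two continuations explicitly mirrors the CPS node's succ/fail pair; the caller never sees a boolean.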
Replace the flat $Type (carrying both $fields and $positionals, flavour inferred from dict_size) with a $base-chain of flavour subtypes:
- $Type: shared core (mod_id, cps_id, base).
- $RecType: + $fields (full name->field-type Dict; HAMT structural sharing makes carrying base's fields + own cheap).
- $TupleType: + $positionals.
- $Union/$Enum: drop the unused $fields/$positionals they carried.

Construction is chain-build: new_type seeds a unit $Type; type_set_field/type_push construct-or-accrete (wrap the current node into a Rec/TupleType, or accrete if already that flavour); type_inherit runtime-br_casts the base to copy its flavour + members (handles inherit-only, where flavour is only known from the base value). type_apply discriminates by br_cast on $RecType/$TupleType instead of dict_size.

Contained to rt/types.wat (the accretion ops are already cont-threaded, so the construct-or-accrete reshape needs no CPS/lowering change). All type tests pass; full suite green; no snapshot drift.
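Construct-or-accrete can be sketched in Rust as follows. This is a simplified model: `TypeNode` is hypothetical, the $base chain and type_inherit are left out, and returning `None` for a flavour mismatch is an assumption of the sketch (the commit message does not say what the runtime does in that case).

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum TypeNode {
    /// Seed produced by new_type: no flavour yet.
    Unit,
    /// Record flavour: (field name, field type name) pairs.
    Rec(Vec<(String, String)>),
    /// Tuple flavour: positional member type names.
    Tuple(Vec<String>),
}

/// Seed a unit type with no flavour.
pub fn new_type() -> TypeNode {
    TypeNode::Unit
}

/// Construct a record flavour from a unit seed, or accrete onto an existing one.
pub fn type_set_field(node: TypeNode, name: &str, ty: &str) -> Option<TypeNode> {
    let field = (name.to_string(), ty.to_string());
    match node {
        TypeNode::Unit => Some(TypeNode::Rec(vec![field])),
        TypeNode::Rec(mut fields) => {
            fields.push(field);
            Some(TypeNode::Rec(fields))
        }
        // Assumption: mixing flavours is rejected in this sketch.
        TypeNode::Tuple(_) => None,
    }
}

/// Construct a tuple flavour from a unit seed, or accrete onto an existing one.
pub fn type_push(node: TypeNode, ty: &str) -> Option<TypeNode> {
    match node {
        TypeNode::Unit => Some(TypeNode::Tuple(vec![ty.to_string()])),
        TypeNode::Tuple(mut items) => {
            items.push(ty.to_string());
            Some(TypeNode::Tuple(items))
        }
        // Assumption: mixing flavours is rejected in this sketch.
        TypeNode::Rec(_) => None,
    }
}
```

Because the flavour is carried by the variant, applying a type can match on the variant directly rather than inspecting a field count.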
Guarding a typed instance against a base type now projects it to the
base's field set: `Foo foo = FooBar {bar: 1, spam: 2}` binds `foo` as
`Foo {bar: 1}`, dropping fields absent from the base.
The guard binding now sees the post-guard refined value. lower_pat_lhs
threads guard_apply's success result (v1) to the ident bind and any
sub-patterns: the bind is a MatchBind of v1 chained after the MatchGuard,
so it nests into the success continuation and is collected for the arm
body. Previously the bind captured the pre-guard subject, so projection
was lost and the binding preceded the guard decision.
Runtime: copy_by_keys (HAMT key-filter) in dict.wat, project_inst in
types.wat, and the $Type guard arm in protocols.wat build the projected
instance.
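The key-filter step behind projection can be sketched in Rust. This is an illustration using a BTreeMap rather than the HAMT in dict.wat; the function borrows the copy_by_keys name, but its signature here is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Keep only the entries of `src` whose key appears in `keys`.
pub fn copy_by_keys(
    src: &BTreeMap<String, i64>,
    keys: &BTreeSet<String>,
) -> BTreeMap<String, i64> {
    src.iter()
        .filter(|(k, _)| keys.contains(*k))
        .map(|(k, v)| (k.clone(), *v))
        .collect()
}
```

Projecting the `FooBar {bar: 1, spam: 2}` payload through the base's field set `{bar}` leaves `{bar: 1}`, which the guard then wraps as a `Foo` instance.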
Cover `match val: Foo foo -> ...` for typed rec instances: bind on
match, destructure (`Foo {bar}`), type-based arm selection, and
downcast-to-base projection in match position. All pass against the
existing guard_apply $Type arm -- match position reuses the same
pattern-guard mechanism as bind position, no runtime change.
`SomeUnion x = val` / `match val: SomeUnion m -> ...` matches when val is an instance of any union member.

is_union_member (types.wat) walks the instance's type + $base chain, probing each link against the member set via the set's synchronous op_in (HAMT has) -- no member iteration. A derived instance matches a union listing its base (IS-A walks the instance's ancestry); a base instance does not match a union listing a derived type (subtyping is one-directional).

A union classifies -- it has no fields of its own -- so the guard binds `val` UNCHANGED (no projection), keeping its concrete member type so a further match can discriminate. The new $Union arm in guard_apply precedes the $Type arm since $Union <: $Type.
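The ancestry walk can be sketched in Rust as below. The `Type` struct is hypothetical, and the linear scan over `members` is a simplification: the runtime probes a HAMT-backed set instead of iterating members.

```rust
use std::rc::Rc;

pub struct Type {
    pub name: &'static str,
    pub base: Option<Rc<Type>>,
}

/// Walk the instance's type and its base chain; succeed if any link is a union member.
pub fn is_union_member(inst_ty: &Rc<Type>, members: &[Rc<Type>]) -> bool {
    let mut cur = Some(inst_ty);
    while let Some(t) = cur {
        // Simplification: a slice scan stands in for the set's `has` probe.
        if members.iter().any(|m| Rc::ptr_eq(m, t)) {
            return true;
        }
        cur = t.base.as_ref();
    }
    false
}
```

The walk only ever moves from a type to its base, which is why a derived instance matches a union listing its base but not the other way round.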
…guards
Enum cases are proper types stored in the enum's $cases dict, so they
reuse all existing type machinery:
- `Opt.Some {val: 12}` constructs a case instance: a new $Enum arm in
op_dot reads the case type from $cases (via enum_cases), then type_apply
builds the instance. Equality and case guards (`Opt.Some s = ...`) work
through the existing $Type machinery.
- `{Some, None} = Opt` destructures the case namespace: a new $Enum arm in
is_rec_like unwraps to the $cases dict so rec_pop reads case types by name.
- `Opt o = Opt.Some {...}` / `match v: Opt _ -> ...` guard against the whole
enum: enum_add links each case type's $base to the enum, so a case
instance IS-A its enum and the existing $Type guard arm matches any case
via the $base-walk (the enum analogue of union membership).
Enum-level match through a closure is blocked by a pre-existing
subtype-base-walk-through-closure bug (is_instance ref.eq breaks on a
closure-captured type; affects rec subtypes too); enum-level tests use the
inline form that proves the feature.
…re bug
Add a passing test for a rec subtype (`..Base`) guard matched through a closure-captured type -- the base-walk works. Skip the enum-level-match-through-closure test: `is_opt = fn val: match val: Opt _` returns the wildcard arm for every case. Inline enum-level match and rec-subtype-through-closure both work, so this is enum-specific -- the closure-captured enum is not ref.eq to the value stored in each case's $base. Under investigation.
…nt position
`f g.h 42` in argument position parsed as two flat args of the outer call
(`Apply(f, Member(g,h), 42)`) instead of one member application
(`Apply(f, Apply(Member(g,h), 42))`). parse_apply_no_block only collected
application args for an Ident head; a Member head fell through and returned
bare, letting the outer call grab the following tokens as separate args.
Add the Member-head case, mirroring the statement-level path in parse_apply.
This is the root cause behind `is_opt Opt.Some {val: 1}` failing: the enum
case constructor and its rec payload were passed as two arguments, so the
guard saw the bare case type, not a constructed instance. Fixing the parse
also unblocks match-arm member-access heads (`match v: Opt.Some s -> ...`).
Re-enables the enum-level-match and enum-case-match-with-destructure tests
(previously skipped/removed for this gap).
fmt_val and repr_val had no $Inst arm, so formatting a typed instance
(`Foo {bar: 1}`) trapped on the unreachable tail -- which made `equals`
trap when rendering a typed value in its failure message. Add an $Inst arm
to both: unwrap via inst_payload and re-dispatch, so a typed instance
renders as its bare payload (`{bar: 1}`).
A TODO in each marks the nominal-name rendering (`Foo {bar: 1}`): the
type's (mod_id, cps_id) resolves to the name host-side via the backtrace
reflection channel -- deferred.
With fmt working, the enum-instance tests drop the `a == b, true`
workaround and compare directly with equals/not_equals.
Add rt/symbols.wat: a $Symbol is the runtime identity of a source name (a record field; later type/module/fn names), interned package-wide to one canonical instance per id. Identity is ref.eq; the i32 id doubles as the hash (dense, distinct -> distributes across hamt buckets, no string hashing). The source name is strippable host-side metadata.

Wire the $Symbol arm into the hashing.wat dispatcher so a symbol is usable as a dict key (dict keys are already (ref eq) -- no structural change). Register the module in emit.rs.

Runtime-only groundwork: nothing emits $Symbol yet, so behavior is unchanged. Next: lower.rs emits $Symbol for static record keys + a per-fragment symbol table, with linker dedup-by-name giving canonical ids. Representation is a struct for now; a tagged-i31 form is a later localized swap (see design notes).
Symbols are compile-time interned by id: equality is i32.eq on $id, not ref.eq. So the same id from any allocation is the same symbol, and the compiler can emit `struct.new $Symbol (i32.const id)` inline at each use site -- no global instance table, no init pass, no const-expr global. The interning is purely the compile-time name->id assignment.
Record field keys derived from identifiers (`{foo}`, `r.foo`, type fields,
enum members) lower to a new `Lit::Symbol` IR variant, rendered `ƒ'name'`.
Quoted keys (`{'foo bar': v}`) and computed keys (`{(x): v}`) stay value keys
(dict semantics) -- the symbol-vs-value split is decided at lowering by key
syntax, not by runtime coercion.
Register test_patterns_dict.fnk for the dict-key contract.
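The syntax-driven split can be sketched in Rust. This is a toy model, not lower.rs: `KeySyntax`, `Key`, and `lower_key` are hypothetical, and the computed key is shown as an already-evaluated string for simplicity.

```rust
use std::collections::BTreeMap;

/// How the key was written in source (hypothetical model).
pub enum KeySyntax<'a> {
    /// `{foo}`, `r.foo`: derived from an identifier.
    Ident(&'a str),
    /// `{'foo bar': v}`: a quoted string.
    Quoted(&'a str),
    /// `{(x): v}`: a computed value.
    Computed(&'a str),
}

/// The lowered key kind: symbol keys and value keys never coerce into each other.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Key {
    Symbol(String),
    Str(String),
}

/// Decide the key kind from syntax alone, at lowering time.
pub fn lower_key(k: KeySyntax) -> Key {
    match k {
        KeySyntax::Ident(name) => Key::Symbol(name.to_string()),
        KeySyntax::Quoted(s) | KeySyntax::Computed(s) => Key::Str(s.to_string()),
    }
}

/// Build the dict for `{foo: 1, 'foo': 2}`.
pub fn example_dict() -> BTreeMap<Key, i64> {
    let mut d = BTreeMap::new();
    d.insert(lower_key(KeySyntax::Ident("foo")), 1);
    d.insert(lower_key(KeySyntax::Quoted("foo")), 2);
    d
}
```

Since the kinds are distinct, `example_dict()` holds two entries: the identifier key and the quoted key coexist rather than overwriting each other.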
Wasm lowering boxes a Lit::Symbol field key to $Symbol (via box_symbol /
Operand::SymbolId, linker-resolved to a canonical id); string and computed
keys stay value keys. Routed at every key site: RecPut/RecPop, op_dot,
TypeSetField, EnumAdd, the import-rec, and module Pub -- so exports are
symbol-keyed and `{x} = import` destructures match.
The interop boundary keeps a name<->symbol table (no interface files yet):
symbols.wat owns symbol_table/symbol_names + str_to_symbol/symbol_to_str,
populated by register_symbol; interop's rec_get_by_bytes resolves a host
byte-name to its symbol before lookup. fink dict ops never coerce -- key kind
is fixed at compile time. The dict formatter reprs symbol and string keys via
repr_val, computed keys in parens.
Known issue: dap::read_stdin_under_dap_traps fails with a wasmtime GC stack-
roots panic during the stdin-trap unwind; under investigation.
JS proxy property access (`mod.foo`) and dict string-key access are indistinguishable -- both reach the same get trap with a string. rec_get now branches on key type: a $Str key tries the literal string first, then falls back to its interned $Symbol; non-$Str keys are used verbatim. This lets JS navigate symbol-keyed exports by name while still honouring genuine string dict keys.
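The string-first / symbol-fallback lookup can be sketched in Rust. This is a model under assumptions: `Key`, the `symbols` name->id map, and this `rec_get` signature are hypothetical stand-ins for the runtime's dict and its name<->symbol table.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Key {
    Symbol(u32),
    Str(String),
    Int(i64),
}

/// Look up `key`: a string key tries the literal string first, then falls back
/// to the symbol interned under that name; any other key is used verbatim.
pub fn rec_get<'a>(
    dict: &'a HashMap<Key, i64>,
    symbols: &HashMap<String, u32>,
    key: &Key,
) -> Option<&'a i64> {
    match key {
        Key::Str(s) => dict.get(key).or_else(|| {
            symbols
                .get(s)
                .and_then(|id| dict.get(&Key::Symbol(*id)))
        }),
        _ => dict.get(key),
    }
}
```

The literal string wins when both exist, so a genuine string key is never shadowed by a symbol of the same name.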
`cargo update`: wasmtime 45.0.0 -> 45.0.2 (latest), cranelift 0.132.0 -> 0.132.2, and assorted transitive bumps.
The default deferred-reference-counting collector hits a debug-assert in its async stack scan (`on-stack gc ref ... not in over-approximated stack roots set`) when a collection fires mid-run under the async/fiber execution that guest_debug requires -- e.g. allocating while entering a host call. Symbol-key lowering raised allocation enough to trip it (read-stdin-under-dap path). A debug session runs short programs in a throwaway process, so the null collector (never reclaims, never scans) is a sound choice; the non-debug runner keeps the real collector.
A static field-name key is a compile-time constant: the linker assigns one package-wide id per distinct name and folds it into a tagged i31 word `(id << 3) | 0b010`, sitting past the two bool words (false = 0, true = 1) in the shared i31 space. box_symbol emits `ref.i31 <word>` and the table population passes the encoded word to register_symbol, so a field key is a non-allocating immediate with whole-word ref.eq identity -- no per-key `struct.new $Symbol` and nothing for the GC to trace.

Identity ops treat the word opaquely: deep_eq routes symbols through its ref.eq fallback and hash_i31 through its i31 arm (the word is its own hash), so both shed their symbol-specific arms. Only renderers discriminate, via a new is_symbol predicate -- dict key formatting and repr (which peels a symbol off before the bool i31 arm). The $Symbol struct type and new_symbol constructor are removed.
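The encoding arithmetic can be checked with a small Rust sketch. The formula `(id << 3) | 0b010` and the bool words come from the commit message; the constant names, the 28-bit id bound, and testing the low three bits in `is_symbol` are assumptions of this sketch, since the runtime's actual predicate is not shown.

```rust
/// Low-bit tag marking a symbol word (from the commit message).
pub const SYMBOL_TAG: u32 = 0b010;
/// Number of tag bits below the id.
pub const TAG_BITS: u32 = 3;
/// Mask selecting the tag bits.
pub const TAG_MASK: u32 = 0b111;

/// Fold a symbol id into a tagged word: (id << 3) | 0b010.
pub fn encode_symbol(id: u32) -> u32 {
    // Assumption: the word must fit in 31 bits, leaving 28 bits for the id.
    assert!(id < (1 << (31 - TAG_BITS)), "id does not fit in an i31 word");
    (id << TAG_BITS) | SYMBOL_TAG
}

/// Assumed predicate: a word is a symbol when its low three bits equal the tag.
pub fn is_symbol(word: u32) -> bool {
    (word & TAG_MASK) == SYMBOL_TAG
}

/// Recover the id from a symbol word.
pub fn symbol_id(word: u32) -> u32 {
    word >> TAG_BITS
}
```

For example, id 0 encodes to word 2 and id 5 to word 42, while the bool words 0 and 1 carry tags 0b000 and 0b001 and so are never mistaken for symbols under this check.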
📦 This PR will release v0.87.0 (minor) when merged.
Summary
Lands the runtime type system and the symbol/field-key architecture as one line of work, plus the supporting fixes they exposed.
Types: runtime type/enum/union values
- `type:`/`enum:`/`union:` declarations lower to first-class runtime `$Type` values (`$RecType`/`$TupleType` flavour subtypes), with construction, equality, and structural reads on typed instances.
- The `guard_apply` mechanism dispatches typed / structural / predicate guards via `br_cast`, covering: typed rec and tuple guards, rec projection (downcast to a base type, dropping extra fields), union membership, enum case values + enum-level guards, and match-position guards.
- `fmt`/`repr` render a typed instance as its bare structural payload.
- `$Rec`/`$RecImpl` becomes `$Dict`/`$DictImpl` (a record and a dict share one collection type).

Symbols: interned field keys as tagged i31
- A static field-name key is the tagged i31 word `(id << 3) | 0b010` (past the bool words false = 0 / true = 1 in the shared i31 space): a non-allocating immediate, not a heap struct. Identity is whole-word `ref.eq`; the word is its own hash. No per-key allocation and nothing for the GC to trace.
- Only renderers discriminate symbols, via the `is_symbol` predicate (dict key formatting, repr). The `$Symbol` struct type is removed.
- Symbol keys and string keys are distinct: `{foo: 1, 'foo': 2}` holds two distinct coexisting keys.

Supporting
- `rec_get` resolves a string key to its symbol with a string-first / symbol-fallback navigation.

Testing
- The `rt/symbols.test.fnk` characterization fixture (field access, repr, distinct symbol/string keys, identity, bool branch/repr) pins symbol behaviour across the representation.