Initial enum commit #261

Open
mitchpaulus wants to merge 33 commits into main from enum-types

Conversation

@mitchpaulus

Owner

No description provided.

mitchpaulus and others added 30 commits June 28, 2026 21:30
- str/toJson now render enum payloads. str: `member(p ...)`, built with an
  explicit work stack (no recursion) so deeply nested values can't overflow.
  toJson: serde externally-tagged (`"m"`, `{"m": v}`, `{"m": [..]}`).
- Fix type-checker stack overflow on recursive enums passed through generics:
  walkTypeVars treats an enum as a ground leaf (payloads carry no type vars).
- Reject an empty `match` as non-exhaustive instead of letting it crash at
  runtime.
- Language-agnostic regression fixtures: recursive-generic, empty-match,
  str/json rendering, and a 50k-deep render overflow guard.
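
A minimal Go sketch of the explicit-work-stack idea behind the `member(p ...)` rendering, assuming a simplified value model. `EnumValue`, `lit`, and `renderStr` are hypothetical names for illustration, not the identifiers used in the branch, and payloads are limited to ints and nested enums:

```go
package main

import (
	"fmt"
	"strings"
)

// EnumValue is a hypothetical stand-in for a runtime enum value:
// a member name plus zero or more payloads (ints or nested enums here).
type EnumValue struct {
	Member   string
	Payloads []any
}

// lit marks literal text (separators, closing parens) queued on the stack.
type lit string

// renderStr renders `member(p ...)` with an explicit work stack instead of
// recursion, so nesting depth is bounded by heap memory, not the Go stack.
func renderStr(root *EnumValue) string {
	var sb strings.Builder
	stack := []any{root}
	for len(stack) > 0 {
		top := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		switch v := top.(type) {
		case lit:
			sb.WriteString(string(v))
		case *EnumValue:
			sb.WriteString(v.Member)
			if len(v.Payloads) == 0 {
				continue // nullary member: just the name
			}
			sb.WriteString("(")
			// Push in reverse so the first payload is popped first.
			stack = append(stack, lit(")"))
			for i := len(v.Payloads) - 1; i >= 0; i-- {
				stack = append(stack, v.Payloads[i])
				if i > 0 {
					stack = append(stack, lit(" "))
				}
			}
		default:
			fmt.Fprint(&sb, v) // scalar payload (assumed to be an int here)
		}
	}
	return sb.String()
}
```

Under these assumptions, a 50k-deep chain of single-payload members renders without growing the Go call stack, which is the property the overflow-guard fixture checks.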

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
mshell has no statement terminator, so after a nullary member a `(...)` on the
next line (a quotation, a filter predicate, etc.) was greedily parsed as that
member's payload list — making the parenthesized expression vanish. Require a
payload `(` to be attached to the member name (`failed(int str)`); a detached
paren belongs to the following code and the member is nullary.

Regression fixture: tests/success/enum_then_quote.msh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
Replace the parenthesized payload syntax (which required whitespace-significant
adjacency to avoid swallowing the next statement) with a block terminated by
`end`, like def/if/match/loop:

    enum CmdResult = ok str | failed int str | timeout end

Members are `|`-separated; each carries zero or more space-separated payload
types; `end` bounds the member list so it is unambiguous against following code
without relying on whitespace. Parser-only change plus docs, fixtures, and Go
tests updated to the new surface.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
The design doc advertised ML-style member lists (one per line, each prefixed
with `|`), but the parser rejected a leading `|` after `=`. Accept an optional
leading `|` so `enum E =\n | a\n | b\nend` parses; the regular
`a | b` form is unchanged. Regression fixture: enum_leading_pipe.msh.
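
To make the `end`-terminated surface concrete, here is a rough Go sketch of parsing a declaration with an optional leading `|`. It is an illustration only: `Member` and `parseEnum` are hypothetical names, tokens are split on whitespace (so payload types containing spaces are out of scope), and rejecting an empty member list is an assumption rather than documented behavior:

```go
package main

import (
	"fmt"
	"strings"
)

// Member is a hypothetical parsed enum member: a name plus payload type names.
type Member struct {
	Name    string
	Payload []string
}

// parseEnum parses `enum Name = m1 t1 t2 | m2 | ... end`, accepting an
// optional leading `|` after `=`. Anything after `end` is left untouched.
func parseEnum(src string) (string, []Member, error) {
	toks := strings.Fields(src)
	if len(toks) < 4 || toks[0] != "enum" || toks[2] != "=" {
		return "", nil, fmt.Errorf("expected `enum Name = ... end`")
	}
	i := 3
	if toks[i] == "|" {
		i++ // ML-style leading pipe
	}
	var members []Member
	expectName := true // true right after `=` or `|`
	for ; i < len(toks); i++ {
		t := toks[i]
		if t == "end" {
			if expectName {
				return "", nil, fmt.Errorf("member name expected before `end`")
			}
			return toks[1], members, nil
		}
		if t == "|" {
			if expectName {
				return "", nil, fmt.Errorf("member name expected before `|`")
			}
			expectName = true
			continue
		}
		if expectName {
			members = append(members, Member{Name: t})
			expectName = false
			continue
		}
		// Any other token is a payload type of the current member.
		last := &members[len(members)-1]
		last.Payload = append(last.Payload, t)
	}
	return "", nil, fmt.Errorf("missing `end`")
}
```

Because `end` bounds the member list, the parser never has to look at the next line to decide whether a token belongs to the declaration.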

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
Previously `_` was an ordinary LITERAL special-cased by string comparison, so it
could be used as an identifier — including as an enum member name, which the
checker treated as a coverable member while the runtime treated `_` as the
catch-all wildcard, type-checking a program that mis-dispatched at runtime.

Lex a lone `_` as a new UNDERSCORE token. It is the wildcard / ignore marker in
all pattern positions (match arm, list/dict element, `just _`, `<type> _`, enum
payload binding), is rejected wherever a name is expected (enum member, def
name, ...), and remains usable as a bare argv word (the literal string "_") in a
list. `..._`, `_foo`, and `_!` are unaffected.

Regression fixtures: typecheck_fail/enum_underscore_member.msh and
success/underscore_argv.msh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
Enum payload types were resolved in the pre-pass before `type` declarations
were registered, so a payload referencing a `type` alias (a shape, union, etc.)
failed with "unknown type" regardless of source order — only primitives and
other enums worked.

Reorder the type-check pre-pass: predeclare enum names, then register `type`
declarations, then resolve enum payload bodies and constructors. Now an enum
payload may reference any enum or `type` alias in either direction.

Regression fixture: tests/success/enum_payload_typealias.msh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
uniq is typed `([t] -- [t])` (generic) but its runtime only deduplicated a
fixed set of primitives and threw for anything else — so `[enum] uniq`,
`[dict] uniq`, `[bool] uniq` type-checked but failed at runtime, against the
type system's goal of catching such errors statically (or not having them).

Deduplicate any value without a fast hash path by structural equality
(`Equals`), keeping the primitive fast paths. Enums, dicts, and booleans now
dedupe correctly; values whose equality is undefined (lists) are kept rather
than erroring, consistent with `=` on them.
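
The fast-path-plus-fallback shape described above could look roughly like this in Go. This is a sketch under assumptions: `Value`, `Int`, `Enum`, and `uniq` are simplified, hypothetical types, and the recursive `Equals` here is for brevity only:

```go
package main

// Value is a hypothetical minimal runtime value interface.
type Value interface {
	Equals(other Value) bool
}

// Int has a fast hash path: it can be used directly as a map key.
type Int int

func (a Int) Equals(o Value) bool {
	b, ok := o.(Int)
	return ok && a == b
}

// Enum has no hash path in this sketch and is compared structurally.
type Enum struct {
	Member  string
	Payload Value // nil for a nullary member
}

func (a *Enum) Equals(o Value) bool {
	b, ok := o.(*Enum)
	if !ok || a.Member != b.Member {
		return false
	}
	if a.Payload == nil || b.Payload == nil {
		return a.Payload == nil && b.Payload == nil
	}
	return a.Payload.Equals(b.Payload)
}

// uniq keeps the first occurrence of each value, preserving order.
// Hashable primitives use a map; everything else falls back to a
// linear scan with Equals instead of raising a runtime error.
func uniq(items []Value) []Value {
	seenInts := map[Int]bool{}
	var slow []Value // distinct non-hashable values seen so far
	out := []Value{}
	for _, it := range items {
		if n, ok := it.(Int); ok {
			if !seenInts[n] {
				seenInts[n] = true
				out = append(out, it)
			}
			continue
		}
		dup := false
		for _, s := range slow {
			if s.Equals(it) {
				dup = true
				break
			}
		}
		if !dup {
			slow = append(slow, it)
			out = append(out, it)
		}
	}
	return out
}
```

The fallback is quadratic in the number of distinct non-hashable values, which is the price of accepting any element type the signature allows.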

Regression fixture: tests/success/uniq_enum.msh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
Equality was undefined (always errored) for lists, quotations, pipes, and
grids, and several types errored on a type mismatch while others returned
false — so equality could depend on operand order, and `[1] [1] =` threw.

Define `Equals` for every type and make it total:
- list / pipe: structural, element-wise (recursive)
- grid / gridview / gridrow: structural, cell-wise (by materialized rows)
- quotation: identity
- str / path / literal: compare by text content (symmetric)
- a type mismatch yields false rather than erroring, so equality is
  order-independent and union members (e.g. int | null) compare cleanly.
  Genuinely incompatible comparisons remain a static type error, caught by
  the checker before runtime.

uniq now dedupes lists/grids too via this equality.

Regression fixture: tests/success/equality.msh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
A grid column uses typed columnar storage (int/float/string/datetime/generic).
GridColumn.Set only stored a value when its Go type matched the column type;
for any other type it was a silent no-op, so setting a string, enum, bool, or
list into a typed column type-checked (gridSetCell is `(Grid str int t -- Grid)`)
but left the cell unchanged.

Promote the column to generic storage on a type mismatch: materialize the
existing typed data, switch to COL_GENERIC, then store the value. Matching-type
sets keep the fast typed path, and other rows are preserved.
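
As an illustration of the promotion step, here is a small Go sketch with only two storage kinds. `Column`, `colInt`, and `colGeneric` are hypothetical simplifications of the real columnar storage, which has more typed kinds:

```go
package main

type colKind int

const (
	colInt     colKind = iota // typed storage: []int64
	colGeneric                // promoted storage: []any
)

// Column is a hypothetical two-kind version of a typed grid column.
type Column struct {
	kind    colKind
	ints    []int64
	generic []any
}

// Set stores v at row. A matching type uses the typed fast path; any other
// type first promotes the column to generic storage, preserving other rows.
func (c *Column) Set(row int, v any) {
	if c.kind == colInt {
		if n, ok := v.(int64); ok {
			c.ints[row] = n // fast typed path
			return
		}
		// Type mismatch: materialize the typed data, then switch kinds.
		c.generic = make([]any, len(c.ints))
		for i, n := range c.ints {
			c.generic[i] = n
		}
		c.ints = nil
		c.kind = colGeneric
	}
	c.generic[row] = v
}
```

The key difference from the old behavior is that the mismatch branch no longer returns without storing anything.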

Regression fixture: tests/success/grid_set_cell_mixed.msh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr
Adversarial testing of the new enums surfaced four issues, fixed here:

- toJson and Equals on a deeply nested enum overflowed the Go stack
  (unlike str, which already used an explicit work stack). Both now walk
  payloads iteratively, so arbitrarily deep values are safe. Equality also
  no longer re-enters itself for enum-vs-non-enum pairs.

- An enum used inside a `type` union (`type T = C | int`) type-checked
  when matched by the enum's type name, but had no runtime implementation,
  so it always failed with "No matching arm found". The runtime now treats
  a bare enum type name as a type-test arm matching any member of that
  enum, and falls through cleanly for other union members.

- `sort` replaced every element with its string form, silently dropping
  enum payloads and changing element types (a list of ints came back as
  lexically-sorted strings). It now sorts and preserves the original
  objects using a total structural order: numbers numerically, text
  lexically, lists positionally, dicts by sorted key/value, enums by
  declaration order then payload, and different types by a fixed type rank.
  compareValues is iterative, so sorting deeply nested values is stack-safe
  too. sortV keeps its string-key behavior but now preserves elements.
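
A compact Go sketch of a total structural order in the spirit of the `sort` fix. Assumptions are labeled: `Enum`, `typeRank`, `compareValues`, and `sortValues` are illustrative names, the concrete rank order (ints, then strings, then enums) is invented for the example, and this version recurses on payloads whereas the commit describes an iterative walk:

```go
package main

import (
	"sort"
	"strings"
)

// Enum is a hypothetical enum value; Ordinal is its declaration position.
type Enum struct {
	Ordinal int
	Member  string
	Payload []any
}

func cmpInt(a, b int) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	}
	return 0
}

// typeRank orders values of different types (the rank order is assumed).
func typeRank(v any) int {
	switch v.(type) {
	case int:
		return 0
	case string:
		return 1
	case Enum:
		return 2
	}
	return 3
}

// compareValues returns -1, 0, or 1: numbers numerically, text lexically,
// enums by declaration order then payload, different types by rank.
func compareValues(a, b any) int {
	if ra, rb := typeRank(a), typeRank(b); ra != rb {
		return cmpInt(ra, rb)
	}
	switch x := a.(type) {
	case int:
		return cmpInt(x, b.(int))
	case string:
		return strings.Compare(x, b.(string))
	case Enum:
		y := b.(Enum)
		if c := cmpInt(x.Ordinal, y.Ordinal); c != 0 {
			return c
		}
		for i := 0; i < len(x.Payload) && i < len(y.Payload); i++ {
			if c := compareValues(x.Payload[i], y.Payload[i]); c != 0 {
				return c
			}
		}
		return cmpInt(len(x.Payload), len(y.Payload))
	}
	return 0
}

// sortValues sorts in place and keeps the original objects, rather than
// replacing each element with its string form.
func sortValues(items []any) {
	sort.SliceStable(items, func(i, j int) bool {
		return compareValues(items[i], items[j]) < 0
	})
}
```

Because the elements themselves are reordered, a list of ints stays a list of ints and enum payloads survive the sort.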

Docs, CHANGELOG, and regression tests (deep json/equals/sort, union match,
structural sort) updated. All suites green: tests 213, typecheck 196, go test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
The runtime enum payload-binding loop bound by token lexeme and silently
skipped non-token items, so it accepted match arms the type checker rejects:
an operator-token name like `ok x` (x lexes as INTERPRET) was bound, and a
malformed binding like `items [a b]` was ignored while the arm still matched.

Now each payload binding must be a plain name (LITERAL) or the `_` wildcard,
failing with a clear message otherwise. This mirrors the checker
(enumMemberPattern) and the `just`/type-test binding forms, whose runtime and
checker already agree, so all three binding forms are now consistent across
type-check and run.

Adds tests/fail/enum_bad_payload_binding.msh. All suites green: tests 214,
typecheck 197, go test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
Naming an enum via `type Color2 = C` wraps it in a TKBrand, so the checker's
match logic (which tests Kind == TKEnum) no longer recognized the members: a
match on the aliased type was rejected as "unrecognized pattern", even though
brands are runtime-erased and the value matched correctly at run time.

This was inconsistent with a branded union: `type T = int | str` stays a
TKUnion node, so its arms remain matchable through the brand. Enums took the
opaque-wrapper path instead.

Now enumMemberPattern and CheckMatchExhaustive unwrap a TKBrand to its
underlying enum before dispatching, so a branded enum matches (and enforces
exhaustiveness over) its members exactly like a branded union matches its arms.
The brand stays nominal at value boundaries (an explicit `as` is still needed
to pass an enum where the alias is expected). Checker-only change; the runtime
already matched branded enums correctly.

Tests: tests/success/enum_branded_match.msh and
tests/typecheck_fail/enum_branded_nonexhaustive.msh. All suites green:
tests 215, typecheck 199, go test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
# Conflicts:
#	CHANGELOG.md
#	mshell/Type.go
#	tests/success/sort_test.msh
A list's ToString renders elements via DebugString, and the enum's
DebugString was the only value type to inject an `EnumName.` prefix. So an
enum inside a list printed as `[C.red C.green]`, inconsistent with its
standalone form (`red`), `map`-ed form (`red`), and dict/JSON form (`"red"`).

DebugString now returns the same member form as ToString, so an enum renders
identically in every context. The member name is globally unique, so the type
prefix added no disambiguation (unlike the quotes a string's DebugString adds).

Test: tests/success/enum_render_contexts.msh. Suites green: tests 217,
typecheck 202, go test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
Maybe.Equals asserted other.(Maybe) (the value form), but the runtime
constructs Maybe values as *Maybe pointers everywhere (just/none and every
lookup/parse builtin push &Maybe{...}). So the assertion missed every operand
and the method bailed to false — making *all* Maybe-vs-Maybe comparisons
false, including `none none =`, `5 just 5 just =`, and `Maybe[enum]` equality.
By extension `uniq` could not dedupe Maybes, and any list/dict/enum containing
a Maybe compared unequal to an identical value.

Now it unwraps other via the existing asMaybe helper (value or pointer),
matching how the match code and compareValues already handle both forms. The
receiver side already worked via Go's value-method promotion on *Maybe.
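
The underlying Go behavior is easy to reproduce in isolation: a type assertion to the value type fails when the interface holds a pointer. The sketch below uses a simplified `Maybe` with an int payload; `asMaybe` is re-created here from the description above and may differ from the real helper, and `brokenEquals`/`fixedEquals` are hypothetical names:

```go
package main

// Maybe is a simplified optional value with an int payload.
type Maybe struct {
	HasValue bool
	Value    int
}

// asMaybe accepts either the value form or the pointer form.
func asMaybe(v any) (Maybe, bool) {
	switch m := v.(type) {
	case Maybe:
		return m, true
	case *Maybe:
		if m == nil {
			return Maybe{}, false
		}
		return *m, true
	}
	return Maybe{}, false
}

// brokenEquals mirrors the bug: the assertion only matches the value form,
// so an operand stored as *Maybe always falls through to false.
func brokenEquals(a Maybe, other any) bool {
	b, ok := other.(Maybe)
	if !ok {
		return false
	}
	return a == b
}

// fixedEquals unwraps both forms before comparing.
func fixedEquals(a Maybe, other any) bool {
	b, ok := asMaybe(other)
	return ok && a == b
}
```

With `none := &Maybe{}`, `brokenEquals(*none, none)` is false while `fixedEquals(*none, none)` is true, which matches the `none none =` symptom described above.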

The equality.msh test had baked the wrong answers into its expected output
(None==None and Just==Just recorded as false); corrected and expanded with
real Just/None and Maybe[enum] assertions.

Suites green: tests 217, typecheck 203, go test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
… forms

The branded-enum fix unwrapped the TKBrand in two per-site spots
(enumMemberPattern and CheckMatchExhaustive), so a branded enum matched its
members — but the `just`/`none` and `<typekeyword> name` binding paths in
armPatternOf still checked the raw subject kind. So matching a branded Maybe
(e.g. `type MC = Maybe[C]`, a named optional enum) by `just v`/`none` was
rejected by the checker (`@v` unbound / non-exhaustive) even though the runtime
matched it fine — brands are runtime-erased.

Unwrap the brand once where the match subject is established (checkMatchBlock),
so every arm form — enum member, `just v`, `<type> name`, list/dict — and the
exhaustiveness check see the underlying type uniformly. The two per-site unwraps
are removed (redundant now); enumMemberPattern and CheckMatchExhaustive get the
already-unwrapped subject. Brands stay nominal at value boundaries (an `as` is
still needed to pass an enum where the alias is expected).

Tests: tests/success/branded_maybe_match.msh and
tests/typecheck_fail/branded_maybe_nonexhaustive.msh. Suites green: tests 218,
typecheck 205, go test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
A value built by reusing one subtree twice per level (`@t @t node` in a
loop) is a DAG: n heap nodes but 2^n paths when walked as a tree. Equality
and ordering walked it structurally with no notion of sharing, so `=`,
`uniq`, and `sort` on such values went exponential — depth 24 took 0.7s,
each +1 level doubled it, and depth 40 (41 actual nodes) would run for
hours. Measured, not theoretical.

Two layers of defense, both zero-cost for ordinary values:

- sameRef fast path: a pointer-identical pair is equal by definition, so
  the enum Equals pair loop, compareValues, and itemsEqual skip it instead
  of expanding. This alone collapses every same-reference case (self
  compare, dup, a shared subtree meeting itself) from 2^n to n. Typed
  per-kind pointer comparisons, so it can never hit Go's non-comparable
  interface panic (e.g. MShellBinary).

- dagGuard threshold memo: two *independently built* DAGs share no pointers
  across operands, so the fast path never fires. The walks count pops; past
  2^19 steps they memoize already-expanded enum/list/dict pointer pairs and
  skip repeats, making the comparison polynomial in actual nodes. Sound in
  a LIFO walk: a duplicate only pops after the first occurrence's expansion
  fully resolved, and any mismatch returns immediately. The memo is capped
  (2^18 entries) so a huge linear value cannot balloon memory; below the
  threshold the guard is one integer increment and never allocates.
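
The two layers can be sketched in Go as a pair-stack walk over a simplified node type. This is an approximation under assumptions: `Node`, `pair`, `equalNodes`, and `buildDAG` are hypothetical names, only one container kind is modeled, and the memo cap described above is omitted for brevity; the 2^19 step threshold is taken from the text:

```go
package main

// Node is a hypothetical enum-like value: a member name plus child payloads.
type Node struct {
	Member string
	Kids   []*Node
}

type pair struct{ a, b *Node }

// memoThreshold is the step count after which pointer pairs are memoized.
const memoThreshold = 1 << 19

// equalNodes compares two values structurally with an explicit pair stack.
func equalNodes(a, b *Node) bool {
	stack := []pair{{a, b}}
	steps := 0
	var memo map[pair]struct{} // allocated only past the threshold
	for len(stack) > 0 {
		p := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if p.a == p.b {
			continue // sameRef fast path: identical pointers are equal
		}
		if p.a == nil || p.b == nil {
			return false
		}
		steps++
		if steps > memoThreshold {
			if memo == nil {
				memo = make(map[pair]struct{})
			}
			if _, seen := memo[p]; seen {
				continue // this pair was already expanded
			}
			memo[p] = struct{}{}
		}
		if p.a.Member != p.b.Member || len(p.a.Kids) != len(p.b.Kids) {
			return false // any mismatch ends the walk immediately
		}
		for i := range p.a.Kids {
			stack = append(stack, pair{p.a.Kids[i], p.b.Kids[i]})
		}
	}
	return true
}

// buildDAG reuses one subtree twice per level: depth+1 heap nodes,
// but 2^depth paths when walked as a tree.
func buildDAG(depth int, tip string) *Node {
	t := &Node{Member: tip}
	for i := 0; i < depth; i++ {
		t = &Node{Member: "node", Kids: []*Node{t, t}}
	}
	return t
}
```

Comparing a value with itself returns at the first pop, while two independently built depth-40 values pay roughly the threshold's worth of steps before the memo starts collapsing repeats.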

Depth-40 self-compare: was ~13 hours extrapolated, now 0.035s. The
depth-64 regression test (tests/success/enum_dag_equality.msh) covers both
modes plus uniq/sort and an unequal tip. Deep linear values (50k suite
tests, 4M manual) are unaffected. Suites green: tests 219, typecheck 206,
go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
The spec'd invariant — enum members share the word namespace, collisions are
declaration errors — was only enforced in one direction. defineEnum (pre-pass
1b) checks nameBuiltins, which at that point holds Go builtins and stdlib
sigs, so a member colliding with those is caught. But user def signatures
register in pre-pass 2, after the enums, with no reverse check — so a member
colliding with a same-file def was silently accepted in either textual order.

That was a real soundness hole: the checker resolved the shared word to the
enum constructor (e.g. when the context demanded the enum type) while the
runtime resolves definitions before enum members and ran the def — so a
cleanly type-checked program failed at runtime ("Unknown match pattern
literal") or diverged in stack shape for payload constructors.

defineEnum now records registered member names (enumMemberToks), and def
registration rejects a def whose name is a member, mirroring the existing
member-vs-def error. Def-vs-def duplication is untouched (a separate,
deferred decision).

Test: tests/typecheck_fail/enum_member_def_collision.msh. Suites green:
tests 219, typecheck 207, go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
`enum` declarations only functioned in the main script: RegisterEnums had a
single call site on the script path, so an enum declared in the stdlib, the
user init file, or at the interactive prompt parsed and evaluated as a no-op
and its member words fell through to the bare-literal path ("Found literal
token"). `type` aliases in startup files were similarly invisible to the
checker, whose pre-passes walked only the main file's items.

Type-checking and running are separate today but should semantically match,
so both sides now read startup declarations:

- Runtime: loadStartupFile registers each startup file's enums on the
  EvalState (covering script and interactive sessions), and the REPL line
  executor registers enums declared at the prompt, next to its existing
  def handling.

- Checker: loadStartupFile retains startup top-level items;
  TypeCheckProgram passes them to the new Checker.RegisterStartupTypes,
  which runs the same three-phase pre-pass order as CheckProgram (enum
  names, type aliases, enum bodies + constructor words). Declaration
  bodies are not type-checked, matching the stdlib-def treatment. The LSP
  diagnostics pass registers the stdlib's items the same way.

Collision checks now span files in both directions: a startup enum member
rejects a colliding program def, and vice versa.

Tests: startup enum/type visible to checker + cross-file collision
(TypeEnum_test.go), startup-file enum registers constructors and constructs
at runtime (Startup_test.go). Suites green: tests 219, typecheck 207,
go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
…tainers

Deep values with ALTERNATING container kinds crashed with a fatal Go stack
overflow: `enum E = m Maybe[E] | z end` built ~4M deep (a linked list with
optional next — a natural shape) killed `str`, `toJson`, and `=`. The earlier
per-type stack-safety fixes only covered pure enum nesting — each iterative
walker delegated non-enum payloads to that child's own recursive method
(Maybe.ToString, Maybe.Equals, list/dict ToJson, ...), so every
enum→Maybe→enum cycle added Go frames and alternation restored O(depth)
recursion. compareValues was already immune because it expands every kind
inline; this applies the same design to the other two walks:

- renderValue(obj, flavor): one work-stack renderer for ToString /
  DebugString / ToJson, expanding enum, Maybe, list, dict, and pipe inline
  with each kind's exact existing format (flavor switches per child the way
  the old methods did: list children render as DebugString, a dict's `str`
  form is its JSON, enum payloads render as ToString). Scalars, grids, and
  quotations stay leaves via their own methods.

- equalsIter(a, b): one pair-stack equality walk expanding the same kinds
  inline, keeping the sameRef fast path and the dagGuard shared-substructure
  memo that previously lived only on the enum walk (so lists/dicts/Maybes
  now get DAG protection too).

All the per-type methods (enum, Maybe, list, dict, pipe) are one-line
routings into the shared walkers; enumRender, the enum ToJson walker, the
old recursive bodies, itemsEqual, and DebugStrs are deleted. Net -30 lines.

Output is byte-identical across the suites, with two deliberate changes:
multi-key dict DebugString now emits sorted keys (it iterated Go map order —
nondeterministic — before), and dict equality no longer short-circuits on a
TypeName mismatch, so a str and a literal with equal text compare equal
inside dicts exactly as they do at top level.

The 4M alternating chain now renders (18M chars), serializes (14M), and
compares cleanly; enum↔dict alternation at 500k likewise. Regression test:
tests/success/enum_alternating_deep.msh (50k, mirroring the deep-test
family). Suites green: tests 220, typecheck 208, go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
A list appended into itself (in-place `append`) is a genuinely cyclic value,
and an enum payload list can close a cycle through the enum
(`enum Box = wrap [Box] | z end`). Rendering one never terminated: the
work-stack renderer re-expanded the same pointer forever, growing output and
task stack without bound (the old recursive renderers at least died fast with
a stack overflow). Equality and sorting already terminate via the
pointer-identity fast path and pair memoization.

mshell is strict — a cycle is always the degenerate artifact of appending a
container into itself, never a meaningful value — so user-facing conversion
now errors: `str` and `toJson` on a cyclic value fail with "Cannot convert a
cyclic value (a container that contains itself) to a string/JSON".

renderValueDetect tracks the containers currently being expanded as an
on-path set (pointer kinds only; a DAG merely revisits a *finished* pointer
and still renders fine), unwinding via exit sentinels. Reaching an on-path
pointer emits a `<cycle>` marker and reports cycled=true. Internal rendering
(DebugString in error messages, stack dumps) stays total via the marker —
those paths cannot propagate errors and must never hang. The on-path lookup
is gated on the pointer-kind check, since hashing an interface holding an
unhashable dynamic type (MShellBinary, a []byte) panics even on a map read.
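
A minimal Go sketch of on-path tracking with exit sentinels, assuming a single container kind. `List`, `exit`, `lit`, and `render` are hypothetical names, and the bracketed output format is illustrative rather than mshell's exact list rendering:

```go
package main

import "strings"

// List is a hypothetical container; items are *List or string.
type List struct{ Items []any }

// exit is the sentinel popped once a container's children are all rendered.
type exit struct{ l *List }

// lit marks literal separator text queued on the work stack.
type lit string

// render returns the rendered text and whether a cycle was encountered.
// Only containers currently being expanded are in onPath, so a DAG that
// revisits a finished pointer still renders normally.
func render(root any) (string, bool) {
	var sb strings.Builder
	onPath := map[*List]bool{}
	cycled := false
	stack := []any{root}
	for len(stack) > 0 {
		top := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		switch v := top.(type) {
		case lit:
			sb.WriteString(string(v))
		case exit:
			delete(onPath, v.l) // container finished: leave the path
			sb.WriteString("]")
		case *List:
			if onPath[v] {
				sb.WriteString("<cycle>")
				cycled = true
				continue
			}
			onPath[v] = true
			sb.WriteString("[")
			stack = append(stack, exit{v})
			for i := len(v.Items) - 1; i >= 0; i-- {
				stack = append(stack, v.Items[i])
				if i > 0 {
					stack = append(stack, lit(" "))
				}
			}
		case string:
			sb.WriteString(v)
		}
	}
	return sb.String(), cycled
}
```

A caller that can report errors would turn `cycled == true` into the user-facing failure, while internal paths can keep the text with its `<cycle>` marker.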

Test: tests/fail/cyclic_render.msh (cycle equality terminates, then `str`
errors). Suites green: tests 221, typecheck 208, go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
A `match` as the body of an inferred quotation always failed the checker
with "stack underflow at 'match' (match subject)" even though it runs
fine — rejecting the canonical way to consume enums:

    [ 1 leaf 2 leaf ] (match leaf n : @n, node a b : 0, end) map

checkMatchBlock errored on an empty stack unconditionally; other constructs
(operators, `if`) participate in quote-body inference, where applySig
responds to underflow by synthesizing fresh input vars. The failure predates
enums (Maybe matches in map had it too), but match is the enum eliminator,
so `(match ...) map/filter/each` failing to type-check was a constant problem.

Two pieces:

- Under inference, an empty stack at `match` synthesizes the subject exactly
  as applySig's underflow path does (bottom of stack, front of inferInputs),
  so the quotation infers a one-input signature.

- A subject that is still an unresolved var is pinned from the first arm
  pattern that names a type — an enum member determines its enum (member
  names are global), `just`/`none` determine Maybe[fresh]. Pinning happens
  before the entry branch is captured, so every arm's analysis, payload
  bindings, and the exhaustiveness check see the resolved subject (per-arm
  substitution checkpoints would roll back a per-site pin). Value literals
  and type keywords deliberately do not pin: a type-keyword match may be
  discriminating a union, which pinning would wrongly narrow — those
  matches still check via their wildcard arm.

Exhaustiveness now works inside quotations too: a match that omits a member
is rejected, and a pinned enum quotation applied to a list of a different
element type fails overload resolution as it should.

Tests: tests/success/enum_match_in_quote.msh (enum map/filter/each, Maybe,
value literals) and tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh.
Suites green: tests 222, typecheck 210, go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
…emo cap

Comparing two independently built self-doubling DAGs deeper than dagMemoCap
(2^18 levels) hung: the cap's "stop inserting" overflow policy left every
level beyond 262144 un-memoized, and each un-memoized level doubles the
walk. Measured: depth 200k compared in 1s, depth 300k ran effectively
forever (2^38000 work).

A first attempt — clearing the memo generationally on overflow — did NOT
fix it (also measured): the pending duplicate for each upper level pops
only after the entire subtree between, so the working set spans all levels
and defeats any bounded memo regardless of eviction policy.

The structural fix: deduplicate pointer-identical pairs when a container
pushes its children. A self-doubling value (`@t @t node`, `[ @x @x ]`)
expands to the SAME pair twice; pushing it once makes the whole family
linear at any depth with no memo involvement — equality at depth 300k/600k/
4M now runs in 1.1s/2.1s/14.4s (linear), and sort past the cap likewise.
Applied in equalsIter (pushPairsDedup for enum payloads, list and pipe
elements) and in compareValues' enum and list arms (skipping is sound in
both walks: an identical pointer pair contributes equal/0). The generational
clear is kept — the memo still covers cross-parent duplicate pairs
(diamond-shaped sharing) up to the cap — and its comment now states
honestly what it does and does not defend against.

Extends tests/success/enum_dag_equality.msh with a 300k-deep (past-cap)
independent-DAG comparison. Suites green: tests 222, typecheck 210, go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
The push-time pair dedup that closed the exponential cliff for self-doubling
values covered enum payloads, lists, and pipes — but not the two dict arms.
A dict-shaped doubling value ({ "l": @d, "r": @d } per level, or the same
structure through an enum's {str: E} payload) pushed the identical pointer
pair once per key and re-opened the cliff: equality at depth 200k ran in
1.4s, depth 300k (past the memo cap) hung. Measured, same boundary as the
list/enum case.

equalsIter's dict arm now skips a pair pointer-identical to the last one it
pushed, and compareValues' dict arm skips the identical *value* pair while
always keeping the key comparison. Dict-DAG equality at 300k/600k now runs
1.5s/3.3s (linear), sort works, and a deep unequal pair is still detected.

Extends tests/success/enum_dag_equality.msh with a 300k dict-payload DAG.
Suites green: tests 222, typecheck 210, go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
…bally

Four successive fixes (memo cap, generational eviction, push-dedup for
enum/list/pipe, then dict arms) each closed one self-doubling pattern and
left the next container or sharing shape exponential past the cap — a
non-consecutive pattern like [x y x] per level still hung at depth 300k.
The chain of same-shaped patches was defending the wrong invariant: the
memo cap itself.

The memo is now unbounded. Every revisited pointer pair memo-hits, so ANY
sharing pattern — consecutive, alternating, cross-parent, any container
mix, any depth — is polynomial in actual heap nodes. There is no boundary
left to probe.

The memory trade is proportionate, not abstract: the memo only activates
past the step threshold (2^19), so ordinary comparisons never allocate,
and a comparison big enough to grow a large memo already holds operands
larger than the memo. Measured: the pathological 4M-deep linear compare
(every pair distinct — the case the cap was protecting) runs 21.9s at
2.6GB peak, of which the two operands are over half; the alternating
[x y x] DAG at 300k, which defeated every previous patch, compares in
3.0s. Push-time dedup stays as a constant-factor fast path (it also
avoids the pre-threshold spin for shallow doubling), but is no longer
load-bearing for termination.

Extends tests/success/enum_dag_equality.msh with the non-consecutive
alternating pattern at 300k. Suites green: tests 222, typecheck 210,
go test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ
The comparison/render walkers carried three hand-maintained "which types
are pointers" switches (sameRef, refPairKey, cycleTrackable) that had
already drifted apart. Replace all three with one isRefKind predicate
that checks the dynamic kind via reflection: "is a pointer" is exactly
the property every site cares about (heap identity ⇒ can share
substructure, can cycle, safe to compare and use as a map key), and a
newly added pointer kind is covered with no list to keep in sync.
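
A reflection-based predicate along these lines might look like the following Go sketch. The bodies are an assumption based on the description, not the branch's actual code, and this `sameRef` is a hypothetical companion shown only to illustrate why gating on the pointer kind avoids Go's non-comparable interface panic:

```go
package main

import "reflect"

// isRefKind reports whether v's dynamic kind is a pointer, i.e. whether it
// has heap identity. A nil interface has kind Invalid and reports false.
func isRefKind(v any) bool {
	return reflect.ValueOf(v).Kind() == reflect.Ptr
}

// sameRef reports pointer identity. The == is evaluated only when both
// operands are pointer kinds, so it can never panic on an interface that
// holds a non-comparable dynamic type such as []byte.
func sameRef(a, b any) bool {
	return isRefKind(a) && isRefKind(b) && a == b
}
```

Any newly added pointer-backed value type is covered automatically, since the check inspects the dynamic kind rather than a hand-maintained type list.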

Also delete the push-time neighbor-dedup machinery (pushPairsDedup plus
three hand-rolled copies in the dict/list/enum arms). It was written to
compensate for the bounded dagGuard memo — its own comment still cited
the memo's eviction — but the memo is unbounded now, which subsumes the
whole trick: below the step threshold duplicate expansion is capped by
the threshold itself, past it every repeated pair memo-hits. Timing on
the DAG stress tests is unchanged.

Net -71 lines, no behavior change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Definition lookup is first-match-wins over [stdlib, init, script], so a
second def of a name never took effect — it was silently dead code while
the first definition kept running. A script "redefining" a stdlib word,
or an interactive redefinition, quietly did nothing; and since the type
checker registered the duplicate as an overload, a call could type-check
against the dead def's signature while the runtime executed the other
body. (Found via tests/success/enum_recursive_generic.msh, whose local
`def id` was shadowed by std.msh's id all along; renamed to `ident`.)

Reject duplicates everywhere instead:

- Runtime: FindDuplicateDefinition checks startup loading, script
  startup+file assembly, and each interactive input line (which is
  rejected with the session continuing). The error reports both
  positions.
- Checker: def registration records every definition name (mirroring
  the enum-member collision check) and rejects a repeat in both
  RegisterStdlibSigs and CheckProgram, so --type-check-only and the
  LSP report it too. A stdlib def records its name even when its sig
  defers to a table builtin (the 2unpack case) — runtime lookup still
  resolves to it. Def-shadowing-builtins stays legal; std.msh does
  that on purpose.
- Lexer: makeToken now stamps Token.TokenFile, which was wired through
  the lexer but never assigned. Cross-file collisions can then name the
  other file: "already defined at lib/std.msh:62:5". No existing test
  fixture is affected; stdin input still formats as bare line:col.
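
A rough Go sketch of a duplicate check over an ordered definition list that reports both positions. `Def`, `pos`, and `findDuplicateDefinition` are hypothetical, the exact message wording is invented apart from the "already defined at" phrase and the position formats quoted above:

```go
package main

import "fmt"

// Def is a hypothetical definition record with its source position.
type Def struct {
	Name string
	File string // empty for stdin input
	Line int
	Col  int
}

// pos formats file:line:col, or bare line:col when there is no file.
func pos(d Def) string {
	if d.File == "" {
		return fmt.Sprintf("%d:%d", d.Line, d.Col)
	}
	return fmt.Sprintf("%s:%d:%d", d.File, d.Line, d.Col)
}

// findDuplicateDefinition walks definitions in load order (stdlib, init,
// script) and returns an error for the first repeated name.
func findDuplicateDefinition(defs []Def) error {
	first := map[string]Def{}
	for _, d := range defs {
		if prev, ok := first[d.Name]; ok {
			return fmt.Errorf("%s: '%s' already defined at %s", pos(d), d.Name, pos(prev))
		}
		first[d.Name] = d
	}
	return nil
}
```

Rejecting the repeat up front removes the case where a first-match-wins lookup silently ignores the later definition.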

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add the two user-facing branch changes missing from Unreleased: the
cyclic-value render error (str/toJson error instead of hanging) and
match type-checking inside inferred quotations ((match ...) map).

Also correct compareValues' doc comment: compare-0 coincides with
Equals only for orderable kinds — quotations and grids share a rank
and compare 0 while Equals still distinguishes them.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
mitchpaulus and others added 3 commits July 3, 2026 10:08
- doc/base.html: style mshellENUM (new this branch) and mshellTYPE
  (pre-existing gap) with the other declaration keywords, so
  --html-highlighted enum/type snippets render like def/match/end.
- Sublime: add enum, type, and match to the keyword list, and 0o/0b
  integer literal patterns alongside the existing hex one.

The VS Code grammar highlights no keywords at all (pre-existing
design), so it needs no enum entry to stay consistent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The VS Code grammar only highlighted comments, booleans, strings, and
variables — no keyword rules at all, so def/end/match and the new
enum/type rendered as plain text. Add the same rule set as the
Sublime grammar: control keywords (including enum and type and the
else*/*if forms), and/or/not, soe, the int/float/bool type names,
and numeric literals including the new 0x/0o/0b integer forms.

Rules are placed after the variable patterns so a same-position tie
like `str!` keeps resolving to the variable-store rule.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The doc predated several final decisions and still read as a proposal.
Update it to be the accurate decision record:

- Status header: implemented (V1), with a map of what shipped vs. what
  stayed future work.
- Fix examples that no longer parse: `read`/`write` are lexer keywords
  and cannot be member names; the §9/§10.A examples were missing the
  mandatory `end` terminator the doc itself decided on in §6.
- §8: the shipped name-resolution rule replaced the draft's
  CmdResult.ok qualification — member names are globally unique and
  member/def collisions are declaration errors in both directions.
- §9: record the shipped serialization (externally-tagged toJson,
  member(payload) str form), structural equality, and declaration-order
  sorting.
- §10: per-tier status (A shipped; shape payload types shipped, their
  destructuring sugar not; B/D/E/F future).
- §11: rewrite the end-to-end example against the shipped feature — an
  explicit parseMode boundary word instead of the unimplemented
  Mode.decode/Mode.values string backing. Example verified to run and
  type-check.
- §12: open questions annotated with how each resolved.

Design-dir edit explicitly requested (doc was AI-authored).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
