From 0ca5b0ba4d325d41ae53be08f1cc6a3589f215bc Mon Sep 17 00:00:00 2001
From: Mitchell Paulus
Date: Sun, 28 Jun 2026 21:30:23 -0500
Subject: [PATCH 01/32] Initial enum commit

---
 ai/enum_implementation_plan.md     | 131 ++++++++
 design/literal_or_enum_typing.html | 521 +++++++++++++++++++++++++++++
 mshell/Evaluator.go                |  41 +++
 mshell/Lexer.go                    |  11 +
 mshell/MShellObject.go             |  54 +++
 mshell/Main.go                     |   4 +
 mshell/Parser.go                   |   6 +
 mshell/Type.go                     |  57 ++++
 mshell/TypeCheckProgram.go         |   8 +
 mshell/TypeChecker.go              |   5 +
 mshell/TypeEnum.go                 |  67 ++++
 mshell/TypeError.go                |   2 +
 mshell/TypeParseIntegration.go     |  68 ++++
 mshell/TypeUnify.go                |  12 +
 14 files changed, 987 insertions(+)
 create mode 100644 ai/enum_implementation_plan.md
 create mode 100644 design/literal_or_enum_typing.html
 create mode 100644 mshell/TypeEnum.go

diff --git a/ai/enum_implementation_plan.md b/ai/enum_implementation_plan.md
new file mode 100644
index 00000000..5dd3c5ea
--- /dev/null
+++ b/ai/enum_implementation_plan.md
@@ -0,0 +1,131 @@
+# Enum (generative tagged sum type) — implementation plan
+
+Companion to `design/literal_or_enum_typing.html` (the design + rationale). This is the
+file-by-file build plan. Plans live here in `ai/`; the design lives in `design/`.
+
+## Scope & non-goals
+
+In scope (a generative tagged sum type declared with `enum`, inline `= a | b | c`):
+
+- `enum Name = c1 | c2 | ...` and `enum Name = c1 t.. | c2 t.. | ...`.
+- Constructors are case-free; produced only by a constructor word or `decode` (**Position 1** —
+  no implicit coercion, `str as Enum` rejected).
+- `match` over members with exhaustiveness; payload binding reuses the `just v` path.
+
+Explicit non-goals (per the design + owner direction):
+
+- **No `decode` / `encode` / `values` derived functions** in v1. Reading config back is handled at
+  the use site with `match` (or whatever fits). This removes the wire/serialization surface.
+- **No backing strings** (`member = "wire"`) in v1 — they only existed to feed `decode`/`encode`.
+  The member's own name is its identity. (Easy to add later when serialization is wanted.)
+- **No qualified `Enum.member` / `Enum.method` dispatch** in v1. Members are referenced by bare name,
+  resolved by context; member names are unique across enums (collision is a declaration error). This
+  removes the `.`-lexing / qualified-dispatch unknown entirely.
+- **No `Result` type.** `Maybe` already covers the common case.
+- **No change to JSON typing.** `JsonScalar` / `Json` stay *structural unions* — their variants are
+  distinguishable by structural type, so they do not need tags. Enums are only for cases structure
+  cannot discriminate (e.g. two variants with the same payload type) and for closed config sets.
+- **No generic enums** (`Enum[t, e]`) in v1.
+- **No `?`-propagation sugar** in v1.
+- A *checked* `"GET" as Method` stays deferred (it needs literal/singleton types).
+
+## Phasing
+
+Two PRs. Phase 1 (nullary enums) is now small — declaration, construction, match — with **no
+serialization surface and no qualified names**. Phase 2 adds payload variants and the tagged runtime
+value where `Evaluator.go` gets touched substantially.
+
+---
+
+## Phase 1 — Nullary enums
+
+`enum Mode = read | write | readwrite`, match-by-member + exhaustiveness, Position 1. The full v1
+surface is: declare, construct (bare member word), and `match`. No `decode`/`encode`/`values`, no
+backing strings, no qualified names.
+
+The runtime value just needs to carry **which member it is** (enum `NameId` + member `NameId`). A
+lightweight value suffices; no new heavy `MShellObject` is required for Phase 1.
+
+### Type system
+
+1. **`Lexer.go`** — add an `ENUM` keyword token (via `literalOrKeywordType`). Audit existing user
+   identifiers/usages of `enum`.
+2. **`TypeParseIntegration.go`** — add `MShellEnumDecl` (parallel to `MShellTypeDecl`) and
+   `ParseEnumDecl`: `enum` Name `=` member (`|` member)*, where a member is a bare `LITERAL`. No
+   backing clause.
+3. **`Parser.go`** — add `case ENUM:` beside `case TYPE:` (≈ line 677) dispatching to `ParseEnumDecl`.
+4. **`Type.go`** — add `TKEnum` kind + a variants side table (`[]EnumVariant{Name NameId; Payload
+   []TypeId}`; Payload empty in Phase 1), `MakeEnum`, hashconsing key, and accessors. Nominal
+   identity = the declaration `NameId` (two `enum`s with identical members stay distinct, like brands).
+5. **`Type.go` / `TypeUnify.go`** — extend `walkTypeVars` and `typeRewriter.mapType` with a `TKEnum`
+   arm (recurse payload types; none in Phase 1). `unify` (`TypeChecker.go`): `TKEnum` unifies only
+   with the same enum (by name).
+6. **Constructors as words** — register each member as a nullary sig `( -- Mode)`. Members live in a
+   **global constructor namespace**; a member name duplicated across two enums is a declaration error
+   in v1 (no qualification to disambiguate yet). A bare member word resolves to its enum; where an
+   expected enum type is in context (match subject, sig slot) that pins it.
+7. **Pre-pass registration** — mirror `DeclareType` registration (`TypeCheckProgram.go:99-101`):
+   collect `enum` headers with placeholder TypeIds, resolve bodies, register constructor words,
+   detect cross-enum member-name collisions.
+8. **Match** — `analyzeTokenPattern` (`TypeCheckProgram.go:1381`): an enum member name is a
+   recognized pattern that **credits coverage** against the enum's closed set (flip the
+   "value literals credit no coverage" behavior at `:1402` for enum subjects). `TypeBranch.go`:
+   exhaustiveness over the member set; narrowing (subject known to be that member in the arm).
+
+### Runtime (`Evaluator.go`)
+
+9. A lightweight enum value (enum + member ids). Constructor evaluation pushes it; member-pattern
+   matching extends the `matchTokenPattern` path that already handles `none`/type keywords near
+   `:1117`; plus equality, `DebugString`, `ToJson`.
+
+### Docs / housekeeping
+
+10. `doc/type_system.inc.html` + `doc/mshell.md` (rebuild with `cd doc; msh build.msh`).
+11. `CHANGELOG.md` → Unreleased / Added.
+12. `lib/std.msh` completions, in the documented Vim-fold pattern.
+13. Tests: `tests/` (+ `typecheck_test.sh`) and `mshell/ go test`. Cover: decl parse, construct,
+    match exhaustive (no `_`), non-exhaustive rejected, member narrowing, `str as Enum` rejected,
+    two enums with same members stay distinct, duplicate member name across enums rejected.
+
+---
+
+## Phase 2 — Payload-carrying variants
+
+`enum CmdResult = ok str | failed int str | timeout`. Adds:
+
+1. **Parser** — arms parse a constructor name followed by payload type exprs (reuse
+   `parseTypeExpr` productions for each payload).
+2. **`Type.go`** — `EnumVariant.Payload` populated; payload types flow through hashconsing and the
+   rewriter arms added in Phase 1.
+3. **Constructors with payloads** — `failed : (int str -- CmdResult)`, postfix, consume from the
+   stack like `5 just`.
+4. **Runtime value** — a new `MShellObject` generalizing `Maybe`: `{ enum NameId; tag; payload
+   []MShellObject }`. `Maybe` is the proven two-variant precedent; follow its equality/`DebugString`/
+   `ToJson` shape. Phase-1 nullary values fold in as the empty-payload case.
+5. **Match payload binding** — extend the `just v`-style binding (`TypeCheckProgram.go:1348`,
+   `Evaluator.go:1055`) to N payloads: `failed c e : ...` binds `c`, `e`.
+6. **Recursive enums** — already work via the placeholder-TypeId pre-pass.
+7. Docs / changelog / completions / tests as above (payload construct + destructure + recursive
+   enum + exhaustiveness with payloads).
+
+(Serialization helpers — `decode`/`encode`/backing strings — remain out of scope until a concrete
+need appears; config reads are handled with `match` at the use site.)
+
+---
+
+## Process
+
+- New feature branch before any code (per `CLAUDE.md`).
+- Build in `mshell/` (`go build -o ...`, in-repo cache if needed) before testing.
+- `gofmt` only with explicit permission.
+- `CHANGELOG.md` for user-facing additions; `mshell/BuiltInList.go` kept in sync if builtins added.
+
+## Decisions still to nail before coding Phase 1
+
+None blocking. The former unknowns (qualified-name dispatch, backing defaults, decode/encode
+delivery) are all dropped from v1 scope above. Remaining small calls can be made during the build:
+
+- Exact lexical home for the lightweight runtime enum value (new `MShellObject` vs. reuse).
+- Whether a bare member word with **no** expected-type context (e.g. stored straight into a var) is
+  allowed (resolves via the global member namespace) or requires a context — default: allowed, since
+  member names are unique across enums in v1.
diff --git a/design/literal_or_enum_typing.html b/design/literal_or_enum_typing.html
new file mode 100644
index 00000000..7e6d1150
--- /dev/null
+++ b/design/literal_or_enum_typing.html
@@ -0,0 +1,521 @@
+
+
+
+
+
+Enums & Generative Types — mshell design
+
+
+
+
+ +
+

Enums & Generative Types

+

Status: design exploration / not implemented. Records the decision to + add a single generative tagged sum type (declared with enum), the + reasoning that got there (the structural-vs-generative distinction and the Haskell / Rust / TypeScript + prior art), the chosen surface syntax, and a sketch of representation, ergonomic sugar, and open + questions. For review.

+
+ +

The motivating request was type ConfigOption = "string1" | "string2". Working + through it showed the real missing primitive is not "literal types" but a generative + declaration — one that mints constructors and tags values at runtime — of which a + plain enumeration is just the simplest case. The proposal is one new keyword, enum, whose + grammar is a one-token delta from the existing type X = A | B.

+ +

1. The question, answered in one line

+

"At what point does an enum differ from a regular type definition with constructors?"

+

It differs exactly when the declaration introduces constructors — new ways to + make values that the runtime must tag — rather than merely naming or + combining types that already have values. Below that point ("an enum" of bare constants) the difference + is cosmetic; at or above it (variants carrying payloads) it is a genuinely new capability that today's + type cannot express. So there is one mechanism, and the colloquial "enum" is its degenerate + case.

+ +

2. Where mshell stands today

+

Most of the machinery already exists, which is why this is an extension and not a bolt-on:

+ + + + + + + + + + + + + +
| Capability | Status | Where |
| Hashconsed type arena (TypeId = structural identity) | have | Type.go |
| Structural unions `A | B` (flatten / sort / dedupe) | have | TKUnion, MakeUnion |
| Nominal brands / newtypes (`type X = ...`) | have | TKBrand, brandify (TypeCast.go) |
| HM unification, type vars, occurs check; subtyping folded into unify | have | TypeUnify.go, TypeChecker.go |
| Surface `type Name = <expr>`, `|` unions, `as` casts | have | TypeExpr.go, TypeParseIntegration.go |
| One generative tagged sum type: `Maybe[t] = just t | none` | have | MShellObject.go:172; just/none |
| Match with constructor destructuring + exhaustiveness | have | TypeBranch.go; match |
| A declaration that introduces new constructors | missing | — (this proposal) |
+ +

Two structural facts frame everything:

+ + +

3. The conceptual spine: structural vs. generative

+ +

Two genuinely different kinds of declaration:

+ + + + + + + + + + +
| | type / unions / brands | generative sum type (enum) |
| Introduces new constructors? | no | yes |
| Discriminates by… | structural type | stored tag |
| Runtime footprint of the wrapper | none (re-tag only) | a stored tag |
| Two cases with the same payload type? | impossible | fine |
| How you build a value | `<structural value> as X` | call a constructor |
+ +

The litmus test

+

Try to express two cases that share a representation:

+
ok str | err str        # two cases, SAME payload type
+

A structural union str | str collapses to str; even branded, both cases are + str at runtime, so match — which discriminates by structural type — + cannot tell an ok from an err. The moment two variants share a representation + (or you want ok 5 and err 5 to be different values), you need a stored + tag, i.e. a real sum type. That is the precise line where "enum" stops being expressible as a + type.

+ +

4. Prior art: Haskell, Rust, TypeScript

+ + + + + + + + +
| | Tag / discriminant | "enum" is… | Structural unions? | type keyword |
| Haskell | implicit (the constructor is the tag) | an all-nullary data; enum ops via the derivable Enum class | no — sum types are the only union | transparent alias (structural) |
| Rust | implicit (compiler-managed) | the C-like case of the one enum keyword | no — must declare an enum | transparent alias (structural) |
| TypeScript | explicit, hand-written data field | a separate weak runtime construct, mostly avoided | yes (untagged) — the default idiom | structural; unions + literals do the work |
+ + + +

Punchline: Haskell and Rust both concluded the tagged sum type is the primitive and + the enum is its degenerate nullary case — one generative form, with type reserved for + transparent structural aliases. TypeScript went structural-first and bolted enum on, which + is exactly the schism to avoid. mshell already has the structural side (unions; brands ≈ transparent + aliases) and one generative sum type (Maybe), so it sits closer to Haskell/Rust. + The natural, non-bolted-on move is theirs.

+ +

5. Decision

+

Add a single generative tagged sum type declaration. Keep type exactly as + it is (the transparent / branded structural form). The colloquial "enum of constants" is the all-nullary + special case of the one mechanism — not a second concept. This subsumes Maybe (which + becomes "the built-in enum Maybe[t] = just t | none") and unlocks "make illegal states + unrepresentable" for command results, parse results, and JSON.

+ +

6. Syntax

+ +

Keyword

+

The keyword — not capitalization — signals "everything here is a constructor," which is what + lets the design be entirely case-free (see §8). Candidates:

+ + + + + + + + +
| Keyword | Precedent | Note |
| enum (chosen) | Rust, Swift | most recognizable; Rust/Swift legitimized it for payload-carrying variants |
| variant | OCaml / Reason | accurate, no baggage; less universally known |
| oneof | Protobuf | self-describing; slightly informal |
| union | — | rejected: `|` already means a structural union |
+ +

Declaration form — inline, mirroring type

+

The chosen surface is a one-token delta from the existing type Name = A | B: swap + type for enum, and each arm is a constructor name followed by its + payload type expression (zero or more), |-separated.

+ +
enum Mode   = read | write | readwrite           # nullary: a plain enumeration
+enum Shape  = circle float | rect float float | point
+enum CmdResult = ok str | failed int str | timeout
+ +

Generic enums use the same [..] parameter syntax as Maybe[t], and thereby + generalize the built-ins:

+
enum Result[t, e] = ok t | err e
+# Maybe[t] is the built-in `enum Maybe[t] = just t | none`
+ +

Grammar, stated against the existing type-expression parser + (parseTypeExpr / TypeUnionExpr):

+ +

The only new wrinkle vs. a structural union is that an arm's first token is a binding + occurrence (a constructor being declared) rather than a reference to an existing type. The + enum keyword is what tells the parser to read it that way — no capitalization rule.
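To make the "binding occurrence" point concrete, here is a minimal Go sketch of parsing the all-nullary form. It is an assumption-laden illustration, not the `ParseEnumDecl` from the patch: it splits on whitespace instead of using the real lexer, and the names `EnumDecl` and `parseNullaryEnum` are hypothetical. Every token in member position is recorded as a newly declared constructor and is never looked up as an existing type.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// EnumDecl is a hypothetical parse result for `enum Name = a | b | c`.
type EnumDecl struct {
	Name    string
	Members []string
}

// parseNullaryEnum parses a whitespace-separated, all-nullary declaration.
// Payload-carrying arms are deliberately rejected in this sketch.
func parseNullaryEnum(src string) (EnumDecl, error) {
	toks := strings.Fields(src)
	if len(toks) < 4 || toks[0] != "enum" || toks[2] != "=" {
		return EnumDecl{}, errors.New("expected: enum Name = member (| member)*")
	}
	decl := EnumDecl{Name: toks[1]}
	expectMember := true
	for _, tok := range toks[3:] {
		switch {
		case expectMember && tok == "|":
			return EnumDecl{}, errors.New("expected a member name, found `|`")
		case expectMember:
			// Binding occurrence: the token declares a constructor.
			decl.Members = append(decl.Members, tok)
		case tok != "|":
			return EnumDecl{}, fmt.Errorf("expected `|`, found %q (payloads are out of scope here)", tok)
		}
		expectMember = !expectMember
	}
	if expectMember {
		return EnumDecl{}, errors.New("declaration ends with a dangling `|`")
	}
	return decl, nil
}

func main() {
	decl, err := parseNullaryEnum("enum Mode = read | write | readwrite")
	fmt.Println(decl.Name, decl.Members, err) // Mode [read write readwrite] <nil>
}
```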

+ +

A long block is still fine

+

The inline form wraps naturally; arms may be placed one per line, with an optional leading + | for alignment (a small additive allowance over the current type grammar):

+
enum Event =
+    | click int int
+    | key int
+    | close
+ +

7. Construction and matching (use sites)

+

Both reuse the existing Maybe machinery verbatim — the whole point of "Maybe + is just the built-in case."

+ +

Construction is postfix, exactly like 5 just (just : (a -- Maybe[a])). + A constructor is a word whose stack effect is "consume the payloads, push the enum":

+
"hello\n" ok            # ( str -- CmdResult )
+404 "not found" failed  # ( int str -- CmdResult )
+timeout                 # ( -- CmdResult ), like bare `none`
+ +
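The "consume the payloads, push the enum" stack effect can be sketched in Go as follows. This is an illustration under stated assumptions: the stack is modeled as a plain `[]any`, and `Variant` and `construct` are hypothetical names, not the evaluator's API.

```go
package main

import "fmt"

// Variant is a hypothetical tagged value: a constructor tag plus payloads.
type Variant struct {
	Tag     string
	Payload []any
}

// construct pops `arity` values off the top of the stack and pushes a single
// Variant carrying them, mirroring a postfix constructor word.
func construct(stack []any, tag string, arity int) ([]any, error) {
	if len(stack) < arity {
		return stack, fmt.Errorf("constructor %s needs %d value(s), stack has %d", tag, arity, len(stack))
	}
	split := len(stack) - arity
	// Copy the payload so later pushes cannot overwrite it.
	payload := append([]any(nil), stack[split:]...)
	return append(stack[:split], Variant{Tag: tag, Payload: payload}), nil
}

func main() {
	// Corresponds to: 404 "not found" failed
	stack := []any{404, "not found"}
	stack, err := construct(stack, "failed", 2)
	fmt.Println(len(stack), err) // 1 <nil>

	// Corresponds to a nullary constructor such as: timeout
	stack, _ = construct(stack, "timeout", 0)
	fmt.Println(len(stack)) // 2
}
```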

Match reuses the just v binding arm; payload names bind in the body. No + _ is needed when every constructor is covered — the set is closed, so the checker + proves exhaustiveness (the same path just/none use in + TypeBranch.go):

+
cmd run match
+    ok out     : @out wl,
+    failed c e : $"{@c}: {@e}" wl,
+    timeout    : "timed out" wl,
+end
+ +

The payoff: add a fourth constructor later and every match that forgot it becomes a + compile error.
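The exhaustiveness argument reduces to a set difference over the closed member list. The Go sketch below shows that idea in isolation; `missingArms` is a hypothetical helper, the `killed` constructor is invented for the example, and the real check in `TypeBranch.go` is not reproduced here.

```go
package main

import "fmt"

// missingArms returns the declared members that no match arm covers, in
// declaration order. An empty result means the match is exhaustive.
func missingArms(declared, covered []string) []string {
	seen := make(map[string]bool, len(covered))
	for _, c := range covered {
		seen[c] = true
	}
	var missing []string
	for _, m := range declared {
		if !seen[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	arms := []string{"ok", "failed", "timeout"}
	fmt.Println(missingArms([]string{"ok", "failed", "timeout"}, arms)) // []

	// Adding a fourth (hypothetical) constructor makes the same arms incomplete.
	fmt.Println(missingArms([]string{"ok", "failed", "timeout", "killed"}, arms)) // [killed]
}
```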

+ +

8. Namespacing, not case

+

Capitalization is deliberately given no meaning anywhere. The one job case used to do — telling a + bare constructor apart from a bare variable / def — is handled by + namespacing plus checker context, the same way just/none and + branded unions already resolve:

+ + +

9. Representation & implementation sketch

+
    +
  1. Type kind. Add TKEnum (nominal, keyed by the declaration's + NameId) with a side table of variants: each variant is a (NameId, []TypeId) + payload list. Hashconsing, walkTypeVars, and typeRewriter.mapType gain one + arm each (recurse through payload types; nominal identity by name).
  2. +
  3. Constructors register as words with quote signatures + (failed : (int str -- CmdResult)), so application and overload machinery type-check them + with zero special-casing. Nullary constructors are ( -- Enum), like none.
  4. +
  5. Runtime value. A new MShellObject generalizing Maybe: a tag + (the variant NameId / index) plus a payload slice. Maybe is the pre-existing + two-variant instance, so the model is already proven; equality and DebugString follow its + shape.
  6. +
  7. Match. Constructor patterns credit coverage against the declaration's closed set + (flip today's "value literals credit no coverage", TypeCheckProgram.go:1402, for enum + subjects); exhaustiveness reuses the Maybe path; payload binding reuses the + just v path (TypeCheckProgram.go:1348).
  8. +
  9. Forward / recursive types work via the existing top-level pre-pass that reserves + placeholder TypeIds for type headers (extend it to enum): + enum Json = jnull | jbool bool | jnum float | jstr str | jarr [Json] | jobj {str: Json}.
  10. +
  11. Serialization. Payload variants have no canonical string form, so JSON/argv I/O needs + explicit encode/decode (serde-style). The all-nullary case is special-cased — see §10.B.
  12. +
+ +
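The runtime-value item above (a tag plus a payload slice, with equality and a debug string) might look roughly like this in Go. This is a sketch, not the patch's `MShellObject`: the `Value` interface is cut down to two methods, and `Str`, `Int`, and `EnumValue` are hypothetical names. The postfix debug format is likewise an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Value is a hypothetical, minimal object interface for this sketch.
type Value interface {
	Equals(other Value) bool
	DebugString() string
}

type Str string

func (s Str) Equals(other Value) bool { o, ok := other.(Str); return ok && s == o }
func (s Str) DebugString() string     { return fmt.Sprintf("%q", string(s)) }

type Int int

func (i Int) Equals(other Value) bool { o, ok := other.(Int); return ok && i == o }
func (i Int) DebugString() string     { return fmt.Sprintf("%d", int(i)) }

// EnumValue generalizes a two-variant Maybe: enum name, tag, and payloads.
// A nullary member is simply the empty-payload case.
type EnumValue struct {
	Enum    string
	Tag     string
	Payload []Value
}

// Equals compares enum, tag, and then each payload element in order.
func (e *EnumValue) Equals(other Value) bool {
	o, ok := other.(*EnumValue)
	if !ok || e.Enum != o.Enum || e.Tag != o.Tag || len(e.Payload) != len(o.Payload) {
		return false
	}
	for i := range e.Payload {
		if !e.Payload[i].Equals(o.Payload[i]) {
			return false
		}
	}
	return true
}

// DebugString renders payloads first, then Enum.tag (postfix, by assumption).
func (e *EnumValue) DebugString() string {
	parts := make([]string, 0, len(e.Payload)+1)
	for _, p := range e.Payload {
		parts = append(parts, p.DebugString())
	}
	parts = append(parts, e.Enum+"."+e.Tag)
	return strings.Join(parts, " ")
}

func main() {
	failed := &EnumValue{Enum: "CmdResult", Tag: "failed", Payload: []Value{Int(404), Str("not found")}}
	timeout := &EnumValue{Enum: "CmdResult", Tag: "timeout"}
	fmt.Println(failed.DebugString()) // 404 "not found" CmdResult.failed
	fmt.Println(failed.Equals(timeout)) // false
}
```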

10. Conciseness sugar for common cases

+

Ranked by value-for-effort. Tiers A–C are cheap and high-value; D–F are speculative until + usage demands them.

+ +

A. The nullary one-liner (already the base form)

+
enum Env   = dev | staging | prod
+enum Level = debug | info | warn | error
+

This is the everyday "configuration option" case, and it is already as terse as it can be.

+ +

B. Auto string backing + derived decode / encode / values

+

For an all-nullary enum, auto-back each constructor by its name and derive three functions, since config + crosses a string boundary. Override the backing with = when the wire format differs:

+
enum Method = get = "GET" | post = "POST" | put = "PUT"
+
+"GET" Method.decode    # ( str -- Maybe[Method] )   runtime-validated
+@m    Method.encode    # ( Method -- str )
+Method.values          # ( -- [Method] )            all members, in order
+

An all-nullary, all-backed enum can be represented at runtime as its backing string (free + serialization, no tag object) — the Rust "fieldless variants get a compact representation" move. + Payload-carrying enums fall back to the tagged value of §9.

+
$CONFIG_LEVEL Level.decode match
+    just lvl : @lvl configureLogging,
+    none     : $"bad level: {$CONFIG_LEVEL}; try {Level.values}" wl 1 exit,
+end
+ +

The backing string is wire-only — two paths to a value, no implicit coercion

+

Decided (V1): the only ways to produce an enum value are a constructor or + decode. The backing string is purely a serialization detail, never a way to name a + member in code — the same separation Rust/Serde draws (you write Method::Get in code; + "GET" is the wire format's business).

+ + + + + + +
| Path | Signature | For |
| constructor | `get : ( -- Method)` | a compile-time-known member — the normal in-source form |
| decode | `( str -- Maybe[Method])` | a runtime string of unknown value, validated at the boundary |
+

A bare str is not a Method, and + "GET" as Method is rejected: with no literal/singleton types, the checker + cannot verify a plain str is a valid backing, so an as here would be an + unchecked re-tag that could mint a Method matching no arm. This keeps the language's + existing "no implicit coercions" invariant intact and the structural→nominal boundary crisp: + constructors are the only way to make one. It also costs no source terseness — the concise form + was never "GET", it is the constructor get.
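The two-path rule (constructor in source, `decode` at the boundary) has a close analogue in Go, sketched below. Everything here is illustrative: `Method`, `methodBacking`, `decodeMethod`, and `encode` are hypothetical names, and the `(value, ok)` return stands in for `Maybe[Method]`.

```go
package main

import "fmt"

// Method is a closed set of members; in source code you name a member
// directly (the constructor path).
type Method int

const (
	Get Method = iota
	Post
	Put
)

// methodBacking holds the wire strings; it is only consulted at the boundary.
var methodBacking = [...]string{Get: "GET", Post: "POST", Put: "PUT"}

// decodeMethod validates a runtime string. The second result plays the role
// of Maybe: false means the string names no member.
func decodeMethod(s string) (Method, bool) {
	for m, wire := range methodBacking {
		if wire == s {
			return Method(m), true
		}
	}
	return 0, false
}

// encode returns the wire string for a member.
func (m Method) encode() string { return methodBacking[m] }

func main() {
	if m, ok := decodeMethod("GET"); ok {
		fmt.Println(m == Get, m.encode()) // true GET
	}
	_, ok := decodeMethod("GXT")
	fmt.Println(ok) // false
}
```

Because there is no conversion from a bare string to `Method` other than `decodeMethod`, an invalid string can never become a value that matches no arm, which is the property the rejection of `"GET" as Method` protects.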

+

Deferred, additive: if literal/singleton types are ever added for other reasons, a + checked "GET" as Method (compile-time membership-verified, "GXT" as Method + an error) becomes a safe, optional nicety that can be layered on then with no migration. Implicit + bare-literal coercion is explicitly not a goal — it would be mshell's first + implicit coercion and would leak the wire format into program logic.

+ +

C. Shape payloads → free named destructuring

+

Let a variant's payload be a shape; this reuses TKShape and the existing + { 'k': name } dict-pattern matcher (including optional ?: fields) for both + construction and destructuring:

+
enum Event = click {x: int, y: int} | key {code: int, shift?: bool} | close
+
+ev match
+    click {x: x, y: y} : $"click {@x},{@y}" wl,
+    key   {code: c}    : @c handleKey,
+    close              : "bye" wl,
+end
+ +

D. Combined arms (|) for fan-in

+
status match
+    pending | running : "in progress" wl,
+    done              : "complete" wl,
+    failed e          : @e wl,
+end
+ +

E. Generic enums subsume the built-ins

+
enum Result[t, e] = ok t | err e
+# and Maybe[t] = just t | none is simply built in
+ +

F. ?-style propagation (speculative)

+

mshell already has guard-style return. For a designated Result-shaped enum, an + unwrap-or-early-return operator (Rust's ?) could compress + @x parseInt match just v : @v, none : return end to something like @x parseInt!?. + Defer until Result is idiomatic.

+ +

11. End-to-end example

+
enum Mode = read | write | readwrite = "rw"
+
+def openFile (path Mode -- Handle)
+    mode!
+    @mode match
+        read      : @path openRead,
+        write     : @path openWrite,
+        readwrite : @path openRW,
+    end
+end
+
+# at the boundary: a string from argv, validated exactly once
+$1 Mode.decode match
+    just m : somePath @m openFile use,
+    none   : $"--mode must be one of {Mode.values}" wl 1 exit,
+end
+

The checker guarantees openFile handles every mode, that no unvalidated string reaches it, + and that adding an append constructor breaks the build at openFile until it is + handled.

+ +

12. Open questions

+ + +
+

Companion to ai/type_checker.md (the implemented type-system design) and + design/optional_dict_keys.html. File locations and line numbers reference the tree at time + of writing and may drift. Earlier drafts of this doc compared three options (literal types, backed enum, + payload enum); §3–§5 record why the generative sum type is the single primitive those collapse + into.

+ +
+
+
diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go
index 315f805f..4c247001 100644
--- a/mshell/Evaluator.go
+++ b/mshell/Evaluator.go
@@ -325,6 +325,34 @@ type EvalState struct {
 
 	defIndex map[string]int
 	defIndexLen int
+
+	// EnumMembers maps a member name to its enum's declared name. Populated
+	// from `enum` declarations (RegisterEnums) before evaluation; member
+	// names are unique across enums in v1, so this flat member -> enum
+	// lookup is enough to construct a value from a bare member word.
+	EnumMembers map[string]string
+}
+
+// RegisterEnums scans parse items for `enum` declarations and records each
+// member, so a bare member word can be constructed at evaluation time. Called
+// once before top-level evaluation; mirrors the checker's enum pre-pass so the
+// two agree regardless of declaration order.
+func (state *EvalState) RegisterEnums(items []MShellParseItem) {
+	for _, item := range items {
+		d, ok := item.(*MShellEnumDecl)
+		if !ok {
+			continue
+		}
+		if state.EnumMembers == nil {
+			state.EnumMembers = make(map[string]string)
+		}
+		for _, m := range d.Members {
+			if _, exists := state.EnumMembers[m]; exists {
+				continue
+			}
+			state.EnumMembers[m] = d.Name
+		}
+	}
 }
 
 // RebuildDefinitionIndex records the first index for each name, matching
@@ -838,6 +866,11 @@ func (state *EvalState) processToken(token MShellParseItem, frame *EvaluationFra
 		// Static-only: type declarations have no runtime effect by design.
 		return SimpleSuccess()
 
+	case *MShellEnumDecl:
+		// Static-only: enum declarations have no runtime effect; members are
+		// pre-registered via RegisterEnums.
+		return SimpleSuccess()
+
 	case *MShellAsCast:
 		// Static-only: `as` is a checker hint; no runtime work.
 		return SimpleSuccess()
@@ -5776,6 +5809,14 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu
 			return SimpleSuccess()
 		}
 
+		// Enum constructor: a bare member word pushes its enum value.
+		if state.EnumMembers != nil {
+			if enumName, ok := state.EnumMembers[t.Lexeme]; ok {
+				stack.Push(&MShellEnum{EnumName: enumName, Member: t.Lexeme})
+				return SimpleSuccess()
+			}
+		}
+
 		if t.Lexeme == "stack" {
 			// Print current stack
 			fmt.Fprint(os.Stderr, stack.String())
diff --git a/mshell/Lexer.go b/mshell/Lexer.go
index 0a6d43cd..5c31275a 100644
--- a/mshell/Lexer.go
+++ b/mshell/Lexer.go
@@ -109,6 +109,7 @@ const (
 	TRY
 	FAIL_KEYWORD
 	PURE
+	ENUM
 )
 
 func (t TokenType) String() string {
@@ -283,6 +284,8 @@ func (t TokenType) String() string {
 		return "AS"
 	case TYPE:
 		return "TYPE"
+	case ENUM:
+		return "ENUM"
 	case TRY:
 		return "TRY"
 	case FAIL_KEYWORD:
@@ -530,6 +533,14 @@ func (l *Lexer) literalOrKeywordType() TokenType {
 				}
 				return l.checkKeyword(2, "se", ELSE)
 			case 'n':
+				if l.curLen() > 2 {
+					switch l.input[l.start+2] {
+					case 'd':
+						return l.checkKeyword(3, "", END)
+					case 'u':
+						return l.checkKeyword(3, "m", ENUM)
+					}
+				}
 				return l.checkKeyword(2, "d", END)
 			}
 		}
diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go
index f68309f6..8786be12 100644
--- a/mshell/MShellObject.go
+++ b/mshell/MShellObject.go
@@ -341,6 +341,60 @@ func (n MShellNull) CastString() (string, error) {
 
 // }}}
 
+// Enum {{{
+
+// MShellEnum is a value of a user-declared `enum` (a generative tagged sum
+// type). In v1 every member is nullary, so the value is just the enum's
+// declared name plus the member's name. Member names are unique across enums,
+// so the member is the identity; the enum name rides along for diagnostics and
+// `match`.
+type MShellEnum struct {
+	EnumName string
+	Member string
+}
+
+func (e *MShellEnum) TypeName() string { return e.EnumName }
+func (e *MShellEnum) IsCommandLineable() bool { return true }
+func (e *MShellEnum) IsNumeric() bool { return false }
+func (e *MShellEnum) FloatNumeric() float64 { return 0 }
+func (e *MShellEnum) CommandLine() string { return e.Member }
+func (e *MShellEnum) DebugString() string { return e.EnumName + "." + e.Member }
+
+func (e *MShellEnum) Index(index int) (MShellObject, error) {
+	return nil, fmt.Errorf("Cannot index into an enum.\n")
+}
+
+func (e *MShellEnum) SliceStart(startInclusive int) (MShellObject, error) {
+	return nil, fmt.Errorf("Cannot slice an enum.\n")
+}
+
+func (e *MShellEnum) SliceEnd(end int) (MShellObject, error) {
+	return nil, fmt.Errorf("Cannot slice an enum.\n")
+}
+
+func (e *MShellEnum) Slice(startInc int, endExc int) (MShellObject, error) {
+	return nil, fmt.Errorf("Cannot slice an enum.\n")
+}
+
+func (e *MShellEnum) ToJson() string { return fmt.Sprintf("%q", e.Member) }
+func (e *MShellEnum) ToString() string { return e.Member }
+func (e *MShellEnum) IndexErrStr() string { return "" }
+
+func (e *MShellEnum) Concat(other MShellObject) (MShellObject, error) {
+	return nil, fmt.Errorf("Cannot concatenate an enum.\n")
+}
+
+func (e *MShellEnum) Equals(other MShellObject) (bool, error) {
+	if o, ok := other.(*MShellEnum); ok {
+		return e.EnumName == o.EnumName && e.Member == o.Member, nil
+	}
+	return false, nil
+}
+
+func (e *MShellEnum) CastString() (string, error) { return e.Member, nil }
+
+// }}}
+
 // Date time {{{
 
 type MShellDateTime struct {
diff --git a/mshell/Main.go b/mshell/Main.go
index eba1046e..42d55c6f 100644
--- a/mshell/Main.go
+++ b/mshell/Main.go
@@ -857,6 +857,10 @@ func main() {
 		os.Exit(0)
 	}
 
+	// Register enum constructors before evaluation so bare member words can
+	// be constructed (mirrors the checker's enum pre-pass).
+	state.RegisterEnums(file.Items)
+
 	callStackItem := CallStackItem{
 		MShellParseItem: nil,
 		Name: "main",
diff --git a/mshell/Parser.go b/mshell/Parser.go
index f510cd7d..cd91bbef 100644
--- a/mshell/Parser.go
+++ b/mshell/Parser.go
@@ -680,6 +680,12 @@ func (parser *MShellParser) ParseFile() (file *MShellFile, err error) {
 				return file, err
 			}
 			file.Items = append(file.Items, decl)
+		case ENUM:
+			decl, err := parser.ParseEnumDecl()
+			if err != nil {
+				return file, err
+			}
+			file.Items = append(file.Items, decl)
 		case VER:
 			if file.Version != "" {
 				return file, fmt.Errorf("%d:%d: Duplicate VER directive; version already set to %q", parser.curr.Line, parser.curr.Column, file.Version)
diff --git a/mshell/Type.go b/mshell/Type.go
index 08d725ae..2934192f 100644
--- a/mshell/Type.go
+++ b/mshell/Type.go
@@ -59,6 +59,11 @@ const (
 	TKGrid // Extra = index into gridSchemas (0 = unknown schema)
 	TKGridView // Extra = index into gridSchemas (0 = unknown schema)
 	TKGridRow // Extra = index into gridSchemas (0 = unknown schema)
+
+	// Generative tagged sum type (a user `enum`). Nominal: identity is the
+	// declaration NameId, not the member set. A = decl NameId; Extra = index
+	// into enumVariants. See design/literal_or_enum_typing.html.
+	TKEnum
 )
 
 // String returns a debug name for a TypeKind.
@@ -94,6 +99,8 @@ func (k TypeKind) String() string {
 		return "GridView"
 	case TKGridRow:
 		return "GridRow"
+	case TKEnum:
+		return "Enum"
 	}
 	return "Unknown"
 }
@@ -130,6 +137,15 @@ type ShapeField struct {
 	Optional bool
 }
 
+// EnumVariant is one constructor of a TKEnum. Payload is the (ordered)
+// list of payload types the constructor carries; it is empty for a
+// nullary member (the only kind in v1). Order is meaningful — it sets
+// member order for diagnostics and exhaustiveness.
+type EnumVariant struct {
+	Name NameId
+	Payload []TypeId
+}
+
 // GridSchemaCol is one column in a TKGrid / TKGridView / TKGridRow schema.
 // Order is meaningful (grids have column order).
 type GridSchemaCol struct {
@@ -176,6 +192,7 @@ type TypeArena struct {
 	unionMembers [][]TypeId // each slice is sorted, deduped
 	gridSchemas []GridSchema
 	gridSchemaCons map[string]uint32
+	enumVariants [][]EnumVariant
 }
 
 // NewTypeArena constructs an arena pre-populated with the primitive ids
@@ -213,6 +230,8 @@ func NewTypeArena() *TypeArena {
 	a.shapeFields = append(a.shapeFields, nil)
 	a.quoteSigs = append(a.quoteSigs, QuoteSig{})
 	a.overloadedQuoteSigs = append(a.overloadedQuoteSigs, nil)
+	// Reserve enumVariants[0] as a placeholder so non-zero Extra is meaningful.
+	a.enumVariants = append(a.enumVariants, nil)
 	return a
 }
 
@@ -352,6 +371,44 @@ func (a *TypeArena) MakeOverloadedQuote(sigs []QuoteSig) TypeId {
 	return id
 }
 
+// MakeEnum returns the canonical TypeId for a user enum named by nameId,
+// with the given variants. Identity is nominal: it keys on the declaration
+// NameId alone, so two enums with identical members are distinct types and
+// the same enum always interns to the same TypeId. A duplicate declaration
+// is caught higher up (DeclareEnum); reaching here with a name already
+// interned returns the existing id.
+func (a *TypeArena) MakeEnum(nameId NameId, variants []EnumVariant) TypeId {
+	key := "E:" + strconv.FormatUint(uint64(nameId), 10)
+	if id, ok := a.cons[key]; ok {
+		return id
+	}
+	cp := make([]EnumVariant, len(variants))
+	copy(cp, variants)
+	idx := uint32(len(a.enumVariants))
+	a.enumVariants = append(a.enumVariants, cp)
+	id := a.append(TypeNode{Kind: TKEnum, A: uint32(nameId), Extra: idx})
+	a.cons[key] = id
+	return id
+}
+
+// EnumNameId returns the declaration NameId of an enum type.
+func (a *TypeArena) EnumNameId(id TypeId) NameId {
+	n := a.Node(id)
+	if n.Kind != TKEnum {
+		panic("TypeArena.EnumNameId: not an enum")
+	}
+	return NameId(n.A)
+}
+
+// EnumVariants returns the variants of an enum type. Caller must not mutate.
+func (a *TypeArena) EnumVariants(id TypeId) []EnumVariant { + n := a.Node(id) + if n.Kind != TKEnum { + panic("TypeArena.EnumVariants: not an enum") + } + return a.enumVariants[n.Extra] +} + // MakeGrid returns the canonical TypeId for a grid type. schemaIdx of 0 // denotes "schema unknown" (the V1 default until schema tracking lands). func (a *TypeArena) MakeGrid(schemaIdx uint32) TypeId { diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index 87716744..960b8606 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -94,6 +94,14 @@ func (c *Checker) RegisterStdlibSigs(defs []MShellDefinition) { // parse tree driving the type stack. Error accumulation lives on the // Checker. func (c *Checker) CheckProgram(file *MShellFile) { + // Pre-pass 0: register all `enum` declarations (nominal types + + // constructor words). Done before `type` decls so a `type` body may + // reference an enum by name. + for _, item := range file.Items { + if d, ok := item.(*MShellEnumDecl); ok { + c.DeclareEnum(d) + } + } // Pre-pass 1: register all `type` declarations. for _, item := range file.Items { if d, ok := item.(*MShellTypeDecl); ok { diff --git a/mshell/TypeChecker.go b/mshell/TypeChecker.go index e62263b4..16759ede 100644 --- a/mshell/TypeChecker.go +++ b/mshell/TypeChecker.go @@ -930,6 +930,11 @@ func (c *Checker) unify(got, want TypeId) bool { return c.unifyQuote(gn, wn) case TKOverloadedQuote: return false + case TKEnum: + // Nominal: two enum types unify only when identical. Equal ids were + // already accepted at the top of unify; reaching here means distinct + // enums, which never unify. + return false case TKGrid, TKGridView, TKGridRow: // Phase-3 grids are opaque. 
Equality-by-id is the only way two grid // types match; if we got here with same kind but different ids, diff --git a/mshell/TypeEnum.go b/mshell/TypeEnum.go new file mode 100644 index 00000000..ff8fda23 --- /dev/null +++ b/mshell/TypeEnum.go @@ -0,0 +1,67 @@ +package main + +// Phase 1 enum support: registering `enum Name = a | b | c` declarations. +// +// An enum is a generative tagged sum type; v1 members are all nullary. +// Declaring an enum (1) registers a nominal TKEnum type in the type +// environment under its name (so the name resolves in type positions like +// def signatures), and (2) registers each member as a nullary constructor +// word `( -- Enum)` in the name-builtin table, so a bare member reference +// type-checks as construction through the ordinary resolveAndApply path. +// +// Members share the global word namespace: a member name that collides with +// an existing builtin / def / another enum member is rejected. See +// design/literal_or_enum_typing.html and ai/enum_implementation_plan.md. + +// DeclareEnum registers the type and constructors for one `enum` declaration. 
+func (c *Checker) DeclareEnum(d *MShellEnumDecl) { + if IsReservedTypeName(d.Name) { + c.errors = append(c.errors, TypeError{Kind: TErrReservedTypeName, Pos: d.NameToken, Name: d.Name}) + return + } + if c.typeEnv == nil { + c.typeEnv = make(map[NameId]TypeId, 8) + } + nameId := c.names.Intern(d.Name) + if _, exists := c.typeEnv[nameId]; exists { + c.errors = append(c.errors, TypeError{Kind: TErrDuplicateTypeName, Pos: d.NameToken, Name: d.Name}) + return + } + + type member struct { + name string + tok Token + } + uniq := make([]member, 0, len(d.Members)) + variants := make([]EnumVariant, 0, len(d.Members)) + seen := make(map[string]bool, len(d.Members)) + for i, m := range d.Members { + tok := d.MemberToks[i] + if seen[m] { + c.errors = append(c.errors, TypeError{ + Kind: TErrTypeParse, Pos: tok, + Hint: "duplicate enum member '" + m + "' in '" + d.Name + "'", + }) + continue + } + seen[m] = true + uniq = append(uniq, member{name: m, tok: tok}) + variants = append(variants, EnumVariant{Name: c.names.Intern(m)}) + } + + enumType := c.arena.MakeEnum(nameId, variants) + c.typeEnv[nameId] = enumType + + // Register each (unique) member as a nullary constructor word. 
+ for _, u := range uniq { + mid := c.names.Intern(u.name) + if _, exists := c.nameBuiltins[mid]; exists { + c.errors = append(c.errors, TypeError{ + Kind: TErrTypeParse, Pos: u.tok, + Hint: "enum member '" + u.name + "' conflicts with an existing definition or builtin of the same name", + }) + continue + } + c.nameBuiltins[mid] = append(c.nameBuiltins[mid], QuoteSig{Outputs: []TypeId{enumType}}) + } +} diff --git a/mshell/TypeError.go b/mshell/TypeError.go index 66cc38c3..b047b9c6 100644 --- a/mshell/TypeError.go +++ b/mshell/TypeError.go @@ -278,6 +278,8 @@ func FormatType(arena *TypeArena, names *NameTable, id TypeId) string { return "GridView" case TKGridRow: return "GridRow" + case TKEnum: + return names.Name(NameId(n.A)) } return fmt.Sprintf("<%s #%d>", n.Kind, uint32(id)) } diff --git a/mshell/TypeParseIntegration.go b/mshell/TypeParseIntegration.go index 20979c26..1ec8e1f8 100644 --- a/mshell/TypeParseIntegration.go +++ b/mshell/TypeParseIntegration.go @@ -35,6 +35,74 @@ func (d *MShellTypeDecl) DebugString() string { func (d *MShellTypeDecl) GetStartToken() Token { return d.StartTok } func (d *MShellTypeDecl) GetEndToken() Token { return d.NameToken } +// MShellEnumDecl is a top-level `enum Name = c1 | c2 | ...` declaration: +// a generative tagged sum type. In v1 every member is nullary (a bare +// constructor name); payload-carrying variants land in a later phase. 
+type MShellEnumDecl struct { + Name string + NameToken Token + StartTok Token // the ENUM keyword + Members []string + MemberToks []Token +} + +func (d *MShellEnumDecl) ToJson() string { + parts := make([]string, len(d.Members)) + for i, m := range d.Members { + parts[i] = fmt.Sprintf("%q", m) + } + return fmt.Sprintf("{\"kind\": \"enumDecl\", \"name\": %q, \"members\": [%s]}", d.Name, strings.Join(parts, ", ")) +} + +func (d *MShellEnumDecl) DebugString() string { + return fmt.Sprintf("enum %s = %s", d.Name, strings.Join(d.Members, " | ")) +} + +func (d *MShellEnumDecl) GetStartToken() Token { return d.StartTok } +func (d *MShellEnumDecl) GetEndToken() Token { + if len(d.MemberToks) > 0 { + return d.MemberToks[len(d.MemberToks)-1] + } + return d.NameToken +} + +// ParseEnumDecl handles a top-level `enum Name = member (| member)*`. The +// ENUM keyword is the current token on entry; on return, parser.curr is +// positioned past the last member. Members must be bare identifiers +// (LITERAL); a keyword used as a member name is a parse error. 
+func (parser *MShellParser) ParseEnumDecl() (*MShellEnumDecl, error) { + startTok := parser.curr + parser.NextToken() // consume ENUM + if parser.curr.Type != LITERAL { + return nil, fmt.Errorf("%d:%d: expected an enum name after 'enum', got %s", + parser.curr.Line, parser.curr.Column, parser.curr.Type) + } + nameTok := parser.curr + parser.NextToken() // consume name + if parser.curr.Type != EQUALS { + return nil, fmt.Errorf("%d:%d: expected '=' in enum declaration, got %s", + parser.curr.Line, parser.curr.Column, parser.curr.Type) + } + parser.NextToken() // consume = + + decl := &MShellEnumDecl{Name: nameTok.Lexeme, NameToken: nameTok, StartTok: startTok} + for { + if parser.curr.Type != LITERAL { + return nil, fmt.Errorf("%d:%d: expected an enum member name (an identifier), got %s", + parser.curr.Line, parser.curr.Column, parser.curr.Type) + } + decl.Members = append(decl.Members, parser.curr.Lexeme) + decl.MemberToks = append(decl.MemberToks, parser.curr) + parser.NextToken() // consume member + if parser.curr.Type == PIPE { + parser.NextToken() // consume | + continue + } + break + } + return decl, nil +} + // MShellAsCast is a ` as ` postfix cast. type MShellAsCast struct { AsToken Token diff --git a/mshell/TypeUnify.go b/mshell/TypeUnify.go index d5b9a5b7..3f46d92d 100644 --- a/mshell/TypeUnify.go +++ b/mshell/TypeUnify.go @@ -158,6 +158,10 @@ func (w *typeRewriter) mapType(t TypeId, skip map[TypeVarId]struct{}) TypeId { return t } return w.arena.MakeCommand(argv, CommandCaptureMode(n.B), CommandCaptureMode(n.Extra)) + case TKEnum: + // Nominal and (in v1) ground: no type variables to rewrite, and + // identity is the declaration name, so return unchanged. 
+ return t case TKQuote: sig, changed := w.mapSig(w.arena.quoteSigs[n.Extra], skip) if !changed { @@ -342,6 +346,14 @@ func (a *TypeArena) walkTypeVars(t TypeId, visit func(TypeVarId) bool) bool { return true } } + case TKEnum: + for _, v := range a.enumVariants[n.Extra] { + for _, p := range v.Payload { + if a.walkTypeVars(p, visit) { + return true + } + } + } case TKQuote: if a.walkSigVars(a.quoteSigs[n.Extra], visit) { return true From 27821a84ae03bda5bd4dcf54056c9b7af1a1047e Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Sun, 28 Jun 2026 22:14:44 -0500 Subject: [PATCH 02/32] More Enum --- CHANGELOG.md | 11 ++ ai/enum_implementation_plan.md | 8 ++ design/literal_or_enum_typing.html | 29 +++--- doc/mshell.md | 33 ++++++ doc/type_system.inc.html | 50 ++++++++++ mshell/Evaluator.go | 63 ++++++++++-- mshell/MShellObject.go | 33 ++++-- mshell/Type.go | 18 +++- mshell/TypeBranch.go | 33 +++++- mshell/TypeCheckProgram.go | 84 +++++++++++++++- mshell/TypeEnum.go | 61 ++++++++---- mshell/TypeEnum_test.go | 105 ++++++++++++++++++++ mshell/TypeParseIntegration.go | 55 +++++++--- mshell/TypeUnify.go | 4 +- tests/success/enum.msh | 33 ++++++ tests/success/enum.msh.stdout | 4 + tests/typecheck_fail/enum_distinct.msh | 11 ++ tests/typecheck_fail/enum_nonexhaustive.msh | 8 ++ 18 files changed, 565 insertions(+), 78 deletions(-) create mode 100644 mshell/TypeEnum_test.go create mode 100644 tests/success/enum.msh create mode 100644 tests/success/enum.msh.stdout create mode 100644 tests/typecheck_fail/enum_distinct.msh create mode 100644 tests/typecheck_fail/enum_nonexhaustive.msh diff --git a/CHANGELOG.md b/CHANGELOG.md index f95383f0..e9aca0d6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added +- `enum` declarations: a generative tagged sum type, written + `enum Name = a | b | c`, mirroring `type`. 
Members are bare constructor names, + optionally carrying a payload in parentheses: `enum CmdResult = ok(str) | + failed(int str) | timeout`. A bare member word constructs a value (consuming + any payload from the stack, e.g. `404 "x" failed`), and `match` dispatches on + members with binding (`failed code msg : ...`) and exhaustiveness checking — a + match that omits a member is rejected unless it has a `_` arm. Enums are + nominal: two enums with the same members are distinct types. Member names are + identifiers (not keywords) and are unique across all enums. Payloads may + reference the enum itself, so recursive enums like + `enum Tree = leaf(int) | node(Tree Tree)` are supported. - Optional fields in dictionary shape types, written `name?: T` (and `"name"?: T` in `def` signatures). An optional field may be absent from a value; when present, its value is still type-checked. This lets option-style diff --git a/ai/enum_implementation_plan.md b/ai/enum_implementation_plan.md index 5dd3c5ea..1e74346d 100644 --- a/ai/enum_implementation_plan.md +++ b/ai/enum_implementation_plan.md @@ -3,6 +3,14 @@ Companion to `design/literal_or_enum_typing.html` (the design + rationale). This is the file-by-file build plan. Plans live here in `ai/`; the design lives in `design/`. +**Status: implemented on branch `enum-types`** — nullary + payload-carrying members, +construction, nominal distinctness, and `match` (member dispatch, payload binding, +exhaustiveness) all ship in one PR. Payloads use a parenthesized list (`member(T..)`) +rather than the space-separated form originally sketched, because mshell has no statement +terminator and space-separated payloads are ambiguous against following code. +Out of scope, as agreed: derived `decode`/`encode`/`values`, backing strings, qualified +`Enum.member` names, generics, and `Result` (Maybe suffices). JSON stays a structural union. 
+ ## Scope & non-goals In scope (a generative tagged sum type declared with `enum`, inline `= a | b | c`): diff --git a/design/literal_or_enum_typing.html b/design/literal_or_enum_typing.html index 7e6d1150..2ca5f48b 100644 --- a/design/literal_or_enum_typing.html +++ b/design/literal_or_enum_typing.html @@ -295,25 +295,26 @@

Keyword

Declaration form — inline, mirroring type

The chosen surface is a one-token delta from the existing type Name = A | B: swap - type for enum, and each arm is a constructor name followed by its - payload type expression (zero or more), |-separated.

+ type for enum, and each arm is a constructor name with an optional + parenthesized payload type list, |-separated. A nullary member takes no + parentheses, so a plain enumeration is identical to the approved form.

enum Mode   = read | write | readwrite           # nullary: a plain enumeration
-enum Shape  = circle float | rect float float | point
-enum CmdResult = ok str | failed int str | timeout
+enum Shape = circle(float) | rect(float float) | point +enum CmdResult = ok(str) | failed(int str) | timeout -

Generic enums use the same [..] parameter syntax as Maybe[t], and thereby - generalize the built-ins:

-
enum Result[t, e] = ok t | err e
-# Maybe[t] is the built-in `enum Maybe[t] = just t | none`
+

The parentheses delimit the payload because mshell has no statement terminator: with + space-separated payloads, a nullary member followed by ordinary code (... | blue then + green describe) is ambiguous — the parser cannot tell whether green is a + payload type of blue or the next statement. Parentheses make the member set + self-delimiting. (Recursive payloads work: enum Tree = leaf(int) | node(Tree Tree).)
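
As an illustration of why the parenthesized form is self-delimiting, here is a minimal sketch of a parser for `enum Name = arm (| arm)*` over a pre-split token list. It is not the parser from this patch: `ParseEnum`, `Member`, and `isIdent` are hypothetical names, tokens are plain strings, and rejecting an empty `()` is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"unicode"
)

// Member is one enum arm: a constructor name plus its payload type names.
type Member struct {
	Name    string
	Payload []string
}

// isIdent reports whether s looks like a bare identifier.
func isIdent(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !(unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_') {
			return false
		}
	}
	return true
}

// ParseEnum parses tokens beginning at "enum". It returns the enum name,
// its members, and the index of the first token it did not consume.
func ParseEnum(toks []string) (string, []Member, int, error) {
	i := 0
	next := func() string {
		if i < len(toks) {
			return toks[i]
		}
		return "" // end of input
	}
	if next() != "enum" {
		return "", nil, i, fmt.Errorf("expected 'enum', got %q", next())
	}
	i++
	name := next()
	if !isIdent(name) {
		return "", nil, i, fmt.Errorf("expected an enum name, got %q", name)
	}
	i++
	if next() != "=" {
		return "", nil, i, fmt.Errorf("expected '=', got %q", next())
	}
	i++
	var members []Member
	for {
		if !isIdent(next()) {
			return "", nil, i, fmt.Errorf("expected a member name, got %q", next())
		}
		m := Member{Name: next()}
		i++
		if next() == "(" { // optional payload list
			i++
			for next() != ")" {
				if !isIdent(next()) {
					return "", nil, i, fmt.Errorf("expected a payload type, got %q", next())
				}
				m.Payload = append(m.Payload, next())
				i++
			}
			if len(m.Payload) == 0 {
				return "", nil, i, fmt.Errorf("empty payload list for %q", m.Name)
			}
			i++ // consume ")"
		}
		members = append(members, m)
		if next() == "|" {
			i++
			continue
		}
		return name, members, i, nil
	}
}

func main() {
	// "green describe" is ordinary code after the declaration; the parser
	// stops at it because no "|" or "(" follows "blue".
	toks := []string{"enum", "Color", "=", "red", "|", "blue", "green", "describe"}
	name, members, stop, err := ParseEnum(toks)
	fmt.Println(name, members, toks[stop:], err)
}
```

Because a payload can only start with `(`, the token after a nullary member is never mistaken for a payload type, which is the ambiguity described above.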

-

Grammar, stated against the existing type-expression parser - (parseTypeExpr / TypeUnionExpr):

+

Grammar:

    -
  • enum Name ([ params ])opt - = arm (| arm)*
  • -
  • arm ::= constructorName typeExpr*   (the constructor name is a bare identifier; the rest of the - arm is parsed with the ordinary type-expression productions)
  • +
  • enum Name = arm (| arm)*
  • +
  • arm ::= constructorName ( "(" typePrimary+ ")" )opt   (the + constructor name is a bare identifier; each payload is a type primary, so a union payload must be + named via a type alias)

The only new wrinkle vs. a structural union is that an arm's first token is a binding occurrence (a constructor being declared) rather than a reference to an existing type. The diff --git a/doc/mshell.md b/doc/mshell.md index f694af66..a2b7e741 100644 --- a/doc/mshell.md +++ b/doc/mshell.md @@ -628,6 +628,39 @@ An operation that is valid for only some members is a type error — dividing an For more detail, see the generated Type System help page. +## Enums + +An `enum` declares a generative tagged sum type, written like `type` with members separated by `|`: + +``` +enum Color = red | green | blue +``` + +A bare member name constructs that value (`green` pushes a `Color`). +Member names are identifiers (not keywords) and are unique across all enums. +Unlike a union, an enum is nominal — two enums with the same members are distinct types. + +Members may carry a payload in parentheses; the constructor consumes those values from the stack. +A nullary member takes no parentheses. Payloads may reference the enum itself (recursive enums). + +``` +enum CmdResult = ok(str) | failed(int str) | timeout +404 "not found" failed # ( int str -- CmdResult ) + +enum Tree = leaf(int) | node(Tree Tree) +``` + +`match` dispatches on the member and binds payload values (like `just v`). +A match must cover every member or include a `_` arm; omitting a member is a static error. + +``` +result match + ok out : @out wl, + failed c e : $"{@e} ({@c})" wl, + timeout : "timed out" wl, +end +``` + ## Definitions Definitions use `def` with an optional metadata dictionary before the type signature. diff --git a/doc/type_system.inc.html b/doc/type_system.inc.html index 6aa26492..3c7d50b7 100644 --- a/doc/type_system.inc.html +++ b/doc/type_system.inc.html @@ -11,6 +11,7 @@

  • Type Expressions
  • Dictionaries
  • Lists
  • +
  • Enums
  • Quotations
  • Control Flow
  • Current Boundaries
  • @@ -339,6 +340,55 @@

    Heterogeneous Lists can be understood as [int | str], but the checker does not use index :0: to prove "this position is always int" unless the value is converted or asserted in another way.

    +

    Enums

    + +

    +An enum declares a named type whose values are a fixed set of members (constructors). +It is written like a type declaration, with the members separated by |. +Unlike a union, an enum is generative: each member is a brand-new value that the language tags, so two enums with the same members are still distinct types, and a member can carry data that another member does not. +

    + +
    enum Color = red | green | blue
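
To make the nominal (generative) rule concrete, here is a small sketch of interning an enum type by its declaration name alone, so identical member lists under different names stay distinct. It only loosely mirrors the `MakeEnum` idea in this patch: `Arena`, `TypeID`, and `Unify` are hypothetical simplifications, and the sketch ignores the rule that member names are unique across enums.

```go
package main

import "fmt"

// TypeID is a hypothetical handle into the arena's type table.
type TypeID int

// Arena interns enum types keyed by declaration name only.
type Arena struct {
	byName  map[string]TypeID
	members [][]string
}

func NewArena() *Arena {
	return &Arena{byName: make(map[string]TypeID)}
}

// MakeEnum returns the existing id for name, or creates a new one.
// The member list plays no part in identity.
func (a *Arena) MakeEnum(name string, members []string) TypeID {
	if id, ok := a.byName[name]; ok {
		return id
	}
	id := TypeID(len(a.members))
	a.members = append(a.members, append([]string(nil), members...))
	a.byName[name] = id
	return id
}

// Unify is nominal: two enum types match only when their ids are equal.
func Unify(got, want TypeID) bool { return got == want }

func main() {
	a := NewArena()
	color := a.MakeEnum("Color", []string{"red", "green", "blue"})
	paint := a.MakeEnum("Paint", []string{"red", "green", "blue"})
	fmt.Println(Unify(color, color), Unify(color, paint)) // true false
}
```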
    + +

    +A member name is written bare to construct that value. +Member names are ordinary identifiers (they may not be language keywords), and every member name is unique across all enums. +

    + +
    green                  # a value of type Color
    + +

    +Members may carry a payload: a parenthesized list of types the member holds. +The constructor then consumes those values from the stack, just like any other word. +A member with no payload is nullary and takes no parentheses. +

    + +
    enum CmdResult = ok(str) | failed(int str) | timeout
    +
    +"output"        ok       # ( str -- CmdResult )
    +404 "not found" failed   # ( int str -- CmdResult )
    +timeout                  # ( -- CmdResult )
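
The stack effects above can be sketched as follows: a constructor looks up its arity, takes that many values off the top of the stack in left-to-right order, and pushes one tagged value. This is a simplified illustration, not the evaluator code from this patch; `Value`, `EnumValue`, `MemberInfo`, and `Construct` are hypothetical names.

```go
package main

import "fmt"

// Value stands in for any runtime value.
type Value interface{}

// EnumValue is a tagged value: which enum, which member, and its payload.
type EnumValue struct {
	Enum    string
	Member  string
	Payload []Value
}

// MemberInfo records a member's enum and how many values it consumes.
type MemberInfo struct {
	Enum  string
	Arity int
}

// Construct removes the member's payload from the top of the stack and
// pushes the resulting enum value. Payload order matches push order.
func Construct(stack []Value, member string, table map[string]MemberInfo) ([]Value, error) {
	info, ok := table[member]
	if !ok {
		return stack, fmt.Errorf("unknown enum member %q", member)
	}
	if len(stack) < info.Arity {
		return stack, fmt.Errorf("constructor %q needs %d payload value(s), stack has %d",
			member, info.Arity, len(stack))
	}
	split := len(stack) - info.Arity
	var payload []Value
	if info.Arity > 0 {
		// Copy so the payload does not alias the stack's backing array.
		payload = append([]Value(nil), stack[split:]...)
	}
	stack = append(stack[:split], EnumValue{Enum: info.Enum, Member: member, Payload: payload})
	return stack, nil
}

func main() {
	table := map[string]MemberInfo{
		"ok":      {Enum: "CmdResult", Arity: 1},
		"failed":  {Enum: "CmdResult", Arity: 2},
		"timeout": {Enum: "CmdResult", Arity: 0},
	}
	stack := []Value{404, "not found"}
	stack, err := Construct(stack, "failed", table)
	fmt.Println(len(stack), err)
	fmt.Printf("%+v\n", stack[0])
}
```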
    + +

    +A payload may reference the enum itself, so recursive enums are allowed. +

    + +
    enum Tree = leaf(int) | node(Tree Tree)
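
One way to support self-referencing payloads is to register every enum name first and resolve payload types afterwards, which is the shape of the predeclare/define split in this patch. The sketch below shows only that ordering idea; `Decl`, `Member`, `Resolve`, and the builtin type list are assumptions for illustration.

```go
package main

import "fmt"

// Member is one enum arm with its payload type names.
type Member struct {
	Name    string
	Payload []string
}

// Decl is one enum declaration.
type Decl struct {
	Name    string
	Members []Member
}

// Resolve checks that every payload type is a builtin or a declared enum.
// Pass 1 registers all enum names; pass 2 resolves payloads, so an enum may
// mention itself or an enum declared later in the file.
func Resolve(decls []Decl) []error {
	known := map[string]bool{"int": true, "str": true, "float": true}
	for _, d := range decls { // pass 1: predeclare names
		known[d.Name] = true
	}
	var errs []error
	for _, d := range decls { // pass 2: resolve payload types
		for _, m := range d.Members {
			for _, p := range m.Payload {
				if !known[p] {
					errs = append(errs, fmt.Errorf("%s.%s: unknown payload type %q", d.Name, m.Name, p))
				}
			}
		}
	}
	return errs
}

func main() {
	tree := Decl{Name: "Tree", Members: []Member{
		{Name: "leaf", Payload: []string{"int"}},
		{Name: "node", Payload: []string{"Tree", "Tree"}},
	}}
	fmt.Println(len(Resolve([]Decl{tree}))) // 0
}
```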
    + +

    +Use match to dispatch on the member. +A payload member binds its values to names in the arm body, the same way just v binds a Maybe payload. +The match must cover every member, or include a wildcard _ arm; a match that omits a member is a static error, so adding a member later forces every match to handle it. +

    + +
    def render (CmdResult -- str)
    +    match
    +        ok out     : @out,
    +        failed c e : $"{@e} ({@c})",
    +        timeout    : "timed out",
    +    end
    +end
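
The exhaustiveness rule described above can be sketched as a coverage check: collect the member named by each arm, treat `_` as covering everything, and report uncovered members in declaration order. This is a simplified stand-in for the checker logic in this patch; `MissingMembers` is a hypothetical helper and arms are reduced to their leading name.

```go
package main

import "fmt"

// MissingMembers returns the declared members that no arm names, in
// declaration order. A "_" arm covers every member.
func MissingMembers(members, arms []string) []string {
	covered := make(map[string]bool, len(arms))
	for _, a := range arms {
		if a == "_" {
			return nil
		}
		covered[a] = true
	}
	var missing []string
	for _, m := range members {
		if !covered[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	members := []string{"ok", "failed", "timeout"}
	fmt.Println(MissingMembers(members, []string{"ok", "failed"})) // [timeout]
	fmt.Println(MissingMembers(members, []string{"ok", "_"}))      // []
}
```

A non-empty result corresponds to the static error; adding a member to the enum makes every match without a `_` arm report it.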
    +

    Quotations

    diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go index 4c247001..db4ccd76 100644 --- a/mshell/Evaluator.go +++ b/mshell/Evaluator.go @@ -326,11 +326,18 @@ type EvalState struct { defIndex map[string]int defIndexLen int - // EnumMembers maps a member name to its enum's declared name. Populated - // from `enum` declarations (RegisterEnums) before evaluation; member - // names are unique across enums in v1, so this flat member -> enum - // lookup is enough to construct a value from a bare member word. - EnumMembers map[string]string + // EnumMembers maps a member name to its enum and payload arity. Populated + // from `enum` declarations (RegisterEnums) before evaluation; member names + // are unique across enums, so this flat lookup is enough to construct a + // value from a bare member word, including consuming its payload. + EnumMembers map[string]EnumMemberInfo +} + +// EnumMemberInfo records where a member came from and how many payload values +// its constructor consumes from the stack. +type EnumMemberInfo struct { + EnumName string + Arity int } // RegisterEnums scans parse items for `enum` declarations and records each @@ -344,13 +351,13 @@ func (state *EvalState) RegisterEnums(items []MShellParseItem) { continue } if state.EnumMembers == nil { - state.EnumMembers = make(map[string]string) + state.EnumMembers = make(map[string]EnumMemberInfo) } - for _, m := range d.Members { + for i, m := range d.Members { if _, exists := state.EnumMembers[m]; exists { continue } - state.EnumMembers[m] = d.Name + state.EnumMembers[m] = EnumMemberInfo{EnumName: d.Name, Arity: len(d.MemberPayloads[i])} } } } @@ -1079,6 +1086,29 @@ func (state *EvalState) processMatchBlock(matchBlock *MShellParseMatchBlock, fra // matchPattern checks if a subject matches a pattern (list of parse items). // Returns (matched bool, bindings map, result EvalResult). 
func (state *EvalState) matchPattern(pattern []MShellParseItem, subject MShellObject, startToken Token) (bool, map[string]MShellObject, EvalResult) { + // Enum constructor pattern: `member` or `member b1 b2 ...`. Only a member + // name matches an enum value (a sibling member just fails this arm and the + // next is tried); `_` and `none` fall through to the generic handling. + if enumVal, ok := subject.(*MShellEnum); ok && len(pattern) >= 1 { + if tok, okTok := pattern[0].(Token); okTok && tok.Type == LITERAL && tok.Lexeme != "_" && tok.Lexeme != "none" { + if tok.Lexeme != enumVal.Member { + return false, nil, SimpleSuccess() + } + binds := pattern[1:] + if len(binds) != len(enumVal.Payload) { + return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: enum member '%s' binds %d payload value(s), got %d.\n", + tok.Line, tok.Column, tok.Lexeme, len(enumVal.Payload), len(binds))) + } + bindings := make(map[string]MShellObject) + for i, b := range binds { + if bt, okBt := b.(Token); okBt && bt.Lexeme != "_" { + bindings[bt.Lexeme] = enumVal.Payload[i] + } + } + return true, bindings, SimpleSuccess() + } + } + // Handle multi-token patterns (e.g., "just v" for maybe destructuring, // or " name" for type-test binding). if len(pattern) == 2 { @@ -5809,10 +5839,21 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu return SimpleSuccess() } - // Enum constructor: a bare member word pushes its enum value. + // Enum constructor: a bare member word consumes its payload + // (if any) from the stack and pushes the enum value. 
if state.EnumMembers != nil { - if enumName, ok := state.EnumMembers[t.Lexeme]; ok { - stack.Push(&MShellEnum{EnumName: enumName, Member: t.Lexeme}) + if info, ok := state.EnumMembers[t.Lexeme]; ok { + var payload []MShellObject + if info.Arity > 0 { + if len(*stack) < info.Arity { + return state.FailWithMessage(fmt.Sprintf("%d:%d: enum constructor '%s' needs %d payload value(s) on the stack.\n", t.Line, t.Column, t.Lexeme, info.Arity)) + } + payload = make([]MShellObject, info.Arity) + for i := info.Arity - 1; i >= 0; i-- { + payload[i], _ = stack.Pop() + } + } + stack.Push(&MShellEnum{EnumName: info.EnumName, Member: t.Lexeme, Payload: payload}) return SimpleSuccess() } } diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 8786be12..2f78ff4d 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -344,13 +344,14 @@ func (n MShellNull) CastString() (string, error) { // Enum {{{ // MShellEnum is a value of a user-declared `enum` (a generative tagged sum -// type). In v1 every member is nullary, so the value is just the enum's -// declared name plus the member's name. Member names are unique across enums, -// so the member is the identity; the enum name rides along for diagnostics and -// `match`. +// type): the enum's declared name, the chosen member, and the member's payload +// values (nil for a nullary member). Member names are unique across enums, so +// the member identifies the value; the enum name rides along for diagnostics +// and `match`. type MShellEnum struct { EnumName string Member string + Payload []MShellObject } func (e *MShellEnum) TypeName() string { return e.EnumName } @@ -358,7 +359,16 @@ func (e *MShellEnum) IsCommandLineable() bool { return true } func (e *MShellEnum) IsNumeric() bool { return false } func (e *MShellEnum) FloatNumeric() float64 { return 0 } func (e *MShellEnum) CommandLine() string { return e.Member } -func (e *MShellEnum) DebugString() string { return e.EnumName + "." 
+ e.Member } +func (e *MShellEnum) DebugString() string { + if len(e.Payload) == 0 { + return e.EnumName + "." + e.Member + } + parts := make([]string, len(e.Payload)) + for i, p := range e.Payload { + parts[i] = p.DebugString() + } + return e.EnumName + "." + e.Member + "(" + strings.Join(parts, " ") + ")" +} func (e *MShellEnum) Index(index int) (MShellObject, error) { return nil, fmt.Errorf("Cannot index into an enum.\n") @@ -385,10 +395,17 @@ func (e *MShellEnum) Concat(other MShellObject) (MShellObject, error) { } func (e *MShellEnum) Equals(other MShellObject) (bool, error) { - if o, ok := other.(*MShellEnum); ok { - return e.EnumName == o.EnumName && e.Member == o.Member, nil + o, ok := other.(*MShellEnum) + if !ok || e.EnumName != o.EnumName || e.Member != o.Member || len(e.Payload) != len(o.Payload) { + return false, nil } - return false, nil + for i := range e.Payload { + eq, err := e.Payload[i].Equals(o.Payload[i]) + if err != nil || !eq { + return false, err + } + } + return true, nil } func (e *MShellEnum) CastString() (string, error) { return e.Member, nil } diff --git a/mshell/Type.go b/mshell/Type.go index 2934192f..fa74d8a9 100644 --- a/mshell/Type.go +++ b/mshell/Type.go @@ -139,8 +139,8 @@ type ShapeField struct { // EnumVariant is one constructor of a TKEnum. Payload is the (ordered) // list of payload types the constructor carries; it is empty for a -// nullary member (the only kind in v1). Order is meaningful — it sets -// member order for diagnostics and exhaustiveness. +// nullary member. Order is meaningful — it sets member order for +// diagnostics and exhaustiveness. type EnumVariant struct { Name NameId Payload []TypeId @@ -391,6 +391,20 @@ func (a *TypeArena) MakeEnum(nameId NameId, variants []EnumVariant) TypeId { return id } +// SetEnumVariants replaces the variant list of an already-created enum type. 
+// Used to finalize an enum after its name was registered with a placeholder +// (empty) variant list, so member payloads can reference the enum itself or +// other enums regardless of declaration order. +func (a *TypeArena) SetEnumVariants(id TypeId, variants []EnumVariant) { + n := a.Node(id) + if n.Kind != TKEnum { + panic("TypeArena.SetEnumVariants: not an enum") + } + cp := make([]EnumVariant, len(variants)) + copy(cp, variants) + a.enumVariants[n.Extra] = cp +} + // EnumNameId returns the declaration NameId of an enum type. func (a *TypeArena) EnumNameId(id TypeId) NameId { n := a.Node(id) diff --git a/mshell/TypeBranch.go b/mshell/TypeBranch.go index 877f0a7a..c760cead 100644 --- a/mshell/TypeBranch.go +++ b/mshell/TypeBranch.go @@ -76,6 +76,10 @@ const ( MatchArmFalse // bool literal `false` pattern // MatchArmEmptyList: `[]` pattern. Covers empty lists. MatchArmEmptyList + // MatchArmEnumMember: an enum constructor pattern (`member` or + // `member b1 b2 ...`). TypeArm holds the enum type, EnumMember the + // member's NameId. + MatchArmEnumMember // MatchArmListWithRest: `[a ...rest]`, `[a b ...rest]`, or // `[...rest]` — any list pattern with a `...name` element. // Covers all lists whose length is at least the number of @@ -87,8 +91,9 @@ const ( // type effects flow through ReconcileArms; this struct only feeds // the exhaustiveness check. 
type MatchArmTag struct { - Kind MatchArmKind - TypeArm TypeId // valid when Kind == MatchArmType + Kind MatchArmKind + TypeArm TypeId // valid when Kind == MatchArmType or MatchArmEnumMember + EnumMember NameId // valid when Kind == MatchArmEnumMember } // CheckMatchExhaustive verifies that arms cover every inhabitant of @@ -178,6 +183,30 @@ func (c *Checker) CheckMatchExhaustive(matched TypeId, arms []MatchArmTag, callS } } + case TKEnum: + variants := c.arena.enumVariants[n.Extra] + covered := make(map[NameId]bool, len(variants)) + for _, arm := range arms { + if arm.Kind == MatchArmEnumMember && c.subst.Apply(c.arena, arm.TypeArm) == matched { + covered[arm.EnumMember] = true + } + } + var missing []string + for _, v := range variants { + if !covered[v.Name] { + missing = append(missing, c.names.Name(v.Name)) + } + } + if len(missing) == 0 { + return true + } + c.errors = append(c.errors, TypeError{ + Kind: TErrNonExhaustiveMatch, + Pos: callSite, + Hint: "enum match must cover every member or include a wildcard; missing: " + strings.Join(missing, ", "), + }) + return false + case TKList: // A list's inhabitants split by length: zero (empty) vs // one-or-more. `[]` covers empty; any list pattern that ends diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index 960b8606..dddb5d4e 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -94,14 +94,21 @@ func (c *Checker) RegisterStdlibSigs(defs []MShellDefinition) { // parse tree driving the type stack. Error accumulation lives on the // Checker. func (c *Checker) CheckProgram(file *MShellFile) { - // Pre-pass 0: register all `enum` declarations (nominal types + - // constructor words). Done before `type` decls so a `type` body may - // reference an enum by name. + // Pre-pass 0: register all `enum` declarations. Names are predeclared + // first so member payloads can reference any enum (including the enum + // itself) regardless of order; bodies and constructor words follow. 
Done + // before `type` decls so a `type` body may reference an enum by name. + var enumDecls []*MShellEnumDecl for _, item := range file.Items { if d, ok := item.(*MShellEnumDecl); ok { - c.DeclareEnum(d) + if c.predeclareEnum(d) { + enumDecls = append(enumDecls, d) + } } } + for _, d := range enumDecls { + c.defineEnum(d) + } // Pre-pass 1: register all `type` declarations. for _, item := range file.Items { if d, ok := item.(*MShellTypeDecl); ok { @@ -1294,6 +1301,14 @@ func (c *Checker) analyzeArmPattern(subject TypeId, pattern []MShellParseItem) a func (c *Checker) armPatternOf(subject TypeId, pattern []MShellParseItem) armPattern { out := armPattern{Tag: MatchArmTag{Kind: MatchArmType, TypeArm: TidNothing}} + // Enum constructor pattern: `member` or `member b1 b2 ...`, when the + // subject is an enum and the first token names one of its members. This + // can be any length (one token per payload binding), so it is handled + // before the length-based switch. + if ep, ok := c.enumMemberPattern(subject, pattern); ok { + return ep + } + switch len(pattern) { case 1: switch p := pattern[0].(type) { @@ -1384,6 +1399,67 @@ func (c *Checker) armPatternOf(subject TypeId, pattern []MShellParseItem) armPat return out } +// enumMemberPattern recognizes an enum constructor arm: `member` (nullary) or +// `member b1 b2 ...` (one binding name per payload). It returns ok=false when +// the subject is not an enum or the first token is not one of its members, so +// the caller falls back to the ordinary pattern forms. A payload-arity mismatch +// is recognized (so no "invalid pattern" cascade) but reported. 
+func (c *Checker) enumMemberPattern(subject TypeId, pattern []MShellParseItem) (armPattern, bool) { + if len(pattern) == 0 { + return armPattern{}, false + } + tok, ok := pattern[0].(Token) + if !ok || tok.Type != LITERAL { + return armPattern{}, false + } + resolved := c.subst.Apply(c.arena, subject) + sn := c.arena.Node(resolved) + if sn.Kind != TKEnum { + return armPattern{}, false + } + memberId := c.names.Intern(tok.Lexeme) + var payload []TypeId + found := false + for _, v := range c.arena.enumVariants[sn.Extra] { + if v.Name == memberId { + payload = v.Payload + found = true + break + } + } + if !found { + return armPattern{}, false + } + out := armPattern{ + Recognized: true, + Tag: MatchArmTag{Kind: MatchArmEnumMember, TypeArm: resolved, EnumMember: memberId}, + } + binds := pattern[1:] + if len(binds) != len(payload) { + c.errors = append(c.errors, TypeError{ + Kind: TErrInvalidMatchPattern, + Pos: tok, + Hint: fmt.Sprintf("enum member '%s' binds %d payload value(s), got %d", tok.Lexeme, len(payload), len(binds)), + }) + return out, true + } + for i, b := range binds { + bt, ok := b.(Token) + if !ok || bt.Type != LITERAL { + c.errors = append(c.errors, TypeError{ + Kind: TErrInvalidMatchPattern, + Pos: tok, + Hint: "enum payload bindings must be names", + }) + return out, true + } + if bt.Lexeme != "_" { + out.Bindings = append(out.Bindings, patternBind{bt.Lexeme, payload[i]}) + } + } + return out, true +} + // analyzeTokenPattern handles the single-token pattern forms: type // keywords, value literals, `_`, `none`, and user-declared type names. func (c *Checker) analyzeTokenPattern(tok Token, out *armPattern) { diff --git a/mshell/TypeEnum.go b/mshell/TypeEnum.go index ff8fda23..4a20ee81 100644 --- a/mshell/TypeEnum.go +++ b/mshell/TypeEnum.go @@ -1,23 +1,27 @@ package main -// Phase 1 enum support: registering `enum Name = a | b | c` declarations. +// Enum support: registering `enum Name = a | b(T..) | ...` declarations. 
// -// An enum is a generative tagged sum type; v1 members are all nullary. -// Declaring an enum (1) registers a nominal TKEnum type in the type -// environment under its name (so the name resolves in type positions like -// def signatures), and (2) registers each member as a nullary constructor -// word `( -- Enum)` in the name-builtin table, so a bare member reference -// type-checks as construction through the ordinary resolveAndApply path. +// An enum is a generative tagged sum type. Registration happens in two passes +// so member payloads may reference any enum regardless of order (including the +// enum itself): // -// Members share the global word namespace: a member name that collides with -// an existing builtin / def / another enum member is rejected. See -// design/literal_or_enum_typing.html and ai/enum_implementation_plan.md. +// - predeclareEnum interns the name and registers a placeholder TKEnum in the +// type environment, so the name resolves in any later type position. +// - defineEnum resolves each member's payload types, finalizes the variant +// list, and registers each member as a constructor word whose signature +// consumes the payload and produces the enum (`(T.. -- Enum)`). +// +// Members share the global word namespace: a member name that collides with an +// existing builtin / def / another enum member is rejected. -// DeclareEnum registers the type and constructors for one `enum` declaration. -func (c *Checker) DeclareEnum(d *MShellEnumDecl) { +// predeclareEnum registers the enum's name with a placeholder type. It returns +// true when the name was newly registered (so defineEnum should finish it), and +// false for a reserved or duplicate name (an error is recorded). 
+func (c *Checker) predeclareEnum(d *MShellEnumDecl) bool { if IsReservedTypeName(d.Name) { c.errors = append(c.errors, TypeError{Kind: TErrReservedTypeName, Pos: d.NameToken, Name: d.Name}) - return + return false } if c.typeEnv == nil { c.typeEnv = make(map[NameId]TypeId, 8) @@ -25,12 +29,22 @@ func (c *Checker) DeclareEnum(d *MShellEnumDecl) { nameId := c.names.Intern(d.Name) if _, exists := c.typeEnv[nameId]; exists { c.errors = append(c.errors, TypeError{Kind: TErrDuplicateTypeName, Pos: d.NameToken, Name: d.Name}) - return + return false } + c.typeEnv[nameId] = c.arena.MakeEnum(nameId, nil) + return true +} + +// defineEnum resolves payload types, finalizes the variant list, and registers +// constructor words. It must run after predeclareEnum has registered the name. +func (c *Checker) defineEnum(d *MShellEnumDecl) { + nameId := c.names.Intern(d.Name) + enumType := c.typeEnv[nameId] type member struct { - name string - tok Token + name string + tok Token + payloads []TypeId } uniq := make([]member, 0, len(d.Members)) variants := make([]EnumVariant, 0, len(d.Members)) @@ -45,14 +59,17 @@ func (c *Checker) DeclareEnum(d *MShellEnumDecl) { continue } seen[m] = true - uniq = append(uniq, member{name: m, tok: tok}) - variants = append(variants, EnumVariant{Name: c.names.Intern(m)}) + + var payloads []TypeId + for _, p := range d.MemberPayloads[i] { + payloads = append(payloads, c.resolveTypeExpr(p, nil)) + } + uniq = append(uniq, member{name: m, tok: tok, payloads: payloads}) + variants = append(variants, EnumVariant{Name: c.names.Intern(m), Payload: payloads}) } - enumType := c.arena.MakeEnum(nameId, variants) - c.typeEnv[nameId] = enumType + c.arena.SetEnumVariants(enumType, variants) - // Register each (unique) member as a nullary constructor word. 
for _, u := range uniq { mid := c.names.Intern(u.name) if _, exists := c.nameBuiltins[mid]; exists { @@ -62,6 +79,6 @@ func (c *Checker) DeclareEnum(d *MShellEnumDecl) { }) continue } - c.nameBuiltins[mid] = append(c.nameBuiltins[mid], QuoteSig{Outputs: []TypeId{enumType}}) + c.nameBuiltins[mid] = append(c.nameBuiltins[mid], QuoteSig{Inputs: u.payloads, Outputs: []TypeId{enumType}}) } } diff --git a/mshell/TypeEnum_test.go b/mshell/TypeEnum_test.go new file mode 100644 index 00000000..929e625f --- /dev/null +++ b/mshell/TypeEnum_test.go @@ -0,0 +1,105 @@ +package main + +import ( + "strings" + "testing" +) + +func TestEnumNullaryDeclAndConstruct(t *testing.T) { + errs, ok := parseAndCheck(t, "enum Color = red | green | blue\ndef describe (Color -- str) c! \"x\" end\nred describe") + if !ok || len(errs) != 0 { + t.Fatalf("nullary enum decl + construct should pass; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumPayloadConstructorSignature(t *testing.T) { + // A payload constructor has signature (payload... -- Enum). + errs, ok := parseAndCheck(t, "enum R = ok(str) | failed(int str) | none2\ndef use (R -- str) c! \"x\" end\n404 \"nf\" failed use") + if !ok || len(errs) != 0 { + t.Fatalf("payload constructor should type-check; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumPayloadWrongType(t *testing.T) { + errs, ok := parseAndCheck(t, "enum R = ok(int)\n\"x\" ok") + if ok { + t.Fatalf("wrong payload type should fail; errs=%v", errs) + } +} + +func TestEnumDistinctNominal(t *testing.T) { + // Two enums with parallel members do not unify. + src := "enum A = a1 | a2\nenum B = b1 | b2\ndef takesA (A -- str) c! 
\"x\" end\nb1 takesA" + errs, ok := parseAndCheck(t, src) + if ok { + t.Fatalf("feeding enum B where A is expected should fail; errs=%v", errs) + } +} + +func TestEnumDuplicateMember(t *testing.T) { + errs, ok := parseAndCheck(t, "enum E = a | b | a") + if ok { + t.Fatalf("duplicate enum member should fail; errs=%v", errs) + } + if !strings.Contains(strings.Join(errs, "\n"), "duplicate enum member") { + t.Fatalf("expected duplicate-member error; errs=%v", errs) + } +} + +func TestEnumCrossEnumMemberCollision(t *testing.T) { + errs, ok := parseAndCheck(t, "enum E = x1 | shared\nenum F = shared | y1") + if ok { + t.Fatalf("member name reused across enums should fail; errs=%v", errs) + } +} + +func TestEnumReservedName(t *testing.T) { + errs, ok := parseAndCheck(t, "enum Maybe = a | b") + if ok { + t.Fatalf("enum named with a reserved type name should fail; errs=%v", errs) + } +} + +func TestEnumMatchExhaustive(t *testing.T) { + src := "enum Color = red | green | blue\ngreen match\n red : \"r\" wl,\n green : \"g\" wl,\n blue : \"b\" wl,\nend" + errs, ok := parseAndCheck(t, src) + if !ok || len(errs) != 0 { + t.Fatalf("exhaustive enum match should pass; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumMatchNonExhaustive(t *testing.T) { + src := "enum Color = red | green | blue\ngreen match\n red : \"r\" wl,\n blue : \"b\" wl,\nend" + errs, ok := parseAndCheck(t, src) + if ok { + t.Fatalf("non-exhaustive enum match should fail; errs=%v", errs) + } + if !strings.Contains(strings.Join(errs, "\n"), "missing: green") { + t.Fatalf("expected missing-member hint naming 'green'; errs=%v", errs) + } +} + +func TestEnumMatchWildcardExhaustive(t *testing.T) { + src := "enum Color = red | green | blue\nred match\n red : \"r\" wl,\n _ : \"o\" wl,\nend" + errs, ok := parseAndCheck(t, src) + if !ok || len(errs) != 0 { + t.Fatalf("wildcard should make enum match exhaustive; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumMatchPayloadBinding(t *testing.T) { + src := "enum R = ok(str) | 
failed(int str) | quit\n404 \"nf\" failed match\n ok s : @s wl,\n failed c e : @e wl,\n quit : \"q\" wl,\nend" + errs, ok := parseAndCheck(t, src) + if !ok || len(errs) != 0 { + t.Fatalf("payload-binding enum match should pass; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumRecursivePayload(t *testing.T) { + // A member may carry a payload that references the enum itself. + src := "enum Tree = leaf(int) | node(Tree Tree)\n3 leaf 4 leaf node" + errs, ok := parseAndCheck(t, src) + if !ok || len(errs) != 0 { + t.Fatalf("self-referential enum payload should type-check; errs=%v ok=%v", errs, ok) + } +} diff --git a/mshell/TypeParseIntegration.go b/mshell/TypeParseIntegration.go index 1ec8e1f8..29f47a3d 100644 --- a/mshell/TypeParseIntegration.go +++ b/mshell/TypeParseIntegration.go @@ -35,15 +35,17 @@ func (d *MShellTypeDecl) DebugString() string { func (d *MShellTypeDecl) GetStartToken() Token { return d.StartTok } func (d *MShellTypeDecl) GetEndToken() Token { return d.NameToken } -// MShellEnumDecl is a top-level `enum Name = c1 | c2 | ...` declaration: -// a generative tagged sum type. In v1 every member is nullary (a bare -// constructor name); payload-carrying variants land in a later phase. +// MShellEnumDecl is a top-level `enum Name = c1 | c2(T..) | ...` +// declaration: a generative tagged sum type. Each member is a constructor +// name with an optional parenthesized payload type list. MemberPayloads is +// parallel to Members; an entry is empty for a nullary member. type MShellEnumDecl struct { - Name string - NameToken Token - StartTok Token // the ENUM keyword - Members []string - MemberToks []Token + Name string + NameToken Token + StartTok Token // the ENUM keyword + Members []string + MemberToks []Token + MemberPayloads [][]MShellParseItem } func (d *MShellEnumDecl) ToJson() string { @@ -66,10 +68,12 @@ func (d *MShellEnumDecl) GetEndToken() Token { return d.NameToken } -// ParseEnumDecl handles a top-level `enum Name = member (| member)*`. 
The -// ENUM keyword is the current token on entry; on return, parser.curr is -// positioned past the last member. Members must be bare identifiers -// (LITERAL); a keyword used as a member name is a parse error. +// ParseEnumDecl handles a top-level `enum Name = member (| member)*`, where a +// member is a bare identifier (LITERAL) optionally followed by a parenthesized +// payload type list `(T1 T2 ...)`. The parentheses delimit the payload so the +// member set is unambiguous against following code (mshell has no statement +// terminator). The ENUM keyword is the current token on entry; on return, +// parser.curr is positioned past the last member. func (parser *MShellParser) ParseEnumDecl() (*MShellEnumDecl, error) { startTok := parser.curr parser.NextToken() // consume ENUM @@ -86,20 +90,45 @@ func (parser *MShellParser) ParseEnumDecl() (*MShellEnumDecl, error) { parser.NextToken() // consume = decl := &MShellEnumDecl{Name: nameTok.Lexeme, NameToken: nameTok, StartTok: startTok} + var errs []TypeError for { if parser.curr.Type != LITERAL { return nil, fmt.Errorf("%d:%d: expected an enum member name (an identifier), got %s", parser.curr.Line, parser.curr.Column, parser.curr.Type) } - decl.Members = append(decl.Members, parser.curr.Lexeme) + memberName := parser.curr.Lexeme + decl.Members = append(decl.Members, memberName) decl.MemberToks = append(decl.MemberToks, parser.curr) parser.NextToken() // consume member + + var payloads []MShellParseItem + if parser.curr.Type == LEFT_PAREN { + openTok := parser.curr + parser.NextToken() // consume ( + for parser.curr.Type != RIGHT_PAREN && parser.curr.Type != EOF { + payloads = append(payloads, parser.parseTypePrimary(&errs)) + } + if parser.curr.Type != RIGHT_PAREN { + return nil, fmt.Errorf("%d:%d: expected ')' to close the payload list for enum member '%s'", + openTok.Line, openTok.Column, memberName) + } + parser.NextToken() // consume ) + if len(payloads) == 0 { + return nil, fmt.Errorf("%d:%d: enum member '%s' 
has an empty payload list '()'; omit the parentheses for a nullary member", + openTok.Line, openTok.Column, memberName) + } + } + decl.MemberPayloads = append(decl.MemberPayloads, payloads) + if parser.curr.Type == PIPE { parser.NextToken() // consume | continue } break } + if len(errs) > 0 { + return nil, fmt.Errorf("enum declaration body: %s", joinTypeErrs(errs)) + } return decl, nil } diff --git a/mshell/TypeUnify.go b/mshell/TypeUnify.go index 3f46d92d..67a45a7e 100644 --- a/mshell/TypeUnify.go +++ b/mshell/TypeUnify.go @@ -159,8 +159,8 @@ func (w *typeRewriter) mapType(t TypeId, skip map[TypeVarId]struct{}) TypeId { } return w.arena.MakeCommand(argv, CommandCaptureMode(n.B), CommandCaptureMode(n.Extra)) case TKEnum: - // Nominal and (in v1) ground: no type variables to rewrite, and - // identity is the declaration name, so return unchanged. + // Nominal and ground: identity is the declaration name and payloads + // carry no type variables, so there is nothing to rewrite. return t case TKQuote: sig, changed := w.mapSig(w.arena.quoteSigs[n.Extra], skip) diff --git a/tests/success/enum.msh b/tests/success/enum.msh new file mode 100644 index 00000000..60d6f925 --- /dev/null +++ b/tests/success/enum.msh @@ -0,0 +1,33 @@ +# Enum: declaration, construction, match, and payload-carrying variants. +enum Color = red | green | blue + +green match + red : "red" wl, + green : "green" wl, + blue : "blue" wl, +end + +enum CmdResult = ok(str) | failed(int str) | timeout + +404 "not found" failed match + ok out : @out wl, + failed c e : [@e " " @c str] "" join wl, + timeout : "timeout" wl, +end + +timeout match + ok _ : "ok" wl, + failed _ _ : "failed" wl, + timeout : "timed out" wl, +end + +# Enum used in a def signature, distinct from other enums. 
+def label (Color -- str) + match + red : "is red", + green : "is green", + blue : "is blue", + end +end + +blue label wl diff --git a/tests/success/enum.msh.stdout b/tests/success/enum.msh.stdout new file mode 100644 index 00000000..e8de0ada --- /dev/null +++ b/tests/success/enum.msh.stdout @@ -0,0 +1,4 @@ +green +not found 404 +timed out +is blue diff --git a/tests/typecheck_fail/enum_distinct.msh b/tests/typecheck_fail/enum_distinct.msh new file mode 100644 index 00000000..a1daa15c --- /dev/null +++ b/tests/typecheck_fail/enum_distinct.msh @@ -0,0 +1,11 @@ +# Two enums are nominally distinct even with parallel members: a value of one +# cannot be used where the other is expected. +enum A = a1 | a2 +enum B = b1 | b2 + +def takesA (A -- str) + c! + "ok" +end + +b1 takesA wl diff --git a/tests/typecheck_fail/enum_nonexhaustive.msh b/tests/typecheck_fail/enum_nonexhaustive.msh new file mode 100644 index 00000000..4697c6d0 --- /dev/null +++ b/tests/typecheck_fail/enum_nonexhaustive.msh @@ -0,0 +1,8 @@ +# A match on an enum that omits a member (and has no wildcard) is +# non-exhaustive and must be rejected. +enum Color = red | green | blue + +red match + red : "r" wl, + blue : "b" wl, +end From 15782c0551d69d9703c8d6e4d311a55cf64398af Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Mon, 29 Jun 2026 19:03:20 -0500 Subject: [PATCH 03/32] Enum: payload stringification + crash/soundness fixes - str/toJson now render enum payloads. str: `member(p ...)`, built with an explicit work stack (no recursion) so deeply nested values can't overflow. toJson: serde externally-tagged (`"m"`, `{"m": v}`, `{"m": [..]}`). - Fix type-checker stack overflow on recursive enums passed through generics: walkTypeVars treats an enum as a ground leaf (payloads carry no type vars). - Reject an empty `match` as non-exhaustive instead of letting it crash at runtime. 
- Language-agnostic regression fixtures: recursive-generic, empty-match, str/json rendering, and a 50k-deep render overflow guard. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- mshell/MShellObject.go | 84 +++++++++++++++---- mshell/TypeCheckProgram.go | 8 +- mshell/TypeUnify.go | 11 +-- tests/success/enum_deep_render.msh | 13 +++ tests/success/enum_deep_render.msh.stdout | 1 + tests/success/enum_recursive_generic.msh | 10 +++ .../success/enum_recursive_generic.msh.stdout | 1 + tests/success/enum_str_json.msh | 16 ++++ tests/success/enum_str_json.msh.stdout | 7 ++ tests/typecheck_fail/enum_empty_match.msh | 5 ++ 10 files changed, 131 insertions(+), 25 deletions(-) create mode 100644 tests/success/enum_deep_render.msh create mode 100644 tests/success/enum_deep_render.msh.stdout create mode 100644 tests/success/enum_recursive_generic.msh create mode 100644 tests/success/enum_recursive_generic.msh.stdout create mode 100644 tests/success/enum_str_json.msh create mode 100644 tests/success/enum_str_json.msh.stdout create mode 100644 tests/typecheck_fail/enum_empty_match.msh diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 2f78ff4d..c4432f6e 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -354,20 +354,57 @@ type MShellEnum struct { Payload []MShellObject } -func (e *MShellEnum) TypeName() string { return e.EnumName } -func (e *MShellEnum) IsCommandLineable() bool { return true } -func (e *MShellEnum) IsNumeric() bool { return false } -func (e *MShellEnum) FloatNumeric() float64 { return 0 } -func (e *MShellEnum) CommandLine() string { return e.Member } -func (e *MShellEnum) DebugString() string { - if len(e.Payload) == 0 { - return e.EnumName + "." 
+ e.Member - } - parts := make([]string, len(e.Payload)) - for i, p := range e.Payload { - parts[i] = p.DebugString() +func (e *MShellEnum) TypeName() string { return e.EnumName } +func (e *MShellEnum) IsCommandLineable() bool { return true } +func (e *MShellEnum) IsNumeric() bool { return false } +func (e *MShellEnum) FloatNumeric() float64 { return 0 } +func (e *MShellEnum) CommandLine() string { return enumRender(e) } +func (e *MShellEnum) DebugString() string { return e.EnumName + "." + enumRender(e) } + +// enumRender renders an enum value as `member` (nullary) or +// `member(p0 p1 ...)`. Nested enum payloads are expanded with an explicit +// work stack rather than function recursion, so an arbitrarily deep value +// (e.g. a long `node(node(... ) ...)` chain) cannot overflow the call stack. +// Non-enum payloads use their own ToString. +func enumRender(top *MShellEnum) string { + var sb strings.Builder + type task struct { + lit string + obj MShellObject + isLit bool + } + stack := []task{{obj: top}} + for len(stack) > 0 { + t := stack[len(stack)-1] + stack = stack[:len(stack)-1] + if t.isLit { + sb.WriteString(t.lit) + continue + } + en, ok := t.obj.(*MShellEnum) + if !ok { + sb.WriteString(t.obj.ToString()) + continue + } + if len(en.Payload) == 0 { + sb.WriteString(en.Member) + continue + } + // Emit `member ( p0 " " p1 ... )`; push reversed so it pops in order. + seq := make([]task, 0, len(en.Payload)*2+3) + seq = append(seq, task{lit: en.Member, isLit: true}, task{lit: "(", isLit: true}) + for i, p := range en.Payload { + if i > 0 { + seq = append(seq, task{lit: " ", isLit: true}) + } + seq = append(seq, task{obj: p}) + } + seq = append(seq, task{lit: ")", isLit: true}) + for i := len(seq) - 1; i >= 0; i-- { + stack = append(stack, seq[i]) + } } - return e.EnumName + "." 
+ e.Member + "(" + strings.Join(parts, " ") + ")" + return sb.String() } func (e *MShellEnum) Index(index int) (MShellObject, error) { @@ -386,8 +423,25 @@ func (e *MShellEnum) Slice(startInc int, endExc int) (MShellObject, error) { return nil, fmt.Errorf("Cannot slice an enum.\n") } -func (e *MShellEnum) ToJson() string { return fmt.Sprintf("%q", e.Member) } -func (e *MShellEnum) ToString() string { return e.Member } +// ToJson uses serde's externally-tagged convention — the de-facto standard for +// tagged unions in JSON: a nullary member is the bare member string; a member +// with a single payload is `{"member": value}`; with several, `{"member": +// [v0, v1, ...]}`. +func (e *MShellEnum) ToJson() string { + if len(e.Payload) == 0 { + return fmt.Sprintf("%q", e.Member) + } + if len(e.Payload) == 1 { + return fmt.Sprintf("{%q: %s}", e.Member, e.Payload[0].ToJson()) + } + parts := make([]string, len(e.Payload)) + for i, p := range e.Payload { + parts[i] = p.ToJson() + } + return fmt.Sprintf("{%q: [%s]}", e.Member, strings.Join(parts, ", ")) +} + +func (e *MShellEnum) ToString() string { return enumRender(e) } func (e *MShellEnum) IndexErrStr() string { return "" } func (e *MShellEnum) Concat(other MShellObject) (MShellObject, error) { diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index dddb5d4e..27435172 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -1203,9 +1203,11 @@ func (c *Checker) checkMatchBlock(matchBlock *MShellParseMatchBlock) { entry := c.captureBranch() if len(matchBlock.Arms) == 0 { - // Empty match block: no arms could fire. Treat as a no-op. - // The runtime would error at first use; the checker keeps - // the subject on the stack. + // An empty match can never fire — it always errors at runtime. Run + // exhaustiveness with no arms so the static check rejects it (no type + // is covered and there is no wildcard) instead of letting it crash at + // runtime. 
+ c.CheckMatchExhaustive(subject, nil, startTok) return } diff --git a/mshell/TypeUnify.go b/mshell/TypeUnify.go index 67a45a7e..b73ea501 100644 --- a/mshell/TypeUnify.go +++ b/mshell/TypeUnify.go @@ -347,13 +347,10 @@ func (a *TypeArena) walkTypeVars(t TypeId, visit func(TypeVarId) bool) bool { } } case TKEnum: - for _, v := range a.enumVariants[n.Extra] { - for _, p := range v.Payload { - if a.walkTypeVars(p, visit) { - return true - } - } - } + // Enums are nominal and ground — payloads are resolved without a + // generic scope, so they never contain a type variable. Treat the + // enum as a leaf; recursing into payloads would loop forever on a + // self-referential enum (e.g. `node(Tree Tree)`). case TKQuote: if a.walkSigVars(a.quoteSigs[n.Extra], visit) { return true diff --git a/tests/success/enum_deep_render.msh b/tests/success/enum_deep_render.msh new file mode 100644 index 00000000..2a8e2a5f --- /dev/null +++ b/tests/success/enum_deep_render.msh @@ -0,0 +1,13 @@ +# A deeply nested enum value must stringify without overflowing: `str` renders +# enum payloads with an explicit work stack, not function recursion. Build a +# 50000-deep tree and print the length of its rendering (deterministic, and a +# recursive renderer would overflow the stack well before this depth). +enum Tree = leaf(int) | node(Tree Tree) +0 leaf t! +0 i! +( + @i 50000 >= if break end + @t 0 leaf node t! + @i 1 + i! 
+) loop +@t str len str wl diff --git a/tests/success/enum_deep_render.msh.stdout b/tests/success/enum_deep_render.msh.stdout new file mode 100644 index 00000000..d9515471 --- /dev/null +++ b/tests/success/enum_deep_render.msh.stdout @@ -0,0 +1 @@ +700007 diff --git a/tests/success/enum_recursive_generic.msh b/tests/success/enum_recursive_generic.msh new file mode 100644 index 00000000..5d06773b --- /dev/null +++ b/tests/success/enum_recursive_generic.msh @@ -0,0 +1,10 @@ +# Regression: a self-referential enum flowing through a generic parameter +# triggers the type checker's occurs check. The checker must treat an enum as a +# leaf when scanning for type variables; otherwise it recurses into the cyclic +# payload (`node(Tree Tree)`) forever and overflows the stack. +enum Tree = leaf(int) | node(Tree Tree) + +def id (q -- q) end + +3 leaf id drop +"ok" wl diff --git a/tests/success/enum_recursive_generic.msh.stdout b/tests/success/enum_recursive_generic.msh.stdout new file mode 100644 index 00000000..9766475a --- /dev/null +++ b/tests/success/enum_recursive_generic.msh.stdout @@ -0,0 +1 @@ +ok diff --git a/tests/success/enum_str_json.msh b/tests/success/enum_str_json.msh new file mode 100644 index 00000000..d524e89f --- /dev/null +++ b/tests/success/enum_str_json.msh @@ -0,0 +1,16 @@ +# Stringifying enum values. `str` renders the member, with payloads in +# parentheses; `toJson` uses the externally-tagged convention. Nullary members +# render as the bare member name / string. +enum R = ok(str) | failed(int str) | timeout + +"hi" ok str wl +404 "nf" failed str wl +timeout str wl + +"hi" ok toJson wl +404 "nf" failed toJson wl +timeout toJson wl + +# Nested payloads render through, member-first. 
+enum Tree = leaf(int) | node(Tree Tree) +1 leaf 2 leaf node 3 leaf node str wl diff --git a/tests/success/enum_str_json.msh.stdout b/tests/success/enum_str_json.msh.stdout new file mode 100644 index 00000000..727821a0 --- /dev/null +++ b/tests/success/enum_str_json.msh.stdout @@ -0,0 +1,7 @@ +ok(hi) +failed(404 nf) +timeout +{"ok": "hi"} +{"failed": [404, "nf"]} +"timeout" +node(node(leaf(1) leaf(2)) leaf(3)) diff --git a/tests/typecheck_fail/enum_empty_match.msh b/tests/typecheck_fail/enum_empty_match.msh new file mode 100644 index 00000000..e43d7f5b --- /dev/null +++ b/tests/typecheck_fail/enum_empty_match.msh @@ -0,0 +1,5 @@ +# An empty match covers no members and has no wildcard, so it can never fire +# and always crashes at runtime. It must be rejected as non-exhaustive. +enum Color = red | green | blue + +red match end From 8bc9c0b7a002e8269a9cb6fd732185ce6e6dc0ea Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Mon, 29 Jun 2026 19:09:25 -0500 Subject: [PATCH 04/32] Enum: fix payload paren swallowing the following statement MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit mshell has no statement terminator, so after a nullary member a `(...)` on the next line (a quotation, a filter predicate, etc.) was greedily parsed as that member's payload list — making the parenthesized expression vanish. Require a payload `(` to be attached to the member name (`failed(int str)`); a detached paren belongs to the following code and the member is nullary. Regression fixture: tests/success/enum_then_quote.msh. 
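The attachment rule described above can be sketched in isolation. This is a minimal illustration, not the parser's own code: `tok` and `attachedParen` are hypothetical names, and the sketch assumes the lexer reports columns in bytes, so a member's end column is its start column plus `len(Lexeme)`.

```go
package main

import "fmt"

// tok is a hypothetical, simplified stand-in for a lexer token; only the
// fields the adjacency rule needs are modelled.
type tok struct {
	Lexeme string
	Line   int
	Column int
}

// attachedParen reports whether paren is a "(" that starts immediately after
// member: on the same line, at the column where the member's lexeme ends.
// Assumption: columns count bytes, matching len(member.Lexeme).
func attachedParen(member, paren tok) bool {
	return paren.Lexeme == "(" &&
		paren.Line == member.Line &&
		paren.Column == member.Column+len(member.Lexeme)
}

func main() {
	member := tok{"failed", 1, 20}
	fmt.Println(attachedParen(member, tok{"(", 1, 26})) // failed(  : payload list
	fmt.Println(attachedParen(member, tok{"(", 1, 27})) // failed ( : following code
	fmt.Println(attachedParen(member, tok{"(", 2, 1}))  // "(" on the next line
}
```

Only the first call reports an attached paren. Any whitespace or newline between the member and the `(` leaves the member nullary and hands the parenthesized expression to the code that follows.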
Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- mshell/TypeParseIntegration.go | 14 +++++++++++--- tests/success/enum_then_quote.msh | 10 ++++++++++ tests/success/enum_then_quote.msh.stdout | 2 ++ 3 files changed, 23 insertions(+), 3 deletions(-) create mode 100644 tests/success/enum_then_quote.msh create mode 100644 tests/success/enum_then_quote.msh.stdout diff --git a/mshell/TypeParseIntegration.go b/mshell/TypeParseIntegration.go index 29f47a3d..a860b873 100644 --- a/mshell/TypeParseIntegration.go +++ b/mshell/TypeParseIntegration.go @@ -96,13 +96,21 @@ func (parser *MShellParser) ParseEnumDecl() (*MShellEnumDecl, error) { return nil, fmt.Errorf("%d:%d: expected an enum member name (an identifier), got %s", parser.curr.Line, parser.curr.Column, parser.curr.Type) } - memberName := parser.curr.Lexeme + memberTok := parser.curr + memberName := memberTok.Lexeme decl.Members = append(decl.Members, memberName) - decl.MemberToks = append(decl.MemberToks, parser.curr) + decl.MemberToks = append(decl.MemberToks, memberTok) parser.NextToken() // consume member + // A payload list's `(` must be attached to the member name + // (`failed(int str)`), with no space. mshell has no statement + // terminator, so a detached `(` — `green (q) x` or a `(...)` + // starting the next line — belongs to the following code, not the + // enum, and the member is nullary. 
var payloads []MShellParseItem - if parser.curr.Type == LEFT_PAREN { + if parser.curr.Type == LEFT_PAREN && + parser.curr.Line == memberTok.Line && + parser.curr.Column == memberTok.Column+len(memberTok.Lexeme) { openTok := parser.curr parser.NextToken() // consume ( for parser.curr.Type != RIGHT_PAREN && parser.curr.Type != EOF { diff --git a/tests/success/enum_then_quote.msh b/tests/success/enum_then_quote.msh new file mode 100644 index 00000000..be89844e --- /dev/null +++ b/tests/success/enum_then_quote.msh @@ -0,0 +1,10 @@ +# Regression: a `(...)` statement after an enum declaration must NOT be parsed +# as a payload list for the preceding nullary member. A payload `(` is a payload +# only when attached to the member name (`failed(int str)`); a detached paren +# (whitespace or newline before it) belongs to the following code, and the +# member is nullary. Before the fix, the quotation below was swallowed by the +# enum declaration. +enum C = red | green | blue + +(green) x str wl +[1 2 3] (0 >) filter len str wl diff --git a/tests/success/enum_then_quote.msh.stdout b/tests/success/enum_then_quote.msh.stdout new file mode 100644 index 00000000..50af8b94 --- /dev/null +++ b/tests/success/enum_then_quote.msh.stdout @@ -0,0 +1,2 @@ +green +3 From dde7f95f89d111dfdcfac5452d9724d731be84e4 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Mon, 29 Jun 2026 19:33:25 -0500 Subject: [PATCH 05/32] Enum: switch to `end`-terminated syntax, drop payload parens Replace the parenthesized payload syntax (which required whitespace-significant adjacency to avoid swallowing the next statement) with a block terminated by `end`, like def/if/match/loop: enum CmdResult = ok str | failed int str | timeout end Members are `|`-separated; each carries zero or more space-separated payload types; `end` bounds the member list so it is unambiguous against following code without relying on whitespace. Parser-only change plus docs, fixtures, and Go tests updated to the new surface. 
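As a rough sketch of why `end` makes the member list self-bounding, the loop below parses the new surface from a pre-split token slice. It is an illustration under stated assumptions, not the code in `ParseEnumDecl`: `member` and `parseEnum` are hypothetical names, tokens are assumed to be already split, and every payload type is assumed to be a single token (no quote types).

```go
package main

import (
	"fmt"
	"strings"
)

// member is a hypothetical record: a constructor name plus its payload types.
type member struct {
	Name     string
	Payloads []string
}

// parseEnum parses `enum Name = m t.. | m t.. end` from the front of toks.
// It returns the enum name, its members, and the tokens after `end`.
// Assumption: each payload type is exactly one token.
func parseEnum(toks []string) (string, []member, []string, error) {
	if len(toks) < 3 || toks[0] != "enum" || toks[2] != "=" {
		return "", nil, nil, fmt.Errorf("expected 'enum Name ='")
	}
	name := toks[1]
	i := 3
	var members []member
	for {
		if i >= len(toks) || toks[i] == "|" || toks[i] == "end" {
			return "", nil, nil, fmt.Errorf("enum %s: expected a member name", name)
		}
		m := member{Name: toks[i]}
		i++
		// Everything up to the next `|` or `end` is payload for this member.
		for i < len(toks) && toks[i] != "|" && toks[i] != "end" {
			m.Payloads = append(m.Payloads, toks[i])
			i++
		}
		members = append(members, m)
		if i >= len(toks) {
			return "", nil, nil, fmt.Errorf("enum %s: missing 'end'", name)
		}
		if toks[i] == "end" {
			return name, members, toks[i+1:], nil
		}
		i++ // consume `|`
	}
}

func main() {
	src := "enum C = red | green | blue end (green) x str wl"
	name, members, rest, err := parseEnum(strings.Fields(src))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, len(members), strings.Join(rest, " "))
}
```

The tokens after `end` come back untouched, so a following `(green)` can no longer be absorbed into the declaration, and no column or line comparison is involved.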
Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- CHANGELOG.md | 24 ++++---- design/literal_or_enum_typing.html | 48 ++++++++------- doc/mshell.md | 12 ++-- doc/type_system.inc.html | 13 +++-- mshell/TypeEnum_test.go | 42 +++++++++---- mshell/TypeParseIntegration.go | 65 +++++++++------------ mshell/TypeUnify.go | 2 +- tests/success/enum.msh | 4 +- tests/success/enum_deep_render.msh | 2 +- tests/success/enum_recursive_generic.msh | 4 +- tests/success/enum_str_json.msh | 8 +-- tests/success/enum_then_quote.msh | 11 ++-- tests/typecheck_fail/enum_distinct.msh | 4 +- tests/typecheck_fail/enum_empty_match.msh | 2 +- tests/typecheck_fail/enum_nonexhaustive.msh | 2 +- 15 files changed, 129 insertions(+), 114 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index e9aca0d6..cbc3eac8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,17 +9,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added -- `enum` declarations: a generative tagged sum type, written - `enum Name = a | b | c`, mirroring `type`. Members are bare constructor names, - optionally carrying a payload in parentheses: `enum CmdResult = ok(str) | - failed(int str) | timeout`. A bare member word constructs a value (consuming - any payload from the stack, e.g. `404 "x" failed`), and `match` dispatches on - members with binding (`failed code msg : ...`) and exhaustiveness checking — a - match that omits a member is rejected unless it has a `_` arm. Enums are - nominal: two enums with the same members are distinct types. Member names are - identifiers (not keywords) and are unique across all enums. Payloads may - reference the enum itself, so recursive enums like - `enum Tree = leaf(int) | node(Tree Tree)` are supported. +- `enum` declarations: a generative tagged sum type. 
Members are separated by + `|` and the declaration is closed by `end` (like `def`/`if`/`match`): + `enum CmdResult = ok str | failed int str | timeout end`. A member is a bare + constructor name optionally followed by payload types; a bare member word + constructs a value (consuming any payload from the stack, e.g. + `404 "x" failed`), and `match` dispatches on members with binding + (`failed code msg : ...`) and exhaustiveness checking — a match that omits a + member (or is empty) is rejected unless it has a `_` arm. Enums are nominal: + two enums with the same members are distinct types. Member names are + identifiers (not keywords) and are unique across all enums. `str` renders a + value as `member(payload ...)` and `toJson` uses the externally-tagged + convention. Payloads may reference the enum itself, so recursive enums like + `enum Tree = leaf int | node Tree Tree end` are supported. - Optional fields in dictionary shape types, written `name?: T` (and `"name"?: T` in `def` signatures). An optional field may be absent from a value; when present, its value is still type-checked. This lets option-style diff --git a/design/literal_or_enum_typing.html b/design/literal_or_enum_typing.html index 2ca5f48b..d5d738ab 100644 --- a/design/literal_or_enum_typing.html +++ b/design/literal_or_enum_typing.html @@ -293,40 +293,44 @@

    Keyword

    -Declaration form — inline, mirroring type
    -
    -The chosen surface is a one-token delta from the existing type Name = A | B: swap
    -type for enum, and each arm is a constructor name with an optional
    -parenthesized payload type list, |-separated. A nullary member takes no
    -parentheses, so a plain enumeration is identical to the approved form.
    -
    -enum Mode   = read | write | readwrite           # nullary: a plain enumeration
    -enum Shape  = circle(float) | rect(float float) | point
    -enum CmdResult = ok(str) | failed(int str) | timeout
    -
    -The parentheses delimit the payload because mshell has no statement terminator: with
    -space-separated payloads, a nullary member followed by ordinary code (... | blue then
    -green describe) is ambiguous — the parser cannot tell whether green is a
    -payload type of blue or the next statement. Parentheses make the member set
    -self-delimiting. (Recursive payloads work: enum Tree = leaf(int) | node(Tree Tree).)
    +Declaration form — |-separated members, closed by end
    +
    +Each arm is a constructor name followed by zero or more space-separated payload types,
    +arms are |-separated, and the body is closed by end — the same block
    +terminator def, if, match, and loop already use. A
    +nullary member is just a bare name.
    +
    +enum Mode   = read | write | readwrite end
    +enum Shape  = circle float | rect float float | point end
    +enum CmdResult = ok str | failed int str | timeout end
    +
    +The end terminator is what keeps the grammar whitespace-insensitive. mshell
    +has no statement terminator, so an open-ended, space-separated payload list has no way to mark its end:
    +after a nullary member, the parser otherwise cannot tell a following (...) /
    +[...] statement from a payload. Delimiters attached to the member name
    +(failed(int str)) would fix it only by making whitespace significant, which we reject.
    +end bounds the whole member list instead — and because every payload type is
    +itself self-delimiting (a quote type (a -- b) closes at its )), space-separated
    +payloads inside the body are unambiguous. This is exactly why type X = (a -- b) already has
    +no boundary problem: a type body is one self-bounded expression, whereas an enum body is an open list
    +that needs a terminator.

    Grammar:

    -• enum Name = arm (| arm)*
    -• arm ::= constructorName (( typePrimary+ ))opt   (the
    -constructor name is a bare identifier; each payload is a type primary, so a union payload must be
    -named via a type alias)
    +• enum Name = arm (| arm)* end
    +• arm ::= constructorName typePrimary*   (the constructor name is a bare identifier; each payload
    +is a type primary, so a union payload must be named via a type alias)

    The only new wrinkle vs. a structural union is that an arm's first token is a binding occurrence (a constructor being declared) rather than a reference to an existing type. The enum keyword is what tells the parser to read it that way — no capitalization rule.

    A long block is still fine

    -The inline form wraps naturally; arms may be placed one per line, with an optional leading
    -| for alignment (a small additive allowance over the current type grammar):
    +Arms may be placed one per line, with an optional leading | for alignment:

    enum Event =
         | click int int
         | key int
    -    | close
    +    | close
    +end

    7. Construction and matching (use sites)

    Both reuse the existing Maybe machinery verbatim — the whole point of "Maybe diff --git a/doc/mshell.md b/doc/mshell.md index a2b7e741..a719c940 100644 --- a/doc/mshell.md +++ b/doc/mshell.md @@ -630,24 +630,24 @@ For more detail, see the generated Type System help page. ## Enums -An `enum` declares a generative tagged sum type, written like `type` with members separated by `|`: +An `enum` declares a generative tagged sum type: members separated by `|`, closed by `end` (like `def`/`if`/`match`): ``` -enum Color = red | green | blue +enum Color = red | green | blue end ``` A bare member name constructs that value (`green` pushes a `Color`). Member names are identifiers (not keywords) and are unique across all enums. Unlike a union, an enum is nominal — two enums with the same members are distinct types. -Members may carry a payload in parentheses; the constructor consumes those values from the stack. -A nullary member takes no parentheses. Payloads may reference the enum itself (recursive enums). +Members may carry a payload — types written after the member name; the constructor consumes those values from the stack. +A nullary member has no payload. The closing `end` bounds the member list, so payloads are never confused with the following code. Payloads may reference the enum itself (recursive enums). ``` -enum CmdResult = ok(str) | failed(int str) | timeout +enum CmdResult = ok str | failed int str | timeout end 404 "not found" failed # ( int str -- CmdResult ) -enum Tree = leaf(int) | node(Tree Tree) +enum Tree = leaf int | node Tree Tree end ``` `match` dispatches on the member and binds payload values (like `just v`). diff --git a/doc/type_system.inc.html b/doc/type_system.inc.html index 3c7d50b7..ac76e539 100644 --- a/doc/type_system.inc.html +++ b/doc/type_system.inc.html @@ -344,11 +344,11 @@

    Enums Enums Enums 0 { return nil, fmt.Errorf("enum declaration body: %s", joinTypeErrs(errs)) diff --git a/mshell/TypeUnify.go b/mshell/TypeUnify.go index b73ea501..c7295499 100644 --- a/mshell/TypeUnify.go +++ b/mshell/TypeUnify.go @@ -350,7 +350,7 @@ func (a *TypeArena) walkTypeVars(t TypeId, visit func(TypeVarId) bool) bool { // Enums are nominal and ground — payloads are resolved without a // generic scope, so they never contain a type variable. Treat the // enum as a leaf; recursing into payloads would loop forever on a - // self-referential enum (e.g. `node(Tree Tree)`). + // self-referential enum (e.g. `node Tree Tree`). case TKQuote: if a.walkSigVars(a.quoteSigs[n.Extra], visit) { return true diff --git a/tests/success/enum.msh b/tests/success/enum.msh index 60d6f925..191a7cd1 100644 --- a/tests/success/enum.msh +++ b/tests/success/enum.msh @@ -1,5 +1,5 @@ # Enum: declaration, construction, match, and payload-carrying variants. -enum Color = red | green | blue +enum Color = red | green | blue end green match red : "red" wl, @@ -7,7 +7,7 @@ green match blue : "blue" wl, end -enum CmdResult = ok(str) | failed(int str) | timeout +enum CmdResult = ok str | failed int str | timeout end 404 "not found" failed match ok out : @out wl, diff --git a/tests/success/enum_deep_render.msh b/tests/success/enum_deep_render.msh index 2a8e2a5f..952cfe94 100644 --- a/tests/success/enum_deep_render.msh +++ b/tests/success/enum_deep_render.msh @@ -2,7 +2,7 @@ # enum payloads with an explicit work stack, not function recursion. Build a # 50000-deep tree and print the length of its rendering (deterministic, and a # recursive renderer would overflow the stack well before this depth). -enum Tree = leaf(int) | node(Tree Tree) +enum Tree = leaf int | node Tree Tree end 0 leaf t! 0 i! 
( diff --git a/tests/success/enum_recursive_generic.msh b/tests/success/enum_recursive_generic.msh index 5d06773b..1f2773dd 100644 --- a/tests/success/enum_recursive_generic.msh +++ b/tests/success/enum_recursive_generic.msh @@ -1,8 +1,8 @@ # Regression: a self-referential enum flowing through a generic parameter # triggers the type checker's occurs check. The checker must treat an enum as a # leaf when scanning for type variables; otherwise it recurses into the cyclic -# payload (`node(Tree Tree)`) forever and overflows the stack. -enum Tree = leaf(int) | node(Tree Tree) +# payload (`node Tree Tree`) forever and overflows the stack. +enum Tree = leaf int | node Tree Tree end def id (q -- q) end diff --git a/tests/success/enum_str_json.msh b/tests/success/enum_str_json.msh index d524e89f..f470bfb3 100644 --- a/tests/success/enum_str_json.msh +++ b/tests/success/enum_str_json.msh @@ -1,7 +1,7 @@ -# Stringifying enum values. `str` renders the member, with payloads in -# parentheses; `toJson` uses the externally-tagged convention. Nullary members +# Stringifying enum values. `str` renders the member, with payloads after the +# member name; `toJson` uses the externally-tagged convention. Nullary members # render as the bare member name / string. -enum R = ok(str) | failed(int str) | timeout +enum R = ok str | failed int str | timeout end "hi" ok str wl 404 "nf" failed str wl @@ -12,5 +12,5 @@ timeout str wl timeout toJson wl # Nested payloads render through, member-first. -enum Tree = leaf(int) | node(Tree Tree) +enum Tree = leaf int | node Tree Tree end 1 leaf 2 leaf node 3 leaf node str wl diff --git a/tests/success/enum_then_quote.msh b/tests/success/enum_then_quote.msh index be89844e..8873b7e4 100644 --- a/tests/success/enum_then_quote.msh +++ b/tests/success/enum_then_quote.msh @@ -1,10 +1,7 @@ -# Regression: a `(...)` statement after an enum declaration must NOT be parsed -# as a payload list for the preceding nullary member. 
A payload `(` is a payload -# only when attached to the member name (`failed(int str)`); a detached paren -# (whitespace or newline before it) belongs to the following code, and the -# member is nullary. Before the fix, the quotation below was swallowed by the -# enum declaration. -enum C = red | green | blue +# A `(...)` statement after an enum declaration is the following code, not a +# payload of the last member: the enum body is closed by `end`, so the boundary +# is unambiguous and whitespace-insensitive. +enum C = red | green | blue end (green) x str wl [1 2 3] (0 >) filter len str wl diff --git a/tests/typecheck_fail/enum_distinct.msh b/tests/typecheck_fail/enum_distinct.msh index a1daa15c..b43f407e 100644 --- a/tests/typecheck_fail/enum_distinct.msh +++ b/tests/typecheck_fail/enum_distinct.msh @@ -1,7 +1,7 @@ # Two enums are nominally distinct even with parallel members: a value of one # cannot be used where the other is expected. -enum A = a1 | a2 -enum B = b1 | b2 +enum A = a1 | a2 end +enum B = b1 | b2 end def takesA (A -- str) c! diff --git a/tests/typecheck_fail/enum_empty_match.msh b/tests/typecheck_fail/enum_empty_match.msh index e43d7f5b..2c58cdb1 100644 --- a/tests/typecheck_fail/enum_empty_match.msh +++ b/tests/typecheck_fail/enum_empty_match.msh @@ -1,5 +1,5 @@ # An empty match covers no members and has no wildcard, so it can never fire # and always crashes at runtime. It must be rejected as non-exhaustive. -enum Color = red | green | blue +enum Color = red | green | blue end red match end diff --git a/tests/typecheck_fail/enum_nonexhaustive.msh b/tests/typecheck_fail/enum_nonexhaustive.msh index 4697c6d0..fce677a0 100644 --- a/tests/typecheck_fail/enum_nonexhaustive.msh +++ b/tests/typecheck_fail/enum_nonexhaustive.msh @@ -1,6 +1,6 @@ # A match on an enum that omits a member (and has no wildcard) is # non-exhaustive and must be rejected. 
-enum Color = red | green | blue +enum Color = red | green | blue end red match red : "r" wl, From 52bfa28a11e22aac1592ca2495a70790af1b844b Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Mon, 29 Jun 2026 19:35:37 -0500 Subject: [PATCH 06/32] Enum: support optional leading `|` (ML-style member lists) The design doc advertised ML-style member lists (one per line, each prefixed with `|`), but the parser rejected a leading `|` after `=`. Accept an optional leading `|` so `enum E =\n | a\n | b\nend` parses; the regular `a | b` form is unchanged. Regression fixture: enum_leading_pipe.msh. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- mshell/TypeParseIntegration.go | 6 ++++++ tests/success/enum_leading_pipe.msh | 15 +++++++++++++++ tests/success/enum_leading_pipe.msh.stdout | 1 + 3 files changed, 22 insertions(+) create mode 100644 tests/success/enum_leading_pipe.msh create mode 100644 tests/success/enum_leading_pipe.msh.stdout diff --git a/mshell/TypeParseIntegration.go b/mshell/TypeParseIntegration.go index 37bb912b..1ffd8fc3 100644 --- a/mshell/TypeParseIntegration.go +++ b/mshell/TypeParseIntegration.go @@ -96,6 +96,12 @@ func (parser *MShellParser) ParseEnumDecl() (*MShellEnumDecl, error) { } parser.NextToken() // consume = + // Allow an optional leading `|` so members can be written ML-style, one + // per line each prefixed with `|`. + if parser.curr.Type == PIPE { + parser.NextToken() + } + decl := &MShellEnumDecl{Name: nameTok.Lexeme, NameToken: nameTok, StartTok: startTok} var errs []TypeError for { diff --git a/tests/success/enum_leading_pipe.msh b/tests/success/enum_leading_pipe.msh new file mode 100644 index 00000000..774f9993 --- /dev/null +++ b/tests/success/enum_leading_pipe.msh @@ -0,0 +1,15 @@ +# Members may be written ML-style: one per line, each prefixed with an optional +# leading `|`. 
+enum Suit = + | hearts + | diamonds + | clubs + | spades +end + +clubs match + hearts : "H" wl, + diamonds : "D" wl, + clubs : "C" wl, + spades : "S" wl, +end diff --git a/tests/success/enum_leading_pipe.msh.stdout b/tests/success/enum_leading_pipe.msh.stdout new file mode 100644 index 00000000..3cc58df8 --- /dev/null +++ b/tests/success/enum_leading_pipe.msh.stdout @@ -0,0 +1 @@ +C From d3a18358499f03ba88479e1dbe3d13bd85552e16 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Mon, 29 Jun 2026 20:23:31 -0500 Subject: [PATCH 07/32] Make a lone `_` its own token (the wildcard), reserved as a name MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previously `_` was an ordinary LITERAL special-cased by string comparison, so it could be used as an identifier — including as an enum member name, which the checker treated as a coverable member while the runtime treated `_` as the catch-all wildcard, type-checking a program that mis-dispatched at runtime. Lex a lone `_` as a new UNDERSCORE token. It is the wildcard / ignore marker in all pattern positions (match arm, list/dict element, `just _`, ` _`, enum payload binding), is rejected wherever a name is expected (enum member, def name, ...), and remains usable as a bare argv word (the literal string "_") in a list. `..._`, `_foo`, and `_!` are unaffected. Regression fixtures: typecheck_fail/enum_underscore_member.msh and success/underscore_argv.msh. 
Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- mshell/Evaluator.go | 13 +++++++++++-- mshell/Lexer.go | 10 ++++++++++ mshell/TypeCheckProgram.go | 7 +++++-- mshell/TypeChecker.go | 2 +- tests/success/underscore_argv.msh | 7 +++++++ tests/success/underscore_argv.msh.stdout | 2 ++ tests/typecheck_fail/enum_underscore_member.msh | 10 ++++++++++ 7 files changed, 46 insertions(+), 5 deletions(-) create mode 100644 tests/success/underscore_argv.msh create mode 100644 tests/success/underscore_argv.msh.stdout create mode 100644 tests/typecheck_fail/enum_underscore_member.msh diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go index db4ccd76..6cc6f063 100644 --- a/mshell/Evaluator.go +++ b/mshell/Evaluator.go @@ -1114,7 +1114,7 @@ func (state *EvalState) matchPattern(pattern []MShellParseItem, subject MShellOb if len(pattern) == 2 { first, firstOk := pattern[0].(Token) second, secondOk := pattern[1].(Token) - if firstOk && secondOk && second.Type == LITERAL { + if firstOk && secondOk && (second.Type == LITERAL || second.Type == UNDERSCORE) { if first.Type == LITERAL && first.Lexeme == "just" { // Maybe Just destructuring var maybeVal Maybe @@ -1177,6 +1177,8 @@ func (state *EvalState) matchPattern(pattern []MShellParseItem, subject MShellOb // matchTokenPattern matches a single token pattern against a subject. 
func (state *EvalState) matchTokenPattern(p Token, subject MShellObject) (bool, EvalResult) { switch p.Type { + case UNDERSCORE: + return true, SimpleSuccess() case LITERAL: if p.Lexeme == "_" { return true, SimpleSuccess() @@ -1415,7 +1417,7 @@ func (state *EvalState) matchDictPattern(pattern *MShellParseDict, subject MShel return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: Dict pattern value must be a single binding name.\n", startToken.Line, startToken.Column)) } tok, ok := kv.Value[0].(Token) - if !ok || tok.Type != LITERAL { + if !ok || (tok.Type != LITERAL && tok.Type != UNDERSCORE) { return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: Dict pattern value must be a literal binding name.\n", startToken.Line, startToken.Column)) } if tok.Lexeme != "_" { @@ -11210,6 +11212,13 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu stack.Push(MShellLiteral{t.Lexeme}) } + } else if t.Type == UNDERSCORE { + // A lone `_` is the pattern wildcard; in a list it is the + // literal argv word "_", and nowhere else does it have meaning. 
+ if callStackItem.CallStackType != CALLSTACKLIST { + return state.FailWithMessage(fmt.Sprintf("%d:%d: '_' is reserved as the match wildcard; use \"_\" for a literal underscore string.\n", t.Line, t.Column)) + } + stack.Push(MShellLiteral{"_"}) } else if t.Type == ASTERISK { obj1, err := stack.Pop() if err != nil { diff --git a/mshell/Lexer.go b/mshell/Lexer.go index 5c31275a..b9224629 100644 --- a/mshell/Lexer.go +++ b/mshell/Lexer.go @@ -110,6 +110,7 @@ const ( FAIL_KEYWORD PURE ENUM + UNDERSCORE // a lone `_`: the match/binding wildcard, reserved as a name ) func (t TokenType) String() string { @@ -286,6 +287,8 @@ func (t TokenType) String() string { return "TYPE" case ENUM: return "ENUM" + case UNDERSCORE: + return "UNDERSCORE" case TRY: return "TRY" case FAIL_KEYWORD: @@ -626,6 +629,13 @@ func (l *Lexer) literalOrKeywordType() TokenType { return VARSTORE } + // A lone `_` is the wildcard token, not an identifier — this is what + // reserves it as a name (it can't be an enum member, def, etc.) and keeps + // the checker and runtime from disagreeing about what `_` means. 
+ if l.current-l.start == 1 && l.input[l.start] == '_' { + return UNDERSCORE + } + return LITERAL } diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index 27435172..fad59a2f 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -1362,7 +1362,7 @@ func (c *Checker) armPatternOf(subject TypeId, pattern []MShellParseItem) armPat case 2: t0, ok0 := pattern[0].(Token) t1, ok1 := pattern[1].(Token) - if !ok0 || !ok1 || t1.Type != LITERAL { + if !ok0 || !ok1 || (t1.Type != LITERAL && t1.Type != UNDERSCORE) { return out } if t0.Type == LITERAL && t0.Lexeme == "just" { @@ -1447,7 +1447,7 @@ func (c *Checker) enumMemberPattern(subject TypeId, pattern []MShellParseItem) ( } for i, b := range binds { bt, ok := b.(Token) - if !ok || bt.Type != LITERAL { + if !ok || (bt.Type != LITERAL && bt.Type != UNDERSCORE) { c.errors = append(c.errors, TypeError{ Kind: TErrInvalidMatchPattern, Pos: tok, @@ -1487,6 +1487,9 @@ func (c *Checker) analyzeTokenPattern(tok Token, out *armPattern) { case INTEGER, FLOAT, STRING, SINGLEQUOTESTRING, PATH: // Value literals: legal patterns, but they credit no coverage. out.Recognized = true + case UNDERSCORE: + out.Recognized = true + out.Tag = MatchArmTag{Kind: MatchArmWildcard} case LITERAL: switch tok.Lexeme { case "_": diff --git a/mshell/TypeChecker.go b/mshell/TypeChecker.go index 16759ede..f26694d8 100644 --- a/mshell/TypeChecker.go +++ b/mshell/TypeChecker.go @@ -384,7 +384,7 @@ func (c *Checker) checkOne(tok Token) { // values, and it's load-bearing for `[cmd args] ;`-style // pipelines where forcing the user to quote every word // would defeat the point. 
- if c.listDepth > 0 && tok.Type == LITERAL { + if c.listDepth > 0 && (tok.Type == LITERAL || tok.Type == UNDERSCORE) { c.stack.Push(TidStr) return } diff --git a/tests/success/underscore_argv.msh b/tests/success/underscore_argv.msh new file mode 100644 index 00000000..9f77ca26 --- /dev/null +++ b/tests/success/underscore_argv.msh @@ -0,0 +1,7 @@ +# A lone `_` is the match wildcard, but it remains usable as a bare argv word +# (the literal string "_") inside a list. The two roles coexist. +[echo _ arg _] ; + +5 match + _ : "wild" wl, +end diff --git a/tests/success/underscore_argv.msh.stdout b/tests/success/underscore_argv.msh.stdout new file mode 100644 index 00000000..518b4723 --- /dev/null +++ b/tests/success/underscore_argv.msh.stdout @@ -0,0 +1,2 @@ +_ arg _ +wild diff --git a/tests/typecheck_fail/enum_underscore_member.msh b/tests/typecheck_fail/enum_underscore_member.msh new file mode 100644 index 00000000..772e8dcc --- /dev/null +++ b/tests/typecheck_fail/enum_underscore_member.msh @@ -0,0 +1,10 @@ +# `_` is the wildcard token, reserved as a name, so it cannot be an enum +# member. (Previously it was accepted and then mis-dispatched at runtime, +# because the checker treated `_` as a member while the matcher treated it as +# the catch-all wildcard.) +enum C = _ | red end + +red match + _ : "u" wl, + red : "r" wl, +end From 80be0a9f4a9490e453a9a3ecd21630b106b261d2 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Mon, 29 Jun 2026 20:53:55 -0500 Subject: [PATCH 08/32] Enum: let payloads reference user `type` aliases MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Enum payload types were resolved in the pre-pass before `type` declarations were registered, so a payload referencing a `type` alias (a shape, union, etc.) failed with "unknown type" regardless of source order — only primitives and other enums worked. 
Reorder the type-check pre-pass: predeclare enum names, then register `type` declarations, then resolve enum payload bodies and constructors. Now an enum payload may reference any enum or `type` alias in either direction. Regression fixture: tests/success/enum_payload_typealias.msh. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- mshell/TypeCheckProgram.go | 19 +++++++++++-------- tests/success/enum_payload_typealias.msh | 19 +++++++++++++++++++ .../success/enum_payload_typealias.msh.stdout | 2 ++ 3 files changed, 32 insertions(+), 8 deletions(-) create mode 100644 tests/success/enum_payload_typealias.msh create mode 100644 tests/success/enum_payload_typealias.msh.stdout diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index fad59a2f..b293623d 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -94,10 +94,9 @@ func (c *Checker) RegisterStdlibSigs(defs []MShellDefinition) { // parse tree driving the type stack. Error accumulation lives on the // Checker. func (c *Checker) CheckProgram(file *MShellFile) { - // Pre-pass 0: register all `enum` declarations. Names are predeclared - // first so member payloads can reference any enum (including the enum - // itself) regardless of order; bodies and constructor words follow. Done - // before `type` decls so a `type` body may reference an enum by name. + // Pre-pass 0: predeclare every `enum` name with a placeholder type, so a + // `type` body (next) and an enum payload (after that) can reference any + // enum by name, in any order. var enumDecls []*MShellEnumDecl for _, item := range file.Items { if d, ok := item.(*MShellEnumDecl); ok { @@ -106,16 +105,20 @@ func (c *Checker) CheckProgram(file *MShellFile) { } } } - for _, d := range enumDecls { - c.defineEnum(d) - } - // Pre-pass 1: register all `type` declarations. + // Pre-pass 1: register all `type` declarations. 
Enum names are already + // available, so a `type` body may reference an enum. for _, item := range file.Items { if d, ok := item.(*MShellTypeDecl); ok { body := c.resolveTypeExpr(d.Body, nil) c.DeclareType(d.Name, body) } } + // Pre-pass 1b: resolve enum payload bodies and register constructor words. + // Both enum names and `type` aliases are now registered, so a payload may + // reference either. + for _, d := range enumDecls { + c.defineEnum(d) + } // Pre-pass 2: register all `def` signatures so call sites (and // recursive self-calls inside def bodies) can resolve them. defSigs := make([]QuoteSig, len(file.Definitions)) diff --git a/tests/success/enum_payload_typealias.msh b/tests/success/enum_payload_typealias.msh new file mode 100644 index 00000000..58c1bbef --- /dev/null +++ b/tests/success/enum_payload_typealias.msh @@ -0,0 +1,19 @@ +# An enum payload may reference a user `type` alias (here a shape), in either +# declaration order. Previously this failed with "unknown type" because enum +# payloads were resolved before `type` declarations were registered. +type Person = {name: str, age: int} +enum Record = person Person | empty end + +{ "name": "Ada", "age": 36 } as Person person match + person p : @p :name? wl, + empty : "empty" wl, +end + +# Order-independent: enum declared before the type alias it uses. 
+enum Cell = num Count | blank end +type Count = int + +5 as Count num match + num n : @n str wl, + blank : "blank" wl, +end diff --git a/tests/success/enum_payload_typealias.msh.stdout b/tests/success/enum_payload_typealias.msh.stdout new file mode 100644 index 00000000..1a34b8e4 --- /dev/null +++ b/tests/success/enum_payload_typealias.msh.stdout @@ -0,0 +1,2 @@ +Ada +5 From 67b3f55a2f7ee572c687797e121cf2f078ffe796 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Mon, 29 Jun 2026 21:08:35 -0500 Subject: [PATCH 09/32] Fix uniq to accept any value type, not just primitives MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit uniq is typed `([t] -- [t])` (generic) but its runtime only deduplicated a fixed set of primitives and threw for anything else — so `[enum] uniq`, `[dict] uniq`, `[bool] uniq` type-checked but failed at runtime, against the type system's goal of catching such errors statically (or not having them). Deduplicate any value without a fast hash path by structural equality (`Equals`), keeping the primitive fast paths. Enums, dicts, and booleans now dedupe correctly; values whose equality is undefined (lists) are kept rather than erroring, consistent with `=` on them. Regression fixture: tests/success/uniq_enum.msh. 
Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- CHANGELOG.md | 6 ++++++ mshell/Evaluator.go | 20 ++++++++++++++++++-- tests/success/uniq_enum.msh | 9 +++++++++ tests/success/uniq_enum.msh.stdout | 2 ++ 4 files changed, 35 insertions(+), 2 deletions(-) create mode 100644 tests/success/uniq_enum.msh create mode 100644 tests/success/uniq_enum.msh.stdout diff --git a/CHANGELOG.md b/CHANGELOG.md index cbc3eac8..822dcf49 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## Unreleased +### Fixed + +- `uniq` now accepts a list of any value type (matching its `([t] -- [t])` + signature) and deduplicates by structural equality, instead of throwing at + runtime for non-primitive elements such as enums, dicts, and booleans. + ### Added - `enum` declarations: a generative tagged sum type. Members are separated by diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go index 6cc6f063..58dd6e38 100644 --- a/mshell/Evaluator.go +++ b/mshell/Evaluator.go @@ -9648,7 +9648,7 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu floatsSeen := make(map[float64]any) dateTimesSeen := make(map[time.Time]any) - for i, item := range listObj.Items { + for _, item := range listObj.Items { switch itemTyped := item.(type) { case MShellString: strItem := itemTyped @@ -9689,7 +9689,23 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu stringsSeen[literalItem.LiteralText] = nil } default: - return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot remove duplicates from a list with a %s at index %d (%s).\n", t.Line, t.Column, item.TypeName(), i, item.DebugString())) + // Any value without a fast hash path (enum, list, + // dict, bool, bytes, ...) is deduplicated by + // structural equality against the values kept so + // far. 
O(n^2) for these, but it lets `uniq` accept + // every value type, matching its `([t] -- [t])` + // signature so the type checker never accepts a + // `uniq` that fails at runtime. + seen := false + for _, kept := range newList.Items { + if eq, _ := item.Equals(kept); eq { + seen = true + break + } + } + if !seen { + newList.Items = append(newList.Items, item) + } } } diff --git a/tests/success/uniq_enum.msh b/tests/success/uniq_enum.msh new file mode 100644 index 00000000..6d4fe954 --- /dev/null +++ b/tests/success/uniq_enum.msh @@ -0,0 +1,9 @@ +# `uniq` accepts any value type (matching its `([t] -- [t])` signature) and +# deduplicates by structural equality, so a list of enums dedupes instead of +# throwing at runtime. First-occurrence order is preserved. +enum C = red | green | blue end + +[red green red blue green red] uniq (str) map "," join wl + +# Other equatable values dedupe too (previously a runtime error). +[true false true true] uniq len str wl diff --git a/tests/success/uniq_enum.msh.stdout b/tests/success/uniq_enum.msh.stdout new file mode 100644 index 00000000..41966831 --- /dev/null +++ b/tests/success/uniq_enum.msh.stdout @@ -0,0 +1,2 @@ +red,green,blue +2 From bdd26bd5f889de08236a933aa83ebee186ba14f0 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Mon, 29 Jun 2026 21:28:48 -0500 Subject: [PATCH 10/32] Make equality total and structural for all value types MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Equality was undefined (always errored) for lists, quotations, pipes, and grids, and several types errored on a type mismatch while others returned false — so equality could depend on operand order, and `[1] [1] =` threw. 
Define `Equals` for every type and make it total: - list / pipe: structural, element-wise (recursive) - grid / gridview / gridrow: structural, cell-wise (by materialized rows) - quotation: identity - str / path / literal: compare by text content (symmetric) - a type mismatch yields false rather than erroring, so equality is order-independent and union members (e.g. int | null) compare cleanly. Genuinely incompatible comparisons remain a static type error, caught by the checker before runtime. uniq now dedupes lists/grids too via this equality. Regression fixture: tests/success/equality.msh. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- CHANGELOG.md | 7 ++ mshell/MShellObject.go | 110 +++++++++++++++++++++++------- tests/success/equality.msh | 33 +++++++++ tests/success/equality.msh.stdout | 14 ++++ 4 files changed, 139 insertions(+), 25 deletions(-) create mode 100644 tests/success/equality.msh create mode 100644 tests/success/equality.msh.stdout diff --git a/CHANGELOG.md b/CHANGELOG.md index 822dcf49..c1214221 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed +- Equality (`=`) is now total and defined for every value type. Lists, + quotations, pipes, and grids previously raised "equality not defined" at + runtime; lists/pipes/grids now compare structurally (element- and cell-wise) + and quotations compare by identity. Comparing values of different runtime + types yields `false` rather than an error (a genuinely incompatible + comparison is already a static type error), so the result no longer depends + on operand order and union members like `int | null` compare cleanly. - `uniq` now accepts a list of any value type (matching its `([t] -- [t])` signature) and deduplicates by structural equality, instead of throwing at runtime for non-primitive elements such as enums, dicts, and booleans. 
diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index c4432f6e..b979a188 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -160,7 +160,7 @@ func (b MShellBinary) Equals(other MShellObject) (bool, error) { return true, nil } } - return false, fmt.Errorf("Cannot compare Binary with %s.\n", other.TypeName()) + return false, nil } func (b MShellBinary) CastString() (string, error) { @@ -2074,62 +2074,87 @@ func (obj MShellLiteral) Equals(other MShellObject) (bool, error) { case MShellPath: return obj.LiteralText == o.Path, nil default: - return false, fmt.Errorf("Cannot compare a literal with a %s.\n", other.TypeName()) + return false, nil } } func (obj MShellBool) Equals(other MShellObject) (bool, error) { asBool, ok := other.(MShellBool) if !ok { - return false, fmt.Errorf("Cannot compare a boolean with a %s.\n", other.TypeName()) + return false, nil } return obj.Value == asBool.Value, nil } +// itemsEqual compares two object slices element-wise by structural equality. +func itemsEqual(a, b []MShellObject) (bool, error) { + if len(a) != len(b) { + return false, nil + } + for i := range a { + eq, err := a[i].Equals(b[i]) + if err != nil || !eq { + return eq, err + } + } + return true, nil +} + func (obj *MShellQuotation) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for quotations.\n") + // Quotations are code values; two are equal only when they are the same + // quotation object (reference identity). + o, ok := other.(*MShellQuotation) + return ok && obj == o, nil } func (obj *MShellList) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for lists.\n") + o, ok := other.(*MShellList) + if !ok { + return false, nil + } + return itemsEqual(obj.Items, o.Items) } func (obj MShellString) Equals(other MShellObject) (bool, error) { - // Define equality for other as string or as literal. 
- switch other.(type) { + // str/path/literal compare by their text content (the `=` overloads + // permit str/path comparison); any other type is simply not equal. + switch o := other.(type) { case MShellString: - asString, _ := other.(MShellString) - return obj.Content == asString.Content, nil + return obj.Content == o.Content, nil case MShellLiteral: - asLiteral, _ := other.(MShellLiteral) - return obj.Content == asLiteral.LiteralText, nil + return obj.Content == o.LiteralText, nil + case MShellPath: + return obj.Content == o.Path, nil default: - return false, fmt.Errorf("Cannot compare a string with a %s.\n", other.TypeName()) + return false, nil } } func (obj MShellPath) Equals(other MShellObject) (bool, error) { - // Define equality for other as string or as literal. - switch other.(type) { + switch o := other.(type) { case MShellPath: - asPath, _ := other.(MShellPath) - return obj.Path == asPath.Path, nil + return obj.Path == o.Path, nil case MShellLiteral: - asLiteral, _ := other.(MShellLiteral) - return obj.Path == asLiteral.LiteralText, nil + return obj.Path == o.LiteralText, nil + case MShellString: + return obj.Path == o.Content, nil default: - return false, fmt.Errorf("Cannot compare a path with a %s.\n", other.TypeName()) + return false, nil } } func (obj *MShellPipe) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for pipes.\n") + o, ok := other.(*MShellPipe) + if !ok { + return false, nil + } + return itemsEqual(obj.List.Items, o.List.Items) } func (obj MShellInt) Equals(other MShellObject) (bool, error) { asInt, ok := other.(MShellInt) if !ok { - return false, fmt.Errorf("Cannot compare an integer with a %s.\n", other.TypeName()) + return false, nil } return obj.Value == asInt.Value, nil } @@ -2137,7 +2162,7 @@ func (obj MShellInt) Equals(other MShellObject) (bool, error) { func (obj MShellFloat) Equals(other MShellObject) (bool, error) { asFloat, ok := other.(MShellFloat) if !ok { - return 
false, fmt.Errorf("Cannot compare a float with a %s.\n", other.TypeName()) + return false, nil } return obj.Value == asFloat.Value, nil } @@ -2493,7 +2518,25 @@ func (g *MShellGrid) Concat(other MShellObject) (MShellObject, error) { } func (g *MShellGrid) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for grids.\n") + o, ok := other.(*MShellGrid) + if !ok { + return false, nil + } + if g.RowCount != o.RowCount || len(g.Columns) != len(o.Columns) { + return false, nil + } + for i, col := range g.Columns { + if col.Name != o.Columns[i].Name { + return false, nil + } + } + for i := 0; i < g.RowCount; i++ { + eq, err := g.GetRow(i).ToDict().Equals(o.GetRow(i).ToDict()) + if err != nil || !eq { + return eq, err + } + } + return true, nil } func (g *MShellGrid) CastString() (string, error) { @@ -2609,7 +2652,20 @@ func (v *MShellGridView) Concat(other MShellObject) (MShellObject, error) { } func (v *MShellGridView) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for grid views.\n") + o, ok := other.(*MShellGridView) + if !ok { + return false, nil + } + if len(v.Indices) != len(o.Indices) { + return false, nil + } + for i := range v.Indices { + eq, err := v.GetRow(i).ToDict().Equals(o.GetRow(i).ToDict()) + if err != nil || !eq { + return eq, err + } + } + return true, nil } func (v *MShellGridView) CastString() (string, error) { @@ -2718,7 +2774,11 @@ func (r *MShellGridRow) Concat(other MShellObject) (MShellObject, error) { } func (r *MShellGridRow) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for grid rows.\n") + o, ok := other.(*MShellGridRow) + if !ok { + return false, nil + } + return r.ToDict().Equals(o.ToDict()) } func (r *MShellGridRow) CastString() (string, error) { diff --git a/tests/success/equality.msh b/tests/success/equality.msh new file mode 100644 index 00000000..9259ebf1 --- /dev/null +++ 
b/tests/success/equality.msh @@ -0,0 +1,33 @@ +# Equality is total and structural for every value type: containers compare +# element-wise, and at runtime a type mismatch yields false rather than an +# error (comparing two different concrete types is itself a static type error, +# caught by the checker, so this matters for unions and unchecked code). + +# Lists (structural, nested, length-sensitive) +[1 2 3] [1 2 3] = str wl +[1 2 3] [1 2 4] = str wl +[[1] [2]] [[1] [2]] = str wl +[1 2] [1 2 3] = str wl + +# str / path / literal compare by text content +"foo" `foo` = str wl + +# Dicts compare structurally, independent of key order +{ "a": 1, "b": 2 } { "b": 2, "a": 1 } = str wl + +# Maybe +"x" just "x" just = str wl +none none = str wl + +# Enums +enum C = red | green end +red red = str wl +red green = str wl + +# uniq now deduplicates any equatable value (lists, enums, ...) +[[1] [1] [2]] uniq len str wl +[red green red green] uniq len str wl + +# Quotations compare by identity +(1 +) dup = str wl +(1 +) (1 +) = str wl diff --git a/tests/success/equality.msh.stdout b/tests/success/equality.msh.stdout new file mode 100644 index 00000000..13d6400c --- /dev/null +++ b/tests/success/equality.msh.stdout @@ -0,0 +1,14 @@ +true +false +true +false +true +true +false +false +true +false +2 +2 +true +false From 6ce234fd76afc16714e7dd6ff3ea5506dc92c992 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Tue, 30 Jun 2026 18:06:54 -0500 Subject: [PATCH 11/32] Fix gridSetCell silently dropping type-mismatched values A grid column uses typed columnar storage (int/float/string/datetime/generic). GridColumn.Set only stored a value when its Go type matched the column type; for any other type it was a silent no-op, so setting a string, enum, bool, or list into a typed column type-checked (gridSetCell is `(Grid str int t -- Grid)`) but left the cell unchanged. 
Promote the column to generic storage on a type mismatch: materialize the existing typed data, switch to COL_GENERIC, then store the value. Matching-type sets keep the fast typed path, and other rows are preserved. Regression fixture: tests/success/grid_set_cell_mixed.msh. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01BAoaBtTQdsLLfYTyfexcVr --- CHANGELOG.md | 4 +++ mshell/MShellObject.go | 37 ++++++++++++++++++-- tests/success/grid_set_cell_mixed.msh | 11 ++++++ tests/success/grid_set_cell_mixed.msh.stdout | 3 ++ 4 files changed, 52 insertions(+), 3 deletions(-) create mode 100644 tests/success/grid_set_cell_mixed.msh create mode 100644 tests/success/grid_set_cell_mixed.msh.stdout diff --git a/CHANGELOG.md b/CHANGELOG.md index c1214221..f702d156 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed +- `gridSetCell` no longer silently drops a value whose type differs from the + column's original type (e.g. setting a string, enum, or bool into an int + column). The column is promoted to mixed storage so the value is stored, and + the other rows are preserved. - Equality (`=`) is now total and defined for every value type. Lists, quotations, pipes, and grids previously raised "equality not defined" at runtime; lists/pipes/grids now compare structurally (element- and cell-wise) diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index b979a188..d6ce265d 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -2295,28 +2295,59 @@ func (col *GridColumn) Get(index int) MShellObject { } } -// Set sets the value at the given row index +// Set sets the value at the given row index. If a typed column is given a value +// of a different type, the column is promoted to generic storage so the value +// is stored rather than silently dropped. 
func (col *GridColumn) Set(index int, value MShellObject) { switch col.ColType { case COL_INT: if intVal, ok := value.(MShellInt); ok { col.IntData[index] = int64(intVal.Value) + return } case COL_FLOAT: if floatVal, ok := value.(MShellFloat); ok { col.FloatData[index] = floatVal.Value + return } case COL_STRING: if strVal, ok := value.(MShellString); ok { col.StringData[index] = strVal.Content + return } case COL_DATETIME: if dtVal, ok := value.(*MShellDateTime); ok { col.DateTimeData[index] = dtVal.Time + return } - default: + case COL_GENERIC: col.GenericData[index] = value - } + return + } + // Typed column received a value of a different type: promote the whole + // column to generic storage, then store the value. + col.promoteToGeneric() + col.GenericData[index] = value +} + +// promoteToGeneric materializes a typed column's data into generic storage so +// the column can hold values of any type. It is a no-op for an already-generic +// column. +func (col *GridColumn) promoteToGeneric() { + if col.ColType == COL_GENERIC { + return + } + n := col.Len() + generic := make([]MShellObject, n) + for i := 0; i < n; i++ { + generic[i] = col.Get(i) + } + col.ColType = COL_GENERIC + col.GenericData = generic + col.IntData = nil + col.FloatData = nil + col.StringData = nil + col.DateTimeData = nil } // Len returns the number of rows in the column diff --git a/tests/success/grid_set_cell_mixed.msh b/tests/success/grid_set_cell_mixed.msh new file mode 100644 index 00000000..59a5dcef --- /dev/null +++ b/tests/success/grid_set_cell_mixed.msh @@ -0,0 +1,11 @@ +# gridSetCell stores the value even when its type differs from the column's +# original type: the column is promoted to mixed (generic) storage rather than +# silently dropping the value, and the other rows are preserved. +enum C = red | green end + +# int column, row 0 set to an enum; row 1 (int 2) is preserved. 
+[| "c" ; 1 ; 2 |] "c" 0 red gridSetCell toJson wl + +# int column receiving a string, and a string column receiving an int. +[| "n" ; 1 |] "n" 0 "X" gridSetCell toJson wl +[| "s" ; "a" |] "s" 0 99 gridSetCell toJson wl diff --git a/tests/success/grid_set_cell_mixed.msh.stdout b/tests/success/grid_set_cell_mixed.msh.stdout new file mode 100644 index 00000000..dd332a96 --- /dev/null +++ b/tests/success/grid_set_cell_mixed.msh.stdout @@ -0,0 +1,3 @@ +[{"c": "red"}, {"c": 2}] +[{"n": "X"}] +[{"s": 99}] From a4b24a0c1ecfdf6f0cd063e4fd40fab8088c8857 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Tue, 30 Jun 2026 20:15:14 -0500 Subject: [PATCH 12/32] Enum robustness: stack-safe deep values, union match, structural sort Adversarial testing of the new enums surfaced four issues, fixed here: - toJson and Equals on a deeply nested enum overflowed the Go stack (unlike str, which already used an explicit work stack). Both now walk payloads iteratively, so arbitrarily deep values are safe. Equality also no longer re-enters itself for enum-vs-non-enum pairs. - An enum used inside a `type` union (`type T = C | int`) type-checked when matched by the enum's type name, but had no runtime implementation, so it always failed with "No matching arm found". The runtime now treats a bare enum type name as a type-test arm matching any member of that enum, and falls through cleanly for other union members. - `sort` replaced every element with its string form, silently dropping enum payloads and changing element types (a list of ints came back as lexically-sorted strings). It now sorts and preserves the original objects using a total structural order: numbers numerically, text lexically, lists positionally, dicts by sorted key/value, enums by declaration order then payload, and different types by a fixed type rank. compareValues is iterative, so sorting deeply nested values is stack-safe too. sortV keeps its string-key behavior but now preserves elements. 
Docs, CHANGELOG, and regression tests (deep json/equals/sort, union match, structural sort) updated. All suites green: tests 213, typecheck 196, go test. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- CHANGELOG.md | 8 + doc/mshell.md | 18 +- doc/type_system.inc.html | 13 + mshell/Evaluator.go | 70 +++-- mshell/MShellObject.go | 363 +++++++++++++++++++--- tests/success/enum_deep_equals.msh | 17 + tests/success/enum_deep_equals.msh.stdout | 2 + tests/success/enum_deep_json.msh | 14 + tests/success/enum_deep_json.msh.stdout | 1 + tests/success/enum_deep_sort.msh | 17 + tests/success/enum_deep_sort.msh.stdout | 2 + tests/success/enum_union_match.msh | 31 ++ tests/success/enum_union_match.msh.stdout | 4 + tests/success/sort_structural.msh | 23 ++ tests/success/sort_structural.msh.stdout | 6 + tests/success/sort_test.msh | 4 +- 16 files changed, 533 insertions(+), 60 deletions(-) create mode 100644 tests/success/enum_deep_equals.msh create mode 100644 tests/success/enum_deep_equals.msh.stdout create mode 100644 tests/success/enum_deep_json.msh create mode 100644 tests/success/enum_deep_json.msh.stdout create mode 100644 tests/success/enum_deep_sort.msh create mode 100644 tests/success/enum_deep_sort.msh.stdout create mode 100644 tests/success/enum_union_match.msh create mode 100644 tests/success/enum_union_match.msh.stdout create mode 100644 tests/success/sort_structural.msh create mode 100644 tests/success/sort_structural.msh.stdout diff --git a/CHANGELOG.md b/CHANGELOG.md index f702d156..fef640ae 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -23,6 +23,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - `uniq` now accepts a list of any value type (matching its `([t] -- [t])` signature) and deduplicates by structural equality, instead of throwing at runtime for non-primitive elements such as enums, dicts, and booleans. 
+- `sort` now reorders the original elements and preserves their type, instead of + replacing every element with its string form. Previously `[10 2 1] sort` gave + the strings `1 10 2` (lexical order), and sorting a list of enums silently + dropped their payloads; now numbers sort numerically and stay numbers, and + every value keeps its type. Ordering is a total structural order: numbers + numerically, text lexically, lists positionally, dicts by sorted key/value, + enums by declaration order then payload, and different types by a fixed type + rank. (Use `sortV` for version/string sorting.) ### Added diff --git a/doc/mshell.md b/doc/mshell.md index a719c940..9e6e8925 100644 --- a/doc/mshell.md +++ b/doc/mshell.md @@ -661,6 +661,20 @@ result match end ``` +An enum may also be a member of a `type` union (e.g. `type T = Color | int`). +A `match` on such a union discriminates it with the enum's *type name* as an arm, +which matches any value of that enum: + +``` +enum Color = red | green | blue end +type T = Color | int + +x match + Color : "a color" wl, + int : "an int" wl, +end +``` + ## Definitions Definitions use `def` with an optional metadata dictionary before the type signature. @@ -1267,8 +1281,8 @@ groupBy ## Sorting -- `sort`: Sort list. Converts all items to strings, then sorts using go's `sort.Strings` `(list -- list)` -- `sortV`: Version sort list. Converts all items to strings, then sorts like GNU `sort -V` (`list -- list`) +- `sort`: Sort a list by a total structural order, preserving each element's type (numbers sort numerically and stay numbers; a list of enums keeps its payloads). The order is: numbers numerically, text (str/path/literal) lexically, dates chronologically, bytes bytewise, lists positionally, dicts by sorted key then value, enums by declaration order then payload, and values of different types by a fixed type rank. `([t] -- [t])` +- `sortV`: Version sort list. 
Converts each item to a string, then sorts like GNU `sort -V`, keeping the original elements. `([t] -- [t])` - `sortBy`: Sort a Grid or GridView by one or more columns ascending. Spec is a column name (str) or list of column names ([str]); priority is left-to-right. Stable; `none` cells sort last; cross-type values in a generic column error. Compose with `reverse` for descending. `(Grid|GridView str|[str] -- Grid)` - `sortByCmp`: Sort a list, Grid, or GridView using a comparison function. The function/quotation receives two items (or two `GridRow`s) and should return -1 when a < b, 0 when a = b, or 1 when a > b. Stable. `[a] (a a -- int) -- [a]` / `(Grid|GridView (GridRow GridRow -- int) -- Grid)` - `reverse`: Reverse a list, Grid, or GridView, returning a new value with elements/rows in reverse order. `(list -- list)` / `(Grid|GridView -- Grid)` diff --git a/doc/type_system.inc.html b/doc/type_system.inc.html index ac76e539..1f989543 100644 --- a/doc/type_system.inc.html +++ b/doc/type_system.inc.html @@ -390,6 +390,19 @@

    Enums +

    Quotations § Back to top
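A rough sketch of the enum type-name match arm documented above, assuming a simplified runtime. `Enum`, `armMatches`, and the `enumTypes` set are hypothetical names for illustration only. A member name matches that member, while a bare enum type name acts as a type test that any member of that enum satisfies.

```go
package main

import "fmt"

// Enum is a hypothetical stand-in for a runtime enum value.
type Enum struct {
	EnumName string
	Member   string
}

// armMatches reports whether a single-word pattern matches the subject.
// enumTypes is the set of declared enum type names.
func armMatches(pattern string, subject interface{}, enumTypes map[string]bool) bool {
	switch v := subject.(type) {
	case *Enum:
		if pattern == v.Member {
			return true // member arm
		}
		// Type-test arm: matches any member of the named enum.
		return enumTypes[pattern] && pattern == v.EnumName
	case int:
		return pattern == "int"
	}
	return false
}

func main() {
	enumTypes := map[string]bool{"Color": true}
	red := &Enum{EnumName: "Color", Member: "red"}
	fmt.Println(armMatches("Color", red, enumTypes)) // true
	fmt.Println(armMatches("Color", 5, enumTypes))   // false
	fmt.Println(armMatches("int", 5, enumTypes))     // true
}
```

A non-matching arm returns `false` without raising an error, so the next arm of the `match` can be tried.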

    diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go index 58dd6e38..4cabd9c4 100644 --- a/mshell/Evaluator.go +++ b/mshell/Evaluator.go @@ -331,6 +331,11 @@ type EvalState struct { // are unique across enums, so this flat lookup is enough to construct a // value from a bare member word, including consuming its payload. EnumMembers map[string]EnumMemberInfo + + // EnumTypeNames is the set of declared enum type names. A bare enum type + // name is a valid match-arm pattern (a type test that any member of that + // enum satisfies), e.g. matching a `C | int` union value against `C`. + EnumTypeNames map[string]bool } // EnumMemberInfo records where a member came from and how many payload values @@ -338,6 +343,10 @@ type EvalState struct { type EnumMemberInfo struct { EnumName string Arity int + // Ordinal is the member's 0-based position in its enum declaration, stamped + // onto constructed values (MShellEnum.MemberIndex) so sorting can order by + // declaration order. + Ordinal int } // RegisterEnums scans parse items for `enum` declarations and records each @@ -353,15 +362,24 @@ func (state *EvalState) RegisterEnums(items []MShellParseItem) { if state.EnumMembers == nil { state.EnumMembers = make(map[string]EnumMemberInfo) } + if state.EnumTypeNames == nil { + state.EnumTypeNames = make(map[string]bool) + } + state.EnumTypeNames[d.Name] = true for i, m := range d.Members { if _, exists := state.EnumMembers[m]; exists { continue } - state.EnumMembers[m] = EnumMemberInfo{EnumName: d.Name, Arity: len(d.MemberPayloads[i])} + state.EnumMembers[m] = EnumMemberInfo{EnumName: d.Name, Arity: len(d.MemberPayloads[i]), Ordinal: i} } } } +// isEnumTypeName reports whether name is a declared enum type name. 
+func (state *EvalState) isEnumTypeName(name string) bool { + return state.EnumTypeNames != nil && state.EnumTypeNames[name] +} + // RebuildDefinitionIndex records the first index for each name, matching // the front-to-back, first-match-wins behavior of the linear scan it replaces. func (state *EvalState) RebuildDefinitionIndex(definitions []MShellDefinition) { @@ -1086,26 +1104,35 @@ func (state *EvalState) processMatchBlock(matchBlock *MShellParseMatchBlock, fra // matchPattern checks if a subject matches a pattern (list of parse items). // Returns (matched bool, bindings map, result EvalResult). func (state *EvalState) matchPattern(pattern []MShellParseItem, subject MShellObject, startToken Token) (bool, map[string]MShellObject, EvalResult) { - // Enum constructor pattern: `member` or `member b1 b2 ...`. Only a member - // name matches an enum value (a sibling member just fails this arm and the - // next is tried); `_` and `none` fall through to the generic handling. + // Enum patterns against an enum value. A member name (`member` or + // `member b1 b2 ...`) matches that member and binds its payload; a sibling + // member just fails this arm and the next is tried. A bare enum *type name* + // (`C`) is a type-test arm that matches any member of that enum — this is + // how a `C | int` union value is discriminated. `_` and `none` fall through + // to the generic handling. 
if enumVal, ok := subject.(*MShellEnum); ok && len(pattern) >= 1 { if tok, okTok := pattern[0].(Token); okTok && tok.Type == LITERAL && tok.Lexeme != "_" && tok.Lexeme != "none" { - if tok.Lexeme != enumVal.Member { - return false, nil, SimpleSuccess() - } - binds := pattern[1:] - if len(binds) != len(enumVal.Payload) { - return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: enum member '%s' binds %d payload value(s), got %d.\n", - tok.Line, tok.Column, tok.Lexeme, len(enumVal.Payload), len(binds))) - } - bindings := make(map[string]MShellObject) - for i, b := range binds { - if bt, okBt := b.(Token); okBt && bt.Lexeme != "_" { - bindings[bt.Lexeme] = enumVal.Payload[i] + if tok.Lexeme == enumVal.Member { + binds := pattern[1:] + if len(binds) != len(enumVal.Payload) { + return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: enum member '%s' binds %d payload value(s), got %d.\n", + tok.Line, tok.Column, tok.Lexeme, len(enumVal.Payload), len(binds))) } + bindings := make(map[string]MShellObject) + for i, b := range binds { + if bt, okBt := b.(Token); okBt && bt.Lexeme != "_" { + bindings[bt.Lexeme] = enumVal.Payload[i] + } + } + return true, bindings, SimpleSuccess() } - return true, bindings, SimpleSuccess() + // Not this value's member. A single bare enum type name is a type + // test: it matches iff the value belongs to that enum. + if len(pattern) == 1 && state.isEnumTypeName(tok.Lexeme) { + return tok.Lexeme == enumVal.EnumName, nil, SimpleSuccess() + } + // A sibling member name (or any other literal): this arm fails. + return false, nil, SimpleSuccess() } } @@ -1227,6 +1254,13 @@ func (state *EvalState) matchTokenPattern(p Token, subject MShellObject) (bool, _, ok := subject.(MShellBinary) return ok, SimpleSuccess() } + // A bare enum type name is a type test: it matches an enum value of + // that enum, and simply fails (try the next arm) for any other value. + // This lets a union like `C | int` be discriminated by the arm `C`. 
+ if state.isEnumTypeName(p.Lexeme) { + en, ok := subject.(*MShellEnum) + return ok && en.EnumName == p.Lexeme, SimpleSuccess() + } return false, state.FailWithMessage(fmt.Sprintf("%d:%d: Unknown match pattern literal '%s'. %s\n", p.Line, p.Column, p.Lexeme, matchPatternFormsHint)) case TYPEINT: @@ -5855,7 +5889,7 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu payload[i], _ = stack.Pop() } } - stack.Push(&MShellEnum{EnumName: info.EnumName, Member: t.Lexeme, Payload: payload}) + stack.Push(&MShellEnum{EnumName: info.EnumName, Member: t.Lexeme, MemberIndex: info.Ordinal, Payload: payload}) return SimpleSuccess() } } diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index d6ce265d..098b72b7 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -351,7 +351,12 @@ func (n MShellNull) CastString() (string, error) { type MShellEnum struct { EnumName string Member string - Payload []MShellObject + // MemberIndex is the member's 0-based position in its enum declaration. + // Sorting orders enum values by this (declaration order) rather than by + // member name, so an ordered enum (`low | medium | high`) sorts in the + // author's intended order. Stamped at construction from the enum registry. + MemberIndex int + Payload []MShellObject } func (e *MShellEnum) TypeName() string { return e.EnumName } @@ -426,19 +431,58 @@ func (e *MShellEnum) Slice(startInc int, endExc int) (MShellObject, error) { // ToJson uses serde's externally-tagged convention — the de-facto standard for // tagged unions in JSON: a nullary member is the bare member string; a member // with a single payload is `{"member": value}`; with several, `{"member": -// [v0, v1, ...]}`. +// [v0, v1, ...]}`. 
Like enumRender, nested enum payloads are expanded with an +// explicit work stack rather than function recursion, so an arbitrarily deep +// value cannot overflow the call stack; output is appended to a single builder +// (no intermediate per-subtree strings), making it O(total output size). +// Non-enum payloads delegate to their own ToJson. func (e *MShellEnum) ToJson() string { - if len(e.Payload) == 0 { - return fmt.Sprintf("%q", e.Member) - } - if len(e.Payload) == 1 { - return fmt.Sprintf("{%q: %s}", e.Member, e.Payload[0].ToJson()) + var sb strings.Builder + type task struct { + lit string + obj MShellObject + isLit bool } - parts := make([]string, len(e.Payload)) - for i, p := range e.Payload { - parts[i] = p.ToJson() + stack := []task{{obj: e}} + for len(stack) > 0 { + t := stack[len(stack)-1] + stack = stack[:len(stack)-1] + if t.isLit { + sb.WriteString(t.lit) + continue + } + en, ok := t.obj.(*MShellEnum) + if !ok { + sb.WriteString(t.obj.ToJson()) + continue + } + if len(en.Payload) == 0 { + fmt.Fprintf(&sb, "%q", en.Member) + continue + } + // Emit `{"member": value}` (single payload) or + // `{"member": [v0, v1, ...]}` (several); push reversed so it pops in + // order, with enum payloads re-expanded by this same loop. 
+ seq := make([]task, 0, len(en.Payload)*2+4) + seq = append(seq, task{lit: fmt.Sprintf("{%q: ", en.Member), isLit: true}) + if len(en.Payload) == 1 { + seq = append(seq, task{obj: en.Payload[0]}) + } else { + seq = append(seq, task{lit: "[", isLit: true}) + for i, p := range en.Payload { + if i > 0 { + seq = append(seq, task{lit: ", ", isLit: true}) + } + seq = append(seq, task{obj: p}) + } + seq = append(seq, task{lit: "]", isLit: true}) + } + seq = append(seq, task{lit: "}", isLit: true}) + for i := len(seq) - 1; i >= 0; i-- { + stack = append(stack, seq[i]) + } } - return fmt.Sprintf("{%q: [%s]}", e.Member, strings.Join(parts, ", ")) + return sb.String() } func (e *MShellEnum) ToString() string { return enumRender(e) } @@ -448,13 +492,32 @@ func (e *MShellEnum) Concat(other MShellObject) (MShellObject, error) { return nil, fmt.Errorf("Cannot concatenate an enum.\n") } +// Equals compares two enum values structurally. Nested enum payloads are +// walked with an explicit pair stack rather than function recursion, so two +// arbitrarily deep values cannot overflow the call stack; only non-enum +// payloads (the leaves) delegate to their own Equals. func (e *MShellEnum) Equals(other MShellObject) (bool, error) { - o, ok := other.(*MShellEnum) - if !ok || e.EnumName != o.EnumName || e.Member != o.Member || len(e.Payload) != len(o.Payload) { - return false, nil - } - for i := range e.Payload { - eq, err := e.Payload[i].Equals(o.Payload[i]) + type pair struct{ a, b MShellObject } + stack := []pair{{a: e, b: other}} + for len(stack) > 0 { + p := stack[len(stack)-1] + stack = stack[:len(stack)-1] + ea, aok := p.a.(*MShellEnum) + eb, bok := p.b.(*MShellEnum) + if aok || bok { + // At least one side is an enum: equal only if both are enums with + // the same name, member, and arity. Payloads are deferred onto the + // stack so this never re-enters Equals on an enum. 
+ if !aok || !bok || ea.EnumName != eb.EnumName || ea.Member != eb.Member || len(ea.Payload) != len(eb.Payload) { + return false, nil + } + for i := range ea.Payload { + stack = append(stack, pair{a: ea.Payload[i], b: eb.Payload[i]}) + } + continue + } + // Neither side is an enum: compare by their own equality. + eq, err := p.a.Equals(p.b) if err != nil || !eq { return false, err } @@ -927,46 +990,268 @@ func NewList(initLength int) *MShellList { } // Sort the list. Returns an error if any item cannot be cast to a string. -func SortList(list *MShellList) (*MShellList, error) { - stringsToSort := make([]string, len(list.Items)) - for i, item := range list.Items { - str, err := item.CastString() - if err != nil { - return nil, fmt.Errorf("Cannot sort a list with a %s inside (%s).\n", item.TypeName(), item.DebugString()) - } - stringsToSort[i] = str +// valueTypeRank assigns each value kind a fixed slot in the cross-type sort +// order, so a list mixing types still sorts totally and deterministically. The +// exact sequence is arbitrary but stable; within a rank, compareValues uses the +// value's natural order. Text kinds (str/path/literal) share a rank and compare +// by content, matching structural equality. 
+func valueTypeRank(obj MShellObject) int { + switch obj.(type) { + case MShellNull: + return 0 + case MShellBool: + return 1 + case MShellInt, MShellFloat: + return 2 + case MShellString, MShellPath, MShellLiteral: + return 3 + case *MShellDateTime: + return 4 + case MShellBinary: + return 5 + case Maybe, *Maybe: + return 6 + case *MShellList: + return 7 + case *MShellDict: + return 8 + case *MShellEnum: + return 9 + default: + return 10 } +} - // Sort the strings - sort.Strings(stringsToSort) +func cmpInt(a, b int) int { + if a < b { + return -1 + } + if a > b { + return 1 + } + return 0 +} - // Create a new list and add the sorted strings to it - newList := NewList(0) - for _, str := range stringsToSort { - newList.Items = append(newList.Items, MShellString{str}) +func cmpFloat(a, b float64) int { + if a < b { + return -1 + } + if a > b { + return 1 + } + return 0 +} + +// numericFloat returns an int/float value as a float64 for cross-type numeric +// comparison. Only called for MShellInt / MShellFloat. +func numericFloat(obj MShellObject) float64 { + switch v := obj.(type) { + case MShellInt: + return float64(v.Value) + case MShellFloat: + return v.Value + } + return 0 +} + +// textContent returns the underlying string of a text-kind value +// (str / path / literal). Only called for those types. +func textContent(obj MShellObject) string { + switch v := obj.(type) { + case MShellString: + return v.Content + case MShellPath: + return v.Path + case MShellLiteral: + return v.LiteralText + } + return "" +} + +func asMaybe(obj MShellObject) (Maybe, bool) { + switch v := obj.(type) { + case Maybe: + return v, true + case *Maybe: + return *v, true + } + return Maybe{}, false +} + +func sortedDictKeys(m map[string]MShellObject) []string { + keys := make([]string, 0, len(m)) + for k := range m { + keys = append(keys, k) + } + sort.Strings(keys) + return keys +} + +// compareValues returns -1, 0, or 1, giving a total order over every value +// type. 
Different kinds are ordered by a fixed type rank (valueTypeRank); within +// a kind the natural order is used (numbers numerically with int/float +// interleaved, text lexically, dates chronologically, bytes bytewise). +// Structured values compare lexicographically: lists positionally (shorter +// prefix first), dicts by sorted key then value, enums by name then declaration +// order then payloads. Values that are Equals compare as 0; the converse need +// not hold (e.g. int 1 and float 1.0, or two quotations, also compare as 0). +// +// The comparison is driven by an explicit work stack rather than recursion, so +// arbitrarily deep values (e.g. a long `node(node(...))` enum chain) cannot +// overflow the call stack. Each task is either a pair of values to compare or a +// precomputed literal result (used for length tiebreaks and dict key / enum +// name comparisons). Pending tasks pop in lexicographic order; the first +// non-zero result short-circuits. Children of a compound value are pushed on top +// of that value's own length-tiebreak, so the tiebreak is only reached when the +// whole prefix compared equal. +func compareValues(a, b MShellObject) int { + type task struct { + a, b MShellObject + lit int + isLit bool + } + stack := []task{{a: a, b: b}} + for len(stack) > 0 { + t := stack[len(stack)-1] + stack = stack[:len(stack)-1] + if t.isLit { + if t.lit != 0 { + return t.lit + } + continue + } + ra, rb := valueTypeRank(t.a), valueTypeRank(t.b) + if ra != rb { + return cmpInt(ra, rb) + } + switch av := t.a.(type) { + case MShellNull: + // Two nulls are equal; move to the next task. 
+ case MShellBool: + bv := t.b.(MShellBool) + if av.Value != bv.Value { + if !av.Value { // false < true + return -1 + } + return 1 + } + case MShellInt: + if bv, ok := t.b.(MShellInt); ok { + if c := cmpInt(av.Value, bv.Value); c != 0 { + return c + } + } else if c := cmpFloat(numericFloat(t.a), numericFloat(t.b)); c != 0 { + return c + } + case MShellFloat: + if c := cmpFloat(numericFloat(t.a), numericFloat(t.b)); c != 0 { + return c + } + case MShellString, MShellPath, MShellLiteral: + if c := strings.Compare(textContent(t.a), textContent(t.b)); c != 0 { + return c + } + case *MShellDateTime: + bt := t.b.(*MShellDateTime).Time + if av.Time.Before(bt) { + return -1 + } + if av.Time.After(bt) { + return 1 + } + case MShellBinary: + if c := bytes.Compare(av, t.b.(MShellBinary)); c != 0 { + return c + } + case Maybe, *Maybe: + am, _ := asMaybe(t.a) + bm, _ := asMaybe(t.b) + an, bn := am.IsNone(), bm.IsNone() + if an != bn { + if an { // none < just + return -1 + } + return 1 + } + if !an { // both `just`: compare payloads + stack = append(stack, task{a: am.obj, b: bm.obj}) + } + case *MShellList: + bl := t.b.(*MShellList) + n := min(len(av.Items), len(bl.Items)) + stack = append(stack, task{lit: cmpInt(len(av.Items), len(bl.Items)), isLit: true}) + for i := n - 1; i >= 0; i-- { + stack = append(stack, task{a: av.Items[i], b: bl.Items[i]}) + } + case *MShellDict: + bd := t.b.(*MShellDict) + ak := sortedDictKeys(av.Items) + bk := sortedDictKeys(bd.Items) + n := min(len(ak), len(bk)) + stack = append(stack, task{lit: cmpInt(len(ak), len(bk)), isLit: true}) + for i := n - 1; i >= 0; i-- { + // Pushed so `key compare` pops before its `value compare`. 
+ stack = append(stack, task{a: av.Items[ak[i]], b: bd.Items[bk[i]]}) + stack = append(stack, task{lit: strings.Compare(ak[i], bk[i]), isLit: true}) + } + case *MShellEnum: + be := t.b.(*MShellEnum) + n := min(len(av.Payload), len(be.Payload)) + stack = append(stack, task{lit: cmpInt(len(av.Payload), len(be.Payload)), isLit: true}) + for i := n - 1; i >= 0; i-- { + stack = append(stack, task{a: av.Payload[i], b: be.Payload[i]}) + } + // Name and member (declaration order) compare before any payload. + stack = append(stack, task{lit: cmpInt(av.MemberIndex, be.MemberIndex), isLit: true}) + stack = append(stack, task{lit: strings.Compare(av.EnumName, be.EnumName), isLit: true}) + default: + // Unorderable kinds (quotation, pipe, grid, ...) share a rank and + // compare equal, so a stable sort leaves them in their original + // relative order. + } } + return 0 +} + +// SortList returns a new list with the same elements sorted by the total order +// compareValues defines. Element identity and type are preserved (a list of +// ints stays ints, enum payloads are kept) — sorting only reorders. +func SortList(list *MShellList) (*MShellList, error) { + newItems := make([]MShellObject, len(list.Items)) + copy(newItems, list.Items) + sort.SliceStable(newItems, func(i, j int) bool { + return compareValues(newItems[i], newItems[j]) < 0 + }) + newList := NewList(0) + newList.Items = newItems CopyListParams(list, newList) return newList, nil } -// Sort the list. Returns an error if any item cannot be cast to a string. +// SortListFunc sorts by a string key (each element's CastString) using the given +// string comparer — used for version sort. Original elements are preserved in +// the result. Returns an error if any element cannot be cast to a string. 
func SortListFunc(list *MShellList, cmp func(a string, b string) int) (*MShellList, error) { - stringsToSort := make([]string, len(list.Items)) + type keyed struct { + key string + obj MShellObject + } + items := make([]keyed, len(list.Items)) for i, item := range list.Items { str, err := item.CastString() if err != nil { return nil, fmt.Errorf("Cannot sort a list with a %s inside (%s).\n", item.TypeName(), item.DebugString()) } - stringsToSort[i] = str + items[i] = keyed{key: str, obj: item} } - // Sort the strings to function - slices.SortFunc(stringsToSort, cmp) + slices.SortStableFunc(items, func(a, b keyed) int { + return cmp(a.key, b.key) + }) - // Create a new list and add the sorted strings to it newList := NewList(0) - for _, str := range stringsToSort { - newList.Items = append(newList.Items, MShellString{str}) + for _, it := range items { + newList.Items = append(newList.Items, it.obj) } CopyListParams(list, newList) return newList, nil diff --git a/tests/success/enum_deep_equals.msh b/tests/success/enum_deep_equals.msh new file mode 100644 index 00000000..48719b7e --- /dev/null +++ b/tests/success/enum_deep_equals.msh @@ -0,0 +1,17 @@ +# Deeply nested enum values must compare for equality without overflowing: +# `=` walks enum payloads with an explicit pair stack, not function recursion +# (mirroring `str`/`toJson`). Build two independent 50000-deep trees and a +# third that differs only at the very tip; a recursive comparator would +# overflow the stack on values this deep. +enum Tree = leaf int | node Tree Tree end +0 leaf a! 0 leaf b! 0 leaf c! 0 i! +( + @i 50000 >= if break end + @a 0 leaf node a! + @b 0 leaf node b! + @c 0 leaf node c! + @i 1 + i! +) loop +@c 0 leaf 99 leaf node node c! 
+@a @b = str wl +@a @c = str wl diff --git a/tests/success/enum_deep_equals.msh.stdout b/tests/success/enum_deep_equals.msh.stdout new file mode 100644 index 00000000..da29283a --- /dev/null +++ b/tests/success/enum_deep_equals.msh.stdout @@ -0,0 +1,2 @@ +true +false diff --git a/tests/success/enum_deep_json.msh b/tests/success/enum_deep_json.msh new file mode 100644 index 00000000..81175c78 --- /dev/null +++ b/tests/success/enum_deep_json.msh @@ -0,0 +1,14 @@ +# A deeply nested enum value must serialize to JSON without overflowing: +# `toJson` renders enum payloads with an explicit work stack, not function +# recursion (mirroring `str`/enum_deep_render). Build a 50000-deep tree and +# print the length of its JSON; a recursive serializer would overflow the +# stack well before this depth. +enum Tree = leaf int | node Tree Tree end +0 leaf t! +0 i! +( + @i 50000 >= if break end + @t 0 leaf node t! + @i 1 + i! +) loop +@t toJson len str wl diff --git a/tests/success/enum_deep_json.msh.stdout b/tests/success/enum_deep_json.msh.stdout new file mode 100644 index 00000000..1ca88b97 --- /dev/null +++ b/tests/success/enum_deep_json.msh.stdout @@ -0,0 +1 @@ +1250011 diff --git a/tests/success/enum_deep_sort.msh b/tests/success/enum_deep_sort.msh new file mode 100644 index 00000000..358fb808 --- /dev/null +++ b/tests/success/enum_deep_sort.msh @@ -0,0 +1,17 @@ +# Sorting a list of deeply nested enum values must not overflow: compareValues +# walks payloads with an explicit work stack, not recursion (mirroring `=`, +# `toJson`, and `str`). Two 50000-deep trees share their whole prefix and differ +# only at the tip, so comparing them descends the full depth. A recursive +# comparator would overflow the stack on values this deep. +enum Tree = leaf int | node Tree Tree end +0 leaf a! 0 leaf c! 0 i! +( + @i 50000 >= if break end + @a 0 leaf node a! + @c 0 leaf node c! + @i 1 + i! +) loop +@c 0 leaf 99 leaf node node c! 
+# Sorting is deterministic regardless of input order, and leaves 2 elements. +[@c @a] sort len str wl +[@c @a] sort [@a @c] sort = str wl diff --git a/tests/success/enum_deep_sort.msh.stdout b/tests/success/enum_deep_sort.msh.stdout new file mode 100644 index 00000000..7600dd4b --- /dev/null +++ b/tests/success/enum_deep_sort.msh.stdout @@ -0,0 +1,2 @@ +2 +true diff --git a/tests/success/enum_union_match.msh b/tests/success/enum_union_match.msh new file mode 100644 index 00000000..1f9232f0 --- /dev/null +++ b/tests/success/enum_union_match.msh @@ -0,0 +1,31 @@ +# An enum can be a member of a `type` union, and a `match` discriminates the +# union by the enum's type name: a bare enum type name (`C`) is a type-test arm +# that matches any value of that enum, while a non-matching value (here an int, +# or a value of a different enum) falls through to the next arm. +enum C = red | green | blue end +type T = C | int + +red as T match + C : "a color" wl, + int : "an int" wl, +end + +42 as T match + C : "a color" wl, + int : "an int" wl, +end + +# A union of two enums, discriminated by each enum's type name. +enum A = a1 | a2 end +enum B = b1 | b2 end +type AB = A | B + +b1 as AB match + A : "an A" wl, + B : "a B" wl, +end + +a2 as AB match + A : "an A" wl, + B : "a B" wl, +end diff --git a/tests/success/enum_union_match.msh.stdout b/tests/success/enum_union_match.msh.stdout new file mode 100644 index 00000000..42a132b5 --- /dev/null +++ b/tests/success/enum_union_match.msh.stdout @@ -0,0 +1,4 @@ +a color +an int +a B +an A diff --git a/tests/success/sort_structural.msh b/tests/success/sort_structural.msh new file mode 100644 index 00000000..572e8db1 --- /dev/null +++ b/tests/success/sort_structural.msh @@ -0,0 +1,23 @@ +# `sort` reorders the original elements by a total structural order and never +# changes their type (the old implementation replaced every element with a +# string, dropping enum payloads). 
Numbers sort numerically and stay numbers, +# enums sort by declaration order then payload, dicts by sorted key/value, and a +# mixed-type list sorts deterministically by a fixed type rank (numbers < text). + +# Numbers keep their type (sum works) and sort numerically, not lexically. +[10 2 1] sort (str) map "," join wl +[10 2 1] sort sum str wl + +# Enums sort by member declaration order (low < medium < high), not by name. +enum Priority = low | medium | high end +[high low medium high low] sort (str) map "," join wl + +# Same member: payloads break the tie. +enum Tree = leaf int | node Tree Tree end +[3 leaf 1 leaf 2 leaf] sort (str) map "," join wl + +# Dicts compare by sorted key then value. +[ { "b": 2, "a": 9 } { "a": 1, "b": 1 } { "a": 1, "b": 2 } ] sort (toJson) map " | " join wl + +# Mixed types: fixed type rank (numbers before text), deterministic. +[hello 1 'c' 'A'] sort (str) map "," join wl diff --git a/tests/success/sort_structural.msh.stdout b/tests/success/sort_structural.msh.stdout new file mode 100644 index 00000000..c22eea2b --- /dev/null +++ b/tests/success/sort_structural.msh.stdout @@ -0,0 +1,6 @@ +1,2,10 +13 +low,low,medium,high,high +leaf(1),leaf(2),leaf(3) +{"a": 1, "b": 1} | {"a": 1, "b": 2} | {"a": 9, "b": 2} +1,A,c,hello diff --git a/tests/success/sort_test.msh b/tests/success/sort_test.msh index 61a38dbf..1d0f888f 100644 --- a/tests/success/sort_test.msh +++ b/tests/success/sort_test.msh @@ -1,5 +1,7 @@ "# Basic sort test" wl -[hello 1 'c' 'A'] sort uw +# sort preserves element types (the int stays an int), so stringify for display. +# Across types the sort order is by a fixed type rank (numbers before text). 
+[hello 1 'c' 'A'] sort (str) map uw "# Unique sort test" wl [z y 'x' y z] uniq sort uw From 8abcaeecace36bce62551db5e8043014d2686add Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Tue, 30 Jun 2026 20:55:15 -0500 Subject: [PATCH 13/32] Enum match: reject non-name payload bindings at runtime The runtime enum payload-binding loop bound by token lexeme and silently skipped non-token items, so it accepted match arms the type checker rejects: an operator-token name like `ok x` (x lexes as INTERPRET) was bound, and a malformed binding like `items [a b]` was ignored while the arm still matched. Now each payload binding must be a plain name (LITERAL) or the `_` wildcard, failing with a clear message otherwise. This mirrors the checker (enumMemberPattern) and the `just`/type-test binding forms, whose runtime and checker already agree, so all three binding forms are now consistent across type-check and run. Adds tests/fail/enum_bad_payload_binding.msh. All suites green: tests 214, typecheck 197, go test. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/Evaluator.go | 23 ++++++++++++++++++- tests/fail/enum_bad_payload_binding.msh | 8 +++++++ .../fail/enum_bad_payload_binding.msh.stderr | 1 + 3 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 tests/fail/enum_bad_payload_binding.msh create mode 100644 tests/fail/enum_bad_payload_binding.msh.stderr diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go index 4cabd9c4..ef112e89 100644 --- a/mshell/Evaluator.go +++ b/mshell/Evaluator.go @@ -1101,6 +1101,15 @@ func (state *EvalState) processMatchBlock(matchBlock *MShellParseMatchBlock, fra return state.FailWithMessage(fmt.Sprintf("%d:%d: No matching arm found in match block and no wildcard '_' arm provided.\n", startToken.Line, startToken.Column)) } +// parseItemLexeme renders a parse item for a diagnostic: a token's lexeme, or a +// non-token pattern's debug form. 
+func parseItemLexeme(item MShellParseItem) string { + if tok, ok := item.(Token); ok { + return tok.Lexeme + } + return item.DebugString() +} + // matchPattern checks if a subject matches a pattern (list of parse items). // Returns (matched bool, bindings map, result EvalResult). func (state *EvalState) matchPattern(pattern []MShellParseItem, subject MShellObject, startToken Token) (bool, map[string]MShellObject, EvalResult) { @@ -1119,8 +1128,20 @@ func (state *EvalState) matchPattern(pattern []MShellParseItem, subject MShellOb tok.Line, tok.Column, tok.Lexeme, len(enumVal.Payload), len(binds))) } bindings := make(map[string]MShellObject) + for _, b := range binds { + // A payload binding must be a plain name (a LITERAL) or the + // `_` wildcard — not a keyword/operator token (`end`, `x`, + // ...) or a nested pattern. This mirrors the type checker + // (enumMemberPattern) and the `just`/type-test binding forms, + // so the runtime never accepts an arm the checker rejects. + bt, okBt := b.(Token) + if !okBt || (bt.Type != LITERAL && bt.Type != UNDERSCORE) { + return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: enum member '%s' payload bindings must be names, not '%s'.\n", + tok.Line, tok.Column, tok.Lexeme, parseItemLexeme(b))) + } + } for i, b := range binds { - if bt, okBt := b.(Token); okBt && bt.Lexeme != "_" { + if bt := b.(Token); bt.Lexeme != "_" { bindings[bt.Lexeme] = enumVal.Payload[i] } } diff --git a/tests/fail/enum_bad_payload_binding.msh b/tests/fail/enum_bad_payload_binding.msh new file mode 100644 index 00000000..26ed8441 --- /dev/null +++ b/tests/fail/enum_bad_payload_binding.msh @@ -0,0 +1,8 @@ +# An enum payload binding must be a plain name (or `_`). A nested pattern (or a +# keyword/operator token) is rejected at runtime, matching the type checker and +# the `just`/type-test binding forms. 
+enum Box = items [int] | z end +[1 2 3] items match + items [a b] : "matched" wl, + z : "z" wl, +end diff --git a/tests/fail/enum_bad_payload_binding.msh.stderr b/tests/fail/enum_bad_payload_binding.msh.stderr new file mode 100644 index 00000000..2682d61e --- /dev/null +++ b/tests/fail/enum_bad_payload_binding.msh.stderr @@ -0,0 +1 @@ +6:3: enum member 'items' payload bindings must be names, not '['a', 'b']'. From 99ac5b10dc3d06e9fc0c95353e2bfdd4a7423fa4 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Tue, 30 Jun 2026 21:37:34 -0500 Subject: [PATCH 14/32] Enum match: see through a `type` alias (brand) to the enum Naming an enum via `type Color2 = C` wraps it in a TKBrand, so the checker's match logic (which tests Kind == TKEnum) no longer recognized the members: a match on the aliased type was rejected as "unrecognized pattern", even though brands are runtime-erased and the value matched correctly at run time. This was inconsistent with a branded union: `type T = int | str` stays a TKUnion node, so its arms remain matchable through the brand. Enums took the opaque-wrapper path instead. Now enumMemberPattern and CheckMatchExhaustive unwrap a TKBrand to its underlying enum before dispatching, so a branded enum matches (and enforces exhaustiveness over) its members exactly like a branded union matches its arms. The brand stays nominal at value boundaries (an explicit `as` is still needed to pass an enum where the alias is expected). Checker-only change; the runtime already matched branded enums correctly. Tests: tests/success/enum_branded_match.msh and tests/typecheck_fail/enum_branded_nonexhaustive.msh. All suites green: tests 215, typecheck 199, go test. 
Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/TypeBranch.go | 7 +++++ mshell/TypeCheckProgram.go | 5 ++++ tests/success/enum_branded_match.msh | 27 +++++++++++++++++++ tests/success/enum_branded_match.msh.stdout | 3 +++ .../enum_branded_nonexhaustive.msh | 10 +++++++ 5 files changed, 52 insertions(+) create mode 100644 tests/success/enum_branded_match.msh create mode 100644 tests/success/enum_branded_match.msh.stdout create mode 100644 tests/typecheck_fail/enum_branded_nonexhaustive.msh diff --git a/mshell/TypeBranch.go b/mshell/TypeBranch.go index c760cead..4542b667 100644 --- a/mshell/TypeBranch.go +++ b/mshell/TypeBranch.go @@ -117,7 +117,14 @@ func (c *Checker) CheckMatchExhaustive(matched TypeId, arms []MatchArmTag, callS } } + // Unwrap a `type X = Enum` brand so exhaustiveness dispatches on the + // underlying enum and checks coverage over its members — mirroring how a + // branded union (still a TKUnion node) is checked over its arms. n := c.arena.Node(matched) + if n.Kind == TKBrand { + matched = c.underlying(matched) + n = c.arena.Node(matched) + } switch n.Kind { case TKMaybe: hasJust, hasNone := false, false diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index b293623d..ba7c3683 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -1418,6 +1418,11 @@ func (c *Checker) enumMemberPattern(subject TypeId, pattern []MShellParseItem) ( return armPattern{}, false } resolved := c.subst.Apply(c.arena, subject) + // Unwrap a `type X = Enum` brand so a branded enum matches by its members, + // just as a branded union (`type T = int | str`) matches by its arms. 
+ if c.arena.Node(resolved).Kind == TKBrand { + resolved = c.underlying(resolved) + } sn := c.arena.Node(resolved) if sn.Kind != TKEnum { return armPattern{}, false diff --git a/tests/success/enum_branded_match.msh b/tests/success/enum_branded_match.msh new file mode 100644 index 00000000..a63a4a6a --- /dev/null +++ b/tests/success/enum_branded_match.msh @@ -0,0 +1,27 @@ +# An enum named via a `type` alias is a distinct branded type, but it can still +# be `match`ed by its members — just as a branded union (`type T = int | str`) +# is matched by its arms. Exhaustiveness is enforced over the members, payload +# binding works through the brand, and the brand stays nominal at call +# boundaries (an explicit `as` is needed to pass an enum where the alias is). +enum C = red | green | blue end +type Color2 = C + +red as Color2 match + red : "r" wl, + green : "g" wl, + blue : "b" wl, +end + +enum R = ok int | failed str | z end +type R2 = R +404 as int drop +5 ok as R2 match + ok n : @n str wl, + failed m : @m wl, + z : "z" wl, +end + +def paint (Color2 -- str) + match red: "is red", green: "is green", blue: "is blue", end +end +blue as Color2 paint wl diff --git a/tests/success/enum_branded_match.msh.stdout b/tests/success/enum_branded_match.msh.stdout new file mode 100644 index 00000000..c75b5130 --- /dev/null +++ b/tests/success/enum_branded_match.msh.stdout @@ -0,0 +1,3 @@ +r +5 +is blue diff --git a/tests/typecheck_fail/enum_branded_nonexhaustive.msh b/tests/typecheck_fail/enum_branded_nonexhaustive.msh new file mode 100644 index 00000000..7be9ae0c --- /dev/null +++ b/tests/typecheck_fail/enum_branded_nonexhaustive.msh @@ -0,0 +1,10 @@ +# Exhaustiveness is enforced through a `type` alias of an enum: matching a +# branded enum must still cover every member (or use `_`). Here `blue` is +# missing, so the match is rejected — the brand does not hide the members. 
+enum C = red | green | blue end +type Color2 = C + +red as Color2 match + red : "r" wl, + green : "g" wl, +end From 556b382cda3edcc7c455ab976ed2b8c490c5e22e Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Wed, 1 Jul 2026 18:03:06 -0500 Subject: [PATCH 15/32] Enum: drop the EnumName prefix from DebugString A list's ToString renders elements via DebugString, and the enum's DebugString was the only value type to inject an `EnumName.` prefix. So an enum inside a list printed as `[C.red C.green]`, inconsistent with its standalone form (`red`), `map`-ed form (`red`), and dict/JSON form (`"red"`). DebugString now returns the same member form as ToString, so an enum renders identically in every context. The member name is globally unique, so the type prefix added no disambiguation (unlike the quotes a string's DebugString adds). Test: tests/success/enum_render_contexts.msh. Suites green: tests 217, typecheck 202, go test. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/MShellObject.go | 2 +- tests/success/enum_render_contexts.msh | 12 ++++++++++++ tests/success/enum_render_contexts.msh.stdout | 6 ++++++ 3 files changed, 19 insertions(+), 1 deletion(-) create mode 100644 tests/success/enum_render_contexts.msh create mode 100644 tests/success/enum_render_contexts.msh.stdout diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 098b72b7..e628d06a 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -364,7 +364,7 @@ func (e *MShellEnum) IsCommandLineable() bool { return true } func (e *MShellEnum) IsNumeric() bool { return false } func (e *MShellEnum) FloatNumeric() float64 { return 0 } func (e *MShellEnum) CommandLine() string { return enumRender(e) } -func (e *MShellEnum) DebugString() string { return e.EnumName + "." 
+ enumRender(e) } +func (e *MShellEnum) DebugString() string { return enumRender(e) } // enumRender renders an enum value as `member` (nullary) or // `member(p0 p1 ...)`. Nested enum payloads are expanded with an explicit diff --git a/tests/success/enum_render_contexts.msh b/tests/success/enum_render_contexts.msh new file mode 100644 index 00000000..57c12293 --- /dev/null +++ b/tests/success/enum_render_contexts.msh @@ -0,0 +1,12 @@ +# An enum value renders the same way in every context: standalone, inside a +# list, and via `map` all use the member form (`red`, `leaf(3)`) with no +# `EnumName.` prefix. (A dict / toJson still use the JSON-tagged form.) +enum C = red | green | blue end +red str wl +[red green blue] str wl +[red green blue] (str) map "," join wl + +enum T = leaf int | node T T end +3 leaf str wl +[ 3 leaf 1 leaf ] str wl +1 leaf 2 leaf node str wl diff --git a/tests/success/enum_render_contexts.msh.stdout b/tests/success/enum_render_contexts.msh.stdout new file mode 100644 index 00000000..cde9f53c --- /dev/null +++ b/tests/success/enum_render_contexts.msh.stdout @@ -0,0 +1,6 @@ +red +[red green blue] +red,green,blue +leaf(3) +[leaf(3) leaf(1)] +node(leaf(1) leaf(2)) From 622ce23e35b701dd2bbcd036a79682b1d60ecd01 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Wed, 1 Jul 2026 18:23:49 -0500 Subject: [PATCH 16/32] Fix Maybe equality: accept *Maybe, so None==None and Just==Just work MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Maybe.Equals asserted other.(Maybe) (the value form), but the runtime constructs Maybe values as *Maybe pointers everywhere (just/none and every lookup/parse builtin push &Maybe{...}). So the assertion missed every operand and the method bailed to false — making *all* Maybe-vs-Maybe comparisons false, including `none none =`, `5 just 5 just =`, and `Maybe[enum]` equality. 
By extension `uniq` could not dedupe Maybes, and any list/dict/enum containing a Maybe compared unequal to an identical value. Now it unwraps other via the existing asMaybe helper (value or pointer), matching how the match code and compareValues already handle both forms. The receiver side already worked via Go's value-method promotion on *Maybe. The equality.msh test had baked the wrong answers into its expected output (None==None and Just==Just recorded as false); corrected and expanded with real Just/None and Maybe[enum] assertions. Suites green: tests 217, typecheck 203, go test. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/MShellObject.go | 5 ++++- tests/success/equality.msh | 13 ++++++++++--- tests/success/equality.msh.stdout | 6 ++++++ 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index e628d06a..fb430c53 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -243,7 +243,10 @@ func (m Maybe) Concat(other MShellObject) (MShellObject, error) { } func (m Maybe) Equals(other MShellObject) (bool, error) { - otherMaybe, ok := other.(Maybe) + // Maybe values are constructed as *Maybe at runtime, so accept either the + // value or pointer form — a plain other.(Maybe) misses every *Maybe and + // would make all Maybe-vs-Maybe comparisons (including None==None) false. + otherMaybe, ok := asMaybe(other) if !ok { return false, nil } diff --git a/tests/success/equality.msh b/tests/success/equality.msh index 9259ebf1..6484e3e2 100644 --- a/tests/success/equality.msh +++ b/tests/success/equality.msh @@ -15,14 +15,21 @@ # Dicts compare structurally, independent of key order { "a": 1, "b": 2 } { "b": 2, "a": 1 } = str wl -# Maybe -"x" just "x" just = str wl +# Maybe: None==None, Just==Just (equal / unequal payloads), Just != None, and +# a Maybe nested in a list all compare structurally. 
none none = str wl +"x" just "x" just = str wl +5 just 5 just = str wl +5 just 6 just = str wl +5 just none = str wl +[5 just] [5 just] = str wl -# Enums +# Enums (including Maybe[enum], the common optional-enum case) enum C = red | green end red red = str wl red green = str wl +red just red just = str wl +red just green just = str wl # uniq now deduplicates any equatable value (lists, enums, ...) [[1] [1] [2]] uniq len str wl diff --git a/tests/success/equality.msh.stdout b/tests/success/equality.msh.stdout index 13d6400c..23ac1554 100644 --- a/tests/success/equality.msh.stdout +++ b/tests/success/equality.msh.stdout @@ -4,9 +4,15 @@ true false true true +true +true +true false false true +true +false +true false 2 2 From 3d6f3f94381b6315c5d43098c5d20db61ef441b7 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Wed, 1 Jul 2026 18:51:40 -0500 Subject: [PATCH 17/32] Match: unwrap a type-alias brand on the subject once, for all pattern forms MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The branded-enum fix unwrapped the TKBrand in two per-site spots (enumMemberPattern and CheckMatchExhaustive), so a branded enum matched its members — but the `just`/`none` and ` name` binding paths in armPatternOf still checked the raw subject kind. So matching a branded Maybe (e.g. `type MC = Maybe[C]`, a named optional enum) by `just v`/`none` was rejected by the checker (`@v` unbound / non-exhaustive) even though the runtime matched it fine — brands are runtime-erased. Unwrap the brand once where the match subject is established (checkMatchBlock), so every arm form — enum member, `just v`, ` name`, list/dict — and the exhaustiveness check see the underlying type uniformly. The two per-site unwraps are removed (redundant now); enumMemberPattern and CheckMatchExhaustive get the already-unwrapped subject. Brands stay nominal at value boundaries (an `as` is still needed to pass an enum where the alias is expected). 
Tests: tests/success/branded_maybe_match.msh and tests/typecheck_fail/branded_maybe_nonexhaustive.msh. Suites green: tests 218, typecheck 205, go test. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/TypeBranch.go | 7 ----- mshell/TypeCheckProgram.go | 16 +++++++---- tests/success/branded_maybe_match.msh | 28 +++++++++++++++++++ tests/success/branded_maybe_match.msh.stdout | 4 +++ .../branded_maybe_nonexhaustive.msh | 7 +++++ 5 files changed, 50 insertions(+), 12 deletions(-) create mode 100644 tests/success/branded_maybe_match.msh create mode 100644 tests/success/branded_maybe_match.msh.stdout create mode 100644 tests/typecheck_fail/branded_maybe_nonexhaustive.msh diff --git a/mshell/TypeBranch.go b/mshell/TypeBranch.go index 4542b667..c760cead 100644 --- a/mshell/TypeBranch.go +++ b/mshell/TypeBranch.go @@ -117,14 +117,7 @@ func (c *Checker) CheckMatchExhaustive(matched TypeId, arms []MatchArmTag, callS } } - // Unwrap a `type X = Enum` brand so exhaustiveness dispatches on the - // underlying enum and checks coverage over its members — mirroring how a - // branded union (still a TKUnion node) is checked over its arms. n := c.arena.Node(matched) - if n.Kind == TKBrand { - matched = c.underlying(matched) - n = c.arena.Node(matched) - } switch n.Kind { case TKMaybe: hasJust, hasNone := false, false diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index fc679a54..2a52c0b0 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -1206,6 +1206,15 @@ func (c *Checker) checkMatchBlock(matchBlock *MShellParseMatchBlock) { // exhaustiveness check compare against `str` by type id, and the literal // value carries no meaning for pattern matching. 
subject := c.arena.WidenStrLit(c.stack.items[c.stack.Len()-1]) + // See through a `type X = ...` brand once, here, so every arm form (enum + // member, `just`/` name` binding, list/dict pattern) and the + // exhaustiveness check match against the underlying type. A brand is + // nominal for typing but has no runtime representation, so a branded enum + // matches its members, a branded Maybe its `just`/`none`, etc. — exactly + // as the unbranded types do, which is what the runtime already does. + if resolved := c.subst.Apply(c.arena, subject); c.arena.Node(resolved).Kind == TKBrand { + subject = c.underlying(resolved) + } entry := c.captureBranch() if len(matchBlock.Arms) == 0 { @@ -1420,12 +1429,9 @@ func (c *Checker) enumMemberPattern(subject TypeId, pattern []MShellParseItem) ( if !ok || tok.Type != LITERAL { return armPattern{}, false } + // The subject is already brand-unwrapped by checkMatchBlock, so a branded + // enum (`type X = Enum`) arrives here as its underlying TKEnum. resolved := c.subst.Apply(c.arena, subject) - // Unwrap a `type X = Enum` brand so a branded enum matches by its members, - // just as a branded union (`type T = int | str`) matches by its arms. - if c.arena.Node(resolved).Kind == TKBrand { - resolved = c.underlying(resolved) - } sn := c.arena.Node(resolved) if sn.Kind != TKEnum { return armPattern{}, false diff --git a/tests/success/branded_maybe_match.msh b/tests/success/branded_maybe_match.msh new file mode 100644 index 00000000..6f4cd0b4 --- /dev/null +++ b/tests/success/branded_maybe_match.msh @@ -0,0 +1,28 @@ +# A `Maybe` (or enum, or any type) named via a `type` alias is a distinct brand, +# but `match` sees through the brand to the underlying type — so a branded +# Maybe matches by `just v` / `none`, a branded enum by its members, and a +# branded primitive by a type-keyword arm. The brand is nominal for typing but +# has no runtime form, matching what the runtime already does. 
+enum C = red | green | blue end +type MC = Maybe[C] + +red just as MC match + just v : @v str wl, + none : "n" wl, +end + +none as MC match + just v : @v str wl, + none : "n" wl, +end + +# Branded Maybe of a primitive, with a type-keyword binding inside. +type MI = Maybe[int] +7 just as MI match + just n : @n 1 + str wl, + none : "n" wl, +end + +# Branded primitive, matched with a type-keyword + binding. +type MyInt = int +5 as MyInt match int n : @n str wl, _ : "o" wl, end diff --git a/tests/success/branded_maybe_match.msh.stdout b/tests/success/branded_maybe_match.msh.stdout new file mode 100644 index 00000000..373f91b9 --- /dev/null +++ b/tests/success/branded_maybe_match.msh.stdout @@ -0,0 +1,4 @@ +red +n +8 +5 diff --git a/tests/typecheck_fail/branded_maybe_nonexhaustive.msh b/tests/typecheck_fail/branded_maybe_nonexhaustive.msh new file mode 100644 index 00000000..d1fb35bb --- /dev/null +++ b/tests/typecheck_fail/branded_maybe_nonexhaustive.msh @@ -0,0 +1,7 @@ +# Exhaustiveness is enforced through a `type` alias of a Maybe: a branded Maybe +# match must still cover both `just` and `none` (or use `_`). Here `none` is +# missing, so it is rejected — the brand does not hide the cases. +type MI = Maybe[int] +5 just as MI match + just v : @v str wl, +end From 3413d03ddded5a252a6e26872c5f68168de6a057 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Wed, 1 Jul 2026 19:52:56 -0500 Subject: [PATCH 18/32] Fix exponential blowup comparing enum values with shared substructure MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A value built by reusing one subtree twice per level (`@t @t node` in a loop) is a DAG: n heap nodes but 2^n paths when walked as a tree. Equality and ordering walked it structurally with no notion of sharing, so `=`, `uniq`, and `sort` on such values went exponential — depth 24 took 0.7s, each +1 level doubled it, and depth 40 (41 actual nodes) would run for hours. Measured, not theoretical. 
Two layers of defense, both zero-cost for ordinary values: - sameRef fast path: a pointer-identical pair is equal by definition, so the enum Equals pair loop, compareValues, and itemsEqual skip it instead of expanding. This alone collapses every same-reference case (self compare, dup, a shared subtree meeting itself) from 2^n to n. Typed per-kind pointer comparisons, so it can never hit Go's non-comparable interface panic (e.g. MShellBinary). - dagGuard threshold memo: two *independently built* DAGs share no pointers across operands, so the fast path never fires. The walks count pops; past 2^19 steps they memoize already-expanded enum/list/dict pointer pairs and skip repeats, making the comparison polynomial in actual nodes. Sound in a LIFO walk: a duplicate only pops after the first occurrence's expansion fully resolved, and any mismatch returns immediately. The memo is capped (2^18 entries) so a huge linear value cannot balloon memory; below the threshold the guard is one integer increment and never allocates. Depth-40 self-compare: was ~13 hours extrapolated, now 0.035s. The depth-64 regression test (tests/success/enum_dag_equality.msh) covers both modes plus uniq/sort and an unequal tip. Deep linear values (50k suite tests, 4M manual) are unaffected. Suites green: tests 219, typecheck 206, go test. 
Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/MShellObject.go | 123 +++++++++++++++++++++ tests/success/enum_dag_equality.msh | 34 ++++++ tests/success/enum_dag_equality.msh.stdout | 8 ++ 3 files changed, 165 insertions(+) create mode 100644 tests/success/enum_dag_equality.msh create mode 100644 tests/success/enum_dag_equality.msh.stdout diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index fb430c53..6abf3541 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -501,10 +501,18 @@ func (e *MShellEnum) Concat(other MShellObject) (MShellObject, error) { // payloads (the leaves) delegate to their own Equals. func (e *MShellEnum) Equals(other MShellObject) (bool, error) { type pair struct{ a, b MShellObject } + var guard dagGuard stack := []pair{{a: e, b: other}} for len(stack) > 0 { p := stack[len(stack)-1] stack = stack[:len(stack)-1] + // Shared substructure: a pointer-identical pair is equal by + // definition, and a pair this walk already expanded compared equal + // (see dagGuard). Skipping both keeps DAG-shaped values (a subtree + // reused twice per level) linear instead of 2^n. + if sameRef(p.a, p.b) || guard.skip(p.a, p.b) { + continue + } ea, aok := p.a.(*MShellEnum) eb, bok := p.b.(*MShellEnum) if aok || bok { @@ -1090,6 +1098,108 @@ func sortedDictKeys(m map[string]MShellObject) []string { return keys } +// sameRef reports whether a and b are the identical heap object, for the kinds +// that can form shared substructure (a value built as `@t @t node` reuses one +// subtree twice). A pointer-identical pair is equal by definition, so equality +// and ordering walks skip it instead of expanding it — without this, walking a +// value with n levels of sharing costs 2^n. Only pointer kinds are compared: +// comparing interfaces holding non-comparable dynamic types (e.g. MShellBinary, +// a []byte) panics at runtime. 
+func sameRef(a, b MShellObject) bool { + switch av := a.(type) { + case *MShellEnum: + bv, ok := b.(*MShellEnum) + return ok && av == bv + case *MShellList: + bv, ok := b.(*MShellList) + return ok && av == bv + case *MShellDict: + bv, ok := b.(*MShellDict) + return ok && av == bv + case *Maybe: + bv, ok := b.(*Maybe) + return ok && av == bv + case *MShellDateTime: + bv, ok := b.(*MShellDateTime) + return ok && av == bv + case *MShellQuotation: + bv, ok := b.(*MShellQuotation) + return ok && av == bv + } + return false +} + +// dagGuard bounds a comparison walk over values with shared substructure that +// sameRef alone cannot catch: two *independently built* DAGs share no pointers +// across operands, so every level re-expands and the walk goes exponential. +// The guard counts pops; once a walk runs long enough to suggest blowup, it +// memoizes the pointer pairs it has already expanded and skips repeats. +// +// Skipping a repeated pair is sound in a LIFO walk: the first occurrence's +// entire expansion resolves before any later duplicate (which sat lower in the +// stack) pops, and a mismatch anywhere returns from the walk immediately — so +// if a duplicate pops at all, its subtree already compared equal. +// +// Ordinary comparisons never allocate: below the step threshold the guard is +// one integer increment. The memo is capped so a legitimately huge linear +// value (millions of distinct pairs, no repeats) cannot balloon memory; a +// blowup DAG has few distinct pairs and fits far below the cap. +type dagGuard struct { + steps int + memo map[refPair]bool +} + +type refPair struct{ a, b MShellObject } + +const dagStepThreshold = 1 << 19 +const dagMemoCap = 1 << 18 + +// skip reports whether this pair was already expanded earlier in the walk. +// Call once per popped pair; it records the pair (past the threshold) so +// later duplicates skip. 
+func (g *dagGuard) skip(a, b MShellObject) bool { + g.steps++ + if g.steps < dagStepThreshold { + return false + } + key, ok := refPairKey(a, b) + if !ok { + return false + } + if g.memo == nil { + g.memo = make(map[refPair]bool, 1024) + } + if g.memo[key] { + return true + } + if len(g.memo) < dagMemoCap { + g.memo[key] = true + } + return false +} + +// refPairKey returns a comparable identity key when both values are the same +// container pointer kind — the kinds whose repeated pairs cause blowup. +// Interface keys are only safe when the dynamic values are comparable, which +// pointers are; scalar kinds are cheap to compare directly and get no key. +func refPairKey(a, b MShellObject) (refPair, bool) { + switch a.(type) { + case *MShellEnum: + if _, ok := b.(*MShellEnum); ok { + return refPair{a, b}, true + } + case *MShellList: + if _, ok := b.(*MShellList); ok { + return refPair{a, b}, true + } + case *MShellDict: + if _, ok := b.(*MShellDict); ok { + return refPair{a, b}, true + } + } + return refPair{}, false +} + // compareValues returns -1, 0, or 1, giving a total order over every value // type. Different kinds are ordered by a fixed type rank (valueTypeRank); within // a kind the natural order is used (numbers numerically with int/float @@ -1113,6 +1223,7 @@ func compareValues(a, b MShellObject) int { lit int isLit bool } + var guard dagGuard stack := []task{{a: a, b: b}} for len(stack) > 0 { t := stack[len(stack)-1] @@ -1123,6 +1234,13 @@ func compareValues(a, b MShellObject) int { } continue } + // Shared substructure: a pointer-identical pair compares 0 by + // definition, and a pair this walk already expanded proved 0 (any + // non-zero would have returned; see dagGuard). Skipping both keeps + // DAG-shaped values linear instead of 2^n. 
+ if sameRef(t.a, t.b) || guard.skip(t.a, t.b) { + continue + } ra, rb := valueTypeRank(t.a), valueTypeRank(t.b) if ra != rb { return cmpInt(ra, rb) @@ -2380,6 +2498,11 @@ func itemsEqual(a, b []MShellObject) (bool, error) { return false, nil } for i := range a { + // Pointer-identical elements are equal by definition; skipping them + // keeps lists with shared substructure from re-walking it. + if sameRef(a[i], b[i]) { + continue + } eq, err := a[i].Equals(b[i]) if err != nil || !eq { return eq, err diff --git a/tests/success/enum_dag_equality.msh b/tests/success/enum_dag_equality.msh new file mode 100644 index 00000000..08b677e2 --- /dev/null +++ b/tests/success/enum_dag_equality.msh @@ -0,0 +1,34 @@ +# Equality and ordering on enum values with shared substructure must not blow +# up: `@t @t node` reuses one subtree twice per level, so after 64 levels the +# value is a DAG with 65 nodes but 2^64 tree paths. The comparison walks skip +# pointer-identical pairs (and, past a step threshold, memoize already-expanded +# pairs), so these finish instantly; a naive structural walk would run for +# centuries. +enum T = leaf int | node T T end + +# One DAG, compared against itself / its own reference. +0 leaf t! +0 i! +( @i 64 >= if break end @t @t node t! @i 1 + i! ) loop +@t @t = str wl +@t dup = str wl +[ @t @t ] uniq len str wl +[ @t @t ] sort len str wl + +# Two DAGs built independently: no pointers are shared across the operands, so +# the pointer fast path never fires — this exercises the memoized mode. +0 leaf a! +0 i! +( @i 64 >= if break end @a @a node a! @i 1 + i! ) loop +0 leaf b! +0 i! +( @i 64 >= if break end @b @b node b! @i 1 + i! ) loop +@a @b = str wl + +# A third DAG differing at the bottom leaf: unequal, found without blowup. +1 leaf c! +0 i! +( @i 64 >= if break end @c @c node c! @i 1 + i! 
) loop +@a @c = str wl +[ @a @c @b ] sort len str wl +[ @a @c @b ] uniq len str wl diff --git a/tests/success/enum_dag_equality.msh.stdout b/tests/success/enum_dag_equality.msh.stdout new file mode 100644 index 00000000..e102c904 --- /dev/null +++ b/tests/success/enum_dag_equality.msh.stdout @@ -0,0 +1,8 @@ +true +true +1 +2 +true +false +3 +2 From 8087f6c0f9b500d5d03e5b37eb9a5c2aaf35f6d6 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Wed, 1 Jul 2026 20:06:19 -0500 Subject: [PATCH 19/32] Enum: reject a def whose name collides with an enum member MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The spec'd invariant — enum members share the word namespace, collisions are declaration errors — was only enforced in one direction. defineEnum (pre-pass 1b) checks nameBuiltins, which at that point holds Go builtins and stdlib sigs, so a member colliding with those is caught. But user def signatures register in pre-pass 2, after the enums, with no reverse check — so a member colliding with a same-file def was silently accepted in either textual order. That was a real soundness hole: the checker resolved the shared word to the enum constructor (e.g. when the context demanded the enum type) while the runtime resolves definitions before enum members and ran the def — so a cleanly type-checked program failed at runtime ("Unknown match pattern literal") or diverged in stack shape for payload constructors. defineEnum now records registered member names (enumMemberToks), and def registration rejects a def whose name is a member, mirroring the existing member-vs-def error. Def-vs-def duplication is untouched (a separate, deferred decision). Test: tests/typecheck_fail/enum_member_def_collision.msh. Suites green: tests 219, typecheck 207, go test. 
Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/TypeCheckProgram.go | 12 ++++++++++++ mshell/TypeChecker.go | 8 ++++++++ mshell/TypeEnum.go | 4 ++++ tests/typecheck_fail/enum_member_def_collision.msh | 8 ++++++++ 4 files changed, 32 insertions(+) create mode 100644 tests/typecheck_fail/enum_member_def_collision.msh diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index 2a52c0b0..facb13ac 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -127,6 +127,18 @@ func (c *Checker) CheckProgram(file *MShellFile) { sig := c.ResolveDefSig(def.Inputs, def.Outputs) defSigs[i] = sig nameId := c.names.Intern(def.Name) + // Enum constructors share the word namespace and registered in + // pre-pass 1b; a def reusing a member name would resolve to the + // constructor in the checker but to the def at runtime, so reject it — + // the mirror of defineEnum rejecting a member that collides with an + // existing def or builtin. + if _, isMember := c.enumMemberToks[nameId]; isMember { + c.errors = append(c.errors, TypeError{ + Kind: TErrTypeParse, Pos: def.NameToken, + Hint: "definition '" + def.Name + "' conflicts with an enum member of the same name", + }) + continue + } c.nameBuiltins[nameId] = append(c.nameBuiltins[nameId], sig) } // Pre-pass 3: type-check each def body against its declared sig. diff --git a/mshell/TypeChecker.go b/mshell/TypeChecker.go index e82da8ba..e856c400 100644 --- a/mshell/TypeChecker.go +++ b/mshell/TypeChecker.go @@ -101,6 +101,14 @@ type Checker struct { // type names are NOT stored here — they are recognized directly. typeEnv map[NameId]TypeId + // enumMemberToks records every registered enum member name (value: the + // member's declaration token). 
Enum constructors and user defs share the + // word namespace, and enums register before same-file defs — so def + // registration checks this to reject a def whose name collides with a + // member, mirroring defineEnum rejecting a member that collides with an + // existing def or builtin. + enumMemberToks map[NameId]Token + // Quote-body inference state (Phase 7). When inferring is true, // applySig responds to stack underflow by synthesizing fresh type // variables instead of reporting an error; those vars accumulate diff --git a/mshell/TypeEnum.go b/mshell/TypeEnum.go index 4a20ee81..9180ed96 100644 --- a/mshell/TypeEnum.go +++ b/mshell/TypeEnum.go @@ -80,5 +80,9 @@ func (c *Checker) defineEnum(d *MShellEnumDecl) { continue } c.nameBuiltins[mid] = append(c.nameBuiltins[mid], QuoteSig{Inputs: u.payloads, Outputs: []TypeId{enumType}}) + if c.enumMemberToks == nil { + c.enumMemberToks = make(map[NameId]Token, len(uniq)) + } + c.enumMemberToks[mid] = u.tok } } diff --git a/tests/typecheck_fail/enum_member_def_collision.msh b/tests/typecheck_fail/enum_member_def_collision.msh new file mode 100644 index 00000000..7c3a68a3 --- /dev/null +++ b/tests/typecheck_fail/enum_member_def_collision.msh @@ -0,0 +1,8 @@ +# Enum constructors and defs share the word namespace, so a def reusing a +# member name must be rejected (in either textual order — enums register +# first regardless). Without this, the checker resolves the word to the +# constructor while the runtime runs the def, and a type-checked program +# fails at runtime. 
+enum E = foo | z end +def foo ( -- int) 42 end +foo str wl From c441027fa2ad569e000895a728e571dd3abb058a Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Wed, 1 Jul 2026 20:55:27 -0500 Subject: [PATCH 20/32] Register startup-file type/enum declarations in both runtime and checker `enum` declarations only functioned in the main script: RegisterEnums had a single call site on the script path, so an enum declared in the stdlib, the user init file, or at the interactive prompt parsed and evaluated as a no-op and its member words fell through to the bare-literal path ("Found literal token"). `type` aliases in startup files were similarly invisible to the checker, whose pre-passes walked only the main file's items. Type-checking and running are separate today but should semantically match, so both sides now read startup declarations: - Runtime: loadStartupFile registers each startup file's enums on the EvalState (covering script and interactive sessions), and the REPL line executor registers enums declared at the prompt, next to its existing def handling. - Checker: loadStartupFile retains startup top-level items; TypeCheckProgram passes them to the new Checker.RegisterStartupTypes, which runs the same three-phase pre-pass order as CheckProgram (enum names, type aliases, enum bodies + constructor words). Declaration bodies are not type-checked, matching the stdlib-def treatment. The LSP diagnostics pass registers the stdlib's items the same way. Collision checks now span files in both directions: a startup enum member rejects a colliding program def, and vice versa. Tests: startup enum/type visible to checker + cross-file collision (TypeEnum_test.go), startup-file enum registers constructors and constructs at runtime (Startup_test.go). Suites green: tests 219, typecheck 207, go test. 
Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/Main.go | 41 +++++++++++++++++------- mshell/Startup_test.go | 56 +++++++++++++++++++++++++++++++-- mshell/TypeCheckProgram.go | 36 ++++++++++++++++++++- mshell/TypeCheckProgram_test.go | 2 +- mshell/TypeEnum_test.go | 47 +++++++++++++++++++++++++++ mshell/lsp.go | 15 +++++---- mshell/lsp_test.go | 2 +- 7 files changed, 175 insertions(+), 24 deletions(-) diff --git a/mshell/Main.go b/mshell/Main.go index 42d55c6f..ef9b9dca 100644 --- a/mshell/Main.go +++ b/mshell/Main.go @@ -166,7 +166,7 @@ func getStartupFileSpecs(options startupLoadOptions) (startupFileSpec, startupFi return stdlibSpec, initSpec, nil } -func loadStartupFile(path string, description string, stack *MShellStack, context ExecuteContext, state *EvalState, definitions *[]MShellDefinition) error { +func loadStartupFile(path string, description string, stack *MShellStack, context ExecuteContext, state *EvalState, definitions *[]MShellDefinition, items *[]MShellParseItem) error { sourceBytes, err := os.ReadFile(path) if err != nil { if errors.Is(err, os.ErrNotExist) { @@ -182,6 +182,13 @@ func loadStartupFile(path string, description string, stack *MShellStack, contex *definitions = append(*definitions, parsedFile.Definitions...) state.AddCompletionDefinitions(parsedFile.Definitions) + // Register enum constructors declared in this startup file, and retain the + // top-level items so the type checker can register the file's `type` and + // `enum` declarations — startup declarations behave like the main file's. + state.RegisterEnums(parsedFile.Items) + if items != nil { + *items = append(*items, parsedFile.Items...) 
+ } if len(parsedFile.Items) > 0 { callStackItem := CallStackItem{ @@ -257,16 +264,17 @@ func preflightStartupFile(spec startupFileSpec) string { return fmt.Sprintf("present at %s (parses ok; not evaluated because the other startup file failed first)", spec.path) } -func loadStartupDefinitions(options startupLoadOptions, stack *MShellStack, context ExecuteContext, state *EvalState) ([]MShellDefinition, error) { +func loadStartupDefinitions(options startupLoadOptions, stack *MShellStack, context ExecuteContext, state *EvalState) ([]MShellDefinition, []MShellParseItem, error) { stdlibSpec, initSpec, err := getStartupFileSpecs(options) if err != nil { - return nil, err + return nil, nil, err } definitions := make([]MShellDefinition, 0) - if err := loadStartupFile(stdlibSpec.path, stdlibSpec.description, stack, context, state, &definitions); err != nil { + var items []MShellParseItem + if err := loadStartupFile(stdlibSpec.path, stdlibSpec.description, stack, context, state, &definitions, &items); err != nil { initStatus := preflightStartupFile(initSpec) - return nil, &startupLoadError{ + return nil, nil, &startupLoadError{ which: "stdlib", spec: stdlibSpec, options: options, @@ -276,14 +284,14 @@ func loadStartupDefinitions(options startupLoadOptions, stack *MShellStack, cont } } - if err := loadStartupFile(initSpec.path, initSpec.description, stack, context, state, &definitions); err != nil { + if err := loadStartupFile(initSpec.path, initSpec.description, stack, context, state, &definitions, &items); err != nil { if !initSpec.required && errors.Is(err, os.ErrNotExist) { - return definitions, nil + return definitions, items, nil } - return nil, &startupLoadError{which: "init", spec: initSpec, options: options, cause: err} + return nil, nil, &startupLoadError{which: "init", spec: initSpec, options: options, cause: err} } - return definitions, nil + return definitions, items, nil } // formatStartupErrorMessage builds a multi-line explanation of how msh searches @@ 
-821,7 +829,7 @@ func main() { var allDefinitions []MShellDefinition - startupDefinitions, err := loadStartupDefinitions(startupLoadOptions{ + startupDefinitions, startupItems, err := loadStartupDefinitions(startupLoadOptions{ version: effectiveVersion, allowEnvOverrides: allowStartupEnvOverrides, requireInit: requireVersionedInit, @@ -836,7 +844,7 @@ func main() { state.AddCompletionDefinitions(file.Definitions) if checkTypes { - errs, ok := TypeCheckProgram(file, startupDefinitions) + errs, ok := TypeCheckProgram(file, startupDefinitions, startupItems) if !ok { for _, e := range errs { fmt.Fprintln(os.Stderr, e) @@ -2976,6 +2984,11 @@ func (state *TermState) ExecuteCurrentCommand() (bool, int) { state.evalState.AddCompletionDefinitions(parsed.Definitions) } + // Register enum constructors declared on this line, so an interactive + // `enum` declaration works like one in a script: its member words + // construct values on subsequent lines. + state.evalState.RegisterEnums(parsed.Items) + if len(parsed.Items) > 0 { state.initCallStackItem.MShellParseItem = parsed.Items[0] result := state.evalState.Evaluate(parsed.Items, &state.stack, state.context, state.stdLibDefs, state.initCallStackItem) @@ -3162,11 +3175,15 @@ func (state *TermState) getCurrentPos() (int, int, error) { } func stdLibDefinitions(stack *MShellStack, context ExecuteContext, state *EvalState) ([]MShellDefinition, error) { - return loadStartupDefinitions(startupLoadOptions{ + // The interactive path has no whole-program type-check pass, so the + // startup items (already registered on the EvalState for runtime enum + // construction inside loadStartupFile) are not needed here. 
+ defs, _, err := loadStartupDefinitions(startupLoadOptions{ version: mshellVersion, allowEnvOverrides: true, requireInit: false, }, stack, context, state) + return defs, err } func registerTempFileForCleanup(tempFileName string) { diff --git a/mshell/Startup_test.go b/mshell/Startup_test.go index be7902dd..88c067df 100644 --- a/mshell/Startup_test.go +++ b/mshell/Startup_test.go @@ -151,7 +151,7 @@ func TestLoadStartupDefinitionsLoadsVersionedStdlibAndInit(t *testing.T) { stack, context, state := newStartupTestContext() - definitions, err := loadStartupDefinitions(startupLoadOptions{ + definitions, _, err := loadStartupDefinitions(startupLoadOptions{ version: version, allowEnvOverrides: false, requireInit: true, @@ -215,7 +215,7 @@ func TestLoadStartupDefinitionsRequiresInitForExplicitVersion(t *testing.T) { stack, context, state := newStartupTestContext() - _, err := loadStartupDefinitions(startupLoadOptions{ + _, _, err := loadStartupDefinitions(startupLoadOptions{ version: version, allowEnvOverrides: false, requireInit: true, @@ -251,7 +251,7 @@ func TestLoadStartupDefinitionsAllowsMissingInitForImplicitVersion(t *testing.T) stack, context, state := newStartupTestContext() - definitions, err := loadStartupDefinitions(startupLoadOptions{ + definitions, _, err := loadStartupDefinitions(startupLoadOptions{ version: version, allowEnvOverrides: true, requireInit: false, @@ -417,3 +417,53 @@ func TestEnvWithoutStartupOverridesRemovesOnlyStartupVars(t *testing.T) { t.Fatalf("filtered env missing KEEP_ME: %q", filteredJoined) } } + +func TestStartupFileEnumRegistersConstructors(t *testing.T) { + // An `enum` declared in a startup file (stdlib / init) must register its + // constructors on the EvalState, so a member word in the main program (or + // at the interactive prompt) constructs a value instead of falling through + // to the bare-literal path. 
+ dir := t.TempDir() + path := filepath.Join(dir, "init.msh") + if err := os.WriteFile(path, []byte("enum Status = active | inactive end\n"), 0644); err != nil { + t.Fatalf("WriteFile(init) error = %v", err) + } + + stack, context, state := newStartupTestContext() + var defs []MShellDefinition + var items []MShellParseItem + if err := loadStartupFile(path, "test init", &stack, context, &state, &defs, &items); err != nil { + t.Fatalf("loadStartupFile() error = %v", err) + } + + info, ok := state.EnumMembers["active"] + if !ok { + t.Fatalf("expected member 'active' registered from startup file") + } + if info.EnumName != "Status" || info.Arity != 0 { + t.Fatalf("EnumMembers[active] = %+v, want Status arity 0", info) + } + if len(items) == 0 { + t.Fatalf("expected startup items to be retained for the checker") + } + + parsed, err := parseMShellInput("active", &TokenFile{"main"}) + if err != nil { + t.Fatalf("parse error: %v", err) + } + callStackItem := CallStackItem{MShellParseItem: parsed.Items[0], Name: "main", CallStackType: CALLSTACKFILE} + result := state.Evaluate(parsed.Items, &stack, context, defs, callStackItem) + if !result.Success { + t.Fatalf("evaluating member word failed") + } + if len(stack) != 1 { + t.Fatalf("len(stack) = %d, want 1", len(stack)) + } + en, ok := stack[0].(*MShellEnum) + if !ok { + t.Fatalf("stack top = %T (%s), want *MShellEnum", stack[0], stack[0].DebugString()) + } + if en.EnumName != "Status" || en.Member != "active" { + t.Fatalf("enum value = %s.%s, want Status.active", en.EnumName, en.Member) + } +} diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index facb13ac..2d01bf9a 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -43,12 +43,18 @@ import ( // type-checked here — std.msh exercises features (process lists, // format strings, dynamic exec) the v1 checker does not yet model, // and we trust the runtime tests catch breakage there. 
-func TypeCheckProgram(file *MShellFile, stdlibDefs []MShellDefinition) (errors []string, ok bool) { +// +// startupItems is the startup files' top-level parse items; their `type` +// and `enum` declarations are registered (bodies are not checked, matching +// the def treatment) so the checker sees the same declarations the runtime +// does. +func TypeCheckProgram(file *MShellFile, stdlibDefs []MShellDefinition, startupItems []MShellParseItem) (errors []string, ok bool) { arena := NewTypeArena() names := NewNameTable() checker := NewChecker(arena, names) checker.RegisterStdlibSigs(stdlibDefs) + checker.RegisterStartupTypes(startupItems) checker.CheckProgram(file) out := make([]string, 0, len(checker.errors)) @@ -89,6 +95,34 @@ func (c *Checker) RegisterStdlibSigs(defs []MShellDefinition) { } } +// RegisterStartupTypes registers the `type` and `enum` declarations found in +// the startup files' top-level items (the stdlib, then the user init file), +// so the checked program sees the same declarations the runtime does. It runs +// the same three-phase order as CheckProgram's own pre-passes — enum names, +// then type aliases, then enum payload bodies + constructor words — so +// startup declarations may reference each other in any order. Call after +// RegisterStdlibSigs (so a member colliding with a startup def is caught) and +// before CheckProgram (whose def pre-pass catches the reverse collision). +func (c *Checker) RegisterStartupTypes(items []MShellParseItem) { + var enumDecls []*MShellEnumDecl + for _, item := range items { + if d, ok := item.(*MShellEnumDecl); ok { + if c.predeclareEnum(d) { + enumDecls = append(enumDecls, d) + } + } + } + for _, item := range items { + if d, ok := item.(*MShellTypeDecl); ok { + body := c.resolveTypeExpr(d.Body, nil) + c.DeclareType(d.Name, body) + } + } + for _, d := range enumDecls { + c.defineEnum(d) + } +} + // CheckProgram is the file-level type-check pass. 
It registers all // type declarations and user-defined function sigs, then walks the // parse tree driving the type stack. Error accumulation lives on the diff --git a/mshell/TypeCheckProgram_test.go b/mshell/TypeCheckProgram_test.go index 21d4a994..b7e646da 100644 --- a/mshell/TypeCheckProgram_test.go +++ b/mshell/TypeCheckProgram_test.go @@ -15,7 +15,7 @@ func parseAndCheck(t *testing.T, src string) ([]string, bool) { if err != nil { t.Fatalf("parse error: %v", err) } - return TypeCheckProgram(file, nil) + return TypeCheckProgram(file, nil, nil) } func TestTypeCheckProgramEmpty(t *testing.T) { diff --git a/mshell/TypeEnum_test.go b/mshell/TypeEnum_test.go index e5eda544..5cd57a7c 100644 --- a/mshell/TypeEnum_test.go +++ b/mshell/TypeEnum_test.go @@ -121,3 +121,50 @@ func TestEnumRecursivePayload(t *testing.T) { t.Fatalf("self-referential enum payload should type-check; errs=%v ok=%v", errs, ok) } } + +// parseItemsForTest parses source and returns its top-level items, for tests +// that feed startup-file declarations to the checker. +func parseItemsForTest(t *testing.T, src string) []MShellParseItem { + t.Helper() + l := NewLexer(src, nil) + p := NewMShellParser(l) + file, err := p.ParseFile() + if err != nil { + t.Fatalf("parse error: %v", err) + } + return file.Items +} + +func TestStartupEnumAndTypeVisibleToChecker(t *testing.T) { + // `enum` and `type` declarations in a startup file (stdlib / init) are + // registered before the main program is checked, so the program can + // construct members, match on them, and reference the alias — the same + // declarations the runtime registers. 
+ startup := parseItemsForTest(t, "enum Status = active | inactive end\ntype Tagged = {name: str, s: Status}") + l := NewLexer("active match\n active : \"A\" wl,\n inactive : \"I\" wl,\nend\n{ \"name\": \"x\", \"s\": active } as Tagged drop", nil) + p := NewMShellParser(l) + file, err := p.ParseFile() + if err != nil { + t.Fatalf("parse error: %v", err) + } + errs, ok := TypeCheckProgram(file, nil, startup) + if !ok || len(errs) != 0 { + t.Fatalf("startup enum/type should be visible to the checker; errs=%v ok=%v", errs, ok) + } +} + +func TestDefCollidingWithStartupEnumMemberRejected(t *testing.T) { + // The member/def collision check spans files: a program def reusing a + // startup enum's member name is rejected, same as a same-file collision. + startup := parseItemsForTest(t, "enum E = foo | zz end") + l := NewLexer("def foo ( -- int) 42 end\nfoo drop", nil) + p := NewMShellParser(l) + file, err := p.ParseFile() + if err != nil { + t.Fatalf("parse error: %v", err) + } + errs, ok := TypeCheckProgram(file, nil, startup) + if ok { + t.Fatalf("def colliding with startup enum member should fail; errs=%v", errs) + } +} diff --git a/mshell/lsp.go b/mshell/lsp.go index a7a34377..b8d55288 100644 --- a/mshell/lsp.go +++ b/mshell/lsp.go @@ -41,6 +41,7 @@ type lspServer struct { envNames map[string]struct{} candsBuf []string stdlibDefs []MShellDefinition + stdlibItems []MShellParseItem // stdlib top-level items; `type`/`enum` decls registered per diagnostics pass builtinSigs map[string][]string // name -> formatted "(in -- out)" sigs from the type checker stdlibHover map[string][]string // name -> formatted sigs for stdlib defs } @@ -134,10 +135,11 @@ func RunLSP(in io.Reader, out io.Writer) error { envNames: make(map[string]struct{}), } - if defs, err := loadStdlibDefsForLSP(); err != nil { + if defs, items, err := loadStdlibDefsForLSP(); err != nil { logLSP(fmt.Sprintf("type-check diagnostics: stdlib unavailable (%v); proceeding without stdlib sigs", err)) } else { 
server.stdlibDefs = defs + server.stdlibItems = items } server.builtinSigs, server.stdlibHover = buildHoverIndex(server.stdlibDefs) @@ -181,23 +183,23 @@ func buildHoverIndex(stdlibDefs []MShellDefinition) (map[string][]string, map[st // MSHSTDLIB if set, else the version-keyed install path), parses it, // and returns its definitions. The bodies are not evaluated; we only // need the signatures to register as builtins for the type-checker. -func loadStdlibDefsForLSP() ([]MShellDefinition, error) { +func loadStdlibDefsForLSP() ([]MShellDefinition, []MShellParseItem, error) { stdlibSpec, _, err := getStartupFileSpecs(startupLoadOptions{ version: mshellVersion, allowEnvOverrides: true, }) if err != nil { - return nil, err + return nil, nil, err } source, err := os.ReadFile(stdlibSpec.path) if err != nil { - return nil, err + return nil, nil, err } parsed, err := parseMShellInput(string(source), &TokenFile{stdlibSpec.path}) if err != nil { - return nil, err + return nil, nil, err } - return parsed.Definitions, nil + return parsed.Definitions, parsed.Items, nil } func (s *lspServer) run() error { @@ -548,6 +550,7 @@ func (s *lspServer) computeDiagnostics(text string) []protocol.Diagnostic { names := NewNameTable() checker := NewChecker(arena, names) checker.RegisterStdlibSigs(s.stdlibDefs) + checker.RegisterStartupTypes(s.stdlibItems) checker.CheckProgram(file) errs := checker.Errors() diff --git a/mshell/lsp_test.go b/mshell/lsp_test.go index 689d5a33..d92d4a72 100644 --- a/mshell/lsp_test.go +++ b/mshell/lsp_test.go @@ -1328,7 +1328,7 @@ func TestCompletionWordIncludesBuiltinAndStdlib(t *testing.T) { } func TestBuildHoverIndexCoversTypedBuiltinsAndStdlib(t *testing.T) { - stdlibDefs, err := loadStdlibDefsForLSP() + stdlibDefs, _, err := loadStdlibDefsForLSP() if err != nil { t.Skipf("stdlib not available in test environment: %v", err) } From debe03e77c25bfcd378c6d08b61e249eb4e17562 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Wed, 1 Jul 2026 21:40:44 -0500 
Subject: [PATCH 21/32] Unify render/JSON/equality into shared iterative walkers over all containers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Deep values with ALTERNATING container kinds crashed with a fatal Go
stack overflow: `enum E = m Maybe[E] | z end` built ~4M deep (a linked
list with optional next — a natural shape) killed `str`, `toJson`, and
`=`. The earlier per-type stack-safety fixes only covered pure enum
nesting — each iterative walker delegated non-enum payloads to that
child's own recursive method (Maybe.ToString, Maybe.Equals, list/dict
ToJson, ...), so every enum→Maybe→enum cycle added Go frames and
alternation restored O(depth) recursion.

compareValues was already immune because it expands every kind inline;
this applies the same design to the other two walks:

- renderValue(obj, flavor): one work-stack renderer for ToString /
  DebugString / ToJson, expanding enum, Maybe, list, dict, and pipe
  inline with each kind's exact existing format (flavor switches per
  child the way the old methods did: list children render as
  DebugString, a dict's `str` form is its JSON, enum payloads render as
  ToString). Scalars, grids, and quotations stay leaves via their own
  methods.

- equalsIter(a, b): one pair-stack equality walk expanding the same
  kinds inline, keeping the sameRef fast path and the dagGuard
  shared-substructure memo that previously lived only on the enum walk
  (so lists/dicts/Maybes now get DAG protection too).

All the per-type methods (enum, Maybe, list, dict, pipe) are one-line
routings into the shared walkers; enumRender, the enum ToJson walker,
the old recursive bodies, itemsEqual, and DebugStrs are deleted. Net
-30 lines.
Output is byte-identical across the suites, with two deliberate changes: multi-key dict DebugString now emits sorted keys (it iterated Go map order — nondeterministic — before), and dict equality no longer short-circuits on a TypeName mismatch, so a str and a literal with equal text compare equal inside dicts exactly as they do at top level. The 4M alternating chain now renders (18M chars), serializes (14M), and compares cleanly; enum↔dict alternation at 500k likewise. Regression test: tests/success/enum_alternating_deep.msh (50k, mirroring the deep-test family). Suites green: tests 220, typecheck 208, go test. Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/MShellObject.go | 573 ++++++++---------- tests/success/enum_alternating_deep.msh | 25 + .../success/enum_alternating_deep.msh.stdout | 4 + 3 files changed, 286 insertions(+), 316 deletions(-) create mode 100644 tests/success/enum_alternating_deep.msh create mode 100644 tests/success/enum_alternating_deep.msh.stdout diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 6abf3541..27107aaa 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -199,10 +199,7 @@ func (m Maybe) CommandLine() string { // This is meant for things like error messages, should be limited in length to 30 chars or so. 
func (m Maybe) DebugString() string { - if m.obj == nil { - return "None" - } - return fmt.Sprintf("Maybe(%s)", m.obj.DebugString()) + return renderValue(m, flavorDebug) } func (m Maybe) Index(index int) (MShellObject, error) { return nil, fmt.Errorf("Cannot index into a Maybe.\n") @@ -221,17 +218,11 @@ func (m Maybe) Slice(startInc int, endExc int) (MShellObject, error) { } func (m Maybe) ToJson() string { - if m.obj == nil { - return "null" - } - return m.obj.ToJson() + return renderValue(m, flavorJson) } func (m Maybe) ToString() string { - if m.obj == nil { - return "None" - } - return fmt.Sprintf("Just(%s)", m.obj.ToString()) + return renderValue(m, flavorStr) } func (m Maybe) IndexErrStr() string { @@ -243,24 +234,7 @@ func (m Maybe) Concat(other MShellObject) (MShellObject, error) { } func (m Maybe) Equals(other MShellObject) (bool, error) { - // Maybe values are constructed as *Maybe at runtime, so accept either the - // value or pointer form — a plain other.(Maybe) misses every *Maybe and - // would make all Maybe-vs-Maybe comparisons (including None==None) false. - otherMaybe, ok := asMaybe(other) - if !ok { - return false, nil - } - - if m.obj == nil && otherMaybe.obj == nil { - return true, nil - } - - if m.obj == nil || otherMaybe.obj == nil { - return false, nil - } - - equal, err := m.obj.Equals(otherMaybe.obj) - return equal, err + return equalsIter(m, other) } func (m Maybe) CastString() (string, error) { @@ -366,54 +340,8 @@ func (e *MShellEnum) TypeName() string { return e.EnumName } func (e *MShellEnum) IsCommandLineable() bool { return true } func (e *MShellEnum) IsNumeric() bool { return false } func (e *MShellEnum) FloatNumeric() float64 { return 0 } -func (e *MShellEnum) CommandLine() string { return enumRender(e) } -func (e *MShellEnum) DebugString() string { return enumRender(e) } - -// enumRender renders an enum value as `member` (nullary) or -// `member(p0 p1 ...)`. 
Nested enum payloads are expanded with an explicit -// work stack rather than function recursion, so an arbitrarily deep value -// (e.g. a long `node(node(... ) ...)` chain) cannot overflow the call stack. -// Non-enum payloads use their own ToString. -func enumRender(top *MShellEnum) string { - var sb strings.Builder - type task struct { - lit string - obj MShellObject - isLit bool - } - stack := []task{{obj: top}} - for len(stack) > 0 { - t := stack[len(stack)-1] - stack = stack[:len(stack)-1] - if t.isLit { - sb.WriteString(t.lit) - continue - } - en, ok := t.obj.(*MShellEnum) - if !ok { - sb.WriteString(t.obj.ToString()) - continue - } - if len(en.Payload) == 0 { - sb.WriteString(en.Member) - continue - } - // Emit `member ( p0 " " p1 ... )`; push reversed so it pops in order. - seq := make([]task, 0, len(en.Payload)*2+3) - seq = append(seq, task{lit: en.Member, isLit: true}, task{lit: "(", isLit: true}) - for i, p := range en.Payload { - if i > 0 { - seq = append(seq, task{lit: " ", isLit: true}) - } - seq = append(seq, task{obj: p}) - } - seq = append(seq, task{lit: ")", isLit: true}) - for i := len(seq) - 1; i >= 0; i-- { - stack = append(stack, seq[i]) - } - } - return sb.String() -} +func (e *MShellEnum) CommandLine() string { return renderValue(e, flavorStr) } +func (e *MShellEnum) DebugString() string { return renderValue(e, flavorStr) } func (e *MShellEnum) Index(index int) (MShellObject, error) { return nil, fmt.Errorf("Cannot index into an enum.\n") @@ -434,106 +362,21 @@ func (e *MShellEnum) Slice(startInc int, endExc int) (MShellObject, error) { // ToJson uses serde's externally-tagged convention — the de-facto standard for // tagged unions in JSON: a nullary member is the bare member string; a member // with a single payload is `{"member": value}`; with several, `{"member": -// [v0, v1, ...]}`. 
Like enumRender, nested enum payloads are expanded with an -// explicit work stack rather than function recursion, so an arbitrarily deep -// value cannot overflow the call stack; output is appended to a single builder -// (no intermediate per-subtree strings), making it O(total output size). -// Non-enum payloads delegate to their own ToJson. +// [v0, v1, ...]}`. Rendering runs on renderValue's shared work stack, so an +// arbitrarily deep value cannot overflow the call stack. func (e *MShellEnum) ToJson() string { - var sb strings.Builder - type task struct { - lit string - obj MShellObject - isLit bool - } - stack := []task{{obj: e}} - for len(stack) > 0 { - t := stack[len(stack)-1] - stack = stack[:len(stack)-1] - if t.isLit { - sb.WriteString(t.lit) - continue - } - en, ok := t.obj.(*MShellEnum) - if !ok { - sb.WriteString(t.obj.ToJson()) - continue - } - if len(en.Payload) == 0 { - fmt.Fprintf(&sb, "%q", en.Member) - continue - } - // Emit `{"member": value}` (single payload) or - // `{"member": [v0, v1, ...]}` (several); push reversed so it pops in - // order, with enum payloads re-expanded by this same loop. 
- seq := make([]task, 0, len(en.Payload)*2+4) - seq = append(seq, task{lit: fmt.Sprintf("{%q: ", en.Member), isLit: true}) - if len(en.Payload) == 1 { - seq = append(seq, task{obj: en.Payload[0]}) - } else { - seq = append(seq, task{lit: "[", isLit: true}) - for i, p := range en.Payload { - if i > 0 { - seq = append(seq, task{lit: ", ", isLit: true}) - } - seq = append(seq, task{obj: p}) - } - seq = append(seq, task{lit: "]", isLit: true}) - } - seq = append(seq, task{lit: "}", isLit: true}) - for i := len(seq) - 1; i >= 0; i-- { - stack = append(stack, seq[i]) - } - } - return sb.String() + return renderValue(e, flavorJson) } -func (e *MShellEnum) ToString() string { return enumRender(e) } +func (e *MShellEnum) ToString() string { return renderValue(e, flavorStr) } func (e *MShellEnum) IndexErrStr() string { return "" } func (e *MShellEnum) Concat(other MShellObject) (MShellObject, error) { return nil, fmt.Errorf("Cannot concatenate an enum.\n") } -// Equals compares two enum values structurally. Nested enum payloads are -// walked with an explicit pair stack rather than function recursion, so two -// arbitrarily deep values cannot overflow the call stack; only non-enum -// payloads (the leaves) delegate to their own Equals. func (e *MShellEnum) Equals(other MShellObject) (bool, error) { - type pair struct{ a, b MShellObject } - var guard dagGuard - stack := []pair{{a: e, b: other}} - for len(stack) > 0 { - p := stack[len(stack)-1] - stack = stack[:len(stack)-1] - // Shared substructure: a pointer-identical pair is equal by - // definition, and a pair this walk already expanded compared equal - // (see dagGuard). Skipping both keeps DAG-shaped values (a subtree - // reused twice per level) linear instead of 2^n. - if sameRef(p.a, p.b) || guard.skip(p.a, p.b) { - continue - } - ea, aok := p.a.(*MShellEnum) - eb, bok := p.b.(*MShellEnum) - if aok || bok { - // At least one side is an enum: equal only if both are enums with - // the same name, member, and arity. 
Payloads are deferred onto the - // stack so this never re-enters Equals on an enum. - if !aok || !bok || ea.EnumName != eb.EnumName || ea.Member != eb.Member || len(ea.Payload) != len(eb.Payload) { - return false, nil - } - for i := range ea.Payload { - stack = append(stack, pair{a: ea.Payload[i], b: eb.Payload[i]}) - } - continue - } - // Neither side is an enum: compare by their own equality. - eq, err := p.a.Equals(p.b) - if err != nil || !eq { - return false, err - } - } - return true, nil + return equalsIter(e, other) } func (e *MShellEnum) CastString() (string, error) { return e.Member, nil } @@ -662,16 +505,7 @@ func (*MShellDict) CommandLine() string { // This is meant for things like error messages, should be limited in length to 30 chars or so. func (d *MShellDict) DebugString() string { - // TODO: implement this - - sb := strings.Builder{} - sb.WriteString("Dictionary{") - for key, value := range d.Items { - sb.WriteString(fmt.Sprintf("%s: %s, ", key, value.DebugString())) - } - sb.WriteString("}") - return sb.String() - + return renderValue(d, flavorDebug) } func (*MShellDict) Index(index int) (MShellObject, error) { return nil, fmt.Errorf("Cannot index into a dictionary.\n") @@ -687,43 +521,7 @@ func (*MShellDict) Slice(startInc int, endExc int) (MShellObject, error) { return nil, fmt.Errorf("Cannot slice a dictionary.\n") } func (d *MShellDict) ToJson() string { - var sb strings.Builder - - if len(d.Items) == 0 { - return "{}" - } - - if len(d.Items) == 1 { - for key, value := range d.Items { - keyEnc, _ := json.Marshal(key) - return fmt.Sprintf("{%s: %s}", string(keyEnc), value.ToJson()) - } - } - - keys := make([]string, 0, len(d.Items)) - for key := range d.Items { - keys = append(keys, key) - } - sort.Strings(keys) - - sb.WriteString("{") - - // Write the first key-value pair - firstKey := keys[0] - firstValue := d.Items[firstKey] - - firstKeyEnc, _ := json.Marshal(firstKey) - sb.WriteString(fmt.Sprintf("%s: %s", string(firstKeyEnc), 
firstValue.ToJson())) - - for _, key := range keys[1:] { - value := d.Items[key] - keyEnc, _ := json.Marshal(key) - sb.WriteString(fmt.Sprintf(", %s: %s", string(keyEnc), value.ToJson())) - } - - sb.WriteString("}") - - return sb.String() + return renderValue(d, flavorJson) } func (d *MShellDict) ToString() string { // This is what is used with 'str' command @@ -739,51 +537,7 @@ func (*MShellDict) Concat(other MShellObject) (MShellObject, error) { } func (thisDict *MShellDict) Equals(other MShellObject) (bool, error) { - thisKeys := make([]string, 0, len(thisDict.Items)) - for key := range thisDict.Items { - thisKeys = append(thisKeys, key) - } - sort.Strings(thisKeys) - - otherDict, ok := other.(*MShellDict) - if !ok { - return false, nil - } - - otherKeys := make([]string, 0, len(otherDict.Items)) - for key := range otherDict.Items { - otherKeys = append(otherKeys, key) - } - sort.Strings(otherKeys) - - if len(thisKeys) != len(otherKeys) { - return false, nil - } - - for i, key := range thisKeys { - if key != otherKeys[i] { - return false, nil - } - } - - for _, key := range thisKeys { - thisValue := thisDict.Items[key] - otherValue := otherDict.Items[key] - - if thisValue.TypeName() != otherValue.TypeName() { - return false, nil - } - - equal, err := thisValue.Equals(otherValue) - if err != nil { - return false, err - } - if !equal { - return false, nil - } - } - - return true, nil + return equalsIter(thisDict, other) } // This is meant for completely unambiougous conversion to a string value. @@ -1200,6 +954,242 @@ func refPairKey(a, b MShellObject) (refPair, bool) { return refPair{}, false } +// renderFlavor selects which of a value's three textual forms renderValue +// emits: flavorStr is ToString (the `str` form), flavorDebug is DebugString +// (stack dumps, list display), flavorJson is ToJson. 
Containers pick their +// children's flavor the same way the per-type methods always did: a list +// renders children as DebugString, a dict's `str` form is its JSON form, an +// enum renders payloads with ToString, and Maybe keeps its own flavor. +type renderFlavor uint8 + +const ( + flavorStr renderFlavor = iota + flavorDebug + flavorJson +) + +type renderTask struct { + lit string + obj MShellObject + flavor renderFlavor + isLit bool +} + +func renderLit(s string) renderTask { return renderTask{lit: s, isLit: true} } + +// renderJoin builds the task sequence `open item0 sep item1 sep ... close`, +// rendering each item in the given flavor. +func renderJoin(open, sep, close string, items []MShellObject, flavor renderFlavor) []renderTask { + seq := make([]renderTask, 0, len(items)*2+2) + if open != "" { + seq = append(seq, renderLit(open)) + } + for i, it := range items { + if i > 0 { + seq = append(seq, renderLit(sep)) + } + seq = append(seq, renderTask{obj: it, flavor: flavor}) + } + if close != "" { + seq = append(seq, renderLit(close)) + } + return seq +} + +// renderValue renders a value in the requested flavor with one explicit work +// stack instead of method recursion, expanding every container kind — enum, +// Maybe, list, dict, pipe — inline. Arbitrarily deep values therefore cannot +// overflow the call stack even when kinds alternate (enum→Maybe→enum, ...), +// which per-type iterative renderers could not guarantee: each one delegated +// other kinds to the child's own recursive method. Leaf kinds (scalars, +// grids, quotations) still render via their own methods; their nesting depth +// is bounded by their own structure. +func renderValue(root MShellObject, flavor renderFlavor) string { + var sb strings.Builder + stack := []renderTask{{obj: root, flavor: flavor}} + // push schedules seq to pop in order (reversed onto the LIFO stack). 
+ push := func(seq []renderTask) { + for i := len(seq) - 1; i >= 0; i-- { + stack = append(stack, seq[i]) + } + } + for len(stack) > 0 { + t := stack[len(stack)-1] + stack = stack[:len(stack)-1] + if t.isLit { + sb.WriteString(t.lit) + continue + } + if m, ok := asMaybe(t.obj); ok { + switch { + case m.IsNone() && t.flavor == flavorJson: + sb.WriteString("null") + case m.IsNone(): + sb.WriteString("None") + case t.flavor == flavorJson: + push([]renderTask{{obj: m.obj, flavor: flavorJson}}) + case t.flavor == flavorDebug: + push([]renderTask{renderLit("Maybe("), {obj: m.obj, flavor: flavorDebug}, renderLit(")")}) + default: + push([]renderTask{renderLit("Just("), {obj: m.obj, flavor: flavorStr}, renderLit(")")}) + } + continue + } + switch v := t.obj.(type) { + case *MShellEnum: + if t.flavor == flavorJson { + // serde's externally-tagged convention: a nullary member is + // the bare member string, one payload is {"member": value}, + // several are {"member": [v0, v1, ...]}. + if len(v.Payload) == 0 { + fmt.Fprintf(&sb, "%q", v.Member) + continue + } + seq := make([]renderTask, 0, len(v.Payload)*2+4) + seq = append(seq, renderLit(fmt.Sprintf("{%q: ", v.Member))) + if len(v.Payload) == 1 { + seq = append(seq, renderTask{obj: v.Payload[0], flavor: flavorJson}) + } else { + seq = append(seq, renderJoin("[", ", ", "]", v.Payload, flavorJson)...) + } + seq = append(seq, renderLit("}")) + push(seq) + continue + } + // `member` (nullary) or `member(p0 p1 ...)`, payloads as ToString. 
+ if len(v.Payload) == 0 { + sb.WriteString(v.Member) + continue + } + push(renderJoin(v.Member+"(", " ", ")", v.Payload, flavorStr)) + case *MShellList: + if t.flavor == flavorJson { + push(renderJoin("[", ", ", "]", v.Items, flavorJson)) + } else { + push(renderJoin("[", " ", "]", v.Items, flavorDebug)) + } + case *MShellPipe: + if t.flavor == flavorJson { + push(renderJoin("[", ", ", "]", v.List.Items, flavorJson)) + } else { + push(renderJoin("", " | ", "", v.List.Items, flavorDebug)) + } + case *MShellDict: + keys := sortedDictKeys(v.Items) + if t.flavor == flavorDebug { + seq := make([]renderTask, 0, len(keys)*3+2) + seq = append(seq, renderLit("Dictionary{")) + for _, k := range keys { + seq = append(seq, renderLit(k+": "), renderTask{obj: v.Items[k], flavor: flavorDebug}, renderLit(", ")) + } + seq = append(seq, renderLit("}")) + push(seq) + continue + } + // The `str` form of a dict is its JSON form. + if len(keys) == 0 { + sb.WriteString("{}") + continue + } + seq := make([]renderTask, 0, len(keys)*2+2) + seq = append(seq, renderLit("{")) + for i, k := range keys { + keyEnc, _ := json.Marshal(k) + if i > 0 { + seq = append(seq, renderLit(", ")) + } + seq = append(seq, renderLit(string(keyEnc)+": "), renderTask{obj: v.Items[k], flavor: flavorJson}) + } + seq = append(seq, renderLit("}")) + push(seq) + default: + switch t.flavor { + case flavorDebug: + sb.WriteString(t.obj.DebugString()) + case flavorJson: + sb.WriteString(t.obj.ToJson()) + default: + sb.WriteString(t.obj.ToString()) + } + } + } + return sb.String() +} + +// equalsIter is structural equality over any two values, walked with one +// explicit pair stack that expands every container kind — enum, Maybe, list, +// dict, pipe — inline, so deep values cannot overflow the call stack even +// when kinds alternate. 
Pointer-identical pairs are skipped (equal by +// definition), and past a step threshold already-expanded pairs are memoized +// (see dagGuard), so shared substructure cannot blow up exponentially. Leaf +// kinds compare via their own Equals. +func equalsIter(a, b MShellObject) (bool, error) { + type pair struct{ a, b MShellObject } + var guard dagGuard + stack := []pair{{a: a, b: b}} + for len(stack) > 0 { + p := stack[len(stack)-1] + stack = stack[:len(stack)-1] + if sameRef(p.a, p.b) || guard.skip(p.a, p.b) { + continue + } + if am, aok := asMaybe(p.a); aok { + bm, bok := asMaybe(p.b) + if !bok || am.IsNone() != bm.IsNone() { + return false, nil + } + if !am.IsNone() { + stack = append(stack, pair{a: am.obj, b: bm.obj}) + } + continue + } + switch av := p.a.(type) { + case *MShellEnum: + bv, ok := p.b.(*MShellEnum) + if !ok || av.EnumName != bv.EnumName || av.Member != bv.Member || len(av.Payload) != len(bv.Payload) { + return false, nil + } + for i := range av.Payload { + stack = append(stack, pair{a: av.Payload[i], b: bv.Payload[i]}) + } + case *MShellList: + bv, ok := p.b.(*MShellList) + if !ok || len(av.Items) != len(bv.Items) { + return false, nil + } + for i := range av.Items { + stack = append(stack, pair{a: av.Items[i], b: bv.Items[i]}) + } + case *MShellPipe: + bv, ok := p.b.(*MShellPipe) + if !ok || len(av.List.Items) != len(bv.List.Items) { + return false, nil + } + for i := range av.List.Items { + stack = append(stack, pair{a: av.List.Items[i], b: bv.List.Items[i]}) + } + case *MShellDict: + bv, ok := p.b.(*MShellDict) + if !ok || len(av.Items) != len(bv.Items) { + return false, nil + } + for key, aval := range av.Items { + bval, ok := bv.Items[key] + if !ok { + return false, nil + } + stack = append(stack, pair{a: aval, b: bval}) + } + default: + eq, err := p.a.Equals(p.b) + if err != nil || !eq { + return eq, err + } + } + } + return true, nil +} + // compareValues returns -1, 0, or 1, giving a total order over every value // type. 
Different kinds are ordered by a fixed type rank (valueTypeRank); within // a kind the natural order is used (numbers numerically with int/float @@ -1660,19 +1650,6 @@ func (obj MShellFloat) CommandLine() string { return strconv.FormatFloat(obj.Value, 'f', -1, 64) } -// DebugString -func DebugStrs(objs []MShellObject) []string { - debugStrs := make([]string, len(objs)) - for i, obj := range objs { - if obj == nil { - debugStrs[i] = "nil" - } else { - debugStrs[i] = obj.DebugString() - } - } - return debugStrs -} - func (obj MShellLiteral) DebugString() string { return obj.LiteralText } @@ -1706,8 +1683,8 @@ func (obj *MShellQuotation) DebugString() string { } func (obj *MShellList) DebugString() string { - // Join the tokens with a space, surrounded by '[' and ']' - return "[" + strings.Join(DebugStrs(obj.Items), " ") + "]" + // Elements joined with a space, surrounded by '[' and ']' + return renderValue(obj, flavorDebug) } func cleanStringForTerminal(input string) string { @@ -1752,8 +1729,8 @@ func (obj MShellPath) DebugString() string { } func (obj *MShellPipe) DebugString() string { - // Join each item with a ' | ' - return strings.Join(DebugStrs(obj.List.Items), " | ") + // Each item joined with ' | ' + return renderValue(obj, flavorDebug) } func (obj MShellInt) DebugString() string { @@ -2249,17 +2226,7 @@ func (obj *MShellQuotation) ToJson() string { } func (obj *MShellList) ToJson() string { - builder := strings.Builder{} - builder.WriteString("[") - if len(obj.Items) > 0 { - builder.WriteString(obj.Items[0].ToJson()) - for _, item := range obj.Items[1:] { - builder.WriteString(", ") - builder.WriteString(item.ToJson()) - } - } - builder.WriteString("]") - return builder.String() + return renderValue(obj, flavorJson) } func (obj MShellString) ToJson() string { @@ -2492,24 +2459,6 @@ func (obj MShellBool) Equals(other MShellObject) (bool, error) { return obj.Value == asBool.Value, nil } -// itemsEqual compares two object slices element-wise by structural 
equality. -func itemsEqual(a, b []MShellObject) (bool, error) { - if len(a) != len(b) { - return false, nil - } - for i := range a { - // Pointer-identical elements are equal by definition; skipping them - // keeps lists with shared substructure from re-walking it. - if sameRef(a[i], b[i]) { - continue - } - eq, err := a[i].Equals(b[i]) - if err != nil || !eq { - return eq, err - } - } - return true, nil -} func (obj *MShellQuotation) Equals(other MShellObject) (bool, error) { // Quotations are code values; two are equal only when they are the same @@ -2519,11 +2468,7 @@ func (obj *MShellQuotation) Equals(other MShellObject) (bool, error) { } func (obj *MShellList) Equals(other MShellObject) (bool, error) { - o, ok := other.(*MShellList) - if !ok { - return false, nil - } - return itemsEqual(obj.Items, o.Items) + return equalsIter(obj, other) } func (obj MShellString) Equals(other MShellObject) (bool, error) { @@ -2555,11 +2500,7 @@ func (obj MShellPath) Equals(other MShellObject) (bool, error) { } func (obj *MShellPipe) Equals(other MShellObject) (bool, error) { - o, ok := other.(*MShellPipe) - if !ok { - return false, nil - } - return itemsEqual(obj.List.Items, o.List.Items) + return equalsIter(obj, other) } func (obj MShellInt) Equals(other MShellObject) (bool, error) { diff --git a/tests/success/enum_alternating_deep.msh b/tests/success/enum_alternating_deep.msh new file mode 100644 index 00000000..9ae0be88 --- /dev/null +++ b/tests/success/enum_alternating_deep.msh @@ -0,0 +1,25 @@ +# Deeply nested values must render, serialize, and compare without overflowing +# even when container kinds alternate: rendering/JSON/equality all run on one +# shared work-stack walker (renderValue / equalsIter) that expands enum, Maybe, +# list, dict, and pipe inline. Per-type iterative walkers were not enough — an +# enum→Maybe→enum chain re-entered each type's recursive method and overflowed +# the Go stack well before this depth. +enum E = m Maybe[E] | z end +z e! +0 i! 
+( + @i 50000 >= if break end + @e just m e! + @i 1 + i! +) loop +@e str len str wl +@e toJson len str wl +z e2! +0 i! +( + @i 50000 >= if break end + @e2 just m e2! + @i 1 + i! +) loop +@e @e2 = str wl +@e @e2 just m = str wl diff --git a/tests/success/enum_alternating_deep.msh.stdout b/tests/success/enum_alternating_deep.msh.stdout new file mode 100644 index 00000000..731afcab --- /dev/null +++ b/tests/success/enum_alternating_deep.msh.stdout @@ -0,0 +1,4 @@ +450001 +350003 +true +false From 0d78c14689ab4affe3ed1c64cfafee6293b0961f Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Thu, 2 Jul 2026 18:58:20 -0500 Subject: [PATCH 22/32] Error on rendering cyclic values instead of hanging MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A list appended into itself (in-place `append`) is a genuinely cyclic value, and an enum payload list can close a cycle through the enum (`enum Box = wrap [Box] | z end`). Rendering one never terminated: the work-stack renderer re-expanded the same pointer forever, growing output and task stack without bound (the old recursive renderers at least died fast with a stack overflow). Equality and sorting already terminate via the pointer-identity fast path and pair memoization. mshell is strict — a cycle is always the degenerate artifact of appending a container into itself, never a meaningful value — so user-facing conversion now errors: `str` and `toJson` on a cyclic value fail with "Cannot convert a cyclic value (a container that contains itself) to a string/JSON". renderValueDetect tracks the containers currently being expanded as an on-path set (pointer kinds only; a DAG merely revisits a *finished* pointer and still renders fine), unwinding via exit sentinels. Reaching an on-path pointer emits a `` marker and reports cycled=true. Internal rendering (DebugString in error messages, stack dumps) stays total via the marker — those paths cannot propagate errors and must never hang. 
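
The on-path tracking described above can be sketched on its own. This is an
illustrative reduction under stated assumptions, not the patch's code: `List`,
`renderDetect`, and the `cycleMarker` text are hypothetical, and only one
pointer-kind container is modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// List is a hypothetical pointer-kind container (assumed, simplified).
type List struct{ Items []interface{} }

type task struct {
	lit    string
	obj    interface{}
	isLit  bool
	isExit bool // sentinel: the container's children have all rendered
}

// cycleMarker is an assumed placeholder text for a back-reference.
const cycleMarker = "<cycle>"

// renderDetect renders root and reports whether a true cycle was hit.
// Keying the set by *List keeps unhashable values away from the map.
func renderDetect(root interface{}) (string, bool) {
	var sb strings.Builder
	cycled := false
	onPath := map[*List]bool{}
	stack := []task{{obj: root}}
	for len(stack) > 0 {
		t := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if t.isLit {
			sb.WriteString(t.lit)
			continue
		}
		l, isList := t.obj.(*List)
		if t.isExit {
			delete(onPath, l) // finished: later visits are DAG sharing
			continue
		}
		if !isList {
			fmt.Fprint(&sb, t.obj)
			continue
		}
		if onPath[l] {
			// Still being expanded further up the path: a real cycle.
			sb.WriteString(cycleMarker)
			cycled = true
			continue
		}
		onPath[l] = true
		// Push in reverse so pops read: "[" items... "]" exit.
		stack = append(stack, task{obj: l, isExit: true}, task{lit: "]", isLit: true})
		for i := len(l.Items) - 1; i >= 0; i-- {
			stack = append(stack, task{obj: l.Items[i]})
			if i > 0 {
				stack = append(stack, task{lit: " ", isLit: true})
			}
		}
		stack = append(stack, task{lit: "[", isLit: true})
	}
	return sb.String(), cycled
}

func main() {
	shared := &List{Items: []interface{}{1}}
	dag := &List{Items: []interface{}{shared, shared}}
	fmt.Println(renderDetect(dag))

	cyc := &List{}
	cyc.Items = append(cyc.Items, 1, cyc)
	fmt.Println(renderDetect(cyc))
}
```

A visited-set that never removed entries would misreport the shared list in
`dag` as a cycle; removing the entry at the exit sentinel is what separates a
DAG from a back-reference.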
The on-path lookup is gated on the pointer-kind check, since hashing an interface holding an unhashable dynamic type (MShellBinary, a []byte) panics even on a map read. Test: tests/fail/cyclic_render.msh (cycle equality terminates, then `str` errors). Suites green: tests 221, typecheck 208, go test. Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/Evaluator.go | 11 +++- mshell/MShellObject.go | 98 +++++++++++++++++++++++------ tests/fail/cyclic_render.msh | 10 +++ tests/fail/cyclic_render.msh.stderr | 1 + 4 files changed, 98 insertions(+), 22 deletions(-) create mode 100644 tests/fail/cyclic_render.msh create mode 100644 tests/fail/cyclic_render.msh.stderr diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go index 9161a63b..9c220940 100644 --- a/mshell/Evaluator.go +++ b/mshell/Evaluator.go @@ -9442,7 +9442,10 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot do 'toJson' operation on an empty stack.\n", t.Line, t.Column)) } - jsonStr := obj1.ToJson() + jsonStr, cycled := renderValueDetect(obj1, flavorJson) + if cycled { + return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot convert a cyclic value (a container that contains itself) to JSON.\n", t.Line, t.Column)) + } stack.Push(MShellString{jsonStr}) } else if t.Lexeme == "typeof" { obj1, err := stack.Pop() @@ -12481,7 +12484,11 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot convert an empty stack to a string.\n", t.Line, t.Column)) } - stack.Push(MShellString{obj.ToString()}) + strVal, cycled := renderValueDetect(obj, flavorStr) + if cycled { + return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot convert a cyclic value (a container that contains itself) to a string.\n", t.Line, t.Column)) + } + stack.Push(MShellString{strVal}) } else if t.Type == INDEXER { // 
Token Type obj1, err := stack.Pop() if err != nil { diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 27107aaa..7b5fe552 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -973,6 +973,9 @@ type renderTask struct { obj MShellObject flavor renderFlavor isLit bool + // isExit marks the sentinel popped after a container's children have + // rendered; it removes the container from the on-path cycle set. + isExit bool } func renderLit(s string) renderTask { return renderTask{lit: s, isLit: true} } @@ -996,16 +999,47 @@ func renderJoin(open, sep, close string, items []MShellObject, flavor renderFlav return seq } -// renderValue renders a value in the requested flavor with one explicit work -// stack instead of method recursion, expanding every container kind — enum, -// Maybe, list, dict, pipe — inline. Arbitrarily deep values therefore cannot -// overflow the call stack even when kinds alternate (enum→Maybe→enum, ...), -// which per-type iterative renderers could not guarantee: each one delegated -// other kinds to the child's own recursive method. Leaf kinds (scalars, -// grids, quotations) still render via their own methods; their nesting depth -// is bounded by their own structure. +// cycleTrackable reports whether obj is a heap container that could sit on a +// reference cycle (built via in-place list/dict mutation, e.g. a list appended +// to itself). Only pointer kinds qualify — value kinds are copied and cannot +// be revisited by identity. +func cycleTrackable(obj MShellObject) bool { + switch obj.(type) { + case *MShellEnum, *MShellList, *MShellDict, *Maybe, *MShellPipe: + return true + } + return false +} + +// renderValue renders a value in the requested flavor. It is total: a cyclic +// value renders with a `` marker at the back-reference, which keeps +// internal rendering (error messages, stack dumps) from hanging. 
User-facing +// operations (`str`, `toJson`) call renderValueDetect instead and report a +// cyclic value as an error — mshell is strict, so a cycle is always the +// degenerate result of appending a container into itself, not a value with a +// meaningful rendering. func renderValue(root MShellObject, flavor renderFlavor) string { + s, _ := renderValueDetect(root, flavor) + return s +} + +// renderValueDetect renders a value in the requested flavor with one explicit +// work stack instead of method recursion, expanding every container kind — +// enum, Maybe, list, dict, pipe — inline. Arbitrarily deep values therefore +// cannot overflow the call stack even when kinds alternate (enum→Maybe→enum, +// ...), which per-type iterative renderers could not guarantee: each one +// delegated other kinds to the child's own recursive method. Leaf kinds +// (scalars, grids, quotations) still render via their own methods; their +// nesting depth is bounded by their own structure. +// +// Containers currently being expanded are tracked as an on-path set; reaching +// one again is a true reference cycle (a DAG merely revisits a finished +// pointer, which is fine), so the walk emits `` instead of descending +// and reports cycled=true. +func renderValueDetect(root MShellObject, flavor renderFlavor) (string, bool) { var sb strings.Builder + cycled := false + var onPath map[MShellObject]bool stack := []renderTask{{obj: root, flavor: flavor}} // push schedules seq to pop in order (reversed onto the LIFO stack). push := func(seq []renderTask) { @@ -1013,6 +1047,18 @@ func renderValue(root MShellObject, flavor renderFlavor) string { stack = append(stack, seq[i]) } } + // enter marks t.obj as on the current path and schedules its removal + // after seq (the container's children) has fully rendered. 
+ enter := func(obj MShellObject, seq []renderTask) []renderTask { + if !cycleTrackable(obj) { + return seq + } + if onPath == nil { + onPath = make(map[MShellObject]bool, 8) + } + onPath[obj] = true + return append(seq, renderTask{obj: obj, isExit: true}) + } for len(stack) > 0 { t := stack[len(stack)-1] stack = stack[:len(stack)-1] @@ -1020,6 +1066,18 @@ func renderValue(root MShellObject, flavor renderFlavor) string { sb.WriteString(t.lit) continue } + if t.isExit { + delete(onPath, t.obj) + continue + } + // Only pointer kinds are ever on the path; the guard also keeps + // unhashable dynamic types (MShellBinary, a []byte) away from the + // map lookup, which would panic even on a read. + if cycleTrackable(t.obj) && onPath[t.obj] { + sb.WriteString("") + cycled = true + continue + } if m, ok := asMaybe(t.obj); ok { switch { case m.IsNone() && t.flavor == flavorJson: @@ -1027,11 +1085,11 @@ func renderValue(root MShellObject, flavor renderFlavor) string { case m.IsNone(): sb.WriteString("None") case t.flavor == flavorJson: - push([]renderTask{{obj: m.obj, flavor: flavorJson}}) + push(enter(t.obj, []renderTask{{obj: m.obj, flavor: flavorJson}})) case t.flavor == flavorDebug: - push([]renderTask{renderLit("Maybe("), {obj: m.obj, flavor: flavorDebug}, renderLit(")")}) + push(enter(t.obj, []renderTask{renderLit("Maybe("), {obj: m.obj, flavor: flavorDebug}, renderLit(")")})) default: - push([]renderTask{renderLit("Just("), {obj: m.obj, flavor: flavorStr}, renderLit(")")}) + push(enter(t.obj, []renderTask{renderLit("Just("), {obj: m.obj, flavor: flavorStr}, renderLit(")")})) } continue } @@ -1053,7 +1111,7 @@ func renderValue(root MShellObject, flavor renderFlavor) string { seq = append(seq, renderJoin("[", ", ", "]", v.Payload, flavorJson)...) } seq = append(seq, renderLit("}")) - push(seq) + push(enter(t.obj, seq)) continue } // `member` (nullary) or `member(p0 p1 ...)`, payloads as ToString. 
@@ -1061,18 +1119,18 @@ func renderValue(root MShellObject, flavor renderFlavor) string { sb.WriteString(v.Member) continue } - push(renderJoin(v.Member+"(", " ", ")", v.Payload, flavorStr)) + push(enter(t.obj, renderJoin(v.Member+"(", " ", ")", v.Payload, flavorStr))) case *MShellList: if t.flavor == flavorJson { - push(renderJoin("[", ", ", "]", v.Items, flavorJson)) + push(enter(t.obj, renderJoin("[", ", ", "]", v.Items, flavorJson))) } else { - push(renderJoin("[", " ", "]", v.Items, flavorDebug)) + push(enter(t.obj, renderJoin("[", " ", "]", v.Items, flavorDebug))) } case *MShellPipe: if t.flavor == flavorJson { - push(renderJoin("[", ", ", "]", v.List.Items, flavorJson)) + push(enter(t.obj, renderJoin("[", ", ", "]", v.List.Items, flavorJson))) } else { - push(renderJoin("", " | ", "", v.List.Items, flavorDebug)) + push(enter(t.obj, renderJoin("", " | ", "", v.List.Items, flavorDebug))) } case *MShellDict: keys := sortedDictKeys(v.Items) @@ -1083,7 +1141,7 @@ func renderValue(root MShellObject, flavor renderFlavor) string { seq = append(seq, renderLit(k+": "), renderTask{obj: v.Items[k], flavor: flavorDebug}, renderLit(", ")) } seq = append(seq, renderLit("}")) - push(seq) + push(enter(t.obj, seq)) continue } // The `str` form of a dict is its JSON form. 
@@ -1101,7 +1159,7 @@ func renderValue(root MShellObject, flavor renderFlavor) string { seq = append(seq, renderLit(string(keyEnc)+": "), renderTask{obj: v.Items[k], flavor: flavorJson}) } seq = append(seq, renderLit("}")) - push(seq) + push(enter(t.obj, seq)) default: switch t.flavor { case flavorDebug: @@ -1113,7 +1171,7 @@ func renderValue(root MShellObject, flavor renderFlavor) string { } } } - return sb.String() + return sb.String(), cycled } // equalsIter is structural equality over any two values, walked with one diff --git a/tests/fail/cyclic_render.msh b/tests/fail/cyclic_render.msh new file mode 100644 index 00000000..ba23816f --- /dev/null +++ b/tests/fail/cyclic_render.msh @@ -0,0 +1,10 @@ +# mshell is strict: a cyclic value (a container appended into itself) is a +# degenerate artifact of in-place mutation, so converting one to a string or +# JSON is an error rather than a hang. Equality and sorting on cyclic values +# still terminate (pointer-identity fast path + pair memoization). +enum Box = wrap [Box] | z end +[] x! +@x wrap e! +@x @e append drop +@e dup = str wl +@e str wl diff --git a/tests/fail/cyclic_render.msh.stderr b/tests/fail/cyclic_render.msh.stderr new file mode 100644 index 00000000..f45c549a --- /dev/null +++ b/tests/fail/cyclic_render.msh.stderr @@ -0,0 +1 @@ +10:4: Cannot convert a cyclic value (a container that contains itself) to a string. 
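The on-path cycle tracking that renderValueDetect introduces in the patch above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the mshell implementation: `node`, `task`, and `render` are hypothetical stand-ins for MShellObject, renderTask, and renderValueDetect, and the `<cycle>` marker string is a placeholder chosen for the sketch. The point it tries to show is that a container is added to the on-path set when it is expanded and removed by an exit task scheduled after its children, so a shared (DAG) child renders twice while a true back-reference is reported.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical pointer-kind container (stand-in for a list/enum).
type node struct {
	name string
	kids []*node
}

// task is one unit of work on the explicit stack: a literal to emit,
// a node to render, or an exit marker that takes a node off the path.
type task struct {
	n      *node
	lit    string
	isLit  bool
	isExit bool
}

// render walks root with one explicit LIFO stack and reports whether a
// reference cycle was met. The "<cycle>" marker is an assumption of this
// sketch, not necessarily the text mshell emits.
func render(root *node) (string, bool) {
	var sb strings.Builder
	cycled := false
	onPath := map[*node]bool{}
	stack := []task{{n: root}}
	for len(stack) > 0 {
		t := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		switch {
		case t.isLit:
			sb.WriteString(t.lit)
		case t.isExit:
			// All children have rendered: the node leaves the path.
			delete(onPath, t.n)
		case onPath[t.n]:
			// Reached a container that is still being expanded.
			sb.WriteString("<cycle>")
			cycled = true
		default:
			onPath[t.n] = true
			seq := []task{{lit: t.n.name + "(", isLit: true}}
			for i, k := range t.n.kids {
				if i > 0 {
					seq = append(seq, task{lit: " ", isLit: true})
				}
				seq = append(seq, task{n: k})
			}
			seq = append(seq, task{lit: ")", isLit: true}, task{n: t.n, isExit: true})
			// Push reversed so the sequence pops in order.
			for i := len(seq) - 1; i >= 0; i-- {
				stack = append(stack, seq[i])
			}
		}
	}
	return sb.String(), cycled
}

func main() {
	shared := &node{name: "s"}
	dag := &node{name: "d", kids: []*node{shared, shared}}
	fmt.Println(render(dag)) // shared child, no cycle

	loop := &node{name: "l"}
	loop.kids = []*node{loop}
	fmt.Println(render(loop)) // container that contains itself
}
```

A visited set that never removed entries would misreport the shared child in `dag` as a cycle; the exit task is what separates "currently being expanded" from "already finished".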
From da5f6a9ebac16758bf8d06a5fbf98e7efc150090 Mon Sep 17 00:00:00 2001
From: Mitchell Paulus
Date: Thu, 2 Jul 2026 19:38:50 -0500
Subject: [PATCH 23/32] Infer the match subject inside quotations
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A `match` as the body of an inferred quotation always failed the checker
with "stack underflow at 'match' (match subject)" even though it runs fine —
rejecting the canonical way to consume enums:

    [ 1 leaf 2 leaf ] (match leaf n : @n, node a b : 0, end) map

checkMatchBlock errored on an empty stack unconditionally; other constructs
(operators, `if`) participate in quote-body inference, where applySig
responds to underflow by synthesizing fresh input vars. The failure predates
enums (Maybe matches in map had it too), but match is the enum eliminator, so
`(match ...) map/filter/each` never type-checking bit constantly.

Two pieces:

- Under inference, an empty stack at `match` synthesizes the subject exactly
  as applySig's underflow path does (bottom of stack, front of inferInputs),
  so the quotation infers a one-input signature.
- A subject that is still an unresolved var is pinned from the first arm
  pattern that names a type — an enum member determines its enum (member
  names are global), `just`/`none` determine Maybe[fresh]. Pinning happens
  before the entry branch is captured, so every arm's analysis, payload
  bindings, and the exhaustiveness check see the resolved subject (per-arm
  substitution checkpoints would roll back a per-site pin).

Value literals and type keywords deliberately do not pin: a type-keyword
match may be discriminating a union, which pinning would wrongly narrow —
those matches still check via their wildcard arm.

Exhaustiveness now works inside quotations too: a match that omits a member
is rejected, and a pinned enum quotation applied to a list of a different
element type fails overload resolution as it should.

Tests: tests/success/enum_match_in_quote.msh (enum map/filter/each, Maybe, value literals) and tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh. Suites green: tests 222, typecheck 210, go test. Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/TypeCheckProgram.go | 61 +++++++++++++++++-- tests/success/enum_match_in_quote.msh | 16 +++++ tests/success/enum_match_in_quote.msh.stdout | 6 ++ .../enum_match_in_quote_nonexhaustive.msh | 4 ++ 4 files changed, 81 insertions(+), 6 deletions(-) create mode 100644 tests/success/enum_match_in_quote.msh create mode 100644 tests/success/enum_match_in_quote.msh.stdout create mode 100644 tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index 2d01bf9a..d5750877 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -1241,12 +1241,21 @@ func formatPatternItem(it MShellParseItem) string { func (c *Checker) checkMatchBlock(matchBlock *MShellParseMatchBlock) { startTok := matchBlock.GetStartToken() if c.stack.Len() == 0 { - c.errors = append(c.errors, TypeError{ - Kind: TErrStackUnderflow, - Pos: startTok, - Hint: "match subject", - }) - return + if !c.inferring { + c.errors = append(c.errors, TypeError{ + Kind: TErrStackUnderflow, + Pos: startTok, + Hint: "match subject", + }) + return + } + // Quote-body inference: the subject is the quote's own input. + // Synthesize a fresh var exactly as applySig's underflow path does — + // at the bottom of the stack and the front of inferInputs — so + // `(match ... end) map` infers a one-input quote instead of erroring. + v := c.subst.FreshVar(c.arena) + c.inferInputs = append([]TypeId{v}, c.inferInputs...) + c.stack.items = append([]TypeId{v}, c.stack.items...) 
} // Widen a string-literal subject to `str`: match arms and the // exhaustiveness check compare against `str` by type id, and the literal @@ -1261,6 +1270,16 @@ func (c *Checker) checkMatchBlock(matchBlock *MShellParseMatchBlock) { if resolved := c.subst.Apply(c.arena, subject); c.arena.Node(resolved).Kind == TKBrand { subject = c.underlying(resolved) } + // An unresolved subject (a quote input under inference) is pinned from the + // first arm pattern that names a type: an enum member determines its enum + // (member names are global) and `just`/`none` determine Maybe. Pinning + // happens before the entry branch is captured so every arm and the + // exhaustiveness check see the resolved subject. + if c.arena.Node(c.subst.Apply(c.arena, subject)).Kind == TKVar { + if pin, ok := c.matchSubjectPin(matchBlock); ok { + c.unify(subject, pin) + } + } entry := c.captureBranch() if len(matchBlock.Arms) == 0 { @@ -1317,6 +1336,36 @@ func (c *Checker) checkMatchBlock(matchBlock *MShellParseMatchBlock) { c.reconcileArmBranches(armBranches, armLabels, entry, startTok) } +// matchSubjectPin returns the concrete type a match's arm patterns determine +// for an as-yet-unresolved subject. An enum-member head names its enum (member +// names are unique across enums), and a `just`/`none` head names Maybe[T] with +// a fresh T. ok=false when no arm determines a type — value literals and type +// keywords deliberately do not pin, since a type-keyword match may be +// discriminating a union and pinning would wrongly narrow the input. 
+func (c *Checker) matchSubjectPin(matchBlock *MShellParseMatchBlock) (TypeId, bool) { + for _, arm := range matchBlock.Arms { + if len(arm.Pattern) == 0 { + continue + } + tok, ok := arm.Pattern[0].(Token) + if !ok || tok.Type != LITERAL { + continue + } + if tok.Lexeme == "just" || tok.Lexeme == "none" { + return c.arena.MakeMaybe(c.subst.FreshVar(c.arena)), true + } + mid := c.names.Intern(tok.Lexeme) + if _, isMember := c.enumMemberToks[mid]; isMember { + // A member's constructor sig has the enum as its only output. + sigs := c.nameBuiltins[mid] + if len(sigs) > 0 && len(sigs[0].Outputs) == 1 && c.arena.Node(sigs[0].Outputs[0]).Kind == TKEnum { + return sigs[0].Outputs[0], true + } + } + } + return TidNothing, false +} + // armPattern is the single interpretation of a match arm pattern. One // analysis feeds all four consumers that used to re-pattern-match the // arm independently: recognition diagnostics (Recognized), the diff --git a/tests/success/enum_match_in_quote.msh b/tests/success/enum_match_in_quote.msh new file mode 100644 index 00000000..ac91746f --- /dev/null +++ b/tests/success/enum_match_in_quote.msh @@ -0,0 +1,16 @@ +# A `match` may be the body of an inferred quotation: the checker synthesizes +# the quote's input as the match subject (like any other underflow under +# inference) and pins it from the first arm that names a type — an enum member +# determines its enum, `just`/`none` determine Maybe. This is the canonical +# way to consume a list of enum values. 
+enum T = leaf int | node T T end +[ 1 leaf 2 leaf ] (match leaf n : @n, node a b : 0, end) map (str) map "," join wl + +enum C = red | green | blue end +[ red green blue ] (match red : true, green : false, blue : true, end) filter len str wl + +[ 5 just none ] (match just v : @v, none : 0, end) map (str) map "," join wl + +[1 2 3] (match 1 : "one", _ : "other", end) map "," join wl + +[ red green ] (match red : "r" wl, green : "g" wl, blue : "b" wl, end) each diff --git a/tests/success/enum_match_in_quote.msh.stdout b/tests/success/enum_match_in_quote.msh.stdout new file mode 100644 index 00000000..8bd43448 --- /dev/null +++ b/tests/success/enum_match_in_quote.msh.stdout @@ -0,0 +1,6 @@ +1,2 +2 +5,0 +one,other,other +r +g diff --git a/tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh b/tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh new file mode 100644 index 00000000..0b96d55d --- /dev/null +++ b/tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh @@ -0,0 +1,4 @@ +# Exhaustiveness is enforced inside inferred quotations too: the subject is +# pinned to the member's enum, so a match that omits a member is rejected. +enum C = red | green | blue end +[ red ] (match red : 1, green : 2, end) map drop From 88d8de32970a73b36b5f82c5ed127f1bab401dbb Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Thu, 2 Jul 2026 20:02:33 -0500 Subject: [PATCH 24/32] Dedupe identical pairs at push time: fix exponential cliff past the memo cap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Comparing two independently built self-doubling DAGs deeper than dagMemoCap (2^18 levels) hung: the cap's "stop inserting" overflow policy left every level beyond 262144 un-memoized, and each un-memoized level doubles the walk. Measured: depth 200k compared in 1s, depth 300k ran effectively forever (2^38000 work). 
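The doubling described above can be reproduced in miniature. The following is a rough sketch under stated assumptions, not the mshell code: `tree`, `build`, and `equal` are hypothetical names, and the `memoize` flag models a pair memo with no cap that is active from the first step, whereas the dagGuard in this series only starts memoizing past a step threshold. It compares two independently built self-doubling values (the shape `@t @t node` produces) and counts how many pairs the walk pops with and without remembering pointer pairs it has already expanded.

```go
package main

import "fmt"

// tree is a hypothetical binary node; a leaf has nil children.
type tree struct{ l, r *tree }

// build returns a self-doubling value: each level reuses the previous
// level as both children, so depth d has only d+1 distinct nodes.
func build(depth int) *tree {
	t := &tree{}
	for i := 0; i < depth; i++ {
		t = &tree{l: t, r: t}
	}
	return t
}

// pair identifies one comparison by the two pointers involved.
type pair struct{ a, b *tree }

// equal compares a and b structurally with an explicit stack and returns
// the result plus the number of pairs popped. With memoize set, a pointer
// pair that was already expanded is skipped instead of re-expanded.
func equal(a, b *tree, memoize bool) (bool, int) {
	seen := map[pair]bool{}
	stack := []pair{{a, b}}
	steps := 0
	for len(stack) > 0 {
		p := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		steps++
		if p.a == p.b {
			continue // identical pointers (or both nil) are equal
		}
		if p.a == nil || p.b == nil {
			return false, steps
		}
		if memoize {
			if seen[p] {
				continue // first occurrence already resolved (LIFO order)
			}
			seen[p] = true
		}
		stack = append(stack, pair{p.a.l, p.b.l}, pair{p.a.r, p.b.r})
	}
	return true, steps
}

func main() {
	a, b := build(20), build(20) // independently built: no shared pointers
	_, naive := equal(a, b, false)
	_, memo := equal(a, b, true)
	fmt.Println("steps without memo:", naive)
	fmt.Println("steps with memo:   ", memo)
}
```

Because the two operands share no pointers with each other, the pointer-identity fast path never fires above the nil children, so only remembering already-expanded pairs keeps the step count proportional to the number of distinct nodes.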
A first attempt — clearing the memo generationally on overflow — did NOT fix it (also measured): the pending duplicate for each upper level pops only after the entire subtree between, so the working set spans all levels and defeats any bounded memo regardless of eviction policy. The structural fix: deduplicate pointer-identical pairs when a container pushes its children. A self-doubling value (`@t @t node`, `[ @x @x ]`) expands to the SAME pair twice; pushing it once makes the whole family linear at any depth with no memo involvement — equality at depth 300k/600k/ 4M now runs in 1.1s/2.1s/14.4s (linear), and sort past the cap likewise. Applied in equalsIter (pushPairsDedup for enum payloads, list and pipe elements) and in compareValues' enum and list arms (skipping is sound in both walks: an identical pointer pair contributes equal/0). The generational clear is kept — the memo still covers cross-parent duplicate pairs (diamond-shaped sharing) up to the cap — and its comment now states honestly what it does and does not defend against. Extends tests/success/enum_dag_equality.msh with a 300k-deep (past-cap) independent-DAG comparison. Suites green: tests 222, typecheck 210, go test. Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/MShellObject.go | 62 ++++++++++++++++------ tests/success/enum_dag_equality.msh | 12 +++++ tests/success/enum_dag_equality.msh.stdout | 1 + 3 files changed, 60 insertions(+), 15 deletions(-) diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 7b5fe552..663f5da5 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -926,9 +926,18 @@ func (g *dagGuard) skip(a, b MShellObject) bool { if g.memo[key] { return true } - if len(g.memo) < dagMemoCap { - g.memo[key] = true + // Generational overflow: when the memo is full, clear it and keep + // inserting rather than stopping (stopping would freeze the memo on the + // walk's earliest pairs). 
Note the memo is NOT the defense against + // self-doubling values — a walk whose pending-duplicate working set + // exceeds the cap defeats any bounded memo (measured, not theorized). + // That family is handled structurally by push-time pair dedup + // (pushPairsDedup); the memo covers cross-parent duplicate pairs + // (diamond-shaped sharing) up to the cap. + if len(g.memo) >= dagMemoCap { + g.memo = make(map[refPair]bool, 1024) } + g.memo[key] = true return false } @@ -1181,10 +1190,30 @@ func renderValueDetect(root MShellObject, flavor renderFlavor) (string, bool) { // definition), and past a step threshold already-expanded pairs are memoized // (see dagGuard), so shared substructure cannot blow up exponentially. Leaf // kinds compare via their own Equals. +type eqPair struct{ a, b MShellObject } + +// pushPairsDedup pushes element-wise comparison pairs, skipping a pair that is +// pointer-identical to the one just pushed. A self-doubling value +// (`@t @t node`, `[ @x @x ]`) expands to the SAME pair twice; pushing it once +// makes that whole family linear at any depth, with no reliance on the +// bounded dagGuard memo (whose eviction cannot cover a working set larger +// than its cap). 
+func pushPairsDedup(stack []eqPair, as, bs []MShellObject) []eqPair { + var lastA, lastB MShellObject + for i := range as { + ca, cb := as[i], bs[i] + if i > 0 && sameRef(ca, lastA) && sameRef(cb, lastB) { + continue + } + lastA, lastB = ca, cb + stack = append(stack, eqPair{a: ca, b: cb}) + } + return stack +} + func equalsIter(a, b MShellObject) (bool, error) { - type pair struct{ a, b MShellObject } var guard dagGuard - stack := []pair{{a: a, b: b}} + stack := []eqPair{{a: a, b: b}} for len(stack) > 0 { p := stack[len(stack)-1] stack = stack[:len(stack)-1] @@ -1197,7 +1226,7 @@ func equalsIter(a, b MShellObject) (bool, error) { return false, nil } if !am.IsNone() { - stack = append(stack, pair{a: am.obj, b: bm.obj}) + stack = append(stack, eqPair{a: am.obj, b: bm.obj}) } continue } @@ -1207,25 +1236,19 @@ func equalsIter(a, b MShellObject) (bool, error) { if !ok || av.EnumName != bv.EnumName || av.Member != bv.Member || len(av.Payload) != len(bv.Payload) { return false, nil } - for i := range av.Payload { - stack = append(stack, pair{a: av.Payload[i], b: bv.Payload[i]}) - } + stack = pushPairsDedup(stack, av.Payload, bv.Payload) case *MShellList: bv, ok := p.b.(*MShellList) if !ok || len(av.Items) != len(bv.Items) { return false, nil } - for i := range av.Items { - stack = append(stack, pair{a: av.Items[i], b: bv.Items[i]}) - } + stack = pushPairsDedup(stack, av.Items, bv.Items) case *MShellPipe: bv, ok := p.b.(*MShellPipe) if !ok || len(av.List.Items) != len(bv.List.Items) { return false, nil } - for i := range av.List.Items { - stack = append(stack, pair{a: av.List.Items[i], b: bv.List.Items[i]}) - } + stack = pushPairsDedup(stack, av.List.Items, bv.List.Items) case *MShellDict: bv, ok := p.b.(*MShellDict) if !ok || len(av.Items) != len(bv.Items) { @@ -1236,7 +1259,7 @@ func equalsIter(a, b MShellObject) (bool, error) { if !ok { return false, nil } - stack = append(stack, pair{a: aval, b: bval}) + stack = append(stack, eqPair{a: aval, b: bval}) } default: eq, 
err := p.a.Equals(p.b) @@ -1350,6 +1373,11 @@ func compareValues(a, b MShellObject) int { n := min(len(av.Items), len(bl.Items)) stack = append(stack, task{lit: cmpInt(len(av.Items), len(bl.Items)), isLit: true}) for i := n - 1; i >= 0; i-- { + // Skip a pair pointer-identical to its neighbor: it compares 0 + // and would double the walk on self-doubling values. + if i > 0 && sameRef(av.Items[i], av.Items[i-1]) && sameRef(bl.Items[i], bl.Items[i-1]) { + continue + } stack = append(stack, task{a: av.Items[i], b: bl.Items[i]}) } case *MShellDict: @@ -1368,6 +1396,10 @@ func compareValues(a, b MShellObject) int { n := min(len(av.Payload), len(be.Payload)) stack = append(stack, task{lit: cmpInt(len(av.Payload), len(be.Payload)), isLit: true}) for i := n - 1; i >= 0; i-- { + // Skip a pair pointer-identical to its neighbor (see list arm). + if i > 0 && sameRef(av.Payload[i], av.Payload[i-1]) && sameRef(be.Payload[i], be.Payload[i-1]) { + continue + } stack = append(stack, task{a: av.Payload[i], b: be.Payload[i]}) } // Name and member (declaration order) compare before any payload. diff --git a/tests/success/enum_dag_equality.msh b/tests/success/enum_dag_equality.msh index 08b677e2..ebb8d5e9 100644 --- a/tests/success/enum_dag_equality.msh +++ b/tests/success/enum_dag_equality.msh @@ -32,3 +32,15 @@ enum T = leaf int | node T T end @a @c = str wl [ @a @c @b ] sort len str wl [ @a @c @b ] uniq len str wl + +# Past the dagGuard memo cap (2^18): self-doubling pairs are deduplicated at +# push time, so independent DAGs deeper than the cap stay linear — a bounded +# memo alone cannot cover a pending-duplicate working set larger than itself. +0 leaf p! 0 leaf q! 0 i! +( + @i 300000 >= if break end + @p @p node p! + @q @q node q! + @i 1 + i! 
+) loop +@p @q = str wl diff --git a/tests/success/enum_dag_equality.msh.stdout b/tests/success/enum_dag_equality.msh.stdout index e102c904..1654dd67 100644 --- a/tests/success/enum_dag_equality.msh.stdout +++ b/tests/success/enum_dag_equality.msh.stdout @@ -6,3 +6,4 @@ true false 3 2 +true From e7658a77a57ffd48adc713bcd41c7e6fef3ecf8b Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Thu, 2 Jul 2026 20:11:11 -0500 Subject: [PATCH 25/32] Dedupe identical pairs in the dict comparison arms too MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The push-time pair dedup that closed the exponential cliff for self-doubling values covered enum payloads, lists, and pipes — but not the two dict arms. A dict-shaped doubling value ({ "l": @d, "r": @d } per level, or the same structure through an enum's {str: E} payload) pushed the identical pointer pair once per key and re-opened the cliff: equality at depth 200k ran in 1.4s, depth 300k (past the memo cap) hung. Measured, same boundary as the list/enum case. equalsIter's dict arm now skips a pair pointer-identical to the last one it pushed, and compareValues' dict arm skips the identical *value* pair while always keeping the key comparison. Dict-DAG equality at 300k/600k now runs 1.5s/3.3s (linear), sort works, and a deep unequal pair is still detected. Extends tests/success/enum_dag_equality.msh with a 300k dict-payload DAG. Suites green: tests 222, typecheck 210, go test. 
Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/MShellObject.go | 21 ++++++++++++++++++++- tests/success/enum_dag_equality.msh | 13 +++++++++++++ tests/success/enum_dag_equality.msh.stdout | 1 + 3 files changed, 34 insertions(+), 1 deletion(-) diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 663f5da5..8c8ffeb6 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -1254,11 +1254,21 @@ func equalsIter(a, b MShellObject) (bool, error) { if !ok || len(av.Items) != len(bv.Items) { return false, nil } + // Track the last pushed pair to skip consecutive identical ones — + // the dict-shaped self-doubling value ({ "l": @d, "r": @d }) + // pushes the same pointer pair once per key, and without dedup + // that doubles the walk per level (same cliff pushPairsDedup + // closes for lists and enum payloads). + var lastA, lastB MShellObject for key, aval := range av.Items { bval, ok := bv.Items[key] if !ok { return false, nil } + if sameRef(aval, lastA) && sameRef(bval, lastB) { + continue + } + lastA, lastB = aval, bval stack = append(stack, eqPair{a: aval, b: bval}) } default: @@ -1386,9 +1396,18 @@ func compareValues(a, b MShellObject) int { bk := sortedDictKeys(bd.Items) n := min(len(ak), len(bk)) stack = append(stack, task{lit: cmpInt(len(ak), len(bk)), isLit: true}) + var lastVA, lastVB MShellObject for i := n - 1; i >= 0; i-- { // Pushed so `key compare` pops before its `value compare`. - stack = append(stack, task{a: av.Items[ak[i]], b: bd.Items[bk[i]]}) + // A value pair pointer-identical to the neighboring key's is + // skipped (it compares 0) — the dict-shaped self-doubling + // value would otherwise double the walk per level. The key + // comparison itself always stays. 
+ va, vb := av.Items[ak[i]], bd.Items[bk[i]] + if !(sameRef(va, lastVA) && sameRef(vb, lastVB)) { + stack = append(stack, task{a: va, b: vb}) + lastVA, lastVB = va, vb + } stack = append(stack, task{lit: strings.Compare(ak[i], bk[i]), isLit: true}) } case *MShellEnum: diff --git a/tests/success/enum_dag_equality.msh b/tests/success/enum_dag_equality.msh index ebb8d5e9..ab1ac4d7 100644 --- a/tests/success/enum_dag_equality.msh +++ b/tests/success/enum_dag_equality.msh @@ -44,3 +44,16 @@ enum T = leaf int | node T T end @i 1 + i! ) loop @p @q = str wl + +# Dict-shaped self-doubling past the cap: the dict arms dedupe consecutive +# identical value pairs the same way (an enum with a dict payload closes the +# same doubling structure through {str: E}). +enum D = md {str: D} | zd end +zd u! zd v! 0 i! +( + @i 300000 >= if break end + { "l": @u, "r": @u } md u! + { "l": @v, "r": @v } md v! + @i 1 + i! +) loop +@u @v = str wl diff --git a/tests/success/enum_dag_equality.msh.stdout b/tests/success/enum_dag_equality.msh.stdout index 1654dd67..877a2259 100644 --- a/tests/success/enum_dag_equality.msh.stdout +++ b/tests/success/enum_dag_equality.msh.stdout @@ -7,3 +7,4 @@ false 3 2 true +true From 6341f99acf5aa655a73f4688397ab00db2746049 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Thu, 2 Jul 2026 20:21:34 -0500 Subject: [PATCH 26/32] Make the comparison pair-memo unbounded: end the DAG blowup class globally MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Four successive fixes (memo cap, generational eviction, push-dedup for enum/list/pipe, then dict arms) each closed one self-doubling pattern and left the next container or sharing shape exponential past the cap — a non-consecutive pattern like [x y x] per level still hung at depth 300k. The chain of same-shaped patches was defending the wrong invariant: the memo cap itself. The memo is now unbounded. 
Every revisited pointer pair memo-hits, so ANY sharing pattern — consecutive, alternating, cross-parent, any container mix, any depth — is polynomial in actual heap nodes. There is no boundary left to probe. The memory trade is proportionate, not abstract: the memo only activates past the step threshold (2^19), so ordinary comparisons never allocate, and a comparison big enough to grow a large memo already holds operands larger than the memo. Measured: the pathological 4M-deep linear compare (every pair distinct — the case the cap was protecting) runs 21.9s at 2.6GB peak, of which the two operands are over half; the alternating [x y x] DAG at 300k, which defeated every previous patch, compares in 3.0s. Push-time dedup stays as a constant-factor fast path (it also avoids the pre-threshold spin for shallow doubling), but is no longer load-bearing for termination. Extends tests/success/enum_dag_equality.msh with the non-consecutive alternating pattern at 300k. Suites green: tests 222, typecheck 210, go test. Co-Authored-By: Claude Fable 5 Claude-Session: https://claude.ai/code/session_01E3aH9BBud5rjDoHUF8daaJ --- mshell/MShellObject.go | 29 +++++++++++----------- tests/success/enum_dag_equality.msh | 13 ++++++++++ tests/success/enum_dag_equality.msh.stdout | 1 + 3 files changed, 28 insertions(+), 15 deletions(-) diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 8c8ffeb6..3c66c248 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -895,9 +895,9 @@ func sameRef(a, b MShellObject) bool { // if a duplicate pops at all, its subtree already compared equal. // // Ordinary comparisons never allocate: below the step threshold the guard is -// one integer increment. The memo is capped so a legitimately huge linear -// value (millions of distinct pairs, no repeats) cannot balloon memory; a -// blowup DAG has few distinct pairs and fits far below the cap. +// one integer increment. 
Past it, the memo grows without bound — see skip for +// why an unbounded memo is the correct trade (a bounded one turns "assumption +// exceeded" into exponential time). type dagGuard struct { steps int memo map[refPair]bool @@ -906,11 +906,21 @@ type dagGuard struct { type refPair struct{ a, b MShellObject } const dagStepThreshold = 1 << 19 -const dagMemoCap = 1 << 18 // skip reports whether this pair was already expanded earlier in the walk. // Call once per popped pair; it records the pair (past the threshold) so // later duplicates skip. +// +// The memo is deliberately UNBOUNDED. Every revisited pointer pair memo-hits, +// which makes a comparison polynomial in actual heap nodes for any sharing +// pattern — self-doubling, alternating, cross-parent diamonds, any container +// mix, any depth. Earlier versions capped the memo to bound memory and then +// patched the resulting exponential cliffs case by case (generational +// eviction, per-container dedup); any bounded memo loses to a working set +// larger than its bound (measured, not theorized), so no cap. Memory tracks +// pairs actually walked: it activates only past the step threshold, so +// ordinary comparisons never allocate, and a comparison large enough to build +// a big memo already holds operands larger than the memo itself. func (g *dagGuard) skip(a, b MShellObject) bool { g.steps++ if g.steps < dagStepThreshold { @@ -926,17 +936,6 @@ func (g *dagGuard) skip(a, b MShellObject) bool { if g.memo[key] { return true } - // Generational overflow: when the memo is full, clear it and keep - // inserting rather than stopping (stopping would freeze the memo on the - // walk's earliest pairs). Note the memo is NOT the defense against - // self-doubling values — a walk whose pending-duplicate working set - // exceeds the cap defeats any bounded memo (measured, not theorized). 
- // That family is handled structurally by push-time pair dedup - // (pushPairsDedup); the memo covers cross-parent duplicate pairs - // (diamond-shaped sharing) up to the cap. - if len(g.memo) >= dagMemoCap { - g.memo = make(map[refPair]bool, 1024) - } g.memo[key] = true return false } diff --git a/tests/success/enum_dag_equality.msh b/tests/success/enum_dag_equality.msh index ab1ac4d7..064f00a6 100644 --- a/tests/success/enum_dag_equality.msh +++ b/tests/success/enum_dag_equality.msh @@ -57,3 +57,16 @@ zd u! zd v! 0 i! @i 1 + i! ) loop @u @v = str wl + +# Class closure: a NON-consecutive alternating sharing pattern ([x y x] per +# level) defeats push-time dedup entirely and, past any bounded memo, every +# such pattern re-explodes — so the pair memo is unbounded. Any sharing +# pattern, any container mix, any depth is polynomial in actual nodes. +[0] xa! [1] ya! [0] xb! [1] yb! 0 i! +( + @i 300000 >= if break end + [ @xa @ya @xa ] t! [ @ya @xa @ya ] ya! @t xa! + [ @xb @yb @xb ] t! [ @yb @xb @yb ] yb! @t xb! + @i 1 + i! +) loop +@xa @xb = str wl diff --git a/tests/success/enum_dag_equality.msh.stdout b/tests/success/enum_dag_equality.msh.stdout index 877a2259..f7d26a30 100644 --- a/tests/success/enum_dag_equality.msh.stdout +++ b/tests/success/enum_dag_equality.msh.stdout @@ -8,3 +8,4 @@ false 2 true true +true From 2a326e369c2ff5c4380a1ae96ad0e3238982a32d Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Thu, 2 Jul 2026 21:50:42 -0500 Subject: [PATCH 27/32] Collapse the pointer-kind switches and drop the superseded push-dedup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The comparison/render walkers carried three hand-maintained "which types are pointers" switches (sameRef, refPairKey, cycleTrackable) that had already drifted apart. 
Replace all three with one isRefKind predicate that checks the dynamic kind via reflection: "is a pointer" is exactly the property every site cares about (heap identity ⇒ can share substructure, can cycle, safe to compare and use as a map key), and a newly added pointer kind is covered with no list to keep in sync. Also delete the push-time neighbor-dedup machinery (pushPairsDedup plus three hand-rolled copies in the dict/list/enum arms). It was written to compensate for the bounded dagGuard memo — its own comment still cited the memo's eviction — but the memo is unbounded now, which subsumes the whole trick: below the step threshold duplicate expansion is capped by the threshold itself, past it every repeated pair memo-hits. Timing on the DAG stress tests is unchanged. Net -71 lines, no behavior change. Co-Authored-By: Claude Fable 5 --- mshell/MShellObject.go | 163 ++++++++++++----------------------------- 1 file changed, 46 insertions(+), 117 deletions(-) diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index 3c66c248..aa060fad 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -7,6 +7,7 @@ import ( "fmt" "golang.org/x/net/html" "os" + "reflect" "regexp" "slices" "sort" @@ -852,42 +853,37 @@ func sortedDictKeys(m map[string]MShellObject) []string { return keys } -// sameRef reports whether a and b are the identical heap object, for the kinds -// that can form shared substructure (a value built as `@t @t node` reuses one -// subtree twice). A pointer-identical pair is equal by definition, so equality -// and ordering walks skip it instead of expanding it — without this, walking a -// value with n levels of sharing costs 2^n. Only pointer kinds are compared: -// comparing interfaces holding non-comparable dynamic types (e.g. MShellBinary, -// a []byte) panics at runtime. +// isRefKind reports whether obj's dynamic type is a pointer — a value with +// heap identity. 
Only these kinds can form shared substructure or reference +// cycles, and only these are safe in interface comparisons and as map keys: a +// value kind may wrap a non-comparable type (MShellBinary, a []byte), which +// panics at runtime. Checking the dynamic kind instead of enumerating types +// means a newly added pointer kind is covered with no list to keep in sync. +func isRefKind(obj MShellObject) bool { + return obj != nil && reflect.TypeOf(obj).Kind() == reflect.Pointer +} + +// sameRef reports whether a and b are the identical heap object (a value built +// as `@t @t node` reuses one subtree twice). A pointer-identical pair is equal +// by definition, so equality and ordering walks skip it instead of expanding +// it — without this, walking a value with n levels of sharing costs 2^n. +// Interface equality is safe here: if b's dynamic type differs from a's the +// comparison is false without inspecting values, and if it matches, isRefKind +// guarantees it is a comparable pointer type. func sameRef(a, b MShellObject) bool { - switch av := a.(type) { - case *MShellEnum: - bv, ok := b.(*MShellEnum) - return ok && av == bv - case *MShellList: - bv, ok := b.(*MShellList) - return ok && av == bv - case *MShellDict: - bv, ok := b.(*MShellDict) - return ok && av == bv - case *Maybe: - bv, ok := b.(*Maybe) - return ok && av == bv - case *MShellDateTime: - bv, ok := b.(*MShellDateTime) - return ok && av == bv - case *MShellQuotation: - bv, ok := b.(*MShellQuotation) - return ok && av == bv - } - return false + return isRefKind(a) && a == b } // dagGuard bounds a comparison walk over values with shared substructure that -// sameRef alone cannot catch: two *independently built* DAGs share no pointers -// across operands, so every level re-expands and the walk goes exponential. -// The guard counts pops; once a walk runs long enough to suggest blowup, it -// memoizes the pointer pairs it has already expanded and skips repeats. 
+// sameRef alone cannot catch: whenever the two operands are not the same +// pointer (a value compared against an independently built copy, or two +// distinct subtrees each shared internally), repeated substructure produces +// repeated *pairs*, every one re-expands, and the walk goes exponential. The +// guard counts pops; once a walk runs long enough to suggest blowup, it +// memoizes the pointer pairs it has already expanded and skips repeats. This +// is the single mechanism for the whole blowup class: duplicate expansion +// below the threshold is capped by the threshold itself, and past it every +// repeated pair memo-hits. // // Skipping a repeated pair is sound in a LIFO walk: the first occurrence's // entire expansion resolves before any later duplicate (which sat lower in the @@ -926,10 +922,12 @@ func (g *dagGuard) skip(a, b MShellObject) bool { if g.steps < dagStepThreshold { return false } - key, ok := refPairKey(a, b) - if !ok { + // Only pointer kinds get keys: repeated pairs of anything else cannot + // cause blowup, and only pointers are guaranteed comparable as map keys. + if !isRefKind(a) || !isRefKind(b) { return false } + key := refPair{a, b} if g.memo == nil { g.memo = make(map[refPair]bool, 1024) } @@ -940,28 +938,6 @@ func (g *dagGuard) skip(a, b MShellObject) bool { return false } -// refPairKey returns a comparable identity key when both values are the same -// container pointer kind — the kinds whose repeated pairs cause blowup. -// Interface keys are only safe when the dynamic values are comparable, which -// pointers are; scalar kinds are cheap to compare directly and get no key. 
-func refPairKey(a, b MShellObject) (refPair, bool) { - switch a.(type) { - case *MShellEnum: - if _, ok := b.(*MShellEnum); ok { - return refPair{a, b}, true - } - case *MShellList: - if _, ok := b.(*MShellList); ok { - return refPair{a, b}, true - } - case *MShellDict: - if _, ok := b.(*MShellDict); ok { - return refPair{a, b}, true - } - } - return refPair{}, false -} - // renderFlavor selects which of a value's three textual forms renderValue // emits: flavorStr is ToString (the `str` form), flavorDebug is DebugString // (stack dumps, list display), flavorJson is ToJson. Containers pick their @@ -1007,18 +983,6 @@ func renderJoin(open, sep, close string, items []MShellObject, flavor renderFlav return seq } -// cycleTrackable reports whether obj is a heap container that could sit on a -// reference cycle (built via in-place list/dict mutation, e.g. a list appended -// to itself). Only pointer kinds qualify — value kinds are copied and cannot -// be revisited by identity. -func cycleTrackable(obj MShellObject) bool { - switch obj.(type) { - case *MShellEnum, *MShellList, *MShellDict, *Maybe, *MShellPipe: - return true - } - return false -} - // renderValue renders a value in the requested flavor. It is total: a cyclic // value renders with a `` marker at the back-reference, which keeps // internal rendering (error messages, stack dumps) from hanging. User-facing @@ -1056,9 +1020,11 @@ func renderValueDetect(root MShellObject, flavor renderFlavor) (string, bool) { } } // enter marks t.obj as on the current path and schedules its removal - // after seq (the container's children) has fully rendered. + // after seq (the container's children) has fully rendered. Only pointer + // kinds are tracked — value kinds are copied and cannot be revisited by + // identity, so they cannot sit on a reference cycle. 
enter := func(obj MShellObject, seq []renderTask) []renderTask { - if !cycleTrackable(obj) { + if !isRefKind(obj) { return seq } if onPath == nil { @@ -1081,7 +1047,7 @@ func renderValueDetect(root MShellObject, flavor renderFlavor) (string, bool) { // Only pointer kinds are ever on the path; the guard also keeps // unhashable dynamic types (MShellBinary, a []byte) away from the // map lookup, which would panic even on a read. - if cycleTrackable(t.obj) && onPath[t.obj] { + if isRefKind(t.obj) && onPath[t.obj] { sb.WriteString("") cycled = true continue @@ -1191,21 +1157,12 @@ func renderValueDetect(root MShellObject, flavor renderFlavor) (string, bool) { // kinds compare via their own Equals. type eqPair struct{ a, b MShellObject } -// pushPairsDedup pushes element-wise comparison pairs, skipping a pair that is -// pointer-identical to the one just pushed. A self-doubling value -// (`@t @t node`, `[ @x @x ]`) expands to the SAME pair twice; pushing it once -// makes that whole family linear at any depth, with no reliance on the -// bounded dagGuard memo (whose eviction cannot cover a working set larger -// than its cap). -func pushPairsDedup(stack []eqPair, as, bs []MShellObject) []eqPair { - var lastA, lastB MShellObject +// pushPairs pushes element-wise comparison pairs onto the walk stack. +// Duplicate pairs from shared substructure are not filtered here; the +// dagGuard memo handles them (see dagGuard). 
+func pushPairs(stack []eqPair, as, bs []MShellObject) []eqPair { for i := range as { - ca, cb := as[i], bs[i] - if i > 0 && sameRef(ca, lastA) && sameRef(cb, lastB) { - continue - } - lastA, lastB = ca, cb - stack = append(stack, eqPair{a: ca, b: cb}) + stack = append(stack, eqPair{a: as[i], b: bs[i]}) } return stack } @@ -1235,39 +1192,29 @@ func equalsIter(a, b MShellObject) (bool, error) { if !ok || av.EnumName != bv.EnumName || av.Member != bv.Member || len(av.Payload) != len(bv.Payload) { return false, nil } - stack = pushPairsDedup(stack, av.Payload, bv.Payload) + stack = pushPairs(stack, av.Payload, bv.Payload) case *MShellList: bv, ok := p.b.(*MShellList) if !ok || len(av.Items) != len(bv.Items) { return false, nil } - stack = pushPairsDedup(stack, av.Items, bv.Items) + stack = pushPairs(stack, av.Items, bv.Items) case *MShellPipe: bv, ok := p.b.(*MShellPipe) if !ok || len(av.List.Items) != len(bv.List.Items) { return false, nil } - stack = pushPairsDedup(stack, av.List.Items, bv.List.Items) + stack = pushPairs(stack, av.List.Items, bv.List.Items) case *MShellDict: bv, ok := p.b.(*MShellDict) if !ok || len(av.Items) != len(bv.Items) { return false, nil } - // Track the last pushed pair to skip consecutive identical ones — - // the dict-shaped self-doubling value ({ "l": @d, "r": @d }) - // pushes the same pointer pair once per key, and without dedup - // that doubles the walk per level (same cliff pushPairsDedup - // closes for lists and enum payloads). 
- var lastA, lastB MShellObject for key, aval := range av.Items { bval, ok := bv.Items[key] if !ok { return false, nil } - if sameRef(aval, lastA) && sameRef(bval, lastB) { - continue - } - lastA, lastB = aval, bval stack = append(stack, eqPair{a: aval, b: bval}) } default: @@ -1382,11 +1329,6 @@ func compareValues(a, b MShellObject) int { n := min(len(av.Items), len(bl.Items)) stack = append(stack, task{lit: cmpInt(len(av.Items), len(bl.Items)), isLit: true}) for i := n - 1; i >= 0; i-- { - // Skip a pair pointer-identical to its neighbor: it compares 0 - // and would double the walk on self-doubling values. - if i > 0 && sameRef(av.Items[i], av.Items[i-1]) && sameRef(bl.Items[i], bl.Items[i-1]) { - continue - } stack = append(stack, task{a: av.Items[i], b: bl.Items[i]}) } case *MShellDict: @@ -1395,18 +1337,9 @@ func compareValues(a, b MShellObject) int { bk := sortedDictKeys(bd.Items) n := min(len(ak), len(bk)) stack = append(stack, task{lit: cmpInt(len(ak), len(bk)), isLit: true}) - var lastVA, lastVB MShellObject for i := n - 1; i >= 0; i-- { // Pushed so `key compare` pops before its `value compare`. - // A value pair pointer-identical to the neighboring key's is - // skipped (it compares 0) — the dict-shaped self-doubling - // value would otherwise double the walk per level. The key - // comparison itself always stays. - va, vb := av.Items[ak[i]], bd.Items[bk[i]] - if !(sameRef(va, lastVA) && sameRef(vb, lastVB)) { - stack = append(stack, task{a: va, b: vb}) - lastVA, lastVB = va, vb - } + stack = append(stack, task{a: av.Items[ak[i]], b: bd.Items[bk[i]]}) stack = append(stack, task{lit: strings.Compare(ak[i], bk[i]), isLit: true}) } case *MShellEnum: @@ -1414,10 +1347,6 @@ func compareValues(a, b MShellObject) int { n := min(len(av.Payload), len(be.Payload)) stack = append(stack, task{lit: cmpInt(len(av.Payload), len(be.Payload)), isLit: true}) for i := n - 1; i >= 0; i-- { - // Skip a pair pointer-identical to its neighbor (see list arm). 
- if i > 0 && sameRef(av.Payload[i], av.Payload[i-1]) && sameRef(be.Payload[i], be.Payload[i-1]) { - continue - } stack = append(stack, task{a: av.Payload[i], b: be.Payload[i]}) } // Name and member (declaration order) compare before any payload. From 2e2f4b6d2bd1ac3ef2eddeb44b5cf74e105c9ff5 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Thu, 2 Jul 2026 21:51:02 -0500 Subject: [PATCH 28/32] Error on duplicate definition names instead of silently ignoring them MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Definition lookup is first-match-wins over [stdlib, init, script], so a second def of a name never took effect — it was silently dead code while the first definition kept running. A script "redefining" a stdlib word, or an interactive redefinition, quietly did nothing; and since the type checker registered the duplicate as an overload, a call could type-check against the dead def's signature while the runtime executed the other body. (Found via tests/success/enum_recursive_generic.msh, whose local `def id` was shadowed by std.msh's id all along; renamed to `ident`.) Reject duplicates everywhere instead: - Runtime: FindDuplicateDefinition checks startup loading, script startup+file assembly, and each interactive input line (which is rejected with the session continuing). The error reports both positions. - Checker: def registration records every definition name (mirroring the enum-member collision check) and rejects a repeat in both RegisterStdlibSigs and CheckProgram, so --type-check-only and the LSP report it too. A stdlib def records its name even when its sig defers to a table builtin (the 2unpack case) — runtime lookup still resolves to it. Def-shadowing-builtins stays legal; std.msh does that on purpose. - Lexer: makeToken now stamps Token.TokenFile, which was wired through the lexer but never assigned. Cross-file collisions can then name the other file: "already defined at lib/std.msh:62:5". 
No existing test fixture is affected; stdin input still formats as bare line:col. Co-Authored-By: Claude Fable 5 --- CHANGELOG.md | 11 +++++++ doc/mshell.md | 5 +++ mshell/Evaluator.go | 32 +++++++++++++++++++ mshell/Lexer.go | 1 + mshell/Main.go | 17 ++++++++++ mshell/TypeCheckProgram.go | 28 ++++++++++++++++ mshell/TypeChecker.go | 8 +++++ tests/fail/duplicate_def.msh | 5 +++ tests/fail/duplicate_def.msh.stderr | 1 + tests/success/enum_recursive_generic.msh | 4 +-- tests/typecheck_fail/duplicate_def.msh | 4 +++ tests/typecheck_fail/duplicate_def_stdlib.msh | 4 +++ 12 files changed, 118 insertions(+), 2 deletions(-) create mode 100644 tests/fail/duplicate_def.msh create mode 100644 tests/fail/duplicate_def.msh.stderr create mode 100644 tests/typecheck_fail/duplicate_def.msh create mode 100644 tests/typecheck_fail/duplicate_def_stdlib.msh diff --git a/CHANGELOG.md b/CHANGELOG.md index 13281322..92a51b19 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## Unreleased +### Changed + +- Breaking: defining a name that is already defined is now an error, at runtime + and in the type checker. This covers a second `def` in the same file, a + script `def` whose name is already taken by the standard library or an init + file, and an interactive redefinition. Definition lookup is first-match-wins, + so a duplicate never took effect anyway — it was silently dead code while the + first definition kept running; the error makes that visible. 
The message + reports both positions: `Duplicate definition 'id'; already defined at + lib/std.msh:62:5.` + ### Fixed - `gridSetCell` no longer silently drops a value whose type differs from the diff --git a/doc/mshell.md b/doc/mshell.md index c666c2f1..07f0b174 100644 --- a/doc/mshell.md +++ b/doc/mshell.md @@ -714,6 +714,11 @@ end Metadata values must be static: strings (single or double quoted), integers, floats, booleans, or nested lists/dicts of the same. Interpolated strings are not allowed. +Definition names must be unique. +Defining a name that is already defined — by the same file, the standard library, or an init file — is an error, +both at runtime and in the type checker. +(Lookup is first-match-wins, so a duplicate would never take effect; the error makes that visible.) + ### Tail-Call Optimization Recursive definitions in tail position are optimized to avoid stack overflow. diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go index 9c220940..eda38f95 100644 --- a/mshell/Evaluator.go +++ b/mshell/Evaluator.go @@ -475,6 +475,38 @@ func (state *EvalState) lookupDefinition(definitions []MShellDefinition, name st return definitions[i], true } +// tokenPosStr formats a token's position as `path:line:col` (or `line:col` +// when the source file is unknown, e.g. stdin) for use in error messages. +func tokenPosStr(t Token) string { + if t.TokenFile != nil && t.TokenFile.Path != "" { + return fmt.Sprintf("%s:%d:%d", t.TokenFile.Path, t.Line, t.Column) + } + return fmt.Sprintf("%d:%d", t.Line, t.Column) +} + +// FindDuplicateDefinition scans the given definition slices, in order, and +// returns an error for the first name defined twice. Definition lookup is +// first-match-wins, so a second definition of a name is never an override — +// it would be silently dead code. Erroring keeps a script (or init file, or +// interactive input) from redefining a name already taken by the standard +// library, an earlier startup file, or itself. 
+func FindDuplicateDefinition(defLists ...[]MShellDefinition) error { + seen := make(map[string]Token) + for _, defs := range defLists { + for i := range defs { + name := defs[i].Name + prev, exists := seen[name] + if !exists { + seen[name] = defs[i].NameToken + continue + } + return fmt.Errorf("%s: Duplicate definition '%s'; already defined at %s.\n", + tokenPosStr(defs[i].NameToken), name, tokenPosStr(prev)) + } + } + return nil +} + func (state *EvalState) AddCompletionDefinitions(definitions []MShellDefinition) { if state.CompletionDefinitions == nil { state.CompletionDefinitions = make(map[string][]MShellDefinition) diff --git a/mshell/Lexer.go b/mshell/Lexer.go index 7b6bf50c..66eee988 100644 --- a/mshell/Lexer.go +++ b/mshell/Lexer.go @@ -400,6 +400,7 @@ func (l *Lexer) makeToken(tokenType TokenType) Token { Start: l.start, Lexeme: lexeme, Type: tokenType, + TokenFile: l.tokenFile, } } diff --git a/mshell/Main.go b/mshell/Main.go index ef9b9dca..d01ff600 100644 --- a/mshell/Main.go +++ b/mshell/Main.go @@ -181,6 +181,9 @@ func loadStartupFile(path string, description string, stack *MShellStack, contex } *definitions = append(*definitions, parsedFile.Definitions...) + if err := FindDuplicateDefinition(*definitions); err != nil { + return fmt.Errorf("error loading %s at %s: %w", description, path, err) + } state.AddCompletionDefinitions(parsedFile.Definitions) // Register enum constructors declared in this startup file, and retain the // top-level items so the type checker can register the file's `type` and @@ -861,6 +864,14 @@ func main() { } } + // Definition lookup is first-match-wins, so a script def whose name is + // already taken (by the stdlib, the init file, or the script itself) + // would be silently dead code. Reject it instead. 
+ if err := FindDuplicateDefinition(allDefinitions); err != nil { + fmt.Fprint(os.Stderr, err.Error()) + os.Exit(1) + } + if len(file.Items) == 0 { os.Exit(0) } @@ -2980,6 +2991,12 @@ func (state *TermState) ExecuteCurrentCommand() (bool, int) { term.Restore(state.stdInFd, &state.oldState) if len(parsed.Definitions) > 0 { + // Definition lookup is first-match-wins, so a redefinition would be + // silently ignored rather than take effect; reject the input instead. + if err := FindDuplicateDefinition(state.stdLibDefs, parsed.Definitions); err != nil { + fmt.Fprint(os.Stderr, err.Error()) + goto PromptPrint + } state.stdLibDefs = append(state.stdLibDefs, parsed.Definitions...) state.evalState.AddCompletionDefinitions(parsed.Definitions) } diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index d5750877..2c0a0c3b 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -87,6 +87,12 @@ func (c *Checker) RegisterStdlibSigs(defs []MShellDefinition) { for i := range defs { def := &defs[i] nameId := c.names.Intern(def.Name) + // Record the name even when the sig registration below is skipped: + // the runtime's first-match-wins lookup still resolves to this def, + // so a later def of the same name is a duplicate regardless. + if c.recordDefName(nameId, def) { + continue + } if _, exists := c.nameBuiltins[nameId]; exists { continue } @@ -95,6 +101,25 @@ func (c *Checker) RegisterStdlibSigs(defs []MShellDefinition) { } } +// recordDefName registers a definition's name for duplicate detection. If the +// name is already taken by an earlier definition, it records an error and +// returns true (mirroring the runtime's FindDuplicateDefinition, where the +// first definition wins and a duplicate would be silently dead code). 
+func (c *Checker) recordDefName(nameId NameId, def *MShellDefinition) bool { + if prev, exists := c.defNameToks[nameId]; exists { + c.errors = append(c.errors, TypeError{ + Kind: TErrTypeParse, Pos: def.NameToken, + Hint: "duplicate definition '" + def.Name + "'; already defined at " + tokenPosStr(prev), + }) + return true + } + if c.defNameToks == nil { + c.defNameToks = make(map[NameId]Token) + } + c.defNameToks[nameId] = def.NameToken + return false +} + // RegisterStartupTypes registers the `type` and `enum` declarations found in // the startup files' top-level items (the stdlib, then the user init file), // so the checked program sees the same declarations the runtime does. It runs @@ -173,6 +198,9 @@ func (c *Checker) CheckProgram(file *MShellFile) { }) continue } + if c.recordDefName(nameId, def) { + continue + } c.nameBuiltins[nameId] = append(c.nameBuiltins[nameId], sig) } // Pre-pass 3: type-check each def body against its declared sig. diff --git a/mshell/TypeChecker.go b/mshell/TypeChecker.go index e856c400..2bb712cd 100644 --- a/mshell/TypeChecker.go +++ b/mshell/TypeChecker.go @@ -109,6 +109,14 @@ type Checker struct { // existing def or builtin. enumMemberToks map[NameId]Token + // defNameToks records every registered definition name (value: the def's + // name token). Runtime definition lookup is first-match-wins, so a second + // def of a name is silently dead code, not an override; def registration + // checks this and rejects the duplicate, mirroring the runtime's + // FindDuplicateDefinition. Stdlib/init defs register before file defs, + // so a script redefining a stdlib name is caught too. + defNameToks map[NameId]Token + // Quote-body inference state (Phase 7). 
When inferring is true, // applySig responds to stack underflow by synthesizing fresh type // variables instead of reporting an error; those vars accumulate diff --git a/tests/fail/duplicate_def.msh b/tests/fail/duplicate_def.msh new file mode 100644 index 00000000..831b2e97 --- /dev/null +++ b/tests/fail/duplicate_def.msh @@ -0,0 +1,5 @@ +# Defining the same name twice is an error: definition lookup is +# first-match-wins, so the second def would be silently dead code. +def greet (-- str) "hi" end +def greet (-- str) "yo" end +greet wl diff --git a/tests/fail/duplicate_def.msh.stderr b/tests/fail/duplicate_def.msh.stderr new file mode 100644 index 00000000..74ff8249 --- /dev/null +++ b/tests/fail/duplicate_def.msh.stderr @@ -0,0 +1 @@ +4:5: Duplicate definition 'greet'; already defined at 3:5. diff --git a/tests/success/enum_recursive_generic.msh b/tests/success/enum_recursive_generic.msh index 1f2773dd..51fbf6c3 100644 --- a/tests/success/enum_recursive_generic.msh +++ b/tests/success/enum_recursive_generic.msh @@ -4,7 +4,7 @@ # payload (`node Tree Tree`) forever and overflows the stack. enum Tree = leaf int | node Tree Tree end -def id (q -- q) end +def ident (q -- q) end -3 leaf id drop +3 leaf ident drop "ok" wl diff --git a/tests/typecheck_fail/duplicate_def.msh b/tests/typecheck_fail/duplicate_def.msh new file mode 100644 index 00000000..0455839f --- /dev/null +++ b/tests/typecheck_fail/duplicate_def.msh @@ -0,0 +1,4 @@ +# A name defined twice in one file is rejected by the checker. +def greet (-- str) "hi" end +def greet (-- str) "yo" end +greet wl diff --git a/tests/typecheck_fail/duplicate_def_stdlib.msh b/tests/typecheck_fail/duplicate_def_stdlib.msh new file mode 100644 index 00000000..c8015faf --- /dev/null +++ b/tests/typecheck_fail/duplicate_def_stdlib.msh @@ -0,0 +1,4 @@ +# Redefining a name already defined by the standard library is rejected: +# lookup is first-match-wins, so this def could never take effect. 
+def id (q -- q) end +3 id drop From 3b7686002dfc2d94d9663e3ca2470c149e56ca5e Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Fri, 3 Jul 2026 10:01:53 -0500 Subject: [PATCH 29/32] PR polish: CHANGELOG gaps and a comment overclaim MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the two user-facing branch changes missing from Unreleased: the cyclic-value render error (str/toJson error instead of hanging) and match type-checking inside inferred quotations ((match ...) map). Also correct compareValues' doc comment: compare-0 coincides with Equals only for orderable kinds — quotations and grids share a rank and compare 0 while Equals still distinguishes them. Co-Authored-By: Claude Fable 5 --- CHANGELOG.md | 8 ++++++++ mshell/MShellObject.go | 8 ++++++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 92a51b19..4bab1cdc 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -31,6 +31,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 types yields `false` rather than an error (a genuinely incompatible comparison is already a static type error), so the result no longer depends on operand order and union members like `int | null` compare cleanly. +- Converting a cyclic value (a container appended into itself) with `str` or + `toJson` now fails with a clear error instead of hanging forever. Internal + rendering (error messages, stack dumps) prints a `` marker at the + back-reference instead. +- A `match` used as the body of an inferred quotation (e.g. + `(match leaf n : @n, node a b : 0, end) map`) now type-checks; it previously + always failed with "stack underflow at 'match'", rejecting the canonical way + to consume enums and Maybe values inside `map`/`filter`/`each`. 
- `uniq` now accepts a list of any value type (matching its `([t] -- [t])` signature) and deduplicates by structural equality, instead of throwing at runtime for non-primitive elements such as enums, dicts, and booleans. diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index aa060fad..54ed3d17 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -1233,8 +1233,12 @@ func equalsIter(a, b MShellObject) (bool, error) { // interleaved, text lexically, dates chronologically, bytes bytewise). // Structured values compare lexicographically: lists positionally (shorter // prefix first), dicts by sorted key then value, enums by name then declaration -// order then payloads. The order agrees with structural equality: compareValues -// returns 0 exactly when the two values are Equals. +// order then payloads. For those kinds the order agrees with structural +// equality: compareValues returns 0 exactly when the two values are Equals. +// Unorderable kinds (quotation, grid, ...) are the exception — they share a +// rank and always compare 0, so a stable sort preserves their original order, +// while Equals still distinguishes them (identity for quotations, cell-wise +// for grids). // // The comparison is driven by an explicit work stack rather than recursion, so // arbitrarily deep values (e.g. a long `node(node(...))` enum chain) cannot From fc7ba60c50ecd09120001993014e87eeef7cdbb8 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Fri, 3 Jul 2026 10:08:42 -0500 Subject: [PATCH 30/32] Syntax highlighting: cover the enum keyword and new int literals - doc/base.html: style mshellENUM (new this branch) and mshellTYPE (pre-existing gap) with the other declaration keywords, so --html-highlighted enum/type snippets render like def/match/end. - Sublime: add enum, type, and match to the keyword list, and 0o/0b integer literal patterns alongside the existing hex one. 
The VS Code grammar highlights no keywords at all (pre-existing design), so it needs no enum entry to stay consistent. Co-Authored-By: Claude Fable 5 --- doc/base.html | 2 +- sublime/msh.sublime-syntax | 6 +++++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/doc/base.html b/doc/base.html index 6e5491ee..2ea961f7 100644 --- a/doc/base.html +++ b/doc/base.html @@ -27,7 +27,7 @@ color: #0000FF; } - .mshellIF, .mshellELSE, .mshellELSESTAR, .mshellSTARIF, .mshellEND, .mshellDEF, .mshellMATCH { + .mshellIF, .mshellELSE, .mshellELSESTAR, .mshellSTARIF, .mshellEND, .mshellDEF, .mshellMATCH, .mshellENUM, .mshellTYPE { color: #0F4C81; font-weight: bold; } diff --git a/sublime/msh.sublime-syntax b/sublime/msh.sublime-syntax index 965d4832..3f9d4a07 100644 --- a/sublime/msh.sublime-syntax +++ b/sublime/msh.sublime-syntax @@ -26,7 +26,7 @@ contexts: scope: keyword.control.msh - match: '\\*if' scope: keyword.control.msh - - match: '\\b(def|end|if|iff|loop|read|str|break|continue|else)\\b' + - match: '\\b(def|end|if|iff|loop|read|str|break|continue|else|match|enum|type)\\b' scope: keyword.control.msh - match: '\\b(and|or|not)\\b' scope: keyword.operator.word.msh @@ -48,6 +48,10 @@ contexts: numbers: - match: '\\b0[xX][0-9A-Fa-f]+\\b' scope: constant.numeric.integer.hex.msh + - match: '\\b0[oO][0-7]+\\b' + scope: constant.numeric.integer.octal.msh + - match: '\\b0[bB][01]+\\b' + scope: constant.numeric.integer.binary.msh - match: '\\b\\d+\\.\\d*(?:[eE][+-]?\\d+)?\\b' scope: constant.numeric.float.msh - match: '\\b\\d+(?:[eE][+-]?\\d+)?\\b' From b28b00aa8758bbe5476ca090c823134823464156 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Fri, 3 Jul 2026 11:41:33 -0500 Subject: [PATCH 31/32] VS Code grammar: highlight keywords, word operators, types, numbers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The grammar only highlighted comments, booleans, strings, and variables — no keyword rules at all, so def/end/match and 
the new enum/type rendered as plain text. Add the same rule set as the Sublime grammar: control keywords (including enum and type and the else*/*if forms), and/or/not, soe, the int/float/bool type names, and numeric literals including the new 0x/0o/0b integer forms. Rules are placed after the variable patterns so a same-position tie like `str!` keeps resolving to the variable-store rule. Co-Authored-By: Claude Fable 5 --- code/syntaxes/mshell.textmate.json | 40 ++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/code/syntaxes/mshell.textmate.json b/code/syntaxes/mshell.textmate.json index b7bfb655..678189e1 100644 --- a/code/syntaxes/mshell.textmate.json +++ b/code/syntaxes/mshell.textmate.json @@ -45,6 +45,46 @@ { "name": "variable.other.set.mshell", "match": "@[a-zA-Z0-9_]+" + }, + { + "name": "keyword.control.mshell", + "match": "else\\*|\\*if" + }, + { + "name": "keyword.control.mshell", + "match": "\\b(def|end|if|iff|loop|read|str|break|continue|else|match|enum|type)\\b" + }, + { + "name": "keyword.operator.word.mshell", + "match": "\\b(and|or|not)\\b" + }, + { + "name": "keyword.other.mshell", + "match": "\\bsoe\\b" + }, + { + "name": "storage.type.mshell", + "match": "\\b(int|float|bool)\\b" + }, + { + "name": "constant.numeric.integer.hex.mshell", + "match": "\\b0[xX][0-9A-Fa-f]+\\b" + }, + { + "name": "constant.numeric.integer.octal.mshell", + "match": "\\b0[oO][0-7]+\\b" + }, + { + "name": "constant.numeric.integer.binary.mshell", + "match": "\\b0[bB][01]+\\b" + }, + { + "name": "constant.numeric.float.mshell", + "match": "\\b\\d+\\.\\d*(?:[eE][+-]?\\d+)?\\b" + }, + { + "name": "constant.numeric.integer.mshell", + "match": "\\b\\d+(?:[eE][+-]?\\d+)?\\b" } ], "repository": { From 7b7e03ea4935491a41cce133a62e33b584d97857 Mon Sep 17 00:00:00 2001 From: Mitchell Paulus Date: Fri, 3 Jul 2026 20:15:21 -0500 Subject: [PATCH 32/32] Bring the enum design doc up to the shipped design MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit The doc predated several final decisions and still read as a proposal. Update it to be the accurate decision record: - Status header: implemented (V1), with a map of what shipped vs. what stayed future work. - Fix examples that no longer parse: `read`/`write` are lexer keywords and cannot be member names; the §9/§10.A examples were missing the mandatory `end` terminator the doc itself decided on in §6. - §8: the shipped name-resolution rule replaced the draft's CmdResult.ok qualification — member names are globally unique and member/def collisions are declaration errors in both directions. - §9: record the shipped serialization (externally-tagged toJson, member(payload) str form), structural equality, and declaration-order sorting. - §10: per-tier status (A shipped; shape payload types shipped, their destructuring sugar not; B/D/E/F future). - §11: rewrite the end-to-end example against the shipped feature — an explicit parseMode boundary word instead of the unimplemented Mode.decode/Mode.values string backing. Example verified to run and type-check. - §12: open questions annotated with how each resolved. Design-dir edit explicitly requested (doc was AI-authored). Co-Authored-By: Claude Fable 5 --- design/literal_or_enum_typing.html | 157 ++++++++++++++++++----------- 1 file changed, 99 insertions(+), 58 deletions(-) diff --git a/design/literal_or_enum_typing.html b/design/literal_or_enum_typing.html index d5d738ab..d9bc41e3 100644 --- a/design/literal_or_enum_typing.html +++ b/design/literal_or_enum_typing.html @@ -163,11 +163,14 @@

    Enums & Generative Types

    -Status: design exploration / not implemented. Records the decision to
    +Status: implemented (V1). Records the decision to
     add a single generative tagged sum type (declared with enum), the reasoning that got there (the structural-vs-generative distinction and the Haskell / Rust / TypeScript
    -prior art), the chosen surface syntax, and a sketch of representation, ergonomic sugar, and open
    -questions. For review.
    +prior art), and the shipped surface syntax. The core shipped as designed: payload-carrying members,
    +recursive references, exhaustive match with payload binding, structural
    +equality/ordering/JSON. §8 records where the shipped name-resolution rule replaced this doc's draft,
    +§10 marks each sugar tier's status (A is the shipped base form; B–F remain future work), and
    +§12 records how each open question resolved.

    The motivating request was type ConfigOption = "string1" | "string2". Working @@ -198,7 +201,7 @@

    2. Where mshell stands today

     Surface type Name = <expr>, | unions, as casts | have | TypeExpr.go, TypeParseIntegration.go
     One generative tagged sum type: Maybe[t] = just t | none | have | MShellObject.go:172; just/none
     Match with constructor destructuring + exhaustiveness | have | TypeBranch.go; match
    -A declaration that introduces new constructors | missing | — (this proposal)
    +A declaration that introduces new constructors | have | enum; TypeEnum.go, MShellEnum in MShellObject.go
    @@ -274,9 +277,10 @@

    4. Prior art: Haskell, Rust, TypeScript

    5. Decision

     Add a single generative tagged sum type declaration. Keep type exactly as it is (the transparent / branded structural form). The colloquial "enum of constants" is the all-nullary
    -special case of the one mechanism — not a second concept. This subsumes Maybe (which
    -becomes "the built-in enum Maybe[t] = just t | none") and unlocks "make illegal states
    -unrepresentable" for command results, parse results, and JSON.
    +special case of the one mechanism — not a second concept. This conceptually subsumes Maybe
    +("the built-in enum Maybe[t] = just t | none"; it stays a distinct built-in until enums
    +grow type parameters — see §10.E) and unlocks "make illegal states unrepresentable" for command
    +results, parse results, and JSON.

    6. Syntax

    @@ -299,7 +303,7 @@ Declaration form — |-separated members, closed by e
     terminator def, if, match, and loop already use. A nullary member is just a bare name.

    -enum Mode   = read | write | readwrite end
    +enum Mode   = readonly | writeonly | readwrite end
     enum Shape  = circle float | rect float float | point end
     enum CmdResult = ok str | failed int str | timeout end
    @@ -317,8 +321,10 @@ Declaration form — |-separated members, closed by e

     Grammar:

     • enum Name = arm (| arm)* end
    -• arm ::= constructorName typePrimary*   (the constructor name is a bare identifier; each payload
    -  is a type primary, so a union payload must be named via a type alias)
    +• arm ::= constructorName typePrimary*   (the constructor name is a bare identifier — not a
    +  language keyword such as read or str, and not a name already taken by a
    +  def or another enum's member, see §8; each payload is a type primary, so a union payload
    +  must be named via a type alias)
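
The grammar above can be sketched as a small recursive-descent-free parser in Go. This is an illustration only, not the code in Parser.go or TypeEnum.go: the names Arm, EnumDecl and parseEnum are hypothetical, and it assumes every payload type is a single whitespace-delimited token (so `[Json]` parses, while a shape written with spaces such as `{str: Json}` does not).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Arm is one member of an enum declaration: a constructor name plus payload types.
type Arm struct {
	Name    string
	Payload []string
}

// EnumDecl is the parsed form of `enum Name = arm (| arm)* end`.
type EnumDecl struct {
	Name string
	Arms []Arm
}

// parseEnum parses one declaration from whitespace-separated tokens.
// Assumption: each payload type is a single token.
func parseEnum(src string) (EnumDecl, error) {
	toks := strings.Fields(src)
	if len(toks) < 4 || toks[0] != "enum" || toks[2] != "=" {
		return EnumDecl{}, errors.New("expected: enum Name = arm (| arm)* end")
	}
	if toks[len(toks)-1] != "end" {
		return EnumDecl{}, errors.New("missing mandatory end terminator")
	}
	decl := EnumDecl{Name: toks[1]}
	var cur []string // tokens of the arm currently being read
	flush := func() error {
		if len(cur) == 0 {
			return errors.New("empty arm")
		}
		// The first token is the binding occurrence; the rest are payload types.
		decl.Arms = append(decl.Arms, Arm{Name: cur[0], Payload: cur[1:]})
		cur = nil
		return nil
	}
	for _, t := range toks[3 : len(toks)-1] {
		if t == "|" {
			if err := flush(); err != nil {
				return EnumDecl{}, err
			}
			continue
		}
		cur = append(cur, t)
	}
	if err := flush(); err != nil {
		return EnumDecl{}, err
	}
	return decl, nil
}

func main() {
	decl, err := parseEnum("enum Shape = circle float | rect float float | point end")
	fmt.Println(decl, err)

	// A declaration without the terminator is rejected.
	_, err = parseEnum("enum Env = dev | staging | prod")
	fmt.Println(err)
}
```

The sketch deliberately stops at syntax: keyword and collision checks on constructor names belong to a later pass, as §8 describes.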

     The only new wrinkle vs. a structural union is that an arm's first token is a binding
     occurrence (a constructor being declared) rather than a reference to an existing type. The
    @@ -355,21 +361,27 @@ 7. Construction and matching (use sites)

    The payoff: add a fourth constructor later and every match that forgot it becomes a compile error.

    -8. Namespacing, not case
    +8. Unique member names, not case, not namespacing

     Capitalization is deliberately given no meaning anywhere. The one job case used to do — telling a
    -bare constructor apart from a bare variable / def — is handled by
    -namespacing plus checker context, the same way just/none and
    -branded unions already resolve:
    +bare constructor apart from a bare variable / def — had a draft answer here
    +(a qualified CmdResult.ok form plus checker-context resolution). What shipped is
    +simpler and stricter: collisions are declaration errors, so a bare member name is always
    +unambiguous and no qualified form exists:

    -• A qualified form CmdResult.ok is always available and unambiguous.
    -• The bare form ok is allowed when the checker can infer the expected enum
    -  type from context — the match subject, a signature slot, or an as. This
    -  is the deferred-to-checker disambiguation already flagged for branded unions in
    -  ai/type_checker.md.
    -• A nullary constructor (e.g. timeout) that would collide with a def of the
    -  same name resolves to the def unless context expects the enum; write
    -  CmdResult.timeout to force the constructor.
    +• Member names are unique across all enums. Declaring a member another enum already
    +  uses is a static error, so a bare ok identifies its enum by itself.
    +• Members and defs share the word namespace, and a collision in either direction
    +  is an error: an enum member may not reuse an existing def/builtin name, and a
    +  def may not reuse a member name. (The same rule applies between defs
    +  themselves — a duplicate definition name is an error, since lookup is first-match-wins and a
    +  duplicate would be silently dead code.)
    +• Member names may not be language keywords (read, str,
    +  type, …), and a lone _ is reserved as the match
    +  wildcard.
    +
    +match arms need no qualification either: the checker identifies the subject's enum from
    +the member names in the arms, including when the match is the body of an inferred
    +quotation ((match … end) map).
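
The three rules above amount to one shared registry of word names. A minimal sketch in Go, assuming a hypothetical Words type with DeclareDef and DeclareMember (none of these names come from the mshell sources), and a keyword set that lists only the keywords this patch mentions:

```go
package main

import (
	"errors"
	"fmt"
)

// keywords is an illustrative subset; the real lexer's keyword list is longer.
var keywords = map[string]bool{
	"read": true, "write": true, "str": true, "type": true, "enum": true,
}

// Words tracks every name in the shared word namespace and who declared it.
type Words struct {
	owner map[string]string
}

// NewWords returns an empty namespace.
func NewWords() *Words { return &Words{owner: map[string]string{}} }

// claim records name for owner, failing if any def or member already holds it.
func (w *Words) claim(name, owner string) error {
	if prev, taken := w.owner[name]; taken {
		return fmt.Errorf("%q is already declared by %s", name, prev)
	}
	w.owner[name] = owner
	return nil
}

// DeclareDef registers a def; it collides with earlier defs and enum members alike.
func (w *Words) DeclareDef(name string) error {
	return w.claim(name, "def "+name)
}

// DeclareMember registers one enum member under the stricter member rules.
func (w *Words) DeclareMember(enum, member string) error {
	if member == "_" {
		return errors.New("_ is reserved as the match wildcard")
	}
	if keywords[member] {
		return fmt.Errorf("%q is a language keyword", member)
	}
	return w.claim(member, "enum "+enum)
}

func main() {
	w := NewWords()
	fmt.Println(w.DeclareMember("CmdResult", "ok"))   // first use succeeds
	fmt.Println(w.DeclareMember("ParseResult", "ok")) // second enum may not reuse it
	fmt.Println(w.DeclareDef("ok"))                   // nor may a def
	fmt.Println(w.DeclareMember("Mode", "read"))      // keywords are rejected
}
```

Because every collision is rejected at declaration time, a lookup by bare name can never return more than one candidate, which is what lets the checker skip a qualified form.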

    9. Representation & implementation sketch

    @@ -390,21 +402,27 @@ 9. Representation & implementation sketch
     just v path (TypeCheckProgram.go:1348).
     1. Forward / recursive types work via the existing top-level pre-pass that reserves
        placeholder TypeIds for type headers (extend it to enum):
    -   enum Json = jnull | jbool bool | jnum float | jstr str | jarr [Json] | jobj {str: Json}.
    -2. Serialization. Payload variants have no canonical string form, so JSON/argv I/O needs
    -   explicit encode/decode (serde-style). The all-nullary case is special-cased — see §10.B.
    +   enum Json = jnull | jbool bool | jnum float | jstr str | jarr [Json] | jobj {str: Json} end.
    +2. Serialization (as shipped). str renders member /
    +   member(p0 p1 ...); toJson uses serde's externally-tagged convention (a
    +   nullary member is the bare member string, one payload is {"member": value}, several are
    +   {"member": [v0, v1, ...]}); equality is structural and total; sort orders
    +   members by declaration order (a stored member index), so low | medium | high sorts as
    +   the author intended. Parsing JSON/argv strings back into an enum still needs an explicit
    +   decode word — see §10.B and §11.
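
The externally-tagged JSON shape and the declaration-order sort described above can be sketched in Go. This is a hypothetical model, not MShellEnum itself: EnumValue, toJSON and sortByDeclaration are names invented for the example, and payloads are held as plain Go values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// EnumValue is a hypothetical runtime value: the member name, its position in
// the declaration, and any payload values.
type EnumValue struct {
	Member  string
	Index   int
	Payload []any
}

// MarshalJSON follows the externally-tagged convention: a bare string for a
// nullary member, {"member": value} for one payload, {"member": [...]} for several.
func (v EnumValue) MarshalJSON() ([]byte, error) {
	switch len(v.Payload) {
	case 0:
		return json.Marshal(v.Member)
	case 1:
		return json.Marshal(map[string]any{v.Member: v.Payload[0]})
	default:
		return json.Marshal(map[string]any{v.Member: v.Payload})
	}
}

// toJSON is a small helper that panics on encoding errors to keep the demo short.
func toJSON(v EnumValue) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// sortByDeclaration orders values by stored member index, not by member name.
func sortByDeclaration(vs []EnumValue) {
	sort.SliceStable(vs, func(i, j int) bool { return vs[i].Index < vs[j].Index })
}

func main() {
	// enum CmdResult = ok str | failed int str | timeout end
	fmt.Println(toJSON(EnumValue{Member: "timeout", Index: 2}))
	fmt.Println(toJSON(EnumValue{Member: "ok", Index: 0, Payload: []any{"done"}}))
	fmt.Println(toJSON(EnumValue{Member: "failed", Index: 1, Payload: []any{2, "boom"}}))

	// enum Level = low | medium | high end
	levels := []EnumValue{{Member: "high", Index: 2}, {Member: "low", Index: 0}, {Member: "medium", Index: 1}}
	sortByDeclaration(levels)
	for _, l := range levels {
		fmt.Println(l.Member)
	}
}
```

Sorting on the index rather than the name is what makes low, medium, high come out in the author's order; an alphabetical sort would give high, low, medium. How values of the same member with different payloads are ordered is not stated in this patch, so the sketch leaves them in their input order.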

    10. Conciseness sugar for common cases

    -Ranked by value-for-effort. Tiers A–C are cheap and high-value; D–F are speculative until
    -usage demands them.
    +Ranked by value-for-effort. Status: A is the shipped base form, and the payload half of C
    +(a shape as a payload type) shipped for free; B, C's destructuring sugar, and D–F remain future
    +work — recorded here as designed so they can land without re-deciding.

    -A. The nullary one-liner (already the base form)
    -enum Env   = dev | staging | prod
    -enum Level = debug | info | warn | error
    +A. The nullary one-liner (already the base form) — shipped
    +enum Env   = dev | staging | prod end
    +enum Level = debug | info | warn | error end

     This is the everyday "configuration option" case, and it is already as terse as it can be.

    -B. Auto string backing + derived decode / encode / values
    +B. Auto string backing + derived decode / encode / values — future

    For an all-nullary enum, auto-back each constructor by its name and derive three functions, since config crosses a string boundary. Override the backing with = when the wire format differs:

    enum Method = get = "GET" | post = "POST" | put = "PUT"
    @@ -445,8 +463,11 @@ The backing string is wire-only — two paths to a value, no implicit co
     bare-literal coercion is explicitly not a goal — it would be mshell's first implicit
     coercion and would leak the wire format into program logic.

    -C. Shape payloads → free named destructuring
    -Let a variant's payload be a shape; this reuses TKShape and the existing
    +C. Shape payloads → free named destructuring — half shipped
    +Shipped: a payload may already be any type expression, including a shape —
    +enum Event = click {"x": int} | close end works today, with a match arm binding the whole
    +dict (click d : @d :x? …). Future: the sugar below, destructuring
    +the shape into named bindings in the pattern itself; it reuses TKShape and the existing
     { 'k': name } dict-pattern matcher (including optional ?: fields) for both
     construction and destructuring:

    enum Event = click {x: int, y: int} | key {code: int, shift?: bool} | close
    @@ -457,15 +478,17 @@ C. Shape payloads → free named destructuring
         close : "bye" wl,
     end

    -D. Combined arms (|) for fan-in
    +D. Combined arms (|) for fan-in — future

    status match
         pending | running : "in progress" wl,
         done              : "complete" wl,
         failed e          : @e wl,
     end
    -E. Generic enums subsume the built-ins
    -enum Result[t, e] = ok t | err e
    +E. Generic enums subsume the built-ins — future
    +Enums are not yet generic: a recursive payload (node Tree Tree) works, type parameters do
    +not, so Maybe remains the built-in instance rather than being literally subsumed.
    +enum Result[t, e] = ok t | err e end
     # and Maybe[t] = just t | none is simply built in

    F. ?-style propagation (speculative)

    @@ -474,44 +497,62 @@ F. ?-style propagation (speculative)

    @x parseInt match just v : @v, none : return end to something like @x parseInt!?. Defer until Result is idiomatic.

    -11. End-to-end example
    -enum Mode = read | write | readwrite = "rw"
    +11. End-to-end example (as shipped)
    +The wire boundary is an explicit match on the string; §10.B's derived
    +Mode.decode / Mode.values would replace parseMode if backing
    +ever lands:
    +enum Mode = readonly | writeonly | readwrite end
    +
    +def parseMode (str -- Maybe[Mode])
    +    match
    +        "ro" : readonly just,
    +        "wo" : writeonly just,
    +        "rw" : readwrite just,
    +        _    : none,
    +    end
    +end
     
     def openFile (path Mode -- Handle)
         mode!
         @mode match
    -        read      : @path openRead,
    -        write     : @path openWrite,
    +        readonly  : @path openRead,
    +        writeonly : @path openWrite,
             readwrite : @path openRW,
         end
     end
     
     # at the boundary: a string from argv, validated exactly once
    -$1 Mode.decode match
    +$1 parseMode match
         just m : somePath @m openFile use,
    -    none   : $"--mode must be one of {Mode.values}" wl 1 exit,
    +    none   : "--mode must be one of ro, wo, rw" wl 1 exit,
     end

     The checker guarantees openFile handles every mode, that no unvalidated string reaches it,
    -and that adding a append constructor breaks the build at openFile until it is
    +and that adding an append constructor breaks the build at openFile until it is
     handled.

    -12. Open questions
    +12. Open questions — as resolved

    -• Backing types & defaults: auto-back nullary enums by constructor name verbatim,
    -  or lowercased? Allow int backings as well as str?
    -• Ergonomics — decided (V1): a value is produced only by a constructor or
    -  decode; a bare backing literal does not stand in for a member and
    -  str as Method is rejected (see §10.B). A checked as remains a clean,
    +• Backing types & defaults — still open: auto-back nullary enums by
    +  constructor name verbatim, or lowercased? Allow int backings as well as
    +  str? Deferred with the rest of §10.B; nothing shipped constrains the answer.
    +• Ergonomics — decided (V1), holds as shipped: a value is produced only by a
    +  constructor (or a hand-written parse word returning Maybe, as in §11; §10.B's derived
    +  decode would join them). A bare backing literal does not stand in for a member
    +  and str as Method is rejected (see §10.B). A checked as remains a clean,
       additive follow-on gated on literal types; implicit coercion is ruled out.
    -• Constructor namespace collisions: confirm the "bare resolves to def
    -  unless context expects the enum; qualified forces the constructor" rule (§8) is the one we want.
    -• Match-form ambiguity (already a noted risk in ai/type_checker.md): the
    -  parser must not commit constructor-pattern vs. type-pattern vs. Maybe-pattern; defer to the
    -  checker, which knows the static subject type.
    -• Sequencing: ship nullary + backed enums first (covers config, low runtime cost), then
    -  add payload variants? Both are the same declaration form, so this is a staging choice, not a fork.
    -• Reserved word: adding enum as a keyword — audit for existing user
    -  identifiers / completions.
    +• Constructor namespace collisions — resolved, differently than §8's draft:
    +  member names are globally unique and member/def collisions are declaration errors in
    +  both directions; there is no qualified form and bare names are always unambiguous (§8).
    +• Match-form ambiguity — resolved as designed: the parser stays uncommitted;
    +  the checker discriminates constructor-pattern vs. type-pattern vs. Maybe-pattern by the
    +  subject's static type, inferring the subject even when the match is an inferred
    +  quotation's body. A bare enum type name is a valid arm for an enum inside a
    +  type union.
    +• Sequencing — resolved: nullary and payload variants shipped together (one
    +  declaration form); backed enums remain future work.
    +• Reserved word — done: enum is a lexer keyword, covered in the
    +  docs highlighter and the Sublime / VS Code grammars; a lone _ is reserved as the
    +  match wildcard at the same time.