diff --git a/CHANGELOG.md b/CHANGELOG.md index 8ba38cf6..4bab1cdc 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,8 +7,65 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## Unreleased +### Changed + +- Breaking: defining a name that is already defined is now an error, at runtime + and in the type checker. This covers a second `def` in the same file, a + script `def` whose name is already taken by the standard library or an init + file, and an interactive redefinition. Definition lookup is first-match-wins, + so a duplicate never took effect anyway — it was silently dead code while the + first definition kept running; the error makes that visible. The message + reports both positions: `Duplicate definition 'id'; already defined at + lib/std.msh:62:5.` + +### Fixed + +- `gridSetCell` no longer silently drops a value whose type differs from the + column's original type (e.g. setting a string, enum, or bool into an int + column). The column is promoted to mixed storage so the value is stored, and + the other rows are preserved. +- Equality (`=`) is now total and defined for every value type. Lists, + quotations, pipes, and grids previously raised "equality not defined" at + runtime; lists/pipes/grids now compare structurally (element- and cell-wise) + and quotations compare by identity. Comparing values of different runtime + types yields `false` rather than an error (a genuinely incompatible + comparison is already a static type error), so the result no longer depends + on operand order and union members like `int | null` compare cleanly. +- Converting a cyclic value (a container appended into itself) with `str` or + `toJson` now fails with a clear error instead of hanging forever. Internal + rendering (error messages, stack dumps) prints a `` marker at the + back-reference instead. +- A `match` used as the body of an inferred quotation (e.g. 
+ `(match leaf n : @n, node a b : 0, end) map`) now type-checks; it previously + always failed with "stack underflow at 'match'", rejecting the canonical way + to consume enums and Maybe values inside `map`/`filter`/`each`. +- `uniq` now accepts a list of any value type (matching its `([t] -- [t])` + signature) and deduplicates by structural equality, instead of throwing at + runtime for non-primitive elements such as enums, dicts, and booleans. +- `sort` now reorders the original elements and preserves their type, instead of + replacing every element with its string form. Previously `[10 2 1] sort` gave + the strings `1 10 2` (lexical order), and sorting a list of enums silently + dropped their payloads; now numbers sort numerically and stay numbers, and + every value keeps its type. Ordering is a total structural order: numbers + numerically, text lexically, lists positionally, dicts by sorted key/value, + enums by declaration order then payload, and different types by a fixed type + rank. (Use `sortV` for version/string sorting.) + ### Added +- `enum` declarations: a generative tagged sum type. Members are separated by + `|` and the declaration is closed by `end` (like `def`/`if`/`match`): + `enum CmdResult = ok str | failed int str | timeout end`. A member is a bare + constructor name optionally followed by payload types; a bare member word + constructs a value (consuming any payload from the stack, e.g. + `404 "x" failed`), and `match` dispatches on members with binding + (`failed code msg : ...`) and exhaustiveness checking — a match that omits a + member (or is empty) is rejected unless it has a `_` arm. Enums are nominal: + two enums with the same members are distinct types. Member names are + identifiers (not keywords) and are unique across all enums. `str` renders a + value as `member(payload ...)` and `toJson` uses the externally-tagged + convention. 
Payloads may reference the enum itself, so recursive enums like + `enum Tree = leaf int | node Tree Tree end` are supported. - Octal, hexadecimal, and binary integer literals via `0o`, `0x`, and `0b` prefixes (case-insensitive), e.g. `0o644`, `0xFF`, `0b101`. The base is purely a way of writing the literal; the value is an ordinary integer and prints in diff --git a/ai/enum_implementation_plan.md b/ai/enum_implementation_plan.md new file mode 100644 index 00000000..1e74346d --- /dev/null +++ b/ai/enum_implementation_plan.md @@ -0,0 +1,139 @@ +# Enum (generative tagged sum type) — implementation plan + +Companion to `design/literal_or_enum_typing.html` (the design + rationale). This is the +file-by-file build plan. Plans live here in `ai/`; the design lives in `design/`. + +**Status: implemented on branch `enum-types`** — nullary + payload-carrying members, +construction, nominal distinctness, and `match` (member dispatch, payload binding, +exhaustiveness) all ship in one PR. Payloads use a parenthesized list (`member(T..)`) +rather than the space-separated form originally sketched, because mshell has no statement +terminator and space-separated payloads are ambiguous against following code. +Out of scope, as agreed: derived `decode`/`encode`/`values`, backing strings, qualified +`Enum.member` names, generics, and `Result` (Maybe suffices). JSON stays a structural union. + +## Scope & non-goals + +In scope (a generative tagged sum type declared with `enum`, inline `= a | b | c`): + +- `enum Name = c1 | c2 | ...` and `enum Name = c1 t.. | c2 t.. | ...`. +- Constructors are case-free; produced only by a constructor word or `decode` (**Position 1** — + no implicit coercion, `str as Enum` rejected). +- `match` over members with exhaustiveness; payload binding reuses the `just v` path. + +Explicit non-goals (per the design + owner direction): + +- **No `decode` / `encode` / `values` derived functions** in v1. 
Reading config back is handled at + the use site with `match` (or whatever fits). This removes the wire/serialization surface. +- **No backing strings** (`member = "wire"`) in v1 — they only existed to feed `decode`/`encode`. + The member's own name is its identity. (Easy to add later when serialization is wanted.) +- **No qualified `Enum.member` / `Enum.method` dispatch** in v1. Members are referenced by bare name, + resolved by context; member names are unique across enums (collision is a declaration error). This + removes the `.`-lexing / qualified-dispatch unknown entirely. +- **No `Result` type.** `Maybe` already covers the common case. +- **No change to JSON typing.** `JsonScalar` / `Json` stay *structural unions* — their variants are + distinguishable by structural type, so they do not need tags. Enums are only for cases structure + cannot discriminate (e.g. two variants with the same payload type) and for closed config sets. +- **No generic enums** (`Enum[t, e]`) in v1. +- **No `?`-propagation sugar** in v1. +- A *checked* `"GET" as Method` stays deferred (it needs literal/singleton types). + +## Phasing + +Two PRs. Phase 1 (nullary enums) is now small — declaration, construction, match — with **no +serialization surface and no qualified names**. Phase 2 adds payload variants and the tagged runtime +value where `Evaluator.go` gets touched substantially. + +--- + +## Phase 1 — Nullary enums + +`enum Mode = read | write | readwrite`, match-by-member + exhaustiveness, Position 1. The full v1 +surface is: declare, construct (bare member word), and `match`. No `decode`/`encode`/`values`, no +backing strings, no qualified names. + +The runtime value just needs to carry **which member it is** (enum `NameId` + member `NameId`). A +lightweight value suffices; no new heavy `MShellObject` is required for Phase 1. + +### Type system + +1. **`Lexer.go`** — add an `ENUM` keyword token (via `literalOrKeywordType`). Audit existing user + identifiers/usages of `enum`. +2. 
**`TypeParseIntegration.go`** — add `MShellEnumDecl` (parallel to `MShellTypeDecl`) and + `ParseEnumDecl`: `enum` Name `=` member (`|` member)*, where a member is a bare `LITERAL`. No + backing clause. +3. **`Parser.go`** — add `case ENUM:` beside `case TYPE:` (≈ line 677) dispatching to `ParseEnumDecl`. +4. **`Type.go`** — add `TKEnum` kind + a variants side table (`[]EnumVariant{Name NameId; Payload + []TypeId}`; Payload empty in Phase 1), `MakeEnum`, hashconsing key, and accessors. Nominal + identity = the declaration `NameId` (two `enum`s with identical members stay distinct, like brands). +5. **`Type.go` / `TypeUnify.go`** — extend `walkTypeVars` and `typeRewriter.mapType` with a `TKEnum` + arm (recurse payload types; none in Phase 1). `unify` (`TypeChecker.go`): `TKEnum` unifies only + with the same enum (by name). +6. **Constructors as words** — register each member as a nullary sig `( -- Mode)`. Members live in a + **global constructor namespace**; a member name duplicated across two enums is a declaration error + in v1 (no qualification to disambiguate yet). A bare member word resolves to its enum; where an + expected enum type is in context (match subject, sig slot) that pins it. +7. **Pre-pass registration** — mirror `DeclareType` registration (`TypeCheckProgram.go:99-101`): + collect `enum` headers with placeholder TypeIds, resolve bodies, register constructor words, + detect cross-enum member-name collisions. +8. **Match** — `analyzeTokenPattern` (`TypeCheckProgram.go:1381`): an enum member name is a + recognized pattern that **credits coverage** against the enum's closed set (flip the + "value literals credit no coverage" behavior at `:1402` for enum subjects). `TypeBranch.go`: + exhaustiveness over the member set; narrowing (subject known to be that member in the arm). + +### Runtime (`Evaluator.go`) + +9. A lightweight enum value (enum + member ids). 
Constructor evaluation pushes it; member-pattern + matching extends the `matchTokenPattern` path that already handles `none`/type keywords near + `:1117`; plus equality, `DebugString`, `ToJson`. + +### Docs / housekeeping + +10. `doc/type_system.inc.html` + `doc/mshell.md` (rebuild with `cd doc; msh build.msh`). +11. `CHANGELOG.md` → Unreleased / Added. +12. `lib/std.msh` completions, in the documented Vim-fold pattern. +13. Tests: `tests/` (+ `typecheck_test.sh`) and `mshell/ go test`. Cover: decl parse, construct, + match exhaustive (no `_`), non-exhaustive rejected, member narrowing, `str as Enum` rejected, + two enums with same members stay distinct, duplicate member name across enums rejected. + +--- + +## Phase 2 — Payload-carrying variants + +`enum CmdResult = ok str | failed int str | timeout`. Adds: + +1. **Parser** — arms parse a constructor name followed by payload type exprs (reuse + `parseTypeExpr` productions for each payload). +2. **`Type.go`** — `EnumVariant.Payload` populated; payload types flow through hashconsing and the + rewriter arms added in Phase 1. +3. **Constructors with payloads** — `failed : (int str -- CmdResult)`, postfix, consume from the + stack like `5 just`. +4. **Runtime value** — a new `MShellObject` generalizing `Maybe`: `{ enum NameId; tag; payload + []MShellObject }`. `Maybe` is the proven two-variant precedent; follow its equality/`DebugString`/ + `ToJson` shape. Phase-1 nullary values fold in as the empty-payload case. +5. **Match payload binding** — extend the `just v`-style binding (`TypeCheckProgram.go:1348`, + `Evaluator.go:1055`) to N payloads: `failed c e : ...` binds `c`, `e`. +6. **Recursive enums** — already work via the placeholder-TypeId pre-pass. +7. Docs / changelog / completions / tests as above (payload construct + destructure + recursive + enum + exhaustiveness with payloads). 
+ +(Serialization helpers — `decode`/`encode`/backing strings — remain out of scope until a concrete +need appears; config reads are handled with `match` at the use site.) + +--- + +## Process + +- New feature branch before any code (per `CLAUDE.md`). +- Build in `mshell/` (`go build -o ...`, in-repo cache if needed) before testing. +- `gofmt` only with explicit permission. +- `CHANGELOG.md` for user-facing additions; `mshell/BuiltInList.go` kept in sync if builtins added. + +## Decisions still to nail before coding Phase 1 + +None blocking. The former unknowns (qualified-name dispatch, backing defaults, decode/encode +delivery) are all dropped from v1 scope above. Remaining small calls can be made during the build: + +- Exact lexical home for the lightweight runtime enum value (new `MShellObject` vs. reuse). +- Whether a bare member word with **no** expected-type context (e.g. stored straight into a var) is + allowed (resolves via the global member namespace) or requires a context — default: allowed, since + member names are unique across enums in v1. 
diff --git a/code/syntaxes/mshell.textmate.json b/code/syntaxes/mshell.textmate.json index b7bfb655..678189e1 100644 --- a/code/syntaxes/mshell.textmate.json +++ b/code/syntaxes/mshell.textmate.json @@ -45,6 +45,46 @@ { "name": "variable.other.set.mshell", "match": "@[a-zA-Z0-9_]+" + }, + { + "name": "keyword.control.mshell", + "match": "else\\*|\\*if" + }, + { + "name": "keyword.control.mshell", + "match": "\\b(def|end|if|iff|loop|read|str|break|continue|else|match|enum|type)\\b" + }, + { + "name": "keyword.operator.word.mshell", + "match": "\\b(and|or|not)\\b" + }, + { + "name": "keyword.other.mshell", + "match": "\\bsoe\\b" + }, + { + "name": "storage.type.mshell", + "match": "\\b(int|float|bool)\\b" + }, + { + "name": "constant.numeric.integer.hex.mshell", + "match": "\\b0[xX][0-9A-Fa-f]+\\b" + }, + { + "name": "constant.numeric.integer.octal.mshell", + "match": "\\b0[oO][0-7]+\\b" + }, + { + "name": "constant.numeric.integer.binary.mshell", + "match": "\\b0[bB][01]+\\b" + }, + { + "name": "constant.numeric.float.mshell", + "match": "\\b\\d+\\.\\d*(?:[eE][+-]?\\d+)?\\b" + }, + { + "name": "constant.numeric.integer.mshell", + "match": "\\b\\d+(?:[eE][+-]?\\d+)?\\b" } ], "repository": { diff --git a/design/literal_or_enum_typing.html b/design/literal_or_enum_typing.html new file mode 100644 index 00000000..d9bc41e3 --- /dev/null +++ b/design/literal_or_enum_typing.html @@ -0,0 +1,567 @@ + + + + + + Enums & Generative Types — mshell design + + + +
+ +
+

Enums & Generative Types

+

Status: implemented (V1). Records the decision to add a single generative tagged sum type (declared with enum), the reasoning that got there (the structural-vs-generative distinction and the Haskell / Rust / TypeScript prior art), and the shipped surface syntax. The core shipped as designed: payload-carrying members, recursive references, exhaustive match with payload binding, structural equality/ordering/JSON. §8 records where the shipped name-resolution rule replaced this doc's draft, §10 marks each sugar tier's status (A is the shipped base form; B–F remain future work), and §12 records how each open question resolved.

+
+ +

The motivating request was type ConfigOption = "string1" | "string2". Working through it showed the real missing primitive is not "literal types" but a generative declaration (one that mints constructors and tags values at runtime), of which a plain enumeration is just the simplest case. The proposal is one new keyword, enum, whose grammar is a one-token delta from the existing type X = A | B.

+ +

1. The question, answered in one line

+

"At what point does an enum differ from a regular type definition with constructors?"

+

It differs exactly when the declaration introduces constructors (new ways to make values that the runtime must tag) rather than merely naming or combining types that already have values. Below that point ("an enum" of bare constants) the difference is cosmetic; at or above it (variants carrying payloads) it is a genuinely new capability that today's type cannot express. So there is one mechanism, and the colloquial "enum" is its degenerate case.

+ +

2. Where mshell stands today

+

Most of the machinery already exists, which is why this is an extension and not a bolt-on:

+ + + + + + + + + + + + + +
| Capability | Status | Where |
| --- | --- | --- |
| Hashconsed type arena (TypeId = structural identity) | have | Type.go |
| Structural unions `A \| B` (flatten / sort / dedupe) | have | TKUnion, MakeUnion |
| Nominal brands / newtypes (type X = ...) | have | TKBrand, brandify (TypeCast.go) |
| HM unification, type vars, occurs check; subtyping folded into unify | have | TypeUnify.go, TypeChecker.go |
| Surface `type Name = <expr>`, `\|` unions, `as` casts | have | TypeExpr.go, TypeParseIntegration.go |
| One generative tagged sum type: `Maybe[t] = just t \| none` | have | MShellObject.go:172; just/none |
| Match with constructor destructuring + exhaustiveness | have | TypeBranch.go; match |
| A declaration that introduces new constructors | have | enum (TypeEnum.go; MShellEnum in MShellObject.go) |
+ +

Two structural facts frame everything:

+ + +

3. The conceptual spine: structural vs. generative

+ +

Two genuinely different kinds of declaration:

+ + + + + + + + + + +
| | type / unions / brands | generative sum type (enum) |
| --- | --- | --- |
| Introduces new constructors? | no | yes |
| Discriminates by… | structural type | stored tag |
| Runtime footprint of the wrapper | none (re-tag only) | a stored tag |
| Two cases with the same payload type? | impossible | fine |
| How you build a value | `<structural value> as X` | call a constructor |
+ +

The litmus test

+

Try to express two cases that share a representation:

+
ok str | err str        # two cases, SAME payload type
+

A structural union str | str collapses to str; even branded, both cases are str at runtime, so match, which discriminates by structural type, cannot tell an ok from an err. The moment two variants share a representation (or you want ok 5 and err 5 to be different values), you need a stored tag, i.e. a real sum type. That is the precise line where "enum" stops being expressible as a type.
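The litmus test can be sketched in Go, the implementation language of the repository. This is a minimal illustration only: the `Tagged` struct is a hypothetical stand-in, not the shipped `MShellEnum`, and it assumes a single string payload.

```go
package main

import "fmt"

// Tagged is a hypothetical runtime value: a member tag plus its payload.
// It is an illustration, not the interpreter's actual representation.
type Tagged struct {
	Tag     string
	Payload string
}

func main() {
	// Structural view: both cases are plain strings and cannot be told apart.
	okStr, errStr := "5", "5"
	fmt.Println(okStr == errStr) // true

	// Generative view: the stored tag keeps the two cases distinct.
	ok := Tagged{Tag: "ok", Payload: "5"}
	err := Tagged{Tag: "err", Payload: "5"}
	fmt.Println(ok == err) // false
}
```

The only difference between the two comparisons is the stored tag, which is the capability a structural union cannot provide.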

+ +

4. Prior art: Haskell, Rust, TypeScript

+ + + + + + + + +
| | Tag / discriminant | "enum" is… | Structural unions? | type keyword |
| --- | --- | --- | --- | --- |
| Haskell | implicit (the constructor is the tag) | an all-nullary data; enum ops via the derivable Enum class | no; sum types are the only union | transparent alias (structural) |
| Rust | implicit (compiler-managed) | the C-like case of the one enum keyword | no; must declare an enum | transparent alias (structural) |
| TypeScript | explicit, hand-written data field | a separate weak runtime construct, mostly avoided | yes (untagged); the default idiom | structural; unions + literals do the work |
+ + + +

Punchline: Haskell and Rust both concluded the tagged sum type is the primitive and the enum is its degenerate nullary case: one generative form, with type reserved for transparent structural aliases. TypeScript went structural-first and bolted enum on, which is exactly the schism to avoid. mshell already has the structural side (unions; brands ≈ transparent aliases) and one generative sum type (Maybe), so it sits closer to Haskell/Rust. The natural, non-bolted-on move is theirs.

+ +

5. Decision

+

Add a single generative tagged sum type declaration. Keep type exactly as it is (the transparent / branded structural form). The colloquial "enum of constants" is the all-nullary special case of the one mechanism, not a second concept. This conceptually subsumes Maybe ("the built-in enum Maybe[t] = just t | none"; it stays a distinct built-in until enums grow type parameters; see §10.E) and unlocks "make illegal states unrepresentable" for command results, parse results, and JSON.

+ +

6. Syntax

+ +

Keyword

+

The keyword, not capitalization, signals "everything here is a constructor," which is what lets the design be entirely case-free (see §8). Candidates:

+ + + + + + + + +
| Keyword | Precedent | Note |
| --- | --- | --- |
| enum (chosen) | Rust, Swift | most recognizable; Rust/Swift legitimized it for payload-carrying variants |
| variant | OCaml / Reason | accurate, no baggage; less universally known |
| oneof | Protobuf | self-describing; slightly informal |
| union | | rejected: `\|` already means a structural union |
+ +

Declaration form — |-separated members, closed by end

+

Each arm is a constructor name followed by zero or more space-separated payload types, arms are |-separated, and the body is closed by end, the same block terminator def, if, match, and loop already use. A nullary member is just a bare name.

+ +
enum Mode   = readonly | writeonly | readwrite end
+enum Shape  = circle float | rect float float | point end
+enum CmdResult = ok str | failed int str | timeout end
+ +

The end terminator is what keeps the grammar whitespace-insensitive. mshell has no statement terminator, so an open-ended, space-separated payload list has no way to mark its end: after a nullary member, the parser otherwise cannot tell a following (...) / [...] statement from a payload. Delimiters attached to the member name (failed(int str)) would fix it only by making whitespace significant, which we reject. end bounds the whole member list instead, and because every payload type is itself self-delimiting (a quote type (a -- b) closes at its )), space-separated payloads inside the body are unambiguous. This is exactly why type X = (a -- b) already has no boundary problem: a type body is one self-bounded expression, whereas an enum body is an open list that needs a terminator.

+ +

Grammar:

+ +

The only new wrinkle vs. a structural union is that an arm's first token is a binding occurrence (a constructor being declared) rather than a reference to an existing type. The enum keyword is what tells the parser to read it that way; there is no capitalization rule.

+ +

A long block is still fine

+

Arms may be placed one per line, with an optional leading | for alignment:

+
enum Event =
+    | click int int
+    | key int
+    | close
+end
+ +

7. Construction and matching (use sites)

+

Both reuse the existing Maybe machinery verbatim, which is the whole point of "Maybe is just the built-in case."

+ +

Construction is postfix, exactly like 5 just (just : (a -- Maybe[a])). A constructor is a word whose stack effect is "consume the payloads, push the enum":

+
"hello\n" ok            # ( str -- CmdResult )
+404 "not found" failed  # ( int str -- CmdResult )
+timeout                 # ( -- CmdResult ), like bare `none`
+ +

Match reuses the just v binding arm; payload names bind in the body. No _ is needed when every constructor is covered: the set is closed, so the checker proves exhaustiveness (the same path just/none use in TypeBranch.go):

+
cmd run match
+    ok out     : @out wl,
+    failed c e : $"{@c}: {@e}" wl,
+    timeout    : "timed out" wl,
+end
+ +

The payoff: add a fourth constructor later and every match that omits it, and has no _ arm, becomes a compile error.
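The exhaustiveness rule can be sketched in Go as a set difference over the closed member list. This is an illustrative sketch, not the checker's code in `TypeBranch.go`; `missingMembers` is a hypothetical helper and arms are reduced to their member names.

```go
package main

import "fmt"

// missingMembers returns the members not covered by the match arms, in
// declaration order. A "_" arm covers everything that remains.
func missingMembers(members, arms []string) []string {
	covered := make(map[string]bool, len(arms))
	for _, a := range arms {
		if a == "_" {
			return nil
		}
		covered[a] = true
	}
	var missing []string
	for _, m := range members {
		if !covered[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	arms := []string{"ok", "failed", "timeout"}

	// Every member is covered, so nothing is reported.
	fmt.Println(missingMembers([]string{"ok", "failed", "timeout"}, arms))

	// A fourth member added later is reported as uncovered.
	fmt.Println(missingMembers([]string{"ok", "failed", "timeout", "killed"}, arms))
}
```

An empty arm list leaves every member uncovered, which matches the rule that an empty match is rejected unless it has a `_` arm. The member name `killed` is made up for the example.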

+ +

8. Unique member names, not case, not namespacing

+

Capitalization is deliberately given no meaning anywhere. The one job case used to do, telling a bare constructor apart from a bare variable / def, had a draft answer here (a qualified CmdResult.ok form plus checker-context resolution). What shipped is simpler and stricter: collisions are declaration errors, so a bare member name is always unambiguous and no qualified form exists:

+ +

match arms need no qualification either: the checker identifies the subject's enum from the member names in the arms, including when the match is the body of an inferred quotation ((match … end) map).
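The uniqueness rule can be sketched in Go as a single global member namespace. This is a hedged illustration: `Registry`, `Declare`, and `Resolve` are hypothetical names, and the sketch covers only member-versus-member collisions across enums.

```go
package main

import "fmt"

// Registry is a hypothetical global constructor namespace: member -> enum.
type Registry struct {
	owner map[string]string
}

func NewRegistry() *Registry {
	return &Registry{owner: make(map[string]string)}
}

// Declare registers every member of an enum and fails on the first name
// that is already taken. Members registered before the failure stay in
// place; this sketch assumes a failed declaration aborts checking anyway.
func (r *Registry) Declare(enum string, members []string) error {
	for _, m := range members {
		if prev, ok := r.owner[m]; ok {
			return fmt.Errorf("member %q of enum %s is already declared by enum %s", m, enum, prev)
		}
		r.owner[m] = enum
	}
	return nil
}

// Resolve maps a bare member word to its enum. Because names are unique,
// no qualification or context is needed.
func (r *Registry) Resolve(member string) (string, bool) {
	enum, ok := r.owner[member]
	return enum, ok
}

func main() {
	reg := NewRegistry()
	fmt.Println(reg.Declare("Mode", []string{"readonly", "writeonly", "readwrite"}))
	fmt.Println(reg.Declare("Access", []string{"readonly", "denied"}))
	fmt.Println(reg.Resolve("writeonly"))
}
```

The second declaration fails because `readonly` already belongs to `Mode`, so a bare `readonly` can only ever mean one enum. The enum name `Access` is invented for the example.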

+ +

9. Representation & implementation sketch

+
  1. Type kind. Add TKEnum (nominal, keyed by the declaration's NameId) with a side table of variants: each variant is a (NameId, []TypeId) payload list. Hashconsing, walkTypeVars, and typeRewriter.mapType gain one arm each (recurse through payload types; nominal identity by name).
  2. Constructors register as words with quote signatures (failed : (int str -- CmdResult)), so application and overload machinery type-check them with zero special-casing. Nullary constructors are ( -- Enum), like none.
  3. Runtime value. A new MShellObject generalizing Maybe: a tag (the variant NameId / index) plus a payload slice. Maybe is the pre-existing two-variant instance, so the model is already proven; equality and DebugString follow its shape.
  4. Match. Constructor patterns credit coverage against the declaration's closed set (flip today's "value literals credit no coverage", TypeCheckProgram.go:1402, for enum subjects); exhaustiveness reuses the Maybe path; payload binding reuses the just v path (TypeCheckProgram.go:1348).
  5. Forward / recursive types work via the existing top-level pre-pass that reserves placeholder TypeIds for type headers (extend it to enum): enum Json = jnull | jbool bool | jnum float | jstr str | jarr [Json] | jobj {str: Json} end.
  6. Serialization (as shipped). str renders member / member(p0 p1 ...); toJson uses serde's externally-tagged convention (a nullary member is the bare member string, one payload is {"member": value}, several are {"member": [v0, v1, ...]}); equality is structural and total; sort orders members by declaration order (a stored member index), so low | medium | high sorts as the author intended. Parsing JSON/argv strings back into an enum still needs an explicit decode word; see §10.B and §11.
+ +

10. Conciseness sugar for common cases

+

Ranked by value-for-effort. Status: A is the shipped base form, and the payload half of C (a shape as a payload type) shipped for free; B, C's destructuring sugar, and D–F remain future work, recorded here as designed so they can land without re-deciding.

+ +

A. The nullary one-liner (already the base form) — shipped

+
enum Env   = dev | staging | prod end
+enum Level = debug | info | warn | error end
+

This is the everyday "configuration option" case, and it is already as terse as it can be.

+ +

B. Auto string backing + derived decode / encode / values — future

+

For an all-nullary enum, auto-back each constructor by its name and derive three functions, since config crosses a string boundary. Override the backing with = when the wire format differs:

+
enum Method = get = "GET" | post = "POST" | put = "PUT" end
+
+"GET" Method.decode    # ( str -- Maybe[Method] )   runtime-validated
+@m    Method.encode    # ( Method -- str )
+Method.values          # ( -- [Method] )            all members, in order
+

An all-nullary, all-backed enum can be represented at runtime as its backing string (free serialization, no tag object), the Rust "fieldless variants get a compact representation" move. Payload-carrying enums fall back to the tagged value of §9.

+
$CONFIG_LEVEL Level.decode match
+    just lvl : @lvl configureLogging,
+    none     : $"bad level: {$CONFIG_LEVEL}; try {Level.values}" wl 1 exit,
+end
+ +

The backing string is wire-only — two paths to a value, no implicit coercion

+

Decided (V1): the only ways to produce an enum value are a constructor or decode. The backing string is purely a serialization detail, never a way to name a member in code, the same separation Rust/Serde draws (you write Method::Get in code; "GET" is the wire format's business).

+ + + + + + +
| Path | Signature | For |
| --- | --- | --- |
| constructor | get : ( -- Method) | a compile-time-known member; the normal in-source form |
| decode | ( str -- Maybe[Method]) | a runtime string of unknown value, validated at the boundary |
+

A bare str is not a Method, and "GET" as Method is rejected: with no literal/singleton types, the checker cannot verify a plain str is a valid backing, so an as here would be an unchecked re-tag that could mint a Method matching no arm. This keeps the language's existing "no implicit coercions" invariant intact and the structural→nominal boundary crisp: constructors are the only way to make one. It also costs no source terseness: the concise form was never "GET", it is the constructor get.

+

Deferred, additive: if literal/singleton types are ever added for other reasons, a checked "GET" as Method (compile-time membership-verified, "GXT" as Method an error) becomes a safe, optional nicety that can be layered on then with no migration. Implicit bare-literal coercion is explicitly not a goal: it would be mshell's first implicit coercion and would leak the wire format into program logic.

+ +

C. Shape payloads → free named destructuring — half shipped

+

Shipped: a payload may already be any type expression, including a shape: enum Event = click {"x": int} | close end works today, with a match arm binding the whole dict (click d : @d :x? …). Future: the sugar below, destructuring the shape into named bindings in the pattern itself; it reuses TKShape and the existing { 'k': name } dict-pattern matcher (including optional ?: fields) for both construction and destructuring:

+
enum Event = click {x: int, y: int} | key {code: int, shift?: bool} | close end
+
+ev match
+    click {x: x, y: y} : $"click {@x},{@y}" wl,
+    key   {code: c}    : @c handleKey,
+    close              : "bye" wl,
+end
+ +

D. Combined arms (|) for fan-in — future

+
status match
+    pending | running : "in progress" wl,
+    done              : "complete" wl,
+    failed e          : @e wl,
+end
+ +

E. Generic enums subsume the built-ins — future

+

Enums are not yet generic: a recursive payload (node Tree Tree) works, type parameters do not, so Maybe remains the built-in instance rather than being literally subsumed.

+
enum Result[t, e] = ok t | err e end
+# and Maybe[t] = just t | none is simply built in
+ +

F. ?-style propagation (speculative)

+

mshell already has guard-style return. For a designated Result-shaped enum, an unwrap-or-early-return operator (Rust's ?) could compress @x parseInt match just v : @v, none : return end to something like @x parseInt!?. Defer until Result is idiomatic.

+ +

11. End-to-end example (as shipped)

+

The wire boundary is an explicit match on the string; §10.B's derived Mode.decode / Mode.values would replace parseMode if backing ever lands:

+
enum Mode = readonly | writeonly | readwrite end
+
+def parseMode (str -- Maybe[Mode])
+    match
+        "ro" : readonly just,
+        "wo" : writeonly just,
+        "rw" : readwrite just,
+        _    : none,
+    end
+end
+
+def openFile (path Mode -- Handle)
+    mode!
+    @mode match
+        readonly  : @path openRead,
+        writeonly : @path openWrite,
+        readwrite : @path openRW,
+    end
+end
+
+# at the boundary: a string from argv, validated exactly once
+$1 parseMode match
+    just m : somePath @m openFile use,
+    none   : "--mode must be one of ro, wo, rw" wl 1 exit,
+end
+

The checker guarantees openFile handles every mode, that no unvalidated string reaches it, and that adding an append constructor breaks the build at openFile until it is handled.

+ +

12. Open questions — as resolved

+ + +
+

Companion to ai/type_checker.md (the implemented type-system design) and design/optional_dict_keys.html. File locations and line numbers reference the tree at time of writing and may drift. Earlier drafts of this doc compared three options (literal types, backed enum, payload enum); §3–§5 record why the generative sum type is the single primitive those collapse into.

+ +
+ + diff --git a/doc/base.html b/doc/base.html index 6e5491ee..2ea961f7 100644 --- a/doc/base.html +++ b/doc/base.html @@ -27,7 +27,7 @@ color: #0000FF; } - .mshellIF, .mshellELSE, .mshellELSESTAR, .mshellSTARIF, .mshellEND, .mshellDEF, .mshellMATCH { + .mshellIF, .mshellELSE, .mshellELSESTAR, .mshellSTARIF, .mshellEND, .mshellDEF, .mshellMATCH, .mshellENUM, .mshellTYPE { color: #0F4C81; font-weight: bold; } diff --git a/doc/mshell.md b/doc/mshell.md index e2592b6b..07f0b174 100644 --- a/doc/mshell.md +++ b/doc/mshell.md @@ -655,6 +655,53 @@ An operation that is valid for only some members is a type error — dividing an For more detail, see the generated Type System help page. +## Enums + +An `enum` declares a generative tagged sum type: members separated by `|`, closed by `end` (like `def`/`if`/`match`): + +``` +enum Color = red | green | blue end +``` + +A bare member name constructs that value (`green` pushes a `Color`). +Member names are identifiers (not keywords) and are unique across all enums. +Unlike a union, an enum is nominal — two enums with the same members are distinct types. + +Members may carry a payload — types written after the member name; the constructor consumes those values from the stack. +A nullary member has no payload. The closing `end` bounds the member list, so payloads are never confused with the following code. Payloads may reference the enum itself (recursive enums). + +``` +enum CmdResult = ok str | failed int str | timeout end +404 "not found" failed # ( int str -- CmdResult ) + +enum Tree = leaf int | node Tree Tree end +``` + +`match` dispatches on the member and binds payload values (like `just v`). +A match must cover every member or include a `_` arm; omitting a member is a static error. + +``` +result match + ok out : @out wl, + failed c e : $"{@e} ({@c})" wl, + timeout : "timed out" wl, +end +``` + +An enum may also be a member of a `type` union (e.g. `type T = Color | int`). 
+A `match` on such a union discriminates it with the enum's *type name* as an arm, +which matches any value of that enum: + +``` +enum Color = red | green | blue end +type T = Color | int + +x match + Color : "a color" wl, + int : "an int" wl, +end +``` + ## Definitions Definitions use `def` with an optional metadata dictionary before the type signature. @@ -667,6 +714,11 @@ end Metadata values must be static: strings (single or double quoted), integers, floats, booleans, or nested lists/dicts of the same. Interpolated strings are not allowed. +Definition names must be unique. +Defining a name that is already defined — by the same file, the standard library, or an init file — is an error, +both at runtime and in the type checker. +(Lookup is first-match-wins, so a duplicate would never take effect; the error makes that visible.) + ### Tail-Call Optimization Recursive definitions in tail position are optimized to avoid stack overflow. @@ -1265,8 +1317,8 @@ groupBy ## Sorting -- `sort`: Sort list. Converts all items to strings, then sorts using go's `sort.Strings` `(list -- list)` -- `sortV`: Version sort list. Converts all items to strings, then sorts like GNU `sort -V` (`list -- list`) +- `sort`: Sort a list by a total structural order, preserving each element's type (numbers sort numerically and stay numbers; a list of enums keeps its payloads). The order is: numbers numerically, text (str/path/literal) lexically, dates chronologically, bytes bytewise, lists positionally, dicts by sorted key then value, enums by declaration order then payload, and values of different types by a fixed type rank. `([t] -- [t])` +- `sortV`: Version sort list. Converts each item to a string, then sorts like GNU `sort -V`, keeping the original elements. `([t] -- [t])` - `sortBy`: Sort a Grid or GridView by one or more columns ascending. Spec is a column name (str) or list of column names ([str]); priority is left-to-right. 
Stable; `none` cells sort last; cross-type values in a generic column error. Compose with `reverse` for descending. `(Grid|GridView str|[str] -- Grid)` - `sortByCmp`: Sort a list, Grid, or GridView using a comparison function. The function/quotation receives two items (or two `GridRow`s) and should return -1 when a < b, 0 when a = b, or 1 when a > b. Stable. `[a] (a a -- int) -- [a]` / `(Grid|GridView (GridRow GridRow -- int) -- Grid)` - `reverse`: Reverse a list, Grid, or GridView, returning a new value with elements/rows in reverse order. `(list -- list)` / `(Grid|GridView -- Grid)` diff --git a/doc/type_system.inc.html b/doc/type_system.inc.html index bae627c4..f79d23d1 100644 --- a/doc/type_system.inc.html +++ b/doc/type_system.inc.html @@ -11,6 +11,7 @@
  • Type Expressions
  • Dictionaries
  • Lists
  • + Enums
  • Quotations
  • Control Flow
  • Current Boundaries
@@ -341,6 +342,69 @@

    Heterogeneous Lists can be understood as [int | str], but the checker does not use index :0: to prove "this position is always int" unless the value is converted or asserted in another way.

    +

    Enums § Back to top

    + +

    +An enum declares a named type whose values are a fixed set of members (constructors). +The members are separated by |, and the declaration is closed by end — like def, if, and match. +Unlike a union, an enum is generative: each member is a brand-new value that the language tags, so two enums with the same members are still distinct types, and a member can carry data that another member does not. +

    + +
    enum Color = red | green | blue end
    + +

    +A member name is written bare to construct that value. +Member names are ordinary identifiers (they may not be language keywords), and every member name is unique across all enums. +

    + +
    green                  # a value of type Color
    + +

    +Members may carry a payload: a list of types written after the member name. +The constructor then consumes those values from the stack, just like any other word. +A member with no payload is nullary. +The closing end marks where the member list stops, so a payload list is never confused with the code that follows. +

    + +
    enum CmdResult = ok str | failed int str | timeout end
    +
    +"output"        ok       # ( str -- CmdResult )
    +404 "not found" failed   # ( int str -- CmdResult )
    +timeout                  # ( -- CmdResult )
    + +
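The constructor behaviour described above can be modelled in a few lines of Go. This is a minimal sketch, not the evaluator's code: `memberInfo`, `enumValue`, and `construct` are hypothetical names, and the stack is a plain `[]any` rather than the interpreter's own stack type. It shows the one property that matters for payloads: values are taken from the top of the stack, and the payload keeps the order in which they were pushed.

```go
package main

import "fmt"

// memberInfo is a hypothetical, simplified per-member record:
// the owning enum and how many payload values the constructor consumes.
type memberInfo struct {
	enum  string
	arity int
}

// enumValue is a simplified enum value: enum name, member name, payload.
type enumValue struct {
	enum    string
	member  string
	payload []any
}

// construct consumes info.arity values from the top of the stack and pushes
// the resulting enum value. The payload keeps push order, so the value pushed
// last ends up in the last payload slot.
func construct(stack []any, member string, members map[string]memberInfo) ([]any, error) {
	info, ok := members[member]
	if !ok {
		return stack, fmt.Errorf("unknown member %q", member)
	}
	if len(stack) < info.arity {
		return stack, fmt.Errorf("enum constructor %q needs %d payload value(s) on the stack", member, info.arity)
	}
	split := len(stack) - info.arity
	// Copy the payload before reusing the stack's backing array.
	payload := append([]any(nil), stack[split:]...)
	stack = append(stack[:split], enumValue{enum: info.enum, member: member, payload: payload})
	return stack, nil
}

func main() {
	members := map[string]memberInfo{
		"ok":      {"CmdResult", 1},
		"failed":  {"CmdResult", 2},
		"timeout": {"CmdResult", 0},
	}
	// Models: 404 "not found" failed
	stack := []any{404, "not found"}
	stack, err := construct(stack, "failed", members)
	fmt.Println(stack, err)
}
```

A nullary member such as `timeout` has arity 0 in this model, so it pushes a value without touching the stack below it.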

    +A payload may reference the enum itself, so recursive enums are allowed. +

    + +
    enum Tree = leaf int | node Tree Tree end
    + +

    +Use match to dispatch on the member. +A payload member binds its values to names in the arm body, the same way just v binds a Maybe payload. +The match must cover every member, or include a wildcard _ arm; a match that omits a member is a static error, so adding a member later forces every match to handle it. +

    + +
    def render (CmdResult -- str)
    +    match
    +        ok out     : @out,
    +        failed c e : $"{@e} ({@c})",
    +        timeout    : "timed out",
    +    end
    +end
    + +
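The coverage rule above (every member named, or a `_` arm present) reduces to a small set difference. The Go sketch below illustrates that rule only; `missingMembers` is a hypothetical helper, and the real checker, which is not shown in this section, may be structured quite differently.

```go
package main

import "fmt"

// missingMembers returns the declared members that no arm names.
// A `_` arm covers everything, so it yields no missing members.
// Hypothetical helper for illustration only.
func missingMembers(declared []string, arms []string) []string {
	covered := make(map[string]bool, len(arms))
	for _, a := range arms {
		if a == "_" {
			return nil
		}
		covered[a] = true
	}
	var missing []string
	for _, m := range declared {
		if !covered[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	declared := []string{"ok", "failed", "timeout"}
	// A match that forgets `timeout` would be rejected under this rule.
	fmt.Println(missingMembers(declared, []string{"ok", "failed"}))
	// A wildcard arm makes the same match acceptable.
	fmt.Println(missingMembers(declared, []string{"ok", "_"}))
}
```

Under this model, adding a member to the declaration makes every match without a `_` arm report it as missing, which is the "forces every match to handle it" effect the text describes.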

    +An enum can also be one member of a type union, such as type T = Color | int. +A match on the union discriminates it with the enum's type name as an arm, which matches any value of that enum: +

    + +
    enum Color = red | green | blue end
    +type T = Color | int
    +
    +x match
    +    Color : "a color",
    +    int   : "an int",
    +end
    +
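The difference between a member arm and a type-name arm can be sketched as a single predicate. This is an illustrative model under stated assumptions: `enumValue` and `armMatches` are hypothetical names, only single-word patterns are handled, and `int` stands in for the built-in type tests. The actual runtime also binds payloads and reports unknown patterns as errors, which this sketch leaves out.

```go
package main

import "fmt"

// enumValue is a simplified enum value: the enum's name and the chosen member.
type enumValue struct {
	enum   string
	member string
}

// armMatches reports whether a single-word arm pattern matches subject.
// A member name matches only that member; an enum type name matches any
// value of that enum; `_` matches everything.
func armMatches(pattern string, subject any, enumTypes map[string]bool) bool {
	ev, isEnum := subject.(enumValue)
	switch {
	case pattern == "_":
		return true
	case pattern == "int":
		_, ok := subject.(int)
		return ok
	case isEnum && pattern == ev.member:
		return true
	case enumTypes[pattern]:
		// Type test: fails quietly for non-enum values so the next arm is tried.
		return isEnum && ev.enum == pattern
	}
	return false
}

func main() {
	enumTypes := map[string]bool{"Color": true}
	green := enumValue{enum: "Color", member: "green"}

	fmt.Println(armMatches("Color", green, enumTypes)) // type-name arm on an enum value
	fmt.Println(armMatches("Color", 7, enumTypes))     // type-name arm on an int
	fmt.Println(armMatches("int", 7, enumTypes))       // built-in type test
}
```

Checking the member name before the type name mirrors the order described for the runtime: a value's own member is tried first, and the type-name test is the fallback that lets a `Color | int` union be discriminated.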

    Quotations § Back to top

    diff --git a/mshell/Evaluator.go b/mshell/Evaluator.go index a2c35f48..eda38f95 100644 --- a/mshell/Evaluator.go +++ b/mshell/Evaluator.go @@ -396,6 +396,59 @@ type EvalState struct { defIndex map[string]int defIndexLen int + + // EnumMembers maps a member name to its enum and payload arity. Populated + // from `enum` declarations (RegisterEnums) before evaluation; member names + // are unique across enums, so this flat lookup is enough to construct a + // value from a bare member word, including consuming its payload. + EnumMembers map[string]EnumMemberInfo + + // EnumTypeNames is the set of declared enum type names. A bare enum type + // name is a valid match-arm pattern (a type test that any member of that + // enum satisfies), e.g. matching a `C | int` union value against `C`. + EnumTypeNames map[string]bool +} + +// EnumMemberInfo records where a member came from and how many payload values +// its constructor consumes from the stack. +type EnumMemberInfo struct { + EnumName string + Arity int + // Ordinal is the member's 0-based position in its enum declaration, stamped + // onto constructed values (MShellEnum.MemberIndex) so sorting can order by + // declaration order. + Ordinal int +} + +// RegisterEnums scans parse items for `enum` declarations and records each +// member, so a bare member word can be constructed at evaluation time. Called +// once before top-level evaluation; mirrors the checker's enum pre-pass so the +// two agree regardless of declaration order. 
+func (state *EvalState) RegisterEnums(items []MShellParseItem) { + for _, item := range items { + d, ok := item.(*MShellEnumDecl) + if !ok { + continue + } + if state.EnumMembers == nil { + state.EnumMembers = make(map[string]EnumMemberInfo) + } + if state.EnumTypeNames == nil { + state.EnumTypeNames = make(map[string]bool) + } + state.EnumTypeNames[d.Name] = true + for i, m := range d.Members { + if _, exists := state.EnumMembers[m]; exists { + continue + } + state.EnumMembers[m] = EnumMemberInfo{EnumName: d.Name, Arity: len(d.MemberPayloads[i]), Ordinal: i} + } + } +} + +// isEnumTypeName reports whether name is a declared enum type name. +func (state *EvalState) isEnumTypeName(name string) bool { + return state.EnumTypeNames != nil && state.EnumTypeNames[name] } // RebuildDefinitionIndex records the first index for each name, matching @@ -422,6 +475,38 @@ func (state *EvalState) lookupDefinition(definitions []MShellDefinition, name st return definitions[i], true } +// tokenPosStr formats a token's position as `path:line:col` (or `line:col` +// when the source file is unknown, e.g. stdin) for use in error messages. +func tokenPosStr(t Token) string { + if t.TokenFile != nil && t.TokenFile.Path != "" { + return fmt.Sprintf("%s:%d:%d", t.TokenFile.Path, t.Line, t.Column) + } + return fmt.Sprintf("%d:%d", t.Line, t.Column) +} + +// FindDuplicateDefinition scans the given definition slices, in order, and +// returns an error for the first name defined twice. Definition lookup is +// first-match-wins, so a second definition of a name is never an override — +// it would be silently dead code. Erroring keeps a script (or init file, or +// interactive input) from redefining a name already taken by the standard +// library, an earlier startup file, or itself. 
+func FindDuplicateDefinition(defLists ...[]MShellDefinition) error { + seen := make(map[string]Token) + for _, defs := range defLists { + for i := range defs { + name := defs[i].Name + prev, exists := seen[name] + if !exists { + seen[name] = defs[i].NameToken + continue + } + return fmt.Errorf("%s: Duplicate definition '%s'; already defined at %s.\n", + tokenPosStr(defs[i].NameToken), name, tokenPosStr(prev)) + } + } + return nil +} + func (state *EvalState) AddCompletionDefinitions(definitions []MShellDefinition) { if state.CompletionDefinitions == nil { state.CompletionDefinitions = make(map[string][]MShellDefinition) @@ -909,6 +994,11 @@ func (state *EvalState) processToken(token MShellParseItem, frame *EvaluationFra // Static-only: type declarations have no runtime effect by design. return SimpleSuccess() + case *MShellEnumDecl: + // Static-only: enum declarations have no runtime effect; members are + // pre-registered via RegisterEnums. + return SimpleSuccess() + case *MShellAsCast: // Static-only: `as` is a checker hint; no runtime work. return SimpleSuccess() @@ -1114,15 +1204,68 @@ func (state *EvalState) processMatchBlock(matchBlock *MShellParseMatchBlock, fra return state.FailWithMessage(fmt.Sprintf("%d:%d: No matching arm found in match block and no wildcard '_' arm provided.\n", startToken.Line, startToken.Column)) } +// parseItemLexeme renders a parse item for a diagnostic: a token's lexeme, or a +// non-token pattern's debug form. +func parseItemLexeme(item MShellParseItem) string { + if tok, ok := item.(Token); ok { + return tok.Lexeme + } + return item.DebugString() +} + // matchPattern checks if a subject matches a pattern (list of parse items). // Returns (matched bool, bindings map, result EvalResult). func (state *EvalState) matchPattern(pattern []MShellParseItem, subject MShellObject, startToken Token) (bool, map[string]MShellObject, EvalResult) { + // Enum patterns against an enum value. 
A member name (`member` or + // `member b1 b2 ...`) matches that member and binds its payload; a sibling + // member just fails this arm and the next is tried. A bare enum *type name* + // (`C`) is a type-test arm that matches any member of that enum — this is + // how a `C | int` union value is discriminated. `_` and `none` fall through + // to the generic handling. + if enumVal, ok := subject.(*MShellEnum); ok && len(pattern) >= 1 { + if tok, okTok := pattern[0].(Token); okTok && tok.Type == LITERAL && tok.Lexeme != "_" && tok.Lexeme != "none" { + if tok.Lexeme == enumVal.Member { + binds := pattern[1:] + if len(binds) != len(enumVal.Payload) { + return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: enum member '%s' binds %d payload value(s), got %d.\n", + tok.Line, tok.Column, tok.Lexeme, len(enumVal.Payload), len(binds))) + } + bindings := make(map[string]MShellObject) + for _, b := range binds { + // A payload binding must be a plain name (a LITERAL) or the + // `_` wildcard — not a keyword/operator token (`end`, `x`, + // ...) or a nested pattern. This mirrors the type checker + // (enumMemberPattern) and the `just`/type-test binding forms, + // so the runtime never accepts an arm the checker rejects. + bt, okBt := b.(Token) + if !okBt || (bt.Type != LITERAL && bt.Type != UNDERSCORE) { + return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: enum member '%s' payload bindings must be names, not '%s'.\n", + tok.Line, tok.Column, tok.Lexeme, parseItemLexeme(b))) + } + } + for i, b := range binds { + if bt := b.(Token); bt.Lexeme != "_" { + bindings[bt.Lexeme] = enumVal.Payload[i] + } + } + return true, bindings, SimpleSuccess() + } + // Not this value's member. A single bare enum type name is a type + // test: it matches iff the value belongs to that enum. + if len(pattern) == 1 && state.isEnumTypeName(tok.Lexeme) { + return tok.Lexeme == enumVal.EnumName, nil, SimpleSuccess() + } + // A sibling member name (or any other literal): this arm fails. 
+ return false, nil, SimpleSuccess() + } + } + // Handle multi-token patterns (e.g., "just v" for maybe destructuring, // or " name" for type-test binding). if len(pattern) == 2 { first, firstOk := pattern[0].(Token) second, secondOk := pattern[1].(Token) - if firstOk && secondOk && second.Type == LITERAL { + if firstOk && secondOk && (second.Type == LITERAL || second.Type == UNDERSCORE) { if first.Type == LITERAL && first.Lexeme == "just" { // Maybe Just destructuring var maybeVal Maybe @@ -1185,6 +1328,8 @@ func (state *EvalState) matchPattern(pattern []MShellParseItem, subject MShellOb // matchTokenPattern matches a single token pattern against a subject. func (state *EvalState) matchTokenPattern(p Token, subject MShellObject) (bool, EvalResult) { switch p.Type { + case UNDERSCORE: + return true, SimpleSuccess() case LITERAL: if p.Lexeme == "_" { return true, SimpleSuccess() @@ -1233,6 +1378,13 @@ func (state *EvalState) matchTokenPattern(p Token, subject MShellObject) (bool, _, ok := subject.(MShellBinary) return ok, SimpleSuccess() } + // A bare enum type name is a type test: it matches an enum value of + // that enum, and simply fails (try the next arm) for any other value. + // This lets a union like `C | int` be discriminated by the arm `C`. + if state.isEnumTypeName(p.Lexeme) { + en, ok := subject.(*MShellEnum) + return ok && en.EnumName == p.Lexeme, SimpleSuccess() + } return false, state.FailWithMessage(fmt.Sprintf("%d:%d: Unknown match pattern literal '%s'. 
%s\n", p.Line, p.Column, p.Lexeme, matchPatternFormsHint)) case TYPEINT: @@ -1423,7 +1575,7 @@ func (state *EvalState) matchDictPattern(pattern *MShellParseDict, subject MShel return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: Dict pattern value must be a single binding name.\n", startToken.Line, startToken.Column)) } tok, ok := kv.Value[0].(Token) - if !ok || tok.Type != LITERAL { + if !ok || (tok.Type != LITERAL && tok.Type != UNDERSCORE) { return false, nil, state.FailWithMessage(fmt.Sprintf("%d:%d: Dict pattern value must be a literal binding name.\n", startToken.Line, startToken.Column)) } if tok.Lexeme != "_" { @@ -5847,6 +5999,25 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu return SimpleSuccess() } + // Enum constructor: a bare member word consumes its payload + // (if any) from the stack and pushes the enum value. + if state.EnumMembers != nil { + if info, ok := state.EnumMembers[t.Lexeme]; ok { + var payload []MShellObject + if info.Arity > 0 { + if len(*stack) < info.Arity { + return state.FailWithMessage(fmt.Sprintf("%d:%d: enum constructor '%s' needs %d payload value(s) on the stack.\n", t.Line, t.Column, t.Lexeme, info.Arity)) + } + payload = make([]MShellObject, info.Arity) + for i := info.Arity - 1; i >= 0; i-- { + payload[i], _ = stack.Pop() + } + } + stack.Push(&MShellEnum{EnumName: info.EnumName, Member: t.Lexeme, MemberIndex: info.Ordinal, Payload: payload}) + return SimpleSuccess() + } + } + if t.Lexeme == "stack" { // Print current stack fmt.Fprint(os.Stderr, stack.String()) @@ -9303,7 +9474,10 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot do 'toJson' operation on an empty stack.\n", t.Line, t.Column)) } - jsonStr := obj1.ToJson() + jsonStr, cycled := renderValueDetect(obj1, flavorJson) + if cycled { + return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot convert a cyclic value (a container that 
contains itself) to JSON.\n", t.Line, t.Column)) + } stack.Push(MShellString{jsonStr}) } else if t.Lexeme == "typeof" { obj1, err := stack.Pop() @@ -9682,7 +9856,7 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu floatsSeen := make(map[float64]any) dateTimesSeen := make(map[time.Time]any) - for i, item := range listObj.Items { + for _, item := range listObj.Items { switch itemTyped := item.(type) { case MShellString: strItem := itemTyped @@ -9723,7 +9897,23 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu stringsSeen[literalItem.LiteralText] = nil } default: - return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot remove duplicates from a list with a %s at index %d (%s).\n", t.Line, t.Column, item.TypeName(), i, item.DebugString())) + // Any value without a fast hash path (enum, list, + // dict, bool, bytes, ...) is deduplicated by + // structural equality against the values kept so + // far. O(n^2) for these, but it lets `uniq` accept + // every value type, matching its `([t] -- [t])` + // signature so the type checker never accepts a + // `uniq` that fails at runtime. + seen := false + for _, kept := range newList.Items { + if eq, _ := item.Equals(kept); eq { + seen = true + break + } + } + if !seen { + newList.Items = append(newList.Items, item) + } } } @@ -11246,6 +11436,13 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu stack.Push(MShellLiteral{t.Lexeme}) } + } else if t.Type == UNDERSCORE { + // A lone `_` is the pattern wildcard; in a list it is the + // literal argv word "_", and nowhere else does it have meaning. 
+ if callStackItem.CallStackType != CALLSTACKLIST { + return state.FailWithMessage(fmt.Sprintf("%d:%d: '_' is reserved as the match wildcard; use \"_\" for a literal underscore string.\n", t.Line, t.Column)) + } + stack.Push(MShellLiteral{"_"}) } else if t.Type == ASTERISK { obj1, err := stack.Pop() if err != nil { @@ -12319,7 +12516,11 @@ func (state *EvalState) evaluateToken(t Token, stack *MShellStack, context Execu return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot convert an empty stack to a string.\n", t.Line, t.Column)) } - stack.Push(MShellString{obj.ToString()}) + strVal, cycled := renderValueDetect(obj, flavorStr) + if cycled { + return state.FailWithMessage(fmt.Sprintf("%d:%d: Cannot convert a cyclic value (a container that contains itself) to a string.\n", t.Line, t.Column)) + } + stack.Push(MShellString{strVal}) } else if t.Type == INDEXER { // Token Type obj1, err := stack.Pop() if err != nil { diff --git a/mshell/Lexer.go b/mshell/Lexer.go index 7a641f4b..66eee988 100644 --- a/mshell/Lexer.go +++ b/mshell/Lexer.go @@ -109,6 +109,8 @@ const ( TRY FAIL_KEYWORD PURE + ENUM + UNDERSCORE // a lone `_`: the match/binding wildcard, reserved as a name ) func (t TokenType) String() string { @@ -283,6 +285,10 @@ func (t TokenType) String() string { return "AS" case TYPE: return "TYPE" + case ENUM: + return "ENUM" + case UNDERSCORE: + return "UNDERSCORE" case TRY: return "TRY" case FAIL_KEYWORD: @@ -394,6 +400,7 @@ func (l *Lexer) makeToken(tokenType TokenType) Token { Start: l.start, Lexeme: lexeme, Type: tokenType, + TokenFile: l.tokenFile, } } @@ -556,6 +563,14 @@ func (l *Lexer) literalOrKeywordType() TokenType { } return l.checkKeyword(2, "se", ELSE) case 'n': + if l.curLen() > 2 { + switch l.input[l.start+2] { + case 'd': + return l.checkKeyword(3, "", END) + case 'u': + return l.checkKeyword(3, "m", ENUM) + } + } return l.checkKeyword(2, "d", END) } } @@ -641,6 +656,13 @@ func (l *Lexer) literalOrKeywordType() TokenType { return VARSTORE } + // A 
lone `_` is the wildcard token, not an identifier — this is what + // reserves it as a name (it can't be an enum member, def, etc.) and keeps + // the checker and runtime from disagreeing about what `_` means. + if l.current-l.start == 1 && l.input[l.start] == '_' { + return UNDERSCORE + } + return LITERAL } diff --git a/mshell/MShellObject.go b/mshell/MShellObject.go index f68309f6..54ed3d17 100644 --- a/mshell/MShellObject.go +++ b/mshell/MShellObject.go @@ -7,6 +7,7 @@ import ( "fmt" "golang.org/x/net/html" "os" + "reflect" "regexp" "slices" "sort" @@ -160,7 +161,7 @@ func (b MShellBinary) Equals(other MShellObject) (bool, error) { return true, nil } } - return false, fmt.Errorf("Cannot compare Binary with %s.\n", other.TypeName()) + return false, nil } func (b MShellBinary) CastString() (string, error) { @@ -199,10 +200,7 @@ func (m Maybe) CommandLine() string { // This is meant for things like error messages, should be limited in length to 30 chars or so. func (m Maybe) DebugString() string { - if m.obj == nil { - return "None" - } - return fmt.Sprintf("Maybe(%s)", m.obj.DebugString()) + return renderValue(m, flavorDebug) } func (m Maybe) Index(index int) (MShellObject, error) { return nil, fmt.Errorf("Cannot index into a Maybe.\n") @@ -221,17 +219,11 @@ func (m Maybe) Slice(startInc int, endExc int) (MShellObject, error) { } func (m Maybe) ToJson() string { - if m.obj == nil { - return "null" - } - return m.obj.ToJson() + return renderValue(m, flavorJson) } func (m Maybe) ToString() string { - if m.obj == nil { - return "None" - } - return fmt.Sprintf("Just(%s)", m.obj.ToString()) + return renderValue(m, flavorStr) } func (m Maybe) IndexErrStr() string { @@ -243,21 +235,7 @@ func (m Maybe) Concat(other MShellObject) (MShellObject, error) { } func (m Maybe) Equals(other MShellObject) (bool, error) { - otherMaybe, ok := other.(Maybe) - if !ok { - return false, nil - } - - if m.obj == nil && otherMaybe.obj == nil { - return true, nil - } - - if m.obj == nil || 
otherMaybe.obj == nil { - return false, nil - } - - equal, err := m.obj.Equals(otherMaybe.obj) - return equal, err + return equalsIter(m, other) } func (m Maybe) CastString() (string, error) { @@ -341,6 +319,71 @@ func (n MShellNull) CastString() (string, error) { // }}} +// Enum {{{ + +// MShellEnum is a value of a user-declared `enum` (a generative tagged sum +// type): the enum's declared name, the chosen member, and the member's payload +// values (nil for a nullary member). Member names are unique across enums, so +// the member identifies the value; the enum name rides along for diagnostics +// and `match`. +type MShellEnum struct { + EnumName string + Member string + // MemberIndex is the member's 0-based position in its enum declaration. + // Sorting orders enum values by this (declaration order) rather than by + // member name, so an ordered enum (`low | medium | high`) sorts in the + // author's intended order. Stamped at construction from the enum registry. + MemberIndex int + Payload []MShellObject +} + +func (e *MShellEnum) TypeName() string { return e.EnumName } +func (e *MShellEnum) IsCommandLineable() bool { return true } +func (e *MShellEnum) IsNumeric() bool { return false } +func (e *MShellEnum) FloatNumeric() float64 { return 0 } +func (e *MShellEnum) CommandLine() string { return renderValue(e, flavorStr) } +func (e *MShellEnum) DebugString() string { return renderValue(e, flavorStr) } + +func (e *MShellEnum) Index(index int) (MShellObject, error) { + return nil, fmt.Errorf("Cannot index into an enum.\n") +} + +func (e *MShellEnum) SliceStart(startInclusive int) (MShellObject, error) { + return nil, fmt.Errorf("Cannot slice an enum.\n") +} + +func (e *MShellEnum) SliceEnd(end int) (MShellObject, error) { + return nil, fmt.Errorf("Cannot slice an enum.\n") +} + +func (e *MShellEnum) Slice(startInc int, endExc int) (MShellObject, error) { + return nil, fmt.Errorf("Cannot slice an enum.\n") +} + +// ToJson uses serde's externally-tagged convention 
— the de-facto standard for +// tagged unions in JSON: a nullary member is the bare member string; a member +// with a single payload is `{"member": value}`; with several, `{"member": +// [v0, v1, ...]}`. Rendering runs on renderValue's shared work stack, so an +// arbitrarily deep value cannot overflow the call stack. +func (e *MShellEnum) ToJson() string { + return renderValue(e, flavorJson) +} + +func (e *MShellEnum) ToString() string { return renderValue(e, flavorStr) } +func (e *MShellEnum) IndexErrStr() string { return "" } + +func (e *MShellEnum) Concat(other MShellObject) (MShellObject, error) { + return nil, fmt.Errorf("Cannot concatenate an enum.\n") +} + +func (e *MShellEnum) Equals(other MShellObject) (bool, error) { + return equalsIter(e, other) +} + +func (e *MShellEnum) CastString() (string, error) { return e.Member, nil } + +// }}} + // Date time {{{ type MShellDateTime struct { @@ -463,16 +506,7 @@ func (*MShellDict) CommandLine() string { // This is meant for things like error messages, should be limited in length to 30 chars or so. 
func (d *MShellDict) DebugString() string { - // TODO: implement this - - sb := strings.Builder{} - sb.WriteString("Dictionary{") - for key, value := range d.Items { - sb.WriteString(fmt.Sprintf("%s: %s, ", key, value.DebugString())) - } - sb.WriteString("}") - return sb.String() - + return renderValue(d, flavorDebug) } func (*MShellDict) Index(index int) (MShellObject, error) { return nil, fmt.Errorf("Cannot index into a dictionary.\n") @@ -488,43 +522,7 @@ func (*MShellDict) Slice(startInc int, endExc int) (MShellObject, error) { return nil, fmt.Errorf("Cannot slice a dictionary.\n") } func (d *MShellDict) ToJson() string { - var sb strings.Builder - - if len(d.Items) == 0 { - return "{}" - } - - if len(d.Items) == 1 { - for key, value := range d.Items { - keyEnc, _ := json.Marshal(key) - return fmt.Sprintf("{%s: %s}", string(keyEnc), value.ToJson()) - } - } - - keys := make([]string, 0, len(d.Items)) - for key := range d.Items { - keys = append(keys, key) - } - sort.Strings(keys) - - sb.WriteString("{") - - // Write the first key-value pair - firstKey := keys[0] - firstValue := d.Items[firstKey] - - firstKeyEnc, _ := json.Marshal(firstKey) - sb.WriteString(fmt.Sprintf("%s: %s", string(firstKeyEnc), firstValue.ToJson())) - - for _, key := range keys[1:] { - value := d.Items[key] - keyEnc, _ := json.Marshal(key) - sb.WriteString(fmt.Sprintf(", %s: %s", string(keyEnc), value.ToJson())) - } - - sb.WriteString("}") - - return sb.String() + return renderValue(d, flavorJson) } func (d *MShellDict) ToString() string { // This is what is used with 'str' command @@ -540,51 +538,7 @@ func (*MShellDict) Concat(other MShellObject) (MShellObject, error) { } func (thisDict *MShellDict) Equals(other MShellObject) (bool, error) { - thisKeys := make([]string, 0, len(thisDict.Items)) - for key := range thisDict.Items { - thisKeys = append(thisKeys, key) - } - sort.Strings(thisKeys) - - otherDict, ok := other.(*MShellDict) - if !ok { - return false, nil - } - - otherKeys := 
make([]string, 0, len(otherDict.Items)) - for key := range otherDict.Items { - otherKeys = append(otherKeys, key) - } - sort.Strings(otherKeys) - - if len(thisKeys) != len(otherKeys) { - return false, nil - } - - for i, key := range thisKeys { - if key != otherKeys[i] { - return false, nil - } - } - - for _, key := range thisKeys { - thisValue := thisDict.Items[key] - otherValue := otherDict.Items[key] - - if thisValue.TypeName() != otherValue.TypeName() { - return false, nil - } - - equal, err := thisValue.Equals(otherValue) - if err != nil { - return false, err - } - if !equal { - return false, nil - } - } - - return true, nil + return equalsIter(thisDict, other) } // This is meant for completely unambiougous conversion to a string value. @@ -802,46 +756,654 @@ func NewList(initLength int) *MShellList { } // Sort the list. Returns an error if any item cannot be cast to a string. -func SortList(list *MShellList) (*MShellList, error) { - stringsToSort := make([]string, len(list.Items)) - for i, item := range list.Items { - str, err := item.CastString() - if err != nil { - return nil, fmt.Errorf("Cannot sort a list with a %s inside (%s).\n", item.TypeName(), item.DebugString()) +// valueTypeRank assigns each value kind a fixed slot in the cross-type sort +// order, so a list mixing types still sorts totally and deterministically. The +// exact sequence is arbitrary but stable; within a rank, compareValues uses the +// value's natural order. Text kinds (str/path/literal) share a rank and compare +// by content, matching structural equality. 
+func valueTypeRank(obj MShellObject) int { + switch obj.(type) { + case MShellNull: + return 0 + case MShellBool: + return 1 + case MShellInt, MShellFloat: + return 2 + case MShellString, MShellPath, MShellLiteral: + return 3 + case *MShellDateTime: + return 4 + case MShellBinary: + return 5 + case Maybe, *Maybe: + return 6 + case *MShellList: + return 7 + case *MShellDict: + return 8 + case *MShellEnum: + return 9 + default: + return 10 + } +} + +func cmpInt(a, b int) int { + if a < b { + return -1 + } + if a > b { + return 1 + } + return 0 +} + +func cmpFloat(a, b float64) int { + if a < b { + return -1 + } + if a > b { + return 1 + } + return 0 +} + +// numericFloat returns an int/float value as a float64 for cross-type numeric +// comparison. Only called for MShellInt / MShellFloat. +func numericFloat(obj MShellObject) float64 { + switch v := obj.(type) { + case MShellInt: + return float64(v.Value) + case MShellFloat: + return v.Value + } + return 0 +} + +// textContent returns the underlying string of a text-kind value +// (str / path / literal). Only called for those types. +func textContent(obj MShellObject) string { + switch v := obj.(type) { + case MShellString: + return v.Content + case MShellPath: + return v.Path + case MShellLiteral: + return v.LiteralText + } + return "" +} + +func asMaybe(obj MShellObject) (Maybe, bool) { + switch v := obj.(type) { + case Maybe: + return v, true + case *Maybe: + return *v, true + } + return Maybe{}, false +} + +func sortedDictKeys(m map[string]MShellObject) []string { + keys := make([]string, 0, len(m)) + for k := range m { + keys = append(keys, k) + } + sort.Strings(keys) + return keys +} + +// isRefKind reports whether obj's dynamic type is a pointer — a value with +// heap identity. 
Only these kinds can form shared substructure or reference +// cycles, and only these are safe in interface comparisons and as map keys: a +// value kind may wrap a non-comparable type (MShellBinary, a []byte), which +// panics at runtime. Checking the dynamic kind instead of enumerating types +// means a newly added pointer kind is covered with no list to keep in sync. +func isRefKind(obj MShellObject) bool { + return obj != nil && reflect.TypeOf(obj).Kind() == reflect.Pointer +} + +// sameRef reports whether a and b are the identical heap object (a value built +// as `@t @t node` reuses one subtree twice). A pointer-identical pair is equal +// by definition, so equality and ordering walks skip it instead of expanding +// it — without this, walking a value with n levels of sharing costs 2^n. +// Interface equality is safe here: if b's dynamic type differs from a's the +// comparison is false without inspecting values, and if it matches, isRefKind +// guarantees it is a comparable pointer type. +func sameRef(a, b MShellObject) bool { + return isRefKind(a) && a == b +} + +// dagGuard bounds a comparison walk over values with shared substructure that +// sameRef alone cannot catch: whenever the two operands are not the same +// pointer (a value compared against an independently built copy, or two +// distinct subtrees each shared internally), repeated substructure produces +// repeated *pairs*, every one re-expands, and the walk goes exponential. The +// guard counts pops; once a walk runs long enough to suggest blowup, it +// memoizes the pointer pairs it has already expanded and skips repeats. This +// is the single mechanism for the whole blowup class: duplicate expansion +// below the threshold is capped by the threshold itself, and past it every +// repeated pair memo-hits. 
+// +// Skipping a repeated pair is sound in a LIFO walk: the first occurrence's +// entire expansion resolves before any later duplicate (which sat lower in the +// stack) pops, and a mismatch anywhere returns from the walk immediately — so +// if a duplicate pops at all, its subtree already compared equal. +// +// Ordinary comparisons never allocate: below the step threshold the guard is +// one integer increment. Past it, the memo grows without bound — see skip for +// why an unbounded memo is the correct trade (a bounded one turns "assumption +// exceeded" into exponential time). +type dagGuard struct { + steps int + memo map[refPair]bool +} + +type refPair struct{ a, b MShellObject } + +const dagStepThreshold = 1 << 19 + +// skip reports whether this pair was already expanded earlier in the walk. +// Call once per popped pair; it records the pair (past the threshold) so +// later duplicates skip. +// +// The memo is deliberately UNBOUNDED. Every revisited pointer pair memo-hits, +// which makes a comparison polynomial in actual heap nodes for any sharing +// pattern — self-doubling, alternating, cross-parent diamonds, any container +// mix, any depth. Earlier versions capped the memo to bound memory and then +// patched the resulting exponential cliffs case by case (generational +// eviction, per-container dedup); any bounded memo loses to a working set +// larger than its bound (measured, not theorized), so no cap. Memory tracks +// pairs actually walked: it activates only past the step threshold, so +// ordinary comparisons never allocate, and a comparison large enough to build +// a big memo already holds operands larger than the memo itself. +func (g *dagGuard) skip(a, b MShellObject) bool { + g.steps++ + if g.steps < dagStepThreshold { + return false + } + // Only pointer kinds get keys: repeated pairs of anything else cannot + // cause blowup, and only pointers are guaranteed comparable as map keys. 
+ if !isRefKind(a) || !isRefKind(b) { + return false + } + key := refPair{a, b} + if g.memo == nil { + g.memo = make(map[refPair]bool, 1024) + } + if g.memo[key] { + return true + } + g.memo[key] = true + return false +} + +// renderFlavor selects which of a value's three textual forms renderValue +// emits: flavorStr is ToString (the `str` form), flavorDebug is DebugString +// (stack dumps, list display), flavorJson is ToJson. Containers pick their +// children's flavor the same way the per-type methods always did: a list +// renders children as DebugString, a dict's `str` form is its JSON form, an +// enum renders payloads with ToString, and Maybe keeps its own flavor. +type renderFlavor uint8 + +const ( + flavorStr renderFlavor = iota + flavorDebug + flavorJson +) + +type renderTask struct { + lit string + obj MShellObject + flavor renderFlavor + isLit bool + // isExit marks the sentinel popped after a container's children have + // rendered; it removes the container from the on-path cycle set. + isExit bool +} + +func renderLit(s string) renderTask { return renderTask{lit: s, isLit: true} } + +// renderJoin builds the task sequence `open item0 sep item1 sep ... close`, +// rendering each item in the given flavor. +func renderJoin(open, sep, close string, items []MShellObject, flavor renderFlavor) []renderTask { + seq := make([]renderTask, 0, len(items)*2+2) + if open != "" { + seq = append(seq, renderLit(open)) + } + for i, it := range items { + if i > 0 { + seq = append(seq, renderLit(sep)) + } + seq = append(seq, renderTask{obj: it, flavor: flavor}) + } + if close != "" { + seq = append(seq, renderLit(close)) + } + return seq +} + +// renderValue renders a value in the requested flavor. It is total: a cyclic +// value renders with a `` marker at the back-reference, which keeps +// internal rendering (error messages, stack dumps) from hanging. 
User-facing +// operations (`str`, `toJson`) call renderValueDetect instead and report a +// cyclic value as an error — mshell is strict, so a cycle is always the +// degenerate result of appending a container into itself, not a value with a +// meaningful rendering. +func renderValue(root MShellObject, flavor renderFlavor) string { + s, _ := renderValueDetect(root, flavor) + return s +} + +// renderValueDetect renders a value in the requested flavor with one explicit +// work stack instead of method recursion, expanding every container kind — +// enum, Maybe, list, dict, pipe — inline. Arbitrarily deep values therefore +// cannot overflow the call stack even when kinds alternate (enum→Maybe→enum, +// ...), which per-type iterative renderers could not guarantee: each one +// delegated other kinds to the child's own recursive method. Leaf kinds +// (scalars, grids, quotations) still render via their own methods; their +// nesting depth is bounded by their own structure. +// +// Containers currently being expanded are tracked as an on-path set; reaching +// one again is a true reference cycle (a DAG merely revisits a finished +// pointer, which is fine), so the walk emits `` instead of descending +// and reports cycled=true. +func renderValueDetect(root MShellObject, flavor renderFlavor) (string, bool) { + var sb strings.Builder + cycled := false + var onPath map[MShellObject]bool + stack := []renderTask{{obj: root, flavor: flavor}} + // push schedules seq to pop in order (reversed onto the LIFO stack). + push := func(seq []renderTask) { + for i := len(seq) - 1; i >= 0; i-- { + stack = append(stack, seq[i]) } - stringsToSort[i] = str } + // enter marks t.obj as on the current path and schedules its removal + // after seq (the container's children) has fully rendered. Only pointer + // kinds are tracked — value kinds are copied and cannot be revisited by + // identity, so they cannot sit on a reference cycle. 
+ enter := func(obj MShellObject, seq []renderTask) []renderTask { + if !isRefKind(obj) { + return seq + } + if onPath == nil { + onPath = make(map[MShellObject]bool, 8) + } + onPath[obj] = true + return append(seq, renderTask{obj: obj, isExit: true}) + } + for len(stack) > 0 { + t := stack[len(stack)-1] + stack = stack[:len(stack)-1] + if t.isLit { + sb.WriteString(t.lit) + continue + } + if t.isExit { + delete(onPath, t.obj) + continue + } + // Only pointer kinds are ever on the path; the guard also keeps + // unhashable dynamic types (MShellBinary, a []byte) away from the + // map lookup, which would panic even on a read. + if isRefKind(t.obj) && onPath[t.obj] { + sb.WriteString("") + cycled = true + continue + } + if m, ok := asMaybe(t.obj); ok { + switch { + case m.IsNone() && t.flavor == flavorJson: + sb.WriteString("null") + case m.IsNone(): + sb.WriteString("None") + case t.flavor == flavorJson: + push(enter(t.obj, []renderTask{{obj: m.obj, flavor: flavorJson}})) + case t.flavor == flavorDebug: + push(enter(t.obj, []renderTask{renderLit("Maybe("), {obj: m.obj, flavor: flavorDebug}, renderLit(")")})) + default: + push(enter(t.obj, []renderTask{renderLit("Just("), {obj: m.obj, flavor: flavorStr}, renderLit(")")})) + } + continue + } + switch v := t.obj.(type) { + case *MShellEnum: + if t.flavor == flavorJson { + // serde's externally-tagged convention: a nullary member is + // the bare member string, one payload is {"member": value}, + // several are {"member": [v0, v1, ...]}. + if len(v.Payload) == 0 { + fmt.Fprintf(&sb, "%q", v.Member) + continue + } + seq := make([]renderTask, 0, len(v.Payload)*2+4) + seq = append(seq, renderLit(fmt.Sprintf("{%q: ", v.Member))) + if len(v.Payload) == 1 { + seq = append(seq, renderTask{obj: v.Payload[0], flavor: flavorJson}) + } else { + seq = append(seq, renderJoin("[", ", ", "]", v.Payload, flavorJson)...) 
+ } + seq = append(seq, renderLit("}")) + push(enter(t.obj, seq)) + continue + } + // `member` (nullary) or `member(p0 p1 ...)`, payloads as ToString. + if len(v.Payload) == 0 { + sb.WriteString(v.Member) + continue + } + push(enter(t.obj, renderJoin(v.Member+"(", " ", ")", v.Payload, flavorStr))) + case *MShellList: + if t.flavor == flavorJson { + push(enter(t.obj, renderJoin("[", ", ", "]", v.Items, flavorJson))) + } else { + push(enter(t.obj, renderJoin("[", " ", "]", v.Items, flavorDebug))) + } + case *MShellPipe: + if t.flavor == flavorJson { + push(enter(t.obj, renderJoin("[", ", ", "]", v.List.Items, flavorJson))) + } else { + push(enter(t.obj, renderJoin("", " | ", "", v.List.Items, flavorDebug))) + } + case *MShellDict: + keys := sortedDictKeys(v.Items) + if t.flavor == flavorDebug { + seq := make([]renderTask, 0, len(keys)*3+2) + seq = append(seq, renderLit("Dictionary{")) + for _, k := range keys { + seq = append(seq, renderLit(k+": "), renderTask{obj: v.Items[k], flavor: flavorDebug}, renderLit(", ")) + } + seq = append(seq, renderLit("}")) + push(enter(t.obj, seq)) + continue + } + // The `str` form of a dict is its JSON form. 
+ if len(keys) == 0 { + sb.WriteString("{}") + continue + } + seq := make([]renderTask, 0, len(keys)*2+2) + seq = append(seq, renderLit("{")) + for i, k := range keys { + keyEnc, _ := json.Marshal(k) + if i > 0 { + seq = append(seq, renderLit(", ")) + } + seq = append(seq, renderLit(string(keyEnc)+": "), renderTask{obj: v.Items[k], flavor: flavorJson}) + } + seq = append(seq, renderLit("}")) + push(enter(t.obj, seq)) + default: + switch t.flavor { + case flavorDebug: + sb.WriteString(t.obj.DebugString()) + case flavorJson: + sb.WriteString(t.obj.ToJson()) + default: + sb.WriteString(t.obj.ToString()) + } + } + } + return sb.String(), cycled +} - // Sort the strings - sort.Strings(stringsToSort) +// equalsIter is structural equality over any two values, walked with one +// explicit pair stack that expands every container kind — enum, Maybe, list, +// dict, pipe — inline, so deep values cannot overflow the call stack even +// when kinds alternate. Pointer-identical pairs are skipped (equal by +// definition), and past a step threshold already-expanded pairs are memoized +// (see dagGuard), so shared substructure cannot blow up exponentially. Leaf +// kinds compare via their own Equals. +type eqPair struct{ a, b MShellObject } - // Create a new list and add the sorted strings to it - newList := NewList(0) - for _, str := range stringsToSort { - newList.Items = append(newList.Items, MShellString{str}) +// pushPairs pushes element-wise comparison pairs onto the walk stack. +// Duplicate pairs from shared substructure are not filtered here; the +// dagGuard memo handles them (see dagGuard). 
+func pushPairs(stack []eqPair, as, bs []MShellObject) []eqPair { + for i := range as { + stack = append(stack, eqPair{a: as[i], b: bs[i]}) } + return stack +} + +func equalsIter(a, b MShellObject) (bool, error) { + var guard dagGuard + stack := []eqPair{{a: a, b: b}} + for len(stack) > 0 { + p := stack[len(stack)-1] + stack = stack[:len(stack)-1] + if sameRef(p.a, p.b) || guard.skip(p.a, p.b) { + continue + } + if am, aok := asMaybe(p.a); aok { + bm, bok := asMaybe(p.b) + if !bok || am.IsNone() != bm.IsNone() { + return false, nil + } + if !am.IsNone() { + stack = append(stack, eqPair{a: am.obj, b: bm.obj}) + } + continue + } + switch av := p.a.(type) { + case *MShellEnum: + bv, ok := p.b.(*MShellEnum) + if !ok || av.EnumName != bv.EnumName || av.Member != bv.Member || len(av.Payload) != len(bv.Payload) { + return false, nil + } + stack = pushPairs(stack, av.Payload, bv.Payload) + case *MShellList: + bv, ok := p.b.(*MShellList) + if !ok || len(av.Items) != len(bv.Items) { + return false, nil + } + stack = pushPairs(stack, av.Items, bv.Items) + case *MShellPipe: + bv, ok := p.b.(*MShellPipe) + if !ok || len(av.List.Items) != len(bv.List.Items) { + return false, nil + } + stack = pushPairs(stack, av.List.Items, bv.List.Items) + case *MShellDict: + bv, ok := p.b.(*MShellDict) + if !ok || len(av.Items) != len(bv.Items) { + return false, nil + } + for key, aval := range av.Items { + bval, ok := bv.Items[key] + if !ok { + return false, nil + } + stack = append(stack, eqPair{a: aval, b: bval}) + } + default: + eq, err := p.a.Equals(p.b) + if err != nil || !eq { + return eq, err + } + } + } + return true, nil +} + +// compareValues returns -1, 0, or 1, giving a total order over every value +// type. Different kinds are ordered by a fixed type rank (valueTypeRank); within +// a kind the natural order is used (numbers numerically with int/float +// interleaved, text lexically, dates chronologically, bytes bytewise). 
+// Structured values compare lexicographically: lists positionally (shorter +// prefix first), dicts by sorted key then value, enums by name then declaration +// order then payloads. For those kinds the order agrees with structural +// equality (compareValues returns 0 when the two values are Equals), with one +// caveat: an int and a float of the same numeric value compare 0 here, while +// Equals treats them as different. +// Unorderable kinds (quotation, pipe, grid, ...) are the exception: they share a +// rank and always compare 0, so a stable sort preserves their original order, +// while Equals still distinguishes them (identity for quotations, element-wise +// for pipes, cell-wise for grids). +// +// The comparison is driven by an explicit work stack rather than recursion, so +// arbitrarily deep values (e.g. a long `node(node(...))` enum chain) cannot +// overflow the call stack. Each task is either a pair of values to compare or a +// precomputed literal result (used for length tiebreaks and dict key / enum +// name comparisons). Pending tasks pop in lexicographic order; the first +// non-zero result short-circuits. Children of a compound value are pushed on top +// of that value's own length-tiebreak, so the tiebreak is only reached when the +// whole prefix compared equal. +func compareValues(a, b MShellObject) int { + type task struct { + a, b MShellObject + lit int + isLit bool + } + var guard dagGuard + stack := []task{{a: a, b: b}} + for len(stack) > 0 { + t := stack[len(stack)-1] + stack = stack[:len(stack)-1] + if t.isLit { + if t.lit != 0 { + return t.lit + } + continue + } + // Shared substructure: a pointer-identical pair compares 0 by + // definition, and a pair this walk already expanded proved 0 (any + // non-zero would have returned; see dagGuard). Skipping both keeps + // DAG-shaped values linear instead of 2^n. + if sameRef(t.a, t.b) || guard.skip(t.a, t.b) { + continue + } + ra, rb := valueTypeRank(t.a), valueTypeRank(t.b) + if ra != rb { + return cmpInt(ra, rb) + } + switch av := t.a.(type) { + case MShellNull: + // Two nulls are equal; move to the next task.
+ case MShellBool: + bv := t.b.(MShellBool) + if av.Value != bv.Value { + if !av.Value { // false < true + return -1 + } + return 1 + } + case MShellInt: + if bv, ok := t.b.(MShellInt); ok { + if c := cmpInt(av.Value, bv.Value); c != 0 { + return c + } + } else if c := cmpFloat(numericFloat(t.a), numericFloat(t.b)); c != 0 { + return c + } + case MShellFloat: + if c := cmpFloat(numericFloat(t.a), numericFloat(t.b)); c != 0 { + return c + } + case MShellString, MShellPath, MShellLiteral: + if c := strings.Compare(textContent(t.a), textContent(t.b)); c != 0 { + return c + } + case *MShellDateTime: + bt := t.b.(*MShellDateTime).Time + if av.Time.Before(bt) { + return -1 + } + if av.Time.After(bt) { + return 1 + } + case MShellBinary: + if c := bytes.Compare(av, t.b.(MShellBinary)); c != 0 { + return c + } + case Maybe, *Maybe: + am, _ := asMaybe(t.a) + bm, _ := asMaybe(t.b) + an, bn := am.IsNone(), bm.IsNone() + if an != bn { + if an { // none < just + return -1 + } + return 1 + } + if !an { // both `just`: compare payloads + stack = append(stack, task{a: am.obj, b: bm.obj}) + } + case *MShellList: + bl := t.b.(*MShellList) + n := min(len(av.Items), len(bl.Items)) + stack = append(stack, task{lit: cmpInt(len(av.Items), len(bl.Items)), isLit: true}) + for i := n - 1; i >= 0; i-- { + stack = append(stack, task{a: av.Items[i], b: bl.Items[i]}) + } + case *MShellDict: + bd := t.b.(*MShellDict) + ak := sortedDictKeys(av.Items) + bk := sortedDictKeys(bd.Items) + n := min(len(ak), len(bk)) + stack = append(stack, task{lit: cmpInt(len(ak), len(bk)), isLit: true}) + for i := n - 1; i >= 0; i-- { + // Pushed so `key compare` pops before its `value compare`. 
+ stack = append(stack, task{a: av.Items[ak[i]], b: bd.Items[bk[i]]}) + stack = append(stack, task{lit: strings.Compare(ak[i], bk[i]), isLit: true}) + } + case *MShellEnum: + be := t.b.(*MShellEnum) + n := min(len(av.Payload), len(be.Payload)) + stack = append(stack, task{lit: cmpInt(len(av.Payload), len(be.Payload)), isLit: true}) + for i := n - 1; i >= 0; i-- { + stack = append(stack, task{a: av.Payload[i], b: be.Payload[i]}) + } + // Name and member (declaration order) compare before any payload. + stack = append(stack, task{lit: cmpInt(av.MemberIndex, be.MemberIndex), isLit: true}) + stack = append(stack, task{lit: strings.Compare(av.EnumName, be.EnumName), isLit: true}) + default: + // Unorderable kinds (quotation, pipe, grid, ...) share a rank and + // compare equal, so a stable sort leaves them in their original + // relative order. + } + } + return 0 +} + +// SortList returns a new list with the same elements sorted by the total order +// compareValues defines. Element identity and type are preserved (a list of +// ints stays ints, enum payloads are kept) — sorting only reorders. +func SortList(list *MShellList) (*MShellList, error) { + newItems := make([]MShellObject, len(list.Items)) + copy(newItems, list.Items) + sort.SliceStable(newItems, func(i, j int) bool { + return compareValues(newItems[i], newItems[j]) < 0 + }) + newList := NewList(0) + newList.Items = newItems CopyListParams(list, newList) return newList, nil } -// Sort the list. Returns an error if any item cannot be cast to a string. +// SortListFunc sorts by a string key (each element's CastString) using the given +// string comparer — used for version sort. Original elements are preserved in +// the result. Returns an error if any element cannot be cast to a string. 
func SortListFunc(list *MShellList, cmp func(a string, b string) int) (*MShellList, error) { - stringsToSort := make([]string, len(list.Items)) + type keyed struct { + key string + obj MShellObject + } + items := make([]keyed, len(list.Items)) for i, item := range list.Items { str, err := item.CastString() if err != nil { return nil, fmt.Errorf("Cannot sort a list with a %s inside (%s).\n", item.TypeName(), item.DebugString()) } - stringsToSort[i] = str + items[i] = keyed{key: str, obj: item} } - // Sort the strings to function - slices.SortFunc(stringsToSort, cmp) + slices.SortStableFunc(items, func(a, b keyed) int { + return cmp(a.key, b.key) + }) - // Create a new list and add the sorted strings to it newList := NewList(0) - for _, str := range stringsToSort { - newList.Items = append(newList.Items, MShellString{str}) + for _, it := range items { + newList.Items = append(newList.Items, it.obj) } CopyListParams(list, newList) return newList, nil @@ -1129,19 +1691,6 @@ func (obj MShellFloat) CommandLine() string { return strconv.FormatFloat(obj.Value, 'f', -1, 64) } -// DebugString -func DebugStrs(objs []MShellObject) []string { - debugStrs := make([]string, len(objs)) - for i, obj := range objs { - if obj == nil { - debugStrs[i] = "nil" - } else { - debugStrs[i] = obj.DebugString() - } - } - return debugStrs -} - func (obj MShellLiteral) DebugString() string { return obj.LiteralText } @@ -1175,8 +1724,8 @@ func (obj *MShellQuotation) DebugString() string { } func (obj *MShellList) DebugString() string { - // Join the tokens with a space, surrounded by '[' and ']' - return "[" + strings.Join(DebugStrs(obj.Items), " ") + "]" + // Elements joined with a space, surrounded by '[' and ']' + return renderValue(obj, flavorDebug) } func cleanStringForTerminal(input string) string { @@ -1221,8 +1770,8 @@ func (obj MShellPath) DebugString() string { } func (obj *MShellPipe) DebugString() string { - // Join each item with a ' | ' - return 
strings.Join(DebugStrs(obj.List.Items), " | ") + // Each item joined with ' | ' + return renderValue(obj, flavorDebug) } func (obj MShellInt) DebugString() string { @@ -1718,17 +2267,7 @@ func (obj *MShellQuotation) ToJson() string { } func (obj *MShellList) ToJson() string { - builder := strings.Builder{} - builder.WriteString("[") - if len(obj.Items) > 0 { - builder.WriteString(obj.Items[0].ToJson()) - for _, item := range obj.Items[1:] { - builder.WriteString(", ") - builder.WriteString(item.ToJson()) - } - } - builder.WriteString("]") - return builder.String() + return renderValue(obj, flavorJson) } func (obj MShellString) ToJson() string { @@ -1949,62 +2488,66 @@ func (obj MShellLiteral) Equals(other MShellObject) (bool, error) { case MShellPath: return obj.LiteralText == o.Path, nil default: - return false, fmt.Errorf("Cannot compare a literal with a %s.\n", other.TypeName()) + return false, nil } } func (obj MShellBool) Equals(other MShellObject) (bool, error) { asBool, ok := other.(MShellBool) if !ok { - return false, fmt.Errorf("Cannot compare a boolean with a %s.\n", other.TypeName()) + return false, nil } return obj.Value == asBool.Value, nil } + func (obj *MShellQuotation) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for quotations.\n") + // Quotations are code values; two are equal only when they are the same + // quotation object (reference identity). + o, ok := other.(*MShellQuotation) + return ok && obj == o, nil } func (obj *MShellList) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for lists.\n") + return equalsIter(obj, other) } func (obj MShellString) Equals(other MShellObject) (bool, error) { - // Define equality for other as string or as literal. - switch other.(type) { + // str/path/literal compare by their text content (the `=` overloads + // permit str/path comparison); any other type is simply not equal. 
+ switch o := other.(type) { case MShellString: - asString, _ := other.(MShellString) - return obj.Content == asString.Content, nil + return obj.Content == o.Content, nil case MShellLiteral: - asLiteral, _ := other.(MShellLiteral) - return obj.Content == asLiteral.LiteralText, nil + return obj.Content == o.LiteralText, nil + case MShellPath: + return obj.Content == o.Path, nil default: - return false, fmt.Errorf("Cannot compare a string with a %s.\n", other.TypeName()) + return false, nil } } func (obj MShellPath) Equals(other MShellObject) (bool, error) { - // Define equality for other as string or as literal. - switch other.(type) { + switch o := other.(type) { case MShellPath: - asPath, _ := other.(MShellPath) - return obj.Path == asPath.Path, nil + return obj.Path == o.Path, nil case MShellLiteral: - asLiteral, _ := other.(MShellLiteral) - return obj.Path == asLiteral.LiteralText, nil + return obj.Path == o.LiteralText, nil + case MShellString: + return obj.Path == o.Content, nil default: - return false, fmt.Errorf("Cannot compare a path with a %s.\n", other.TypeName()) + return false, nil } } func (obj *MShellPipe) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for pipes.\n") + return equalsIter(obj, other) } func (obj MShellInt) Equals(other MShellObject) (bool, error) { asInt, ok := other.(MShellInt) if !ok { - return false, fmt.Errorf("Cannot compare an integer with a %s.\n", other.TypeName()) + return false, nil } return obj.Value == asInt.Value, nil } @@ -2012,7 +2555,7 @@ func (obj MShellInt) Equals(other MShellObject) (bool, error) { func (obj MShellFloat) Equals(other MShellObject) (bool, error) { asFloat, ok := other.(MShellFloat) if !ok { - return false, fmt.Errorf("Cannot compare a float with a %s.\n", other.TypeName()) + return false, nil } return obj.Value == asFloat.Value, nil } @@ -2145,28 +2688,59 @@ func (col *GridColumn) Get(index int) MShellObject { } } -// Set sets the value at the 
given row index +// Set sets the value at the given row index. If a typed column is given a value +// of a different type, the column is promoted to generic storage so the value +// is stored rather than silently dropped. func (col *GridColumn) Set(index int, value MShellObject) { switch col.ColType { case COL_INT: if intVal, ok := value.(MShellInt); ok { col.IntData[index] = int64(intVal.Value) + return } case COL_FLOAT: if floatVal, ok := value.(MShellFloat); ok { col.FloatData[index] = floatVal.Value + return } case COL_STRING: if strVal, ok := value.(MShellString); ok { col.StringData[index] = strVal.Content + return } case COL_DATETIME: if dtVal, ok := value.(*MShellDateTime); ok { col.DateTimeData[index] = dtVal.Time + return } - default: + case COL_GENERIC: col.GenericData[index] = value + return + } + // Typed column received a value of a different type: promote the whole + // column to generic storage, then store the value. + col.promoteToGeneric() + col.GenericData[index] = value +} + +// promoteToGeneric materializes a typed column's data into generic storage so +// the column can hold values of any type. It is a no-op for an already-generic +// column. 
+func (col *GridColumn) promoteToGeneric() { + if col.ColType == COL_GENERIC { + return + } + n := col.Len() + generic := make([]MShellObject, n) + for i := 0; i < n; i++ { + generic[i] = col.Get(i) } + col.ColType = COL_GENERIC + col.GenericData = generic + col.IntData = nil + col.FloatData = nil + col.StringData = nil + col.DateTimeData = nil } // Len returns the number of rows in the column @@ -2368,7 +2942,25 @@ func (g *MShellGrid) Concat(other MShellObject) (MShellObject, error) { } func (g *MShellGrid) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for grids.\n") + o, ok := other.(*MShellGrid) + if !ok { + return false, nil + } + if g.RowCount != o.RowCount || len(g.Columns) != len(o.Columns) { + return false, nil + } + for i, col := range g.Columns { + if col.Name != o.Columns[i].Name { + return false, nil + } + } + for i := 0; i < g.RowCount; i++ { + eq, err := g.GetRow(i).ToDict().Equals(o.GetRow(i).ToDict()) + if err != nil || !eq { + return eq, err + } + } + return true, nil } func (g *MShellGrid) CastString() (string, error) { @@ -2484,7 +3076,20 @@ func (v *MShellGridView) Concat(other MShellObject) (MShellObject, error) { } func (v *MShellGridView) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for grid views.\n") + o, ok := other.(*MShellGridView) + if !ok { + return false, nil + } + if len(v.Indices) != len(o.Indices) { + return false, nil + } + for i := range v.Indices { + eq, err := v.GetRow(i).ToDict().Equals(o.GetRow(i).ToDict()) + if err != nil || !eq { + return eq, err + } + } + return true, nil } func (v *MShellGridView) CastString() (string, error) { @@ -2593,7 +3198,11 @@ func (r *MShellGridRow) Concat(other MShellObject) (MShellObject, error) { } func (r *MShellGridRow) Equals(other MShellObject) (bool, error) { - return false, fmt.Errorf("Equality currently not defined for grid rows.\n") + o, ok := other.(*MShellGridRow) + if !ok 
{ + return false, nil + } + return r.ToDict().Equals(o.ToDict()) } func (r *MShellGridRow) CastString() (string, error) { diff --git a/mshell/Main.go b/mshell/Main.go index eba1046e..d01ff600 100644 --- a/mshell/Main.go +++ b/mshell/Main.go @@ -166,7 +166,7 @@ func getStartupFileSpecs(options startupLoadOptions) (startupFileSpec, startupFi return stdlibSpec, initSpec, nil } -func loadStartupFile(path string, description string, stack *MShellStack, context ExecuteContext, state *EvalState, definitions *[]MShellDefinition) error { +func loadStartupFile(path string, description string, stack *MShellStack, context ExecuteContext, state *EvalState, definitions *[]MShellDefinition, items *[]MShellParseItem) error { sourceBytes, err := os.ReadFile(path) if err != nil { if errors.Is(err, os.ErrNotExist) { @@ -181,7 +181,17 @@ func loadStartupFile(path string, description string, stack *MShellStack, contex } *definitions = append(*definitions, parsedFile.Definitions...) + if err := FindDuplicateDefinition(*definitions); err != nil { + return fmt.Errorf("error loading %s at %s: %w", description, path, err) + } state.AddCompletionDefinitions(parsedFile.Definitions) + // Register enum constructors declared in this startup file, and retain the + // top-level items so the type checker can register the file's `type` and + // `enum` declarations — startup declarations behave like the main file's. + state.RegisterEnums(parsedFile.Items) + if items != nil { + *items = append(*items, parsedFile.Items...) 
+ } if len(parsedFile.Items) > 0 { callStackItem := CallStackItem{ @@ -257,16 +267,17 @@ func preflightStartupFile(spec startupFileSpec) string { return fmt.Sprintf("present at %s (parses ok; not evaluated because the other startup file failed first)", spec.path) } -func loadStartupDefinitions(options startupLoadOptions, stack *MShellStack, context ExecuteContext, state *EvalState) ([]MShellDefinition, error) { +func loadStartupDefinitions(options startupLoadOptions, stack *MShellStack, context ExecuteContext, state *EvalState) ([]MShellDefinition, []MShellParseItem, error) { stdlibSpec, initSpec, err := getStartupFileSpecs(options) if err != nil { - return nil, err + return nil, nil, err } definitions := make([]MShellDefinition, 0) - if err := loadStartupFile(stdlibSpec.path, stdlibSpec.description, stack, context, state, &definitions); err != nil { + var items []MShellParseItem + if err := loadStartupFile(stdlibSpec.path, stdlibSpec.description, stack, context, state, &definitions, &items); err != nil { initStatus := preflightStartupFile(initSpec) - return nil, &startupLoadError{ + return nil, nil, &startupLoadError{ which: "stdlib", spec: stdlibSpec, options: options, @@ -276,14 +287,14 @@ func loadStartupDefinitions(options startupLoadOptions, stack *MShellStack, cont } } - if err := loadStartupFile(initSpec.path, initSpec.description, stack, context, state, &definitions); err != nil { + if err := loadStartupFile(initSpec.path, initSpec.description, stack, context, state, &definitions, &items); err != nil { if !initSpec.required && errors.Is(err, os.ErrNotExist) { - return definitions, nil + return definitions, items, nil } - return nil, &startupLoadError{which: "init", spec: initSpec, options: options, cause: err} + return nil, nil, &startupLoadError{which: "init", spec: initSpec, options: options, cause: err} } - return definitions, nil + return definitions, items, nil } // formatStartupErrorMessage builds a multi-line explanation of how msh searches @@ 
-821,7 +832,7 @@ func main() { var allDefinitions []MShellDefinition - startupDefinitions, err := loadStartupDefinitions(startupLoadOptions{ + startupDefinitions, startupItems, err := loadStartupDefinitions(startupLoadOptions{ version: effectiveVersion, allowEnvOverrides: allowStartupEnvOverrides, requireInit: requireVersionedInit, @@ -836,7 +847,7 @@ func main() { state.AddCompletionDefinitions(file.Definitions) if checkTypes { - errs, ok := TypeCheckProgram(file, startupDefinitions) + errs, ok := TypeCheckProgram(file, startupDefinitions, startupItems) if !ok { for _, e := range errs { fmt.Fprintln(os.Stderr, e) @@ -853,10 +864,22 @@ func main() { } } + // Definition lookup is first-match-wins, so a script def whose name is + // already taken (by the stdlib, the init file, or the script itself) + // would be silently dead code. Reject it instead. + if err := FindDuplicateDefinition(allDefinitions); err != nil { + fmt.Fprint(os.Stderr, err.Error()) + os.Exit(1) + } + if len(file.Items) == 0 { os.Exit(0) } + // Register enum constructors before evaluation so bare member words can + // be constructed (mirrors the checker's enum pre-pass). + state.RegisterEnums(file.Items) + callStackItem := CallStackItem{ MShellParseItem: nil, Name: "main", @@ -2968,10 +2991,21 @@ func (state *TermState) ExecuteCurrentCommand() (bool, int) { term.Restore(state.stdInFd, &state.oldState) if len(parsed.Definitions) > 0 { + // Definition lookup is first-match-wins, so a redefinition would be + // silently ignored rather than take effect; reject the input instead. + if err := FindDuplicateDefinition(state.stdLibDefs, parsed.Definitions); err != nil { + fmt.Fprint(os.Stderr, err.Error()) + goto PromptPrint + } state.stdLibDefs = append(state.stdLibDefs, parsed.Definitions...) 
state.evalState.AddCompletionDefinitions(parsed.Definitions) } + // Register enum constructors declared on this line, so an interactive + // `enum` declaration works like one in a script: its member words + // construct values on subsequent lines. + state.evalState.RegisterEnums(parsed.Items) + if len(parsed.Items) > 0 { state.initCallStackItem.MShellParseItem = parsed.Items[0] result := state.evalState.Evaluate(parsed.Items, &state.stack, state.context, state.stdLibDefs, state.initCallStackItem) @@ -3158,11 +3192,15 @@ func (state *TermState) getCurrentPos() (int, int, error) { } func stdLibDefinitions(stack *MShellStack, context ExecuteContext, state *EvalState) ([]MShellDefinition, error) { - return loadStartupDefinitions(startupLoadOptions{ + // The interactive path has no whole-program type-check pass, so the + // startup items (already registered on the EvalState for runtime enum + // construction inside loadStartupFile) are not needed here. + defs, _, err := loadStartupDefinitions(startupLoadOptions{ version: mshellVersion, allowEnvOverrides: true, requireInit: false, }, stack, context, state) + return defs, err } func registerTempFileForCleanup(tempFileName string) { diff --git a/mshell/Parser.go b/mshell/Parser.go index a31692f0..9a4a127a 100644 --- a/mshell/Parser.go +++ b/mshell/Parser.go @@ -680,6 +680,12 @@ func (parser *MShellParser) ParseFile() (file *MShellFile, err error) { return file, err } file.Items = append(file.Items, decl) + case ENUM: + decl, err := parser.ParseEnumDecl() + if err != nil { + return file, err + } + file.Items = append(file.Items, decl) case VER: if file.Version != "" { return file, fmt.Errorf("%d:%d: Duplicate VER directive; version already set to %q", parser.curr.Line, parser.curr.Column, file.Version) diff --git a/mshell/Startup_test.go b/mshell/Startup_test.go index be7902dd..88c067df 100644 --- a/mshell/Startup_test.go +++ b/mshell/Startup_test.go @@ -151,7 +151,7 @@ func 
TestLoadStartupDefinitionsLoadsVersionedStdlibAndInit(t *testing.T) { stack, context, state := newStartupTestContext() - definitions, err := loadStartupDefinitions(startupLoadOptions{ + definitions, _, err := loadStartupDefinitions(startupLoadOptions{ version: version, allowEnvOverrides: false, requireInit: true, @@ -215,7 +215,7 @@ func TestLoadStartupDefinitionsRequiresInitForExplicitVersion(t *testing.T) { stack, context, state := newStartupTestContext() - _, err := loadStartupDefinitions(startupLoadOptions{ + _, _, err := loadStartupDefinitions(startupLoadOptions{ version: version, allowEnvOverrides: false, requireInit: true, @@ -251,7 +251,7 @@ func TestLoadStartupDefinitionsAllowsMissingInitForImplicitVersion(t *testing.T) stack, context, state := newStartupTestContext() - definitions, err := loadStartupDefinitions(startupLoadOptions{ + definitions, _, err := loadStartupDefinitions(startupLoadOptions{ version: version, allowEnvOverrides: true, requireInit: false, @@ -417,3 +417,53 @@ func TestEnvWithoutStartupOverridesRemovesOnlyStartupVars(t *testing.T) { t.Fatalf("filtered env missing KEEP_ME: %q", filteredJoined) } } + +func TestStartupFileEnumRegistersConstructors(t *testing.T) { + // An `enum` declared in a startup file (stdlib / init) must register its + // constructors on the EvalState, so a member word in the main program (or + // at the interactive prompt) constructs a value instead of falling through + // to the bare-literal path. 
+ dir := t.TempDir() + path := filepath.Join(dir, "init.msh") + if err := os.WriteFile(path, []byte("enum Status = active | inactive end\n"), 0644); err != nil { + t.Fatalf("WriteFile(init) error = %v", err) + } + + stack, context, state := newStartupTestContext() + var defs []MShellDefinition + var items []MShellParseItem + if err := loadStartupFile(path, "test init", &stack, context, &state, &defs, &items); err != nil { + t.Fatalf("loadStartupFile() error = %v", err) + } + + info, ok := state.EnumMembers["active"] + if !ok { + t.Fatalf("expected member 'active' registered from startup file") + } + if info.EnumName != "Status" || info.Arity != 0 { + t.Fatalf("EnumMembers[active] = %+v, want Status arity 0", info) + } + if len(items) == 0 { + t.Fatalf("expected startup items to be retained for the checker") + } + + parsed, err := parseMShellInput("active", &TokenFile{"main"}) + if err != nil { + t.Fatalf("parse error: %v", err) + } + callStackItem := CallStackItem{MShellParseItem: parsed.Items[0], Name: "main", CallStackType: CALLSTACKFILE} + result := state.Evaluate(parsed.Items, &stack, context, defs, callStackItem) + if !result.Success { + t.Fatalf("evaluating member word failed") + } + if len(stack) != 1 { + t.Fatalf("len(stack) = %d, want 1", len(stack)) + } + en, ok := stack[0].(*MShellEnum) + if !ok { + t.Fatalf("stack top = %T (%s), want *MShellEnum", stack[0], stack[0].DebugString()) + } + if en.EnumName != "Status" || en.Member != "active" { + t.Fatalf("enum value = %s.%s, want Status.active", en.EnumName, en.Member) + } +} diff --git a/mshell/Type.go b/mshell/Type.go index 674a9e03..7b97ae61 100644 --- a/mshell/Type.go +++ b/mshell/Type.go @@ -60,6 +60,11 @@ const ( TKGridView // Extra = index into gridSchemas (0 = unknown schema) TKGridRow // Extra = index into gridSchemas (0 = unknown schema) + // Generative tagged sum type (a user `enum`). Nominal: identity is the + // declaration NameId, not the member set. 
A = decl NameId; Extra = index + // into enumVariants. See design/literal_or_enum_typing.html. + TKEnum + // TKStrLit is a `str` refined with a statically known value: A holds the // interned NameId of the literal content. It is a subtype of `str` — // unify and every container constructor widen it back to TidStr — so it @@ -102,6 +107,8 @@ func (k TypeKind) String() string { return "GridView" case TKGridRow: return "GridRow" + case TKEnum: + return "Enum" case TKStrLit: return "StrLit" } @@ -140,6 +147,15 @@ type ShapeField struct { Optional bool } +// EnumVariant is one constructor of a TKEnum. Payload is the (ordered) +// list of payload types the constructor carries; it is empty for a +// nullary member. Order is meaningful — it sets member order for +// diagnostics and exhaustiveness. +type EnumVariant struct { + Name NameId + Payload []TypeId +} + // GridSchemaCol is one column in a TKGrid / TKGridView / TKGridRow schema. // Order is meaningful (grids have column order). type GridSchemaCol struct { @@ -186,6 +202,7 @@ type TypeArena struct { unionMembers [][]TypeId // each slice is sorted, deduped gridSchemas []GridSchema gridSchemaCons map[string]uint32 + enumVariants [][]EnumVariant } // NewTypeArena constructs an arena pre-populated with the primitive ids @@ -223,6 +240,8 @@ func NewTypeArena() *TypeArena { a.shapeFields = append(a.shapeFields, nil) a.quoteSigs = append(a.quoteSigs, QuoteSig{}) a.overloadedQuoteSigs = append(a.overloadedQuoteSigs, nil) + // Reserve enumVariants[0] as a placeholder so non-zero Extra is meaningful. + a.enumVariants = append(a.enumVariants, nil) return a } @@ -401,6 +420,58 @@ func (a *TypeArena) MakeOverloadedQuote(sigs []QuoteSig) TypeId { return id } +// MakeEnum returns the canonical TypeId for a user enum named by nameId, +// with the given variants. 
Identity is nominal: it keys on the declaration +// NameId alone, so two enums with identical members are distinct types and +// the same enum always interns to the same TypeId. A duplicate declaration +// is caught higher up (DeclareEnum); reaching here with a name already +// interned returns the existing id. +func (a *TypeArena) MakeEnum(nameId NameId, variants []EnumVariant) TypeId { + key := "E:" + strconv.FormatUint(uint64(nameId), 10) + if id, ok := a.cons[key]; ok { + return id + } + cp := make([]EnumVariant, len(variants)) + copy(cp, variants) + idx := uint32(len(a.enumVariants)) + a.enumVariants = append(a.enumVariants, cp) + id := a.append(TypeNode{Kind: TKEnum, A: uint32(nameId), Extra: idx}) + a.cons[key] = id + return id +} + +// SetEnumVariants replaces the variant list of an already-created enum type. +// Used to finalize an enum after its name was registered with a placeholder +// (empty) variant list, so member payloads can reference the enum itself or +// other enums regardless of declaration order. +func (a *TypeArena) SetEnumVariants(id TypeId, variants []EnumVariant) { + n := a.Node(id) + if n.Kind != TKEnum { + panic("TypeArena.SetEnumVariants: not an enum") + } + cp := make([]EnumVariant, len(variants)) + copy(cp, variants) + a.enumVariants[n.Extra] = cp +} + +// EnumNameId returns the declaration NameId of an enum type. +func (a *TypeArena) EnumNameId(id TypeId) NameId { + n := a.Node(id) + if n.Kind != TKEnum { + panic("TypeArena.EnumNameId: not an enum") + } + return NameId(n.A) +} + +// EnumVariants returns the variants of an enum type. Caller must not mutate. +func (a *TypeArena) EnumVariants(id TypeId) []EnumVariant { + n := a.Node(id) + if n.Kind != TKEnum { + panic("TypeArena.EnumVariants: not an enum") + } + return a.enumVariants[n.Extra] +} + // MakeGrid returns the canonical TypeId for a grid type. schemaIdx of 0 // denotes "schema unknown" (the V1 default until schema tracking lands). 
func (a *TypeArena) MakeGrid(schemaIdx uint32) TypeId { diff --git a/mshell/TypeBranch.go b/mshell/TypeBranch.go index 877f0a7a..c760cead 100644 --- a/mshell/TypeBranch.go +++ b/mshell/TypeBranch.go @@ -76,6 +76,10 @@ const ( MatchArmFalse // bool literal `false` pattern // MatchArmEmptyList: `[]` pattern. Covers empty lists. MatchArmEmptyList + // MatchArmEnumMember: an enum constructor pattern (`member` or + // `member b1 b2 ...`). TypeArm holds the enum type, EnumMember the + // member's NameId. + MatchArmEnumMember // MatchArmListWithRest: `[a ...rest]`, `[a b ...rest]`, or // `[...rest]` — any list pattern with a `...name` element. // Covers all lists whose length is at least the number of @@ -87,8 +91,9 @@ const ( // type effects flow through ReconcileArms; this struct only feeds // the exhaustiveness check. type MatchArmTag struct { - Kind MatchArmKind - TypeArm TypeId // valid when Kind == MatchArmType + Kind MatchArmKind + TypeArm TypeId // valid when Kind == MatchArmType or MatchArmEnumMember + EnumMember NameId // valid when Kind == MatchArmEnumMember } // CheckMatchExhaustive verifies that arms cover every inhabitant of @@ -178,6 +183,30 @@ func (c *Checker) CheckMatchExhaustive(matched TypeId, arms []MatchArmTag, callS } } + case TKEnum: + variants := c.arena.enumVariants[n.Extra] + covered := make(map[NameId]bool, len(variants)) + for _, arm := range arms { + if arm.Kind == MatchArmEnumMember && c.subst.Apply(c.arena, arm.TypeArm) == matched { + covered[arm.EnumMember] = true + } + } + var missing []string + for _, v := range variants { + if !covered[v.Name] { + missing = append(missing, c.names.Name(v.Name)) + } + } + if len(missing) == 0 { + return true + } + c.errors = append(c.errors, TypeError{ + Kind: TErrNonExhaustiveMatch, + Pos: callSite, + Hint: "enum match must cover every member or include a wildcard; missing: " + strings.Join(missing, ", "), + }) + return false + case TKList: // A list's inhabitants split by length: zero (empty) vs // 
one-or-more. `[]` covers empty; any list pattern that ends diff --git a/mshell/TypeCheckProgram.go b/mshell/TypeCheckProgram.go index 94038f05..2c0a0c3b 100644 --- a/mshell/TypeCheckProgram.go +++ b/mshell/TypeCheckProgram.go @@ -43,12 +43,18 @@ import ( // type-checked here — std.msh exercises features (process lists, // format strings, dynamic exec) the v1 checker does not yet model, // and we trust the runtime tests catch breakage there. -func TypeCheckProgram(file *MShellFile, stdlibDefs []MShellDefinition) (errors []string, ok bool) { +// +// startupItems is the startup files' top-level parse items; their `type` +// and `enum` declarations are registered (bodies are not checked, matching +// the def treatment) so the checker sees the same declarations the runtime +// does. +func TypeCheckProgram(file *MShellFile, stdlibDefs []MShellDefinition, startupItems []MShellParseItem) (errors []string, ok bool) { arena := NewTypeArena() names := NewNameTable() checker := NewChecker(arena, names) checker.RegisterStdlibSigs(stdlibDefs) + checker.RegisterStartupTypes(startupItems) checker.CheckProgram(file) out := make([]string, 0, len(checker.errors)) @@ -81,6 +87,12 @@ func (c *Checker) RegisterStdlibSigs(defs []MShellDefinition) { for i := range defs { def := &defs[i] nameId := c.names.Intern(def.Name) + // Record the name even when the sig registration below is skipped: + // the runtime's first-match-wins lookup still resolves to this def, + // so a later def of the same name is a duplicate regardless. + if c.recordDefName(nameId, def) { + continue + } if _, exists := c.nameBuiltins[nameId]; exists { continue } @@ -89,18 +101,83 @@ func (c *Checker) RegisterStdlibSigs(defs []MShellDefinition) { } } +// recordDefName registers a definition's name for duplicate detection. 
If the +// name is already taken by an earlier definition, it records an error and +// returns true (mirroring the runtime's FindDuplicateDefinition, where the +// first definition wins and a duplicate would be silently dead code). +func (c *Checker) recordDefName(nameId NameId, def *MShellDefinition) bool { + if prev, exists := c.defNameToks[nameId]; exists { + c.errors = append(c.errors, TypeError{ + Kind: TErrTypeParse, Pos: def.NameToken, + Hint: "duplicate definition '" + def.Name + "'; already defined at " + tokenPosStr(prev), + }) + return true + } + if c.defNameToks == nil { + c.defNameToks = make(map[NameId]Token) + } + c.defNameToks[nameId] = def.NameToken + return false +} + +// RegisterStartupTypes registers the `type` and `enum` declarations found in +// the startup files' top-level items (the stdlib, then the user init file), +// so the checked program sees the same declarations the runtime does. It runs +// the same three-phase order as CheckProgram's own pre-passes — enum names, +// then type aliases, then enum payload bodies + constructor words — so +// startup declarations may reference each other in any order. Call after +// RegisterStdlibSigs (so a member colliding with a startup def is caught) and +// before CheckProgram (whose def pre-pass catches the reverse collision). +func (c *Checker) RegisterStartupTypes(items []MShellParseItem) { + var enumDecls []*MShellEnumDecl + for _, item := range items { + if d, ok := item.(*MShellEnumDecl); ok { + if c.predeclareEnum(d) { + enumDecls = append(enumDecls, d) + } + } + } + for _, item := range items { + if d, ok := item.(*MShellTypeDecl); ok { + body := c.resolveTypeExpr(d.Body, nil) + c.DeclareType(d.Name, body) + } + } + for _, d := range enumDecls { + c.defineEnum(d) + } +} + // CheckProgram is the file-level type-check pass. It registers all // type declarations and user-defined function sigs, then walks the // parse tree driving the type stack. Error accumulation lives on the // Checker. 
func (c *Checker) CheckProgram(file *MShellFile) { - // Pre-pass 1: register all `type` declarations. + // Pre-pass 0: predeclare every `enum` name with a placeholder type, so a + // `type` body (next) and an enum payload (after that) can reference any + // enum by name, in any order. + var enumDecls []*MShellEnumDecl + for _, item := range file.Items { + if d, ok := item.(*MShellEnumDecl); ok { + if c.predeclareEnum(d) { + enumDecls = append(enumDecls, d) + } + } + } + // Pre-pass 1: register all `type` declarations. Enum names are already + // available, so a `type` body may reference an enum. for _, item := range file.Items { if d, ok := item.(*MShellTypeDecl); ok { body := c.resolveTypeExpr(d.Body, nil) c.DeclareType(d.Name, body) } } + // Pre-pass 1b: resolve enum payload bodies and register constructor words. + // Both enum names and `type` aliases are now registered, so a payload may + // reference either. + for _, d := range enumDecls { + c.defineEnum(d) + } // Pre-pass 2: register all `def` signatures so call sites (and // recursive self-calls inside def bodies) can resolve them. defSigs := make([]QuoteSig, len(file.Definitions)) @@ -109,6 +186,21 @@ func (c *Checker) CheckProgram(file *MShellFile) { sig := c.ResolveDefSig(def.Inputs, def.Outputs) defSigs[i] = sig nameId := c.names.Intern(def.Name) + // Enum constructors share the word namespace and are registered in + // pre-pass 1b; a def reusing a member name would resolve to the + // constructor in the checker but to the def at runtime, so reject it — + // the mirror of defineEnum rejecting a member that collides with an + // existing def or builtin.
+ if _, isMember := c.enumMemberToks[nameId]; isMember { + c.errors = append(c.errors, TypeError{ + Kind: TErrTypeParse, Pos: def.NameToken, + Hint: "definition '" + def.Name + "' conflicts with an enum member of the same name", + }) + continue + } + if c.recordDefName(nameId, def) { + continue + } c.nameBuiltins[nameId] = append(c.nameBuiltins[nameId], sig) } // Pre-pass 3: type-check each def body against its declared sig. @@ -1177,23 +1269,53 @@ func formatPatternItem(it MShellParseItem) string { func (c *Checker) checkMatchBlock(matchBlock *MShellParseMatchBlock) { startTok := matchBlock.GetStartToken() if c.stack.Len() == 0 { - c.errors = append(c.errors, TypeError{ - Kind: TErrStackUnderflow, - Pos: startTok, - Hint: "match subject", - }) - return + if !c.inferring { + c.errors = append(c.errors, TypeError{ + Kind: TErrStackUnderflow, + Pos: startTok, + Hint: "match subject", + }) + return + } + // Quote-body inference: the subject is the quote's own input. + // Synthesize a fresh var exactly as applySig's underflow path does — + // at the bottom of the stack and the front of inferInputs — so + // `(match ... end) map` infers a one-input quote instead of erroring. + v := c.subst.FreshVar(c.arena) + c.inferInputs = append([]TypeId{v}, c.inferInputs...) + c.stack.items = append([]TypeId{v}, c.stack.items...) } // Widen a string-literal subject to `str`: match arms and the // exhaustiveness check compare against `str` by type id, and the literal // value carries no meaning for pattern matching. subject := c.arena.WidenStrLit(c.stack.items[c.stack.Len()-1]) + // See through a `type X = ...` brand once, here, so every arm form (enum + // member, `just`/` name` binding, list/dict pattern) and the + // exhaustiveness check match against the underlying type. A brand is + // nominal for typing but has no runtime representation, so a branded enum + // matches its members, a branded Maybe its `just`/`none`, etc. 
— exactly + // as the unbranded types do, which is what the runtime already does. + if resolved := c.subst.Apply(c.arena, subject); c.arena.Node(resolved).Kind == TKBrand { + subject = c.underlying(resolved) + } + // An unresolved subject (a quote input under inference) is pinned from the + // first arm pattern that names a type: an enum member determines its enum + // (member names are global) and `just`/`none` determine Maybe. Pinning + // happens before the entry branch is captured so every arm and the + // exhaustiveness check see the resolved subject. + if c.arena.Node(c.subst.Apply(c.arena, subject)).Kind == TKVar { + if pin, ok := c.matchSubjectPin(matchBlock); ok { + c.unify(subject, pin) + } + } entry := c.captureBranch() if len(matchBlock.Arms) == 0 { - // Empty match block: no arms could fire. Treat as a no-op. - // The runtime would error at first use; the checker keeps - // the subject on the stack. + // An empty match can never fire — it always errors at runtime. Run + // exhaustiveness with no arms so the static check rejects it (no type + // is covered and there is no wildcard) instead of letting it crash at + // runtime. + c.CheckMatchExhaustive(subject, nil, startTok) return } @@ -1242,6 +1364,36 @@ func (c *Checker) checkMatchBlock(matchBlock *MShellParseMatchBlock) { c.reconcileArmBranches(armBranches, armLabels, entry, startTok) } +// matchSubjectPin returns the concrete type a match's arm patterns determine +// for an as-yet-unresolved subject. An enum-member head names its enum (member +// names are unique across enums), and a `just`/`none` head names Maybe[T] with +// a fresh T. ok=false when no arm determines a type — value literals and type +// keywords deliberately do not pin, since a type-keyword match may be +// discriminating a union and pinning would wrongly narrow the input. 
+func (c *Checker) matchSubjectPin(matchBlock *MShellParseMatchBlock) (TypeId, bool) { + for _, arm := range matchBlock.Arms { + if len(arm.Pattern) == 0 { + continue + } + tok, ok := arm.Pattern[0].(Token) + if !ok || tok.Type != LITERAL { + continue + } + if tok.Lexeme == "just" || tok.Lexeme == "none" { + return c.arena.MakeMaybe(c.subst.FreshVar(c.arena)), true + } + mid := c.names.Intern(tok.Lexeme) + if _, isMember := c.enumMemberToks[mid]; isMember { + // A member's constructor sig has the enum as its only output. + sigs := c.nameBuiltins[mid] + if len(sigs) > 0 && len(sigs[0].Outputs) == 1 && c.arena.Node(sigs[0].Outputs[0]).Kind == TKEnum { + return sigs[0].Outputs[0], true + } + } + } + return TidNothing, false +} + // armPattern is the single interpretation of a match arm pattern. One // analysis feeds all four consumers that used to re-pattern-match the // arm independently: recognition diagnostics (Recognized), the @@ -1289,6 +1441,14 @@ func (c *Checker) analyzeArmPattern(subject TypeId, pattern []MShellParseItem) a func (c *Checker) armPatternOf(subject TypeId, pattern []MShellParseItem) armPattern { out := armPattern{Tag: MatchArmTag{Kind: MatchArmType, TypeArm: TidNothing}} + // Enum constructor pattern: `member` or `member b1 b2 ...`, when the + // subject is an enum and the first token names one of its members. This + // can be any length (one token per payload binding), so it is handled + // before the length-based switch. 
+ if ep, ok := c.enumMemberPattern(subject, pattern); ok { + return ep + } + switch len(pattern) { case 1: switch p := pattern[0].(type) { @@ -1340,7 +1500,7 @@ func (c *Checker) armPatternOf(subject TypeId, pattern []MShellParseItem) armPat case 2: t0, ok0 := pattern[0].(Token) t1, ok1 := pattern[1].(Token) - if !ok0 || !ok1 || t1.Type != LITERAL { + if !ok0 || !ok1 || (t1.Type != LITERAL && t1.Type != UNDERSCORE) { return out } if t0.Type == LITERAL && t0.Lexeme == "just" { @@ -1379,6 +1539,69 @@ func (c *Checker) armPatternOf(subject TypeId, pattern []MShellParseItem) armPat return out } +// enumMemberPattern recognizes an enum constructor arm: `member` (nullary) or +// `member b1 b2 ...` (one binding name per payload). It returns ok=false when +// the subject is not an enum or the first token is not one of its members, so +// the caller falls back to the ordinary pattern forms. A payload-arity mismatch +// is recognized (so no "invalid pattern" cascade) but reported. +func (c *Checker) enumMemberPattern(subject TypeId, pattern []MShellParseItem) (armPattern, bool) { + if len(pattern) == 0 { + return armPattern{}, false + } + tok, ok := pattern[0].(Token) + if !ok || tok.Type != LITERAL { + return armPattern{}, false + } + // The subject is already brand-unwrapped by checkMatchBlock, so a branded + // enum (`type X = Enum`) arrives here as its underlying TKEnum. 
+ resolved := c.subst.Apply(c.arena, subject) + sn := c.arena.Node(resolved) + if sn.Kind != TKEnum { + return armPattern{}, false + } + memberId := c.names.Intern(tok.Lexeme) + var payload []TypeId + found := false + for _, v := range c.arena.enumVariants[sn.Extra] { + if v.Name == memberId { + payload = v.Payload + found = true + break + } + } + if !found { + return armPattern{}, false + } + out := armPattern{ + Recognized: true, + Tag: MatchArmTag{Kind: MatchArmEnumMember, TypeArm: resolved, EnumMember: memberId}, + } + binds := pattern[1:] + if len(binds) != len(payload) { + c.errors = append(c.errors, TypeError{ + Kind: TErrInvalidMatchPattern, + Pos: tok, + Hint: fmt.Sprintf("enum member '%s' binds %d payload value(s), got %d", tok.Lexeme, len(payload), len(binds)), + }) + return out, true + } + for i, b := range binds { + bt, ok := b.(Token) + if !ok || (bt.Type != LITERAL && bt.Type != UNDERSCORE) { + c.errors = append(c.errors, TypeError{ + Kind: TErrInvalidMatchPattern, + Pos: tok, + Hint: "enum payload bindings must be names", + }) + return out, true + } + if bt.Lexeme != "_" { + out.Bindings = append(out.Bindings, patternBind{bt.Lexeme, payload[i]}) + } + } + return out, true +} + // analyzeTokenPattern handles the single-token pattern forms: type // keywords, value literals, `_`, `none`, and user-declared type names. func (c *Checker) analyzeTokenPattern(tok Token, out *armPattern) { @@ -1404,6 +1627,9 @@ func (c *Checker) analyzeTokenPattern(tok Token, out *armPattern) { case INTEGER, FLOAT, STRING, SINGLEQUOTESTRING, PATH: // Value literals: legal patterns, but they credit no coverage. 
out.Recognized = true + case UNDERSCORE: + out.Recognized = true + out.Tag = MatchArmTag{Kind: MatchArmWildcard} case LITERAL: switch tok.Lexeme { case "_": diff --git a/mshell/TypeCheckProgram_test.go b/mshell/TypeCheckProgram_test.go index 21d4a994..b7e646da 100644 --- a/mshell/TypeCheckProgram_test.go +++ b/mshell/TypeCheckProgram_test.go @@ -15,7 +15,7 @@ func parseAndCheck(t *testing.T, src string) ([]string, bool) { if err != nil { t.Fatalf("parse error: %v", err) } - return TypeCheckProgram(file, nil) + return TypeCheckProgram(file, nil, nil) } func TestTypeCheckProgramEmpty(t *testing.T) { diff --git a/mshell/TypeChecker.go b/mshell/TypeChecker.go index 2ac50e88..2bb712cd 100644 --- a/mshell/TypeChecker.go +++ b/mshell/TypeChecker.go @@ -101,6 +101,22 @@ type Checker struct { // type names are NOT stored here — they are recognized directly. typeEnv map[NameId]TypeId + // enumMemberToks records every registered enum member name (value: the + // member's declaration token). Enum constructors and user defs share the + // word namespace, and enums register before same-file defs — so def + // registration checks this to reject a def whose name collides with a + // member, mirroring defineEnum rejecting a member that collides with an + // existing def or builtin. + enumMemberToks map[NameId]Token + + // defNameToks records every registered definition name (value: the def's + // name token). Runtime definition lookup is first-match-wins, so a second + // def of a name is silently dead code, not an override; def registration + // checks this and rejects the duplicate, mirroring the runtime's + // FindDuplicateDefinition. Stdlib/init defs register before file defs, + // so a script redefining a stdlib name is caught too. + defNameToks map[NameId]Token + // Quote-body inference state (Phase 7). 
When inferring is true, // applySig responds to stack underflow by synthesizing fresh type // variables instead of reporting an error; those vars accumulate @@ -394,7 +410,7 @@ func (c *Checker) checkOne(tok Token) { // values, and it's load-bearing for `[cmd args] ;`-style // pipelines where forcing the user to quote every word // would defeat the point. - if c.listDepth > 0 && tok.Type == LITERAL { + if c.listDepth > 0 && (tok.Type == LITERAL || tok.Type == UNDERSCORE) { c.stack.Push(TidStr) return } @@ -959,6 +975,11 @@ func (c *Checker) unify(got, want TypeId) bool { return c.unifyQuote(gn, wn) case TKOverloadedQuote: return false + case TKEnum: + // Nominal: two enum types unify only when identical. Equal ids were + // already accepted at the top of unify; reaching here means distinct + // enums, which never unify. + return false case TKGrid, TKGridView, TKGridRow: // Phase-3 grids are opaque. Equality-by-id is the only way two grid // types match; if we got here with same kind but different ids, diff --git a/mshell/TypeEnum.go b/mshell/TypeEnum.go new file mode 100644 index 00000000..9180ed96 --- /dev/null +++ b/mshell/TypeEnum.go @@ -0,0 +1,88 @@ +package main + +// Enum support: registering `enum Name = a | b(T..) | ...` declarations. +// +// An enum is a generative tagged sum type. Registration happens in two passes +// so member payloads may reference any enum regardless of order (including the +// enum itself): +// +// - predeclareEnum interns the name and registers a placeholder TKEnum in the +// type environment, so the name resolves in any later type position. +// - defineEnum resolves each member's payload types, finalizes the variant +// list, and registers each member as a constructor word whose signature +// consumes the payload and produces the enum (`(T.. -- Enum)`). +// +// Members share the global word namespace: a member name that collides with an +// existing builtin / def / another enum member is rejected. 
+ +// predeclareEnum registers the enum's name with a placeholder type. It returns +// true when the name was newly registered (so defineEnum should finish it), and +// false for a reserved or duplicate name (an error is recorded). +func (c *Checker) predeclareEnum(d *MShellEnumDecl) bool { + if IsReservedTypeName(d.Name) { + c.errors = append(c.errors, TypeError{Kind: TErrReservedTypeName, Pos: d.NameToken, Name: d.Name}) + return false + } + if c.typeEnv == nil { + c.typeEnv = make(map[NameId]TypeId, 8) + } + nameId := c.names.Intern(d.Name) + if _, exists := c.typeEnv[nameId]; exists { + c.errors = append(c.errors, TypeError{Kind: TErrDuplicateTypeName, Pos: d.NameToken, Name: d.Name}) + return false + } + c.typeEnv[nameId] = c.arena.MakeEnum(nameId, nil) + return true +} + +// defineEnum resolves payload types, finalizes the variant list, and registers +// constructor words. It must run after predeclareEnum has registered the name. +func (c *Checker) defineEnum(d *MShellEnumDecl) { + nameId := c.names.Intern(d.Name) + enumType := c.typeEnv[nameId] + + type member struct { + name string + tok Token + payloads []TypeId + } + uniq := make([]member, 0, len(d.Members)) + variants := make([]EnumVariant, 0, len(d.Members)) + seen := make(map[string]bool, len(d.Members)) + for i, m := range d.Members { + tok := d.MemberToks[i] + if seen[m] { + c.errors = append(c.errors, TypeError{ + Kind: TErrTypeParse, Pos: tok, + Hint: "duplicate enum member '" + m + "' in '" + d.Name + "'", + }) + continue + } + seen[m] = true + + var payloads []TypeId + for _, p := range d.MemberPayloads[i] { + payloads = append(payloads, c.resolveTypeExpr(p, nil)) + } + uniq = append(uniq, member{name: m, tok: tok, payloads: payloads}) + variants = append(variants, EnumVariant{Name: c.names.Intern(m), Payload: payloads}) + } + + c.arena.SetEnumVariants(enumType, variants) + + for _, u := range uniq { + mid := c.names.Intern(u.name) + if _, exists := c.nameBuiltins[mid]; exists { + c.errors = 
append(c.errors, TypeError{ + Kind: TErrTypeParse, Pos: u.tok, + Hint: "enum member '" + u.name + "' conflicts with an existing definition or builtin of the same name", + }) + continue + } + c.nameBuiltins[mid] = append(c.nameBuiltins[mid], QuoteSig{Inputs: u.payloads, Outputs: []TypeId{enumType}}) + if c.enumMemberToks == nil { + c.enumMemberToks = make(map[NameId]Token, len(uniq)) + } + c.enumMemberToks[mid] = u.tok + } +} diff --git a/mshell/TypeEnum_test.go b/mshell/TypeEnum_test.go new file mode 100644 index 00000000..5cd57a7c --- /dev/null +++ b/mshell/TypeEnum_test.go @@ -0,0 +1,170 @@ +package main + +import ( + "strings" + "testing" +) + +func TestEnumNullaryDeclAndConstruct(t *testing.T) { + errs, ok := parseAndCheck(t, "enum Color = red | green | blue end\ndef describe (Color -- str) c! \"x\" end\nred describe") + if !ok || len(errs) != 0 { + t.Fatalf("nullary enum decl + construct should pass; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumPayloadConstructorSignature(t *testing.T) { + // A payload constructor has signature (payload... -- Enum). + errs, ok := parseAndCheck(t, "enum R = ok str | failed int str | none2 end\ndef use (R -- str) c! \"x\" end\n404 \"nf\" failed use") + if !ok || len(errs) != 0 { + t.Fatalf("payload constructor should type-check; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumPayloadWrongType(t *testing.T) { + errs, ok := parseAndCheck(t, "enum R = ok int end\n\"x\" ok") + if ok { + t.Fatalf("wrong payload type should fail; errs=%v", errs) + } +} + +func TestEnumDistinctNominal(t *testing.T) { + // Two enums with parallel members do not unify. + src := "enum A = a1 | a2 end\nenum B = b1 | b2 end\ndef takesA (A -- str) c! 
\"x\" end\nb1 takesA" + errs, ok := parseAndCheck(t, src) + if ok { + t.Fatalf("feeding enum B where A is expected should fail; errs=%v", errs) + } +} + +func TestEnumDuplicateMember(t *testing.T) { + errs, ok := parseAndCheck(t, "enum E = a | b | a end") + if ok { + t.Fatalf("duplicate enum member should fail; errs=%v", errs) + } + if !strings.Contains(strings.Join(errs, "\n"), "duplicate enum member") { + t.Fatalf("expected duplicate-member error; errs=%v", errs) + } +} + +func TestEnumCrossEnumMemberCollision(t *testing.T) { + errs, ok := parseAndCheck(t, "enum E = x1 | shared end\nenum F = shared | y1 end") + if ok { + t.Fatalf("member name reused across enums should fail; errs=%v", errs) + } +} + +func TestEnumReservedName(t *testing.T) { + errs, ok := parseAndCheck(t, "enum Maybe = a | b end") + if ok { + t.Fatalf("enum named with a reserved type name should fail; errs=%v", errs) + } +} + +func TestEnumMissingEnd(t *testing.T) { + // Without the closing `end`, the declaration is incomplete — a parse error. 
+ l := NewLexer("enum Color = red | green | blue\n\"x\" wl", nil) + p := NewMShellParser(l) + if _, err := p.ParseFile(); err == nil { + t.Fatalf("enum without a closing 'end' should be a parse error") + } +} + +func TestEnumMatchExhaustive(t *testing.T) { + src := "enum Color = red | green | blue end\ngreen match\n red : \"r\" wl,\n green : \"g\" wl,\n blue : \"b\" wl,\nend" + errs, ok := parseAndCheck(t, src) + if !ok || len(errs) != 0 { + t.Fatalf("exhaustive enum match should pass; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumMatchNonExhaustive(t *testing.T) { + src := "enum Color = red | green | blue end\ngreen match\n red : \"r\" wl,\n blue : \"b\" wl,\nend" + errs, ok := parseAndCheck(t, src) + if ok { + t.Fatalf("non-exhaustive enum match should fail; errs=%v", errs) + } + if !strings.Contains(strings.Join(errs, "\n"), "missing: green") { + t.Fatalf("expected missing-member hint naming 'green'; errs=%v", errs) + } +} + +func TestEnumMatchWildcardExhaustive(t *testing.T) { + src := "enum Color = red | green | blue end\nred match\n red : \"r\" wl,\n _ : \"o\" wl,\nend" + errs, ok := parseAndCheck(t, src) + if !ok || len(errs) != 0 { + t.Fatalf("wildcard should make enum match exhaustive; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumMatchEmptyNonExhaustive(t *testing.T) { + // An empty match covers no members and must be rejected. 
+ src := "enum Color = red | green | blue end\nred match end" + errs, ok := parseAndCheck(t, src) + if ok { + t.Fatalf("empty match on an enum should be non-exhaustive; errs=%v", errs) + } +} + +func TestEnumMatchPayloadBinding(t *testing.T) { + src := "enum R = ok str | failed int str | quit end\n404 \"nf\" failed match\n ok s : @s wl,\n failed c e : @e wl,\n quit : \"q\" wl,\nend" + errs, ok := parseAndCheck(t, src) + if !ok || len(errs) != 0 { + t.Fatalf("payload-binding enum match should pass; errs=%v ok=%v", errs, ok) + } +} + +func TestEnumRecursivePayload(t *testing.T) { + // A member may carry a payload that references the enum itself. + src := "enum Tree = leaf int | node Tree Tree end\n3 leaf 4 leaf node" + errs, ok := parseAndCheck(t, src) + if !ok || len(errs) != 0 { + t.Fatalf("self-referential enum payload should type-check; errs=%v ok=%v", errs, ok) + } +} + +// parseItemsForTest parses source and returns its top-level items, for tests +// that feed startup-file declarations to the checker. +func parseItemsForTest(t *testing.T, src string) []MShellParseItem { + t.Helper() + l := NewLexer(src, nil) + p := NewMShellParser(l) + file, err := p.ParseFile() + if err != nil { + t.Fatalf("parse error: %v", err) + } + return file.Items +} + +func TestStartupEnumAndTypeVisibleToChecker(t *testing.T) { + // `enum` and `type` declarations in a startup file (stdlib / init) are + // registered before the main program is checked, so the program can + // construct members, match on them, and reference the alias — the same + // declarations the runtime registers. 
+ startup := parseItemsForTest(t, "enum Status = active | inactive end\ntype Tagged = {name: str, s: Status}") + l := NewLexer("active match\n active : \"A\" wl,\n inactive : \"I\" wl,\nend\n{ \"name\": \"x\", \"s\": active } as Tagged drop", nil) + p := NewMShellParser(l) + file, err := p.ParseFile() + if err != nil { + t.Fatalf("parse error: %v", err) + } + errs, ok := TypeCheckProgram(file, nil, startup) + if !ok || len(errs) != 0 { + t.Fatalf("startup enum/type should be visible to the checker; errs=%v ok=%v", errs, ok) + } +} + +func TestDefCollidingWithStartupEnumMemberRejected(t *testing.T) { + // The member/def collision check spans files: a program def reusing a + // startup enum's member name is rejected, same as a same-file collision. + startup := parseItemsForTest(t, "enum E = foo | zz end") + l := NewLexer("def foo ( -- int) 42 end\nfoo drop", nil) + p := NewMShellParser(l) + file, err := p.ParseFile() + if err != nil { + t.Fatalf("parse error: %v", err) + } + errs, ok := TypeCheckProgram(file, nil, startup) + if ok { + t.Fatalf("def colliding with startup enum member should fail; errs=%v", errs) + } +} diff --git a/mshell/TypeError.go b/mshell/TypeError.go index 9eabbf53..c4f5b0c3 100644 --- a/mshell/TypeError.go +++ b/mshell/TypeError.go @@ -282,6 +282,8 @@ func FormatType(arena *TypeArena, names *NameTable, id TypeId) string { return "GridView" case TKGridRow: return "GridRow" + case TKEnum: + return names.Name(NameId(n.A)) } return fmt.Sprintf("<%s #%d>", n.Kind, uint32(id)) } diff --git a/mshell/TypeParseIntegration.go b/mshell/TypeParseIntegration.go index 20979c26..1ffd8fc3 100644 --- a/mshell/TypeParseIntegration.go +++ b/mshell/TypeParseIntegration.go @@ -35,6 +35,110 @@ func (d *MShellTypeDecl) DebugString() string { func (d *MShellTypeDecl) GetStartToken() Token { return d.StartTok } func (d *MShellTypeDecl) GetEndToken() Token { return d.NameToken } +// MShellEnumDecl is a top-level `enum Name = c1 | c2 T.. | ... 
end` +// declaration: a generative tagged sum type. Each member is a constructor +// name followed by zero or more space-separated payload types, members are +// separated by `|`, and the body is closed by `end`. MemberPayloads is +// parallel to Members; an entry is empty for a nullary member. +type MShellEnumDecl struct { + Name string + NameToken Token + StartTok Token // the ENUM keyword + Members []string + MemberToks []Token + MemberPayloads [][]MShellParseItem +} + +func (d *MShellEnumDecl) ToJson() string { + parts := make([]string, len(d.Members)) + for i, m := range d.Members { + parts[i] = fmt.Sprintf("%q", m) + } + return fmt.Sprintf("{\"kind\": \"enumDecl\", \"name\": %q, \"members\": [%s]}", d.Name, strings.Join(parts, ", ")) +} + +func (d *MShellEnumDecl) DebugString() string { + return fmt.Sprintf("enum %s = %s end", d.Name, strings.Join(d.Members, " | ")) +} + +func (d *MShellEnumDecl) GetStartToken() Token { return d.StartTok } +func (d *MShellEnumDecl) GetEndToken() Token { + if len(d.MemberToks) > 0 { + return d.MemberToks[len(d.MemberToks)-1] + } + return d.NameToken +} + +// ParseEnumDecl handles a top-level enum declaration: +// +// enum Name = m1 | m2 T1 T2 | m3 ... end +// +// Each member is a bare identifier (LITERAL) followed by zero or more +// space-separated payload type expressions; members are separated by `|` and +// the body is closed by `end`. The `end` terminator is what makes the grammar +// whitespace-insensitive and unambiguous against the following code: a member's +// payload list runs until the next `|` or the closing `end`, so a nullary +// member can be followed by an arbitrary `(...)`/`[...]` statement without it +// being mistaken for a payload. The ENUM keyword is the current token on entry; +// on return, parser.curr is positioned past the closing `end`. 
+func (parser *MShellParser) ParseEnumDecl() (*MShellEnumDecl, error) { + startTok := parser.curr + parser.NextToken() // consume ENUM + if parser.curr.Type != LITERAL { + return nil, fmt.Errorf("%d:%d: expected an enum name after 'enum', got %s", + parser.curr.Line, parser.curr.Column, parser.curr.Type) + } + nameTok := parser.curr + parser.NextToken() // consume name + if parser.curr.Type != EQUALS { + return nil, fmt.Errorf("%d:%d: expected '=' in enum declaration, got %s", + parser.curr.Line, parser.curr.Column, parser.curr.Type) + } + parser.NextToken() // consume = + + // Allow an optional leading `|` so members can be written ML-style, one + // per line each prefixed with `|`. + if parser.curr.Type == PIPE { + parser.NextToken() + } + + decl := &MShellEnumDecl{Name: nameTok.Lexeme, NameToken: nameTok, StartTok: startTok} + var errs []TypeError + for { + if parser.curr.Type != LITERAL { + return nil, fmt.Errorf("%d:%d: expected an enum member name (an identifier), got %s. An enum is written `enum Name = m1 | m2 T ... end`", + parser.curr.Line, parser.curr.Column, parser.curr.Type) + } + memberTok := parser.curr + decl.Members = append(decl.Members, memberTok.Lexeme) + decl.MemberToks = append(decl.MemberToks, memberTok) + parser.NextToken() // consume member name + + // Payload types run until the next `|`, the closing `end`, or EOF. + var payloads []MShellParseItem + for parser.curr.Type != PIPE && parser.curr.Type != END && parser.curr.Type != EOF { + payloads = append(payloads, parser.parseTypePrimary(&errs)) + } + decl.MemberPayloads = append(decl.MemberPayloads, payloads) + + if parser.curr.Type == PIPE { + parser.NextToken() // consume | + continue + } + if parser.curr.Type == END { + parser.NextToken() // consume end + break + } + // EOF before `end`. 
+ return nil, fmt.Errorf("%d:%d: expected 'end' to close the enum declaration '%s'", + parser.curr.Line, parser.curr.Column, decl.Name) + } + if len(errs) > 0 { + return nil, fmt.Errorf("enum declaration body: %s", joinTypeErrs(errs)) + } + return decl, nil +} + // MShellAsCast is a ` as ` postfix cast. type MShellAsCast struct { AsToken Token diff --git a/mshell/TypeUnify.go b/mshell/TypeUnify.go index d5b9a5b7..c7295499 100644 --- a/mshell/TypeUnify.go +++ b/mshell/TypeUnify.go @@ -158,6 +158,10 @@ func (w *typeRewriter) mapType(t TypeId, skip map[TypeVarId]struct{}) TypeId { return t } return w.arena.MakeCommand(argv, CommandCaptureMode(n.B), CommandCaptureMode(n.Extra)) + case TKEnum: + // Nominal and ground: identity is the declaration name and payloads + // carry no type variables, so there is nothing to rewrite. + return t case TKQuote: sig, changed := w.mapSig(w.arena.quoteSigs[n.Extra], skip) if !changed { @@ -342,6 +346,11 @@ func (a *TypeArena) walkTypeVars(t TypeId, visit func(TypeVarId) bool) bool { return true } } + case TKEnum: + // Enums are nominal and ground — payloads are resolved without a + // generic scope, so they never contain a type variable. Treat the + // enum as a leaf; recursing into payloads would loop forever on a + // self-referential enum (e.g. `node Tree Tree`). 
case TKQuote: if a.walkSigVars(a.quoteSigs[n.Extra], visit) { return true diff --git a/mshell/lsp.go b/mshell/lsp.go index a7a34377..b8d55288 100644 --- a/mshell/lsp.go +++ b/mshell/lsp.go @@ -41,6 +41,7 @@ type lspServer struct { envNames map[string]struct{} candsBuf []string stdlibDefs []MShellDefinition + stdlibItems []MShellParseItem // stdlib top-level items; `type`/`enum` decls registered per diagnostics pass builtinSigs map[string][]string // name -> formatted "(in -- out)" sigs from the type checker stdlibHover map[string][]string // name -> formatted sigs for stdlib defs } @@ -134,10 +135,11 @@ func RunLSP(in io.Reader, out io.Writer) error { envNames: make(map[string]struct{}), } - if defs, err := loadStdlibDefsForLSP(); err != nil { + if defs, items, err := loadStdlibDefsForLSP(); err != nil { logLSP(fmt.Sprintf("type-check diagnostics: stdlib unavailable (%v); proceeding without stdlib sigs", err)) } else { server.stdlibDefs = defs + server.stdlibItems = items } server.builtinSigs, server.stdlibHover = buildHoverIndex(server.stdlibDefs) @@ -181,23 +183,23 @@ func buildHoverIndex(stdlibDefs []MShellDefinition) (map[string][]string, map[st // MSHSTDLIB if set, else the version-keyed install path), parses it, // and returns its definitions. The bodies are not evaluated; we only // need the signatures to register as builtins for the type-checker. 
-func loadStdlibDefsForLSP() ([]MShellDefinition, error) { +func loadStdlibDefsForLSP() ([]MShellDefinition, []MShellParseItem, error) { stdlibSpec, _, err := getStartupFileSpecs(startupLoadOptions{ version: mshellVersion, allowEnvOverrides: true, }) if err != nil { - return nil, err + return nil, nil, err } source, err := os.ReadFile(stdlibSpec.path) if err != nil { - return nil, err + return nil, nil, err } parsed, err := parseMShellInput(string(source), &TokenFile{stdlibSpec.path}) if err != nil { - return nil, err + return nil, nil, err } - return parsed.Definitions, nil + return parsed.Definitions, parsed.Items, nil } func (s *lspServer) run() error { @@ -548,6 +550,7 @@ func (s *lspServer) computeDiagnostics(text string) []protocol.Diagnostic { names := NewNameTable() checker := NewChecker(arena, names) checker.RegisterStdlibSigs(s.stdlibDefs) + checker.RegisterStartupTypes(s.stdlibItems) checker.CheckProgram(file) errs := checker.Errors() diff --git a/mshell/lsp_test.go b/mshell/lsp_test.go index 689d5a33..d92d4a72 100644 --- a/mshell/lsp_test.go +++ b/mshell/lsp_test.go @@ -1328,7 +1328,7 @@ func TestCompletionWordIncludesBuiltinAndStdlib(t *testing.T) { } func TestBuildHoverIndexCoversTypedBuiltinsAndStdlib(t *testing.T) { - stdlibDefs, err := loadStdlibDefsForLSP() + stdlibDefs, _, err := loadStdlibDefsForLSP() if err != nil { t.Skipf("stdlib not available in test environment: %v", err) } diff --git a/sublime/msh.sublime-syntax b/sublime/msh.sublime-syntax index 965d4832..3f9d4a07 100644 --- a/sublime/msh.sublime-syntax +++ b/sublime/msh.sublime-syntax @@ -26,7 +26,7 @@ contexts: scope: keyword.control.msh - match: '\\*if' scope: keyword.control.msh - - match: '\\b(def|end|if|iff|loop|read|str|break|continue|else)\\b' + - match: '\\b(def|end|if|iff|loop|read|str|break|continue|else|match|enum|type)\\b' scope: keyword.control.msh - match: '\\b(and|or|not)\\b' scope: keyword.operator.word.msh @@ -48,6 +48,10 @@ contexts: numbers: - match: 
'\\b0[xX][0-9A-Fa-f]+\\b' scope: constant.numeric.integer.hex.msh + - match: '\\b0[oO][0-7]+\\b' + scope: constant.numeric.integer.octal.msh + - match: '\\b0[bB][01]+\\b' + scope: constant.numeric.integer.binary.msh - match: '\\b\\d+\\.\\d*(?:[eE][+-]?\\d+)?\\b' scope: constant.numeric.float.msh - match: '\\b\\d+(?:[eE][+-]?\\d+)?\\b' diff --git a/tests/fail/cyclic_render.msh b/tests/fail/cyclic_render.msh new file mode 100644 index 00000000..ba23816f --- /dev/null +++ b/tests/fail/cyclic_render.msh @@ -0,0 +1,10 @@ +# mshell is strict: a cyclic value (a container appended into itself) is a +# degenerate artifact of in-place mutation, so converting one to a string or +# JSON is an error rather than a hang. Equality and sorting on cyclic values +# still terminate (pointer-identity fast path + pair memoization). +enum Box = wrap [Box] | z end +[] x! +@x wrap e! +@x @e append drop +@e dup = str wl +@e str wl diff --git a/tests/fail/cyclic_render.msh.stderr b/tests/fail/cyclic_render.msh.stderr new file mode 100644 index 00000000..f45c549a --- /dev/null +++ b/tests/fail/cyclic_render.msh.stderr @@ -0,0 +1 @@ +10:4: Cannot convert a cyclic value (a container that contains itself) to a string. diff --git a/tests/fail/duplicate_def.msh b/tests/fail/duplicate_def.msh new file mode 100644 index 00000000..831b2e97 --- /dev/null +++ b/tests/fail/duplicate_def.msh @@ -0,0 +1,5 @@ +# Defining the same name twice is an error: definition lookup is +# first-match-wins, so the second def would be silently dead code. +def greet (-- str) "hi" end +def greet (-- str) "yo" end +greet wl diff --git a/tests/fail/duplicate_def.msh.stderr b/tests/fail/duplicate_def.msh.stderr new file mode 100644 index 00000000..74ff8249 --- /dev/null +++ b/tests/fail/duplicate_def.msh.stderr @@ -0,0 +1 @@ +4:5: Duplicate definition 'greet'; already defined at 3:5. 
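Reviewer note on the duplicate-definition fixture above: the rule it exercises (lookup is first-match-wins, so a later definition of the same name is dead code and is now reported with both positions) can be sketched in a few lines of Go. This is only an illustrative sketch, not the checker's or runtime's actual code; `Pos`, `Def`, and `findDuplicates` are hypothetical names introduced here, and only the message shape is taken from the expected stderr.

```go
package main

import "fmt"

// Pos is a hypothetical 1-based source position (not a type from the patch).
type Pos struct{ Line, Col int }

// Def is a hypothetical stand-in for a parsed definition: its name and the
// position of the name token.
type Def struct {
	Name string
	Pos  Pos
}

// findDuplicates walks definitions in source order and keeps the first
// position seen for each name. Every later definition of the same name yields
// one message that reports both the duplicate's and the original's position.
func findDuplicates(defs []Def) []string {
	first := make(map[string]Pos, len(defs))
	var msgs []string
	for _, d := range defs {
		if prev, ok := first[d.Name]; ok {
			msgs = append(msgs, fmt.Sprintf(
				"%d:%d: Duplicate definition '%s'; already defined at %d:%d.",
				d.Pos.Line, d.Pos.Col, d.Name, prev.Line, prev.Col))
			continue // the first definition stays the one that lookup finds
		}
		first[d.Name] = d.Pos
	}
	return msgs
}

func main() {
	// Mirrors tests/fail/duplicate_def.msh: `greet` defined on lines 3 and 4.
	defs := []Def{
		{Name: "greet", Pos: Pos{Line: 3, Col: 5}},
		{Name: "greet", Pos: Pos{Line: 4, Col: 5}},
	}
	for _, m := range findDuplicates(defs) {
		fmt.Println(m)
	}
}
```

Keeping the first position in the map, rather than overwriting it, is what makes the report consistent with first-match-wins lookup: the position named after "already defined at" is always the definition that would actually run.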
diff --git a/tests/fail/enum_bad_payload_binding.msh b/tests/fail/enum_bad_payload_binding.msh new file mode 100644 index 00000000..26ed8441 --- /dev/null +++ b/tests/fail/enum_bad_payload_binding.msh @@ -0,0 +1,8 @@ +# An enum payload binding must be a plain name (or `_`). A nested pattern (or a +# keyword/operator token) is rejected at runtime, matching the type checker and +# the `just`/type-test binding forms. +enum Box = items [int] | z end +[1 2 3] items match + items [a b] : "matched" wl, + z : "z" wl, +end diff --git a/tests/fail/enum_bad_payload_binding.msh.stderr b/tests/fail/enum_bad_payload_binding.msh.stderr new file mode 100644 index 00000000..2682d61e --- /dev/null +++ b/tests/fail/enum_bad_payload_binding.msh.stderr @@ -0,0 +1 @@ +6:3: enum member 'items' payload bindings must be names, not '['a', 'b']'. diff --git a/tests/success/branded_maybe_match.msh b/tests/success/branded_maybe_match.msh new file mode 100644 index 00000000..6f4cd0b4 --- /dev/null +++ b/tests/success/branded_maybe_match.msh @@ -0,0 +1,28 @@ +# A `Maybe` (or enum, or any type) named via a `type` alias is a distinct brand, +# but `match` sees through the brand to the underlying type — so a branded +# Maybe matches by `just v` / `none`, a branded enum by its members, and a +# branded primitive by a type-keyword arm. The brand is nominal for typing but +# has no runtime form, matching what the runtime already does. +enum C = red | green | blue end +type MC = Maybe[C] + +red just as MC match + just v : @v str wl, + none : "n" wl, +end + +none as MC match + just v : @v str wl, + none : "n" wl, +end + +# Branded Maybe of a primitive, with a type-keyword binding inside. +type MI = Maybe[int] +7 just as MI match + just n : @n 1 + str wl, + none : "n" wl, +end + +# Branded primitive, matched with a type-keyword + binding. 
+type MyInt = int +5 as MyInt match int n : @n str wl, _ : "o" wl, end diff --git a/tests/success/branded_maybe_match.msh.stdout b/tests/success/branded_maybe_match.msh.stdout new file mode 100644 index 00000000..373f91b9 --- /dev/null +++ b/tests/success/branded_maybe_match.msh.stdout @@ -0,0 +1,4 @@ +red +n +8 +5 diff --git a/tests/success/enum.msh b/tests/success/enum.msh new file mode 100644 index 00000000..191a7cd1 --- /dev/null +++ b/tests/success/enum.msh @@ -0,0 +1,33 @@ +# Enum: declaration, construction, match, and payload-carrying variants. +enum Color = red | green | blue end + +green match + red : "red" wl, + green : "green" wl, + blue : "blue" wl, +end + +enum CmdResult = ok str | failed int str | timeout end + +404 "not found" failed match + ok out : @out wl, + failed c e : [@e " " @c str] "" join wl, + timeout : "timeout" wl, +end + +timeout match + ok _ : "ok" wl, + failed _ _ : "failed" wl, + timeout : "timed out" wl, +end + +# Enum used in a def signature, distinct from other enums. +def label (Color -- str) + match + red : "is red", + green : "is green", + blue : "is blue", + end +end + +blue label wl diff --git a/tests/success/enum.msh.stdout b/tests/success/enum.msh.stdout new file mode 100644 index 00000000..e8de0ada --- /dev/null +++ b/tests/success/enum.msh.stdout @@ -0,0 +1,4 @@ +green +not found 404 +timed out +is blue diff --git a/tests/success/enum_alternating_deep.msh b/tests/success/enum_alternating_deep.msh new file mode 100644 index 00000000..9ae0be88 --- /dev/null +++ b/tests/success/enum_alternating_deep.msh @@ -0,0 +1,25 @@ +# Deeply nested values must render, serialize, and compare without overflowing +# even when container kinds alternate: rendering/JSON/equality all run on one +# shared work-stack walker (renderValue / equalsIter) that expands enum, Maybe, +# list, dict, and pipe inline. 
Per-type iterative walkers were not enough — an +# enum→Maybe→enum chain re-entered each type's recursive method and overflowed +# the Go stack well before this depth. +enum E = m Maybe[E] | z end +z e! +0 i! +( + @i 50000 >= if break end + @e just m e! + @i 1 + i! +) loop +@e str len str wl +@e toJson len str wl +z e2! +0 i! +( + @i 50000 >= if break end + @e2 just m e2! + @i 1 + i! +) loop +@e @e2 = str wl +@e @e2 just m = str wl diff --git a/tests/success/enum_alternating_deep.msh.stdout b/tests/success/enum_alternating_deep.msh.stdout new file mode 100644 index 00000000..731afcab --- /dev/null +++ b/tests/success/enum_alternating_deep.msh.stdout @@ -0,0 +1,4 @@ +450001 +350003 +true +false diff --git a/tests/success/enum_branded_match.msh b/tests/success/enum_branded_match.msh new file mode 100644 index 00000000..a63a4a6a --- /dev/null +++ b/tests/success/enum_branded_match.msh @@ -0,0 +1,27 @@ +# An enum named via a `type` alias is a distinct branded type, but it can still +# be `match`ed by its members — just as a branded union (`type T = int | str`) +# is matched by its arms. Exhaustiveness is enforced over the members, payload +# binding works through the brand, and the brand stays nominal at call +# boundaries (an explicit `as` is needed to pass an enum where the alias is). 
+enum C = red | green | blue end +type Color2 = C + +red as Color2 match + red : "r" wl, + green : "g" wl, + blue : "b" wl, +end + +enum R = ok int | failed str | z end +type R2 = R +404 as int drop +5 ok as R2 match + ok n : @n str wl, + failed m : @m wl, + z : "z" wl, +end + +def paint (Color2 -- str) + match red: "is red", green: "is green", blue: "is blue", end +end +blue as Color2 paint wl diff --git a/tests/success/enum_branded_match.msh.stdout b/tests/success/enum_branded_match.msh.stdout new file mode 100644 index 00000000..c75b5130 --- /dev/null +++ b/tests/success/enum_branded_match.msh.stdout @@ -0,0 +1,3 @@ +r +5 +is blue diff --git a/tests/success/enum_dag_equality.msh b/tests/success/enum_dag_equality.msh new file mode 100644 index 00000000..064f00a6 --- /dev/null +++ b/tests/success/enum_dag_equality.msh @@ -0,0 +1,72 @@ +# Equality and ordering on enum values with shared substructure must not blow +# up: `@t @t node` reuses one subtree twice per level, so after 64 levels the +# value is a DAG with 65 nodes but 2^64 tree paths. The comparison walks skip +# pointer-identical pairs (and, past a step threshold, memoize already-expanded +# pairs), so these finish instantly; a naive structural walk would run for +# centuries. +enum T = leaf int | node T T end + +# One DAG, compared against itself / its own reference. +0 leaf t! +0 i! +( @i 64 >= if break end @t @t node t! @i 1 + i! ) loop +@t @t = str wl +@t dup = str wl +[ @t @t ] uniq len str wl +[ @t @t ] sort len str wl + +# Two DAGs built independently: no pointers are shared across the operands, so +# the pointer fast path never fires — this exercises the memoized mode. +0 leaf a! +0 i! +( @i 64 >= if break end @a @a node a! @i 1 + i! ) loop +0 leaf b! +0 i! +( @i 64 >= if break end @b @b node b! @i 1 + i! ) loop +@a @b = str wl + +# A third DAG differing at the bottom leaf: unequal, found without blowup. +1 leaf c! +0 i! +( @i 64 >= if break end @c @c node c! @i 1 + i! 
) loop +@a @c = str wl +[ @a @c @b ] sort len str wl +[ @a @c @b ] uniq len str wl + +# Past the dagGuard memo cap (2^18): self-doubling pairs are deduplicated at +# push time, so independent DAGs deeper than the cap stay linear — a bounded +# memo alone cannot cover a pending-duplicate working set larger than itself. +0 leaf p! 0 leaf q! 0 i! +( + @i 300000 >= if break end + @p @p node p! + @q @q node q! + @i 1 + i! +) loop +@p @q = str wl + +# Dict-shaped self-doubling past the cap: the dict arms dedupe consecutive +# identical value pairs the same way (an enum with a dict payload closes the +# same doubling structure through {str: E}). +enum D = md {str: D} | zd end +zd u! zd v! 0 i! +( + @i 300000 >= if break end + { "l": @u, "r": @u } md u! + { "l": @v, "r": @v } md v! + @i 1 + i! +) loop +@u @v = str wl + +# Class closure: a NON-consecutive alternating sharing pattern ([x y x] per +# level) defeats push-time dedup entirely and, past any bounded memo, every +# such pattern re-explodes — so the pair memo is unbounded. Any sharing +# pattern, any container mix, any depth is polynomial in actual nodes. +[0] xa! [1] ya! [0] xb! [1] yb! 0 i! +( + @i 300000 >= if break end + [ @xa @ya @xa ] t! [ @ya @xa @ya ] ya! @t xa! + [ @xb @yb @xb ] t! [ @yb @xb @yb ] yb! @t xb! + @i 1 + i! +) loop +@xa @xb = str wl diff --git a/tests/success/enum_dag_equality.msh.stdout b/tests/success/enum_dag_equality.msh.stdout new file mode 100644 index 00000000..f7d26a30 --- /dev/null +++ b/tests/success/enum_dag_equality.msh.stdout @@ -0,0 +1,11 @@ +true +true +1 +2 +true +false +3 +2 +true +true +true diff --git a/tests/success/enum_deep_equals.msh b/tests/success/enum_deep_equals.msh new file mode 100644 index 00000000..48719b7e --- /dev/null +++ b/tests/success/enum_deep_equals.msh @@ -0,0 +1,17 @@ +# Deeply nested enum values must compare for equality without overflowing: +# `=` walks enum payloads with an explicit pair stack, not function recursion +# (mirroring `str`/`toJson`). 
Build two independent 50000-deep trees and a +# third that differs only at the very tip; a recursive comparator would +# overflow the stack on values this deep. +enum Tree = leaf int | node Tree Tree end +0 leaf a! 0 leaf b! 0 leaf c! 0 i! +( + @i 50000 >= if break end + @a 0 leaf node a! + @b 0 leaf node b! + @c 0 leaf node c! + @i 1 + i! +) loop +@c 0 leaf 99 leaf node node c! +@a @b = str wl +@a @c = str wl diff --git a/tests/success/enum_deep_equals.msh.stdout b/tests/success/enum_deep_equals.msh.stdout new file mode 100644 index 00000000..da29283a --- /dev/null +++ b/tests/success/enum_deep_equals.msh.stdout @@ -0,0 +1,2 @@ +true +false diff --git a/tests/success/enum_deep_json.msh b/tests/success/enum_deep_json.msh new file mode 100644 index 00000000..81175c78 --- /dev/null +++ b/tests/success/enum_deep_json.msh @@ -0,0 +1,14 @@ +# A deeply nested enum value must serialize to JSON without overflowing: +# `toJson` renders enum payloads with an explicit work stack, not function +# recursion (mirroring `str`/enum_deep_render). Build a 50000-deep tree and +# print the length of its JSON; a recursive serializer would overflow the +# stack well before this depth. +enum Tree = leaf int | node Tree Tree end +0 leaf t! +0 i! +( + @i 50000 >= if break end + @t 0 leaf node t! + @i 1 + i! +) loop +@t toJson len str wl diff --git a/tests/success/enum_deep_json.msh.stdout b/tests/success/enum_deep_json.msh.stdout new file mode 100644 index 00000000..1ca88b97 --- /dev/null +++ b/tests/success/enum_deep_json.msh.stdout @@ -0,0 +1 @@ +1250011 diff --git a/tests/success/enum_deep_render.msh b/tests/success/enum_deep_render.msh new file mode 100644 index 00000000..952cfe94 --- /dev/null +++ b/tests/success/enum_deep_render.msh @@ -0,0 +1,13 @@ +# A deeply nested enum value must stringify without overflowing: `str` renders +# enum payloads with an explicit work stack, not function recursion. 
Build a +# 50000-deep tree and print the length of its rendering (deterministic, and a +# recursive renderer would overflow the stack well before this depth). +enum Tree = leaf int | node Tree Tree end +0 leaf t! +0 i! +( + @i 50000 >= if break end + @t 0 leaf node t! + @i 1 + i! +) loop +@t str len str wl diff --git a/tests/success/enum_deep_render.msh.stdout b/tests/success/enum_deep_render.msh.stdout new file mode 100644 index 00000000..d9515471 --- /dev/null +++ b/tests/success/enum_deep_render.msh.stdout @@ -0,0 +1 @@ +700007 diff --git a/tests/success/enum_deep_sort.msh b/tests/success/enum_deep_sort.msh new file mode 100644 index 00000000..358fb808 --- /dev/null +++ b/tests/success/enum_deep_sort.msh @@ -0,0 +1,17 @@ +# Sorting a list of deeply nested enum values must not overflow: compareValues +# walks payloads with an explicit work stack, not recursion (mirroring `=`, +# `toJson`, and `str`). Two 50000-deep trees share their whole prefix and differ +# only at the tip, so comparing them descends the full depth. A recursive +# comparator would overflow the stack on values this deep. +enum Tree = leaf int | node Tree Tree end +0 leaf a! 0 leaf c! 0 i! +( + @i 50000 >= if break end + @a 0 leaf node a! + @c 0 leaf node c! + @i 1 + i! +) loop +@c 0 leaf 99 leaf node node c! +# Sorting is deterministic regardless of input order, and leaves 2 elements. +[@c @a] sort len str wl +[@c @a] sort [@a @c] sort = str wl diff --git a/tests/success/enum_deep_sort.msh.stdout b/tests/success/enum_deep_sort.msh.stdout new file mode 100644 index 00000000..7600dd4b --- /dev/null +++ b/tests/success/enum_deep_sort.msh.stdout @@ -0,0 +1,2 @@ +2 +true diff --git a/tests/success/enum_leading_pipe.msh b/tests/success/enum_leading_pipe.msh new file mode 100644 index 00000000..774f9993 --- /dev/null +++ b/tests/success/enum_leading_pipe.msh @@ -0,0 +1,15 @@ +# Members may be written ML-style: one per line, each prefixed with an optional +# leading `|`. 
+enum Suit = + | hearts + | diamonds + | clubs + | spades +end + +clubs match + hearts : "H" wl, + diamonds : "D" wl, + clubs : "C" wl, + spades : "S" wl, +end diff --git a/tests/success/enum_leading_pipe.msh.stdout b/tests/success/enum_leading_pipe.msh.stdout new file mode 100644 index 00000000..3cc58df8 --- /dev/null +++ b/tests/success/enum_leading_pipe.msh.stdout @@ -0,0 +1 @@ +C diff --git a/tests/success/enum_match_in_quote.msh b/tests/success/enum_match_in_quote.msh new file mode 100644 index 00000000..ac91746f --- /dev/null +++ b/tests/success/enum_match_in_quote.msh @@ -0,0 +1,16 @@ +# A `match` may be the body of an inferred quotation: the checker synthesizes +# the quote's input as the match subject (like any other underflow under +# inference) and pins it from the first arm that names a type — an enum member +# determines its enum, `just`/`none` determine Maybe. This is the canonical +# way to consume a list of enum values. +enum T = leaf int | node T T end +[ 1 leaf 2 leaf ] (match leaf n : @n, node a b : 0, end) map (str) map "," join wl + +enum C = red | green | blue end +[ red green blue ] (match red : true, green : false, blue : true, end) filter len str wl + +[ 5 just none ] (match just v : @v, none : 0, end) map (str) map "," join wl + +[1 2 3] (match 1 : "one", _ : "other", end) map "," join wl + +[ red green ] (match red : "r" wl, green : "g" wl, blue : "b" wl, end) each diff --git a/tests/success/enum_match_in_quote.msh.stdout b/tests/success/enum_match_in_quote.msh.stdout new file mode 100644 index 00000000..8bd43448 --- /dev/null +++ b/tests/success/enum_match_in_quote.msh.stdout @@ -0,0 +1,6 @@ +1,2 +2 +5,0 +one,other,other +r +g diff --git a/tests/success/enum_payload_typealias.msh b/tests/success/enum_payload_typealias.msh new file mode 100644 index 00000000..58c1bbef --- /dev/null +++ b/tests/success/enum_payload_typealias.msh @@ -0,0 +1,19 @@ +# An enum payload may reference a user `type` alias (here a shape), in either +# declaration 
order. Previously this failed with "unknown type" because enum +# payloads were resolved before `type` declarations were registered. +type Person = {name: str, age: int} +enum Record = person Person | empty end + +{ "name": "Ada", "age": 36 } as Person person match + person p : @p :name? wl, + empty : "empty" wl, +end + +# Order-independent: enum declared before the type alias it uses. +enum Cell = num Count | blank end +type Count = int + +5 as Count num match + num n : @n str wl, + blank : "blank" wl, +end diff --git a/tests/success/enum_payload_typealias.msh.stdout b/tests/success/enum_payload_typealias.msh.stdout new file mode 100644 index 00000000..1a34b8e4 --- /dev/null +++ b/tests/success/enum_payload_typealias.msh.stdout @@ -0,0 +1,2 @@ +Ada +5 diff --git a/tests/success/enum_recursive_generic.msh b/tests/success/enum_recursive_generic.msh new file mode 100644 index 00000000..51fbf6c3 --- /dev/null +++ b/tests/success/enum_recursive_generic.msh @@ -0,0 +1,10 @@ +# Regression: a self-referential enum flowing through a generic parameter +# triggers the type checker's occurs check. The checker must treat an enum as a +# leaf when scanning for type variables; otherwise it recurses into the cyclic +# payload (`node Tree Tree`) forever and overflows the stack. 
+enum Tree = leaf int | node Tree Tree end + +def ident (q -- q) end + +3 leaf ident drop +"ok" wl diff --git a/tests/success/enum_recursive_generic.msh.stdout b/tests/success/enum_recursive_generic.msh.stdout new file mode 100644 index 00000000..9766475a --- /dev/null +++ b/tests/success/enum_recursive_generic.msh.stdout @@ -0,0 +1 @@ +ok diff --git a/tests/success/enum_render_contexts.msh b/tests/success/enum_render_contexts.msh new file mode 100644 index 00000000..57c12293 --- /dev/null +++ b/tests/success/enum_render_contexts.msh @@ -0,0 +1,12 @@ +# An enum value renders the same way in every context: standalone, inside a +# list, and via `map` all use the member form (`red`, `leaf(3)`) with no +# `EnumName.` prefix. (A dict / toJson still use the JSON-tagged form.) +enum C = red | green | blue end +red str wl +[red green blue] str wl +[red green blue] (str) map "," join wl + +enum T = leaf int | node T T end +3 leaf str wl +[ 3 leaf 1 leaf ] str wl +1 leaf 2 leaf node str wl diff --git a/tests/success/enum_render_contexts.msh.stdout b/tests/success/enum_render_contexts.msh.stdout new file mode 100644 index 00000000..cde9f53c --- /dev/null +++ b/tests/success/enum_render_contexts.msh.stdout @@ -0,0 +1,6 @@ +red +[red green blue] +red,green,blue +leaf(3) +[leaf(3) leaf(1)] +node(leaf(1) leaf(2)) diff --git a/tests/success/enum_str_json.msh b/tests/success/enum_str_json.msh new file mode 100644 index 00000000..f470bfb3 --- /dev/null +++ b/tests/success/enum_str_json.msh @@ -0,0 +1,16 @@ +# Stringifying enum values. `str` renders the member, with payloads after the +# member name; `toJson` uses the externally-tagged convention. Nullary members +# render as the bare member name / string. +enum R = ok str | failed int str | timeout end + +"hi" ok str wl +404 "nf" failed str wl +timeout str wl + +"hi" ok toJson wl +404 "nf" failed toJson wl +timeout toJson wl + +# Nested payloads render through, member-first. 
+enum Tree = leaf int | node Tree Tree end +1 leaf 2 leaf node 3 leaf node str wl diff --git a/tests/success/enum_str_json.msh.stdout b/tests/success/enum_str_json.msh.stdout new file mode 100644 index 00000000..727821a0 --- /dev/null +++ b/tests/success/enum_str_json.msh.stdout @@ -0,0 +1,7 @@ +ok(hi) +failed(404 nf) +timeout +{"ok": "hi"} +{"failed": [404, "nf"]} +"timeout" +node(node(leaf(1) leaf(2)) leaf(3)) diff --git a/tests/success/enum_then_quote.msh b/tests/success/enum_then_quote.msh new file mode 100644 index 00000000..8873b7e4 --- /dev/null +++ b/tests/success/enum_then_quote.msh @@ -0,0 +1,7 @@ +# A `(...)` statement after an enum declaration is the following code, not a +# payload of the last member: the enum body is closed by `end`, so the boundary +# is unambiguous and whitespace-insensitive. +enum C = red | green | blue end + +(green) x str wl +[1 2 3] (0 >) filter len str wl diff --git a/tests/success/enum_then_quote.msh.stdout b/tests/success/enum_then_quote.msh.stdout new file mode 100644 index 00000000..50af8b94 --- /dev/null +++ b/tests/success/enum_then_quote.msh.stdout @@ -0,0 +1,2 @@ +green +3 diff --git a/tests/success/enum_union_match.msh b/tests/success/enum_union_match.msh new file mode 100644 index 00000000..1f9232f0 --- /dev/null +++ b/tests/success/enum_union_match.msh @@ -0,0 +1,31 @@ +# An enum can be a member of a `type` union, and a `match` discriminates the +# union by the enum's type name: a bare enum type name (`C`) is a type-test arm +# that matches any value of that enum, while a non-matching value (here an int, +# or a value of a different enum) falls through to the next arm. +enum C = red | green | blue end +type T = C | int + +red as T match + C : "a color" wl, + int : "an int" wl, +end + +42 as T match + C : "a color" wl, + int : "an int" wl, +end + +# A union of two enums, discriminated by each enum's type name. 
+enum A = a1 | a2 end +enum B = b1 | b2 end +type AB = A | B + +b1 as AB match + A : "an A" wl, + B : "a B" wl, +end + +a2 as AB match + A : "an A" wl, + B : "a B" wl, +end diff --git a/tests/success/enum_union_match.msh.stdout b/tests/success/enum_union_match.msh.stdout new file mode 100644 index 00000000..42a132b5 --- /dev/null +++ b/tests/success/enum_union_match.msh.stdout @@ -0,0 +1,4 @@ +a color +an int +a B +an A diff --git a/tests/success/equality.msh b/tests/success/equality.msh new file mode 100644 index 00000000..6484e3e2 --- /dev/null +++ b/tests/success/equality.msh @@ -0,0 +1,40 @@ +# Equality is total and structural for every value type: containers compare +# element-wise, and at runtime a type mismatch yields false rather than an +# error (comparing two different concrete types is itself a static type error, +# caught by the checker, so this matters for unions and unchecked code). + +# Lists (structural, nested, length-sensitive) +[1 2 3] [1 2 3] = str wl +[1 2 3] [1 2 4] = str wl +[[1] [2]] [[1] [2]] = str wl +[1 2] [1 2 3] = str wl + +# str / path / literal compare by text content +"foo" `foo` = str wl + +# Dicts compare structurally, independent of key order +{ "a": 1, "b": 2 } { "b": 2, "a": 1 } = str wl + +# Maybe: None==None, Just==Just (equal / unequal payloads), Just != None, and +# a Maybe nested in a list all compare structurally. +none none = str wl +"x" just "x" just = str wl +5 just 5 just = str wl +5 just 6 just = str wl +5 just none = str wl +[5 just] [5 just] = str wl + +# Enums (including Maybe[enum], the common optional-enum case) +enum C = red | green end +red red = str wl +red green = str wl +red just red just = str wl +red just green just = str wl + +# uniq now deduplicates any equatable value (lists, enums, ...) 
+[[1] [1] [2]] uniq len str wl +[red green red green] uniq len str wl + +# Quotations compare by identity +(1 +) dup = str wl +(1 +) (1 +) = str wl diff --git a/tests/success/equality.msh.stdout b/tests/success/equality.msh.stdout new file mode 100644 index 00000000..23ac1554 --- /dev/null +++ b/tests/success/equality.msh.stdout @@ -0,0 +1,20 @@ +true +false +true +false +true +true +true +true +true +false +false +true +true +false +true +false +2 +2 +true +false diff --git a/tests/success/grid_set_cell_mixed.msh b/tests/success/grid_set_cell_mixed.msh new file mode 100644 index 00000000..59a5dcef --- /dev/null +++ b/tests/success/grid_set_cell_mixed.msh @@ -0,0 +1,11 @@ +# gridSetCell stores the value even when its type differs from the column's +# original type: the column is promoted to mixed (generic) storage rather than +# silently dropping the value, and the other rows are preserved. +enum C = red | green end + +# int column, row 0 set to an enum; row 1 (int 2) is preserved. +[| "c" ; 1 ; 2 |] "c" 0 red gridSetCell toJson wl + +# int column receiving a string, and a string column receiving an int. +[| "n" ; 1 |] "n" 0 "X" gridSetCell toJson wl +[| "s" ; "a" |] "s" 0 99 gridSetCell toJson wl diff --git a/tests/success/grid_set_cell_mixed.msh.stdout b/tests/success/grid_set_cell_mixed.msh.stdout new file mode 100644 index 00000000..dd332a96 --- /dev/null +++ b/tests/success/grid_set_cell_mixed.msh.stdout @@ -0,0 +1,3 @@ +[{"c": "red"}, {"c": 2}] +[{"n": "X"}] +[{"s": 99}] diff --git a/tests/success/sort_structural.msh b/tests/success/sort_structural.msh new file mode 100644 index 00000000..572e8db1 --- /dev/null +++ b/tests/success/sort_structural.msh @@ -0,0 +1,23 @@ +# `sort` reorders the original elements by a total structural order and never +# changes their type (the old implementation replaced every element with a +# string, dropping enum payloads). 
Numbers sort numerically and stay numbers, +# enums sort by declaration order then payload, dicts by sorted key/value, and a +# mixed-type list sorts deterministically by a fixed type rank (numbers < text). + +# Numbers keep their type (sum works) and sort numerically, not lexically. +[10 2 1] sort (str) map "," join wl +[10 2 1] sort sum str wl + +# Enums sort by member declaration order (low < medium < high), not by name. +enum Priority = low | medium | high end +[high low medium high low] sort (str) map "," join wl + +# Same member: payloads break the tie. +enum Tree = leaf int | node Tree Tree end +[3 leaf 1 leaf 2 leaf] sort (str) map "," join wl + +# Dicts compare by sorted key then value. +[ { "b": 2, "a": 9 } { "a": 1, "b": 1 } { "a": 1, "b": 2 } ] sort (toJson) map " | " join wl + +# Mixed types: fixed type rank (numbers before text), deterministic. +[hello 1 'c' 'A'] sort (str) map "," join wl diff --git a/tests/success/sort_structural.msh.stdout b/tests/success/sort_structural.msh.stdout new file mode 100644 index 00000000..c22eea2b --- /dev/null +++ b/tests/success/sort_structural.msh.stdout @@ -0,0 +1,6 @@ +1,2,10 +13 +low,low,medium,high,high +leaf(1),leaf(2),leaf(3) +{"a": 1, "b": 1} | {"a": 1, "b": 2} | {"a": 9, "b": 2} +1,A,c,hello diff --git a/tests/success/sort_test.msh b/tests/success/sort_test.msh index 613e32ac..1d0f888f 100644 --- a/tests/success/sort_test.msh +++ b/tests/success/sort_test.msh @@ -1,7 +1,7 @@ -# TODO: re-enable once the sort fix (sort -> [str]) lands; with the uw -# fix ([str]) this [int | str] input fails type-check until then. -# "# Basic sort test" wl -# [hello 1 'c' 'A'] sort uw +"# Basic sort test" wl +# sort preserves element types (the int stays an int), so stringify for display. +# Across types the sort order is by a fixed type rank (numbers before text). 
+[hello 1 'c' 'A'] sort (str) map uw "# Unique sort test" wl [z y 'x' y z] uniq sort uw diff --git a/tests/success/sort_test.msh.stdout b/tests/success/sort_test.msh.stdout index a25f42b4..afd65e1d 100644 --- a/tests/success/sort_test.msh.stdout +++ b/tests/success/sort_test.msh.stdout @@ -1,3 +1,8 @@ +# Basic sort test +1 +A +c +hello # Unique sort test x y diff --git a/tests/success/underscore_argv.msh b/tests/success/underscore_argv.msh new file mode 100644 index 00000000..9f77ca26 --- /dev/null +++ b/tests/success/underscore_argv.msh @@ -0,0 +1,7 @@ +# A lone `_` is the match wildcard, but it remains usable as a bare argv word +# (the literal string "_") inside a list. The two roles coexist. +[echo _ arg _] ; + +5 match + _ : "wild" wl, +end diff --git a/tests/success/underscore_argv.msh.stdout b/tests/success/underscore_argv.msh.stdout new file mode 100644 index 00000000..518b4723 --- /dev/null +++ b/tests/success/underscore_argv.msh.stdout @@ -0,0 +1,2 @@ +_ arg _ +wild diff --git a/tests/success/uniq_enum.msh b/tests/success/uniq_enum.msh new file mode 100644 index 00000000..6d4fe954 --- /dev/null +++ b/tests/success/uniq_enum.msh @@ -0,0 +1,9 @@ +# `uniq` accepts any value type (matching its `([t] -- [t])` signature) and +# deduplicates by structural equality, so a list of enums dedupes instead of +# throwing at runtime. First-occurrence order is preserved. +enum C = red | green | blue end + +[red green red blue green red] uniq (str) map "," join wl + +# Other equatable values dedupe too (previously a runtime error). 
+[true false true true] uniq len str wl diff --git a/tests/success/uniq_enum.msh.stdout b/tests/success/uniq_enum.msh.stdout new file mode 100644 index 00000000..41966831 --- /dev/null +++ b/tests/success/uniq_enum.msh.stdout @@ -0,0 +1,2 @@ +red,green,blue +2 diff --git a/tests/typecheck_fail/branded_maybe_nonexhaustive.msh b/tests/typecheck_fail/branded_maybe_nonexhaustive.msh new file mode 100644 index 00000000..d1fb35bb --- /dev/null +++ b/tests/typecheck_fail/branded_maybe_nonexhaustive.msh @@ -0,0 +1,7 @@ +# Exhaustiveness is enforced through a `type` alias of a Maybe: a branded Maybe +# match must still cover both `just` and `none` (or use `_`). Here `none` is +# missing, so it is rejected — the brand does not hide the cases. +type MI = Maybe[int] +5 just as MI match + just v : @v str wl, +end diff --git a/tests/typecheck_fail/duplicate_def.msh b/tests/typecheck_fail/duplicate_def.msh new file mode 100644 index 00000000..0455839f --- /dev/null +++ b/tests/typecheck_fail/duplicate_def.msh @@ -0,0 +1,4 @@ +# A name defined twice in one file is rejected by the checker. +def greet (-- str) "hi" end +def greet (-- str) "yo" end +greet wl diff --git a/tests/typecheck_fail/duplicate_def_stdlib.msh b/tests/typecheck_fail/duplicate_def_stdlib.msh new file mode 100644 index 00000000..c8015faf --- /dev/null +++ b/tests/typecheck_fail/duplicate_def_stdlib.msh @@ -0,0 +1,4 @@ +# Redefining a name already defined by the standard library is rejected: +# lookup is first-match-wins, so this def could never take effect. +def id (q -- q) end +3 id drop diff --git a/tests/typecheck_fail/enum_branded_nonexhaustive.msh b/tests/typecheck_fail/enum_branded_nonexhaustive.msh new file mode 100644 index 00000000..7be9ae0c --- /dev/null +++ b/tests/typecheck_fail/enum_branded_nonexhaustive.msh @@ -0,0 +1,10 @@ +# Exhaustiveness is enforced through a `type` alias of an enum: matching a +# branded enum must still cover every member (or use `_`). 
Here `blue` is +# missing, so the match is rejected — the brand does not hide the members. +enum C = red | green | blue end +type Color2 = C + +red as Color2 match + red : "r" wl, + green : "g" wl, +end diff --git a/tests/typecheck_fail/enum_distinct.msh b/tests/typecheck_fail/enum_distinct.msh new file mode 100644 index 00000000..b43f407e --- /dev/null +++ b/tests/typecheck_fail/enum_distinct.msh @@ -0,0 +1,11 @@ +# Two enums are nominally distinct even with parallel members: a value of one +# cannot be used where the other is expected. +enum A = a1 | a2 end +enum B = b1 | b2 end + +def takesA (A -- str) + c! + "ok" +end + +b1 takesA wl diff --git a/tests/typecheck_fail/enum_empty_match.msh b/tests/typecheck_fail/enum_empty_match.msh new file mode 100644 index 00000000..2c58cdb1 --- /dev/null +++ b/tests/typecheck_fail/enum_empty_match.msh @@ -0,0 +1,5 @@ +# An empty match covers no members and has no wildcard, so it can never fire +# and always crashes at runtime. It must be rejected as non-exhaustive. +enum Color = red | green | blue end + +red match end diff --git a/tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh b/tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh new file mode 100644 index 00000000..0b96d55d --- /dev/null +++ b/tests/typecheck_fail/enum_match_in_quote_nonexhaustive.msh @@ -0,0 +1,4 @@ +# Exhaustiveness is enforced inside inferred quotations too: the subject is +# pinned to the member's enum, so a match that omits a member is rejected. 
+enum C = red | green | blue end +[ red ] (match red : 1, green : 2, end) map drop diff --git a/tests/typecheck_fail/enum_member_def_collision.msh b/tests/typecheck_fail/enum_member_def_collision.msh new file mode 100644 index 00000000..7c3a68a3 --- /dev/null +++ b/tests/typecheck_fail/enum_member_def_collision.msh @@ -0,0 +1,8 @@ +# Enum constructors and defs share the word namespace, so a def reusing a +# member name must be rejected (in either textual order — enums register +# first regardless). Without this, the checker resolves the word to the +# constructor while the runtime runs the def, and a type-checked program +# fails at runtime. +enum E = foo | z end +def foo ( -- int) 42 end +foo str wl diff --git a/tests/typecheck_fail/enum_nonexhaustive.msh b/tests/typecheck_fail/enum_nonexhaustive.msh new file mode 100644 index 00000000..fce677a0 --- /dev/null +++ b/tests/typecheck_fail/enum_nonexhaustive.msh @@ -0,0 +1,8 @@ +# A match on an enum that omits a member (and has no wildcard) is +# non-exhaustive and must be rejected. +enum Color = red | green | blue end + +red match + red : "r" wl, + blue : "b" wl, +end diff --git a/tests/typecheck_fail/enum_underscore_member.msh b/tests/typecheck_fail/enum_underscore_member.msh new file mode 100644 index 00000000..772e8dcc --- /dev/null +++ b/tests/typecheck_fail/enum_underscore_member.msh @@ -0,0 +1,10 @@ +# `_` is the wildcard token, reserved as a name, so it cannot be an enum +# member. (Previously it was accepted and then mis-dispatched at runtime, +# because the checker treated `_` as a member while the matcher treated it as +# the catch-all wildcard.) +enum C = _ | red end + +red match + _ : "u" wl, + red : "r" wl, +end