Open
33 commits
0ca5b0b
Initial enum commit
mitchpaulus Jun 29, 2026
27821a8
More Enum
mitchpaulus Jun 29, 2026
15782c0
Enum: payload stringification + crash/soundness fixes
mitchpaulus Jun 30, 2026
8bc9c0b
Enum: fix payload paren swallowing the following statement
mitchpaulus Jun 30, 2026
dde7f95
Enum: switch to `end`-terminated syntax, drop payload parens
mitchpaulus Jun 30, 2026
52bfa28
Enum: support optional leading `|` (ML-style member lists)
mitchpaulus Jun 30, 2026
d3a1835
Make a lone `_` its own token (the wildcard), reserved as a name
mitchpaulus Jun 30, 2026
80be0a9
Enum: let payloads reference user `type` aliases
mitchpaulus Jun 30, 2026
67b3f55
Fix uniq to accept any value type, not just primitives
mitchpaulus Jun 30, 2026
bdd26bd
Make equality total and structural for all value types
mitchpaulus Jun 30, 2026
6ce234f
Fix gridSetCell silently dropping type-mismatched values
mitchpaulus Jun 30, 2026
a4b24a0
Enum robustness: stack-safe deep values, union match, structural sort
mitchpaulus Jul 1, 2026
8abcaee
Enum match: reject non-name payload bindings at runtime
mitchpaulus Jul 1, 2026
99ac5b1
Enum match: see through a `type` alias (brand) to the enum
mitchpaulus Jul 1, 2026
a222e3a
Merge remote-tracking branch 'origin/main' into enum-types
mitchpaulus Jul 1, 2026
556b382
Enum: drop the EnumName prefix from DebugString
mitchpaulus Jul 1, 2026
622ce23
Fix Maybe equality: accept *Maybe, so None==None and Just==Just work
mitchpaulus Jul 1, 2026
3d6f3f9
Match: unwrap a type-alias brand on the subject once, for all pattern…
mitchpaulus Jul 1, 2026
3413d03
Fix exponential blowup comparing enum values with shared substructure
mitchpaulus Jul 2, 2026
8087f6c
Enum: reject a def whose name collides with an enum member
mitchpaulus Jul 2, 2026
c441027
Register startup-file type/enum declarations in both runtime and checker
mitchpaulus Jul 2, 2026
debe03e
Unify render/JSON/equality into shared iterative walkers over all con…
mitchpaulus Jul 2, 2026
0d78c14
Error on rendering cyclic values instead of hanging
mitchpaulus Jul 2, 2026
da5f6a9
Infer the match subject inside quotations
mitchpaulus Jul 3, 2026
88d8de3
Dedupe identical pairs at push time: fix exponential cliff past the m…
mitchpaulus Jul 3, 2026
e7658a7
Dedupe identical pairs in the dict comparison arms too
mitchpaulus Jul 3, 2026
6341f99
Make the comparison pair-memo unbounded: end the DAG blowup class glo…
mitchpaulus Jul 3, 2026
2a326e3
Collapse the pointer-kind switches and drop the superseded push-dedup
mitchpaulus Jul 3, 2026
2e2f4b6
Error on duplicate definition names instead of silently ignoring them
mitchpaulus Jul 3, 2026
3b76860
PR polish: CHANGELOG gaps and a comment overclaim
mitchpaulus Jul 3, 2026
fc7ba60
Syntax highlighting: cover the enum keyword and new int literals
mitchpaulus Jul 3, 2026
b28b00a
VS Code grammar: highlight keywords, word operators, types, numbers
mitchpaulus Jul 3, 2026
7b7e03e
Bring the enum design doc up to the shipped design
mitchpaulus Jul 4, 2026
57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -7,8 +7,65 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### Changed

- Breaking: defining a name that is already defined is now an error, at runtime
and in the type checker. This covers a second `def` in the same file, a
script `def` whose name is already taken by the standard library or an init
file, and an interactive redefinition. Definition lookup is first-match-wins,
so a duplicate never took effect anyway — it was silently dead code while the
first definition kept running; the error makes that visible. The message
reports both positions: `Duplicate definition 'id'; already defined at
lib/std.msh:62:5.`
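
The first-match-wins lookup and the new duplicate check described above can be sketched in Go. This is a minimal illustration, not the mshell implementation: `Pos`, `Def`, `lookup`, and `checkDuplicates` are hypothetical names, and the position prefix on the error is an assumption (only the `Duplicate definition ...` text comes from the entry above).

```go
package main

import "fmt"

// Pos is a hypothetical source position (file:line:col).
type Pos struct {
	File      string
	Line, Col int
}

func (p Pos) String() string { return fmt.Sprintf("%s:%d:%d", p.File, p.Line, p.Col) }

// Def is a hypothetical definition record.
type Def struct {
	Name string
	At   Pos
}

// lookup returns the first definition with the given name (first-match-wins),
// which is why a later duplicate could never take effect.
func lookup(defs []Def, name string) (Def, bool) {
	for _, d := range defs {
		if d.Name == name {
			return d, true
		}
	}
	return Def{}, false
}

// checkDuplicates reports the first name that is defined twice,
// citing both the duplicate's position and the original's.
func checkDuplicates(defs []Def) error {
	seen := make(map[string]Pos)
	for _, d := range defs {
		if first, ok := seen[d.Name]; ok {
			return fmt.Errorf("%s: Duplicate definition '%s'; already defined at %s.", d.At, d.Name, first)
		}
		seen[d.Name] = d.At
	}
	return nil
}

func main() {
	defs := []Def{
		{"id", Pos{"lib/std.msh", 62, 5}},
		{"id", Pos{"script.msh", 3, 1}},
	}
	d, _ := lookup(defs, "id")
	fmt.Println("lookup resolves to", d.At)
	fmt.Println(checkDuplicates(defs))
}
```

In this sketch the second `id` is unreachable through `lookup`, which is the "silently dead code" case the check now surfaces.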

### Fixed

- `gridSetCell` no longer silently drops a value whose type differs from the
column's original type (e.g. setting a string, enum, or bool into an int
column). The column is promoted to mixed storage so the value is stored, and
the other rows are preserved.
- Equality (`=`) is now total and defined for every value type. Lists,
quotations, pipes, and grids previously raised "equality not defined" at
runtime; lists/pipes/grids now compare structurally (element- and cell-wise)
and quotations compare by identity. Comparing values of different runtime
types yields `false` rather than an error (a genuinely incompatible
comparison is already a static type error), so the result no longer depends
on operand order and union members like `int | null` compare cleanly.
- Converting a cyclic value (a container appended into itself) with `str` or
`toJson` now fails with a clear error instead of hanging forever. Internal
rendering (error messages, stack dumps) prints a `<cycle>` marker at the
back-reference instead.
- A `match` used as the body of an inferred quotation (e.g.
`(match leaf n : @n, node a b : 0, end) map`) now type-checks; it previously
always failed with "stack underflow at 'match'", rejecting the canonical way
to consume enums and Maybe values inside `map`/`filter`/`each`.
- `uniq` now accepts a list of any value type (matching its `([t] -- [t])`
signature) and deduplicates by structural equality, instead of throwing at
runtime for non-primitive elements such as enums, dicts, and booleans.
- `sort` now reorders the original elements and preserves their type, instead of
replacing every element with its string form. Previously `[10 2 1] sort` gave
the strings `1 10 2` (lexical order), and sorting a list of enums silently
dropped their payloads; now numbers sort numerically and stay numbers, and
every value keeps its type. Ordering is a total structural order: numbers
numerically, text lexically, lists positionally, dicts by sorted key/value,
enums by declaration order then payload, and different types by a fixed type
rank. (Use `sortV` for version/string sorting.)
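
The total structural order that `sort` now uses can be illustrated with a small Go sketch. It is an assumption-laden simplification: `Value`, `rank`, `compare`, and `sortValues` are hypothetical names, only ints, strings, and lists are handled, the concrete type ranking is invented for the example, and the comparison is recursive for brevity (the commits in this PR describe iterative walkers).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Value is a hypothetical stand-in for a runtime value.
// Only int, string, and []Value are ordered in this sketch.
type Value interface{}

// rank assigns each kind a fixed position so that values of
// different types still have a defined order (assumed ranking).
func rank(v Value) int {
	switch v.(type) {
	case int:
		return 0
	case string:
		return 1
	case []Value:
		return 2
	default:
		return 3
	}
}

func cmpInt(a, b int) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	}
	return 0
}

// compare returns -1, 0, or 1: numbers numerically, text lexically,
// lists positionally (a shorter prefix sorts first), other kinds by rank.
func compare(a, b Value) int {
	if ra, rb := rank(a), rank(b); ra != rb {
		return cmpInt(ra, rb)
	}
	switch x := a.(type) {
	case int:
		return cmpInt(x, b.(int))
	case string:
		return strings.Compare(x, b.(string))
	case []Value:
		y := b.([]Value)
		for i := 0; i < len(x) && i < len(y); i++ {
			if c := compare(x[i], y[i]); c != 0 {
				return c
			}
		}
		return cmpInt(len(x), len(y))
	}
	return 0 // unhandled kinds compare equal in this sketch
}

// sortValues reorders the original elements in place, keeping their types.
func sortValues(vs []Value) {
	sort.SliceStable(vs, func(i, j int) bool { return compare(vs[i], vs[j]) < 0 })
}

func main() {
	nums := []Value{10, 2, 1}
	sortValues(nums)
	fmt.Println(nums) // numeric order, elements remain ints

	mixed := []Value{"b", []Value{1, 2}, 3, "a", []Value{1}}
	sortValues(mixed)
	fmt.Println(mixed)
}
```

Because the comparator works on the values themselves rather than their string forms, `10 2 1` comes out in numeric order and every element keeps its type.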

### Added

- `enum` declarations: a generative tagged sum type. Members are separated by
`|` and the declaration is closed by `end` (like `def`/`if`/`match`):
`enum CmdResult = ok str | failed int str | timeout end`. A member is a bare
constructor name optionally followed by payload types; a bare member word
constructs a value (consuming any payload from the stack, e.g.
`404 "x" failed`), and `match` dispatches on members with binding
(`failed code msg : ...`) and exhaustiveness checking — a match that omits a
member (or is empty) is rejected unless it has a `_` arm. Enums are nominal:
two enums with the same members are distinct types. Member names are
identifiers (not keywords) and are unique across all enums. `str` renders a
value as `member(payload ...)` and `toJson` uses the externally-tagged
convention. Payloads may reference the enum itself, so recursive enums like
`enum Tree = leaf int | node Tree Tree end` are supported.
- Octal, hexadecimal, and binary integer literals via `0o`, `0x`, and `0b`
prefixes (case-insensitive), e.g. `0o644`, `0xFF`, `0b101`. The base is purely
a way of writing the literal; the value is an ordinary integer and prints in
139 changes: 139 additions & 0 deletions ai/enum_implementation_plan.md
@@ -0,0 +1,139 @@
# Enum (generative tagged sum type) — implementation plan

Companion to `design/literal_or_enum_typing.html` (the design + rationale). This is the
file-by-file build plan. Plans live here in `ai/`; the design lives in `design/`.

**Status: implemented on branch `enum-types`.** Nullary and payload-carrying members,
construction, nominal distinctness, and `match` (member dispatch, payload binding,
exhaustiveness) all ship in one PR. Payloads are space-separated (`member T..`) and the
declaration is closed by `end`: mshell has no statement terminator, so without a closing
keyword space-separated payloads are ambiguous against following code. An earlier
parenthesized form (`member(T..)`) was dropped in favor of this.
Out of scope, as agreed: derived `decode`/`encode`/`values`, backing strings, qualified
`Enum.member` names, generics, and `Result` (Maybe suffices). JSON stays a structural union.

## Scope & non-goals

In scope (a generative tagged sum type declared with `enum`, inline `= a | b | c`):

- `enum Name = c1 | c2 | ...` and `enum Name = c1 t.. | c2 t.. | ...`.
- Constructors are case-free; produced only by a constructor word or `decode` (**Position 1** —
no implicit coercion, `str as Enum` rejected).
- `match` over members with exhaustiveness; payload binding reuses the `just v` path.

Explicit non-goals (per the design + owner direction):

- **No `decode` / `encode` / `values` derived functions** in v1. Reading config back is handled at
the use site with `match` (or whatever fits). This removes the wire/serialization surface.
- **No backing strings** (`member = "wire"`) in v1 — they only existed to feed `decode`/`encode`.
The member's own name is its identity. (Easy to add later when serialization is wanted.)
- **No qualified `Enum.member` / `Enum.method` dispatch** in v1. Members are referenced by bare name,
resolved by context; member names are unique across enums (collision is a declaration error). This
removes the `.`-lexing / qualified-dispatch unknown entirely.
- **No `Result` type.** `Maybe` already covers the common case.
- **No change to JSON typing.** `JsonScalar` / `Json` stay *structural unions* — their variants are
distinguishable by structural type, so they do not need tags. Enums are only for cases structure
cannot discriminate (e.g. two variants with the same payload type) and for closed config sets.
- **No generic enums** (`Enum[t, e]`) in v1.
- **No `?`-propagation sugar** in v1.
- A *checked* `"GET" as Method` stays deferred (it needs literal/singleton types).

## Phasing

Two PRs. Phase 1 (nullary enums) is now small (declaration, construction, match), with **no
serialization surface and no qualified names**. Phase 2 adds payload variants and the tagged runtime
value, which is where `Evaluator.go` gets touched substantially.

---

## Phase 1 — Nullary enums

`enum Mode = read | write | readwrite`, match-by-member + exhaustiveness, Position 1. The full v1
surface is: declare, construct (bare member word), and `match`. No `decode`/`encode`/`values`, no
backing strings, no qualified names.

The runtime value just needs to carry **which member it is** (enum `NameId` + member `NameId`). A
lightweight value suffices; no new heavy `MShellObject` is required for Phase 1.

### Type system

1. **`Lexer.go`** — add an `ENUM` keyword token (via `literalOrKeywordType`). Audit existing user
identifiers/usages of `enum`.
2. **`TypeParseIntegration.go`** — add `MShellEnumDecl` (parallel to `MShellTypeDecl`) and
`ParseEnumDecl`: `enum` Name `=` member (`|` member)*, where a member is a bare `LITERAL`. No
backing clause.
3. **`Parser.go`** — add `case ENUM:` beside `case TYPE:` (≈ line 677) dispatching to `ParseEnumDecl`.
4. **`Type.go`** — add `TKEnum` kind + a variants side table (`[]EnumVariant{Name NameId; Payload
[]TypeId}`; Payload empty in Phase 1), `MakeEnum`, hashconsing key, and accessors. Nominal
identity = the declaration `NameId` (two `enum`s with identical members stay distinct, like brands).
5. **`Type.go` / `TypeUnify.go`** — extend `walkTypeVars` and `typeRewriter.mapType` with a `TKEnum`
arm (recurse payload types; none in Phase 1). `unify` (`TypeChecker.go`): `TKEnum` unifies only
with the same enum (by name).
6. **Constructors as words** — register each member as a nullary sig `( -- Mode)`. Members live in a
**global constructor namespace**; a member name duplicated across two enums is a declaration error
in v1 (no qualification to disambiguate yet). A bare member word resolves to its enum; where an
expected enum type is in context (match subject, sig slot) that pins it.
7. **Pre-pass registration** — mirror `DeclareType` registration (`TypeCheckProgram.go:99-101`):
collect `enum` headers with placeholder TypeIds, resolve bodies, register constructor words,
detect cross-enum member-name collisions.
8. **Match** — `analyzeTokenPattern` (`TypeCheckProgram.go:1381`): an enum member name is a
recognized pattern that **credits coverage** against the enum's closed set (flip the
"value literals credit no coverage" behavior at `:1402` for enum subjects). `TypeBranch.go`:
exhaustiveness over the member set; narrowing (subject known to be that member in the arm).
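
The coverage rule in step 8 can be sketched as a small Go function. This is only an illustration under stated assumptions: `missingMembers` is a hypothetical name, members and arms are modeled as plain strings, and the real checker works on `TypeId`s and patterns rather than names.

```go
package main

import "fmt"

// missingMembers returns the enum members that no match arm covers.
// A "_" arm covers every remaining member, so nothing is missing.
// An empty arm list therefore reports every member.
func missingMembers(members, arms []string) []string {
	covered := make(map[string]bool, len(arms))
	for _, a := range arms {
		if a == "_" {
			return nil
		}
		covered[a] = true
	}
	var missing []string
	for _, m := range members {
		if !covered[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	mode := []string{"read", "write", "readwrite"}
	fmt.Println(missingMembers(mode, []string{"read", "write"}))
	fmt.Println(missingMembers(mode, []string{"read", "_"}))
	fmt.Println(missingMembers(mode, nil))
}
```

A non-empty result corresponds to a rejected `match`; an empty result means the arms credit coverage for the whole closed set.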

### Runtime (`Evaluator.go`)

9. A lightweight enum value (enum + member ids). Constructor evaluation pushes it; member-pattern
matching extends the `matchTokenPattern` path that already handles `none`/type keywords near
`:1117`; plus equality, `DebugString`, `ToJson`.

### Docs / housekeeping

10. `doc/type_system.inc.html` + `doc/mshell.md` (rebuild with `cd doc; msh build.msh`).
11. `CHANGELOG.md` → Unreleased / Added.
12. `lib/std.msh` completions, in the documented Vim-fold pattern.
13. Tests: `tests/` (+ `typecheck_test.sh`) and `go test` in `mshell/`. Cover: declaration parsing, construction,
match exhaustive (no `_`), non-exhaustive rejected, member narrowing, `str as Enum` rejected,
two enums with same members stay distinct, duplicate member name across enums rejected.

---

## Phase 2 — Payload-carrying variants

`enum CmdResult = ok str | failed int str | timeout`. Adds:

1. **Parser** — arms parse a constructor name followed by payload type exprs (reuse
`parseTypeExpr` productions for each payload).
2. **`Type.go`** — `EnumVariant.Payload` populated; payload types flow through hashconsing and the
rewriter arms added in Phase 1.
3. **Constructors with payloads** — `failed : (int str -- CmdResult)`, postfix, consume from the
stack like `5 just`.
4. **Runtime value** — a new `MShellObject` generalizing `Maybe`: `{ enum NameId; tag; payload
[]MShellObject }`. `Maybe` is the proven two-variant precedent; follow its equality/`DebugString`/
`ToJson` shape. Phase-1 nullary values fold in as the empty-payload case.
5. **Match payload binding** — extend the `just v`-style binding (`TypeCheckProgram.go:1348`,
`Evaluator.go:1055`) to N payloads: `failed c e : ...` binds `c`, `e`.
6. **Recursive enums** — already work via the placeholder-TypeId pre-pass.
7. Docs / changelog / completions / tests as above (payload construct + destructure + recursive
enum + exhaustiveness with payloads).
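
The tagged runtime value from step 4 and the postfix constructors from step 3 might look roughly like the following Go sketch. All names (`EnumValue`, `construct`) are hypothetical, string fields stand in for `NameId`s, and the rendering of a nullary member as its bare name and of payloads as space-separated are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// EnumValue is a hypothetical tagged runtime value: the owning enum,
// the member (tag), and zero or more payload values.
type EnumValue struct {
	Enum    string
	Tag     string
	Payload []interface{}
}

// construct pops arity values off the end of the stack and wraps them,
// mirroring a postfix constructor that consumes its payload from the stack.
func construct(enum, tag string, arity int, stack []interface{}) (EnumValue, []interface{}, error) {
	if len(stack) < arity {
		return EnumValue{}, stack, fmt.Errorf("stack underflow at '%s'", tag)
	}
	split := len(stack) - arity
	payload := append([]interface{}(nil), stack[split:]...)
	return EnumValue{Enum: enum, Tag: tag, Payload: payload}, stack[:split], nil
}

// String renders member(payload ...); a nullary member renders
// as its bare name (an assumption of this sketch).
func (v EnumValue) String() string {
	if len(v.Payload) == 0 {
		return v.Tag
	}
	parts := make([]string, len(v.Payload))
	for i, p := range v.Payload {
		parts[i] = fmt.Sprint(p) // nested EnumValues render recursively
	}
	return v.Tag + "(" + strings.Join(parts, " ") + ")"
}

func main() {
	failed, rest, _ := construct("CmdResult", "failed", 2, []interface{}{404, "x"})
	fmt.Println(failed, len(rest))

	leaf1, _, _ := construct("Tree", "leaf", 1, []interface{}{1})
	leaf2, _, _ := construct("Tree", "leaf", 1, []interface{}{2})
	node, _, _ := construct("Tree", "node", 2, []interface{}{leaf1, leaf2})
	fmt.Println(node)
}
```

Nullary members fall out as the empty-payload case, and a recursive enum needs nothing extra because a payload slot can hold another `EnumValue`.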

(Serialization helpers — `decode`/`encode`/backing strings — remain out of scope until a concrete
need appears; config reads are handled with `match` at the use site.)

---

## Process

- New feature branch before any code (per `CLAUDE.md`).
- Build in `mshell/` (`go build -o ...`, in-repo cache if needed) before testing.
- `gofmt` only with explicit permission.
- `CHANGELOG.md` for user-facing additions; `mshell/BuiltInList.go` kept in sync if builtins added.

## Decisions still to nail before coding Phase 1

None blocking. The former unknowns (qualified-name dispatch, backing defaults, decode/encode
delivery) are all dropped from v1 scope above. Remaining small calls can be made during the build:

- Exact lexical home for the lightweight runtime enum value (new `MShellObject` vs. reuse).
- Whether a bare member word with **no** expected-type context (e.g. stored straight into a var) is
allowed (resolves via the global member namespace) or requires a context — default: allowed, since
member names are unique across enums in v1.
40 changes: 40 additions & 0 deletions code/syntaxes/mshell.textmate.json
@@ -45,6 +45,46 @@
{
"name": "variable.other.set.mshell",
"match": "@[a-zA-Z0-9_]+"
},
{
"name": "keyword.control.mshell",
"match": "else\\*|\\*if"
},
{
"name": "keyword.control.mshell",
"match": "\\b(def|end|if|iff|loop|read|str|break|continue|else|match|enum|type)\\b"
},
{
"name": "keyword.operator.word.mshell",
"match": "\\b(and|or|not)\\b"
},
{
"name": "keyword.other.mshell",
"match": "\\bsoe\\b"
},
{
"name": "storage.type.mshell",
"match": "\\b(int|float|bool)\\b"
},
{
"name": "constant.numeric.integer.hex.mshell",
"match": "\\b0[xX][0-9A-Fa-f]+\\b"
},
{
"name": "constant.numeric.integer.octal.mshell",
"match": "\\b0[oO][0-7]+\\b"
},
{
"name": "constant.numeric.integer.binary.mshell",
"match": "\\b0[bB][01]+\\b"
},
{
"name": "constant.numeric.float.mshell",
"match": "\\b\\d+\\.\\d*(?:[eE][+-]?\\d+)?\\b"
},
{
"name": "constant.numeric.integer.mshell",
"match": "\\b\\d+(?:[eE][+-]?\\d+)?\\b"
}
],
"repository": {
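
As a rough sanity check of the numeric patterns added to the grammar, the same regular expressions (unescaped from their JSON form) can be exercised in Go. This is a sketch under assumptions: TextMate grammars are evaluated with Oniguruma rather than Go's RE2 engine, so this only approximates editor behavior, and `rules` and `classify` are hypothetical helpers that require a pattern to match a whole token.

```go
package main

import (
	"fmt"
	"regexp"
)

// rules lists the numeric patterns in the order they appear in the grammar.
var rules = []struct {
	scope string
	re    *regexp.Regexp
}{
	{"constant.numeric.integer.hex.mshell", regexp.MustCompile(`\b0[xX][0-9A-Fa-f]+\b`)},
	{"constant.numeric.integer.octal.mshell", regexp.MustCompile(`\b0[oO][0-7]+\b`)},
	{"constant.numeric.integer.binary.mshell", regexp.MustCompile(`\b0[bB][01]+\b`)},
	{"constant.numeric.float.mshell", regexp.MustCompile(`\b\d+\.\d*(?:[eE][+-]?\d+)?\b`)},
	{"constant.numeric.integer.mshell", regexp.MustCompile(`\b\d+(?:[eE][+-]?\d+)?\b`)},
}

// classify returns the scope of the first rule that matches the whole token,
// or "" when no numeric rule applies.
func classify(tok string) string {
	for _, r := range rules {
		loc := r.re.FindStringIndex(tok)
		if loc != nil && loc[0] == 0 && loc[1] == len(tok) {
			return r.scope
		}
	}
	return ""
}

func main() {
	for _, tok := range []string{"0xFF", "0o644", "0b101", "3.14", "42", "0x", "abc"} {
		fmt.Printf("%-6s -> %q\n", tok, classify(tok))
	}
}
```

The prefixed-literal rules sit before the plain integer rule, and the trailing `\b` keeps the integer pattern from claiming just the leading `0` of a token such as `0xFF`.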