From a07a2133235d16e3c54bbfeafbec9b0c03a1ad2f Mon Sep 17 00:00:00 2001
From: Hyper66666 <2247081184@qq.com>
Date: Wed, 10 Jun 2026 14:00:48 +0000
Subject: [PATCH] openspec: add codegen-ir-correctness-and-gate change proposal

---
 .../.openspec.yaml | 2 +
 .../codegen-ir-correctness-and-gate/design.md | 126 ++++++++++++++++++
 .../proposal.md | 107 +++++++++++++++
 .../codegen-ir-correctness-and-gate/spec.md | 114 ++++++++++++++++
 .../codegen-ir-correctness-and-gate/tasks.md | 89 +++++++++++++
 5 files changed, 438 insertions(+)
 create mode 100644 openspec/changes/codegen-ir-correctness-and-gate/.openspec.yaml
 create mode 100644 openspec/changes/codegen-ir-correctness-and-gate/design.md
 create mode 100644 openspec/changes/codegen-ir-correctness-and-gate/proposal.md
 create mode 100644 openspec/changes/codegen-ir-correctness-and-gate/specs/codegen-ir-correctness-and-gate/spec.md
 create mode 100644 openspec/changes/codegen-ir-correctness-and-gate/tasks.md

diff --git a/openspec/changes/codegen-ir-correctness-and-gate/.openspec.yaml b/openspec/changes/codegen-ir-correctness-and-gate/.openspec.yaml
new file mode 100644
index 0000000..2cb8041
--- /dev/null
+++ b/openspec/changes/codegen-ir-correctness-and-gate/.openspec.yaml
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-06-10
diff --git a/openspec/changes/codegen-ir-correctness-and-gate/design.md b/openspec/changes/codegen-ir-correctness-and-gate/design.md
new file mode 100644
index 0000000..7c60472
--- /dev/null
+++ b/openspec/changes/codegen-ir-correctness-and-gate/design.md
@@ -0,0 +1,126 @@
# Design: Codegen IR Correctness And Gate Hardening

## 1. Problem

`core-language-correctness` is checked off and CI is green, but the shipping
`sgc` CLI cannot compile the documented core forms on the toolchain this repo
pins for developers. The defects are in codegen IR typing and in the match
parser. Critically, the conformance gate that was supposed to catch them did
not.

## 2. Observed Failures (Reproductions)

All failures were reproduced with the release build of the `sgc` CLI on
`clang 14.0.0` (the version the repo's environment blueprint installs) and
`rustc 1.94.0` (pinned in `rust-toolchain.toml`).

| # | Form | Command | Observed error |
| --- | --- | --- | --- |
| 1 | Array index | `sgc run examples/04_array.sg` | `examples/build/04_array.ll:36: error: '%u_4' defined with type '[3 x i64]*' but expected 'i64*'` |
| 2 | Array write / `for` loop | `sgc run examples/05_loop.sg`, `sgc run examples/conformance/03_array_write.sg` | Same element-pointer mismatch as #1 |
| 3 | Closure capture | `sgc run examples/06_lambda.sg`, `sgc run examples/conformance/04_closure_multi_capture.sg` | `'%u_5' defined with type '[1 x i64]*' but expected 'i64*'` |
| 4 | Enum-returning fn | any function declared `-> EnumType` | `@get` defined with type `{ i64, [8 x i8] } (i64)*` but expected `i64 (i64)*` |
| 5 | Multi-payload match | a `Variant(b) => ...` arm followed by any other arm | `parse error: invalid pattern: expected identifier` |
| 6 | Local gate | `cargo test -p sgc core_conformance_examples_compile_link_and_run` | FAILED (the test fails on #1) |

Generated IR for item 1 (excerpt):

```llvm
%u_4 = alloca [3 x i64]                            ; %u_4 : [3 x i64]*
%u_4.elem.0 = getelementptr i64, i64* %u_4, i64 0  ; operand typed i64*, but %u_4 is [3 x i64]*
store i64 %t_1, i64* %u_4.elem.0
%t_6 = getelementptr i64, i64* %u_4, i64 %t_5      ; same mismatch on the indexed access
%t_7 = load i64, i64* %t_6
```

## 3. Root Cause

### 3.1 Typed-pointer inconsistency (items 1–4)

Codegen annotates pointer operands with types (`i64*`, or `i64 (i64)*` at a
call site) that disagree with the type the SSA value was defined with
(`[3 x i64]*`, `[1 x i64]*`, or the aggregate function type
`{ i64, [8 x i8] } (i64)*`).

- Under **typed pointers** (LLVM <= 14): `clang` rejects the IR with a hard
  error while parsing it.
- Under **opaque pointers** (LLVM >= 15, every pointer is `ptr`): the pointee
  type is dropped, the mismatch disappears, and the `getelementptr i64, ptr ...`
  form happens to compute the correct address, so the program compiles and runs.

CI runs on `clang-19` (opaque pointers), so it never sees the error. The pinned
The pinned +developer `clang-14` (typed pointers) rejects it. + +Two acceptable fixes; the implementation MUST pick one and apply it +consistently: + +1. **Decay to the correct pointee type.** For `alloca [N x T]` producing + `[N x T]*`, compute an element pointer with + `getelementptr [N x T], [N x T]* %p, i64 0, i64 ` (yielding a real `T*`), + then index/load/store at `T*`. Apply the analogous decay for closure-captured + slots and aggregate results. +2. **Commit to opaque pointers.** Emit `ptr` uniformly (target a pinned LLVM that + supports opaque pointers) so the pointee type is never part of the operand. + +Either way the invariant is: *the type a value is defined with and the type it is +used with agree, and the IR passes the verifier of the pinned toolchain.* + +### 3.2 Match-arm parser defect (item 5) + +The match-arm parser does not correctly terminate a payload pattern +`Variant(bindings)` and resynchronize to the next `Pattern => Expr` arm. When a +payload arm is not last, parsing the following arm starts mid-pattern and fails +with `expected identifier`. The committed examples always place the single +payload arm last, so the path is never exercised. + +### 3.3 Gate blind spots (item 6) + +`tools/sgc` `core_conformance_examples_compile_link_and_run` calls the in-crate +`#[cfg(test)] compile_source()` helper, which routes through `Codegen` directly, +and links with whatever `clang` is on the runner. It does not invoke the shipping +`sgc` CLI, and it pins no LLVM contract. So it can pass while `sgc run` fails, and +while the pinned developer toolchain rejects the IR. + +## 4. Approach + +1. **IR typing fix** in `compiler/src/codegen/` for array places, closure + captures, and aggregate function results, holding the §3.1 invariant. +2. **Parser fix** in `compiler/src/parser/` so a payload-binding arm parses in any + position; add accepted/rejected parser tests. +3. 
**Gate hardening** in `tools/sgc`: the conformance harness shells out to the + built `sgc` binary (`build`/`run`) for each pinned core form and asserts exit + code / stdout; add new forms (multi-payload match, enum-returning fn). +4. **Toolchain contract**: declare the minimum `clang`/LLVM version and the + opaque-pointer expectation; pin the CI `clang` and align the developer + blueprint; emit a clear diagnostic when the toolchain is below contract. + +## 5. Toolchain Contract + +- The native backend targets a pinned LLVM/`clang` major version (>= the first + version whose behavior the gate validates). CI and the developer blueprint use + the same major version. +- If §3.1 fix (1) (typed-pointer decay) is chosen, the IR additionally remains + valid on older typed-pointer toolchains; fix (2) sets a hard `clang >= 15` + floor. The chosen floor is documented and enforced. + +## 6. Verification Strategy + +- The conformance gate runs the **real `sgc` CLI** for every pinned core form on + the pinned toolchain in CI; a wrong exit code, link failure, or IR verifier + error fails the build. +- `cargo test -p sgc core_conformance_examples_*` passes on the pinned toolchain + with the harness driving the CLI. +- New regression cases: array read/write/for, single- and multi-capture closure, + enum value as argument **and** as return value, and a match with two or more + payload-carrying arms. +- `cargo test` is green on Linux; parser tests cover payload arms in first, + middle, and last positions. + +## 7. Risks And Trade-offs + +- **Backend churn.** Touching array/closure/aggregate lowering risks regressions + in async aggregate results (the SysV ABI work in `core-language-correctness`). + Mitigation: keep the existing aggregate-result tests and add the enum-return + case alongside them. 
- **Opaque vs typed pointers.** Choosing opaque pointers (fix 2) is simpler but
  sets a hard `clang >= 15` floor. Typed-pointer decay (fix 1) is more portable
  but touches more emission sites. The design allows either, and the gate
  enforces whichever floor is declared.
- **Slower gate.** Driving the real CLI is slower than calling the in-crate
  helper. Mitigation: keep the pinned core-form set minimal and run it as a
  dedicated CI job.
diff --git a/openspec/changes/codegen-ir-correctness-and-gate/proposal.md b/openspec/changes/codegen-ir-correctness-and-gate/proposal.md
new file mode 100644
index 0000000..efb8e6c
--- /dev/null
+++ b/openspec/changes/codegen-ir-correctness-and-gate/proposal.md
@@ -0,0 +1,107 @@
## Why

The `core-language-correctness` change marked arrays, closures, and enum values
as delivered, and CI is green. However, building the toolchain and exercising
the **real `sgc` CLI** (`sgc run` / `sgc build`, the textual-LLVM-IR-plus-`clang`
path a user actually invokes) on the toolchain this repo pins for developers
(`clang-14`) shows that those core forms still fail to compile:

- Native array indexing and iteration emit `getelementptr i64, i64* %u_4` while
  `%u_4` is `[3 x i64]*` (the `alloca`). `examples/04_array.sg`,
  `examples/05_loop.sg`, and `examples/conformance/03_array_write.sg` fail with
  `'%u_4' defined with type '[3 x i64]*' but expected 'i64*'`.
- Environment-capturing closures emit the same element-pointer mismatch.
  `examples/06_lambda.sg` and `examples/conformance/04_closure_multi_capture.sg`
  fail with `'%u_5' defined with type '[1 x i64]*' but expected 'i64*'`.
- A function that returns an enum value is defined with a type that its call
  site does not expect: `@get` defined with type `{ i64, [8 x i8] } (i64)*` but
  expected `i64 (i64)*`.

The emitted IR is **not type-consistent under typed pointers**. It only compiles
because opaque-pointer LLVM (`clang >= 15`, where every pointer is `ptr`) ignores
the pointee type.
CI runs on `clang-19`, so it passes. The pinned developer `clang-14` rejects the
IR, and a local
`cargo test -p sgc core_conformance_examples_compile_link_and_run` **fails**.

The gate hides this for two reasons: (1) it compiles through the in-crate
`#[cfg(test)]` `compile_source()` helper instead of driving the shipping `sgc`
CLI, and (2) it validates under whatever single `clang` the runner happens to
have, with no pinned or declared LLVM contract. As a result, the gate can stay
green while the compiler is broken for real users.

Separately, the parser mishandles multi-arm enum matches: a payload-binding arm
(`Variant(bindings)`) only parses when it is the **last** arm. Any arm after it
fails with `parse error: invalid pattern: expected identifier`. Every committed
conformance example places its single payload arm last, so the gate stays green
while idiomatic multi-variant matches do not parse.

This change has three goals. It makes the documented core forms produce IR that
compiles and runs on the toolchain the project pins. It fixes the multi-payload
match parse. It also hardens the conformance gate so the gate can no longer go
green while the shipping compiler is broken.

## What Changes

- Emit **type-consistent LLVM IR** for array places, closure-captured slots, and
  aggregate (enum/struct) function results. The documented core forms then
  compile under the project's pinned LLVM contract and do not depend on the
  runner's incidental `clang` version. The fix either decays array/aggregate
  places to the correct pointee type before `getelementptr` or emits opaque
  `ptr` operands consistently. In both cases, the type a value is defined with
  and the type it is used with MUST agree.
- Fix the **multi-payload match parser bug**: a payload-binding match arm
  (`Variant(bindings) => ...`) SHALL parse in any position, including when
  followed by further arms, so multi-variant matches with multiple
  payload-carrying arms compile and run.
+- Make the **conformance gate drive the real `sgc` CLI**: the gate compiles, + links, and runs each pinned core form through `sgc build` / `sgc run` (the + shipping driver), not the in-crate `compile_source()` test helper, and asserts + the documented exit code / stdout. +- **Pin the toolchain LLVM contract**: declare and enforce a minimum `clang`/LLVM + version (and the opaque-pointer expectation) for the native backend, align the + developer blueprint with the CI toolchain, and fail fast with a clear message + when the toolchain does not satisfy the contract. +- Add **conformance examples that the previous gate could not catch**: a match + with two or more payload-carrying arms, and a function that returns an enum + value, each with an executable result assertion. + +## Capabilities + +### New Capabilities + +- `codegen-ir-correctness-and-gate`: a contract that the documented core forms + emit type-consistent IR that compiles and runs on the pinned toolchain, that + multi-payload matches parse, and that the conformance gate exercises the real + `sgc` CLI under a declared LLVM contract. + +### Modified Capabilities + +- None in canonical `openspec/specs/`. This change strengthens the verification + and codegen guarantees that `core-language-correctness` introduced; it cites + that change and does not re-specify the array/closure/enum/mutability semantics + themselves. + +## Impact + +- Implementation touches `compiler/src/codegen/` (IR emission for array places, + closure captures, aggregate results), `compiler/src/parser/` (match-arm + pattern parsing), `tools/sgc/` (conformance harness driving the real CLI; + toolchain version check), `.github/workflows/core-conformance.yml` (pinned + `clang`/LLVM), and the environment blueprint / `docs` (toolchain contract). +- Parent umbrella: `mainstream-default-readiness` (core-language readiness arm), + continuing `core-language-correctness`. 
- Docs touched: `docs/language-features.md` (multi-arm match guidance, toolchain
  requirement) and `PROGRESS.md` (status reflects CLI-verified behavior).

## Non-Goals

- No new language features: this change does not add trait bounds/objects,
  first-class Option/Result, `as` casts, or generic collections (those are
  separate proposed directions).
- No re-specification of the array/closure/enum/mutability semantics owned by
  `core-language-correctness`; this change only makes them compile and run on the
  pinned toolchain and adds the missing gate coverage.
- No change to match exhaustiveness or guard semantics beyond fixing the
  multi-payload-arm parse defect.
- No switch of the backend away from textual LLVM IR + `clang`; the Cranelift
  fast path is unchanged.
diff --git a/openspec/changes/codegen-ir-correctness-and-gate/specs/codegen-ir-correctness-and-gate/spec.md b/openspec/changes/codegen-ir-correctness-and-gate/specs/codegen-ir-correctness-and-gate/spec.md
new file mode 100644
index 0000000..4558d99
--- /dev/null
+++ b/openspec/changes/codegen-ir-correctness-and-gate/specs/codegen-ir-correctness-and-gate/spec.md
@@ -0,0 +1,114 @@
## ADDED Requirements

### Requirement: Documented core forms SHALL compile and run via the real `sgc` CLI

The pinned core forms SHALL compile to valid LLVM IR, link, and run with the
documented result when built through the shipping `sgc` CLI (`sgc build` /
`sgc run`) on the project's pinned toolchain. A form is conformant only when it
has a committed runnable example and an executable test that drives the `sgc`
CLI and asserts its result or process exit code.
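
Illustrative, non-normative sketch: one way a harness can meet this requirement is to spawn the driver as a separate process with Rust's `std::process::Command`, then compare its exit code and stdout and name the form and example on any mismatch. `CoreFormCase`, `sgc_run`, and `check_case` are hypothetical names, not the existing `tools/sgc` code. The sketch also assumes that `sgc run <file>` reports the compiled program's exit code and stdout as its own.

```rust
use std::process::Command;

/// One pinned core form and the result the gate expects from it.
/// Hypothetical shape; the real `tools/sgc` harness may differ.
struct CoreFormCase<'a> {
    name: &'a str,
    example: &'a str,
    expected_exit: i32,
    expected_stdout: &'a str,
}

/// Builds the shipping-CLI invocation `<sgc> run <example>`.
fn sgc_run(sgc_path: &str, example: &str) -> Command {
    let mut cmd = Command::new(sgc_path);
    cmd.arg("run").arg(example);
    cmd
}

/// Spawns `cmd` as a separate process and checks its exit code and stdout.
/// On failure, the error names the offending form and example.
fn check_case(case: &CoreFormCase, mut cmd: Command) -> Result<(), String> {
    let out = cmd
        .output()
        .map_err(|e| format!("{} ({}): failed to spawn driver: {e}", case.name, case.example))?;
    // `code()` is None if the process was killed by a signal; report that as -1.
    let code = out.status.code().unwrap_or(-1);
    if code != case.expected_exit {
        return Err(format!(
            "{} ({}): exit code {code}, expected {}; stderr: {}",
            case.name,
            case.example,
            case.expected_exit,
            String::from_utf8_lossy(&out.stderr)
        ));
    }
    let stdout = String::from_utf8_lossy(&out.stdout).into_owned();
    if stdout != case.expected_stdout {
        return Err(format!(
            "{} ({}): stdout {stdout:?}, expected {:?}",
            case.name, case.example, case.expected_stdout
        ));
    }
    Ok(())
}
```

In a Cargo integration test, the path passed to `sgc_run` could come from `env!("CARGO_BIN_EXE_sgc")`, which Cargo sets to the built binary's path. This assumes `sgc` is a binary target of the `tools/sgc` package. Spawning a real process is what keeps the check independent of the in-crate `compile_source()` helper.
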

#### Scenario: A core form is exercised through the shipping CLI

- **WHEN** the conformance gate builds and runs a pinned core form through the
  shipping `sgc` CLI on the pinned toolchain
- **THEN** the `sgc` invocation exits successfully and the produced program runs
  with the documented exit code or stdout
- **AND** the gate does not substitute an in-crate compile helper for the CLI
- **AND** any deviation fails the gate and names the offending form and example

#### Scenario: The CLI fails where a helper would pass

- **WHEN** the shipping `sgc` CLI cannot compile, link, or correctly run a pinned
  core form on the pinned toolchain
- **THEN** the conformance gate fails the build
- **AND** the gate's result does not depend on the in-crate `compile_source`
  helper path

### Requirement: Core forms SHALL emit type-consistent LLVM IR

Core forms SHALL emit type-consistent LLVM IR. For array element places,
closure-captured slots, and aggregate (enum/struct) function results, the type a
value is defined with and the type it is used with SHALL agree, and the IR SHALL
pass the LLVM IR parser and verifier of the project's pinned LLVM/`clang`
toolchain. The IR SHALL NOT rely on the runner's incidental `clang` version to
mask a pointee-type mismatch.
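
Illustrative, non-normative sketch of what agreement between definition and use means for the indexed access in design.md §2, item 1. It covers both pointer models from design.md §3.1. `PointerModel`, `ptr_ty`, `emit_elem_ptr`, and `emit_load` are hypothetical helpers rather than the `compiler/src/codegen/` API. The register names mirror the failing IR.

```rust
/// Pointer model chosen in design.md §3.1 (hypothetical enum).
#[derive(Clone, Copy)]
enum PointerModel {
    /// Fix (1): typed pointers; decay `[N x T]*` to `T*` with a two-index GEP.
    TypedDecay,
    /// Fix (2): opaque `ptr` operands everywhere.
    Opaque,
}

/// The type to write for a pointer whose pointee is `pointee`.
fn ptr_ty(model: PointerModel, pointee: &str) -> String {
    match model {
        PointerModel::TypedDecay => format!("{pointee}*"),
        PointerModel::Opaque => "ptr".to_string(),
    }
}

/// Address of `arr[idx]`, where `arr` was defined by `alloca [len x elem]`.
/// The operand is typed with the array type the alloca actually produced;
/// the leading `i64 0` steps through the pointer, then `idx` selects one `elem`.
fn emit_elem_ptr(model: PointerModel, dst: &str, arr: &str, len: u64, elem: &str, idx: &str) -> String {
    let arr_ty = format!("[{len} x {elem}]");
    let arr_ptr = ptr_ty(model, &arr_ty);
    format!("{dst} = getelementptr {arr_ty}, {arr_ptr} {arr}, i64 0, i64 {idx}")
}

/// Load of one `elem` through a pointer produced by `emit_elem_ptr`.
fn emit_load(model: PointerModel, dst: &str, elem: &str, ptr: &str) -> String {
    let elem_ptr = ptr_ty(model, elem);
    format!("{dst} = load {elem}, {elem_ptr} {ptr}")
}
```

For example, `emit_elem_ptr(PointerModel::TypedDecay, "%t_6", "%u_4", 3, "i64", "%t_5")` yields `%t_6 = getelementptr [3 x i64], [3 x i64]* %u_4, i64 0, i64 %t_5`. The operand type now matches the `alloca`, and the result is a genuine `i64*`, which `emit_load` then uses. With `PointerModel::Opaque`, both operands are plain `ptr`.
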

#### Scenario: Array element address is computed with a consistent pointer type

- **WHEN** a program reads or writes `arr[i]` for a fixed-size array allocated as
  `[N x T]`
- **THEN** the emitted `getelementptr` operand type agrees with the array place's
  defined type rather than indexing `[N x T]*` as if it were `T*`
- **AND** the IR passes the pinned toolchain's verifier
- **AND** `examples/04_array.sg`, `examples/05_loop.sg`, and
  `examples/conformance/03_array_write.sg` compile, link, and run with their
  documented results via the `sgc` CLI

#### Scenario: A closure captures a local and compiles on the pinned toolchain

- **WHEN** a program defines `let x = ...; let f = |y| x + y;` and calls `f`
- **THEN** captured slots load and store at a pointer type consistent with their
  definition
- **AND** `examples/06_lambda.sg` and
  `examples/conformance/04_closure_multi_capture.sg` compile and run with their
  documented results via the `sgc` CLI

#### Scenario: A function returns an enum value

- **WHEN** a function declares an enum (or struct) return type and is called
- **THEN** the function's emitted function-pointer type and its call site agree,
  with no `{ i64, [8 x i8] } (i64)*` versus `i64 (i64)*` mismatch
- **AND** the program compiles, links, and runs with the documented result via the
  `sgc` CLI

### Requirement: Payload-binding match arms SHALL parse in any position

Payload-binding match arms SHALL parse in any position. A match arm whose pattern
binds variant payload fields (`Variant(bindings) => ...`) SHALL parse correctly
regardless of its position in the match, including when it is followed by
additional arms, and matches with multiple payload-carrying arms SHALL compile
and run.

#### Scenario: A payload arm is followed by another arm

- **WHEN** a match places a payload-binding arm before a later arm, e.g.
+ `match e { E::Z => 0, E::A(n) => n, E::Y => 1 }` +- **THEN** parsing succeeds without `invalid pattern: expected identifier` +- **AND** the match type-checks and runs, selecting the correct arm + +#### Scenario: Multiple payload arms in one match + +- **WHEN** a match has two or more payload-carrying arms, e.g. + `match s { Shape::Circle(r) => r, Shape::Square(w) => w }` +- **THEN** all arms parse and the match runs with the correct per-variant binding +- **AND** a committed conformance example covers this with a result assertion + +#### Scenario: A genuinely malformed pattern is still rejected + +- **WHEN** a match arm uses a syntactically invalid pattern +- **THEN** the parser rejects it with a stable diagnostic +- **AND** the rejection is independent of the arm's position + +### Requirement: The native backend SHALL declare and enforce a toolchain contract + +The native backend SHALL declare the minimum `clang`/LLVM version (and pointer +model) it targets. CI and the developer environment blueprint SHALL use the same +pinned major version, and `sgc` SHALL report a clear, actionable error when the +detected toolchain is below the contract rather than surfacing a raw IR verifier +error. 
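
Illustrative, non-normative sketch of the fail-fast check. It reads the first line of `clang --version`, extracts the major version, and compares it with the declared floor before any IR reaches `clang`. The helper names and diagnostic wording are hypothetical. The floor is a parameter because its value depends on the pointer model chosen in design.md §3.1.

```rust
use std::process::Command;

/// Runs `<clang> --version` and returns its stdout (hypothetical helper).
fn clang_version_output(clang: &str) -> std::io::Result<String> {
    let out = Command::new(clang).arg("--version").output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Extracts the major version from the first line of `clang --version`,
/// e.g. "Ubuntu clang version 14.0.0-1ubuntu1.1" -> Some(14).
fn clang_major(version_output: &str) -> Option<u32> {
    const MARKER: &str = "clang version ";
    let first_line = version_output.lines().next()?;
    let start = first_line.find(MARKER)? + MARKER.len();
    let digits: String = first_line[start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Compares the detected major version against the declared floor and turns
/// a shortfall into an actionable message instead of a raw IR error.
fn check_toolchain(version_output: &str, floor: u32) -> Result<u32, String> {
    match clang_major(version_output) {
        Some(major) if major >= floor => Ok(major),
        Some(major) => Err(format!(
            "clang {major} is below the native backend's toolchain contract \
             (clang >= {floor}); install the version pinned by the environment blueprint"
        )),
        None => Err("could not read a clang version from `clang --version`".to_string()),
    }
}
```

For example, `check_toolchain("Ubuntu clang version 14.0.0-1ubuntu1.1", 15)` returns an error naming clang 14 and the `clang >= 15` floor. Here, 15 is only the fix (2) floor from design.md §5.
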

#### Scenario: CI and developer toolchains match

- **WHEN** the conformance gate runs in CI and a developer builds locally per the
  environment blueprint
- **THEN** both use the same pinned `clang`/LLVM major version that satisfies the
  declared contract
- **AND** a core form that passes the gate also compiles via `sgc run` for the
  developer

#### Scenario: The toolchain is below contract

- **WHEN** `sgc` runs against a `clang`/LLVM version below the declared contract
- **THEN** `sgc` emits a clear diagnostic identifying the toolchain requirement
- **AND** it does not surface only a raw LLVM IR verifier error
diff --git a/openspec/changes/codegen-ir-correctness-and-gate/tasks.md b/openspec/changes/codegen-ir-correctness-and-gate/tasks.md
new file mode 100644
index 0000000..1fdd433
--- /dev/null
+++ b/openspec/changes/codegen-ir-correctness-and-gate/tasks.md
@@ -0,0 +1,89 @@
## 1. Baseline And Reproduction

- [ ] 1.1 Run `openspec validate codegen-ir-correctness-and-gate --strict`.
- [ ] 1.2 Record the red baseline on the pinned developer toolchain (`clang-14`):
  `sgc run examples/04_array.sg`, `examples/05_loop.sg`,
  `examples/conformance/03_array_write.sg`, `examples/06_lambda.sg`,
  `examples/conformance/04_closure_multi_capture.sg`, an enum-returning function,
  and a multi-payload match. Capture the exact IR verifier / parse errors.
- [ ] 1.3 Confirm the current gate is blind: show that
  `cargo test -p sgc core_conformance_examples_compile_link_and_run` passes in
  CI through the in-crate helper while `sgc run` fails on the same example on
  the pinned developer toolchain.

## 2. IR Type Consistency

- [ ] 2.1 Decide and document the pointer model (typed-pointer decay vs opaque
  `ptr`) in `design.md`; the chosen model fixes the LLVM contract floor.
- [ ] 2.2 Fix array place lowering so `arr[i]` read/write and `for v in arr`
  produce IR whose `getelementptr` operand type agrees with the `alloca`
  (`[N x T]*`), eliminating the `'[N x T]*' ... but expected 'i64*'` error.
- [ ] 2.3 Fix closure-captured slot lowering so captured locals load/store at a
  pointer type consistent with their definition.
- [ ] 2.4 Fix aggregate (enum/struct) function-result lowering so a function that
  returns an enum/struct value and its call site agree on the function pointer
  type (no `{ i64, [8 x i8] } (i64)*` vs `i64 (i64)*` mismatch).
- [ ] 2.5 Keep the existing async aggregate-result (SysV ABI) tests green while
  changing aggregate lowering.

## 3. Multi-Payload Match Parsing

- [ ] 3.1 Fix the match-arm parser so a payload-binding arm
  (`Variant(bindings) => ...`) parses in first, middle, and last positions.
- [ ] 3.2 Add parser tests for payload arms in every position and for multiple
  payload-carrying arms in one match; add a negative test for genuinely malformed
  patterns that must still be rejected with a stable diagnostic.

## 4. Conformance Gate Drives The Real CLI

- [ ] 4.1 Change the conformance harness in `tools/sgc` to compile, link, and run
  each pinned core form through the built `sgc` binary (`sgc build` / `sgc run`),
  not the in-crate `compile_source()` helper, asserting exit code and stdout.
- [ ] 4.2 Add conformance examples and executable assertions that the previous
  gate could not catch: a match with two or more payload-carrying arms, and a
  function that returns an enum value.
- [ ] 4.3 Ensure the gate fails loudly (with the offending form/example named)
  when any pinned form does not compile, link, or run correctly.

## 5. Toolchain Contract

- [ ] 5.1 Declare the minimum `clang`/LLVM version and the pointer-model
  expectation for the native backend; document it in `docs/language-features.md`.
- [ ] 5.2 Pin the CI `clang`/LLVM version in
  `.github/workflows/core-conformance.yml` to the documented contract and align
  the developer environment blueprint to the same major version.
- [ ] 5.3 Make `sgc` emit a clear, actionable diagnostic when the detected
  toolchain is below the declared contract, instead of surfacing a raw IR
  verifier error.

## 6. Doc And Status Reconciliation

- [ ] 6.1 Update `docs/language-features.md` to document multi-arm matches with
  multiple payload arms and the toolchain requirement.
- [ ] 6.2 Update `PROGRESS.md` so array/closure/enum status reflects
  CLI-verified behavior on the pinned toolchain (not helper-only green).

## 7. Verification

- [ ] 7.1 The conformance gate runs the real `sgc` CLI for every pinned core form
  on the pinned toolchain in CI and is green.
- [ ] 7.2 `sgc run` succeeds for `examples/04_array.sg`, `examples/05_loop.sg`,
  `examples/conformance/03_array_write.sg`, `examples/06_lambda.sg`,
  `examples/conformance/04_closure_multi_capture.sg`, the new enum-returning
  example, and the new multi-payload match example, with documented results.
- [ ] 7.3 `cargo test` is green on Linux on the pinned toolchain.
- [ ] 7.4 Parser tests prove payload match arms parse in first/middle/last
  positions.

## Archive Gate

- [ ] `openspec validate codegen-ir-correctness-and-gate --strict` passes.
- [ ] Array index/write/iteration, environment-capturing closures, and
  enum/struct-returning functions compile to type-consistent IR and run correctly
  via the real `sgc` CLI on the pinned toolchain.
- [ ] Payload-binding match arms parse in any position; multi-payload matches run.
- [ ] The conformance gate drives the shipping `sgc` CLI and pins the LLVM
  contract; it cannot pass while `sgc run` of a pinned form fails.
- [ ] CI and the developer blueprint share the same pinned `clang`/LLVM major
  version; `sgc` reports a clear error below the contract.
- [ ] `cargo test` is green on Linux; docs and `PROGRESS.md` match CLI-verified
  behavior.