HM type inference#24
Conversation
📝 WalkthroughWalkthroughThis PR introduces a complete Hindley-Milner type inference engine for Slang, enabling compile-time type checking via a new Changes
Sequence DiagramsequenceDiagram
participant User
participant CLI as Main.kt<br/>(CLI Handler)
participant Checker as TypeInference.kt<br/>(typeCheck)
participant Engine as HindleyMilnerInference<br/>(Algorithm W)
participant Env as TypeEnv<br/>(Typing Environment)
participant Unify as Unification.kt<br/>(unify)
User->>CLI: --typecheck flag
CLI->>Checker: typeCheck(program)
Checker->>Engine: inferProgram(program)
Engine->>Engine: Register top-level functions
Engine->>Env: extend(funcName, generalized scheme)
Engine->>Engine: Infer module main block
Engine->>Engine: inferStmt / inferExpr
Engine->>Env: lookup variable
Env-->>Engine: TypeScheme
Engine->>Engine: instantiate scheme
Engine->>Unify: unify(actual, expected, location)
alt Type match
Unify->>Unify: Mutate TVar.bound
Unify-->>Engine: Success
else Type mismatch
Unify-->>Engine: TypeError
Engine->>Engine: Collect error
end
Engine-->>Checker: List<TypeError>
Checker->>CLI: Return errors
CLI->>User: Print success or error list
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/main/kotlin/Main.kt`:
- Around line 91-101: The runTypeCheck function currently prints type errors but
still allows the process to exit 0; after detecting non-empty errors from
typeCheck(programUnit) (inside runTypeCheck) you should throw ProgramResult(1)
to signal failure to Clikt; keep the existing echo("Type checking failed:", err
= true) and the loop that echoes each error (echo(" ${e.location}
${e.message}", err = true)) and then immediately throw ProgramResult(1) so
CI/scripts receive a non-zero exit code.
In `@src/main/kotlin/slang/typeinfer/TypeInference.kt`:
- Around line 357-363: The PLUS branch in TypeInference.kt (Operator.PLUS)
currently only enforces both operands be the same via fresh() + safeUnify, which
permits invalid same-typed operands like Bool; change it to explicitly constrain
PLUS to Num or String: attempt to unify leftType and rightType with the numeric
type (e.g. built-in Num type symbol or NumType) and if that fails, attempt to
unify both with the string type (e.g. StringType); if neither succeeds, emit a
type error using expr.codeInfo; use the existing safeUnify helper and the
leftType/rightType/resultType symbols (or introduce two candidate result types
Num and String) to implement the two-case check rather than a single
unconstrained fresh() resultType.
🧹 Nitpick comments (7)
src/main/kotlin/Main.kt (1)
57-63: Precedence of--hlirover--typecheckis implicit.If a user passes both
--hlirand--typecheck, only HLIR output runs becausehliris checked first. This is fine but could be confusing. Consider either documenting this or making the flags mutually exclusive via Clikt'smutuallyExclusiveOptionsor a runtime check.src/main/kotlin/slang/typeinfer/Types.kt (1)
53-57:TRecordis adata classwith aMap— field ordering intoStringdepends on map implementation.
TRecord.toString()iteratesfields.entries. If the map is aHashMap, iteration order is non-deterministic, which can produce flaky output in error messages and tests. Consider using aLinkedHashMaporSortedMapfor deterministic field ordering, or sort intoString.Minimal fix in toString
data class TRecord( val fields: Map<String, SlangType>, ) : SlangType() { - override fun toString() = "{${fields.entries.joinToString(", ") { "${it.key}: ${it.value}" }}}" + override fun toString() = "{${fields.entries.sortedBy { it.key }.joinToString(", ") { "${it.key}: ${it.value}" }}}" }src/main/kotlin/slang/typeinfer/TypeInference.kt (3)
315-332: Field access on an unresolved type variable silently returns a fresh type with no constraint.When
recordTypeprunes to aTVar, the field access returns a fresh unconstrained type without adding any structural constraint to the variable. This means code like:fun getX(r) => r.x;would infer
ras an unconstrained type variable, not a record with fieldx. The field access effectively becomes untyped. This is a known limitation of HM without row polymorphism, but worth documenting with a TODO or comment in the code.
76-93: Function declaration order matters — no mutual recursion support.The first pass processes functions sequentially, so
fn2can callfn1but not vice versa. Mutual recursion (e.g.,isEven/isOdd) will produce "Undefined function" errors. This is a reasonable starting point, but worth documenting. A fix would involve a preliminary pass that registers all function names with fresh types before inferring any bodies.
411-419:TypeCheckTransformuses fully qualified class references inline.Minor style nit:
slang.common.Transformandslang.parser.CompilerErrorcould be imported at the top of the file for consistency with the other imports.src/test/kotlin/TypeInferenceTest.kt (2)
8-50: Well-structured test helpers;assertErrorCountis unused.The three helpers provide a clean testing API.
assertErrorCount(line 35) is defined but not called by any test. Consider either adding tests that use it (e.g., verifying exact error counts for multi-error scenarios) or removing it to avoid dead code.
202-299: Good coverage of common type errors; consider adding edge cases.The type-error tests cover the important basics well. A few cases that could strengthen the suite:
- PLUS on booleans:
let x = true + false;— this would currently pass due to the permissive PLUS implementation (see related comment onTypeInference.kt). Adding this test would document the expected behavior or catch the gap.- Undefined variable/function: directly exercising error messages for unresolved names.
- Array/Record/Ref operations: the inference engine supports these types but they have no test coverage.
There was a problem hiding this comment.
Pull request overview
Adds a new Hindley–Milner (Algorithm W–style) type inference/type checking pass for Slang HLIR, along with CLI support to run it and a dedicated test suite.
Changes:
- Introduces HM type representations, unification, and an inference engine (
slang.typeinfer.*) plus a pipelineTypeCheckTransform. - Adds
--typecheckCLI flag to run type inference and report errors without executing. - Adds
TypeInferenceTestcovering well-typed programs and expected type errors.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/test/kotlin/TypeInferenceTest.kt | New tests for type inference/type error reporting via typeCheck. |
| src/main/kotlin/slang/typeinfer/Types.kt | HM type AST (type vars, functions, arrays, records, refs), plus prune and freeVars. |
| src/main/kotlin/slang/typeinfer/Unification.kt | Unification with occurs check and structured type handling; defines TypeError. |
| src/main/kotlin/slang/typeinfer/TypeInference.kt | Main inference engine (HindleyMilnerInference), statement/expression typing rules, typeCheck, and TypeCheckTransform. |
| src/main/kotlin/Main.kt | Adds --typecheck flag and wiring to run inference and print results. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // PLUS works on Num+Num or String+String; default to unifying both sides | ||
| val resultType: SlangType = fresh() | ||
| safeUnify(leftType, resultType, expr.codeInfo) | ||
| safeUnify(rightType, resultType, expr.codeInfo) | ||
| resultType |
There was a problem hiding this comment.
The PLUS typing rule currently just unifies both operands with a fresh result type, which allows true + false, [1] + [2], {a:1}+{a:2}, etc. to typecheck even though the interpreter only supports Num+Num and String+String (see slang.runtime.evaluateBinaryOp). The inference needs to restrict + to those runtime-supported types (or introduce explicit overloading/constraints), otherwise the type checker is unsound.
| // PLUS works on Num+Num or String+String; default to unifying both sides | |
| val resultType: SlangType = fresh() | |
| safeUnify(leftType, resultType, expr.codeInfo) | |
| safeUnify(rightType, resultType, expr.codeInfo) | |
| resultType | |
| // PLUS works on Num+Num or String+String; restrict operands accordingly. | |
| // If either side is already known to be a string, treat this as String+String. | |
| if (leftType == SlangType.TString || rightType == SlangType.TString) { | |
| safeUnify(leftType, SlangType.TString, expr.codeInfo) | |
| safeUnify(rightType, SlangType.TString, expr.codeInfo) | |
| SlangType.TString | |
| } else { | |
| // Otherwise, default to numeric addition. | |
| safeUnify(leftType, SlangType.TNum, expr.codeInfo) | |
| safeUnify(rightType, SlangType.TNum, expr.codeInfo) | |
| SlangType.TNum | |
| } |
| // For records, we need structural access; try to unify if already a record | ||
| val pruned = prune(recordType) | ||
| if (pruned is SlangType.TRecord) { | ||
| val ft = pruned.fields[fieldName] | ||
| if (ft != null) { | ||
| safeUnify(fieldType, ft, expr.codeInfo) | ||
| } else { | ||
| errors.add(TypeError(expr.codeInfo, "Record has no field '$fieldName'")) | ||
| } | ||
| } | ||
| // If it's a type variable, we can't know the fields yet — return fresh |
There was a problem hiding this comment.
FieldAccess only checks fields when the LHS prunes to TRecord; for any non-record type (Num/Bool/String/Fun/Array/Ref) it currently returns a fresh fieldType without reporting an error. This means expressions like 1.foo or true.bar will typecheck but always fail at runtime. At minimum, add an error when prune(recordType) is neither TRecord nor TVar, and consider unifying the LHS with a record type when it is a TVar (or otherwise define the intended record typing semantics).
| // For records, we need structural access; try to unify if already a record | |
| val pruned = prune(recordType) | |
| if (pruned is SlangType.TRecord) { | |
| val ft = pruned.fields[fieldName] | |
| if (ft != null) { | |
| safeUnify(fieldType, ft, expr.codeInfo) | |
| } else { | |
| errors.add(TypeError(expr.codeInfo, "Record has no field '$fieldName'")) | |
| } | |
| } | |
| // If it's a type variable, we can't know the fields yet — return fresh | |
| // For records, we need structural access; handle based on the pruned LHS type | |
| val pruned = prune(recordType) | |
| when (pruned) { | |
| is SlangType.TRecord -> { | |
| val ft = pruned.fields[fieldName] | |
| if (ft != null) { | |
| safeUnify(fieldType, ft, expr.codeInfo) | |
| } else { | |
| errors.add( | |
| TypeError( | |
| expr.codeInfo, | |
| "Record has no field '$fieldName'" | |
| ) | |
| ) | |
| } | |
| } | |
| is SlangType.TVar -> { | |
| // Constrain the LHS type variable to be a record with at least this field | |
| val recordConstraint = SlangType.TRecord(mapOf(fieldName to fieldType)) | |
| safeUnify(recordType, recordConstraint, expr.codeInfo) | |
| } | |
| else -> { | |
| // Non-record LHS: this is a type error | |
| errors.add( | |
| TypeError( | |
| expr.codeInfo, | |
| "Field access on non-record value" | |
| ) | |
| ) | |
| } | |
| } |
| is Stmt.StructStmt -> { | ||
| val fieldTypes = stmt.fields.mapValues { inferExpr(it.value, env) } | ||
| val recordType = SlangType.TRecord(fieldTypes) | ||
| val scheme = generalize(env, recordType) |
There was a problem hiding this comment.
StructStmt type inference ignores stmt.functions (methods) and only infers field initializer expressions. That means type errors inside struct methods won’t be reported, and any method signatures aren’t reflected in the resulting type for the struct. If struct methods are part of the language semantics, they should be inferred/checked here (and likely incorporated into the struct’s type, e.g., as function-typed fields or a separate method environment).
| val scheme = generalize(env, recordType) | |
| val scheme = generalize(env, recordType) | |
| // Infer struct methods so that type errors inside them are reported. | |
| // We intentionally ignore any environment extensions they may produce | |
| // to preserve the existing behaviour where only the struct's own type | |
| // is added to the environment here. | |
| stmt.functions.forEach { inferFunctionDecl(it, env) } |
| val lhsType = inferExpr(stmt.lhs, env) | ||
| val rhsType = inferExpr(stmt.expr, env) | ||
| safeUnify(lhsType, rhsType, stmt.codeInfo) |
There was a problem hiding this comment.
AssignStmt infers the LHS by calling inferExpr(stmt.lhs, env). For VarExpr this instantiates polymorphic schemes, so the unification in safeUnify(lhsType, rhsType, ...) doesn't constrain the variable’s actual binding in env. This can let assignments to polymorphic bindings typecheck but produce runtime errors (e.g., reassigning a polymorphic function and later calling it at a different type). Consider handling VarExpr LHS specially: look up the scheme in env and unify rhsType against the scheme’s underlying type without instantiation, and/or reject assignment to schemes with quantified vars (or apply a value restriction / monomorphize-on-mutation strategy).
| val lhsType = inferExpr(stmt.lhs, env) | |
| val rhsType = inferExpr(stmt.expr, env) | |
| safeUnify(lhsType, rhsType, stmt.codeInfo) | |
| val rhsType = inferExpr(stmt.expr, env) | |
| when (val lhs = stmt.lhs) { | |
| is Expr.VarExpr -> { | |
| val scheme = env[lhs.name] | |
| if (scheme != null) { | |
| // Unify against the underlying scheme type without instantiation | |
| safeUnify(scheme.type, rhsType, stmt.codeInfo) | |
| } else { | |
| // Fallback: infer LHS type if no scheme is found | |
| val lhsType = inferExpr(stmt.lhs, env) | |
| safeUnify(lhsType, rhsType, stmt.codeInfo) | |
| } | |
| } | |
| else -> { | |
| // Non-variable LHS: infer its type normally | |
| val lhsType = inferExpr(stmt.lhs, env) | |
| safeUnify(lhsType, rhsType, stmt.codeInfo) | |
| } | |
| } |
| } | ||
| } | ||
|
|
||
| is Expr.ReadInputExpr -> fresh() // could be Num or String |
There was a problem hiding this comment.
ReadInputExpr is typed as an unconstrained fresh type variable. At runtime readInput() only produces a Number or String; allowing it to unify to Bool/function/record types can accept programs that will always crash at runtime (e.g. readInput() && true). Consider modeling readInput() as a dedicated type (or a Num | String-like sum if supported), or conservatively choosing a single static type (e.g. String) plus explicit conversion APIs.
| is Expr.ReadInputExpr -> fresh() // could be Num or String | |
| is Expr.ReadInputExpr -> SlangType.TString |



Summary by CodeRabbit
Release Notes
New Features
--typecheckcommand-line flag to validate your code for type errors before execution. The type checker detects issues with arithmetic operations, function calls, control flow conditions, and variable assignments.Tests