HM type inference #24

Merged
jpksh90 merged 1 commit into main from hm-type-infer
Feb 10, 2026
Conversation

@jpksh90 jpksh90 commented Feb 10, 2026

Owner

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a --typecheck command-line flag to validate your code for type errors before execution. The type checker detects issues with arithmetic operations, function calls, control flow conditions, and variable assignments.
  • Tests

    • Added comprehensive test suite validating type checking for correctly-typed programs and various error detection scenarios.

Copilot AI review requested due to automatic review settings February 10, 2026 22:25
coderabbitai Bot commented Feb 10, 2026

Walkthrough

This PR introduces a complete Hindley-Milner type inference engine for Slang, enabling compile-time type checking via a new --typecheck CLI flag. The implementation spans type primitives, unification, Algorithm W inference with let-polymorphism, and a comprehensive test suite validating both well-typed and type-error scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Type System Primitives<br/>src/main/kotlin/slang/typeinfer/Types.kt | Defines sealed class SlangType with variants (TVar, TNum, TBool, TString, TNone, TUnit, TFun, TArray, TRecord, TRef), TypeScheme for polymorphic types, and helpers prune() and freeVars() for type traversal. |
| Unification Engine<br/>src/main/kotlin/slang/typeinfer/Unification.kt | Implements Hindley-Milner unification with occurs-check. Introduces TypeError exception and public unify() function handling TVar binding, arity checks, and type constructor dispatch with specific error messages. |
| Type Inference Implementation<br/>src/main/kotlin/slang/typeinfer/TypeInference.kt | Implements Algorithm W with let-polymorphism via TypeEnv (immutable typing environment), HindleyMilnerInference (inference engine with fresh variable generation, generalization, instantiation, and substitution), two-pass program inference, and public APIs typeCheck() and TypeCheckTransform. |
| CLI Integration<br/>src/main/kotlin/Main.kt | Adds --typecheck flag to SlangCLI that runs type checking instead of execution or HLIR emission. Introduces runTypeCheck() helper that invokes typeCheck() and prints success or per-error details. Adds typeCheck import. |
| Test Suite<br/>src/test/kotlin/TypeInferenceTest.kt | Comprehensive test class with 20+ test cases covering well-typed programs (arithmetic, recursion, higher-order functions, control flow, polymorphism) and type-error scenarios (operator type mismatches, arity mismatches, non-bool conditions, assignment mismatches). |
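
For readers unfamiliar with the unification step summarized above, here is a minimal sketch of mutable-binding unification with an occurs check. It is not the code from Unification.kt: the names `Ty`, `TVar`, `TNum`, `TBool`, `TFun`, `prune`, `occurs` and `unify` only mirror the summary, and their bodies are assumptions made for illustration.

```kotlin
// Hypothetical miniature of the type representation; not the PR's SlangType.
sealed class Ty
class TVar(val id: Int) : Ty() {
    var bound: Ty? = null // set by unify() when the variable is resolved
    override fun toString() = bound?.toString() ?: "t$id"
}
object TNum : Ty() { override fun toString() = "Num" }
object TBool : Ty() { override fun toString() = "Bool" }
data class TFun(val params: List<Ty>, val ret: Ty) : Ty()

class TypeError(message: String) : Exception(message)

// Follow variable bindings until reaching an unbound variable or a constructor.
fun prune(t: Ty): Ty = if (t is TVar && t.bound != null) prune(t.bound!!) else t

// Occurs check: does variable v appear anywhere inside t?
fun occurs(v: TVar, t: Ty): Boolean = when (val p = prune(t)) {
    is TVar -> p === v
    is TFun -> p.params.any { occurs(v, it) } || occurs(v, p.ret)
    else -> false
}

fun unify(a: Ty, b: Ty) {
    val x = prune(a)
    val y = prune(b)
    when {
        x === y -> return
        x is TVar -> {
            if (occurs(x, y)) throw TypeError("infinite type")
            x.bound = y
        }
        y is TVar -> unify(y, x)
        x is TFun && y is TFun -> {
            if (x.params.size != y.params.size) throw TypeError("arity mismatch")
            x.params.zip(y.params).forEach { (p, q) -> unify(p, q) }
            unify(x.ret, y.ret)
        }
        else -> throw TypeError("cannot unify $x with $y")
    }
}

fun main() {
    val a = TVar(0)
    unify(TFun(listOf(a), a), TFun(listOf(TNum), TVar(1)))
    println(prune(a)) // prints "Num": a was bound while unifying the parameters
    val b = TVar(2)
    val rejected = runCatching { unify(b, TFun(listOf(b), TNum)) }.isFailure
    println(rejected) // prints "true": the occurs check refuses b = (b) -> Num
}
```

The sketch binds variables in place, which matches the "Mutate TVar.bound" step in the sequence diagram; an engine based on explicit substitutions would be structured differently.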

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as Main.kt<br/>(CLI Handler)
    participant Checker as TypeInference.kt<br/>(typeCheck)
    participant Engine as HindleyMilnerInference<br/>(Algorithm W)
    participant Env as TypeEnv<br/>(Typing Environment)
    participant Unify as Unification.kt<br/>(unify)
    
    User->>CLI: --typecheck flag
    CLI->>Checker: typeCheck(program)
    Checker->>Engine: inferProgram(program)
    Engine->>Engine: Register top-level functions
    Engine->>Env: extend(funcName, generalized scheme)
    Engine->>Engine: Infer module main block
    Engine->>Engine: inferStmt / inferExpr
    Engine->>Env: lookup variable
    Env-->>Engine: TypeScheme
    Engine->>Engine: instantiate scheme
    Engine->>Unify: unify(actual, expected, location)
    alt Type match
        Unify->>Unify: Mutate TVar.bound
        Unify-->>Engine: Success
    else Type mismatch
        Unify-->>Engine: TypeError
        Engine->>Engine: Collect error
    end
    Engine-->>Checker: List<TypeError>
    Checker->>CLI: Return errors
    CLI->>User: Print success or error list

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Behold! A type-checker born from logic's art,
Where Hindley-Milner mends each broken part,
Variables fresh and schemes that generalize,
Unification binds as errors crystallize—
Let polymorphism flow, the types align! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.81% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'HM type inference' is vague and uses an acronym without context, making it unclear to someone unfamiliar with the codebase. | Expand the title to be more descriptive, e.g., 'Add Hindley-Milner type inference engine' or 'Implement type checking via Hindley-Milner inference'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@jpksh90 jpksh90 left a comment

looks good

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/main/kotlin/Main.kt`:
- Around line 91-101: The runTypeCheck function currently prints type errors but
still allows the process to exit 0; after detecting non-empty errors from
typeCheck(programUnit) (inside runTypeCheck) you should throw ProgramResult(1)
to signal failure to Clikt; keep the existing echo("Type checking failed:", err
= true) and the loop that echoes each error (echo("  ${e.location}
${e.message}", err = true)) and then immediately throw ProgramResult(1) so
CI/scripts receive a non-zero exit code.

In `@src/main/kotlin/slang/typeinfer/TypeInference.kt`:
- Around line 357-363: The PLUS branch in TypeInference.kt (Operator.PLUS)
currently only enforces both operands be the same via fresh() + safeUnify, which
permits invalid same-typed operands like Bool; change it to explicitly constrain
PLUS to Num or String: attempt to unify leftType and rightType with the numeric
type (e.g. built-in Num type symbol or NumType) and if that fails, attempt to
unify both with the string type (e.g. StringType); if neither succeeds, emit a
type error using expr.codeInfo; use the existing safeUnify helper and the
leftType/rightType/resultType symbols (or introduce two candidate result types
Num and String) to implement the two-case check rather than a single
unconstrained fresh() resultType.
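
A minimal sketch of the two-case PLUS rule described above, under stated assumptions: `Ty`, `unifyGround` and `inferPlus` are hypothetical stand-ins rather than the PR's `SlangType`, `safeUnify` or inference engine, and defaulting to Num when neither operand is known yet is a choice made here for brevity.

```kotlin
// Hypothetical miniature types; the real engine uses SlangType and safeUnify.
sealed class Ty
class TVar(val name: String) : Ty() { var bound: Ty? = null }
object TNum : Ty() { override fun toString() = "Num" }
object TString : Ty() { override fun toString() = "String" }
object TBool : Ty() { override fun toString() = "Bool" }

fun prune(t: Ty): Ty = if (t is TVar && t.bound != null) prune(t.bound!!) else t

// Bind an unresolved variable to a ground type, or compare two ground types.
// Returns false on mismatch instead of throwing.
fun unifyGround(t: Ty, ground: Ty): Boolean {
    val p = prune(t)
    return when {
        p === ground -> true
        p is TVar -> { p.bound = ground; true }
        else -> false
    }
}

fun inferPlus(left: Ty, right: Ty, errors: MutableList<String>): Ty {
    val l = prune(left)
    val r = prune(right)
    // String if either side is already known to be String, otherwise Num (assumed default).
    val target: Ty = if (l === TString || r === TString) TString else TNum
    // Non-short-circuit `and` so both operands get constrained even if the first fails.
    val ok = unifyGround(l, target) and unifyGround(r, target)
    if (!ok) errors.add("'+' expects Num + Num or String + String")
    return target
}

fun main() {
    val errors = mutableListOf<String>()
    println(inferPlus(TNum, TNum, errors))        // prints "Num"
    println(inferPlus(TVar("a"), TString, errors)) // prints "String"
    inferPlus(TBool, TBool, errors)                // rejected: Bool is neither Num nor String
    println(errors.size)                           // prints "1"
}
```

Committing to Num early can reject a program whose operands are later found to be strings; a constraint-based or overloading-aware design would avoid that at the cost of more machinery.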
🧹 Nitpick comments (7)
src/main/kotlin/Main.kt (1)

57-63: Precedence of --hlir over --typecheck is implicit.

If a user passes both --hlir and --typecheck, only HLIR output runs because hlir is checked first. This is fine but could be confusing. Consider either documenting this or making the flags mutually exclusive via Clikt's mutuallyExclusiveOptions or a runtime check.

src/main/kotlin/slang/typeinfer/Types.kt (1)

53-57: TRecord is a data class with a Map — field ordering in toString depends on map implementation.

TRecord.toString() iterates fields.entries. If the map is a HashMap, iteration order is non-deterministic, which can produce flaky output in error messages and tests. Consider using a LinkedHashMap or SortedMap for deterministic field ordering, or sort in toString.

Minimal fix in toString
     data class TRecord(
         val fields: Map<String, SlangType>,
     ) : SlangType() {
-        override fun toString() = "{${fields.entries.joinToString(", ") { "${it.key}: ${it.value}" }}}"
+        override fun toString() = "{${fields.entries.sortedBy { it.key }.joinToString(", ") { "${it.key}: ${it.value}" }}}"
     }
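
To see the effect of the sorted variant in isolation, here is a small standalone sketch. `Rec` is a hypothetical stand-in for TRecord that uses plain strings for field types.

```kotlin
// Hypothetical stand-in for TRecord with string-typed fields.
data class Rec(val fields: Map<String, String>) {
    override fun toString() =
        "{${fields.entries.sortedBy { it.key }.joinToString(", ") { "${it.key}: ${it.value}" }}}"
}

fun main() {
    // Same fields, different map implementations and insertion orders.
    val fromHashMap = Rec(hashMapOf("b" to "Bool", "a" to "Num"))
    val fromLinkedMap = Rec(linkedMapOf("a" to "Num", "b" to "Bool"))
    println(fromHashMap)                                         // prints "{a: Num, b: Bool}"
    println(fromHashMap.toString() == fromLinkedMap.toString())  // prints "true"
}
```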
src/main/kotlin/slang/typeinfer/TypeInference.kt (3)

315-332: Field access on an unresolved type variable silently returns a fresh type with no constraint.

When recordType prunes to a TVar, the field access returns a fresh unconstrained type without adding any structural constraint to the variable. This means code like:

fun getX(r) => r.x;

would infer r as an unconstrained type variable, not a record with field x. The field access effectively becomes untyped. This is a known limitation of HM without row polymorphism, but worth documenting with a TODO or comment in the code.


76-93: Function declaration order matters — no mutual recursion support.

The first pass processes functions sequentially, so fn2 can call fn1 but not vice versa. Mutual recursion (e.g., isEven/isOdd) will produce "Undefined function" errors. This is a reasonable starting point, but worth documenting. A fix would involve a preliminary pass that registers all function names with fresh types before inferring any bodies.
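
The difference between sequential registration and a preliminary registration pass can be sketched without any type machinery. Everything below is hypothetical: `FunDecl` reduces a function to its name and the names it calls, and the real engine would map each name to a fresh type variable rather than just recording that it exists.

```kotlin
// Hypothetical, heavily simplified declaration: a name plus the names it calls.
data class FunDecl(val name: String, val calls: List<String>)

// Sequential scheme: a function becomes visible only once its declaration is reached.
fun checkSequential(decls: List<FunDecl>): List<String> {
    val known = mutableSetOf<String>()
    val errors = mutableListOf<String>()
    for (d in decls) {
        known += d.name // registered before its own body, so direct recursion works
        d.calls.filter { it !in known }
            .forEach { errors += "Undefined function '$it' in ${d.name}" }
    }
    return errors
}

// Two-phase scheme: register every name first, then check all bodies.
fun checkPreRegistered(decls: List<FunDecl>): List<String> {
    val known = decls.map { it.name }.toSet()
    return decls.flatMap { d ->
        d.calls.filter { it !in known }.map { "Undefined function '$it' in ${d.name}" }
    }
}

fun main() {
    val decls = listOf(
        FunDecl("isEven", listOf("isOdd")),
        FunDecl("isOdd", listOf("isEven")),
    )
    println(checkSequential(decls))    // prints "[Undefined function 'isOdd' in isEven]"
    println(checkPreRegistered(decls)) // prints "[]"
}
```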


411-419: TypeCheckTransform uses fully qualified class references inline.

Minor style nit: slang.common.Transform and slang.parser.CompilerError could be imported at the top of the file for consistency with the other imports.

src/test/kotlin/TypeInferenceTest.kt (2)

8-50: Well-structured test helpers; assertErrorCount is unused.

The three helpers provide a clean testing API. assertErrorCount (line 35) is defined but not called by any test. Consider either adding tests that use it (e.g., verifying exact error counts for multi-error scenarios) or removing it to avoid dead code.


202-299: Good coverage of common type errors; consider adding edge cases.

The type-error tests cover the important basics well. A few cases that could strengthen the suite:

  • PLUS on booleans: let x = true + false; — this would currently pass due to the permissive PLUS implementation (see related comment on TypeInference.kt). Adding this test would document the expected behavior or catch the gap.
  • Undefined variable/function: directly exercising error messages for unresolved names.
  • Array/Record/Ref operations: the inference engine supports these types but they have no test coverage.

Comment thread src/main/kotlin/Main.kt
Comment thread src/main/kotlin/slang/typeinfer/TypeInference.kt
@jpksh90 jpksh90 merged commit dd286aa into main Feb 10, 2026
9 checks passed

Copilot AI left a comment


Pull request overview

Adds a new Hindley–Milner (Algorithm W–style) type inference/type checking pass for Slang HLIR, along with CLI support to run it and a dedicated test suite.

Changes:

  • Introduces HM type representations, unification, and an inference engine (slang.typeinfer.*) plus a pipeline TypeCheckTransform.
  • Adds --typecheck CLI flag to run type inference and report errors without executing.
  • Adds TypeInferenceTest covering well-typed programs and expected type errors.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/test/kotlin/TypeInferenceTest.kt | New tests for type inference/type error reporting via typeCheck. |
| src/main/kotlin/slang/typeinfer/Types.kt | HM type AST (type vars, functions, arrays, records, refs), plus prune and freeVars. |
| src/main/kotlin/slang/typeinfer/Unification.kt | Unification with occurs check and structured type handling; defines TypeError. |
| src/main/kotlin/slang/typeinfer/TypeInference.kt | Main inference engine (HindleyMilnerInference), statement/expression typing rules, typeCheck, and TypeCheckTransform. |
| src/main/kotlin/Main.kt | Adds --typecheck flag and wiring to run inference and print results. |


Comment on lines +358 to +362
// PLUS works on Num+Num or String+String; default to unifying both sides
val resultType: SlangType = fresh()
safeUnify(leftType, resultType, expr.codeInfo)
safeUnify(rightType, resultType, expr.codeInfo)
resultType

Copilot AI Feb 10, 2026


The PLUS typing rule currently just unifies both operands with a fresh result type, which allows true + false, [1] + [2], {a:1}+{a:2}, etc. to typecheck even though the interpreter only supports Num+Num and String+String (see slang.runtime.evaluateBinaryOp). The inference needs to restrict + to those runtime-supported types (or introduce explicit overloading/constraints), otherwise the type checker is unsound.

Suggested change
-// PLUS works on Num+Num or String+String; default to unifying both sides
-val resultType: SlangType = fresh()
-safeUnify(leftType, resultType, expr.codeInfo)
-safeUnify(rightType, resultType, expr.codeInfo)
-resultType
+// PLUS works on Num+Num or String+String; restrict operands accordingly.
+// If either side is already known to be a string, treat this as String+String.
+if (leftType == SlangType.TString || rightType == SlangType.TString) {
+    safeUnify(leftType, SlangType.TString, expr.codeInfo)
+    safeUnify(rightType, SlangType.TString, expr.codeInfo)
+    SlangType.TString
+} else {
+    // Otherwise, default to numeric addition.
+    safeUnify(leftType, SlangType.TNum, expr.codeInfo)
+    safeUnify(rightType, SlangType.TNum, expr.codeInfo)
+    SlangType.TNum
+}

Copilot uses AI. Check for mistakes.
Comment on lines +320 to +330
// For records, we need structural access; try to unify if already a record
val pruned = prune(recordType)
if (pruned is SlangType.TRecord) {
val ft = pruned.fields[fieldName]
if (ft != null) {
safeUnify(fieldType, ft, expr.codeInfo)
} else {
errors.add(TypeError(expr.codeInfo, "Record has no field '$fieldName'"))
}
}
// If it's a type variable, we can't know the fields yet — return fresh

Copilot AI Feb 10, 2026


FieldAccess only checks fields when the LHS prunes to TRecord; for any non-record type (Num/Bool/String/Fun/Array/Ref) it currently returns a fresh fieldType without reporting an error. This means expressions like 1.foo or true.bar will typecheck but always fail at runtime. At minimum, add an error when prune(recordType) is neither TRecord nor TVar, and consider unifying the LHS with a record type when it is a TVar (or otherwise define the intended record typing semantics).

Suggested change
-// For records, we need structural access; try to unify if already a record
-val pruned = prune(recordType)
-if (pruned is SlangType.TRecord) {
-    val ft = pruned.fields[fieldName]
-    if (ft != null) {
-        safeUnify(fieldType, ft, expr.codeInfo)
-    } else {
-        errors.add(TypeError(expr.codeInfo, "Record has no field '$fieldName'"))
-    }
-}
-// If it's a type variable, we can't know the fields yet, so return fresh
+// For records, we need structural access; handle based on the pruned LHS type
+val pruned = prune(recordType)
+when (pruned) {
+    is SlangType.TRecord -> {
+        val ft = pruned.fields[fieldName]
+        if (ft != null) {
+            safeUnify(fieldType, ft, expr.codeInfo)
+        } else {
+            errors.add(
+                TypeError(
+                    expr.codeInfo,
+                    "Record has no field '$fieldName'"
+                )
+            )
+        }
+    }
+    is SlangType.TVar -> {
+        // Constrain the LHS type variable to be a record with at least this field
+        val recordConstraint = SlangType.TRecord(mapOf(fieldName to fieldType))
+        safeUnify(recordType, recordConstraint, expr.codeInfo)
+    }
+    else -> {
+        // Non-record LHS: this is a type error
+        errors.add(
+            TypeError(
+                expr.codeInfo,
+                "Field access on non-record value"
+            )
+        )
+    }
+}

is Stmt.StructStmt -> {
val fieldTypes = stmt.fields.mapValues { inferExpr(it.value, env) }
val recordType = SlangType.TRecord(fieldTypes)
val scheme = generalize(env, recordType)

Copilot AI Feb 10, 2026


StructStmt type inference ignores stmt.functions (methods) and only infers field initializer expressions. That means type errors inside struct methods won’t be reported, and any method signatures aren’t reflected in the resulting type for the struct. If struct methods are part of the language semantics, they should be inferred/checked here (and likely incorporated into the struct’s type, e.g., as function-typed fields or a separate method environment).

Suggested change
-val scheme = generalize(env, recordType)
+val scheme = generalize(env, recordType)
+// Infer struct methods so that type errors inside them are reported.
+// We intentionally ignore any environment extensions they may produce
+// to preserve the existing behaviour where only the struct's own type
+// is added to the environment here.
+stmt.functions.forEach { inferFunctionDecl(it, env) }

Comment on lines +156 to +158
val lhsType = inferExpr(stmt.lhs, env)
val rhsType = inferExpr(stmt.expr, env)
safeUnify(lhsType, rhsType, stmt.codeInfo)

Copilot AI Feb 10, 2026


AssignStmt infers the LHS by calling inferExpr(stmt.lhs, env). For VarExpr this instantiates polymorphic schemes, so the unification in safeUnify(lhsType, rhsType, ...) doesn't constrain the variable’s actual binding in env. This can let assignments to polymorphic bindings typecheck but produce runtime errors (e.g., reassigning a polymorphic function and later calling it at a different type). Consider handling VarExpr LHS specially: look up the scheme in env and unify rhsType against the scheme’s underlying type without instantiation, and/or reject assignment to schemes with quantified vars (or apply a value restriction / monomorphize-on-mutation strategy).

Suggested change
-val lhsType = inferExpr(stmt.lhs, env)
-val rhsType = inferExpr(stmt.expr, env)
-safeUnify(lhsType, rhsType, stmt.codeInfo)
+val rhsType = inferExpr(stmt.expr, env)
+when (val lhs = stmt.lhs) {
+    is Expr.VarExpr -> {
+        val scheme = env[lhs.name]
+        if (scheme != null) {
+            // Unify against the underlying scheme type without instantiation
+            safeUnify(scheme.type, rhsType, stmt.codeInfo)
+        } else {
+            // Fallback: infer LHS type if no scheme is found
+            val lhsType = inferExpr(stmt.lhs, env)
+            safeUnify(lhsType, rhsType, stmt.codeInfo)
+        }
+    }
+    else -> {
+        // Non-variable LHS: infer its type normally
+        val lhsType = inferExpr(stmt.lhs, env)
+        safeUnify(lhsType, rhsType, stmt.codeInfo)
+    }
+}
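
Why instantiation hides the constraint can be shown with a small standalone sketch. The names `Ty`, `Scheme`, `fresh` and `instantiate` are hypothetical and simplified (single-parameter functions, no environment), so this illustrates the mechanism rather than the PR's engine.

```kotlin
// Hypothetical miniature types; single-parameter functions only.
sealed class Ty
class TVar(val id: Int) : Ty() { var bound: Ty? = null }
object TNum : Ty()
data class TFun(val param: Ty, val ret: Ty) : Ty()

// A scheme "forall vars. type".
class Scheme(val vars: List<TVar>, val type: Ty)

var counter = 0
fun fresh() = TVar(counter++)
fun prune(t: Ty): Ty = if (t is TVar && t.bound != null) prune(t.bound!!) else t

// Copy the scheme's type, replacing each quantified variable with a fresh one.
fun instantiate(s: Scheme): Ty {
    val subst = s.vars.associateWith { fresh() }
    fun go(t: Ty): Ty = when (val p = prune(t)) {
        is TVar -> subst[p] ?: p
        is TFun -> TFun(go(p.param), go(p.ret))
        else -> p
    }
    return go(s.type)
}

fun main() {
    val a = fresh()
    val idScheme = Scheme(listOf(a), TFun(a, a)) // id : forall a. a -> a
    val inst = instantiate(idScheme) as TFun
    // Stand-in for unifying the instantiated LHS with a Num-typed RHS.
    (prune(inst.param) as TVar).bound = TNum
    println(prune(inst.ret) === TNum) // prints "true": the instance is now Num -> Num
    println(prune(a) is TVar)         // prints "true": the scheme's own variable is untouched
}
```

Because the binding lands on the fresh copy, the environment still holds the fully polymorphic scheme after the assignment, which is the gap the comment above describes.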

}
}

is Expr.ReadInputExpr -> fresh() // could be Num or String

Copilot AI Feb 10, 2026


ReadInputExpr is typed as an unconstrained fresh type variable. At runtime readInput() only produces a Number or String; allowing it to unify to Bool/function/record types can accept programs that will always crash at runtime (e.g. readInput() && true). Consider modeling readInput() as a dedicated type (or a Num | String-like sum if supported), or conservatively choosing a single static type (e.g. String) plus explicit conversion APIs.

Suggested change
-is Expr.ReadInputExpr -> fresh() // could be Num or String
+is Expr.ReadInputExpr -> SlangType.TString
