
Implement generic dataflow analysis framework with gen-kill pattern and SetLattice#23

Merged
jpksh90 merged 3 commits into main from copilot/implement-generic-dataflow-framework
Nov 23, 2025
Conversation

Copilot AI commented Nov 18, 2025

Contributor

Adds a generic gen-kill framework for dataflow analysis and implements constant propagation. The existing reaching definitions and live variables analyses are refactored to use the framework.

Changes

SetLattice

  • SetLattice<E>: Reusable lattice class providing set operations for gen-kill analyses
  • Implements difference(), union(), and meet() operations
  • Encapsulates all set-based operations in one place
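
As a rough illustration of the three operations listed above, here is a minimal sketch of what such a lattice class could look like. The method signatures, the `bottom()` helper, and the choice of union as the meet are assumptions made for this example (union suits may-analyses such as reaching definitions); the class in the PR may be shaped differently.

```kotlin
// Hypothetical sketch; names and signatures are assumptions, not the PR's code.
class SetLattice<E> {
    // Least element when sets are ordered by inclusion.
    fun bottom(): Set<E> = emptySet()

    // a \ b: elements of a that are not in b.
    fun difference(a: Set<E>, b: Set<E>): Set<E> = a - b

    // a ∪ b: elements in either set.
    fun union(a: Set<E>, b: Set<E>): Set<E> = a + b

    // Combines facts arriving from several control-flow paths.
    // Assumption: union, as used by may-analyses.
    fun meet(values: Collection<Set<E>>): Set<E> =
        values.fold(bottom()) { acc, s -> union(acc, s) }
}
```

Keeping the operations on one object means a must-analysis could swap in an intersection-based meet without touching the analyses themselves.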

Generic Gen-Kill Framework

  • GenKillAnalysis<T>: Abstract class implementing the gen-kill pattern OUT = (IN - kill) ∪ gen
  • Parameterized by element type T (not Set<T>)
  • Takes SetLattice<T> as constructor parameter
  • Delegates set operations to the lattice
  • Subclasses only need to implement gen() and kill() methods
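
The pattern described in this list can be sketched in a few lines. `Block`, `GenKill`, and `ToyReachingDefs` below are hypothetical stand-ins invented for this example, and the lattice parameter is left out to keep it short; only the formula `(input - kill) ∪ gen` comes from the PR description.

```kotlin
// Hypothetical stand-in for a basic block: a name plus precomputed sets.
data class Block(val name: String, val defs: Set<String>, val killed: Set<String>)

// Minimal gen-kill base class: subclasses supply gen() and kill() only.
abstract class GenKill<T> {
    abstract fun gen(block: Block): Set<T>
    abstract fun kill(block: Block): Set<T>

    // output = (input - kill) ∪ gen
    fun transfer(block: Block, input: Set<T>): Set<T> =
        (input - kill(block)) + gen(block)
}

// Toy reaching-definitions instance over definition labels such as "x@B1".
class ToyReachingDefs : GenKill<String>() {
    override fun gen(block: Block): Set<String> = block.defs
    override fun kill(block: Block): Set<String> = block.killed
}
```

For a block that redefines `x`, calling `transfer` with `{x@B1, y@B1}` drops the old definition of `x` and adds the new one.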

Refactored Analyses

  • ReachingDefinitionsAnalysis: Now extends GenKillAnalysis<String> with SetLattice<String>
  • LiveVariablesAnalysis: Now extends GenKillAnalysis<String> with SetLattice<String>
  • Both use the existing WorklistSolver and maintain identical semantics
  • Cleaner implementation without redundant method overrides

Constant Propagation

  • ConstantPropagationAnalysis: Tracks constant values using lattice {⊤, ⊥, Constant(Double)}
  • Evaluates arithmetic, comparison, and logical operators on constants
  • Meet operation: Constant(c1) ⊓ Constant(c2) = Constant(c1) if c1 == c2, otherwise ⊥
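
A small sketch of the lattice and its meet may make the rule concrete. The type name `ConstValue` and the function `meet` are assumptions for this example; the sketch also assumes that ⊤ means "no information yet" and therefore acts as the identity of the meet.

```kotlin
// Hypothetical names; the PR's own value type may be shaped differently.
sealed class ConstValue {
    object Top : ConstValue()                              // no information yet
    object Bottom : ConstValue()                           // not a constant
    data class Constant(val value: Double) : ConstValue()
}

fun meet(a: ConstValue, b: ConstValue): ConstValue = when {
    a is ConstValue.Top -> b                               // ⊤ is the identity of meet
    b is ConstValue.Top -> a
    a is ConstValue.Bottom || b is ConstValue.Bottom -> ConstValue.Bottom
    a == b -> a                                            // same constant on both paths
    else -> ConstValue.Bottom                              // conflicting constants
}
```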

Example Usage

// Forward analysis using gen-kill framework with SetLattice
class ReachingDefinitionsAnalysis : GenKillAnalysis<String>(Direction.FORWARD, SetLattice()) {
    override fun initialValue(): Set<String> = emptySet()
    override fun boundaryValue(): Set<String> = emptySet()
    override fun gen(block: BasicBlock): Set<String> = /* variables defined */
    override fun kill(block: BasicBlock): Set<String> = /* previous definitions */
}

// Constant propagation with lattice
val analysis = ConstantPropagationAnalysis()
val result = analysis.analyze(cfg)
// result.getIn(block)["x"] => Constant(42.0)

Testing

Added 7 tests for constant propagation covering arithmetic, branching, and edge cases. All 18 dataflow tests pass.

Original prompt

This section details the original issue you should resolve

<issue_title>Implement a generic dataflow analysis framework</issue_title>
<issue_description>Write a dataflow analysis framework generalised over a domain $D$ over a control flow graph of basic blocks. The CFG may be an HLIR representation, three-address code, or some arbitrary CFG defined over arbitrary code structures (either statements or basic blocks). Implement the worklist solver. Implement a generic gen-kill framework which generates transfer functions over basic blocks or statements.

Use the gen-kill framework to implement the following algorithms:

  1. Reaching definitions
  2. Live variables

Use the worklist solver to implement constant propagation.

Refactor the existing dataflow analysis on HLIR to use this generic framework. </issue_description>
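
To make the request concrete, a forward worklist solver over an arbitrary domain could be sketched as follows. `Node`, `solveForward`, and `demoReaching` are hypothetical names for this example and are not the `WorklistSolver` in the repository. The sketch assumes that `initial` is the identity of `meet` (true for the empty set under union).

```kotlin
// Hypothetical CFG node: an id plus successor ids.
data class Node(val id: Int, val succs: List<Int>)

// Forward worklist solver over a domain D.
// `meet` combines predecessor outputs; `transfer` maps IN to OUT for one node.
fun <D> solveForward(
    nodes: List<Node>,
    entry: Int,
    boundary: D,
    initial: D,
    meet: (D, D) -> D,
    transfer: (Node, D) -> D,
): Map<Int, D> {
    val byId = nodes.associateBy { it.id }
    val preds = nodes.associate { n -> n.id to nodes.filter { n.id in it.succs }.map { it.id } }
    val out = nodes.associate { it.id to initial }.toMutableMap()
    val worklist = ArrayDeque(nodes.map { it.id })

    while (worklist.isNotEmpty()) {
        val id = worklist.removeFirst()
        val incoming = preds.getValue(id).map { out.getValue(it) }
        val start = if (id == entry) boundary else initial
        val input = incoming.fold(start, meet)
        val newOut = transfer(byId.getValue(id), input)
        if (newOut != out.getValue(id)) {
            out[id] = newOut
            // Only successors can be affected by a changed OUT.
            byId.getValue(id).succs.filter { it !in worklist }.forEach { worklist.addLast(it) }
        }
    }
    return out
}

// Toy use on a diamond CFG: node i generates the label "d<i>" and kills nothing.
fun demoReaching(): Map<Int, Set<String>> {
    val cfg = listOf(Node(0, listOf(1, 2)), Node(1, listOf(3)), Node(2, listOf(3)), Node(3, emptyList()))
    return solveForward(
        nodes = cfg, entry = 0,
        boundary = emptySet<String>(), initial = emptySet<String>(),
        meet = { a, b -> a + b },
        transfer = { n, input -> input + "d${n.id}" },
    )
}
```

Because the solver only sees `meet` and `transfer`, the same loop can drive a set-based gen-kill analysis or a map-based constant propagation.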

Comments on the Issue (you are @copilot in this section)



- Created GenKillAnalysis abstract class for gen-kill pattern
- Refactored ReachingDefinitionsAnalysis to use gen-kill framework
- Refactored LiveVariablesAnalysis to use gen-kill framework
- Implemented ConstantPropagationAnalysis with worklist solver
- Added comprehensive tests for constant propagation
- All existing tests still pass

Co-authored-by: jpksh90 <8288548+jpksh90@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement generic dataflow analysis framework with algorithms" to "Implement generic dataflow analysis framework with gen-kill pattern" on Nov 18, 2025
Copilot AI requested a review from jpksh90 November 18, 2025 17:46
*/
abstract class GenKillAnalysis<T>(
override val direction: Direction,
) : DataflowAnalysis<T>() {

Owner

Create a lattice SetLattice and parameterize over SetLattice

Contributor Author

Created SetLattice class with difference, union, and meet operations. GenKillAnalysis now parameterized with SetLattice instead of abstract methods. (commit fd61463)

- class ReachingDefinitionsAnalysis : DataflowAnalysis<Set<String>>() {
-     override val direction = Direction.FORWARD

+ class ReachingDefinitionsAnalysis : GenKillAnalysis<Set<String>>(Direction.FORWARD) {

Owner

Change accordingly.

Contributor Author

Updated ReachingDefinitionsAnalysis and LiveVariablesAnalysis to use SetLattice parameterization. Removed redundant method implementations. (commit fd61463)


jpksh90 commented Nov 18, 2025

Owner

@copilot Fix the comments.

- Created SetLattice class with difference, union, and meet operations
- Parameterized GenKillAnalysis with SetLattice<T> instead of abstract methods
- Updated ReachingDefinitionsAnalysis to use SetLattice<String>
- Updated LiveVariablesAnalysis to use SetLattice<String>
- Removed redundant difference/union/meet method implementations
- All tests pass successfully

Co-authored-by: jpksh90 <8288548+jpksh90@users.noreply.github.com>

Copilot AI commented Nov 18, 2025

Contributor Author

@copilot Fix the comments.

Fixed both comments. Created SetLattice class and updated all analyses to use it. All tests pass. (commit fd61463)


Copilot AI changed the title from "Implement generic dataflow analysis framework with gen-kill pattern" to "Implement generic dataflow analysis framework with gen-kill pattern and SetLattice" on Nov 18, 2025
Copilot AI requested a review from jpksh90 November 18, 2025 18:02
@jpksh90 jpksh90 marked this pull request as ready for review November 23, 2025 00:29
Copilot AI review requested due to automatic review settings November 23, 2025 00:29
@jpksh90 jpksh90 merged commit 4cc2f6e into main Nov 23, 2025
5 checks passed

Copilot AI left a comment

Contributor

Pull request overview

This PR implements a generic dataflow analysis framework with a gen-kill pattern and introduces constant propagation analysis. The framework includes a reusable SetLattice class and GenKillAnalysis base class that simplifies implementing set-based dataflow analyses. The existing reaching definitions and live variables analyses are refactored to use this framework, and a new constant propagation analysis is added.

Key changes:

  • Generic gen-kill framework that abstracts the common OUT = (IN - kill) ∪ gen pattern for both forward and backward analyses
  • SetLattice utility class providing reusable set operations (difference, union, meet)
  • Constant propagation analysis with a three-level lattice (Top, Constant, Bottom) supporting arithmetic, comparison, and logical operators
  • Refactored reaching definitions and live variables analyses to extend the gen-kill framework

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/main/kotlin/slang/hlir/DataflowAnalysis.kt | Adds SetLattice, GenKillAnalysis framework, refactors ReachingDefinitionsAnalysis and LiveVariablesAnalysis to use gen-kill pattern, implements ConstantPropagationAnalysis with lattice-based value tracking |
| src/test/kotlin/DataflowAnalysisTest.kt | Adds 7 tests for constant propagation covering arithmetic operations, reassignment, unknown variables, and branching scenarios; includes 2 smoke tests for refactored gen-kill framework analyses |


Comment on lines +348 to 387
+ override fun gen(block: BasicBlock): Set<String> {
+     // Gen = use (variables used in the block)
      val use = mutableSetOf<String>()
-     val def = mutableSetOf<String>()

-     // Process each statement in the block (in reverse for backward analysis)
+     // Process in reverse for backward analysis
      for (stmt in block.stmts.reversed()) {
          when (stmt) {
-             is Stmt.LetStmt -> {
-                 def.add(stmt.name)
-                 use.addAll(getUsedVariables(stmt.expr))
-             }
-             is Stmt.AssignStmt -> {
-                 when (val lhs = stmt.lhs) {
-                     is Expr.VarExpr -> {
-                         def.add(lhs.name)
-                     }
-                     else -> {
-                         // For complex assignments, we don't track them yet
-                     }
-                 }
-                 use.addAll(getUsedVariables(stmt.expr))
-             }
+             is Stmt.LetStmt -> use.addAll(getUsedVariables(stmt.expr))
+             is Stmt.AssignStmt -> use.addAll(getUsedVariables(stmt.expr))
              is Stmt.PrintStmt -> {
                  for (arg in stmt.args) {
                      use.addAll(getUsedVariables(arg))
                  }
              }
-             is Stmt.ExprStmt -> {
-                 use.addAll(getUsedVariables(stmt.expr))
-             }
-             is Stmt.ReturnStmt -> {
-                 use.addAll(getUsedVariables(stmt.expr))
-             }
-             is Stmt.IfStmt -> {
-                 use.addAll(getUsedVariables(stmt.condition))
-             }
-             is Stmt.WhileStmt -> {
-                 use.addAll(getUsedVariables(stmt.condition))
-             }
-             else -> {
-                 // Other statements don't use/define variables
-             }
+             is Stmt.ExprStmt -> use.addAll(getUsedVariables(stmt.expr))
+             is Stmt.ReturnStmt -> use.addAll(getUsedVariables(stmt.expr))
+             is Stmt.IfStmt -> use.addAll(getUsedVariables(stmt.condition))
+             is Stmt.WhileStmt -> use.addAll(getUsedVariables(stmt.condition))
+             else -> {}
          }
      }
+     return use
+ }

-     // IN = (OUT - def) ∪ use
-     return (input - def) + use
+ override fun kill(block: BasicBlock): Set<String> {
+     // Kill = def (variables defined in the block)
+     val def = mutableSetOf<String>()
+     for (stmt in block.stmts.reversed()) {
+         when (stmt) {
+             is Stmt.LetStmt -> def.add(stmt.name)
+             is Stmt.AssignStmt -> {
+                 when (val lhs = stmt.lhs) {
+                     is Expr.VarExpr -> def.add(lhs.name)
+                     else -> {}
+                 }
+             }
+             else -> {}
+         }
+     }
+     return def
  }

Copilot AI Nov 23, 2025


The gen() and kill() methods for live variables analysis don't correctly handle intra-block dataflow. For backward analysis, when computing use-def sets within a block, you need to process statements in reverse order and track which variables are used before being defined.

Current implementation: Both methods iterate in reverse and independently collect all uses and defs.

Problem: Consider x = 1; y = x. The second statement uses x, but the first statement defines it, so x is NOT live before the first statement (it is defined before use within the block). Collecting every use independently of the defs still puts x in gen.

The correct approach is to process statements in reverse order once, maintaining a running set of live variables: at each statement, first remove the variable it defines, then add the variables it uses. What remains after the first statement has been processed is the set of variables used before being defined in the block.

Suggested fix:

override fun gen(block: BasicBlock): Set<String> {
    // Variables read before any definition in this block
    val use = mutableSetOf<String>()
    for (stmt in block.stmts.reversed()) {
        when (stmt) {
            is Stmt.LetStmt -> {
                // Walking backward: the definition ends liveness first,
                // then the statement's own operands become live
                use.remove(stmt.name)
                use.addAll(getUsedVariables(stmt.expr))
            }
            is Stmt.AssignStmt -> {
                when (val lhs = stmt.lhs) {
                    is Expr.VarExpr -> use.remove(lhs.name)
                    else -> {}
                }
                use.addAll(getUsedVariables(stmt.expr))
            }
            // ... handle other statement types
        }
    }
    return use
}
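
A self-contained version of the reverse walk, on a toy statement type, may help check the reasoning. `Assign`, `upwardExposedUses`, and `definedVariables` are hypothetical names for this sketch and do not exist in the PR; each statement is modelled as one target variable plus the set of variables it reads.

```kotlin
// Hypothetical statement shape for illustration: `target = f(operands)`.
data class Assign(val target: String, val operands: Set<String>)

// gen for live variables: variables read before being written in the block.
fun upwardExposedUses(stmts: List<Assign>): Set<String> {
    val live = mutableSetOf<String>()
    for (stmt in stmts.asReversed()) {
        live.remove(stmt.target)     // the definition ends liveness above this point...
        live.addAll(stmt.operands)   // ...unless the statement itself reads the variable
    }
    return live
}

// kill for live variables: every variable the block writes.
fun definedVariables(stmts: List<Assign>): Set<String> = stmts.map { it.target }.toSet()
```

For `x = 1; y = x` the function returns an empty set, while for `x = x + z` it returns both `x` and `z`, since the read happens before the write.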

Comment on lines +489 to +505
for (varName in allVars) {
    // Meet of constant values: if all have same constant, use it; otherwise BOTTOM
    val varValues = values.mapNotNull { it[varName] }
    if (varValues.isEmpty()) {
        // Variable not in any predecessor
        continue
    }

    // Start with the first value
    var meetValue: ConstantValue = varValues[0]
    for (i in 1 until varValues.size) {
        meetValue = meetConstantValues(meetValue, varValues[i])
    }
    result[varName] = meetValue
}

return result

Copilot AI Nov 23, 2025


The meet operation doesn't correctly handle variables that exist in some predecessor flows but not others.

When varValues.size < values.size, it means some predecessors have information about the variable while others don't. The correct interpretation is that missing values should be treated as ⊤ (no information yet, not defined on that path).

However, the current code doesn't distinguish between:

  1. Variable not defined on a path (should be ⊤)
  2. Variable is defined but we have no information (also ⊤, but different semantically)

For constant propagation, if a variable is defined on one path but not on another, after the merge the variable should either be absent from the map (if treating undefined variables as ⊤), or be explicitly set based on meeting the defined value with ⊤.

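Current behavior: If x=10 on one path and x is absent from the map on another path, the meet skips the missing entry and returns x=10. That equals Constant(10) ⊓ ⊤, so it is correct only if an absent entry always means ⊤; if absence can mean the variable is unassigned or unknown on that path, x=10 is incorrect.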

Suggested fix: Check if varValues.size < values.size, and if so, the variable is not consistently defined across all paths, so treat it appropriately (likely set to ⊤ or exclude from result depending on semantics).
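
The two possible semantics can be compared in a small sketch. The encoding here is an assumption made only for this example (absent key = ⊤, `null` = ⊥, a number = a constant), and `Env`, `meetEnvs`, and `missingIsTop` are hypothetical names that do not appear in the PR.

```kotlin
// Encoding assumed for this sketch only: absent key = ⊤, null = ⊥, number = constant.
typealias Env = Map<String, Double?>

fun meetEnvs(envs: List<Env>, missingIsTop: Boolean = true): Env {
    val result = mutableMapOf<String, Double?>()
    for (name in envs.flatMap { it.keys }.toSet()) {
        // Values from the predecessors that mention the variable.
        val present = envs.filter { name in it }.map { it[name] }
        val first = present.first()
        val agree = present.all { it == first }
        val everywhere = present.size == envs.size
        // With missingIsTop, absent entries are ignored (meet with ⊤ is the identity).
        // Otherwise a variable missing on any path is conservatively lowered to ⊥.
        result[name] = if (agree && (missingIsTop || everywhere)) first else null
    }
    return result
}
```

With `x = 10` on one path and no entry on the other, the default keeps `x = 10`, while `missingIsTop = false` yields ⊥. Which one is right depends on what an absent entry is meant to represent in the analysis.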

Comment on lines +252 to +254
* OUT = (IN - kill) ∪ gen (for forward analysis)
* IN = (OUT - kill) ∪ gen (for backward analysis)
*

Copilot AI Nov 23, 2025


The documentation comment is misleading about how the gen-kill pattern handles different analysis directions.

The comment states:

OUT = (IN - kill) ∪ gen  (for forward analysis)
IN  = (OUT - kill) ∪ gen (for backward analysis)

But the transfer() function only implements one formula at line 284. The framework works because the input parameter semantically represents different things based on direction:

  • For forward analysis: input is IN, and we compute OUT
  • For backward analysis: input is OUT, and we compute IN

The documentation should clarify that the same formula applies in both cases, with input and the return value having different semantic meanings based on the analysis direction. The direction-specific behavior is handled by the worklist solver, not the transfer function.

Suggested documentation update:

/**
 * Generic gen-kill framework for dataflow analysis
 * This framework implements the standard gen-kill pattern where:
 *   output = (input - kill) ∪ gen
 * 
 * For forward analysis: input=IN, output=OUT
 * For backward analysis: input=OUT, output=IN
 * 
 * The worklist solver handles the direction-specific logic.
 */
Suggested change
- * OUT = (IN - kill) ∪ gen (for forward analysis)
- * IN = (OUT - kill) ∪ gen (for backward analysis)
- *
+ * output = (input - kill) ∪ gen
+ *
+ * For forward analysis: input=IN, output=OUT
+ * For backward analysis: input=OUT, output=IN
+ *
+ * The worklist solver handles the direction-specific logic.
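
The point that one formula serves both directions can be illustrated with a small sketch in which only the solver consults the direction. `Blk`, `transfer`, and `solve` are hypothetical names for this example; the loop is a plain round-robin fixpoint with union as the meet, swapped in for brevity instead of a worklist.

```kotlin
enum class Direction { FORWARD, BACKWARD }

// Hypothetical block: name, successors, and precomputed gen/kill sets.
data class Blk(val name: String, val succs: List<String>, val gen: Set<String>, val kill: Set<String>)

// Direction-agnostic transfer: output = (input - kill) ∪ gen.
fun transfer(b: Blk, input: Set<String>): Set<String> = (input - b.kill) + b.gen

// Round-robin fixpoint; returns each block's transfer output
// (OUT for FORWARD, IN for BACKWARD).
fun solve(blocks: List<Blk>, direction: Direction): Map<String, Set<String>> {
    // The solver, not transfer(), decides whose results feed a block.
    fun sources(b: Blk): List<String> = when (direction) {
        Direction.FORWARD -> blocks.filter { b.name in it.succs }.map { it.name }  // predecessors
        Direction.BACKWARD -> b.succs                                              // successors
    }
    val result = blocks.associate { it.name to emptySet<String>() }.toMutableMap()
    var changed = true
    while (changed) {
        changed = false
        for (b in blocks) {
            val input = sources(b).flatMap { result.getValue(it) }.toSet()
            val output = transfer(b, input)
            if (output != result.getValue(b.name)) {
                result[b.name] = output
                changed = true
            }
        }
    }
    return result
}
```

Running `solve` with `Direction.BACKWARD` on blocks whose `gen` holds uses and whose `kill` holds definitions gives live-in sets, while `Direction.FORWARD` with definition labels gives reaching-definition OUT sets, without any change to `transfer`.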


Development

Successfully merging this pull request may close these issues.

Implement a generic dataflow analysis framework

3 participants