Report distinct variable values observed in TLC coverage statistics.

lemmy · lemmy · commit 96056d942b70 · 2025-05-19T20:11:46.000-07:00
These statistics show the number of distinct values each variable takes during model checking. An unusually high number of values for a particular variable may indicate that the model is insufficiently constrained, which can lead to state-space explosion during exhaustive analysis.

Note I: The data structure used to estimate these counts is probabilistic (specifically, HyperLogLog), which keeps memory usage low. As a result, the reported counts may have a small margin of error. Additionally, this structure introduces contention among workers, which can hurt performance and scalability. However, empirical measurements (see tlaplus#1183 (comment)) have shown that the overhead of collecting variable statistics on top of action and ordinary coverage is negligible.

Note II: The `TLC!TLCGet("spec")` named register exposes the same data and is a more appropriate, structured input for extracting and parsing these numbers in subsequent processing stages:

```tla
---- MODULE Spec ----
EXTENDS TLC, Json
...
MyStats ==
    PrintT(
        ToJson( \* Alternatively, see CSV!CSVWrite operator.
            { [name |-> v.name, count |-> v.coverage.distinct] :
                v \in TLCGet("spec").variables }
        )
    )
====
---- CONFIG Spec ----
...
_PERIODIC MyStats
POSTCONDITION MyStats
====
```

Variable statistics can be enabled independently of the `-coverage someTime` option by setting the Java system property `-Dtlc2.TLCGlobals.coverage=2` when running TLC. To activate both action and variable statistics, use `-Dtlc2.TLCGlobals.coverage=3`.

[Feature][TLC]

Signed-off-by: Markus Alexander Kuppe <github.com@lemmster.de>
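Note I mentions HyperLogLog. For readers unfamiliar with it, the minimal Java sketch below shows how such an estimator trades exactness for a fixed amount of memory: each value is hashed, the top `p` bits select one of `2^p` registers, and each register keeps the largest "position of the leftmost 1-bit" seen in the remaining bits. This is a textbook HyperLogLog with the usual linear-counting correction for small cardinalities. It is not TLC's `tlc2.util.statistics.CountDistinct` implementation; the class name, the SplitMix64-style hash mixer, and the register count are illustrative assumptions.

```java
/** Textbook HyperLogLog sketch (illustrative only; not TLC's CountDistinct). */
public final class HyperLogLogSketch {
    private final int p;            // number of hash bits used to pick a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // one small counter per register

    public HyperLogLogSketch(int p) {
        // The alpha constant used in estimate() is valid for m >= 128.
        if (p < 7 || p > 18) {
            throw new IllegalArgumentException("p must be in [7, 18]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // SplitMix64 finalizer: a bijective mixer that spreads input bits over all 64 bits.
    private static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Records a value; adding the same value again never changes the sketch. */
    public void add(long value) {
        // Offset by the golden-ratio constant (as SplitMix64 does) so 0 does not hash to 0.
        final long h = mix64(value + 0x9e3779b97f4a7c15L);
        final int idx = (int) (h >>> (64 - p)); // top p bits select a register
        final long w = h << p;                  // remaining 64 - p bits
        final int rho = Math.min(Long.numberOfLeadingZeros(w), 64 - p) + 1;
        if (rho > registers[idx]) {
            registers[idx] = (byte) rho;
        }
    }

    /** Estimated number of distinct values added so far. */
    public long estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        final double alpha = 0.7213 / (1.0 + 1.079 / m);
        double e = alpha * m * m / sum;
        // Small-range correction: switch to linear counting while registers are still empty.
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros);
        }
        return Math.round(e);
    }

    public static void main(String[] args) {
        HyperLogLogSketch hll = new HyperLogLogSketch(14); // 16384 one-byte registers
        for (long i = 0; i < 100_000; i++) {
            hll.add(i);
            hll.add(i); // duplicates only recompute a maximum, so they do not inflate the count
        }
        System.out.println("estimate: " + hll.estimate() + " (exact: 100000)");
    }
}
```

With `p = 14` this sketch uses 16 KiB regardless of how many values are added, and its standard error is roughly 1.04/sqrt(16384), about 0.8%. That illustrates why counts from such a structure carry a small margin of error; the precision TLC itself uses is not stated in this commit.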
diff --git a/docs/module-coverage-statistics.md b/docs/module-coverage-statistics.md
@@ -33,10 +33,11 @@ Spec ==
 =====
 ```
 
-When TLC runs this spec with coverage reporting enabled, it may produce output like this:
+When TLC runs this spec with coverage reporting enabled (and without deadlock checking), it produces output like this:
 
 ```
 The coverage statistics at 2025-04-02 18:02:30
+<x line 4, col 11 to line 4, col 11 of module Foobar>: 10
 <Init line 6, col 1 to line 6, col 4 of module Foobar>: 1:1
   line 7, col 5 to line 7, col 12 of module Foobar: 1
 <Inc line 9, col 1 to line 9, col 3 of module Foobar>: 10:10
@@ -54,11 +55,14 @@ End of statistics.
 ```
 
 ## How to Interpret This Output
-Each block of coverage statistics corresponds to either a definition (like `Init`, `Inc`, or `Dec`) or an expression inside the definition.
+Each block of coverage statistics corresponds to either a variable declaration, a definition (like `Init`, `Inc`, or `Dec`), or an expression inside the definition.
+
+### Variable Declaration:
+The line `<x line 4, col 11 to line 4, col 11 of module Foobar>: 10` indicates that TLC found 10 distinct values for the declared variable `x`.
 
 ### State-level expressions (`Init`):
 
-The line `<Init line 6, col 1 to line 6, col 4 of module Foobar>: 1:1` shows that TLC evaluated the `Init` predicate once, and it produced one initial state 
+The line `<Init line 6, col 1 to line 6, col 4 of module Foobar>: 1:1` shows that TLC evaluated the `Init` predicate once, and it produced one initial state. 
 
 ### Action-level expressions (`Inc` and `Dec`):
 
@@ -74,7 +78,7 @@ In addition to tracking how many times an expression is evaluated, TLC also repo
 
 This is especially relevant for expressions that manipulate sets, functions, sequences, or other compound structures. When such an allocation occurs, TLC appends a second number to the coverage entry in the format evaluations:cost.
 
-Consider the following specification, where the Next action repeatedly adds a new element to the set x:
+Consider the following specification, where the `Next` action repeatedly adds a new element to the set `x`:
 
 ```tla
 ------ MODULE Costs ------
@@ -96,6 +100,7 @@ Spec ==
 
 ```
 The coverage statistics at 2025-04-02 18:28:21
+<x line 4, col 11 to line 4, col 11 of module Foobar>: 10
 Init line 6, col 1 to line 6, col 4 of module Foobar>: 1:1
   line 7, col 5 to line 7, col 10 of module Foobar: 1
 <Next line 9, col 1 to line 9, col 4 of module Foobar>: 10:10
@@ -111,6 +116,6 @@ End of statistics.
 
 This tells us:
 
-The sub-expression ({Cardinality(x) + 1}) was evaluated 10 times, and TLC incurred an allocation cost of 18 across those 10 evaluations. This cost represents internal overhead, such as memory allocation or structural copying involved in creating the new set value.
+The sub-expression `({Cardinality(x) + 1})` was evaluated 10 times, and TLC incurred an allocation cost of 18 across those 10 evaluations. This cost represents internal overhead, such as memory allocation or structural copying involved in creating the new set value.
 
 These costs can highlight performance hotspots in your specification—helpful for optimizing large models where memory usage or computational effort may become significant.
diff --git a/tlatools/org.lamport.tlatools/src/tlc2/output/EC.java b/tlatools/org.lamport.tlatools/src/tlc2/output/EC.java
@@ -293,7 +293,8 @@ public interface EC
     public static final int TLC_COVERAGE_PROPERTY = 2774;
     public static final int TLC_COVERAGE_CONSTRAINT = 2778;
     public static final int TLC_COVERAGE_END_OVERHEAD = 2777;
-    
+    public static final int TLC_COVERAGE_VAR = 2779;
+   
     // config file errors
     public static final int TLC_CONFIG_VALUE_NOT_ASSIGNED_TO_CONSTANT_PARAM = 2222;
     public static final int TLC_CONFIG_RHS_ID_APPEARED_AFTER_LHS_ID = 2223;
diff --git a/tlatools/org.lamport.tlatools/src/tlc2/output/MP.java b/tlatools/org.lamport.tlatools/src/tlc2/output/MP.java
@@ -1169,6 +1169,9 @@ else if (parameters.length == 2) {
         case EC.TLC_COVERAGE_VALUE_COST:
             b.append("  %1%: %2%:%3%");
             break;
+        case EC.TLC_COVERAGE_VAR:
+       		b.append("<%1% %2%>: %3%");
+            break;
         case EC.TLC_COVERAGE_INIT:
        		b.append("%1%: %2%:%3%");
             break;
diff --git a/tlatools/org.lamport.tlatools/src/tlc2/tool/coverage/CostModelCreator.java b/tlatools/org.lamport.tlatools/src/tlc2/tool/coverage/CostModelCreator.java
@@ -49,6 +49,7 @@
 import tla2sany.semantic.Subst;
 import tla2sany.semantic.SubstInNode;
 import tla2sany.semantic.SymbolNode;
+import tla2sany.st.Location;
 import tlc2.output.EC;
 import tlc2.output.MP;
 import tlc2.tool.Action;
@@ -58,6 +59,7 @@
 import tlc2.util.ObjLongTable;
 import tlc2.util.Vect;
 import tlc2.util.statistics.CountDistinct;
+import util.UniqueString;
 
 /**
  * <h1>Why a CostModel:</h1> Why a CostModelCreator to traverses the semantic
@@ -428,7 +430,38 @@ public static final void create(final ITool tool) {
 	}
 	
 	public static void report(final ITool tool, final long startTime) {
-        MP.printMessage(EC.TLC_COVERAGE_START);
+		report(tool);
+
+		// Notify users about the performance overhead related to coverage collection
+		// after N minutes of model checking. The assumption is that a user has little
+		// interest in coverage for a large (long-running) model anyway. In the future
+		// it is hopefully possible to switch from profiling to sampling to relax the
+		// performance overhead of coverage and cost statistics.
+		final long l = System.currentTimeMillis() - startTime;
+		if (l > (5L * 60L * 1000L)) {
+			MP.printMessage(EC.TLC_COVERAGE_END_OVERHEAD);
+		} else {
+			MP.printMessage(EC.TLC_COVERAGE_END);
+		}
+	}
+
+	private static void report(final ITool tool) {
+		MP.printMessage(EC.TLC_COVERAGE_START);
+		
+		// VARIABLE and VARIABLES
+		for (final OpDeclNode odn : tool.getSpecProcessor().getVariablesNodes()) {
+			final long count = odn.getCountDistinct().count();
+			// Before state-space exploration begins, 'count' could misleadingly be zero.
+			// Luckily, Noop#count returns -1 in such cases, so those variables are skipped.
+			if (count >= 0) {
+				final UniqueString varName = odn.getName();
+				final Location location = odn.getLocation();
+				MP.printMessage(EC.TLC_COVERAGE_VAR,
+						new String[] { varName.toString(), location.toString(), String.valueOf(count) });
+			}
+		}
+		
+		// INIT (or SPECIFICATION)
     	final Vect<Action> init = tool.getInitStateSpec();
     	for (int i = 0; i < init.size(); i++) {
     		final Action initAction = init.elementAt(i);
@@ -495,17 +528,5 @@ public int compare(Action o1, Action o2) {
     			impliedActions.cm.report();
     		}
         }
-       
-		// Notify users about the performance overhead related to coverage collection
-		// after N minutes of model checking. The assumption is that a user has little
-		// interest in coverage for a large (long-running) model anyway.  In the future
-        // it is hopefully possible to switch from profiling to sampling to relax the
-        // performance overhead of coverage and cost statistics.
-		final long l = System.currentTimeMillis() - startTime;
-		if (l > (5L * 60L * 1000L)) {
-			MP.printMessage(EC.TLC_COVERAGE_END_OVERHEAD);
-		} else {
-			MP.printMessage(EC.TLC_COVERAGE_END);
-		}
 	}
 }
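The new `report(ITool)` loop can be mirrored outside TLC to see how each variable line in the coverage output is produced. The sketch below is not TLC code: `VarStat` and `formatTemplate` are hypothetical stand-ins for `OpDeclNode` and `MP.printMessage`, and it assumes, based on the templates in `MP.java` above, that `%1%`, `%2%`, ... are replaced by the first, second, ... message parameter. Variables whose counter reports a negative value (the `Noop` case) are skipped, mirroring the `count >= 0` guard in the diff.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative re-creation of the variable coverage lines (not TLC code). */
public final class VarCoverageReport {

    /** Hypothetical holder for the name, location, and count read from each variable. */
    public record VarStat(String name, String location, long count) {}

    /** Replaces %1%, %2%, ... in the template with the corresponding parameter. */
    public static String formatTemplate(String template, String... params) {
        String out = template;
        for (int i = 0; i < params.length; i++) {
            out = out.replace("%" + (i + 1) + "%", params[i]);
        }
        return out;
    }

    /** Builds one line per variable, skipping negative counts (no data collected). */
    public static List<String> report(List<VarStat> vars) {
        final List<String> lines = new ArrayList<>();
        for (VarStat v : vars) {
            if (v.count() >= 0) {
                lines.add(formatTemplate("<%1% %2%>: %3%",
                        v.name(), v.location(), String.valueOf(v.count())));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // 'y' and its location are hypothetical; its -1 count models the Noop case.
        List<VarStat> vars = List.of(
                new VarStat("x", "line 4, col 11 to line 4, col 11 of module Foobar", 10),
                new VarStat("y", "line 4, col 14 to line 4, col 14 of module Foobar", -1));
        report(vars).forEach(System.out::println);
        // prints: <x line 4, col 11 to line 4, col 11 of module Foobar>: 10
    }
}
```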