Problem
When analyzing batch comparison results, it's unclear whether divergences cluster at specific token positions (early prefill vs late decode).
Solution
Add xpyd-acc heatmap --report <path> command that:
- Analyzes divergence frequency by token position across all samples
- Bins token positions into configurable buckets (default 10)
- Per-bucket: divergence count, divergence rate, avg logprob gap
- HeatmapBucket and HeatmapReport dataclasses with to_dict() serialization
- --buckets <int> flag, --json <path> export
- Rich terminal bar chart showing divergence frequency by position range
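A minimal sketch of the per-bucket aggregation described above. The input shape (one list of `(diverged, logprob_gap)` pairs per sample), the field names, and the choice to average the gap over divergent tokens only are assumptions for illustration; the actual report format and dataclass fields may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HeatmapBucket:
    start: int              # first token position in the bucket (inclusive)
    end: int                # last token position in the bucket (exclusive)
    total: int = 0          # tokens observed in this range, across samples
    divergences: int = 0    # tokens in this range flagged as divergent
    gap_sum: float = 0.0    # sum of |logprob gap| over divergent tokens

    @property
    def divergence_rate(self) -> float:
        return self.divergences / self.total if self.total else 0.0

    @property
    def avg_logprob_gap(self) -> float:
        # Assumption: averaged over divergent tokens only.
        return self.gap_sum / self.divergences if self.divergences else 0.0

    def to_dict(self) -> dict:
        return {
            "start": self.start,
            "end": self.end,
            "divergence_count": self.divergences,
            "divergence_rate": self.divergence_rate,
            "avg_logprob_gap": self.avg_logprob_gap,
        }


@dataclass
class HeatmapReport:
    buckets: List[HeatmapBucket]

    def to_dict(self) -> dict:
        return {"buckets": [b.to_dict() for b in self.buckets]}


def compute_heatmap(
    samples: List[List[Tuple[bool, float]]], num_buckets: int = 10
) -> HeatmapReport:
    """Bin token positions into equal-width buckets and aggregate per bucket."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    max_len = max((len(s) for s in samples), default=0)
    # Ceiling division so every position up to max_len - 1 lands in a bucket.
    width = max(1, -(-max_len // num_buckets))
    buckets = [
        HeatmapBucket(start=i * width, end=(i + 1) * width)
        for i in range(num_buckets)
    ]
    for sample in samples:
        for pos, (diverged, gap) in enumerate(sample):
            bucket = buckets[min(pos // width, num_buckets - 1)]
            bucket.total += 1
            if diverged:
                bucket.divergences += 1
                bucket.gap_sum += abs(gap)
    return HeatmapReport(buckets=buckets)
```

Buckets here are equal-width over the longest sample, so shorter samples contribute only to the early buckets; the rate is normalized by the tokens actually observed in each range rather than by sample count.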
Acceptance Criteria
- heatmap.py module with compute_heatmap(), HeatmapReport, format_heatmap()
- heatmap command with --report, --buckets, --json flags
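For reference, the flag surface listed above could be wired roughly as follows. This is a sketch using `argparse` from the standard library; the project's real CLI framework is not stated here, and `build_parser` and the `json_path` destination name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wiring; only the command and flag names come from this PR.
    parser = argparse.ArgumentParser(prog="xpyd-acc")
    subcommands = parser.add_subparsers(dest="command", required=True)

    heatmap = subcommands.add_parser(
        "heatmap", help="divergence frequency by token position"
    )
    heatmap.add_argument("--report", required=True, metavar="PATH",
                         help="batch comparison report to analyze")
    heatmap.add_argument("--buckets", type=int, default=10, metavar="INT",
                         help="number of token-position buckets (default 10)")
    heatmap.add_argument("--json", dest="json_path", default=None,
                         metavar="PATH", help="optional JSON export path")
    return parser
```

With this wiring, `--buckets` falls back to 10 and `--json` stays unset unless given, matching the defaults described in the Solution section.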