ci(regressions): restore column-aware SARIF comparison after byte→UTF-16 transition #915
Merged
robertohuertasm-datadog merged 1 commit on May 21, 2026
Base automatically changed from `rob/fix/tree-sitter-byte-column-bug-IDE-6037` to `main`, May 20, 2026 17:24
…-16 transition
- Reverted the temporary `startColumn` / `endColumn` strip introduced in #914 (commit 329df0c), which was added to unblock CI while the kernel was migrated from 1-based UTF-8 byte columns to 1-based UTF-16 code-unit columns.
- Restored the comparison key to the full `physicalLocation` (including `startColumn` and `endColumn`) so the regression workflow once again detects column-level drift on top of file / rule / line changes.
- Restored the summary table location string to the original `startLine:startColumn-endLine:endColumn` format to keep the GitHub Actions summary informative.
- Safe to land once #914 is on `main`: both pre- and post-runs produce UTF-16 columns, so column-aware diffing is meaningful again and no false "removed + added" pairs are expected.

IDE-6037
jasonforal approved these changes on May 20, 2026

Context
PR #914 changed the static-analysis kernel from emitting 1-based UTF-8 byte columns to 1-based UTF-16 code-unit columns (matching LSP, VS Code, and the SARIF v2.1 default encoding).

While that PR was open, the regression workflow (`.github/scripts/check-regressions.js`) compared two SARIF runs as JSON strings:
- `main` → byte columns
- the #914 branch → UTF-16 code-unit columns

For every violation preceded on its line by a non-ASCII character, the column numbers drift by N, where N is the difference between the UTF-8 byte length and the UTF-16 code-unit length of the text before the violation position (one per 2-byte character, two per 3- or 4-byte character). The string-equality comparison saw those as "violation X removed" + "violation Y added" pairs, even though the rule, the file, and the line were identical.

Example from `numpy/_core/tests/test_strings.py`, line 1284: the two runs reported different `startColumn` and `endColumn` values → same violation, two different rows in the diff.
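
The drift described above can be reproduced with a minimal sketch. The helper names (`utf8ByteColumn`, `utf16Column`, `columnDrift`) and the sample line are hypothetical and are not taken from the kernel or from `check-regressions.js`; the sketch only assumes that `index` is a UTF-16 offset into the line, which is how JavaScript strings are indexed.

```javascript
// Hypothetical helpers for illustration only; not the kernel's implementation.

// 1-based column of the character at `index`, counted in UTF-8 bytes.
// `index` is a UTF-16 code-unit offset into `line`.
function utf8ByteColumn(line, index) {
  const prefix = line.slice(0, index);
  return new TextEncoder().encode(prefix).length + 1;
}

// 1-based column of the same character, counted in UTF-16 code units.
function utf16Column(line, index) {
  return index + 1;
}

// How far apart the two conventions are at a given position.
function columnDrift(line, index) {
  return utf8ByteColumn(line, index) - utf16Column(line, index);
}

// Made-up source line with two 2-byte characters (\u00e9, \u00f6) before `x`.
const line = 's = "h\u00e9llo w\u00f6rld"; x = 1';
const index = line.indexOf("x");

console.log(utf16Column(line, index));    // 20
console.log(utf8ByteColumn(line, index)); // 22
console.log(columnDrift(line, index));    // 2

// Drift per preceding character depends on its UTF-8 width:
console.log(columnDrift("\u00e9 x", 2));    // 1 (2-byte character)
console.log(columnDrift("\u65e5 x", 2));    // 2 (3-byte character)
console.log(columnDrift("\u{1F389} x", 3)); // 2 (4 bytes vs. a surrogate pair)
```

On an all-ASCII line both functions agree, which is why only lines with non-ASCII text before the violation produced the false pairs.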

To unblock #914's CI, commit `329df0cc` temporarily dropped `startColumn` / `endColumn` from both the comparison key and the summary location string, comparing by `(file, ruleId, message, startLine, endLine)` instead. That commit was explicitly marked as a one-shot loosening and called out a stacked follow-up PR to restore the original behaviour. This is that follow-up.

What this PR does
- Restores the comparison key to the full `physicalLocation` (including `startColumn` and `endColumn`).
- Restores the summary location string to the `startLine:startColumn-endLine:endColumn` format.
- Reverts `329df0cc`. No other files touched.

Why this is safe to land
Once #914 is merged into `main`, both sides of the regression check run with UTF-16 columns:
- `main` (post-#914, "fix(kernel): emit 1-based UTF-16 code-unit columns from tree-sitter nodes") → UTF-16 columns
- this PR's branch → UTF-16 columns

…so column-aware diffing becomes meaningful again and the false "removed + added" pairs disappear.

Verified locally on `_core/tests/test_strings.py`: `main`-vs-`main` baselines

Merge order
This PR depends on #914 landing first. It is opened against the `rob/fix/tree-sitter-byte-column-bug-IDE-6037` branch as a stacked PR; GitHub will automatically retarget it to `main` once #914 is merged.

What we get back
Column-level regression detection, on top of the file / rule / line / message protection we kept throughout. The workflow can again catch column-shift bugs introduced by future tree-sitter or grammar updates, off-by-one regressions in rule code, etc.
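
To illustrate the difference between the two comparison keys, here is a minimal sketch. It is not the code in `check-regressions.js`: the function names (`columnAwareKey`, `lineOnlyKey`, `formatLocation`, `diffResults`), the sample file, rule, and column values are all hypothetical, and the sketch builds the key from explicit fields rather than serialising the whole `physicalLocation` object. It only assumes the standard SARIF result shape (`ruleId`, `message.text`, `locations[0].physicalLocation`).

```javascript
// Hypothetical sketch; the real workflow script may be structured differently.

// Pull the compared fields out of a SARIF result object.
function fieldsOf(result) {
  const { artifactLocation, region } = result.locations[0].physicalLocation;
  return {
    file: artifactLocation.uri,
    ruleId: result.ruleId,
    message: result.message.text,
    region,
  };
}

// Column-aware key: file, rule, message, and the full region.
function columnAwareKey(result) {
  const { file, ruleId, message, region: r } = fieldsOf(result);
  return JSON.stringify([
    file, ruleId, message,
    r.startLine, r.startColumn, r.endLine, r.endColumn,
  ]);
}

// Loosened key: (file, ruleId, message, startLine, endLine), no columns.
function lineOnlyKey(result) {
  const { file, ruleId, message, region: r } = fieldsOf(result);
  return JSON.stringify([file, ruleId, message, r.startLine, r.endLine]);
}

// Summary location string in startLine:startColumn-endLine:endColumn form.
function formatLocation(region) {
  return `${region.startLine}:${region.startColumn}-${region.endLine}:${region.endColumn}`;
}

// Results present on one side only, according to the chosen key.
function diffResults(before, after, keyOf) {
  const beforeKeys = new Set(before.map(keyOf));
  const afterKeys = new Set(after.map(keyOf));
  return {
    removed: before.filter((r) => !afterKeys.has(keyOf(r))),
    added: after.filter((r) => !beforeKeys.has(keyOf(r))),
  };
}

// Made-up result: same file, rule, and line; only the columns vary.
function makeResult(startColumn, endColumn) {
  return {
    ruleId: "example/rule",
    message: { text: "example message" },
    locations: [{
      physicalLocation: {
        artifactLocation: { uri: "example.py" },
        region: { startLine: 7, startColumn, endLine: 7, endColumn },
      },
    }],
  };
}

const before = [makeResult(10, 15)];
const after = [makeResult(11, 16)]; // an off-by-one column shift

const strict = diffResults(before, after, columnAwareKey);
const loose = diffResults(before, after, lineOnlyKey);

console.log(strict.removed.length, strict.added.length); // 1 1
console.log(loose.removed.length, loose.added.length);   // 0 0
console.log(formatLocation(before[0].locations[0].physicalLocation.region)); // 7:10-7:15
```

With the column-aware key the shift shows up as one removed and one added result; with the loosened key it passes unnoticed, which is the gap this PR closes.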
Closes IDE-6037 follow-up.