Add stubtest to CI for type annotation validation#11124
Add stubtest to CI for type annotation validation#11124dcherian merged 6 commits intopydata:mainfrom
Conversation
08ca82f to
11bab2c
Compare
|
Hi @kkollsga , I'm curious about this issue so am happy to dig into this PR a bit - but before I do that I would like to know to what extent AI has been used on this PR. It seems like a heavy PR and from a glance (on mobile) I dont see the connections to the original work done in Scipy. An AI disclosure statement would be really helpful for me to know how its been used 😊 (Note I am not a maintainer, and Xarray - AFAICT - doesn't have LLM disclosure as a contributing policy. This is a request from myself) |
|
This PR was developed with assistance from Claude Code. AI usage is disclosed via the The original issue references stubtest tooling from NumPy's fellowship work, with a cross-reference to #10273 (scipy-stubs compliance). The allowlist was built specifically for xarray's patterns by iteratively running stubtest on core modules. This PR also cleans up some obsolete type ignores that scipy-stubs made redundant, partially addressing #10273. The categorized allowlist (TYPE_CHECKING imports, dynamic methods, protocol differences) is designed to be maintainable as xarray evolves. |
51e903b to
c36e571
Compare
|
Earlier iterations ran into issues with pixi lock validation when modifying The fix was to keep this PR as pure infrastructure with no source changes, and rebuild the allowlist from scratch using efficient grouped regex patterns. This brought it down to 170 lines with 0 stubtest errors and 0 unused patterns. Future work: Expand coverage to additional modules (backends, computation, indexes), then transition from |
70abc7a to
04105e0
Compare
Add stubtest to CI for validating type stubs against runtime behavior. This helps catch type annotation regressions early. - Add stubtest job to ci-additional.yaml (non-blocking with continue-on-error) - Create allowlist for known acceptable differences (TYPE_CHECKING imports, etc.) - Tests core modules: dataarray, dataset, variable Refs pydata#11086 Co-Authored-By: Claude <noreply@anthropic.com>
04105e0 to
293b7e2
Compare
There was a problem hiding this comment.
I see a lot of "weird" false positives in the allowlist.
Maybe that is a limitation of not using actual stub files but in-lining the typing info (stubtest was designed to check stub files vs implementation files).
But I still think this is a nice addition that we can improve upon. Thanks!
|
|
||
| - name: Install type stubs | ||
| run: | | ||
| pixi run -e ${{env.PIXI_ENV}} -- python -m mypy --install-types --non-interactive xarray/ || true |
There was a problem hiding this comment.
That feels a bit wacky. I always wanted to add the stubs dependencies to the environments directly for better transparency.
But since the mypy job does this as well, I don't see a better way as well.
Let's keep this for the future.
| - name: Run stubtest (core modules) | ||
| run: | | ||
| pixi run -e ${{env.PIXI_ENV}} -- python -m mypy.stubtest \ | ||
| xarray.core.dataarray \ |
There was a problem hiding this comment.
Any reason why not run against full xarray?
Does it produce too many errors?
There was a problem hiding this comment.
The earlier version ran against more modules, but some of the source code fixes needed to pass stubtest caused mypy to fail — the two tools have opposing perspectives on the same annotations. Rather than debugging those conflicts across the whole codebase in one PR, I scoped it down to the three core modules as pure infrastructure (CI job + allowlist, no source changes) to avoid that tension entirely. Running against full xarray produces ~6,600 errors, which would need a much larger allowlist. Happy to expand coverage incrementally in follow-ups.
| @@ -0,0 +1,170 @@ | |||
| # Stubtest Allowlist for xarray | |||
There was a problem hiding this comment.
I see a lot of stuff here that is typing only. I am actually not sure what the best way forward is here.
There was a problem hiding this comment.
Thanks for the review! You're exactly right — stubtest was designed for separate .pyi stub files where type-check-only symbols simply don't appear in the stubs. With inline annotations, every TYPE_CHECKING import gets flagged because it exists in the source but not at runtime. That accounts for ~60% of the allowlist.
The earlier version of this PR tried to address some of these by modifying source code (removing obsolete type: ignore comments, adjusting signatures), but some of those fixes caused mypy to fail — the two tools pull annotations in opposite directions. stubtest wants them to match runtime, mypy wants internal consistency, and in xarray's architecture with heavy TYPE_CHECKING blocks those goals conflict. I scoped it down to pure infrastructure (zero source changes) so the tools can't interfere with each other, which is why the allowlist is larger than ideal.
That said, each allowlist entry documents an intentional decision, and the categorized regex patterns should make it straightforward to chip away at entries in follow-up PRs as the typing infrastructure evolves.
|
Now that I think about it: does it even make sense to use stubtest if we only use inline type hints? Isn't mypy covering these things already (I have to admit that I am not familiar with stubtest)? |
|
Good question — mypy and stubtest check fundamentally different properties, even with inline annotations. mypy checks internal consistency: "are the annotations self-consistent?" It never imports the module. A concrete xarray example: Same applies to parameter renames, positional-only/keyword-only changes, final/abstractmethod drift, and methods that get removed but whose annotations stick around. These are all things mypy is structurally blind to. Once this lands, it's possible to incrementally expand module coverage, start resolving straightforward allowlist entries, and use the baseline to catch regressions as the codebase evolves. Plenty of room to iterate from here. |
|
Thanks for the review @headtr1ck ! and thanks to @kkollsga for the contribuition |
|
I'm not (yet) familiar with pixi, not sure why the tests fail on setup. |
|
The stubtest job was failing because the cache setup step was out of date. When I merged the latest main into this branch, the other CI jobs had been refactored to use a new caching approach (PR #11096), but the new stubtest job wasn't updated since there was no merge conflict. I'll fix and push the update. |
Co-Authored-By: Claude <noreply@anthropic.com>
Head branch was pushed to by a user without write access
whats-new.rstapi.rst(N/A - no new public API)Summary
This PR integrates mypy's stubtest tool into xarray's CI pipeline to validate that type annotations match runtime behavior.
What's included
CI Integration:
stubtestjob in.github/workflows/ci-additional.yamlcontinue-on-error: true(Phase 1: informational)dataarray,dataset,variableInfrastructure:
_stubtest/allowlist.txt- Comprehensive allowlist for TYPE_CHECKING imports and intentional stub/runtime differences_stubtest/run_stubtest.py- Runner script with reportingci_tests/- Pytest-based tests for stubtest compliance and type regressionsCode cleanup:
# type: ignorecomments revealed byscipy-stubsandpandas-stubsPhased rollout
The stubtest job is currently non-blocking (
continue-on-error: true). Once maintainers are confident it's stable, it can be made required by changing tocontinue-on-error: false.Verification
Stubtest errors reduced from 1,084 → 0 with the allowlist.