Add mypy type checking for KFP pipelines #114
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| name: Type Check (mypy) | ||
|
|
||
| on: | ||
| workflow_dispatch: | ||
| pull_request: | ||
| types: [opened, synchronize, reopened] | ||
| paths: | ||
| - "kubeflow-pipelines/**/*.py" | ||
| - "pyproject.toml" | ||
| - "requirements-dev.txt" | ||
| - ".github/workflows/typecheck.yml" | ||
|
|
||
| jobs: | ||
| typecheck: | ||
| runs-on: ubuntu-latest | ||
| permissions: | ||
| contents: read | ||
|
|
||
| steps: | ||
| - uses: actions/checkout@v6 | ||
|
|
||
| - uses: actions/setup-python@v6 | ||
| with: | ||
| python-version: "3.12" | ||
|
|
||
| - name: Install type checking dependencies | ||
| run: | | ||
| pip install mypy | ||
| pip install types-requests | ||
|
Comment on lines +26 to +29
Analysis chain
Web query result: The latest stable version of mypy is 1.19.1 (released 2025-12-15). The latest stable version of types-requests is 2.32.4.20260107 (released 2026-01-07).
Script executed:
cat -n .github/workflows/typecheck.yml
Repository: opendatahub-io/data-processing
Length of output: 982
Pin GitHub Actions to full commit SHAs and pin pip dependency versions (CWE-829).
The actions at lines 20 and 22 are referenced by version tag, and the pip packages are installed without version pins.
-      - uses: actions/checkout@v6
+      - uses: actions/checkout@a1cd7436b75281505b582f28af266402b4b15d67
-      - uses: actions/setup-python@v6
+      - uses: actions/setup-python@3e1ebddf0c1ce5469f8995e0f3bd46dcd2006482
         with:
           python-version: "3.12"
       - name: Install type checking dependencies
         run: |
-          pip install mypy
-          pip install types-requests
+          pip install mypy==1.19.1 types-requests==2.32.4.20260107
|
||
|
|
||
| - name: Run mypy on kubeflow-pipelines | ||
| run: mypy kubeflow-pipelines/ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,4 +29,45 @@ lint.ignore = ["E203","E501","UP006","UP007","UP035"] | |
| [tool.pytest.ini_options] | ||
| testpaths = ["tests"] | ||
| python_files = ["test_*.py"] | ||
| addopts = ["--tb=short", "-v"] | ||
| addopts = ["--tb=short", "-v"] | ||
|
|
||
| [tool.mypy] | ||
| python_version = "3.12" | ||
| warn_return_any = true | ||
| warn_unused_configs = true | ||
| warn_unused_ignores = true | ||
| warn_redundant_casts = true | ||
| check_untyped_defs = true | ||
| disallow_untyped_defs = false | ||
| disallow_incomplete_defs = true | ||
| no_implicit_optional = true | ||
| strict_equality = true | ||
| show_error_codes = true | ||
| show_column_numbers = true | ||
|
|
||
| # Exclude generated and non-source files | ||
| exclude = [ | ||
| "venv/", | ||
| "\\.venv/", | ||
| "\\.ipynb_checkpoints/", | ||
| "local_outputs/", | ||
| ] | ||
|
|
||
| # KFP components: imports happen inside function bodies and reference | ||
| # packages only available in the container base image (docling, etc.). | ||
| [[tool.mypy.overrides]] | ||
| module = [ | ||
| "kubeflow-pipelines.common.components", | ||
| "kubeflow-pipelines.docling-standard.standard_components", | ||
| "kubeflow-pipelines.docling-vlm.vlm_components", | ||
| ] | ||
| disable_error_code = ["import-not-found"] | ||
|
Comment on lines +58 to +64
Analysis chain
Script executed:
#!/bin/bash
# Check actual directory name and structure
fd -t d -d 1 'kubeflow' .
# Check if there's an __init__.py making it a package
fd '__init__.py' kubeflow-pipelines/ 2>/dev/null || fd '__init__.py' kubeflow_pipelines/ 2>/dev/null
# List Python files to understand actual module structure
tree -L 3 -P '*.py' kubeflow-pipelines/ 2>/dev/null || tree -L 3 -P '*.py' kubeflow_pipelines/ 2>/dev/null
Repository: opendatahub-io/data-processing
Length of output: 456
Script executed:
# Check the pyproject.toml for package configuration
head -50 pyproject.toml
# Search for any Python imports of these modules
rg 'from kubeflow' --type py -A 2
rg 'import kubeflow' --type py -A 2
Repository: opendatahub-io/data-processing
Length of output: 1365
Script executed:
# Get the full pyproject.toml - check file size first
wc -l pyproject.toml
# Read the entire file to see package name and full context
cat pyproject.toml
# Also search for imports in Python files
rg 'from kubeflow|import kubeflow' --type py
Repository: opendatahub-io/data-processing
Length of output: 1978
Script executed:
# Check if this is a package itself with entry points
grep -n "name\|packages" pyproject.toml
# Search all Python files to see what they actually import
rg 'from kubeflow_pipelines|import kubeflow_pipelines' --type py
# Check if any files import relative to the kubeflow-pipelines directory
find kubeflow-pipelines -name '*.py' -exec grep -l 'from \.' {} \;
# Look for how components are actually used/imported in tests
find tests -name '*.py' 2>/dev/null | head -5 && find tests -name '*.py' -exec head -20 {} \;
Repository: opendatahub-io/data-processing
Length of output: 1762
Script executed:
# Check if kubeflow-pipelines is a package being installed (look for setup.py or [project] section)
grep -A 20 '^\[project\]' pyproject.toml
# Look for actual imports from kubeflow-pipelines or kubeflow_pipelines in Python files
find . -name '*.py' -type f | xargs grep -h 'from kubeflow' 2>/dev/null | head -20
# Check what the actual module structure is - do subdirs have __init__.py?
find kubeflow-pipelines -name '__init__.py' -type f
Repository: opendatahub-io/data-processing
Length of output: 111
Script executed:
# Check which directories have __init__.py - get full list
find kubeflow-pipelines -name '__init__.py' -type f
# Check if docling-standard and docling-vlm dirs have __init__.py
ls -la kubeflow-pipelines/docling-standard/
ls -la kubeflow-pipelines/docling-vlm/
# Check what actual Python modules would be importable
find kubeflow-pipelines -type d | sort
Repository: opendatahub-io/data-processing
Length of output: 1607
Script executed:
# Check how components.py is actually used - search for imports of it
rg 'components' --type py -A 2 -B 2 | head -50
# Check if kubeflow-pipelines itself has an __init__.py at root
ls -la kubeflow-pipelines/__init__.py 2>&1
# Look for sys.path manipulation or PYTHONPATH usage
rg 'sys.path|PYTHONPATH' --type py
# Check the actual import statements in the files
head -30 kubeflow-pipelines/common/components.py
Repository: opendatahub-io/data-processing
Length of output: 4802
Script executed:
# Verify mypy behavior - search for mypy documentation or issues about hyphenated module names
web_search "mypy module override hyphenated names"
# Also check if there are any test files that show how mypy is used
find . -name '*mypy*' -o -name 'test_*.py' | xargs grep -l mypy 2>/dev/null
# Double-check: verify these directories are NOT importable as packages
python3 <<'EOF'
import sys
sys.path.insert(0, '.')
# Try to import the module as listed in mypy config
try:
    import kubeflow_pipelines.common.components
    print("✓ kubeflow_pipelines (with underscore) importable")
except ImportError as e:
    print(f"✗ kubeflow_pipelines import failed: {e}")
try:
    import kubeflow_pipelines.common
    print("✓ kubeflow_pipelines.common (with underscore) importable")
except ImportError as e:
    print(f"✗ kubeflow_pipelines.common import failed: {e}")
try:
    import common
    print("✓ common importable")
except ImportError as e:
    print(f"✗ common import failed: {e}")
EOF
Repository: opendatahub-io/data-processing
Length of output: 341
Script executed:
# Let's understand the actual mypy behavior - check if mypy processes these overrides
cd kubeflow-pipelines/docling-standard
# Try to run mypy with the config to see what happens
python3 -m mypy --config-file=../../pyproject.toml standard_components.py 2>&1 | head -30
# Also check: mypy's module matching - does it match on import paths or file paths?
# Let's see what mypy reports for these files
cd ../..
python3 -m mypy --config-file=pyproject.toml kubeflow-pipelines/docling-standard/standard_components.py 2>&1 | head -40
Repository: opendatahub-io/data-processing
Length of output: 151
Script executed:
# Let's check mypy documentation or understand the actual impact
# Search for how mypy actually processes module overrides
rg 'mypy.*override|module.*pattern' --type md
# Key insight: check if mypy would even see these files as modules with these names
# Since there's no __init__.py at kubeflow-pipelines/, let's see what names mypy would assign
# Check git to see if __init__.py was removed or never existed
git log --oneline --all -- 'kubeflow-pipelines/__init__.py' 2>/dev/null | head -10
# Check if there's a setup.py or other package config
ls -la | grep -E 'setup|MANIFEST|pyproject'
# Final verification: Does the package define entry points or install paths that would make kubeflow-pipelines importable?
grep -n 'packages\|py-modules\|entry-points' pyproject.toml
Repository: opendatahub-io/data-processing
Length of output: 212
Script executed:
# Let's verify the actual behavior: mypy might silently ignore non-matching module patterns
# Check mypy's behavior when module patterns don't match
# Search for any documentation or comments about this configuration
head -72 pyproject.toml | tail -20
# Check if there are any other clues about intended module structure
find . -name 'README*' -o -name 'DESIGN*' | xargs head -50 2>/dev/null
# One more critical check: are these module paths ever referenced in imports?
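rg '"kubeflow-pipelines\.|kubeflow_pipelines\.' --type py
Repository: opendatahub-io/data-processing
Length of output: 14818
Module override patterns with hyphens will not match.
Mypy matches overrides against dotted module names, and every component of a module name must be a valid Python identifier. `kubeflow-pipelines`, `docling-standard`, and `docling-vlm` are not valid identifiers, and mypy does not normalize hyphens to underscores, so the three patterns in this override match no module and `disable_error_code` is never applied to the component files. Use the module names that mypy actually assigns to these files instead.
|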
||
|
|
||
| # Third-party libraries without type stubs | ||
| [[tool.mypy.overrides]] | ||
| module = [ | ||
| "kfp.*", | ||
| "docling.*", | ||
| "docling_core.*", | ||
| ] | ||
| ignore_missing_imports = true | ||
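The concern raised in the review of the first `[[tool.mypy.overrides]]` block, that its hyphenated patterns cannot name real modules, can be illustrated with `str.isidentifier()`. This is a minimal sketch using a hypothetical helper `invalid_components`; it is not how mypy itself validates patterns, and it treats only a bare `*` component as a wildcard.

```python
def invalid_components(pattern: str) -> list[str]:
    """Return the dotted components of a module pattern that are not valid identifiers."""
    return [
        part
        for part in pattern.split(".")
        if part != "*" and not part.isidentifier()
    ]


# Patterns copied from the two override blocks in pyproject.toml.
patterns = [
    "kubeflow-pipelines.common.components",
    "kubeflow-pipelines.docling-standard.standard_components",
    "kubeflow-pipelines.docling-vlm.vlm_components",
    "kfp.*",
    "docling_core.*",
]

for pattern in patterns:
    bad = invalid_components(pattern)
    status = "ok" if not bad else f"invalid components: {bad}"
    print(f"{pattern}: {status}")
```

Only the three `kubeflow-pipelines.*` entries are flagged; the third-party patterns such as `kfp.*` consist of valid identifiers and are unaffected.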
Analysis chain
Web query: What are the latest commit SHAs for actions/checkout and actions/setup-python GitHub Actions?
Result:
The latest release for actions/checkout is v6.0.2 (released 2026-01-09). The commit SHA for this tag is de0fac2. The latest release for actions/setup-python is v6.2.0 (released 2026-01-22). The specific commit SHA for this tag could not be directly extracted from search results, but the repository's last push was on 2026-03-25 to main branch. Users should check the official releases page for the exact tag commit: https://github.com/actions/setup-python/releases/tag/v6.2.0
Script executed:
Repository: opendatahub-io/data-processing
Length of output: 317
Web query: site:github.com/actions/checkout releases v6.0.2 commit SHA
Result:
The commit SHA for the actions/checkout v6.0.2 release is de0fac2e4500dabe0009e67214ff5f5447ce83dd.
Web query: site:github.com/actions/setup-python releases v6.2.0 commit SHA
Result:
No definitive commit SHA found for a specific "releases v6.2.0" across searched repositories. Multiple projects have v6.2.0 releases (e.g., PowerShell v6.2.0 at https://github.com/PowerShell/PowerShell/releases/tag/v6.2.0 created 2019-03-28T18:49:59Z; Ant Design 6.2.0 at https://github.com/ant-design/ant-design/releases/tag/6.2.0 created 2026-01-13T06:08:53Z; Laravel framework v6.2.0), but none of the snippets explicitly state the full commit SHA the tag points to. Release pages typically show the tag creation timestamp and author, but the exact commit hash requires visiting the repo or using git commands like git rev-list v6.2.0. For PowerShell, the release notes list changes but no SHA. For Ant Design (recent 2026 release), changelog commit mentioned as 185398a but likely abbreviated.
Web query: site:github.com/actions/checkout "11bd71901bbe5b1630ceea73d27597364c9af683"
Result:
11bd71901bbe5b1630ceea73d27597364c9af683 is the Git commit SHA for the actions/checkout GitHub Action at version v4.2.2. It is used in GitHub Actions workflows to pin the checkout action to this specific commit for reproducibility and security, as seen in examples from reviewdog/action-yamllint and sethvargo/ratchet. The actions/checkout repository's CHANGELOG.md lists v4.2.2 under v4 releases, confirming it as a valid version with updates like leveraging environment variables in url-helper.ts.
Web query: site:github.com/actions/setup-python "a26af69be951a213d495a4c3e4e4022e16d87065"
Result:
The string "a26af69be951a213d495a4c3e4e4022e16d87065" is a Git commit SHA in the GitHub Actions repository actions/setup-python. It is used as a version ref in GitHub Actions workflows to pin the setup-python action to a specific commit, corresponding to version 5.6.0. This version installs a specific Python version, adds it to PATH, supports caching for pip/pipenv/poetry, and handles PyPy/GraalPy. It has been referenced in workflows like getsentry/sentry-kafka-management and getsentry/streams, and discussed in issue #1232 regarding rate limit handling.
Pin actions by full commit SHA, not version tags (CWE-829: Inclusion of Functionality from Untrusted Control Sphere).
Tags are mutable references. An attacker who compromises the upstream repository can move a tag to point to malicious code. Pin to the full 40-character commit SHA instead of `@v6`.
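A check for this rule could look like the sketch below. It assumes a hypothetical helper `unpinned_actions` that scans workflow text line by line with a regular expression rather than parsing YAML, so it is an approximation; the pinned example reuses the v6.0.2 SHA for actions/checkout quoted in the search result above.

```python
import re

# Hypothetical helper, not part of the PR. Matches "uses: owner/repo@ref"
# lines; local ("./path") and Docker references have no "@ref" and are ignored.
USES = re.compile(r"^\s*-?\s*uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `action@ref` entries whose ref is not a full 40-character commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if match and not FULL_SHA.match(match["ref"]):
            found.append(f"{match['action']}@{match['ref']}")
    return found


workflow = """\
steps:
  - uses: actions/checkout@v6
  - uses: actions/setup-python@v6
    with:
      python-version: "3.12"
"""
print(unpinned_actions(workflow))
# ['actions/checkout@v6', 'actions/setup-python@v6']

pinned = "  - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2\n"
print(unpinned_actions(pinned))  # []
```

Keeping the tag in a trailing comment, as in the pinned example, preserves readability while the SHA provides the immutable reference.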