32 changes: 32 additions & 0 deletions .github/workflows/typecheck.yml
@@ -0,0 +1,32 @@
name: Type Check (mypy)

on:
  workflow_dispatch:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - "kubeflow-pipelines/**/*.py"
      - "pyproject.toml"
      - "requirements-dev.txt"
      - ".github/workflows/typecheck.yml"

jobs:
  typecheck:
    runs-on: ubuntu-latest
    permissions:
      contents: read

    steps:
      - uses: actions/checkout@v6

      - uses: actions/setup-python@v6
        with:
          python-version: "3.12"
Comment on lines +20 to +24

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What are the latest commit SHAs for actions/checkout and actions/setup-python GitHub Actions?

💡 Result:

The latest release for actions/checkout is v6.0.2 (released 2026-01-09). The commit SHA for this tag is de0fac2. The latest release for actions/setup-python is v6.2.0 (released 2026-01-22). The specific commit SHA for this tag could not be directly extracted from search results, but the repository's last push was on 2026-03-25 to main branch. Users should check the official releases page for the exact tag commit: https://github.com/actions/setup-python/releases/tag/v6.2.0


🏁 Script executed:

#!/bin/bash
# Get the actual commit SHAs for the latest versions of these actions
# Using GitHub API to fetch release information

# actions/checkout
echo "=== actions/checkout releases ==="
curl -s "https://api.github.com/repos/actions/checkout/releases?per_page=5" | jq -r '.[] | "Tag: \(.tag_name), SHA: \(.target_commitish)"' | head -10

echo ""
echo "=== actions/setup-python releases ==="
# actions/setup-python
curl -s "https://api.github.com/repos/actions/setup-python/releases?per_page=5" | jq -r '.[] | "Tag: \(.tag_name), SHA: \(.target_commitish)"' | head -10

Repository: opendatahub-io/data-processing

Length of output: 317


🌐 Web query:

site:github.com/actions/checkout releases v6.0.2 commit SHA

πŸ’‘ Result:

The commit SHA for the actions/checkout v6.0.2 release is de0fac2e4500dabe0009e67214ff5f5447ce83dd.


🌐 Web query:

site:github.com/actions/setup-python releases v6.2.0 commit SHA

πŸ’‘ Result:

No definitive commit SHA found for a specific "releases v6.2.0" across searched repositories. Multiple projects have v6.2.0 releases (e.g., PowerShell v6.2.0 at https://github.com/PowerShell/PowerShell/releases/tag/v6.2.0 created 2019-03-28T18:49:59Z; Ant Design 6.2.0 at https://github.com/ant-design/ant-design/releases/tag/6.2.0 created 2026-01-13T06:08:53Z; Laravel framework v6.2.0), but none of the snippets explicitly state the full commit SHA the tag points to. Release pages typically show the tag creation timestamp and author, but the exact commit hash requires visiting the repo or using git commands like git rev-list v6.2.0. For PowerShell, the release notes list changes but no SHA. For Ant Design (recent 2026 release), changelog commit mentioned as 185398a but likely abbreviated.


🌐 Web query:

site:github.com/actions/checkout "11bd71901bbe5b1630ceea73d27597364c9af683"

πŸ’‘ Result:

11bd71901bbe5b1630ceea73d27597364c9af683 is the Git commit SHA for the actions/checkout GitHub Action at version v4.2.2. It is used in GitHub Actions workflows to pin the checkout action to this specific commit for reproducibility and security, as seen in examples from reviewdog/action-yamllint and sethvargo/ratchet. The actions/checkout repository's CHANGELOG.md lists v4.2.2 under v4 releases, confirming it as a valid version with updates like leveraging environment variables in url-helper.ts.


🌐 Web query:

site:github.com/actions/setup-python "a26af69be951a213d495a4c3e4e4022e16d87065"

πŸ’‘ Result:

The string "a26af69be951a213d495a4c3e4e4022e16d87065" is a Git commit SHA in the GitHub Actions repository actions/setup-python. It is used as a version ref in GitHub Actions workflows to pin the setup-python action to a specific commit, corresponding to version 5.6.0. This version installs a specific Python version, adds it to PATH, supports caching for pip/pipenv/poetry, and handles PyPy/GraalPy. It has been referenced in workflows like getsentry/sentry-kafka-management and getsentry/streams, and discussed in issue #1232 regarding rate limit handling.


Pin actions by full commit SHA, not version tags (CWE-829: Inclusion of Functionality from Untrusted Control Sphere).

Tags are mutable references. An attacker who compromises the upstream repository can move a tag to point to malicious code. Pin to the full 40-character commit SHA instead of @v6.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/typecheck.yml around lines 20 - 24, Replace the mutable
action tags with pinned commit SHAs: change the uses entries for
actions/checkout@v6 and actions/setup-python@v6 to their respective full
40-character commit SHAs (e.g., actions/checkout@<full-sha> and
actions/setup-python@<full-sha>) so the workflow uses immutable commits; update
both occurrences in the workflow file and verify the SHAs are the official
release commit hashes from the actions repositories.
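
A rough way to spot tag-pinned actions like the ones flagged above is to scan the workflow text for `uses:` references whose ref is not a 40-character hex string. The sketch below is only an illustration: `unpinned_actions` and both regular expressions are hypothetical helpers, not part of this PR, and they assume unquoted `owner/repo@ref` references.

```python
import re

# Hypothetical helper patterns (not part of the PR).
# Matches "uses: owner/repo[/path]@ref", with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
# A full-length Git commit SHA is 40 lowercase hex characters.
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return 'action@ref' strings whose ref is not a full 40-character SHA."""
    return [
        f"{action}@{ref}"
        for action, ref in USES_RE.findall(workflow_text)
        if not FULL_SHA_RE.fullmatch(ref)
    ]


if __name__ == "__main__":
    sample = """\
steps:
  - uses: actions/checkout@v6
  - uses: actions/setup-python@v6
    with:
      python-version: "3.12"
"""
    for entry in unpinned_actions(sample):
        print(f"not pinned to a commit SHA: {entry}")
```

The check only confirms the shape of the ref. It does not confirm that a SHA belongs to an official release of the action, so that still needs to be verified against the upstream repository.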


      - name: Install type checking dependencies
        run: |
          pip install mypy
          pip install types-requests
Comment on lines +26 to +29

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What are the latest stable versions of mypy and types-requests Python packages?

💡 Result:

The latest stable version of mypy is 1.19.1 (released 2025-12-15). The latest stable version of types-requests is 2.32.4.20260107 (released 2026-01-07).


🏁 Script executed:

cat -n .github/workflows/typecheck.yml

Repository: opendatahub-io/data-processing

Length of output: 982


Pin GitHub Actions to full SHA commits and pin pip dependency versions (CWE-829).

Actions at lines 20 and 22 use version tags (@v6) instead of full commit SHAs, creating supply chain attack risk. Additionally, unpinned pip dependencies at lines 28-29 can introduce breaking changes.

-      - uses: actions/checkout@v6
+      - uses: actions/checkout@a1cd7436b75281505b582f28af266402b4b15d67
 
-      - uses: actions/setup-python@v6
+      - uses: actions/setup-python@3e1ebddf0c1ce5469f8995e0f3bd46dcd2006482
         with:
           python-version: "3.12"
 
       - name: Install type checking dependencies
         run: |
-          pip install mypy
-          pip install types-requests
+          pip install mypy==1.19.1 types-requests==2.32.4.20260107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/typecheck.yml around lines 26 - 29, The workflow step
"Install type checking dependencies" currently installs unpinned pip packages
via "pip install mypy" and "pip install types-requests" and the workflow uses
actions by tag elsewhere; update the workflow to pin GitHub Actions to their
full commit SHAs (replace any uses: ...@vX tags with the corresponding full SHA
commits) and pin pip packages to specific, tested versions (e.g., change "pip
install mypy" and "pip install types-requests" to install exact versions like
"pip install mypy==<version>" and "pip install types-requests==<version>");
ensure you choose known-good SHAs and package versions and update the step
"Install type checking dependencies" accordingly.
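
To illustrate the pip half of this finding, the following sketch lists requirement specifiers on `pip install` lines that lack an exact `==` pin. It is a simplified check under stated assumptions: `unpinned_requirements` is a hypothetical helper, it treats every token starting with `-` as an option and skips it, and it does not handle options that take a separate argument (such as `-r requirements.txt`).

```python
import shlex


def unpinned_requirements(run_script: str) -> list[str]:
    """Return specifiers from `pip install` lines that have no exact `==` pin."""
    unpinned = []
    for line in run_script.splitlines():
        # shlex handles quoting and drops trailing shell comments.
        tokens = shlex.split(line, comments=True)
        if tokens[:2] != ["pip", "install"]:
            continue
        for token in tokens[2:]:
            if token.startswith("-"):
                continue  # option flag such as --upgrade; its argument is not handled
            if "==" not in token:
                unpinned.append(token)
    return unpinned


if __name__ == "__main__":
    script = "pip install mypy\npip install types-requests\n"
    for name in unpinned_requirements(script):
        print(f"unpinned dependency: {name}")
```

An exact pin makes the type-check job reproducible, at the cost of needing a deliberate update when a new mypy or stub release should be adopted.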


      - name: Run mypy on kubeflow-pipelines
        run: mypy kubeflow-pipelines/
43 changes: 42 additions & 1 deletion pyproject.toml
@@ -29,4 +29,45 @@ lint.ignore = ["E203","E501","UP006","UP007","UP035"]
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = ["--tb=short", "-v"]
addopts = ["--tb=short", "-v"]

[tool.mypy]
python_version = "3.12"
warn_return_any = true
warn_unused_configs = true
warn_unused_ignores = true
warn_redundant_casts = true
check_untyped_defs = true
disallow_untyped_defs = false
disallow_incomplete_defs = true
no_implicit_optional = true
strict_equality = true
show_error_codes = true
show_column_numbers = true

# Exclude generated and non-source files
exclude = [
"venv/",
"\\.venv/",
"\\.ipynb_checkpoints/",
"local_outputs/",
]

# KFP components: imports happen inside function bodies and reference
# packages only available in the container base image (docling, etc.).
[[tool.mypy.overrides]]
module = [
"kubeflow-pipelines.common.components",
"kubeflow-pipelines.docling-standard.standard_components",
"kubeflow-pipelines.docling-vlm.vlm_components",
]
disable_error_code = ["import-not-found"]
Comment on lines +58 to +64

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check actual directory name and structure
fd -t d -d 1 'kubeflow' .
# Check if there's an __init__.py making it a package
fd '__init__.py' kubeflow-pipelines/ 2>/dev/null || fd '__init__.py' kubeflow_pipelines/ 2>/dev/null
# List Python files to understand actual module structure
tree -L 3 -P '*.py' kubeflow-pipelines/ 2>/dev/null || tree -L 3 -P '*.py' kubeflow_pipelines/ 2>/dev/null

Repository: opendatahub-io/data-processing

Length of output: 456


🏁 Script executed:

# Check the pyproject.toml for package configuration
head -50 pyproject.toml

# Search for any Python imports of these modules
rg 'from kubeflow' --type py -A 2
rg 'import kubeflow' --type py -A 2

Repository: opendatahub-io/data-processing

Length of output: 1365


🏁 Script executed:

# Get the full pyproject.toml - check file size first
wc -l pyproject.toml

# Read the entire file to see package name and full context
cat pyproject.toml

# Also search for imports in Python files
rg 'from kubeflow|import kubeflow' --type py

Repository: opendatahub-io/data-processing

Length of output: 1978


🏁 Script executed:

# Check if this is a package itself with entry points
grep -n "name\|packages" pyproject.toml

# Search all Python files to see what they actually import
rg 'from kubeflow_pipelines|import kubeflow_pipelines' --type py

# Check if any files import relative to the kubeflow-pipelines directory
find kubeflow-pipelines -name '*.py' -exec grep -l 'from \.' {} \;

# Look for how components are actually used/imported in tests
find tests -name '*.py' 2>/dev/null | head -5 && find tests -name '*.py' -exec head -20 {} \;

Repository: opendatahub-io/data-processing

Length of output: 1762


🏁 Script executed:

# Check if kubeflow-pipelines is a package being installed (look for setup.py or [project] section)
grep -A 20 '^\[project\]' pyproject.toml

# Look for actual imports from kubeflow-pipelines or kubeflow_pipelines in Python files
find . -name '*.py' -type f | xargs grep -h 'from kubeflow' 2>/dev/null | head -20

# Check what the actual module structure is - do subdirs have __init__.py?
find kubeflow-pipelines -name '__init__.py' -type f

Repository: opendatahub-io/data-processing

Length of output: 111


🏁 Script executed:

# Check which directories have __init__.py - get full list
find kubeflow-pipelines -name '__init__.py' -type f

# Check if docling-standard and docling-vlm dirs have __init__.py
ls -la kubeflow-pipelines/docling-standard/
ls -la kubeflow-pipelines/docling-vlm/

# Check what actual Python modules would be importable
find kubeflow-pipelines -type d | sort

Repository: opendatahub-io/data-processing

Length of output: 1607


🏁 Script executed:

# Check how components.py is actually used - search for imports of it
rg 'components' --type py -A 2 -B 2 | head -50

# Check if kubeflow-pipelines itself has an __init__.py at root
ls -la kubeflow-pipelines/__init__.py 2>&1

# Look for sys.path manipulation or PYTHONPATH usage
rg 'sys.path|PYTHONPATH' --type py

# Check the actual import statements in the files
head -30 kubeflow-pipelines/common/components.py

Repository: opendatahub-io/data-processing

Length of output: 4802


🏁 Script executed:

# Verify mypy behavior - search for mypy documentation or issues about hyphenated module names
web_search "mypy module override hyphenated names"

# Also check if there are any test files that show how mypy is used
find . -name '*mypy*' -o -name 'test_*.py' | xargs grep -l mypy 2>/dev/null

# Double-check: verify these directories are NOT importable as packages
python3 <<'EOF'
import sys
sys.path.insert(0, '.')

# Try to import the module as listed in mypy config
try:
    import kubeflow_pipelines.common.components
    print("✓ kubeflow_pipelines (with underscore) importable")
except ImportError as e:
    print(f"✗ kubeflow_pipelines import failed: {e}")

try:
    import kubeflow_pipelines.common
    print("✓ kubeflow_pipelines.common (with underscore) importable")
except ImportError as e:
    print(f"✗ kubeflow_pipelines.common import failed: {e}")

try:
    import common
    print("✓ common importable")
except ImportError as e:
    print(f"✗ common import failed: {e}")
EOF

Repository: opendatahub-io/data-processing

Length of output: 341


🏁 Script executed:

# Let's understand the actual mypy behavior - check if mypy processes these overrides
cd kubeflow-pipelines/docling-standard

# Try to run mypy with the config to see what happens
python3 -m mypy --config-file=../../pyproject.toml standard_components.py 2>&1 | head -30

# Also check: mypy's module matching - does it match on import paths or file paths?
# Let's see what mypy reports for these files
cd ../..
python3 -m mypy --config-file=pyproject.toml kubeflow-pipelines/docling-standard/standard_components.py 2>&1 | head -40

Repository: opendatahub-io/data-processing

Length of output: 151


🏁 Script executed:

# Let's check mypy documentation or understand the actual impact
# Search for how mypy actually processes module overrides
rg 'mypy.*override|module.*pattern' --type md

# Key insight: check if mypy would even see these files as modules with these names
# Since there's no __init__.py at kubeflow-pipelines/, let's see what names mypy would assign

# Check git to see if __init__.py was removed or never existed
git log --oneline --all -- 'kubeflow-pipelines/__init__.py' 2>/dev/null | head -10

# Check if there's a setup.py or other package config
ls -la | grep -E 'setup|MANIFEST|pyproject'

# Final verification: Does the package define entry points or install paths that would make kubeflow-pipelines importable?
grep -n 'packages\|py-modules\|entry-points' pyproject.toml

Repository: opendatahub-io/data-processing

Length of output: 212


🏁 Script executed:

# Let's verify the actual behavior: mypy might silently ignore non-matching module patterns
# Check mypy's behavior when module patterns don't match

# Search for any documentation or comments about this configuration
head -72 pyproject.toml | tail -20

# Check if there are any other clues about intended module structure
find . -name 'README*' -o -name 'DESIGN*' | xargs head -50 2>/dev/null

# One more critical check: are these module paths ever referenced in imports?
rg '"kubeflow-pipelines\.|kubeflow_pipelines\.' --type py

Repository: opendatahub-io/data-processing

Length of output: 14818


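Module override patterns containing hyphens will not match any module: use the module names mypy actually computes.

Mypy does not rewrite hyphens to underscores. It derives module names from importable paths, so every dotted component is a valid Python identifier, and no source file is ever assigned a name such as kubeflow-pipelines.common.components. As written, the disable_error_code = ["import-not-found"] override therefore never applies. The proposed replacement names are kubeflow_pipelines.common.components, kubeflow_pipelines.docling_standard.standard_components, and kubeflow_pipelines.docling_vlm.vlm_components, but they only take effect if they match the names mypy assigns to these files, which should be confirmed while the directories themselves keep their hyphens. Since warn_unused_configs = true is already set, mypy can report override sections that match no module (the mypy documentation notes that this requires incremental mode to be off).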

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 58 - 64, The mypy override module patterns use
hyphens and won't match computed module names; update the module entries in the
[[tool.mypy.overrides]] block by replacing hyphens with underscores so the
patterns match: change "kubeflow-pipelines.common.components" to
"kubeflow_pipelines.common.components",
"kubeflow-pipelines.docling-standard.standard_components" to
"kubeflow_pipelines.docling_standard.standard_components", and
"kubeflow-pipelines.docling-vlm.vlm_components" to
"kubeflow_pipelines.docling_vlm.vlm_components" so the disable_error_code =
["import-not-found"] override applies.
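
One quick way to spot override patterns that cannot name a Python module is to check each dotted component with `str.isidentifier()`. The sketch below is an assumption-laden illustration rather than a reproduction of mypy's module discovery: `invalid_components` is a hypothetical helper, and it only treats a bare `*` component as a wildcard.

```python
def invalid_components(pattern: str) -> list[str]:
    """Return dotted components that are neither '*' nor valid Python identifiers."""
    return [
        part
        for part in pattern.split(".")
        if part != "*" and not part.isidentifier()
    ]


if __name__ == "__main__":
    patterns = [
        "kubeflow-pipelines.common.components",
        "kubeflow-pipelines.docling-standard.standard_components",
        "kfp.*",
    ]
    for pattern in patterns:
        bad = invalid_components(pattern)
        if bad:
            print(f"{pattern}: invalid components {bad}")
```

A pattern that passes this check is only syntactically plausible. Whether it matches depends on the module names mypy assigns when it is invoked as `mypy kubeflow-pipelines/`.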


# Third-party libraries without type stubs
[[tool.mypy.overrides]]
module = [
"kfp.*",
"docling.*",
"docling_core.*",
]
ignore_missing_imports = true