29 changes: 25 additions & 4 deletions .github/scripts/validate_readmes/test-readme-check.sh
@@ -50,17 +50,38 @@ for target_dir in "${TARGET_DIRS[@]}"; do
     # Determine if it's a component or pipeline
     if [[ "$target_dir" == components/* ]]; then
         TYPE_FLAG="--component"
+        ASSET_FILE="component.py"
     elif [[ "$target_dir" == pipelines/* ]]; then
         TYPE_FLAG="--pipeline"
+        ASSET_FILE="pipeline.py"
     else
         print_error "Invalid directory: $target_dir. Must be in components/ or pipelines/"
         exit 2
     fi

-    echo "Checking $target_dir..."
-    # Run in check mode (no --fix flag). Exit code 1 means diffs detected.
-    if ! uv run python -m scripts.generate_readme $TYPE_FLAG "$target_dir"; then
-        HAS_ERRORS=1
+    # Check if this is a direct component/pipeline or a subcategory
+    if [[ -f "$target_dir/$ASSET_FILE" ]]; then
+        # Direct component/pipeline
+        echo "Checking $target_dir..."
+        if ! uv run python -m scripts.generate_readme $TYPE_FLAG "$target_dir"; then
+            HAS_ERRORS=1
+        fi
+    else
+        # This might be a subcategory - find components inside
+        found_assets=0
+        for subdir in "$target_dir"/*/; do
+            if [[ -f "$subdir$ASSET_FILE" ]]; then
+                found_assets=1
+                echo "Checking $subdir..."
+                if ! uv run python -m scripts.generate_readme $TYPE_FLAG "${subdir%/}"; then
+                    HAS_ERRORS=1
+                fi
+            fi
+        done
+        if [[ $found_assets -eq 0 ]]; then
+            print_error "'$target_dir' does not contain a $ASSET_FILE file and has no subdirectories with one"
+            exit 2

Contributor

Exit code 2 doesn't seem right here; by convention it means:

2 | Misuse of shell builtins | Incorrect usage of a shell built-in command or permission problem.

Contributor Author

Updated the exit codes in the main PR, #94.

+        fi
     fi
 done
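The discovery logic added in this hunk can be sketched in Python to make the two cases (direct asset versus subcategory) easier to reason about. This is a minimal sketch, not code from the PR: `find_asset_dirs` is a hypothetical helper name, and skipping dot-directories is an assumption made to mirror the shell's `*/` glob.

```python
from pathlib import Path


def find_asset_dirs(target_dir: Path, asset_file: str) -> list[Path]:
    """Return the directories that should be checked for a given target.

    Hypothetical helper mirroring the shell hunk: a directory that holds
    asset_file directly is a single asset; otherwise only its immediate
    subdirectories are scanned (the subcategory case).
    """
    if (target_dir / asset_file).is_file():
        return [target_dir]
    if not target_dir.is_dir():
        return []
    # Like the shell glob "$target_dir"/*/, look one level down only and
    # skip hidden directories.
    return sorted(
        sub
        for sub in target_dir.iterdir()
        if sub.is_dir()
        and not sub.name.startswith(".")
        and (sub / asset_file).is_file()
    )
```

An empty result corresponds to the branch in the script that prints an error and exits, since the target is neither an asset nor a subcategory containing one.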

32 changes: 26 additions & 6 deletions AGENTS.md
@@ -34,10 +34,10 @@ Agents typically interact with this repository in three modes. Use the mode to d
 - **Reuse-first**: search `components/<category>/` and `pipelines/<category>/` for similar functionality; prefer
   extending/composing instead of duplicating.
 - **Create scaffolding**: use the Make targets in `Makefile`:
-  - `make component CATEGORY=<cat> NAME=<name> [NO_TESTS]`
-  - `make pipeline CATEGORY=<cat> NAME=<name> [NO_TESTS]`
-  - `make tests TYPE=component|pipeline CATEGORY=<cat> NAME=<name>`
-  - `make readme TYPE=component|pipeline CATEGORY=<cat> NAME=<name>`
+  - `make component CATEGORY=<cat> NAME=<name> [SUBCATEGORY=<sub>] [NO_TESTS=true] [CREATE_SHARED=true]`
+  - `make pipeline CATEGORY=<cat> NAME=<name> [SUBCATEGORY=<sub>] [NO_TESTS=true] [CREATE_SHARED=true]`
+  - `make tests TYPE=component|pipeline CATEGORY=<cat> NAME=<name> [SUBCATEGORY=<sub>]`
+  - `make readme TYPE=component|pipeline CATEGORY=<cat> NAME=<name> [SUBCATEGORY=<sub>]`
 - **Validate like CI**: follow [`CONTRIBUTING.md` (Testing and Quality)](docs/CONTRIBUTING.md#testing-and-quality) and
   reference the workflows under `.github/workflows/` (example: [`.github/workflows/python-lint.yml`](.github/workflows/python-lint.yml)).
 - **New assets require approval**: for initial contributions (introducing a new component/pipeline to the catalog),
@@ -66,7 +66,9 @@ Good places to look:
 #### Establish the target location and naming

 - Components live under `components/<category>/<component_name>/`.
+- Components can optionally use subcategories: `components/<category>/<subcategory>/<component_name>/`.
 - Pipelines live under `pipelines/<category>/<pipeline_name>/`.
+- Pipelines can optionally use subcategories: `pipelines/<category>/<subcategory>/<pipeline_name>/`.
 - Use `snake_case` directory names (per `CONTRIBUTING.md`).

 ### Required files
@@ -95,25 +97,43 @@ Process (expected for agents):
 Use this prompt pattern:

 "Search `components/` for similar functionality and reuse if possible. If a new component is needed, create it under
-`components/<category>/<name>/` using `make component CATEGORY=<cat> NAME=<name> [NO_TESTS]`, then implement
+`components/<category>/<name>/` using `make component CATEGORY=<cat> NAME=<name> [NO_TESTS=true]`, then implement
 `component.py` following repository lint rules (including import guard). Create `metadata.yaml` that conforms to
 the metadata schema defined in [`CONTRIBUTING.md`](docs/CONTRIBUTING.md#metadatayaml-schema) (required field order, fresh `lastVerified`). Generate/validate
 `README.md` using `make readme TYPE=component CATEGORY=<cat> NAME=<name>`. Add unit tests using `.python_func()` and a
 LocalRunner test using `setup_and_teardown_subprocess_runner` (you can generate tests via
 `make tests TYPE=component CATEGORY=<cat> NAME=<name>`). Reference an existing component like
 `components/data_processing/yoda_data_processor/` for patterns."

+#### Add a component in a subcategory
+
+Use this prompt pattern when creating related components that should share ownership or utilities:
+
+"Create a component in a subcategory using `make component CATEGORY=<cat> SUBCATEGORY=<sub> NAME=<name>`. This
+automatically creates the subcategory structure with OWNERS and README.md if it doesn't exist. For shared utilities,
+add `CREATE_SHARED=true` to create a `shared/` package. Update the subcategory OWNERS and README.md with appropriate
+maintainers and documentation. Follow the same component implementation patterns as above."
+
 #### Add a new pipeline (reuse-first, compliant)

 Use this prompt pattern:

 "Search `pipelines/` for similar functionality and reuse if possible. If a new pipeline is needed, create it under
-`pipelines/<category>/<name>/` using `make pipeline CATEGORY=<cat> NAME=<name> [NO_TESTS]`, then implement
+`pipelines/<category>/<name>/` using `make pipeline CATEGORY=<cat> NAME=<name> [NO_TESTS=true]`, then implement
 `pipeline.py` following repository lint rules (including import guard). Create `metadata.yaml` that conforms to the
 metadata schema defined in [`CONTRIBUTING.md`](docs/CONTRIBUTING.md#metadatayaml-schema) (required field order, fresh
 `lastVerified`). Generate/validate `README.md` using `make readme TYPE=pipeline CATEGORY=<cat> NAME=<name>`. Add tests
 (you can generate tests via `make tests TYPE=pipeline CATEGORY=<cat> NAME=<name>`)."

+#### Add a pipeline in a subcategory
+
+Use this prompt pattern when creating related pipelines that should share ownership or utilities:
+
+"Create a pipeline in a subcategory using `make pipeline CATEGORY=<cat> SUBCATEGORY=<sub> NAME=<name>`. This
+automatically creates the subcategory structure with OWNERS and README.md if it doesn't exist. For shared utilities,
+add `CREATE_SHARED=true` to create a `shared/` package. Update the subcategory OWNERS and README.md with appropriate
+maintainers and documentation. Follow the same pipeline implementation patterns as above."
+
 #### Update an existing component safely

 "Find the existing component directory. Make the minimal change needed. Update docstrings and regenerate the README
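The location rules in the AGENTS.md changes (an optional subcategory level between category and asset name, with `snake_case` directory names) can be illustrated with a short Python sketch. The helper `asset_dir` and the `SNAKE_CASE` pattern are assumptions for illustration; the repository's own generator may validate names differently.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed definition of snake_case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def asset_dir(
    asset_type: str,
    category: str,
    name: str,
    subcategory: Optional[str] = None,
) -> PurePosixPath:
    """Build the directory for a component or pipeline (hypothetical helper)."""
    if asset_type not in ("component", "pipeline"):
        raise ValueError("asset_type must be 'component' or 'pipeline'")
    # The subcategory level is inserted only when one is given.
    parts = [category] + ([subcategory] if subcategory else []) + [name]
    for part in parts:
        if not SNAKE_CASE.fullmatch(part):
            raise ValueError(f"'{part}' is not snake_case")
    return PurePosixPath(f"{asset_type}s", *parts)
```

Called with `subcategory="sklearn_models"`, this yields the same layout the PR adds under `components/training/`.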
75 changes: 53 additions & 22 deletions Makefile
@@ -4,7 +4,7 @@ RUFF ?= $(UVRUN) ruff
 YAMLLINT ?= $(UVRUN) yamllint
 PYTEST ?= $(UVRUN) pytest

-.PHONY: format fix lint lint-format lint-python lint-markdown lint-yaml lint-imports test test-coverage component pipeline tests readme
+.PHONY: format fix lint lint-format lint-python lint-markdown lint-yaml lint-imports test test-coverage component pipeline tests readme sync-packages

 format:
 	$(RUFF) format components pipelines scripts
@@ -38,37 +38,68 @@ test-coverage:
 	cd .github/scripts && $(PYTEST) */tests/ --cov=. --cov-report=term-missing -v $(ARGS)

 component:
-	@if [ -z "$(CATEGORY)" ]; then echo "Error: CATEGORY is required. Usage: make component CATEGORY=data_processing NAME=my_component [NO_TESTS]"; exit 1; fi
-	@if [ -z "$(NAME)" ]; then echo "Error: NAME is required. Usage: make component CATEGORY=data_processing NAME=my_component [NO_TESTS]"; exit 1; fi
-	@if [ -n "$(NO_TESTS)" ]; then \
-		$(UVRUN) scripts/generate_skeleton/generate_skeleton.py --type=component --category=$(CATEGORY) --name=$(NAME) --no-tests; \
+	@if [ -z "$(CATEGORY)" ]; then echo "Error: CATEGORY is required. Usage: make component CATEGORY=data_processing NAME=my_component [SUBCATEGORY=x] [NO_TESTS=true] [CREATE_SHARED=true]"; exit 1; fi
+	@if [ -z "$(NAME)" ]; then echo "Error: NAME is required. Usage: make component CATEGORY=data_processing NAME=my_component [SUBCATEGORY=x] [NO_TESTS=true] [CREATE_SHARED=true]"; exit 1; fi
+	@SUBCATEGORY_ARG=""; \
+	if [ -n "$(SUBCATEGORY)" ]; then SUBCATEGORY_ARG="--subcategory=$(SUBCATEGORY)"; fi; \
+	NO_TESTS_ARG=""; \
+	if [ "$(NO_TESTS)" = "true" ]; then NO_TESTS_ARG="--no-tests"; fi; \
+	CREATE_SHARED_ARG=""; \
+	if [ "$(CREATE_SHARED)" = "true" ]; then CREATE_SHARED_ARG="--create-shared"; fi; \
+	$(UVRUN) scripts/generate_skeleton/generate_skeleton.py --type=component --category=$(CATEGORY) --name=$(NAME) $$SUBCATEGORY_ARG $$NO_TESTS_ARG $$CREATE_SHARED_ARG; \
+	echo ""; \
+	echo "Generating READMEs..."; \
+	if [ -n "$(SUBCATEGORY)" ]; then \
+		$(UVRUN) -m scripts.generate_readme --component components/$(CATEGORY)/$(SUBCATEGORY)/$(NAME) --fix; \
 	else \
-		$(UVRUN) scripts/generate_skeleton/generate_skeleton.py --type=component --category=$(CATEGORY) --name=$(NAME); \
+		$(UVRUN) -m scripts.generate_readme --component components/$(CATEGORY)/$(NAME) --fix; \
 	fi
+	@$(MAKE) --no-print-directory sync-packages

 pipeline:
-	@if [ -z "$(CATEGORY)" ]; then echo "Error: CATEGORY is required. Usage: make pipeline CATEGORY=training NAME=my_pipeline [NO_TESTS]"; exit 1; fi
-	@if [ -z "$(NAME)" ]; then echo "Error: NAME is required. Usage: make pipeline CATEGORY=training NAME=my_pipeline [NO_TESTS]"; exit 1; fi
-	@if [ -n "$(NO_TESTS)" ]; then \
-		$(UVRUN) scripts/generate_skeleton/generate_skeleton.py --type=pipeline --category=$(CATEGORY) --name=$(NAME) --no-tests; \
+	@if [ -z "$(CATEGORY)" ]; then echo "Error: CATEGORY is required. Usage: make pipeline CATEGORY=training NAME=my_pipeline [SUBCATEGORY=x] [NO_TESTS=true] [CREATE_SHARED=true]"; exit 1; fi
+	@if [ -z "$(NAME)" ]; then echo "Error: NAME is required. Usage: make pipeline CATEGORY=training NAME=my_pipeline [SUBCATEGORY=x] [NO_TESTS=true] [CREATE_SHARED=true]"; exit 1; fi
+	@SUBCATEGORY_ARG=""; \
+	if [ -n "$(SUBCATEGORY)" ]; then SUBCATEGORY_ARG="--subcategory=$(SUBCATEGORY)"; fi; \
+	NO_TESTS_ARG=""; \
+	if [ "$(NO_TESTS)" = "true" ]; then NO_TESTS_ARG="--no-tests"; fi; \
+	CREATE_SHARED_ARG=""; \
+	if [ "$(CREATE_SHARED)" = "true" ]; then CREATE_SHARED_ARG="--create-shared"; fi; \
+	$(UVRUN) scripts/generate_skeleton/generate_skeleton.py --type=pipeline --category=$(CATEGORY) --name=$(NAME) $$SUBCATEGORY_ARG $$NO_TESTS_ARG $$CREATE_SHARED_ARG; \
+	echo ""; \
+	echo "Generating READMEs..."; \
+	if [ -n "$(SUBCATEGORY)" ]; then \
+		$(UVRUN) -m scripts.generate_readme --pipeline pipelines/$(CATEGORY)/$(SUBCATEGORY)/$(NAME) --fix; \
 	else \
-		$(UVRUN) scripts/generate_skeleton/generate_skeleton.py --type=pipeline --category=$(CATEGORY) --name=$(NAME); \
+		$(UVRUN) -m scripts.generate_readme --pipeline pipelines/$(CATEGORY)/$(NAME) --fix; \
 	fi
+	@$(MAKE) --no-print-directory sync-packages

 tests:
-	@if [ -z "$(TYPE)" ]; then echo "Error: TYPE is required. Usage: make tests TYPE=component|pipeline CATEGORY=data_processing NAME=my_component"; exit 1; fi
-	@if [ -z "$(CATEGORY)" ]; then echo "Error: CATEGORY is required. Usage: make tests TYPE=component|pipeline CATEGORY=data_processing NAME=my_component"; exit 1; fi
-	@if [ -z "$(NAME)" ]; then echo "Error: NAME is required. Usage: make tests TYPE=component|pipeline CATEGORY=data_processing NAME=my_component"; exit 1; fi
-	$(UVRUN) scripts/generate_skeleton/generate_skeleton.py --type=$(TYPE) --category=$(CATEGORY) --name=$(NAME) --tests-only
+	@if [ -z "$(TYPE)" ]; then echo "Error: TYPE is required. Usage: make tests TYPE=component|pipeline CATEGORY=data_processing NAME=my_component [SUBCATEGORY=x]"; exit 1; fi
+	@if [ -z "$(CATEGORY)" ]; then echo "Error: CATEGORY is required. Usage: make tests TYPE=component|pipeline CATEGORY=data_processing NAME=my_component [SUBCATEGORY=x]"; exit 1; fi
+	@if [ -z "$(NAME)" ]; then echo "Error: NAME is required. Usage: make tests TYPE=component|pipeline CATEGORY=data_processing NAME=my_component [SUBCATEGORY=x]"; exit 1; fi
+	@if [ "$(TYPE)" = "component" ] || [ "$(TYPE)" = "pipeline" ]; then \
+		SUBCATEGORY_ARG=""; \
+		if [ -n "$(SUBCATEGORY)" ]; then SUBCATEGORY_ARG="--subcategory=$(SUBCATEGORY)"; fi; \
+		$(UVRUN) scripts/generate_skeleton/generate_skeleton.py --type=$(TYPE) --category=$(CATEGORY) --name=$(NAME) $$SUBCATEGORY_ARG --tests-only; \
+	else \
+		echo "Error: TYPE must be either 'component' or 'pipeline'"; exit 1; \
+	fi

 readme:
-	@if [ -z "$(TYPE)" ]; then echo "Error: TYPE is required. Usage: make readme TYPE=component|pipeline CATEGORY=data_processing NAME=my_component"; exit 1; fi
-	@if [ -z "$(CATEGORY)" ]; then echo "Error: CATEGORY is required. Usage: make readme TYPE=component|pipeline CATEGORY=data_processing NAME=my_component"; exit 1; fi
-	@if [ -z "$(NAME)" ]; then echo "Error: NAME is required. Usage: make readme TYPE=component|pipeline CATEGORY=data_processing NAME=my_component"; exit 1; fi
-	@if [ "$(TYPE)" = "component" ]; then \
-		$(UVRUN) -m scripts.generate_readme --component $(TYPE)s/$(CATEGORY)/$(NAME) --fix; \
-	elif [ "$(TYPE)" = "pipeline" ]; then \
-		$(UVRUN) -m scripts.generate_readme --pipeline $(TYPE)s/$(CATEGORY)/$(NAME) --fix; \
+	@if [ -z "$(TYPE)" ]; then echo "Error: TYPE is required. Usage: make readme TYPE=component|pipeline CATEGORY=data_processing NAME=my_component [SUBCATEGORY=x]"; exit 1; fi
+	@if [ -z "$(CATEGORY)" ]; then echo "Error: CATEGORY is required. Usage: make readme TYPE=component|pipeline CATEGORY=data_processing NAME=my_component [SUBCATEGORY=x]"; exit 1; fi
+	@if [ -z "$(NAME)" ]; then echo "Error: NAME is required. Usage: make readme TYPE=component|pipeline CATEGORY=data_processing NAME=my_component [SUBCATEGORY=x]"; exit 1; fi
+	@if [ "$(TYPE)" = "component" ] || [ "$(TYPE)" = "pipeline" ]; then \
+		if [ -n "$(SUBCATEGORY)" ]; then \
+			$(UVRUN) -m scripts.generate_readme --$(TYPE) $(TYPE)s/$(CATEGORY)/$(SUBCATEGORY)/$(NAME) --fix; \
+		else \
+			$(UVRUN) -m scripts.generate_readme --$(TYPE) $(TYPE)s/$(CATEGORY)/$(NAME) --fix; \
+		fi; \
 	else \
 		echo "Error: TYPE must be either 'component' or 'pipeline'"; exit 1; \
 	fi
+
+sync-packages:
+	@$(UVRUN) scripts/sync_packages.py
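One behavioral change in the `component` and `pipeline` targets is easy to miss: `NO_TESTS` used to take effect whenever it was non-empty (`[ -n "$(NO_TESTS)" ]`), whereas it now has to equal the literal string `true`. The Python sketch below restates the new variable-to-flag mapping; `skeleton_args` is a hypothetical name used only for illustration, while the flag names are taken from the recipes above.

```python
def skeleton_args(
    asset_type: str,
    category: str,
    name: str,
    subcategory: str = "",
    no_tests: str = "",
    create_shared: str = "",
) -> list[str]:
    """Mirror how the Make recipes assemble generate_skeleton.py arguments."""
    args = [f"--type={asset_type}", f"--category={category}", f"--name={name}"]
    if subcategory:  # [ -n "$(SUBCATEGORY)" ]
        args.append(f"--subcategory={subcategory}")
    if no_tests == "true":  # exact string match, no longer "any non-empty value"
        args.append("--no-tests")
    if create_shared == "true":  # exact string match
        args.append("--create-shared")
    return args
```

Under this mapping, an invocation such as `make component ... NO_TESTS=1` would still generate tests, because `1` is not the string `true`.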
7 changes: 7 additions & 0 deletions components/training/README.md
@@ -0,0 +1,7 @@
# Training Components

This directory contains components in the **Training** category:

## Subcategories

- [Sklearn Models](./sklearn_models/README.md)
7 changes: 7 additions & 0 deletions components/training/sklearn_models/OWNERS
@@ -0,0 +1,7 @@
approvers:
# TODO: Add your GitHub username here (must be a Kubeflow community member)
# - your-github-username
reviewers:
# TODO: Add reviewers' GitHub usernames here
# - reviewer1
# - reviewer2
5 changes: 5 additions & 0 deletions components/training/sklearn_models/README.md
@@ -0,0 +1,5 @@
# Sklearn Models

This subcategory contains components in the **Sklearn Models** group:

- [Logistic Regression](./logistic_regression/README.md): Logistic Regression component.
1 change: 1 addition & 0 deletions components/training/sklearn_models/__init__.py
@@ -0,0 +1 @@
"""Assets in the sklearn_models subcategory."""
7 changes: 7 additions & 0 deletions components/training/sklearn_models/logistic_regression/OWNERS
@@ -0,0 +1,7 @@
approvers:
# TODO: Add your GitHub username here (must be a Kubeflow community member)
# - your-github-username
reviewers:
# TODO: Add reviewers' GitHub usernames here
# - reviewer1
# - reviewer2
39 changes: 39 additions & 0 deletions components/training/sklearn_models/logistic_regression/README.md
@@ -0,0 +1,39 @@
# Logistic Regression ✨

> ⚠️ **Stability: alpha** — This asset is not yet stable and may change.

## Overview 🧾

Logistic Regression component.

TODO: Add a detailed description of what this component does.

Args: input_param: Description of the component parameter. # Add descriptions for other parameters

Returns: Description of what the component returns.

## Inputs 📥

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input_param` | `str` | `None` | |

## Outputs 📤

| Name | Type | Description |
|------|------|-------------|
| Output | `str` | |

## Metadata 🗂️

- **Name**: logistic_regression
- **Stability**: alpha
- **Dependencies**:
  - Kubeflow:
    - Name: Pipelines, Version: >=2.15.2
- **Tags**:
  - training
- **Last Verified**: 2026-02-11 20:18:36+00:00
- **Owners**:
  - Approvers: None
  - Reviewers: None
@@ -0,0 +1,3 @@
from .component import logistic_regression

__all__ = ["logistic_regression"]
@@ -0,0 +1,34 @@
from kfp import dsl


@dsl.component(
    base_image="python:3.11",
    # packages_to_install=["numpy", "pandas"], # Add your dependencies here
)
def logistic_regression(
    # Add your component parameters here
    input_param: str,
    # Add your output artifacts here
    # output_artifact: dsl.Output[dsl.Artifact]
) -> str: # Specify your return type
    """Logistic Regression component.

    TODO: Add a detailed description of what this component does.

    Args:
        input_param: Description of the component parameter.
        # Add descriptions for other parameters

    Returns:
        Description of what the component returns.
    """
    # TODO: Implement your component logic here


if __name__ == "__main__":
    from kfp.compiler import Compiler

    Compiler().compile(
        logistic_regression,
        package_path=__file__.replace(".py", "_component.yaml"),
    )
@@ -0,0 +1,17 @@
---
name: logistic_regression
stability: alpha # New component without proven track record
dependencies:
  kubeflow:
    - name: Pipelines
      version: '>=2.15.2'
  # external_services: # Add if component uses external services

Check warning on line 8 in components/training/sklearn_models/logistic_regression/metadata.yaml (GitHub Actions / yaml-lint): 8:3 [comments-indentation] comment not indented like content

  # - name: Example Service
  # version: ">=1.0.0"
tags:
  - training
  # Add more relevant tags here
lastVerified: 2026-02-11T20:18:36Z
# links: # Add relevant links
# documentation: https://your-docs-url.com
# issue_tracker: https://github.com/kubeflow/pipelines-components/issues
@@ -0,0 +1 @@
# Test package for component tests
@@ -0,0 +1,17 @@
"""Local runner tests for the logistic_regression component."""

from ..component import logistic_regression


class TestLogisticRegressionLocalRunner:
    """Test component with LocalRunner (subprocess execution)."""

    def test_local_execution(self, setup_and_teardown_subprocess_runner): # noqa: F811
        """Test component execution with LocalRunner."""
        # TODO: Implement local runner tests for your component

        # Example test structure:
        result = logistic_regression(input_param="test_value")

        # Add assertions about expected outputs if needed
        assert result is not None
@@ -0,0 +1,27 @@
"""Tests for the logistic_regression component."""

from ..component import logistic_regression


class TestLogisticRegressionUnitTests:
    """Unit tests for component logic."""

    def test_component_function_exists(self):
        """Test that the component function is properly imported."""
        assert callable(logistic_regression)
        assert hasattr(logistic_regression, "python_func")

    def test_component_with_default_parameters(self):
        """Test component with valid input parameters."""
        # TODO: Implement unit tests for your component

        # Example test structure:
        result = logistic_regression.python_func(input_param="test_value")
        assert isinstance(result, str)
        assert "test_value" in result

    # TODO: Add more comprehensive unit tests
    # @mock.patch("external_library.some_function")
    # def test_component_with_mocked_dependencies(self, mock_function):
    #     """Test component behavior with mocked external calls."""
    #     pass
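As scaffolded, the component body contains only a docstring and a TODO comment, so `python_func` would return `None` and the two assertions in `test_component_with_default_parameters` would not hold until real logic is added. The sketch below is a hypothetical stand-in written as a plain function (no `kfp` import, so it runs on its own) showing the smallest body that would satisfy those assertions; it is not the PR's implementation and does not train anything.

```python
def logistic_regression_stub(input_param: str) -> str:
    """Hypothetical placeholder body: echo the input back in a string."""
    return f"logistic_regression received: {input_param}"


# Same checks the scaffolded unit test performs on python_func.
result = logistic_regression_stub(input_param="test_value")
assert isinstance(result, str)
assert "test_value" in result
```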
1 change: 1 addition & 0 deletions components/training/sklearn_models/shared/__init__.py
@@ -0,0 +1 @@
"""Shared utilities for the sklearn_models subcategory."""
@@ -0,0 +1,4 @@
"""Shared utility functions for the sklearn_models subcategory."""


# TODO: Add shared utility functions, classes, or constants here.