diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml new file mode 100644 index 0000000..c89f815 --- /dev/null +++ b/.github/workflows/docs.yml @@ -0,0 +1,41 @@ +name: Docs + +on: + push: + branches: [main] + workflow_dispatch: + +permissions: + contents: read + pages: write + id-token: write + +# Allow only one concurrent deployment. +concurrency: + group: pages + cancel-in-progress: false + +jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: astral-sh/setup-uv@v5 + - uses: actions/setup-python@v5 + with: + python-version: "3.13" + - run: uv sync --only-group docs + - run: uv run mkdocs build --strict + - uses: actions/upload-pages-artifact@v3 + with: + path: site + + deploy: + needs: build + runs-on: ubuntu-latest + environment: + name: github-pages + url: ${{ steps.deployment.outputs.page_url }} + steps: + - id: deployment + uses: actions/deploy-pages@v4 diff --git a/.gitignore b/.gitignore index 3734fa3..817966c 100644 --- a/.gitignore +++ b/.gitignore @@ -38,6 +38,9 @@ wandb/ # Brand asset build cache assets/.tools/.cache/ +# mkdocs build output +site/ + # OS .DS_Store diff --git a/CHANGELOG.md b/CHANGELOG.md index 3a33ff7..4a95c01 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,21 @@ All notable changes to `fireflyframework-datascience` are documented here. The p ## [Unreleased] +### Documentation & developer experience + +- **A tested, runnable [tutorial](docs/tutorial.md)** (`samples/tutorial.py`) — a guided end-to-end tour + (boot → load/validate → AutoML → GenAI feature engineering → agentic loop → serve) that runs offline + with no LLM key. A test guarantees it works. +- **A thorough [LLM-configuration guide](docs/llm-configuration.md)** — providers + model strings, API + keys, enabling GenAI, cost/budget gating, secure execution, and offline/test usage. 
+- **A professional [mkdocs Material docs site](https://fireflyframework.github.io/fireflyframework-datascience/)** + (`mkdocs.yml`, `docs` dependency group) — builds clean under `--strict`; deployed to GitHub Pages by a + new `Docs` workflow. All internal links fixed. +- **Better visuals** — a refined `assets/banner.svg` (eyebrow, data-constellation motif) and an expanded + generated diagram set (8 diagrams: architecture, hexagonal, automl-loop, genai-fusion, agentic-loop, + auto-configuration, security, ecosystem) under `docs/img/`. +- **Polished README** (compelling 5-line quick start, docs-site link) and a new **`CONTRIBUTING.md`**. + ### AMLB benchmark (Tier-1) - **`benchmarks/amlb_benchmark.py`** — runs the AutoML facade across real OpenML-CC18 tasks (with diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..51e00a3 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,74 @@ +# Contributing to Firefly DataScience + +Thanks for helping build the framework. This guide gets you from clone to green PR. + +## Development setup + +Requires **Python 3.13** and [`uv`](https://docs.astral.sh/uv/). On macOS, the boosting libraries need +OpenMP: + +```bash +brew install libomp # macOS only (xgboost / lightgbm) +git clone https://github.com/fireflyframework/fireflyframework-datascience +cd fireflyframework-datascience +uv sync --extra tabular --extra data --extra validation --group dev +``` + +The single framework dependency, `fireflyframework-agentic`, resolves from its public git repo. +Add extras as you work on them: `--extra dl` (PyTorch), `--extra nlp` (HuggingFace), `--extra tabfm` +(TabPFN), `--extra genai` (the agentic accelerators), or `--extra full`. 
+ +## The quality gate + +Every change must pass the same gate CI runs: + +```bash +uv run ruff check src/ tests/ # lint +uv run ruff format --check src/ tests/ # format +uv run pyright # type-check +uv run pytest # tests (integration/nightly excluded by default) +``` + +- **`-m integration`** runs network/heavy tests (OpenML, HuggingFace downloads). +- **`-m nightly`** runs long-running suites (full AMLB, GPU). + +## Conventions + +- **CalVer** `YY.MM.PATCH`; the version lives in `src/fireflyframework_datascience/_version.py`. +- **Apache-2.0 header** on every `.py`: `# Copyright 2026 Firefly Software Foundation.` +- **Hexagonal**: each module is a light `__init__.py` (ports = `Protocol`s + DTOs), heavy `adapters.py` + (concrete impls, lazy-importing optional libraries), and a light `auto_configuration.py` that + registers beans via `@bean`, gated by `@conditional_on_class` / `@conditional_on_property`. +- **Lazy imports** of optional heavy dependencies are deliberate (keeps the core importable without any + extra). The `PLC0415` rule is therefore relaxed for the DataScience subtree. + +## Adding an adapter + +1. Define (or reuse) the `Protocol` port in the module's `__init__.py`. +2. Implement the adapter in `adapters.py`, importing the heavy library *inside* the method and raising + `AdapterUnavailableError("MyAdapter", "")` when it is missing. +3. Register it in `auto_configuration.py` behind `@conditional_on_class("")`. +4. Add the entry point under `[project.entry-points."firefly_datascience.auto_configuration"]`. +5. Add the optional dependency to `[project.optional-dependencies]`. +6. Write a test (mark it `integration`/`nightly` if it needs network/GPU). + +## Docs + +Docs are an [mkdocs Material](https://squidfunk.github.io/mkdocs-material/) site under `docs/`. 
+ +```bash +uv sync --only-group docs +uv run mkdocs serve # live preview at http://127.0.0.1:8000 +uv run mkdocs build --strict # must pass (no broken links) +``` + +Diagrams are generated — edit `assets/tools/gen_diagrams.py`, run it, and commit the SVGs: + +```bash +uv run python assets/tools/gen_diagrams.py # writes docs/img/*.svg +``` + +## Commits & PRs + +Keep the gate green, write a clear commit message, open a PR against `main`, and make sure CI passes. +Thank you! 🐝 diff --git a/README.md b/README.md index ba3b3d9..751fa96 100644 --- a/README.md +++ b/README.md @@ -27,11 +27,11 @@ --- -> **Status:** active build. Delivered and green (ruff + pyright + 87 tests): **SP0** Foundation and -> Firefly DNA · **SP1** classical tabular AutoML · **SP2** GenAI feature engineering · **SP3** the -> agentic ML-engineering loop · **SP4** deep-learning / TabFM ports (verified sklearn-MLP; gated -> Torch/TabPFN) · **SP5** serving, lineage and the Lumen credit-risk sample. **SP6** (documentation -> book) is in progress. See [`docs/`](docs/index.md) for the full guide. +> **Status:** all sub-projects delivered and green (ruff · pyright · 90+ tests). Classical tabular +> AutoML · GenAI feature engineering · the agentic ML-engineering loop · deep learning (PyTorch +> Lightning) + NLP (HuggingFace) + vision · TabFM · serving · the OpenML-AMLB benchmark harness. +> **New here? Start with the [Tutorial](docs/tutorial.md)** or browse the +> **[documentation site](https://fireflyframework.github.io/fireflyframework-datascience/)**. ## What is this? @@ -53,22 +53,33 @@ swappability, and security by default. 
## Quick start ```bash -uv add fireflyframework-datascience # core -uv add 'fireflyframework-datascience[automl-stack]' # + classical AutoML + tracking +uv add 'fireflyframework-datascience[tabular]' # classical AutoML +# or: uv add 'fireflyframework-datascience[automl-stack]' # + TabPFN, MLflow, OpenML ``` +Train, rank, and evaluate models in five lines: + ```python -from fireflyframework_datascience import FireflyDataScienceApplication +from fireflyframework_datascience.automl import AutoML +from fireflyframework_datascience.datasets.adapters import SklearnDatasetLoader -app = FireflyDataScienceApplication.run() # prints banner + wiring summary -print(app.config.default_ml_framework) +train, test = SklearnDatasetLoader().load("breast_cancer").train_test_split() +result = AutoML().fit(train) # cross-validates candidates, picks the winner +print(result.leaderboard_table()) # random_forest / linear / hist_gradient_boosting … +print(result.evaluate(test)) # holdout roc_auc ≈ 0.98 ``` +Boot it as a Firefly application with `FireflyDataScienceApplication.run()` (auto-configuration + dependency injection), or use the CLI: + ```bash firefly-ds doctor # check your environment & installed adapters firefly-ds introspect # boot the app and show discovered auto-configurations ``` +Add a real LLM for GenAI feature engineering and the agentic loop — see +[Configuring the LLM](docs/llm-configuration.md). The full guided walkthrough is the +[Tutorial](docs/tutorial.md). + ## Architecture Five acyclic layers, mirroring `fireflyframework-agentic` with a **DataScience** layer inserted. Every @@ -76,7 +87,7 @@ ML/MLOps library is a swappable adapter behind a `Protocol` port, registered by auto-configuration** and resolved through a type-hint **dependency-injection container**.

- Firefly DataScience layered architecture + Firefly DataScience layered architecture

``` @@ -87,14 +98,18 @@ The GenAI ↔ classical fusion is governed: the LLM proposes code; the classical cost/benefit gate keeps only what beats the baseline.

- Governed GenAI and classical fusion + Governed GenAI and classical fusion

## Documentation +📖 **Full docs site:** + | Guide | | |---|---| +| [Tutorial](docs/tutorial.md) | the guided end-to-end walkthrough (runs offline; tested) | | [Quick Start](docs/quickstart.md) | install, boot, first AutoML run, the `firefly-ds` CLI | +| [Configuring the LLM](docs/llm-configuration.md) | providers, API keys, model selection, cost gating | | [Architecture](docs/architecture.md) | layers, hexagonal ports, auto-configuration, the DI container | | [Configuration](docs/configuration.md) | env / `.env` / YAML / profiles precedence | | [Datasets](docs/datasets.md) | the `Dataset` container and loaders | diff --git a/assets/banner.svg b/assets/banner.svg index d0b3790..5a96550 100644 --- a/assets/banner.svg +++ b/assets/banner.svg @@ -1,90 +1,83 @@ - - - + + + - - + + - - + + - - - - + + + + - + - + - - - - + + - - - - - - + + + + - - - - - - + + + + + - - + - - - - - - - + + + + + + + + + + + + + + - - - - - - - + + THE  FIREFLY  FRAMEWORK - - + + - Firefly DataScience + Firefly DataScience - + - AutoML that fuses GenAI with classical ML & Deep Learning. + AutoML that fuses GenAI with classical ML & Deep Learning. 
- Hexagonal & secure-by-default · built on Firefly Agentic & Pydantic AI · the Firefly Framework + Hexagonal · secure-by-default · governed GenAI · built on Firefly Agentic & Pydantic AI diff --git a/assets/tools/gen_diagrams.py b/assets/tools/gen_diagrams.py index 45e7f10..115e46f 100644 --- a/assets/tools/gen_diagrams.py +++ b/assets/tools/gen_diagrams.py @@ -65,12 +65,13 @@ def arrow(x1: float, y1: float, x2: float, y2: float, color: str = LINE) -> str: def _svg(w: int, h: int, body: str, aria: str) -> str: + footer = _text(w - 14, h - 12, "✦ firefly-datascience", 10.5, SUB, 600, "end") return ( f'' f'' f'' - f"{body}\n" + f"{body}{footer}\n" ) @@ -144,14 +145,104 @@ def diagram_genai_fusion() -> str: return _svg(860, 330, "".join(body), "GenAI and classical fusion") +def _feedback(x1: float, x2: float, y: float, label: str) -> str: + mid = y + 56 + return ( + f'' + f'' + + _text((x1 + x2) / 2, mid + 4, label, 11.5, ACCENT_S, 700) + ) + + +def diagram_agentic_loop() -> str: + steps = [ + ("Propose", "LLM"), + ("Code", "feature/pipeline"), + ("Execute", "sandboxed"), + ("Observe", "cross-validate"), + ("Verify", "judge ≠ ran"), + ("Select", "best verified"), + ] + body = [_text(490, 34, "Agentic ML-Engineering Loop", 18, TITLE, 800)] + x = 24 + centers: list[float] = [] + for i, (t, s) in enumerate(steps): + body.append(card(x, 92, 142, 60, t, s, accent=(i == 4))) + centers.append(x + 71) + if i < len(steps) - 1: + body.append(arrow(x + 142, 122, x + 162, 122)) + x += 162 + body.append(_feedback(centers[4], centers[0], 152, "reflect")) + return _svg(980, 230, "".join(body), "Agentic ML-engineering loop") + + +def diagram_auto_configuration() -> str: + steps = [ + ("Entry points", "the auto_configuration group"), + ("Discover", "load adapter classes"), + ("Evaluate conditions", "@conditional_on_*"), + ("Register beans", "DI container"), + ("Wiring summary", "ready"), + ] + body = [_text(430, 34, "Entry-Point Auto-Configuration", 18, TITLE, 800)] + y = 60 + for i, 
(t, s) in enumerate(steps): + body.append(card(250, y, 360, 54, t, s, accent=(i == 2))) + if i < len(steps) - 1: + body.append(arrow(430, y + 54, 430, y + 68)) + y += 68 + return _svg(860, y + 14, "".join(body), "Entry-point auto-configuration flow") + + +def diagram_security() -> str: + body = [_text(470, 34, "Secure-by-Default Execution", 18, TITLE, 800)] + body.append(card(40, 80, 230, 64, "Static analysis", "deny imports / dunder / exec", accent=True)) + body.append(arrow(270, 112, 300, 112)) + tiers = [ + ("Monty", "deny-by-default (default)"), + ("Docker / E2B", "full ML, opt-in"), + ("HITL approval", "before non-sandboxed"), + ] + x = 300 + for t, s in tiers: + body.append(card(x, 80, 200, 64, t, s)) + if x < 690: + body.append(_text(x + 207, 116, "→", 16, SUB, 700)) + x += 215 + body.append(card(300, 168, 415, 50, "Cost / benefit gate", "keep GenAI only on measured lift")) + body.append(_text(470, 250, "The LLM never gets ambient capability; every step is gated and audited.", 12.5, SUB, 600)) + return _svg(940, 280, "".join(body), "Secure-by-default execution tiers") + + +def diagram_ecosystem() -> str: + body = [_text(430, 34, "Firefly Ecosystem", 18, TITLE, 800)] + body.append(card(60, 110, 200, 70, "Firefly Agentic", "GenAI · Pydantic AI")) + body.append(card(330, 100, 220, 88, "Firefly DataScience", "AutoML · this repo", accent=True)) + body.append(card(620, 110, 200, 70, "PyFly", "structure / IoC")) + body.append(arrow(260, 145, 330, 145)) + body.append(_text(295, 137, "reuses", 11, SUB, 600)) + body.append(arrow(620, 150, 550, 150)) + body.append(_text(585, 142, "mirrors", 11, SUB, 600)) + body.append(_text(440, 232, "Hard-depends on Agentic; mirrors PyFly's hexagonal IoC.", 12.5, SUB, 600)) + return _svg(860, 262, "".join(body), "Firefly ecosystem relationships") + + def main() -> None: - out = Path(__file__).resolve().parents[1] / "diagrams" + # Diagrams live under docs/ so they are served by the mkdocs site and resolve in GitHub's + # markdown 
rendering of docs/*.md (relative ``img/.svg``). The README, at the repo root, + # references them as ``docs/img/.svg``. + out = Path(__file__).resolve().parents[2] / "docs" / "img" out.mkdir(parents=True, exist_ok=True) figures = { "architecture.svg": diagram_architecture(), "hexagonal.svg": diagram_hexagonal(), "automl-loop.svg": diagram_automl_loop(), "genai-classical-fusion.svg": diagram_genai_fusion(), + "agentic-loop.svg": diagram_agentic_loop(), + "auto-configuration.svg": diagram_auto_configuration(), + "security.svg": diagram_security(), + "ecosystem.svg": diagram_ecosystem(), } for name, svg in figures.items(): (out / name).write_text(svg) diff --git a/docs/agentic-loop.md b/docs/agentic-loop.md index cb9bfab..211d8e8 100644 --- a/docs/agentic-loop.md +++ b/docs/agentic-loop.md @@ -165,6 +165,6 @@ loop = AgenticAutoML( ## See also - [Datasets](datasets.md) — the `Dataset` the loop searches over. -- [Models & trainers](models.md) — the trainer registry candidates draw from. -- [Evaluation](evaluation.md) — metrics, scoring, and the trivial baseline. -- [Preprocessing](preprocessing.md) — the pipeline wrapped around every candidate. +- [Models & trainers](automl.md) — the trainer registry candidates draw from. +- [Evaluation](automl.md) — metrics, scoring, and the trivial baseline. +- [Preprocessing](automl.md) — the pipeline wrapped around every candidate. diff --git a/docs/architecture.md b/docs/architecture.md index 5a02870..3e2b80a 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -4,7 +4,7 @@ This page explains how the pieces fit together: the five layers, the ports-and-adapters (hexagonal) core, entry-point auto-configuration, the dependency-injection container, and the `FireflyDataScienceApplication` startup lifecycle. 
-![Five-layer architecture](../assets/diagrams/architecture.svg) +![Five-layer architecture](img/architecture.svg) ## The five layers @@ -20,7 +20,7 @@ The core stays importable with **no** optional ML extra installed — vendor imp ## Hexagonal: ports and adapters -![Ports and adapters](../assets/diagrams/hexagonal.svg) +![Ports and adapters](img/hexagonal.svg) A **port** is a `Protocol` the domain depends on. An **adapter** is a concrete class that implements it. The container binds them by type annotation, so swapping an adapter never touches calling code. @@ -164,7 +164,7 @@ Passing `auto_configurations=[...]` **replaces** discovery entirely (handy for h ## See also -- [Getting started](./getting-started.md) +- [Getting started](quickstart.md) - [Configuration](./configuration.md) -- [Ports and adapters reference](./ports-and-adapters.md) -- [Writing an auto-configuration](./auto-configuration.md) +- [Ports and adapters reference](index.md) +- [Writing an auto-configuration](index.md) diff --git a/docs/automl.md b/docs/automl.md index 4aadb39..2987a55 100644 --- a/docs/automl.md +++ b/docs/automl.md @@ -7,7 +7,7 @@ trainer that supports the task (optionally tuning each one), and returns a fitte leaderboard. It is import-light: scikit-learn is only loaded when you actually call `fit`, so `from fireflyframework_datascience.automl import AutoML` stays cheap. -![The AutoML loop](../assets/diagrams/automl-loop.svg) +![The AutoML loop](img/automl-loop.svg) ## Quick start @@ -177,7 +177,7 @@ when the winning estimator exposes `predict_proba`. 
## See also - [Datasets and loaders](./datasets.md) -- [Models and trainers](./models.md) -- [Hyperparameter tuning](./tuning.md) -- [Evaluation and metrics](./evaluation.md) -- [GenAI + classical fusion](./genai.md) +- [Models and trainers](automl.md) +- [Hyperparameter tuning](index.md) +- [Evaluation and metrics](index.md) +- [GenAI + classical fusion](genai-features.md) diff --git a/docs/benchmarks.md b/docs/benchmarks.md index 5b4b22e..2a06977 100644 --- a/docs/benchmarks.md +++ b/docs/benchmarks.md @@ -113,6 +113,6 @@ Tier 3 measures the *agent*, not a single estimator: given a task description an ## See also - [Datasets API](./datasets.md) -- [Container & auto-configuration](./container.md) -- [Task types](./task-types.md) -- [Getting started](./getting-started.md) +- [Container & auto-configuration](index.md) +- [Task types](index.md) +- [Getting started](quickstart.md) diff --git a/docs/configuration.md b/docs/configuration.md index 249fb6f..4ca414e 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -175,6 +175,6 @@ export FIREFLY_DATASCIENCE_BANNER__MODE=OFF ## See also -- [Getting Started](./getting-started.md) -- [GenAI Accelerator](./genai.md) -- [Code Execution](./execution.md) +- [Getting Started](quickstart.md) +- [GenAI Accelerator](genai-features.md) +- [Code Execution](security.md) diff --git a/docs/datasets.md b/docs/datasets.md index 5b2594a..646d50f 100644 --- a/docs/datasets.md +++ b/docs/datasets.md @@ -143,7 +143,7 @@ If the `openml` package is not installed, `load` raises `AdapterUnavailableError ## See also -- [Core types](core.md) — `TaskType` and the rest of the core enums -- [Adapters](adapters.md) — the adapter pattern and the `data` / `tabular` extras -- [Feature engineering](features.md) — consumers of `Dataset.with_features` -- [Getting started](getting-started.md) +- [Core types](architecture.md) — `TaskType` and the rest of the core enums +- [Adapters](architecture.md) — the adapter pattern and the `data` / 
`tabular` extras +- [Feature engineering](genai-features.md) — consumers of `Dataset.with_features` +- [Getting started](quickstart.md) diff --git a/docs/deep-learning.md b/docs/deep-learning.md index 4281a1d..1405e55 100644 --- a/docs/deep-learning.md +++ b/docs/deep-learning.md @@ -156,6 +156,6 @@ Start with `MLPTrainer` for a dependency-light neural baseline, reach for `TabPF ## See also - [Datasets](./datasets.md) — the `Dataset` container and `DatasetLoaderPort` -- [Models](./models.md) — the fitted `Model` wrapper and `TrainerPort` -- [Preprocessing](./preprocessing.md) — `build_pipeline`, shared by every adapter -- [Core Types](./core-types.md) — `TaskType` and friends +- [Models](automl.md) — the fitted `Model` wrapper and `TrainerPort` +- [Preprocessing](automl.md) — `build_pipeline`, shared by every adapter +- [Core Types](configuration.md) — `TaskType` and friends diff --git a/docs/genai-features.md b/docs/genai-features.md index ceab0e8..67f0904 100644 --- a/docs/genai-features.md +++ b/docs/genai-features.md @@ -8,7 +8,7 @@ cross-validation lift of each one, and a `CostBenefitGate` keeps a feature only beats the current baseline by a measurable margin. The LLM never touches the score — it just generates candidates, and the data does the rest. 
-![GenAI proposes, classical CV decides](../assets/diagrams/genai-classical-fusion.svg) +![GenAI proposes, classical CV decides](img/genai-classical-fusion.svg) ## The loop @@ -169,6 +169,6 @@ engineer = GenAIFeatureEngineer( ## See also - [Datasets](datasets.md) -- [Evaluation & Metrics](evaluation.md) -- [Core Types](core-types.md) +- [Evaluation & Metrics](automl.md) +- [Core Types](configuration.md) - [AutoML Pipeline](automl.md) diff --git a/docs/img/agentic-loop.svg b/docs/img/agentic-loop.svg new file mode 100644 index 0000000..a465677 --- /dev/null +++ b/docs/img/agentic-loop.svg @@ -0,0 +1 @@ +Agentic ML-Engineering LoopProposeLLMCodefeature/pipelineExecutesandboxedObservecross-validateVerifyjudge ≠ ranSelectbest verifiedreflect✦ firefly-datascience diff --git a/assets/diagrams/architecture.svg b/docs/img/architecture.svg similarity index 94% rename from assets/diagrams/architecture.svg rename to docs/img/architecture.svg index 766eda9..f7c2106 100644 --- a/assets/diagrams/architecture.svg +++ b/docs/img/architecture.svg @@ -1 +1 @@ -Firefly DataScience — Layered ArchitectureCoreconfig · DI · banner · plugin discovery · typesAgent (reused: fireflyframework-agentic)FireflyAgent · memory · tools · embeddings · vectorstoresDataSciencedatasets · features · models · evaluation · automl · servingIntelligencereasoning/search · validation · observability · security · cost-benefit gateOrchestrationDAG pipelines · @workflow (HITL, budgets) · Airflow operators +Firefly DataScience — Layered ArchitectureCoreconfig · DI · banner · plugin discovery · typesAgent (reused: fireflyframework-agentic)FireflyAgent · memory · tools · embeddings · vectorstoresDataSciencedatasets · features · models · evaluation · automl · servingIntelligencereasoning/search · validation · observability · security · cost-benefit gateOrchestrationDAG pipelines · @workflow (HITL, budgets) · Airflow operators✦ firefly-datascience diff --git a/docs/img/auto-configuration.svg 
b/docs/img/auto-configuration.svg new file mode 100644 index 0000000..b6fe39a --- /dev/null +++ b/docs/img/auto-configuration.svg @@ -0,0 +1 @@ +Entry-Point Auto-ConfigurationEntry pointsthe auto_configuration groupDiscoverload adapter classesEvaluate conditions@conditional_on_*Register beansDI containerWiring summaryready✦ firefly-datascience diff --git a/assets/diagrams/automl-loop.svg b/docs/img/automl-loop.svg similarity index 93% rename from assets/diagrams/automl-loop.svg rename to docs/img/automl-loop.svg index fe3c763..5c13b16 100644 --- a/assets/diagrams/automl-loop.svg +++ b/docs/img/automl-loop.svg @@ -1 +1 @@ -Classical AutoML PipelineDatasetload · validateCandidatestrainers × paramsCross-Validateclassical engineSelectbest by metricFit + Serveleaderboard · model +Classical AutoML PipelineDatasetload · validateCandidatestrainers × paramsCross-Validateclassical engineSelectbest by metricFit + Serveleaderboard · model✦ firefly-datascience diff --git a/docs/img/banner.svg b/docs/img/banner.svg new file mode 100644 index 0000000..5a96550 --- /dev/null +++ b/docs/img/banner.svg @@ -0,0 +1,83 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + THE  FIREFLY  FRAMEWORK + + + + + + Firefly DataScience + + + + + + AutoML that fuses GenAI with classical ML & Deep Learning. 
+ + + Hexagonal · secure-by-default · governed GenAI · built on Firefly Agentic & Pydantic AI + diff --git a/docs/img/ecosystem.svg b/docs/img/ecosystem.svg new file mode 100644 index 0000000..f37f300 --- /dev/null +++ b/docs/img/ecosystem.svg @@ -0,0 +1 @@ +Firefly EcosystemFirefly AgenticGenAI · Pydantic AIFirefly DataScienceAutoML · this repoPyFlystructure / IoCreusesmirrorsHard-depends on Agentic; mirrors PyFly's hexagonal IoC.✦ firefly-datascience diff --git a/assets/diagrams/genai-classical-fusion.svg b/docs/img/genai-classical-fusion.svg similarity index 93% rename from assets/diagrams/genai-classical-fusion.svg rename to docs/img/genai-classical-fusion.svg index 0c17ea3..1ebf18b 100644 --- a/assets/diagrams/genai-classical-fusion.svg +++ b/docs/img/genai-classical-fusion.svg @@ -1 +1 @@ -GenAI × Classical Fusion (governed)LLM proposesfeature / pipeline codeClassical enginetrains · cross-validatesCost/Benefit Gatekeep only on liftaccept ✓ / reject ✗measured, not assumedThe LLM never decides — the measured score does. 
+GenAI × Classical Fusion (governed)LLM proposesfeature / pipeline codeClassical enginetrains · cross-validatesCost/Benefit Gatekeep only on liftaccept ✓ / reject ✗measured, not assumedThe LLM never decides — the measured score does.✦ firefly-datascience diff --git a/assets/diagrams/hexagonal.svg b/docs/img/hexagonal.svg similarity index 95% rename from assets/diagrams/hexagonal.svg rename to docs/img/hexagonal.svg index 04fbb23..ec313c0 100644 --- a/assets/diagrams/hexagonal.svg +++ b/docs/img/hexagonal.svg @@ -1 +1 @@ -Hexagonal Ports & AdaptersAutoML Corelibrary-agnosticDatasetLoaderPortsklearn · OpenML · HFTrainerPortRF · boosting · MLPSearchPolicyPortdefault · OptunaTrackerPortMLflow · noopModelServerPortlocal · BentoMLFeatureEngineerPortGenAI (CAAFE) +Hexagonal Ports & AdaptersAutoML Corelibrary-agnosticDatasetLoaderPortsklearn · OpenML · HFTrainerPortRF · boosting · MLPSearchPolicyPortdefault · OptunaTrackerPortMLflow · noopModelServerPortlocal · BentoMLFeatureEngineerPortGenAI (CAAFE)✦ firefly-datascience diff --git a/docs/img/security.svg b/docs/img/security.svg new file mode 100644 index 0000000..753067b --- /dev/null +++ b/docs/img/security.svg @@ -0,0 +1 @@ +Secure-by-Default ExecutionStatic analysisdeny imports / dunder / execMontydeny-by-default (default)Docker / E2Bfull ML, opt-inHITL approvalbefore non-sandboxedCost / benefit gatekeep GenAI only on measured liftThe LLM never gets ambient capability; every step is gated and audited.✦ firefly-datascience diff --git a/docs/index.md b/docs/index.md index a240a68..9af5256 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,7 +1,14 @@ +

+ Firefly DataScience +

+ # Firefly DataScience Documentation **AutoML that fuses GenAI with classical ML & Deep Learning — hexagonal, secure-by-default, native to the Firefly Framework.** +> New here? Jump to the **[Tutorial](tutorial.md)** for a guided, runnable walkthrough, or +> **[Configuring the LLM](llm-configuration.md)** to wire up GenAI. + `fireflyframework-datascience` is a state-of-the-art Python metaframework for AutoML. It pairs **GenAI** — built on [`fireflyframework-agentic`](https://github.com/fireflyframework/fireflyframework-agentic), which wraps [Pydantic AI](https://ai.pydantic.dev/) — with **traditional ML and Deep Learning**, so any @@ -14,7 +21,7 @@ GenAI step is gated behind a measured improvement over a seeded classical baseli governed, measurably-gated accelerator over a battle-tested classical core — never a black box.

- Firefly DataScience architecture + Firefly DataScience architecture

## The 7 pillars diff --git a/docs/llm-configuration.md b/docs/llm-configuration.md new file mode 100644 index 0000000..f26bdf0 --- /dev/null +++ b/docs/llm-configuration.md @@ -0,0 +1,157 @@ +# Configuring the LLM + +**How to point Firefly DataScience at a real LLM for GenAI feature engineering and the agentic loop.** + +GenAI is **off by default** — the framework is classical-first, and everything except the GenAI steps +runs with no LLM at all. When you do enable GenAI, it is powered by +[`fireflyframework-agentic`](https://github.com/fireflyframework/fireflyframework-agentic), which wraps +[Pydantic AI](https://ai.pydantic.dev/) — so any provider Pydantic AI supports works here. + +## 1. Enable GenAI + +Two things switch it on: the `genai.enabled` flag, and an installed `genai` extra. + +```bash +uv add 'fireflyframework-datascience[genai]' # installs the agentic GenAI accelerators +export FIREFLY_DATASCIENCE_GENAI__ENABLED=true # turn the GenAI auto-configurations on +``` + +Or in `firefly-datascience.yaml`: + +```yaml +genai: + enabled: true + default_model: openai:gpt-4o + cost_benefit_gate: true +``` + +With `genai.enabled=true`, the `FeaturesAutoConfiguration` (GenAI feature engineering) and +`EngineeringAutoConfiguration` (the agentic loop) register their agent-backed beans. The agent — and its +API client — is built **lazily on first use**, so the application still boots without a key. + +## 2. 
Choose a provider and model + +Set `genai.default_model` to a Pydantic AI model string, `"<provider>:<model>"`: + +| Provider | Model string (example) | API key env var | +|---|---|---| +| OpenAI | `openai:gpt-4o`, `openai:gpt-4o-mini` | `OPENAI_API_KEY` | +| Anthropic | `anthropic:claude-sonnet-4-5`, `anthropic:claude-opus-4-1` | `ANTHROPIC_API_KEY` | +| Google | `google-gla:gemini-2.0-flash` | `GEMINI_API_KEY` | +| Groq | `groq:llama-3.3-70b-versatile` | `GROQ_API_KEY` | +| Mistral | `mistral:mistral-large-latest` | `MISTRAL_API_KEY` | +| Ollama (local) | `openai:llama3.2` via a local base URL | — (runs locally) | + +```bash +export FIREFLY_DATASCIENCE_GENAI__DEFAULT_MODEL=anthropic:claude-sonnet-4-5 +export ANTHROPIC_API_KEY=sk-ant-... +``` + +## 3. Where to put API keys + +Keys are read from the environment (Pydantic AI's convention). Options, in order of convenience: + +```bash +# 1. Shell environment +export OPENAI_API_KEY=sk-... + +# 2. A local .env file (loaded automatically; real env vars always win) +echo 'OPENAI_API_KEY=sk-...' >> .env +``` + +> **Security.** Never commit API keys. Keep them in `.env` (git-ignored) or a secrets manager. The +> framework never logs keys, and `OutputGuard` (from agentic) redacts secrets from model output. + +## 4. Use it + +Once enabled, swap the deterministic stand-in proposers (used in tests/tutorials) for the agent-backed +ones. Resolved from the application context, they pick up `genai.default_model` automatically; constructed by hand, they take an explicit `model=`: + +```python +from fireflyframework_datascience.features.genai import AgentFeatureProposer, GenAIFeatureEngineer +from fireflyframework_datascience.datasets.adapters import SklearnDatasetLoader + +train, _ = SklearnDatasetLoader().load("breast_cancer").train_test_split() + +# The LLM proposes feature code; classical CV measures the lift; the gate decides. +engineer = GenAIFeatureEngineer(AgentFeatureProposer(model="openai:gpt-4o")) +result = engineer.engineer(train) +print(result.summary()) # e.g.
"3 accepted, 5 rejected; roc_auc 0.97 -> 0.98 (+0.01)" +``` + +```python +from fireflyframework_datascience.engineering.loop import AgenticAutoML, AgentSolutionProposer + +run = AgenticAutoML(AgentSolutionProposer(model="anthropic:claude-sonnet-4-5")).solve(train) +print(run.summary()) # the LLM reflects on history; the engine trains/verifies each candidate +``` + +Or wire everything from the application context (the model comes from config): + +```python +from fireflyframework_datascience import FireflyDataScienceApplication + +app = FireflyDataScienceApplication.run() # genai.enabled -> agent beans registered +engineer = app.get(...) # resolve the FeatureEngineerPort bean, already wired with your model +``` + +## 5. Cost, budget, and governance + +GenAI is a **measurably-gated accelerator**, never a black box: + +```yaml +genai: + enabled: true + cost_benefit_gate: true # auto-disable a GenAI step that does not beat the seeded baseline + budget_usd: 5.0 # optional hard spend ceiling for a run +``` + +- The **`CostBenefitGate`** accepts a proposed feature/candidate only if it improves the + cross-validation score — the LLM never decides, the measured score does. +- Token usage and cost are tracked by agentic's `UsageTracker`; a `BudgetGate` enforces `budget_usd`. + +## 6. Secure code execution + +LLM-proposed feature code runs through static safety analysis and a restricted namespace. Choose the +sandbox tier under `execution`: + +```yaml +execution: + sandbox: monty # monty (default, deny-by-default) | docker | e2b | local + require_approval: true # human-in-the-loop before any non-sandboxed execution + timeout_seconds: 60 +``` + +See [Security Model](security.md) for the full trust model. + +## 7. 
Offline & testing (no key required) + +For development, tests, and the [tutorial](tutorial.md), use the deterministic stand-ins — they exercise +the exact same propose → execute → measure → gate loop without any LLM: + +```python +from fireflyframework_datascience.features import StaticFeatureProposer, FeatureProposal +from fireflyframework_datascience.engineering import SequenceProposer, SolutionCandidate +``` + +To unit-test the *real* agent integration without network access, pass Pydantic AI's `TestModel`: + +```python +from pydantic_ai.models.test import TestModel +from fireflyframework_datascience.features.genai import AgentFeatureProposer + +proposer = AgentFeatureProposer(model=TestModel(custom_output_args={"features": [...]})) +``` + +## Troubleshooting + +| Symptom | Fix | +|---|---| +| `OpenAIError: Missing credentials` | Set `OPENAI_API_KEY` (or the provider's key). The agent builds lazily, so this only fires on the first GenAI call. | +| GenAI steps don't run | Confirm `genai.enabled=true` **and** the `genai` extra is installed (`firefly-ds doctor`). | +| Every proposed feature is rejected | Working as designed — the gate found no measurable lift. Lower `min_gain` or try a stronger model.
| + +## See also + +- [GenAI Feature Engineering](genai-features.md) · [Agentic Loop](agentic-loop.md) · + [Configuration](configuration.md) · [Security Model](security.md) · [Tutorial](tutorial.md) diff --git a/docs/quickstart.md b/docs/quickstart.md index ea80963..0a7eed2 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -152,5 +152,5 @@ firefly-ds introspect --config-dir ./config --profile local --profile gpu - [Configuration](configuration.md) — `firefly-datascience.yaml`, profiles, and `FireflyDataScienceConfig` - [Datasets](datasets.md) — the `Dataset` container and `DatasetLoaderPort` - [AutoML](automl.md) — trainers, search policies, evaluators, and the leaderboard -- [GenAI](genai.md) — fusing Firefly Agentic with classical ML -- [CLI reference](cli.md) — every `firefly-ds` command +- [GenAI](genai-features.md) — fusing Firefly Agentic with classical ML +- [CLI reference](quickstart.md) — every `firefly-ds` command diff --git a/docs/security.md b/docs/security.md index 7285749..994c60f 100644 --- a/docs/security.md +++ b/docs/security.md @@ -160,6 +160,6 @@ Be precise about what these controls do and do not give you: ## See also - [Configuration](configuration.md) -- [Feature Engineering](features.md) -- [GenAI Accelerators](genai.md) -- [Getting Started](getting-started.md) +- [Feature Engineering](genai-features.md) +- [GenAI Accelerators](genai-features.md) +- [Getting Started](quickstart.md) diff --git a/docs/serving.md b/docs/serving.md index b68f8b7..6558e3b 100644 --- a/docs/serving.md +++ b/docs/serving.md @@ -173,6 +173,6 @@ Without the `openlineage` client installed, construction raises `AdapterUnavaila ## See also -- [Models & Training](models.md) -- [Tuning](tuning.md) -- [Getting Started](getting-started.md) +- [Models & Training](automl.md) +- [Tuning](automl.md) +- [Getting Started](quickstart.md) diff --git a/docs/tutorial.md b/docs/tutorial.md new file mode 100644 index 0000000..a836664 --- /dev/null +++ b/docs/tutorial.md @@ 
-0,0 +1,121 @@ +# Tutorial + +**A guided, end-to-end tour of Firefly DataScience — from booting the app to serving a model.** + +This tutorial mirrors the runnable script [`samples/tutorial.py`](https://github.com/fireflyframework/fireflyframework-datascience/blob/main/samples/tutorial.py), which is +covered by a test, so everything here is guaranteed to work. It runs **offline with no LLM key** — the +GenAI steps use deterministic stand-ins, and we show how to switch on a real LLM at the end. + +```bash +uv add 'fireflyframework-datascience[tabular]' +uv run python samples/tutorial.py +``` + +We use a synthetic **credit-risk** dataset whose default risk is driven by *debt-to-income* — a ratio +deliberately withheld from the model, so feature engineering has something real to discover. + +## 1. Boot the application + +```python +from fireflyframework_datascience import FireflyDataScienceApplication + +app = FireflyDataScienceApplication.run() +``` + +This prints the banner and a wiring summary, loads configuration, builds the dependency-injection +container, and discovers every adapter via entry-point auto-configuration. `app.bean_count` and +`app.applied_auto_configurations` tell you what got wired. See [Architecture](architecture.md). + +## 2. Load and validate the data + +```python +from fireflyframework_datascience.validation.adapters import BasicValidator + +dataset, validation = ... # build the credit dataset (see the script) +report = BasicValidator().validate(dataset.X, dataset.y) +assert report.ok # no all-null columns, no null target, etc. +train, test = dataset.train_test_split(test_size=0.25, random_state=0) +``` + +The `BasicValidator` catches empty data, all-null/constant columns, duplicate rows, and null targets +before you waste time training. See [Datasets](datasets.md). + +## 3. 
Classical AutoML + +```python +from fireflyframework_datascience.automl import AutoML + +result = AutoML(cv=4).fit(train) +print(result.leaderboard_table()) +print(result.evaluate(test)) # holdout metrics +``` + +AutoML cross-validates each candidate trainer (RandomForest, Linear, HistGradientBoosting, and the +boosting libraries if installed), ranks them on a task-appropriate metric (`roc_auc` for binary), and +refits the winner. Expected: a leaderboard topped by `linear` at **roc_auc ≈ 0.85** on holdout. See +[Classical AutoML](automl.md). + +## 4. GenAI feature engineering + +```python +from fireflyframework_datascience.features import StaticFeatureProposer, FeatureProposal +from fireflyframework_datascience.features.genai import GenAIFeatureEngineer + +proposer = StaticFeatureProposer([ + FeatureProposal("debt_to_income", "df['debt_to_income'] = df['loan_amount'] / (df['income'] + 1)", "DTI"), + FeatureProposal("noise", "df['noise'] = 0.0", "should be rejected"), +]) +engineered = GenAIFeatureEngineer(proposer, cv=4).engineer(train) +print(engineered.summary()) +``` + +The loop is **propose → execute (safely) → measure CV lift → gate**. `debt_to_income` (the hidden +driver) is **accepted** because it lifts the score; the constant `noise` feature is **rejected**. The +LLM never decides — the measured score does. See [GenAI Feature Engineering](genai-features.md). + +> Here a `StaticFeatureProposer` stands in for the LLM so the tutorial runs offline. With a real model +> you'd use `AgentFeatureProposer(model="openai:gpt-4o")` — see [Configuring the LLM](llm-configuration.md). + +## 5. 
The agentic ML-engineering loop + +```python +from fireflyframework_datascience.engineering import SequenceProposer, SolutionCandidate +from fireflyframework_datascience.engineering.loop import AgenticAutoML + +proposer = SequenceProposer([SolutionCandidate("linear"), SolutionCandidate("random_forest"), + SolutionCandidate("hist_gradient_boosting")]) +run = AgenticAutoML(proposer, cv=3).solve(train) +print(run.summary()) +``` + +Each candidate is trained, cross-validated, and **verified** — it must beat a trivial baseline, not +merely run (the "correctness ≠ ran" principle) — before the best one is selected. `run.attempts` is the +full audited trail. See [Agentic Loop](agentic-loop.md). + +## 6. Serve the model + +```python +from fireflyframework_datascience.serving import LocalModelServer + +server = LocalModelServer() +server.load(result.best_model) +prediction = server.predict(test.X.iloc[[0]]) # score one applicant +``` + +See [Serving & Lineage](serving.md). + +## Turn on a real LLM + +```bash +export OPENAI_API_KEY=sk-... # or ANTHROPIC_API_KEY=... +export FIREFLY_DATASCIENCE_GENAI__ENABLED=true +export FIREFLY_DATASCIENCE_GENAI__DEFAULT_MODEL=openai:gpt-4o +``` + +Then use `AgentFeatureProposer` / `AgentSolutionProposer` in place of the stand-ins. The full guide, +including providers, keys, cost gating, and secure execution, is in +[Configuring the LLM](llm-configuration.md). 
+ +## See also + +- [Quick Start](quickstart.md) · [Configuration](configuration.md) · [Use Case: Lumen](use-case-lumen.md) diff --git a/docs/use-case-lumen.md b/docs/use-case-lumen.md index b6c5426..ca29b42 100644 --- a/docs/use-case-lumen.md +++ b/docs/use-case-lumen.md @@ -172,8 +172,8 @@ Exact numbers vary with your scikit-learn version, but the shape is stable: **`d ## See also -- [GenAI Feature Engineering](genai-feature-engineering.md) +- [GenAI Feature Engineering](genai-features.md) - [AutoML](automl.md) - [Datasets](datasets.md) - [Serving](serving.md) -- [Getting Started](getting-started.md) +- [Getting Started](quickstart.md) diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 0000000..bb76a0a --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,79 @@ +site_name: Firefly DataScience +site_description: AutoML that fuses GenAI with classical ML & Deep Learning — built on Firefly Agentic. +site_url: https://fireflyframework.github.io/fireflyframework-datascience/ +repo_url: https://github.com/fireflyframework/fireflyframework-datascience +repo_name: fireflyframework/fireflyframework-datascience +copyright: Copyright 2026 Firefly Software Foundation · Apache-2.0 +docs_dir: docs + +theme: + name: material + logo: img/banner.svg + favicon: img/banner.svg + palette: + - media: "(prefers-color-scheme: light)" + scheme: default + primary: cyan + accent: cyan + toggle: + icon: material/weather-night + name: Switch to dark mode + - media: "(prefers-color-scheme: dark)" + scheme: slate + primary: cyan + accent: cyan + toggle: + icon: material/weather-sunny + name: Switch to light mode + features: + - navigation.sections + - navigation.top + - navigation.footer + - navigation.instant + - content.code.copy + - content.code.annotate + - search.suggest + - search.highlight + - toc.follow + icon: + repo: fontawesome/brands/github + +markdown_extensions: + - admonition + - attr_list + - md_in_html + - tables + - toc: + permalink: true + - pymdownx.highlight: + 
anchor_linenums: true + - pymdownx.inlinehilite + - pymdownx.superfences + - pymdownx.details + - pymdownx.tabbed: + alternate_style: true + +plugins: + - search + +nav: + - Home: index.md + - Getting started: + - Quick Start: quickstart.md + - Tutorial: tutorial.md + - Configuration: configuration.md + - Configuring the LLM: llm-configuration.md + - Concepts: + - Architecture: architecture.md + - Datasets: datasets.md + - Classical AutoML: automl.md + - GenAI Feature Engineering: genai-features.md + - Agentic ML-Engineering Loop: agentic-loop.md + - Deep Learning & TabFM: deep-learning.md + - Serving & Lineage: serving.md + - Security Model: security.md + - Benchmarks: benchmarks.md + - Use case — Lumen: use-case-lumen.md + +not_in_nav: | + /superpowers/ diff --git a/pyproject.toml b/pyproject.toml index 2aa499e..f17ca60 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -97,6 +97,7 @@ dev = [ "ruff>=0.7.0", "pyright>=1.1.380", ] +docs = ["mkdocs-material>=9.5.0"] [tool.hatch.metadata] # Allow the PEP 508 direct git URL for fireflyframework-agentic (it is not on PyPI). diff --git a/samples/tutorial.py b/samples/tutorial.py new file mode 100644 index 0000000..f9077af --- /dev/null +++ b/samples/tutorial.py @@ -0,0 +1,176 @@ +# Copyright 2026 Firefly Software Foundation. +"""Firefly DataScience — the end-to-end tutorial. + +A single, runnable tour of the whole framework on a realistic (synthetic) credit-risk dataset. It runs +**offline with no LLM key** (the GenAI steps use deterministic stand-in proposers) and prints, at the +end, exactly how to switch on a real LLM. + +Run it: + uv run python samples/tutorial.py # needs the `tabular` extra + +Every step is covered by ``tests/samples/test_tutorial.py`` — the tutorial is guaranteed to work. 
+""" + +from __future__ import annotations + +from typing import Any + +import numpy as np +import pandas as pd + +from fireflyframework_datascience import FireflyDataScienceApplication +from fireflyframework_datascience.automl import AutoML +from fireflyframework_datascience.core.types import TaskType +from fireflyframework_datascience.datasets import Dataset +from fireflyframework_datascience.engineering import SequenceProposer, SolutionCandidate +from fireflyframework_datascience.engineering.loop import AgenticAutoML +from fireflyframework_datascience.features import FeatureProposal, StaticFeatureProposer +from fireflyframework_datascience.features.genai import GenAIFeatureEngineer +from fireflyframework_datascience.serving import LocalModelServer +from fireflyframework_datascience.validation.adapters import BasicValidator + + +def make_credit_dataset(n: int = 800, seed: int = 11) -> Dataset: + """A synthetic credit-risk dataset whose default risk is driven by *debt-to-income* — a ratio that + is deliberately NOT given to the model, so feature engineering has something real to discover.""" + rng = np.random.RandomState(seed) + income = rng.normal(60_000, 18_000, n).clip(15_000, None) + loan_amount = rng.normal(18_000, 10_000, n).clip(1_000, None) + employment_years = rng.uniform(0, 30, n).round(1) + num_prior_defaults = rng.poisson(0.6, n) + dti = loan_amount / income + logit = -2.6 + 5.0 * dti + 1.3 * num_prior_defaults - 0.05 * employment_years + rng.normal(0, 0.25, n) + default = (rng.uniform(0, 1, n) < 1.0 / (1.0 + np.exp(-logit))).astype(int) + X = pd.DataFrame( + { + "income": income.round(2), + "loan_amount": loan_amount.round(2), + "employment_years": employment_years, + "num_prior_defaults": num_prior_defaults, + } + ) + return Dataset( + "credit_applicants", + X, + pd.Series(default, name="default"), + task=TaskType.BINARY, + target_name="default", + feature_names=list(X.columns), + ) + + +def _logistic_scorer(task: TaskType) -> Any: + from 
sklearn.linear_model import LogisticRegression + + return LogisticRegression(max_iter=1000) + + +def step_1_boot() -> dict[str, int]: + """Boot the application: banner, config, dependency injection, auto-configuration.""" + app = FireflyDataScienceApplication.run(print_output=False) + return {"beans": app.bean_count, "auto_configs": len(app.applied_auto_configurations)} + + +def step_2_load_and_validate() -> tuple[Dataset, Any]: + """Build the dataset and sanity-check it before training.""" + dataset = make_credit_dataset() + return dataset, BasicValidator().validate(dataset.X, dataset.y) + + +def step_3_classical_automl(train: Dataset) -> Any: + """Run classical AutoML — cross-validate candidate models and pick the winner.""" + return AutoML(cv=4).fit(train) + + +def step_4_genai_feature_engineering(train: Dataset) -> Any: + """GenAI feature engineering, offline. With a real LLM you would use ``AgentFeatureProposer``; here a + deterministic proposer stands in so the tutorial runs without a key. The cost/benefit gate keeps a + feature only if it measurably lifts the score — ``debt_to_income`` (the hidden driver) is accepted, + the constant ``noise`` feature is rejected.""" + proposer = StaticFeatureProposer( + [ + FeatureProposal("debt_to_income", "df['debt_to_income'] = df['loan_amount'] / (df['income'] + 1)", "DTI"), + FeatureProposal("noise", "df['noise'] = 0.0", "should be rejected"), + ] + ) + return GenAIFeatureEngineer(proposer, scorer_estimator=_logistic_scorer, cv=4).engineer(train) + + +def step_5_agentic_loop(train: Dataset) -> Any: + """The agentic ML-engineering loop, offline. With a real LLM you would use ``AgentSolutionProposer``; + here a fixed candidate sequence stands in. 
Each candidate is trained, cross-validated, and verified + (it must beat a trivial baseline) before selection.""" + proposer = SequenceProposer( + [SolutionCandidate("linear"), SolutionCandidate("random_forest"), SolutionCandidate("hist_gradient_boosting")] + ) + return AgenticAutoML(proposer, cv=3, max_iterations=4).solve(train) + + +def step_6_serve(model: Any, sample_x: Any) -> Any: + """Serve the winning model in-process and score a sample applicant.""" + server = LocalModelServer() + server.load(model) + return server.predict(sample_x) + + +def run() -> dict[str, Any]: + """Run the whole tutorial and return a structured report (used by the test).""" + boot = step_1_boot() + dataset, validation = step_2_load_and_validate() + train, test = dataset.train_test_split(test_size=0.25, random_state=0) + + automl = step_3_classical_automl(train) + automl_eval = automl.evaluate(test) + engineered = step_4_genai_feature_engineering(train) + agentic = step_5_agentic_loop(train) + prediction = step_6_serve(automl.best_model, test.X.iloc[[0]]) + + return { + "boot": boot, + "validation_ok": validation.ok, + "automl_winner": automl.best_model.name, + "automl_roc_auc": automl_eval.metrics["roc_auc"], + "leaderboard": automl.leaderboard_table(), + "fe_accepted": [a.proposal.name for a in engineered.accepted], + "fe_rejected": [r.proposal.name for r in engineered.rejected], + "fe_lift": engineered.lift, + "agentic_best": agentic.best_candidate.trainer if agentic.best_candidate else None, + "agentic_verified": len(agentic.valid_attempts), + "sample_prediction": int(prediction[0]), + } + + +def main() -> None: + print("=" * 72) + print(" Firefly DataScience — end-to-end tutorial (credit-risk)") + print("=" * 72) + report = run() + print(f"\n[1] App booted: {report['boot']['beans']} beans, {report['boot']['auto_configs']} auto-configurations") + print(f"[2] Data validated: ok={report['validation_ok']}") + print(f"[3] Classical AutoML winner: {report['automl_winner']} (holdout 
roc_auc={report['automl_roc_auc']:.4f})") + print(" leaderboard:") + for line in report["leaderboard"].splitlines(): + print(f" {line}") + print( + f"[4] GenAI features: accepted={report['fe_accepted']} rejected={report['fe_rejected']} (lift {report['fe_lift']:+.4f})" + ) + print(f"[5] Agentic loop best: {report['agentic_best']} ({report['agentic_verified']} verified candidates)") + print(f"[6] Served prediction for one applicant: default={report['sample_prediction']}") + print("\n" + "-" * 72) + print(" Turn on a REAL LLM (GenAI feature engineering + the agentic loop):") + print("-" * 72) + print( + " export OPENAI_API_KEY=sk-... # or ANTHROPIC_API_KEY=...\n" + " export FIREFLY_DATASCIENCE_GENAI__ENABLED=true\n" + " export FIREFLY_DATASCIENCE_GENAI__DEFAULT_MODEL=openai:gpt-4o # or anthropic:claude-sonnet-4-5\n" + "\n" + " Then use the agent-backed proposers instead of the stand-ins:\n" + " from fireflyframework_datascience.features.genai import AgentFeatureProposer\n" + " from fireflyframework_datascience.engineering.loop import AgentSolutionProposer\n" + "\n" + " Full guide: docs/llm-configuration.md" + ) + + +if __name__ == "__main__": + main() diff --git a/tests/samples/test_tutorial.py b/tests/samples/test_tutorial.py new file mode 100644 index 0000000..1f82484 --- /dev/null +++ b/tests/samples/test_tutorial.py @@ -0,0 +1,28 @@ +# Copyright 2026 Firefly Software Foundation. 
+"""Guarantees the end-to-end tutorial actually runs (the user asked us to ensure it works).""" + +from __future__ import annotations + + +def _tutorial(): # type: ignore[no-untyped-def] + import pathlib + import sys + + sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[2] / "samples")) + import tutorial + + return tutorial + + +def test_tutorial_runs_end_to_end() -> None: + report = _tutorial().run() + assert report["boot"]["beans"] > 0 + assert report["validation_ok"] is True + assert report["automl_winner"] + assert report["automl_roc_auc"] > 0.7 + assert "debt_to_income" in report["fe_accepted"] # the hidden driver is discovered + assert "noise" in report["fe_rejected"] # the useless feature is gated out + assert report["fe_lift"] > 0 + assert report["agentic_best"] + assert report["agentic_verified"] >= 1 + assert report["sample_prediction"] in (0, 1) diff --git a/uv.lock b/uv.lock index c332d5b..d3e7b0f 100644 --- a/uv.lock +++ b/uv.lock @@ -674,6 +674,19 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/df/73/b6e24bd22e6720ca8ee9a85a0c4a2971af8497d8f3193fa05390cbd46e09/backoff-2.2.1-py3-none-any.whl", hash = "sha256:63579f9a0628e06278f7e47b7d7d5b6ce20dc65c5e96a6f3ca99a6adca0396e8", size = 15148, upload-time = "2022-10-05T19:19:30.546Z" }, ] +[[package]] +name = "backrefs" +version = "7.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/5e/a7/a7dd63622beef68cc0d3c3c36d472e143dd95443d5ebf14cd1a5b4dfbf11/backrefs-7.0.tar.gz", hash = "sha256:4989bb9e1e99eb23647c7160ed51fb21d0b41b5d200f2d3017da41e023097e82", size = 7012453, upload-time = "2026-04-28T16:28:04.215Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d4/39/39a31d7eae729ea14ed10c3ccef79371197177b9355a86cb3525709e8502/backrefs-7.0-py310-none-any.whl", hash = "sha256:b57cd227ea556b0aed3dc9b8da4628db4eabc0402c6d7fcfc69283a93955f7e9", size = 380824, upload-time = "2026-04-28T16:27:55.647Z" }, + { url = 
"https://files.pythonhosted.org/packages/c9/b5/9302644225ba7dfa934a2ff2b9c7bb85701313a90dddb3dfaf693fa5bae2/backrefs-7.0-py311-none-any.whl", hash = "sha256:a0fa7360c63509e9e077e174ef4e6d3c21c8db94189b9d957289ae6d794b9475", size = 392626, upload-time = "2026-04-28T16:27:57.42Z" }, + { url = "https://files.pythonhosted.org/packages/36/da/87912ddec6e06feffbaa3d7aa18fc6352bee2e8f1fee185d7d1690f8f4e8/backrefs-7.0-py312-none-any.whl", hash = "sha256:ca42ce6a49ace3d75684dfa9937f3373902a63284ecb385ce36d15e5dcb41c12", size = 398537, upload-time = "2026-04-28T16:27:58.913Z" }, + { url = "https://files.pythonhosted.org/packages/00/bb/90ba423612b6aa0adccc6b1874bcd4a9b44b660c0c16f346611e00f64ac3/backrefs-7.0-py313-none-any.whl", hash = "sha256:f2c52955d631b9e1ac4cd56209f0a3a946d592b98e7790e77699339ae01c102a", size = 400491, upload-time = "2026-04-28T16:28:00.928Z" }, + { url = "https://files.pythonhosted.org/packages/3e/5c/fb93d3092640a24dfb7bd7727a24016d7c01774ca013e60efd3f683c8002/backrefs-7.0-py314-none-any.whl", hash = "sha256:a6448b28180e3ca01134c9cf09dcebafad8531072e09903c5451748a05f24bc9", size = 412349, upload-time = "2026-04-28T16:28:02.412Z" }, +] + [[package]] name = "bcrypt" version = "5.0.0" @@ -2153,6 +2166,9 @@ dev = [ { name = "pytest-cov" }, { name = "ruff" }, ] +docs = [ + { name = "mkdocs-material" }, +] [package.metadata] requires-dist = [ @@ -2206,6 +2222,7 @@ dev = [ { name = "pytest-cov", specifier = ">=5.0.0" }, { name = "ruff", specifier = ">=0.7.0" }, ] +docs = [{ name = "mkdocs-material", specifier = ">=9.5.0" }] [[package]] name = "flask" @@ -2416,6 +2433,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/b3/bb/d71d6da82763528c2c2ed6b59a9d6142c6595545a4c448e2085d155e88c2/gguf-0.19.0-py3-none-any.whl", hash = "sha256:70bcd10edfe697fb2dad6e40af2234b9d8ece9a41a99761405121ebda1c3c1cd", size = 118475, upload-time = "2026-05-06T13:04:02.588Z" }, ] +[[package]] +name = "ghp-import" +version = "2.1.0" +source = { registry = 
"https://pypi.org/simple" } +dependencies = [ + { name = "python-dateutil" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/d9/29/d40217cbe2f6b1359e00c6c307bb3fc876ba74068cbab3dde77f03ca0dc4/ghp-import-2.1.0.tar.gz", hash = "sha256:9c535c4c61193c2df8871222567d7fd7e5014d835f97dc7b7439069e2413d343", size = 10943, upload-time = "2022-05-02T15:47:16.11Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f7/ec/67fbef5d497f86283db54c22eec6f6140243aae73265799baaaa19cd17fb/ghp_import-2.1.0-py3-none-any.whl", hash = "sha256:8337dd7b50877f163d4c0289bc1f1c7f127550241988d568c1db512c4324a619", size = 11034, upload-time = "2022-05-02T15:47:14.552Z" }, +] + [[package]] name = "gitdb" version = "4.0.12" @@ -3527,6 +3556,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/bc/b1/a0ec7a5a9db730a08daef1fdfb8090435b82465abbf758a596f0ea88727e/mako-1.3.12-py3-none-any.whl", hash = "sha256:8f61569480282dbf557145ce441e4ba888be453c30989f879f0d652e39f53ea9", size = 78521, upload-time = "2026-04-28T19:01:10.393Z" }, ] +[[package]] +name = "markdown" +version = "3.10.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/2b/f4/69fa6ed85ae003c2378ffa8f6d2e3234662abd02c10d216c0ba96081a238/markdown-3.10.2.tar.gz", hash = "sha256:994d51325d25ad8aa7ce4ebaec003febcce822c3f8c911e3b17c52f7f589f950", size = 368805, upload-time = "2026-02-09T14:57:26.942Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/de/1f/77fa3081e4f66ca3576c896ae5d31c3002ac6607f9747d2e3aa49227e464/markdown-3.10.2-py3-none-any.whl", hash = "sha256:e91464b71ae3ee7afd3017d9f358ef0baf158fd9a298db92f1d4761133824c36", size = 108180, upload-time = "2026-02-09T14:57:25.787Z" }, +] + [[package]] name = "markdown-it-py" version = "4.2.0" @@ -3672,6 +3710,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl", hash = 
"sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8", size = 9979, upload-time = "2022-08-14T12:40:09.779Z" }, ] +[[package]] +name = "mergedeep" +version = "1.3.4" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/3a/41/580bb4006e3ed0361b8151a01d324fb03f420815446c7def45d02f74c270/mergedeep-1.3.4.tar.gz", hash = "sha256:0096d52e9dad9939c3d975a774666af186eda617e6ca84df4c94dec30004f2a8", size = 4661, upload-time = "2021-02-05T18:55:30.623Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/2c/19/04f9b178c2d8a15b076c8b5140708fa6ffc5601fb6f1e975537072df5b2a/mergedeep-1.3.4-py3-none-any.whl", hash = "sha256:70775750742b25c0d8f36c55aed03d24c3384d17c951b3175d898bd778ef0307", size = 6354, upload-time = "2021-02-05T18:55:29.583Z" }, +] + [[package]] name = "methodtools" version = "0.4.7" @@ -3747,6 +3794,75 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e0/31/472fa3c766aa9b7869dceb6804344cc9945115ae2ee871258b404d0e7c84/mistralai-2.5.0-py3-none-any.whl", hash = "sha256:f29813f1c2e4c19d24707cb5f74ea5c071d106fcaa84dd856a7e4f2d36d908b2", size = 1216614, upload-time = "2026-06-23T17:05:20.819Z" }, ] +[[package]] +name = "mkdocs" +version = "1.6.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "click" }, + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "ghp-import" }, + { name = "jinja2" }, + { name = "markdown" }, + { name = "markupsafe" }, + { name = "mergedeep" }, + { name = "mkdocs-get-deps" }, + { name = "packaging" }, + { name = "pathspec" }, + { name = "pyyaml" }, + { name = "pyyaml-env-tag" }, + { name = "watchdog" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/bc/c6/bbd4f061bd16b378247f12953ffcb04786a618ce5e904b8c5a01a0309061/mkdocs-1.6.1.tar.gz", hash = "sha256:7b432f01d928c084353ab39c57282f29f92136665bdd6abf7c1ec8d822ef86f2", size = 3889159, upload-time = "2024-08-30T12:24:06.899Z" } 
+wheels = [ + { url = "https://files.pythonhosted.org/packages/22/5b/dbc6a8cddc9cfa9c4971d59fb12bb8d42e161b7e7f8cc89e49137c5b279c/mkdocs-1.6.1-py3-none-any.whl", hash = "sha256:db91759624d1647f3f34aa0c3f327dd2601beae39a366d6e064c03468d35c20e", size = 3864451, upload-time = "2024-08-30T12:24:05.054Z" }, +] + +[[package]] +name = "mkdocs-get-deps" +version = "0.2.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "mergedeep" }, + { name = "platformdirs" }, + { name = "pyyaml" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/ce/25/b3cccb187655b9393572bde9b09261d267c3bf2f2cdabe347673be5976a6/mkdocs_get_deps-0.2.2.tar.gz", hash = "sha256:8ee8d5f316cdbbb2834bc1df6e69c08fe769a83e040060de26d3c19fad3599a1", size = 11047, upload-time = "2026-03-10T02:46:33.632Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/29/744136411e785c4b0b744d5413e56555265939ab3a104c6a4b719dad33fd/mkdocs_get_deps-0.2.2-py3-none-any.whl", hash = "sha256:e7878cbeac04860b8b5e0ca31d3abad3df9411a75a32cde82f8e44b6c16ff650", size = 9555, upload-time = "2026-03-10T02:46:32.256Z" }, +] + +[[package]] +name = "mkdocs-material" +version = "9.7.6" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "babel" }, + { name = "backrefs" }, + { name = "colorama" }, + { name = "jinja2" }, + { name = "markdown" }, + { name = "mkdocs" }, + { name = "mkdocs-material-extensions" }, + { name = "paginate" }, + { name = "pygments" }, + { name = "pymdown-extensions" }, + { name = "requests" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/45/29/6d2bcf41ae40802c4beda2432396fff97b8456fb496371d1bc7aad6512ec/mkdocs_material-9.7.6.tar.gz", hash = "sha256:00bdde50574f776d328b1862fe65daeaf581ec309bd150f7bff345a098c64a69", size = 4097959, upload-time = "2026-03-19T15:41:58.161Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/2c/01/bc663630c510822c95c47a66af9fa7a443c295b47d5f041e5e6ae62ef659/mkdocs_material-9.7.6-py3-none-any.whl", hash = "sha256:71b84353921b8ea1ba84fe11c50912cc512da8fe0881038fcc9a0761c0e635ba", size = 9305470, upload-time = "2026-03-19T15:41:55.217Z" }, +] + +[[package]] +name = "mkdocs-material-extensions" +version = "1.3.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/79/9b/9b4c96d6593b2a541e1cb8b34899a6d021d208bb357042823d4d2cabdbe7/mkdocs_material_extensions-1.3.1.tar.gz", hash = "sha256:10c9511cea88f568257f960358a467d12b970e1f7b2c0e5fb2bb48cab1928443", size = 11847, upload-time = "2023-11-22T19:09:45.208Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/5b/54/662a4743aa81d9582ee9339d4ffa3c8fd40a4965e033d77b9da9774d3960/mkdocs_material_extensions-1.3.1-py3-none-any.whl", hash = "sha256:adff8b62700b25cb77b53358dad940f3ef973dd6db797907c49e3c2ef3ab4e31", size = 8728, upload-time = "2023-11-22T19:09:43.465Z" }, +] + [[package]] name = "mlflow" version = "3.14.0" @@ -4838,6 +4954,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/20/12/38679034af332785aac8774540895e234f4d07f7545804097de4b666afd8/packaging-25.0-py3-none-any.whl", hash = "sha256:29572ef2b1f17581046b3a2227d5c611fb25ec70ca1ba8554b24b0e69331a484", size = 66469, upload-time = "2025-04-19T11:48:57.875Z" }, ] +[[package]] +name = "paginate" +version = "0.5.7" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/ec/46/68dde5b6bc00c1296ec6466ab27dddede6aec9af1b99090e1107091b3b84/paginate-0.5.7.tar.gz", hash = "sha256:22bd083ab41e1a8b4f3690544afb2c60c25e5c9a63a30fa2f483f6c60c8e5945", size = 19252, upload-time = "2024-08-25T14:17:24.139Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/90/96/04b8e52da071d28f5e21a805b19cb9390aa17a47462ac87f5e2696b9566d/paginate-0.5.7-py2.py3-none-any.whl", hash = 
"sha256:b885e2af73abcf01d9559fd5216b57ef722f8c42affbb63942377668e35c7591", size = 13746, upload-time = "2024-08-25T14:17:22.55Z" }, +] + [[package]] name = "pandas" version = "2.3.3" @@ -5844,6 +5969,19 @@ crypto = [ { name = "cryptography" }, ] +[[package]] +name = "pymdown-extensions" +version = "11.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "markdown" }, + { name = "pyyaml" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/47/67/f1e79672a5f91985577c7984c9709ca110e4fd37fe7fd167b60422e6ccc2/pymdown_extensions-11.0.tar.gz", hash = "sha256:8269cef0247f9e2d0a62fcea10860aba05c1cbab5470fd4b63230b96434dc589", size = 857049, upload-time = "2026-06-23T02:27:45.146Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/af/b6/1ae53367e28b9cffa3be7574e13fbe4589694272fd47710fbdbafd3d63c6/pymdown_extensions-11.0-py3-none-any.whl", hash = "sha256:fbc4acb641814fa9d17521bbd21a5240ef739a662f11c06330c4b78c93e954d6", size = 269415, upload-time = "2026-06-23T02:27:43.826Z" }, +] + [[package]] name = "pyparsing" version = "3.3.2" @@ -6087,6 +6225,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" }, ] +[[package]] +name = "pyyaml-env-tag" +version = "1.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pyyaml" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/eb/2e/79c822141bfd05a853236b504869ebc6b70159afc570e1d5a20641782eaa/pyyaml_env_tag-1.1.tar.gz", hash = "sha256:2eb38b75a2d21ee0475d6d97ec19c63287a7e140231e4214969d0eac923cd7ff", size = 5737, upload-time = "2025-05-13T15:24:01.64Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/04/11/432f32f8097b03e3cd5fe57e88efb685d964e2e5178a48ed61e841f7fdce/pyyaml_env_tag-1.1-py3-none-any.whl", hash = "sha256:17109e1a528561e32f026364712fee1264bc2ea6715120891174ed1b980d2e04", size = 4722, upload-time = "2025-05-13T15:23:59.629Z" }, +] + [[package]] name = "pyyaml-ft" version = "8.0.0" @@ -7682,6 +7832,27 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f0/77/b5ce9696c8cb955521a7941fbc443e78b2f504894c6ae1a2d0b1de6e12ae/wandb-0.28.0-py3-none-win_arm64.whl", hash = "sha256:c5b0faf1b84cf79ebabed77538c1940a4c6053e815f767a4004e877a1354bed1", size = 22378208, upload-time = "2026-06-23T00:38:47.148Z" }, ] +[[package]] +name = "watchdog" +version = "6.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/db/7d/7f3d619e951c88ed75c6037b246ddcf2d322812ee8ea189be89511721d54/watchdog-6.0.0.tar.gz", hash = "sha256:9ddf7c82fda3ae8e24decda1338ede66e1c99883db93711d8fb941eaa2d8c282", size = 131220, upload-time = "2024-11-01T14:07:13.037Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/68/98/b0345cabdce2041a01293ba483333582891a3bd5769b08eceb0d406056ef/watchdog-6.0.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:490ab2ef84f11129844c23fb14ecf30ef3d8a6abafd3754a6f75ca1e6654136c", size = 96480, upload-time = "2024-11-01T14:06:42.952Z" }, + { url = "https://files.pythonhosted.org/packages/85/83/cdf13902c626b28eedef7ec4f10745c52aad8a8fe7eb04ed7b1f111ca20e/watchdog-6.0.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:76aae96b00ae814b181bb25b1b98076d5fc84e8a53cd8885a318b42b6d3a5134", size = 88451, upload-time = "2024-11-01T14:06:45.084Z" }, + { url = "https://files.pythonhosted.org/packages/fe/c4/225c87bae08c8b9ec99030cd48ae9c4eca050a59bf5c2255853e18c87b50/watchdog-6.0.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a175f755fc2279e0b7312c0035d52e27211a5bc39719dd529625b1930917345b", size = 89057, upload-time = 
"2024-11-01T14:06:47.324Z" }, + { url = "https://files.pythonhosted.org/packages/a9/c7/ca4bf3e518cb57a686b2feb4f55a1892fd9a3dd13f470fca14e00f80ea36/watchdog-6.0.0-py3-none-manylinux2014_aarch64.whl", hash = "sha256:7607498efa04a3542ae3e05e64da8202e58159aa1fa4acddf7678d34a35d4f13", size = 79079, upload-time = "2024-11-01T14:06:59.472Z" }, + { url = "https://files.pythonhosted.org/packages/5c/51/d46dc9332f9a647593c947b4b88e2381c8dfc0942d15b8edc0310fa4abb1/watchdog-6.0.0-py3-none-manylinux2014_armv7l.whl", hash = "sha256:9041567ee8953024c83343288ccc458fd0a2d811d6a0fd68c4c22609e3490379", size = 79078, upload-time = "2024-11-01T14:07:01.431Z" }, + { url = "https://files.pythonhosted.org/packages/d4/57/04edbf5e169cd318d5f07b4766fee38e825d64b6913ca157ca32d1a42267/watchdog-6.0.0-py3-none-manylinux2014_i686.whl", hash = "sha256:82dc3e3143c7e38ec49d61af98d6558288c415eac98486a5c581726e0737c00e", size = 79076, upload-time = "2024-11-01T14:07:02.568Z" }, + { url = "https://files.pythonhosted.org/packages/ab/cc/da8422b300e13cb187d2203f20b9253e91058aaf7db65b74142013478e66/watchdog-6.0.0-py3-none-manylinux2014_ppc64.whl", hash = "sha256:212ac9b8bf1161dc91bd09c048048a95ca3a4c4f5e5d4a7d1b1a7d5752a7f96f", size = 79077, upload-time = "2024-11-01T14:07:03.893Z" }, + { url = "https://files.pythonhosted.org/packages/2c/3b/b8964e04ae1a025c44ba8e4291f86e97fac443bca31de8bd98d3263d2fcf/watchdog-6.0.0-py3-none-manylinux2014_ppc64le.whl", hash = "sha256:e3df4cbb9a450c6d49318f6d14f4bbc80d763fa587ba46ec86f99f9e6876bb26", size = 79078, upload-time = "2024-11-01T14:07:05.189Z" }, + { url = "https://files.pythonhosted.org/packages/62/ae/a696eb424bedff7407801c257d4b1afda455fe40821a2be430e173660e81/watchdog-6.0.0-py3-none-manylinux2014_s390x.whl", hash = "sha256:2cce7cfc2008eb51feb6aab51251fd79b85d9894e98ba847408f662b3395ca3c", size = 79077, upload-time = "2024-11-01T14:07:06.376Z" }, + { url = 
"https://files.pythonhosted.org/packages/b5/e8/dbf020b4d98251a9860752a094d09a65e1b436ad181faf929983f697048f/watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl", hash = "sha256:20ffe5b202af80ab4266dcd3e91aae72bf2da48c0d33bdb15c66658e685e94e2", size = 79078, upload-time = "2024-11-01T14:07:07.547Z" }, + { url = "https://files.pythonhosted.org/packages/07/f6/d0e5b343768e8bcb4cda79f0f2f55051bf26177ecd5651f84c07567461cf/watchdog-6.0.0-py3-none-win32.whl", hash = "sha256:07df1fdd701c5d4c8e55ef6cf55b8f0120fe1aef7ef39a1c6fc6bc2e606d517a", size = 79065, upload-time = "2024-11-01T14:07:09.525Z" }, + { url = "https://files.pythonhosted.org/packages/db/d9/c495884c6e548fce18a8f40568ff120bc3a4b7b99813081c8ac0c936fa64/watchdog-6.0.0-py3-none-win_amd64.whl", hash = "sha256:cbafb470cf848d93b5d013e2ecb245d4aa1c8fd0504e863ccefa32445359d680", size = 79070, upload-time = "2024-11-01T14:07:10.686Z" }, + { url = "https://files.pythonhosted.org/packages/33/e8/e40370e6d74ddba47f002a32919d91310d6074130fe4e17dabcafc15cbf1/watchdog-6.0.0-py3-none-win_ia64.whl", hash = "sha256:a1914259fa9e1454315171103c6a30961236f508b9b623eae470268bbcc6a22f", size = 79067, upload-time = "2024-11-01T14:07:11.845Z" }, +] + [[package]] name = "watchfiles" version = "1.2.0"