fireflyframework · ancongui · Jun 25, 2026 · Jun 25, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -19,6 +19,11 @@ All notable changes to `fireflyframework-datascience` are documented here. The p
   generated diagram set (8 diagrams: architecture, hexagonal, automl-loop, genai-fusion, agentic-loop,
   auto-configuration, security, ecosystem) under `docs/img/`.
 - **Polished README** (compelling 5-line quick start, docs-site link) and a new **`CONTRIBUTING.md`**.
+- **Fully diagrammed** — all 8 diagrams embedded across the README ("how it works" visual tour) and the
+  docs pages; a `docs/README.md` table of contents for GitHub folder browsing.
+- **Repository metadata** — description, homepage (docs site), and 20 topics set via `gh`.
+- **Fix:** the `.gitignore` rule `datasets/` was excluding the `datasets` source module from git (it had
+  never been committed); anchored the data-artifact ignores to the repo root and formatted the module.
 
 ### AMLB benchmark (Tier-1)
 

diff --git a/README.md b/README.md
@@ -80,25 +80,68 @@ Add a real LLM for GenAI feature engineering and the agentic loop — see
 [Configuring the LLM](docs/llm-configuration.md). The full guided walkthrough is the
 [Tutorial](docs/tutorial.md).
 
-## Architecture
+## How it works
 
-Five acyclic layers, mirroring `fireflyframework-agentic` with a **DataScience** layer inserted. Every
-ML/MLOps library is a swappable adapter behind a `Protocol` port, registered by **entry-point
-auto-configuration** and resolved through a type-hint **dependency-injection container**.
+### Layered architecture
+
+Five acyclic layers, mirroring `fireflyframework-agentic` with a **DataScience** layer inserted:
+`Core → Agent (reused) → DataScience → Intelligence → Orchestration`.
 
 <p align="center">
-  <img src="docs/img/architecture.svg" alt="Firefly DataScience layered architecture" width="70%">
+  <img src="docs/img/architecture.svg" alt="Firefly DataScience layered architecture" width="78%">
 </p>
 
-```
-Core → Agent (reused: agentic) → DataScience → Intelligence → Orchestration
-```
+### Hexagonal ports & adapters
+
+Every ML/MLOps library (scikit-learn, XGBoost, AutoGluon, TabPFN, PyTorch Lightning, HuggingFace,
+MLflow, BentoML, …) is a swappable adapter behind a `Protocol` port. The core stays library-agnostic.
+
+<p align="center">
+  <img src="docs/img/hexagonal.svg" alt="Hexagonal ports and adapters" width="78%">
+</p>
+
+### Auto-configuration
+
+Adapters self-register via entry points and are wired by a type-hint dependency-injection container,
+gated by `@conditional_on_*` conditions, following the Spring Boot / pyfly auto-configuration model.
+
+<p align="center">
+  <img src="docs/img/auto-configuration.svg" alt="Entry-point auto-configuration" width="62%">
+</p>
+
+### Classical AutoML
+
+<p align="center">
+  <img src="docs/img/automl-loop.svg" alt="Classical AutoML pipeline" width="88%">
+</p>
+
+### Governed GenAI × classical fusion
+
+The LLM proposes code/features; a deterministic engine measures; a **cost/benefit gate** keeps only
+what beats the seeded baseline. The LLM never decides — the measured score does.
+
+<p align="center">
+  <img src="docs/img/genai-classical-fusion.svg" alt="Governed GenAI and classical fusion" width="78%">
+</p>
+
+### The agentic ML-engineering loop
+
+Propose → execute (sandboxed) → observe → **verify** (code that runs is not necessarily correct) → reflect → select.
+
+<p align="center">
+  <img src="docs/img/agentic-loop.svg" alt="Agentic ML-engineering loop" width="92%">
+</p>
+
+### Secure by default
+
+<p align="center">
+  <img src="docs/img/security.svg" alt="Secure-by-default execution tiers" width="88%">
+</p>
 
-The GenAI ↔ classical fusion is governed: the LLM proposes code; the classical engine measures; a
-cost/benefit gate keeps only what beats the baseline.
+### Where it fits
 
 <p align="center">
-  <img src="docs/img/genai-classical-fusion.svg" alt="Governed GenAI and classical fusion" width="70%">
+  <img src="docs/img/ecosystem.svg" alt="Firefly ecosystem" width="70%">
 </p>
 
 ## Documentation

diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,57 @@
+# Firefly DataScience — Documentation
+
+**The complete documentation set.** Browse it as a rendered site at
+**<https://fireflyframework.github.io/fireflyframework-datascience/>**, or read the Markdown here.
+
+<p align="center">
+  <img src="img/banner.svg" alt="Firefly DataScience" width="100%">
+</p>
+
+## Table of contents
+
+### Getting started
+| Page | What it covers |
+|---|---|
+| [Home / Overview](index.md) | what the framework is, the 7 pillars, the architecture at a glance |
+| [Quick Start](quickstart.md) | install, boot, your first AutoML run, the `firefly-ds` CLI |
+| [Tutorial](tutorial.md) | the guided, runnable end-to-end walkthrough (offline, tested) |
+| [Configuration](configuration.md) | env vars, `.env`, YAML, and profile precedence |
+| [Configuring the LLM](llm-configuration.md) | providers, API keys, model selection, cost & budget gating |
+
+### Concepts
+| Page | What it covers |
+|---|---|
+| [Architecture](architecture.md) | the five layers, hexagonal ports/adapters, the DI container, auto-configuration |
+| [Datasets](datasets.md) | the `Dataset` container, loaders, `train_test_split`, task inference |
+| [Classical AutoML](automl.md) | the `AutoML` facade, trainers, search policies, metrics, the leaderboard |
+| [GenAI Feature Engineering](genai-features.md) | propose → execute → measure → gate; the `CostBenefitGate` |
+| [Agentic ML-Engineering Loop](agentic-loop.md) | propose → train → verify → reflect → select |
+| [Deep Learning & TabFM](deep-learning.md) | sklearn-MLP, PyTorch Lightning, HuggingFace, TabPFN |
+| [Serving & Lineage](serving.md) | the in-process server, gated backends, lineage |
+| [Security Model](security.md) | secure code execution, sandbox tiers, prompt-injection defense |
+| [Benchmarks](benchmarks.md) | the three-tier AMLB/OpenML-anchored evaluation strategy |
+
+### Use case
+| Page | What it covers |
+|---|---|
+| [Lumen Lending — Credit Risk](use-case-lumen.md) | a full, realistic walkthrough end to end |
+
+## Diagrams
+
+All diagrams are generated (WeasyPrint-safe SVG, teal palette) by
+[`assets/tools/gen_diagrams.py`](../assets/tools/gen_diagrams.py) into [`img/`](img):
+
+| Diagram | What it shows |
+|---|---|
+| [Architecture](img/architecture.svg) | the five-layer design |
+| [Hexagonal ports](img/hexagonal.svg) | ports & adapters around a library-agnostic core |
+| [Auto-configuration](img/auto-configuration.svg) | entry-point discovery → conditions → beans |
+| [AutoML pipeline](img/automl-loop.svg) | the classical AutoML flow |
+| [GenAI × classical fusion](img/genai-classical-fusion.svg) | the governed fusion |
+| [Agentic loop](img/agentic-loop.svg) | propose → verify → reflect → select |
+| [Security tiers](img/security.svg) | the secure-by-default execution model |
+| [Ecosystem](img/ecosystem.svg) | how this sits beside Agentic and PyFly |
+
+---
+
+<sub>Copyright 2026 Firefly Software Foundation · Licensed under the Apache License 2.0</sub>
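The GenAI Feature Engineering row above summarises the governed flow as propose → execute → measure → gate. The gate's rule is simple: a proposal survives only if its measured score beats the seeded baseline by more than it costs. The following is a minimal sketch that assumes higher-is-better CV fold scores and a flat per-feature penalty. `GateDecision`, `cost_benefit_gate`, `min_gain`, and `cost_per_feature` are hypothetical names used for illustration. They are not the framework's `CostBenefitGate` API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GateDecision:
    accepted: bool
    net_gain: float  # improvement over the baseline after the cost penalty


def cost_benefit_gate(
    baseline_scores: list[float],
    candidate_scores: list[float],
    n_new_features: int,
    min_gain: float = 0.001,         # hypothetical acceptance threshold
    cost_per_feature: float = 0.0005,  # hypothetical complexity penalty
) -> GateDecision:
    """Accept a proposal only if it beats the baseline by more than its cost.

    Both score lists are assumed to be higher-is-better fold scores computed on
    the same seeded CV splits, so the comparison is reproducible.
    """
    raw_gain = mean(candidate_scores) - mean(baseline_scores)
    net_gain = raw_gain - cost_per_feature * n_new_features
    return GateDecision(accepted=net_gain > min_gain, net_gain=net_gain)


baseline = [0.81, 0.80, 0.82]     # seeded baseline CV scores
helpful = [0.84, 0.83, 0.85]      # proposal with a clear lift
marginal = [0.812, 0.801, 0.82]   # tiny lift that does not pay for two features

print(cost_benefit_gate(baseline, helpful, n_new_features=2))
print(cost_benefit_gate(baseline, marginal, n_new_features=2))
```

In this sketch, the LLM only influences which candidate scores exist. The accept/reject decision is a pure function of measured numbers.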
diff --git a/docs/agentic-loop.md b/docs/agentic-loop.md
@@ -10,6 +10,10 @@ an iteration and patience budget.
 
 The whole cycle is: **propose → train/CV → verify → reflect → select**.
 
+<p align="center">
+  <img src="img/agentic-loop.svg" alt="The agentic ML-engineering loop" width="85%">
+</p>
+
 ## The pieces
 
 | Type | Role |

diff --git a/docs/architecture.md b/docs/architecture.md
@@ -168,3 +168,11 @@ Passing `auto_configurations=[...]` **replaces** discovery entirely (handy for h
 - [Configuration](./configuration.md)
 - [Ports and adapters reference](index.md)
 - [Writing an auto-configuration](index.md)
+
+## Auto-configuration flow
+
+Adapters self-register via the `firefly_datascience.auto_configuration` entry-point group; the application context discovers them, evaluates their conditions, and registers the surviving beans.
+
+<p align="center">
+  <img src="img/auto-configuration.svg" alt="Entry-point auto-configuration" width="62%">
+</p>
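The flow described above (discover, evaluate conditions, register the survivors) can be sketched with the standard library's `importlib.metadata.entry_points` (the `group=` keyword needs Python 3.10+). This is a simplified illustration, not the framework's implementation. Only the entry-point group name comes from this page. `conditional_on_module`, `Context`, `bootstrap`, and the `beans()` method are hypothetical stand-ins for the real `@conditional_on_*` decorators and application context. As on this page, passing an explicit list replaces discovery entirely.

```python
from __future__ import annotations

import importlib.util
from importlib.metadata import entry_points
from typing import Any, Callable

GROUP = "firefly_datascience.auto_configuration"  # entry-point group named on this page


def conditional_on_module(module: str) -> Callable[[type], type]:
    """Hypothetical condition: enable a configuration only if `module` is importable."""

    def decorate(cls: type) -> type:
        conditions = list(getattr(cls, "_conditions", ()))
        conditions.append(lambda: importlib.util.find_spec(module) is not None)
        cls._conditions = conditions
        return cls

    return decorate


class Context:
    """Toy registry of beans keyed by type (stand-in for the real DI container)."""

    def __init__(self) -> None:
        self.beans: dict[type, Any] = {}

    def register(self, bean: Any) -> None:
        self.beans[type(bean)] = bean


def bootstrap(auto_configurations: list[type] | None = None) -> Context:
    """Discover configurations, filter them by their conditions, register the beans."""
    ctx = Context()
    if auto_configurations is None:
        # Discovery: each installed adapter package advertises its configuration class.
        auto_configurations = [ep.load() for ep in entry_points(group=GROUP)]
    for config_cls in auto_configurations:
        # Every condition must pass; otherwise the whole configuration is skipped.
        if all(check() for check in getattr(config_cls, "_conditions", ())):
            for bean in config_cls().beans():
                ctx.register(bean)
    return ctx


# Usage: an explicit list replaces discovery, which is handy in tests.
class JsonCodec:
    pass


@conditional_on_module("json")  # stdlib, so this condition passes
class JsonAutoConfiguration:
    def beans(self) -> list[Any]:
        return [JsonCodec()]


@conditional_on_module("no_such_module_xyz")  # not installed, so this one is skipped
class MissingAutoConfiguration:
    def beans(self) -> list[Any]:
        return [object()]


ctx = bootstrap([JsonAutoConfiguration, MissingAutoConfiguration])
print(sorted(t.__name__ for t in ctx.beans))
```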
diff --git a/docs/datasets.md b/docs/datasets.md
@@ -8,6 +8,10 @@ scikit-learn are imported lazily), so the `Dataset` type and the `DatasetLoaderP
 usable without the `tabular` extra installed. Concrete loaders live in
 `fireflyframework_datascience.datasets.adapters`.
 
+<p align="center">
+  <img src="img/hexagonal.svg" alt="Hexagonal ports and adapters" width="85%">
+</p>
+
 ## The `Dataset` container
 
 `Dataset` is a dataclass. The only required fields are `name` and `X`.
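The datasets page describes three design choices: a `Dataset` dataclass that requires only `name` and `X`, a `Protocol` port, and heavy libraries imported lazily so the core types stay importable without the `tabular` extra. The sketch below combines them. `LoaderPort`, `StdlibCsvLoader`, `PandasCsvLoader`, `describe`, and the optional `y` field are hypothetical and used for illustration only. The framework's real port and loaders may differ.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class Dataset:
    name: str       # required
    X: Any          # required: feature rows or table
    y: Any = None   # assumption: optional target column


@runtime_checkable
class LoaderPort(Protocol):
    """Hypothetical port: any object with a matching `load` method satisfies it."""

    def load(self, name: str, text: str, target: str | None = None) -> Dataset: ...


class StdlibCsvLoader:
    """Zero-dependency adapter: parses CSV text into a list of dicts."""

    def load(self, name: str, text: str, target: str | None = None) -> Dataset:
        rows = list(csv.DictReader(io.StringIO(text)))
        y = [row.pop(target) for row in rows] if target else None
        return Dataset(name=name, X=rows, y=y)


class PandasCsvLoader:
    """Optional adapter: pandas is imported only when `load` actually runs."""

    def load(self, name: str, text: str, target: str | None = None) -> Dataset:
        import pandas as pd  # lazy import keeps this module importable without pandas

        frame = pd.read_csv(io.StringIO(text))
        y = frame.pop(target) if target else None
        return Dataset(name=name, X=frame, y=y)


def describe(loader: LoaderPort, text: str) -> str:
    # Core code depends only on the port, never on a concrete library.
    ds = loader.load("demo", text, target="label")
    return f"{ds.name}: {len(ds.X)} rows"


text = "age,income,label\n30,52000,0\n45,61000,1\n"
print(describe(StdlibCsvLoader(), text))
# Structural typing: the pandas adapter satisfies the port without importing pandas.
print(isinstance(PandasCsvLoader(), LoaderPort))
```

Neither adapter inherits from the port. Conformance is structural, which is what keeps the core library-agnostic.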

diff --git a/docs/security.md b/docs/security.md
@@ -4,6 +4,10 @@
 
 The GenAI accelerators (CAAFE-style automated feature engineering, agentic analysis) ask a model to *write Python that runs against your data*. That is an attack surface. The framework's job is to make the default path safe even when the model is wrong, compromised, or steered by adversarial data. This page describes the trust model, the controls that enforce it, and — importantly — where those controls stop.
 
+<p align="center">
+  <img src="img/security.svg" alt="Secure-by-default execution tiers" width="85%">
+</p>
+
 ## Threat model
 
 The model is **not** trusted. We assume any of:

diff --git a/docs/serving.md b/docs/serving.md
@@ -4,6 +4,10 @@
 
 Firefly DataScience keeps the core dependency-free. A fitted `Model` is served by a `ModelServerPort`; experiment runs go through a `TrackerPort`; data/model lineage flows through a `LineagePort`. Each port ships a zero-dependency default and an opt-in adapter behind an extra.
 
+<p align="center">
+  <img src="img/ecosystem.svg" alt="Firefly ecosystem" width="85%">
+</p>
+
 ## The model-server port
 
 ```python

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -75,5 +75,8 @@ nav:
       - Benchmarks: benchmarks.md
   - Use case — Lumen: use-case-lumen.md
 
-not_in_nav: |
-  /superpowers/
+# README.md is the GitHub folder index (table of contents); index.md is the site home. Keep the
+# former in the repo but exclude it (and the local-only specs) from the built site.
+exclude_docs: |
+  README.md
+  superpowers/
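Both the `.gitignore` fix in the changelog and the `exclude_docs` block rely on gitignore pattern semantics, which MkDocs documents as the format for `exclude_docs`. A pattern with no leading or middle `/` matches at any depth. That is why an unanchored `datasets/` also caught the `datasets` source package, while an anchored pattern such as `/datasets/` matches only at the root. The same rule means the unanchored `README.md` and `superpowers/` entries apply at any depth under `docs/`.

The sketch below models only that rule, for literal directory patterns. `matches_dir_pattern` is a hypothetical helper, not git's or MkDocs' matcher, and the example paths are an assumed layout.

```python
from pathlib import PurePosixPath


def matches_dir_pattern(path: str, pattern: str) -> bool:
    """Does a literal directory pattern ("name/" or "/a/b/") ignore `path`?

    Simplified gitignore rules (no globs, no negation, no nested .gitignore):
    - the trailing "/" restricts the pattern to directories, so only the
      directories containing `path` are candidates;
    - a leading or middle "/" anchors the pattern to the root; without one,
      the name may match a directory at any depth.
    """
    body = pattern.rstrip("/")
    if not pattern.endswith("/") or not body.strip("/"):
        raise ValueError("expected a non-empty directory pattern ending in '/'")
    anchored = "/" in body
    target = PurePosixPath(body.lstrip("/")).parts
    dirs = PurePosixPath(path).parts[:-1]  # ancestor directories of the file
    if anchored:
        return dirs[: len(target)] == target
    return target[0] in dirs  # unanchored patterns are a single name


module_file = "src/fireflyframework_datascience/datasets/loaders.py"  # assumed layout
data_file = "datasets/cache/train.parquet"  # assumed data artifact

for pattern in ("datasets/", "/datasets/"):
    print(pattern, matches_dir_pattern(module_file, pattern), matches_dir_pattern(data_file, pattern))
```

The unanchored pattern matches both paths. The anchored one matches only the root-level data directory, which is the behaviour the changelog fix restores.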