Firefly DataScience

AutoML that fuses GenAI with classical ML & Deep Learning: hexagonal, secure-by-default, native to the Firefly Framework.

Python 3.13+ · License: Apache 2.0 · Built on Firefly Agentic · ruff · pyright

The LLM proposes; a deterministic classical engine decides. GenAI is a governed, measurably gated accelerator over a battle-tested classical core, never a black box.

Copyright 2026 Firefly Software Foundation · Licensed under the Apache License 2.0


Status: all sub-projects delivered and green (ruff · pyright · 90+ tests). Classical tabular AutoML · GenAI feature engineering · the agentic ML-engineering loop · deep learning (PyTorch Lightning) + NLP (HuggingFace) + vision · TabFM · serving · the OpenML-AMLB benchmark harness. New here? Start with the Tutorial or browse the documentation site.

What is this?

fireflyframework-datascience is a state-of-the-art Python metaframework for AutoML. It combines GenAI (built on fireflyframework-agentic, which wraps Pydantic AI) with traditional ML and Deep Learning, so any team can apply data science to any project quickly, with production governance, hexagonal swappability, and security by default.

  • One reproducible pattern. The LLM proposes code/features/pipelines/seeds; a deterministic classical engine trains, scores, and selects; every GenAI step is gated behind a measured improvement over a seeded classical baseline.
  • Hexagonal & swappable. Each ML/MLOps library sits behind a Protocol port, so the core stays library-agnostic. Adapters that ship today: scikit-learn, XGBoost, LightGBM, CatBoost, TabPFN, PyTorch Lightning, HuggingFace, and MLflow. Ports with reference or planned adapters (AutoGluon, Feast, BentoML packaging, a model registry) are marked as such in the docs: the seams exist; the adapters are landing.
  • Firefly-native. Auto-configuration, dependency injection, a startup banner + wiring summary, CalVer, and the same CI gates as the rest of the Firefly Framework.

Why it matters

Beyond the engineering, Firefly DataScience is designed to change five things that decide whether data science actually delivers business value:

  • Faster time-to-value: AutoML chooses and tunes the model and an agentic loop iterates, so a benchmarked, production-grade model is days of work, not quarters.
  • Governed GenAI: the LLM proposes, a deterministic engine measures, and a cost/benefit gate keeps only what beats the baseline. Every decision is logged and auditable; no unproven AI output ships.
  • No vendor lock-in: open (Apache-2.0) and hexagonal. Every ML library and every LLM provider is a swappable adapter, and the whole framework is self-hostable.
  • Lower cost & risk: classical-first (cheap, reproducible) with secure-by-default execution of any generated code; generative AI is used only where it measurably pays.
  • Production-ready: serving, data validation, lineage and real benchmarks are built in, not bolted on.

Proven, not promised: unbiased and significance-tested. Under nested cross-validation (no selection bias), Firefly's AutoML significantly beats a single LogisticRegression (Δ +0.029, p = 0.046) and a single XGBoost (Δ +0.030, p = 7.5e-6), and is statistically on par with RandomForest. It adapts per dataset, gaining up to +0.15 on the non-linear phoneme dataset. With a real LLM (claude-haiku-4-5), governed GenAI feature engineering adds a significant +0.021 lift on a linear model (p = 0.0039) by rediscovering a withheld driver (revenue = price × units) from the schema, and the cost/benefit gate guarantees it never regresses, at < $0.01. Every number is reproducible; see the benchmark results.

📄 The whole story in one document: The Complete Guide (PDF) combines the executive summary and strategic case with the architecture, a full hands-on tutorial, and the benchmark evidence, for both leaders and engineers.

Quick start

uv add 'fireflyframework-datascience[tabular]'        # classical AutoML
# or:  uv add 'fireflyframework-datascience[automl-stack]'   # + TabPFN, MLflow, OpenML

Train, rank, and evaluate models in a few lines:

from fireflyframework_datascience.automl import AutoML
from fireflyframework_datascience.datasets.adapters import SklearnDatasetLoader

train, test = SklearnDatasetLoader().load("breast_cancer").train_test_split()
result = AutoML().fit(train)               # cross-validates candidates, picks the winner
print(result.leaderboard_table())          # random_forest / linear / hist_gradient_boosting …
print(result.evaluate(test))               # holdout roc_auc ≈ 0.98

Boot it as a Firefly application (auto-configuration + dependency injection), or use the CLI:

firefly-ds doctor       # check your environment & installed adapters
firefly-ds introspect   # boot the app and show discovered auto-configurations

Add a real LLM for GenAI feature engineering and the agentic loop; see Configuring the LLM. The full guided walkthrough is the Tutorial.

How it works

Layered architecture

Five acyclic layers, mirroring fireflyframework-agentic with a DataScience layer inserted: Core → Agent (reused) → DataScience → Intelligence → Orchestration.

[Figure: Firefly DataScience layered architecture]
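
One way to keep a layered design acyclic is to check import direction mechanically. The sketch below is illustrative only: it assumes the arrows mean that each layer may depend only on layers to its left, and the import maps are hypothetical, not taken from the codebase.

```python
# Layer order as listed above, lowest first.
LAYERS = ["core", "agent", "datascience", "intelligence", "orchestration"]
RANK = {name: index for index, name in enumerate(LAYERS)}


def layering_violations(imports: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs where a layer reaches upward."""
    return sorted(
        (source, target)
        for source, targets in imports.items()
        for target in targets
        if RANK[target] > RANK[source]
    )


# Hypothetical import maps, for illustration only.
clean = {"datascience": {"core", "agent"}, "orchestration": {"intelligence", "core"}}
broken = {"core": {"orchestration"}, "intelligence": {"datascience"}}

clean_report = layering_violations(clean)    # no upward imports
broken_report = layering_violations(broken)  # core reaching up to orchestration
```

A check like this could run as a CI gate next to ruff and pyright, though whether the project does so is not stated here.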

Hexagonal ports & adapters

Each ML/MLOps library sits behind a Protocol port, so the core stays library-agnostic. Shipping adapters today: scikit-learn, XGBoost, LightGBM, CatBoost, TabPFN, PyTorch Lightning, HuggingFace, MLflow. AutoGluon, Feast, BentoML packaging and a model registry are ports with reference/planned adapters.

[Figure: Hexagonal ports and adapters]
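
To make the port/adapter idea concrete, here is a minimal sketch using `typing.Protocol`. The names `TrainerPort`, `MajorityClassTrainer`, and `accuracy` are hypothetical and are not the framework's actual ports; the toy adapter stands in for a wrapper around a real library.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class TrainerPort(Protocol):
    """Hypothetical port: the only surface the core is allowed to see."""

    name: str

    def fit(self, rows: Sequence[Sequence[float]], labels: Sequence[int]) -> None: ...

    def predict(self, rows: Sequence[Sequence[float]]) -> list[int]: ...


@dataclass
class MajorityClassTrainer:
    """Toy adapter; a real one would delegate to its underlying library."""

    name: str = "majority_class"
    _label: int = 0

    def fit(self, rows, labels):
        # Remember the most frequent label seen during training.
        self._label = max(set(labels), key=list(labels).count)

    def predict(self, rows):
        return [self._label for _ in rows]


def accuracy(trainer: TrainerPort, rows, labels) -> float:
    """Core-side code: depends on the port, never on a concrete library."""
    trainer.fit(rows, labels)
    predictions = trainer.predict(rows)
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


rows = [[0.1], [0.4], [0.8], [0.9]]
labels = [0, 1, 1, 1]
adapter = MajorityClassTrainer()
score = accuracy(adapter, rows, labels)  # 3 of 4 rows carry the majority label
```

Because the adapter satisfies the Protocol structurally, it never has to import or inherit from the port, which is what lets adapters be swapped without touching the core.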

Auto-configuration

Adapters self-register via entry points and are wired by a type-hint dependency-injection container, gated by @conditional_on_*, exactly like Spring Boot / pyfly.

[Figure: Entry-point auto-configuration]

Classical AutoML

[Figure: Classical AutoML pipeline]

Governed GenAI × classical fusion

The LLM proposes code/features; a deterministic engine measures; a cost/benefit gate keeps only what beats the seeded baseline. The LLM never decides; the measured score does.

[Figure: Governed GenAI and classical fusion]
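
A cost/benefit gate of this kind can be sketched in a few lines. Everything here is an assumption made for illustration: the `cost_benefit_gate` name, the `min_lift` and `max_cost_usd` thresholds, and the per-fold scores are made up and do not come from the framework or its benchmarks.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GateDecision:
    accepted: bool
    lift: float
    reason: str


def cost_benefit_gate(
    baseline_scores: list[float],
    candidate_scores: list[float],
    *,
    min_lift: float = 0.005,      # hypothetical minimum improvement
    cost_usd: float = 0.0,
    max_cost_usd: float = 0.05,   # hypothetical spending cap
) -> GateDecision:
    """Accept a GenAI proposal only if it is affordable and measurably better."""
    lift = mean(candidate_scores) - mean(baseline_scores)
    if cost_usd > max_cost_usd:
        return GateDecision(False, lift, "over budget")
    if lift < min_lift:
        return GateDecision(False, lift, "no measured improvement")
    return GateDecision(True, lift, "beats seeded baseline")


# Made-up per-fold scores, for illustration only.
baseline = [0.80, 0.82, 0.81]
helpful = cost_benefit_gate(baseline, [0.84, 0.85, 0.83], cost_usd=0.01)
useless = cost_benefit_gate(baseline, [0.80, 0.81, 0.82], cost_usd=0.01)
pricey = cost_benefit_gate(baseline, [0.90, 0.91, 0.92], cost_usd=1.00)
```

Because a rejected proposal leaves the baseline pipeline untouched, the worst case under such a gate is the classical result. The decision depends only on measured scores and cost, never on the LLM's own judgement of its output.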

The agentic ML-engineering loop

Propose → execute (sandboxed) → observe → verify (correctness ≠ ran) → reflect → select.

[Figure: Agentic ML-engineering loop]

Secure by default

[Figure: Secure-by-default execution tiers]

Where it fits

[Figure: Firefly ecosystem]

Documentation

📖 Full docs site: https://fireflyframework.github.io/fireflyframework-datascience/

Guide

  • Tutorial - the guided end-to-end walkthrough (runs offline; tested)
  • Samples - runnable demos: tutorial, real-LLM showcase, finance/retail
  • Quick Start - install, boot, first AutoML run, the firefly-ds CLI
  • Configuring the LLM - providers, API keys, model selection, cost gating
  • Architecture - layers, hexagonal ports, auto-configuration, the DI container
  • Configuration - env / .env / YAML / profiles precedence
  • Datasets - the Dataset container and loaders
  • Classical AutoML - the AutoML facade, trainers, search, metrics, calibration, ensembling, PR-AUC selection & CV strategies
  • Explainability - deterministic global + local feature importances (permutation, SHAP)
  • GenAI Feature Engineering - propose → execute → measure → gate; the persisted audit trail
  • Agentic ML-Engineering Loop - propose → verify → reflect → select
  • Deep Learning & TabFM - MLP, TabPFN, the PyTorch integration point
  • Serving & Lineage - in-process and gated servers, lineage
  • Security Model - secure code execution, sandbox tiers, prompt-injection defense
  • Benchmarks - the three-tier AMLB-anchored evaluation strategy
  • Use Case: Lumen Lending - the end-to-end credit-risk walkthrough

License

Apache-2.0. Copyright 2026 Firefly Software Foundation.