Scientific evaluation (nested CV), GenAI-value ablation, boosting tests; fix Pages images by ancongui · Pull Request #7 · fireflyframework/fireflyframework-datascience

ancongui · 2026-06-25T13:03:04Z

Scientific, unbiased evaluation

benchmarks/scientific_eval.py: nested 5-fold CV (an inner CV selects the model and the untouched outer fold scores it, so there is no selection bias) against LogReg, RandomForest and XGBoost, with a Wilcoxon test. Firefly AutoML beats a single LogReg (p=0.046) and a single XGBoost (p=7.5e-6), is on par with RandomForest, and adapts per dataset.
benchmarks/genai_value.py: a controlled ablation with a real LLM. GenAI feature engineering rediscovers revenue = price × units and lifts a linear model by +0.0205 ROC-AUC (p=0.0039); the gate guarantees no regression, and the run costs < $0.01.
tests/models/test_boosting.py: explicit fit, predict and params tests for XGBoost, LightGBM and CatBoost.
Integration tests for both harnesses (verified live). RESULTS.md and the docs are updated with the rigorous, honest numbers.
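The nested-CV protocol described for scientific_eval.py can be sketched in plain Python. This is a minimal illustration, not the benchmark's code: the learner interface, the toy majority and midpoint learners, accuracy as the metric and the inner fold count of 3 are all assumptions (the real harness compares LogReg, RandomForest and XGBoost). The point it shows is that model selection only ever sees the outer training split, so the outer score is not biased by the selection.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a learner trains on (X, y) and returns a predict function.
Predict = Callable[[float], int]
Learner = Callable[[Sequence[float], Sequence[int]], Predict]


def kfold(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(X, y, test: List[int]):
    """Return (X_train, y_train, X_test, y_test) for one fold."""
    held_out = set(test)
    train = [i for i in range(len(X)) if i not in held_out]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy(predict: Predict, X, y) -> float:
    return mean(1.0 if predict(x) == t else 0.0 for x, t in zip(X, y))


def cv_score(learner: Learner, X, y, k: int, seed: int) -> float:
    """Plain k-fold CV; only ever called on an outer training split."""
    scores = []
    for test in kfold(len(X), k, seed):
        Xtr, ytr, Xte, yte = split(X, y, test)
        scores.append(accuracy(learner(Xtr, ytr), Xte, yte))
    return mean(scores)


def nested_cv(candidates: Dict[str, Learner], X, y, outer_k: int = 5,
              inner_k: int = 3, seed: int = 0) -> List[Tuple[str, float]]:
    """Per outer fold: pick a candidate by inner CV, then score it once on the held-out fold."""
    results = []
    for test in kfold(len(X), outer_k, seed):
        Xtr, ytr, Xte, yte = split(X, y, test)
        # Selection sees only the outer training split, never Xte/yte.
        best = max(candidates,
                   key=lambda name: cv_score(candidates[name], Xtr, ytr, inner_k, seed + 1))
        model = candidates[best](Xtr, ytr)  # refit the winner on the whole outer training split
        results.append((best, accuracy(model, Xte, yte)))
    return results


def majority(X, y) -> Predict:
    label = int(2 * sum(y) >= len(y))
    return lambda x: label


def midpoint(X, y) -> Predict:
    # Threshold halfway between the two class means (assumes both classes are present).
    m0 = mean(x for x, t in zip(X, y) if t == 0)
    m1 = mean(x for x, t in zip(X, y) if t == 1)
    cut = (m0 + m1) / 2
    return lambda x: int((x > cut) == (m1 > m0))


if __name__ == "__main__":
    X = [float(i) for i in range(40)]
    y = [int(i >= 20) for i in range(40)]
    for name, score in nested_cv({"majority": majority, "midpoint": midpoint}, X, y):
        print(name, round(score, 3))
```

The per-fold outer scores returned here are the paired samples one would feed to a significance test against a baseline scored on the same folds.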
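The genai_value.py ablation pairs a baseline feature set with a GenAI-augmented one and gates the result. Below is a hedged sketch of those pieces, not the harness itself: add_product_feature hard-codes the revenue = price × units feature that the LLM proposes, the gate rule (keep the new features only if their mean validation score is strictly higher) is an assumed reading of "the gate guarantees no regression", and using an exact Wilcoxon signed-rank test on paired per-fold scores is an assumption about how the p-value could be obtained.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Sequence


def add_product_feature(rows: List[Dict[str, float]], a: str, b: str,
                        name: str) -> List[Dict[str, float]]:
    """Return new rows with name = a * b added (e.g. revenue = price * units)."""
    return [{**r, name: r[a] * r[b]} for r in rows]


def wilcoxon_signed_rank_exact(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact p-value for paired scores a[i] vs b[i].

    Zero differences are dropped and tied |d| get average ranks. The null
    distribution is enumerated over all 2**n sign patterns, so keep n small.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    center = sum(ranks) / 2  # W+ is symmetric about this value under H0
    observed = abs(w_plus - center)
    extreme = sum(
        1
        for signs in product((0, 1), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - center) >= observed - 1e-12
    )
    return extreme / 2 ** n


def gate(baseline_scores: Sequence[float], candidate_scores: Sequence[float]) -> str:
    """Adopt the engineered features only if they beat the baseline; ties keep the baseline."""
    return "candidate" if mean(candidate_scores) > mean(baseline_scores) else "baseline"
```

Because the gate falls back to the baseline whenever the candidate does not score higher, the adopted pipeline can never score below the baseline on the metric the gate checks; that is the sense in which such a gate prevents regression.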
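The fit, predict and params checks in test_boosting.py can be expressed as one reusable contract check over the scikit-learn-style estimator API that XGBClassifier, LGBMClassifier and CatBoostClassifier all expose. This is a sketch under assumptions: check_estimator_contract and MajorityStandIn are hypothetical names, the real test file may check different things, and the boosting libraries are not imported here, so a tiny stand-in exercises the checker.

```python
from typing import Any, Sequence


def check_estimator_contract(est: Any, X: Sequence, y: Sequence, param: str, value: Any) -> None:
    """Assert a scikit-learn-style fit/predict/params contract on one estimator."""
    # params: a value set through set_params must be visible through get_params
    est.set_params(**{param: value})
    assert est.get_params()[param] == value, f"{param} not round-tripped"
    # fit: returns the estimator itself so calls can be chained
    assert est.fit(X, y) is est, "fit() should return self"
    # predict: exactly one prediction per input row
    assert len(est.predict(X)) == len(X), "predict() length mismatch"


class MajorityStandIn:
    """Hypothetical stand-in with the same surface; not part of any real library."""

    def __init__(self, tie_break: int = 0):
        self.tie_break = tie_break

    def get_params(self, deep: bool = True) -> dict:
        return {"tie_break": self.tie_break}

    def set_params(self, **params: Any) -> "MajorityStandIn":
        for key, val in params.items():
            setattr(self, key, val)
        return self

    def fit(self, X: Sequence, y: Sequence) -> "MajorityStandIn":
        labels = list(y)
        ones, zeros = labels.count(1), labels.count(0)
        self.label_ = 1 if ones > zeros else 0 if zeros > ones else self.tie_break
        return self

    def predict(self, X: Sequence) -> list:
        return [self.label_] * len(X)


# With xgboost installed, the same checker could be pointed at e.g.
# check_estimator_contract(xgboost.XGBClassifier(), X, y, "max_depth", 3)
check_estimator_contract(MajorityStandIn(), [[0], [1], [2]], [0, 1, 1], "tie_break", 1)
```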

Fix: diagrams not rendering on GitHub Pages

Set use_directory_urls: false in mkdocs.yml so that raw-HTML <img src="img/.."> paths resolve from the site root on every page (the agentic-loop and other diagrams were returning 404 under directory URLs).
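Why the setting matters can be shown with Python's urllib.parse.urljoin. As we understand MkDocs, it adjusts relative paths in Markdown links and images, but a raw-HTML src is passed through untouched, so the browser resolves it against the page URL. With directory URLs, agentic-loop.md is served as agentic-loop/, so img/... lands one level too deep; with use_directory_urls: false it is served as agentic-loop.html next to the img/ folder. The base URL and image filename below are hypothetical, and the sketch covers top-level pages only.

```python
from urllib.parse import urljoin

# Hypothetical Pages base URL; the real one is not stated in this PR.
SITE = "https://example.github.io/project/"


def page_url(md_file: str, use_directory_urls: bool) -> str:
    """URL MkDocs gives a top-level page such as 'agentic-loop.md'."""
    stem = md_file.removesuffix(".md")
    if use_directory_urls:
        # agentic-loop.md -> agentic-loop/index.html, served as agentic-loop/
        path = "" if stem == "index" else f"{stem}/"
    else:
        # agentic-loop.md -> agentic-loop.html, beside the img/ folder
        path = f"{stem}.html"
    return urljoin(SITE, path)


def resolve_img(md_file: str, src: str, use_directory_urls: bool) -> str:
    """How a browser resolves a raw-HTML <img src> relative to the page URL."""
    return urljoin(page_url(md_file, use_directory_urls), src)


for flag in (True, False):
    print(flag, resolve_img("agentic-loop.md", "img/agentic-loop.png", flag))
```

The index page resolves the same way under both settings, which is why a diagram on the home page can render while the same markup 404s on other pages.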

Gate: ruff clean · pyright 0 errors · mkdocs --strict ok · 100 tests pass (9 integration tests deselected). No API key in the repo.

ancongui merged commit 24f7444 into main Jun 25, 2026

ancongui deleted the feat/scientific-eval branch June 25, 2026 13:03

ancongui merged 1 commit into main from feat/scientific-eval