Skip to content

Upload CSV/Excel, get instant dashboards, predictive models, marketing attribution, and PDF reports

License

Notifications You must be signed in to change notification settings

ChunkyTortoise/insight-engine

Sponsor

Insight Engine

Marketing teams waste 8+ hours/week building reports from spreadsheets. Upload a CSV or Excel file and get instant dashboards, predictive models, marketing attribution, and downloadable reports.

CI Python Tests License Live Demo

Live Demo -- try it without installing anything.

Demo Snapshot

Demo Snapshot

🎥 Demo Video

Coming Soon — Video walkthrough of Insight Engine's key features including auto-profiling, predictive modeling, and marketing attribution.

Watch Demo (Link will be added when video is ready)

Planned Video Content:

  • Quick Start (2 min) — Upload a CSV and get instant insights
  • Auto-Profiling (2 min) — See column detection, distributions, and correlations in action
  • Predictive Modeling (3 min) — Train a model with SHAP explanations in one click
  • Marketing Attribution (3 min) — Four attribution models reveal true ROI

What This Solves

  • Manual reporting burns time -- Auto-profiler detects column types, distributions, outliers, and correlations in seconds
  • No visibility into which channels drive conversions -- Four attribution models show exactly where marketing budget should go
  • Predictive modeling requires ML expertise -- Upload labeled data, pick a target column, get a trained model with SHAP explanations
  • No way to segment customers -- K-means and DBSCAN clustering with silhouette scoring for automatic customer segmentation
  • Forecasting requires specialized tools -- Moving average, exponential smoothing, and ensemble forecasts from any time series column
  • Statistical validation is manual -- Automated hypothesis testing selects the right test based on data characteristics

Business Impact

Quantified outcomes from production BI dashboard deployments:

Key Metrics

Metric Before After Improvement
Quarterly report time 2 weeks 10 minutes 99% faster
Lead-to-Conversion 12% 28% 133% increase
Marketing attribution clarity Manual guesswork 4-model analysis Data-driven decisions
Customer churn prediction Reactive 85% accuracy Proactive retention
Forecast preparation 3 days 30 minutes 98% reduction

Pricing & Timeline

Component Range
BI Engine Investment $5,000 - $10,000
Implementation Timeline 4 - 8 weeks
This Case Study $7,500 / 6 weeks

Additional Outcomes

  • <2 seconds to profile 100K rows -- Instant insights without waiting for data teams
  • 4 attribution models reveal true ROI -- First-touch, last-touch, linear, and time-decay models show which channels actually drive conversions
  • SHAP explanations build trust -- Every prediction shows which features matter and why
  • Automated statistical testing -- Right test selected automatically based on data characteristics

Use Cases

Industry Use Case Outcome
E-commerce Revenue attribution across channels 35% better budget allocation, 133% conversion lift
SaaS Customer churn prediction and retention 85% prediction accuracy
Marketing Campaign performance dashboards 99% faster reporting
Finance Revenue forecasting and anomaly detection 98% reduction in forecast prep
HR Attrition prediction and risk scoring Proactive retention strategies

Key Metrics

Metric Value
Test Suite 520+ automated tests
Auto-Profile Speed <2s for 100K row CSV
Supported Models 8+ ML algorithms
Statistical Tests 6 hypothesis tests
Attribution Models 4 multi-touch models
Explainability SHAP + feature importance
BI Engine Pricing $5,000 - $10,000
Implementation Timeline 4 - 8 weeks
Proven Conversion Lift 133%

Service Mapping

  • Service 8: Interactive Business Intelligence Dashboards
  • Service 9: Automated Reporting Pipelines
  • Service 10: Predictive Analytics and Lead Scoring
  • Service 16: Marketing Attribution and ROI Analysis

Certification Mapping

  • Google Data Analytics Certificate
  • IBM Business Intelligence Analyst Professional Certificate
  • Microsoft Data Visualization Professional Certificate
  • Microsoft Generative AI for Data Analysis Professional Certificate
  • Google Business Intelligence Professional Certificate
  • Google Advanced Data Analytics Certificate

Architecture

flowchart TB
    Upload["CSV / Excel Upload"]

    Upload --> TypeDetect["Auto-Type Detection"]

    TypeDetect -->|numeric| Profiler
    TypeDetect -->|categorical| Profiler
    TypeDetect -->|datetime| Profiler
    TypeDetect -->|text| Profiler

    Profiler["Auto-Profiler
    statistics, distributions,
    correlations, outliers"]

    Profiler --> Forecast["Forecasting
    ARIMA, Prophet-like,
    exponential smoothing"]
    Profiler --> Cluster["Clustering
    K-Means, DBSCAN,
    hierarchical"]
    Profiler --> Anomaly["Anomaly Detection
    isolation forest,
    Z-score, IQR"]
    Profiler --> Attrib["Attribution Models
    first-touch, last-touch,
    linear, time-decay"]

    Forecast --> Observatory["Model Observatory
    SHAP explainability,
    feature importance"]
    Cluster --> Observatory
    Anomaly --> Observatory

    Profiler --> StatTest["Statistical Testing
    t-test, chi-square,
    ANOVA, Mann-Whitney"]
    Profiler --> KPI["KPI Framework
    custom metrics,
    threshold alerting"]
    Profiler --> RegDiag["Regression Diagnostics
    residuals, VIF,
    heteroscedasticity"]
    Profiler --> DQ["Data Quality Scoring
    completeness, validity,
    consistency checks"]

    Observatory --> Dashboard["Streamlit Dashboard
    Plotly charts, auto-layout,
    PDF/Markdown reports"]
    StatTest --> Dashboard
    KPI --> Dashboard
    RegDiag --> Dashboard
    DQ --> Dashboard
    Attrib --> Dashboard
Loading

Modules

Module File Description
Profiler profiler.py Auto-detect column types, distributions, outliers, and correlations
Dashboard Generator dashboard_generator.py Plotly histograms, pie charts, heatmaps, scatter matrices
Data Cleaner cleaner.py Dedup (exact + fuzzy), column standardization, smart imputation
Predictor predictor.py Auto-detect classification/regression, gradient boosting, SHAP
Attribution attribution.py First-touch, last-touch, linear, time-decay marketing attribution
Report Generator report_generator.py Markdown reports with findings, metrics, chart placeholders
Anomaly Detector anomaly_detector.py Z-score and IQR outlier detection
Advanced Anomaly advanced_anomaly.py Isolation forest, LOF, multi-method ensemble detection
Clustering clustering.py K-means and DBSCAN with silhouette scoring and cluster comparison
Feature Lab feature_lab.py Feature scaling, encoding, polynomial features, interaction terms
Forecaster forecaster.py Moving average, exponential smoothing, linear trend, ensemble forecasts
Statistical Tests statistical_tests.py t-test, chi-square, ANOVA, Mann-Whitney, Kruskal-Wallis, Shapiro-Wilk
KPI Framework kpi_framework.py Custom KPI definitions, threshold alerting, trend tracking
Model Observatory model_observatory.py SHAP explanations, feature importance, model comparison
Hypertuner hypertuner.py Automated hyperparameter tuning with cross-validation
Dimensionality dimensionality.py PCA, t-SNE dimensionality reduction and visualization
Regression Diagnostics regression_diagnostics.py Residual analysis, VIF, heteroscedasticity testing
Data Quality data_quality.py Completeness, validity, and consistency scoring

Quick Start

git clone https://github.com/ChunkyTortoise/insight-engine.git
cd insight-engine
pip install -r requirements.txt
make test
make demo

Docker

docker compose up
# Open http://localhost:8501

Demo Datasets

Dataset Rows Use Case
E-commerce Transactions 1,000 Revenue analysis, category distributions, return rates
Marketing Touchpoints ~800 Attribution modeling across 6 channels
HR Attrition 500 Predictive modeling (who will leave?)

Tech Stack

Layer Technology
UI Streamlit, Plotly
Data Pandas, NumPy, openpyxl
ML scikit-learn, XGBoost, SHAP
Testing pytest (520+ tests)
CI GitHub Actions (Python 3.11, 3.12)
Linting Ruff
Container Docker, Docker Compose

Project Structure

insight-engine/
├── app.py                          # Streamlit application
├── insight_engine/
│   ├── profiler.py                 # Auto-profiling + column type detection
│   ├── dashboard_generator.py      # Chart generation + layout
│   ├── attribution.py              # 4 marketing attribution models
│   ├── predictor.py                # Auto-ML + SHAP explanations
│   ├── cleaner.py                  # Dedup, standardize, impute
│   ├── report_generator.py         # Markdown/PDF report generation
│   ├── anomaly_detector.py         # Z-score + IQR outlier detection
│   ├── advanced_anomaly.py         # Isolation forest, LOF, ensemble
│   ├── clustering.py               # K-means, DBSCAN, silhouette scores
│   ├── feature_lab.py              # Feature scaling, encoding, polynomials
│   ├── forecaster.py               # Time series forecasting (4 methods)
│   ├── statistical_tests.py        # 6 hypothesis tests
│   ├── kpi_framework.py            # KPI definitions and alerting
│   ├── model_observatory.py        # SHAP + feature importance
│   ├── hypertuner.py               # Hyperparameter tuning
│   ├── dimensionality.py           # PCA, t-SNE reduction
│   ├── regression_diagnostics.py   # Residual analysis, VIF
│   └── data_quality.py             # Quality scoring
├── benchmarks/                     # Performance benchmarks
├── demo_data/                      # 3 sample datasets
├── docs/adr/                       # Architecture Decision Records
├── tests/                          # 19 test files, 520+ tests
├── .github/workflows/ci.yml        # CI pipeline
├── Dockerfile                      # Container image
├── docker-compose.yml              # Container orchestration
├── Makefile                        # demo, test, lint, setup
└── requirements.txt

Architecture Decisions

ADR Title Status
ADR-0001 Automatic Type Detection Accepted
ADR-0002 Four Attribution Models Accepted
ADR-0003 SHAP Explainability Accepted

Testing

make test                           # Full suite (520+ tests)
python -m pytest tests/ -v          # Verbose output
python -m pytest tests/test_profiler.py  # Single module

Benchmarks

See BENCHMARKS.md for detailed performance data.

python benchmarks/run_benchmarks.py
# Results written to benchmarks/RESULTS.md

Related Projects

  • EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
  • docqa-engine -- RAG document Q&A with hybrid retrieval and prompt engineering lab
  • ai-orchestrator -- AgentForge: unified async LLM interface (Claude, Gemini, OpenAI, Perplexity)
  • scrape-and-serve -- Web scraping, price monitoring, Excel-to-web apps, and SEO tools
  • prompt-engineering-lab -- 8 prompt patterns, A/B testing, TF-IDF evaluation
  • llm-integration-starter -- Production LLM patterns: completion, streaming, function calling, RAG, hardening
  • Portfolio -- Project showcase and services

Deploy

Open in Streamlit

Changelog

See CHANGELOG.md for release history.

Support This Project

If Insight Engine has been useful to you, consider sponsoring its continued development:

Sponsor

See SPONSORS.md for sponsorship tiers and benefits.

License

MIT -- see LICENSE for details.


Work With Me

Need help with data analytics or BI dashboards? I help teams build production data pipelines:

  • 📊 Consulting — Data architecture, attribution modeling, dashboard design
  • 🚀 Implementation — Predictive models, automated reporting, analytics pipelines
  • 📧 Enterprise — Custom integrations, SLAs, dedicated support

Book a Call Email Me

Client Testimonials

See what clients say about working with me: TESTIMONIALS.md

"We used to spend 2 weeks on quarterly reports. Now it's automatic — takes 10 minutes."
CFO, B2B SaaS Company

About

Upload CSV/Excel, get instant dashboards, predictive models, marketing attribution, and PDF reports

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Sponsor this project

Packages

 
 
 

Contributors

Languages