Predictive Analytics Agent

An automated machine learning pipeline that profiles datasets, preprocesses data, selects features, trains and compares multiple models, and generates a professional evaluation report, all without requiring an API key.

Try it live: eugen-goebel-predictive-analytics-agent-app-l05zcc.streamlit.app. Click "Use sample dataset (Customer Churn)" to run the full ML pipeline from your browser: profiling → preprocessing → training → evaluation. No API key is required.

Screenshots

Data Analysis: auto-detected target column, task type, and preprocessing pipeline

Model Comparison: cross-validated accuracy, standard deviation, and training time across 4 models

Evaluation Results: test/train scores, overfitting detection, and accuracy comparison chart

Features

Auto-Detection: Automatically identifies the target column and task type (classification or regression)
Data Profiling: Analyzes data quality, distributions, missing values, and column statistics
Smart Preprocessing: Handles missing values, encodes categoricals, and scales features
Feature Selection: Applies variance thresholding and statistical feature selection (SelectKBest)
Model Comparison: Trains 4 models with 5-fold cross-validation and selects the best
Evaluation: Generates confusion matrices, feature importance charts, model comparison plots, and overfitting detection
Report Generation: Creates a professional DOCX report with all results and visualizations
Web Interface: Interactive Streamlit app for uploading data and exploring results
No API Key Required: Runs entirely locally using scikit-learn
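
As a rough illustration of the preprocessing and feature selection steps listed above, the sketch below chains imputation, encoding, scaling, variance thresholding, and SelectKBest in a single scikit-learn pipeline. It is not the project's actual code: the toy data, the column names, the median/most-frequent imputation strategies, the `f_classif` scoring function, and `k=2` are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data, made up for illustration: a numeric column with a gap,
# a categorical column, a constant column, and a binary target.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38, 29, 60],
    "plan": ["basic", "pro", "basic", "pro", "pro", "pro", "basic", "basic"],
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],
    "churn": [0, 1, 0, 1, 0, 1, 0, 1],
})
X, y = df.drop(columns="churn"), df["churn"]

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the mode, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "constant"]),
    ("cat", categorical, ["plan"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    # Drop zero-variance columns (here, the constant one).
    ("variance", VarianceThreshold(threshold=0.0)),
    # Keep the k features with the highest ANOVA F-score.
    ("select", SelectKBest(score_func=f_classif, k=2)),
])
X_selected = pipeline.fit_transform(X, y)
print(X_selected.shape)
```

Keeping every step inside one `Pipeline` means the same transformations are fitted on training folds only during cross-validation, which helps avoid leaking information from held-out data.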

Architecture

The project uses a multi-agent architecture where each agent handles one pipeline phase:

MLPipelineOrchestrator
├── DataProfiler          → Dataset analysis & target detection
├── PreprocessorAgent     → Cleaning, encoding, scaling
├── FeatureEngineerAgent  → Feature selection & ranking
├── ModelTrainerAgent     → Training & cross-validation (4 models)
├── EvaluatorAgent        → Metrics, charts, overfitting detection
└── ReportGenerator       → Professional DOCX report
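
One way such an orchestrator could be wired is sketched below: each phase is a callable that receives a shared context dictionary and returns an updated one. The class name `PipelineOrchestrator`, the `add`/`run` methods, and the two placeholder stages are hypothetical and only illustrate the pattern; the project's own `MLPipelineOrchestrator` may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Callable

# A stage takes the shared context and returns an updated context.
Stage = Callable[[dict], dict]


@dataclass
class PipelineOrchestrator:
    """Hypothetical coordinator that runs named stages in order."""

    stages: list[tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "PipelineOrchestrator":
        self.stages.append((name, stage))
        return self

    def run(self, context: dict) -> dict:
        completed = []
        for name, stage in self.stages:
            context = stage(context)
            completed.append(name)
        # Record which stages ran alongside the final context.
        return {**context, "completed": completed}


# Placeholder stages standing in for the real agents.
def profile(ctx: dict) -> dict:
    return {**ctx, "target": "churn"}


def preprocess(ctx: dict) -> dict:
    return {**ctx, "rows_clean": len(ctx["rows"])}


orchestrator = PipelineOrchestrator().add("profile", profile).add("preprocess", preprocess)
result = orchestrator.run({"rows": [{"age": 40, "churn": 0}]})
print(result["completed"])
```

Because every stage shares the same signature, a phase can be swapped or skipped without touching the others.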

Models Used

Classification:

Logistic Regression
Random Forest Classifier
Gradient Boosting Classifier
K-Nearest Neighbors

Regression:

Linear Regression
Random Forest Regressor
Gradient Boosting Regressor
K-Nearest Neighbors Regressor
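
The comparison of the four classifiers with 5-fold cross-validation can be approximated as follows. This is a minimal sketch, not the project's trainer: the synthetic dataset from `make_classification`, the hyperparameters, and the `results` dictionary layout are assumptions made for the example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data stands in for a real, already preprocessed dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    # 5-fold cross-validated accuracy for this model.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = {
        "mean": scores.mean(),
        "std": scores.std(),
        "seconds": time.perf_counter() - start,
    }

# Pick the model with the highest mean cross-validated accuracy.
best = max(results, key=lambda n: results[n]["mean"])
for name, r in results.items():
    print(f"{name}: {r['mean']:.3f} ± {r['std']:.3f} ({r['seconds']:.2f}s)")
print("Best:", best)
```

For regression, the same loop would apply with the regressor counterparts and a scoring metric such as `"r2"`.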

Quick Start

Installation

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

CLI Usage

# Run with sample dataset
python main.py

# Run with your own data
python main.py path/to/your_data.csv

# Specify output directory
python main.py data.csv --output reports/
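
A command line like the ones above could be parsed with `argparse` roughly as shown here. This is a sketch under assumptions, not the contents of `main.py`: the function name `build_parser` and the default output directory `output` are hypothetical, and only the optional data path and the `--output` flag are taken from the usage examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run the ML pipeline on a CSV file.")
    # Optional positional path; falls back to the bundled sample dataset.
    parser.add_argument(
        "data",
        nargs="?",
        default="data/sample_customers.csv",
        help="path to the input CSV file",
    )
    # The default directory name here is an assumption for illustration.
    parser.add_argument("--output", default="output", help="directory for generated reports")
    return parser


# Equivalent to: python main.py data.csv --output reports/
args = build_parser().parse_args(["data.csv", "--output", "reports/"])
print(args.data, args.output)
```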

Web Interface

streamlit run app.py

Upload a CSV/Excel file or use the built-in sample dataset. The app displays:

Data profile with quality metrics
Preprocessing steps applied
Model comparison table with scores
Evaluation charts (confusion matrix, feature importance, etc.)
Download button for the full DOCX report

Sample Dataset

The repository includes a customer churn dataset (data/sample_customers.csv) with 80 rows and 11 columns (10 input features plus the churn target):

Feature	Description
age	Customer age
income	Annual income
credit_score	Credit score
years_customer	Years as customer
num_products	Number of products
has_mortgage	Has mortgage (0/1)
has_online_banking	Uses online banking (0/1)
monthly_charges	Monthly charges
total_charges	Total charges
support_calls	Number of support calls
churn	Target: whether customer churned (0/1)
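
To show how a target column and task type might be detected automatically on data shaped like this, here is a small heuristic sketch. The rules are assumptions, not the project's actual logic: it looks for a column with a common target name, falls back to the last column, and treats a numeric column with few distinct values as classification. The function names, the name list, the `max_classes` threshold, and the sample values are all invented for illustration.

```python
import pandas as pd

# Assumed list of column names that commonly denote a target.
COMMON_TARGET_NAMES = ("target", "label", "churn", "class", "y")


def detect_target(df: pd.DataFrame) -> str:
    """Return a likely target column: a well-known name, else the last column."""
    for col in df.columns:
        if col.lower() in COMMON_TARGET_NAMES:
            return col
    return df.columns[-1]


def detect_task(series: pd.Series, max_classes: int = 10) -> str:
    """Guess the task type from the target's dtype and cardinality."""
    if not pd.api.types.is_numeric_dtype(series):
        return "classification"
    return "classification" if series.nunique() <= max_classes else "regression"


# Made-up rows using a few of the column names from the table above.
df = pd.DataFrame({
    "age": [25 + i for i in range(12)],
    "income": [30000 + 2500 * i for i in range(12)],
    "support_calls": [i % 4 for i in range(12)],
    "churn": [i % 2 for i in range(12)],
})

target = detect_target(df)
print(target, detect_task(df[target]))
print("income", detect_task(df["income"]))
```

A cardinality threshold like this misclassifies integer-valued regression targets with few distinct values, so a real implementation would likely let the user override the guess.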

Testing

pytest tests/ -v

The suite contains 35 tests covering all agents and the end-to-end pipeline.

Project Structure

predictive-analytics-agent/
├── agents/
│   ├── data_profiler.py        # Dataset profiling & analysis
│   ├── preprocessor.py         # Data cleaning & transformation
│   ├── feature_engineer.py     # Feature selection & ranking
│   ├── model_trainer.py        # Model training & comparison
│   ├── evaluator.py            # Model evaluation & charts
│   └── orchestrator.py         # Pipeline coordinator
├── utils/
│   └── report_generator.py     # DOCX report generation
├── tests/
│   ├── test_profiler.py
│   ├── test_preprocessor.py
│   ├── test_feature_engineer.py
│   ├── test_model_trainer.py
│   ├── test_evaluator.py
│   └── test_pipeline.py
├── data/
│   └── sample_customers.csv
├── app.py                      # Streamlit web interface
├── main.py                     # CLI entry point
└── requirements.txt

Tech Stack

scikit-learn: Machine learning models, preprocessing, evaluation
pandas: Data manipulation
matplotlib: Chart generation
Streamlit: Web interface
python-docx: Report generation
Pydantic: Data validation with typed models

License

MIT
