Home

dataprof Wiki

Welcome to the DataProf wiki! This is your comprehensive guide to using DataProf for fast, efficient data profiling, ML readiness assessment, and automated preprocessing code generation.

🚀 Key Features

🐍 Actionable Code Generation - DataProf generates ready-to-use Python code for its ML recommendations!

Immediate Implementation: Get executable code for preprocessing steps
Framework Integration: Works with pandas, scikit-learn, and popular ML libraries
Complete Workflows: Generate entire preprocessing pipelines
Smart Recommendations: Context-aware suggestions based on your data

Go from "Your data has missing values" to "Here's the exact code to fix it: df['age'] = df['age'].fillna(df['age'].median())"
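For context, here is a minimal, self-contained sketch of what a median-imputation snippet like the one above does, using pandas directly. The column name `age` and the sample values are illustrative only; this is not output captured from dataprof.

```python
import pandas as pd

# Toy data (illustrative only): the "age" column has one missing value.
df = pd.DataFrame({"age": [25.0, None, 40.0, 31.0]})

# Median imputation: Series.median() skips NaN by default, so this is the
# median of the observed values only.
median_age = df["age"].median()

# Assign the result back instead of calling fillna(..., inplace=True) on the
# column: recent pandas versions warn about that chained in-place pattern,
# and under Copy-on-Write it does not update df at all.
df["age"] = df["age"].fillna(median_age)

print(df["age"].tolist())  # [25.0, 31.0, 40.0, 31.0]
```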

Written with ❤️ by your friendly neighborhood maintainer, Andrea. I really hope you enjoy your stay here and have fun using dataprof!

🚀 Quick Links

Main Repository - Source code and releases
Issues - Bug reports and feature requests
Releases - Download the latest version

📚 Documentation Pages

Getting Started

Database Connectors - Direct database profiling for PostgreSQL, MySQL, SQLite, and DuckDB
CLI Guide - General usage guide to dataprof CLI commands and functionalities

Python Usage & ML Features

Python API Reference - Complete reference for all DataProf Python functions and classes, including code snippet generation APIs.
ML Features Guide - Complete guide to ML readiness assessment and automated preprocessing code generation.
Ecosystem Integrations - Complete guide to integrating DataProf with popular data science and ML tools.

Advanced Features

Apache Arrow Integration - High-performance columnar processing with 20x memory efficiency for large datasets
Performance Guide - Comprehensive performance analysis, benchmarks, and optimization tips
Benchmarking - The new dataprof benchmarking system explained, plus planned features

📖 Development & Contribution

Development Setup

Development Workflow - Branch strategy, development workflow, and release process
IDE Setup Guide - Complete setup for VS Code, Rust, and Python development
DevContainer Setup - Containerized development environment with databases
Testing Guide - Comprehensive testing strategy and guidelines
Troubleshooting - Common issues and solutions

Contribution Guidelines

Contributing Guide - How to contribute to the project
Code of Conduct - Community guidelines and expectations
Security Policy - Security reporting and procedures

🎯 Quick Examples

Get Actionable Code Snippets

```python
import dataprof

# Get ML readiness with code snippets
ml_score = dataprof.ml_readiness_score("data.csv")

for rec in ml_score.recommendations:
    # Not every recommendation carries a snippet, so skip the empty ones
    if rec.code_snippet:
        print(f"📋 {rec.category}: {rec.description}")
        print(f"💻 Code: {rec.code_snippet}")
```
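If you want to gather the snippets from the loop above into one file yourself, a minimal sketch could look like this. It assumes only the `category`, `description`, and `code_snippet` attributes used above; the helper `write_snippets` and the `FakeRec` stand-in are hypothetical and not part of the dataprof API. For a complete pipeline, the `--output-script` CLI option is the documented route.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class FakeRec:
    """Hypothetical stand-in for a dataprof recommendation (for local testing only)."""
    category: str
    description: str
    code_snippet: Optional[str] = None


def write_snippets(recommendations: Iterable, path: str) -> int:
    """Hypothetical helper: write every non-empty code snippet to `path`,
    each preceded by a comment naming its category and description.
    Returns the number of snippets written."""
    blocks = []
    for rec in recommendations:
        if rec.code_snippet:  # recommendations without code are skipped
            blocks.append(f"# {rec.category}: {rec.description}\n{rec.code_snippet}\n")
    # Separate consecutive snippets with a blank line
    Path(path).write_text("\n".join(blocks), encoding="utf-8")
    return len(blocks)
```

With real results this would be called as `write_snippets(ml_score.recommendations, "snippets.py")`; the duck-typed parameter means any object exposing those three attributes works.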

Generate Complete Preprocessing Script

```bash
# Generate full preprocessing pipeline
dataprof data.csv --ml-score --output-script preprocess.py
```

📖 Additional Resources

For more information, check out the main repository documentation:

README.md - Project overview and basic usage
CHANGELOG.md - Version history and latest features
Archive - Historical documentation and roadmaps
