Home
poinT92 edited this page Sep 25, 2025 · 14 revisions
Welcome to the DataProf wiki! This is your comprehensive guide to using DataProf for fast, efficient data profiling, ML readiness assessment, and automated preprocessing code generation.
🐍 Actionable Code Generation - DataProf generates ready-to-use Python code for every ML recommendation!
- Immediate Implementation: Get executable code for preprocessing steps
- Framework Integration: Works with pandas, scikit-learn, and popular ML libraries
- Complete Workflows: Generate entire preprocessing pipelines
- Smart Recommendations: Context-aware suggestions based on your data
Go from "Your data has missing values" to "Here's the exact code to fix it: df['age'] = df['age'].fillna(df['age'].median())"
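As a minimal sketch of what applying such a fix might look like end to end, the snippet below imputes missing values in an `age` column with the column median using plain pandas. It is a hand-written illustration, not output captured from DataProf; the toy DataFrame and the variable names `missing_before` and `median_age` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; the column name 'age' mirrors the example above.
df = pd.DataFrame({"age": [25.0, None, 40.0, 31.0, None]})

# Count the gaps before imputing so the effect of the fix can be reported.
missing_before = int(df["age"].isna().sum())

# Median of the non-missing values; pandas skips NaN by default.
median_age = df["age"].median()

# Assign the filled column back instead of calling fillna(..., inplace=True)
# on a column selection, which recent pandas versions warn about.
df["age"] = df["age"].fillna(median_age)

print(f"filled {missing_before} missing values with median {median_age}")
```

Assigning the result back to the column keeps the operation explicit and behaves the same across pandas versions, whereas in-place modification of a selected column depends on whether the selection is a view or a copy.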
Written with ❤️ by your friendly neighborhood Maintainer, Andrea. I really hope you enjoy your stay here, using dataprof!
- Main Repository - Source code and releases
- Issues - Bug reports and feature requests
- Releases - Download latest version
- Database Connectors - Direct database profiling for PostgreSQL, MySQL, SQLite, and DuckDB
- CLI Guide - General guide to the dataprof CLI commands and features
- Python API Reference - Complete reference for all DataProf Python functions and classes, including code snippet generation APIs.
- ML Features Guide - Complete guide to ML readiness assessment and automated preprocessing code generation.
- Ecosystem Integrations - Complete guide to integrating DataProf with popular data science and ML tools.
- Apache Arrow Integration - High-performance columnar processing with 20x memory efficiency for large datasets
- Performance Guide - Comprehensive performance analysis, benchmarks, and optimization tips
- Benchmarking - The new dataprof benchmarking system explained, plus planned features
- Development Workflow - Branch strategy, development workflow, and release process
- IDE Setup Guide - Complete setup for VS Code, Rust, and Python development
- DevContainer Setup - Containerized development environment with databases
- Testing Guide - Comprehensive testing strategy and guidelines
- Troubleshooting - Common issues and solutions
- Contributing Guide - How to contribute to the project
- Code of Conduct - Community guidelines and expectations
- Security Policy - Security reporting and procedures
import dataprof

# Get ML readiness with code snippets
ml_score = dataprof.ml_readiness_score("data.csv")
for rec in ml_score.recommendations:
    if rec.code_snippet:
        print(f"📋 {rec.category}: {rec.description}")
        print(f"💻 Code: {rec.code_snippet}")

# Generate full preprocessing pipeline
dataprof data.csv --ml-score --output-script preprocess.py

For more information, check out the main repository documentation:
- README.md - Project overview and basic usage
- CHANGELOG.md - Version history and latest features
- Archive - Historical documentation and roadmaps