Saladin

A Python library for understanding dirty data in machine learning pipelines.

Overview

Saladin provides tools to analyze and understand messy, real-world datasets before preprocessing. It goes beyond basic statistics to offer insights into data quality, categorical stability, missing patterns, and transformation readiness.

Features

Data Quality Assessment: Multi-dimensional evaluation of completeness, consistency, validity, and uniqueness.
Categorical Stability: Analyze how stable categorical features are across your dataset.
Missing Patterns: Detect whether values are missing at random, missing systematically, or missing from most of a column.
Feature Relationships: Discover correlations and semantic groupings.
Transformation Readiness: Estimate how hard it will be to clean and transform your data.
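
As a rough illustration of two of the quality dimensions named above, completeness and uniqueness can be computed per column in plain Python. This is a minimal sketch, not Saladin's implementation: the function names `completeness` and `uniqueness`, the sample columns, and the exact definitions (share of non-missing entries, share of distinct values among non-missing entries) are assumptions made for the example.

```python
from typing import Any, Optional, Sequence


def completeness(values: Sequence[Optional[Any]]) -> float:
    """Share of entries that are not None (1.0 means nothing is missing)."""
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None)
    return present / len(values)


def uniqueness(values: Sequence[Optional[Any]]) -> float:
    """Share of distinct values among the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return len(set(present)) / len(present)


# Hypothetical sample columns, chosen only for illustration.
columns = {
    "age": [25, 30, None, 40],
    "city": ["NYC", "LA", "NYC", "LA"],
}

# One small report entry per column.
report = {
    name: {"completeness": completeness(col), "uniqueness": uniqueness(col)}
    for name, col in columns.items()
}
print(report)
```

Here `age` has one missing entry out of four but no repeated values, while `city` is fully populated but has only two distinct labels. How Saladin weighs or combines such scores is not specified here.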

Installation

pip install saladin

Or from source:

git clone https://github.com/lycoriolis/saladin.git
cd saladin
pip install -e .

Usage

import polars as pl
from saladin import DataUnderstandingEngine

# Build a small sample of dirty data (note the missing 'age' value)
data = pl.DataFrame({
    'age': [25, 30, None, 40],
    'income': [30000, 50000, 60000, 80000],
    'city': ['NYC', 'LA', 'NYC', 'LA']
})

# Analyze the dataset, then print a summary of the result
engine = DataUnderstandingEngine()
understanding = engine.understand(data)

print(engine.summary(understanding))
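
To sanity-check whatever the summary reports, the same sample can be inspected by hand. The sketch below uses only the standard library and makes no assumption about Saladin's output format. It simply locates the missing `age` entry and computes the category shares for `city`. The variable names are chosen for this example.

```python
from collections import Counter

# Same sample as the usage example, written as plain Python lists.
age = [25, 30, None, 40]
city = ["NYC", "LA", "NYC", "LA"]

# Row positions where 'age' is missing.
missing_age_rows = [i for i, v in enumerate(age) if v is None]

# Relative frequency of each category in 'city'.
city_counts = Counter(city)
city_shares = {label: n / len(city) for label, n in city_counts.items()}

print(missing_age_rows)  # [2]
print(city_shares)       # {'NYC': 0.5, 'LA': 0.5}
```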

Requirements

Python 3.10+
Polars
NumPy

License

This project is licensed under a custom license that prohibits commercial use. See the LICENSE file for details.
