A unified research toolkit for kinship verification from face imagery.
This repository brings together multiple strands of kinship-verification research into one maintainable Python codebase. Instead of scattered scripts, mixed runtimes, and dataset-specific entrypoints, the toolkit provides a single place to run, compare, reproduce, and extend experiments across classical feature pipelines, metric-learning methods, deep models, and Gated Autoencoder style representation learning.
In addition to the bundled public benchmarks, the toolkit now supports a local private dataset adapter for richer in-house collections, including age-variant families, named family archives, and identical-twin subsets.
One of the strongest assets behind this project is a fully curated in-house kinship dataset collected from scratch by the project author. It extends the research beyond standard public benchmarks by covering richer family structures, age variation, and identical-twin material, a combination that is difficult to find in any single public collection.
Kinship verification sits at the intersection of:
- face analysis
- representation learning
- metric learning
- family-structure modeling
- explainable and reproducible biometric research
It is a difficult problem because kinship cues are often subtle, noisy, age-dependent, and non-identical. Unlike identity verification, the model is not trying to match the same person across images. It is trying to detect inherited facial structure, family resemblance, and relationship-specific similarity under variation in pose, age, lighting, expression, and image quality.
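To make the pairwise framing concrete, here is a minimal sketch of verification-style scoring: a pair of face descriptors is reduced to a similarity score and thresholded. The descriptors, names, and threshold are illustrative only, not the toolkit's API.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity of two face descriptors, in [-1, 1].
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_kin(desc_a: np.ndarray, desc_b: np.ndarray, threshold: float = 0.5) -> bool:
    # Unlike identity matching, the decision is about the PAIR:
    # "related / not related", with the threshold tuned on held-out folds.
    return cosine_similarity(desc_a, desc_b) >= threshold

a = np.array([1.0, 0.2, 0.0])
b = np.array([0.9, 0.3, 0.1])   # a resembling descriptor
c = np.array([-0.2, 0.1, 1.0])  # an unrelated descriptor
print(verify_kin(a, b), verify_kin(a, c))  # True False
```

Real pipelines replace the raw descriptors with handcrafted or learned features, but the pair-scoring shape stays the same.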
That makes this toolkit valuable as a research platform for:
- benchmarking multiple kinship verification approaches in one place
- studying how handcrafted and learned features behave differently
- reproducing experiments on bundled kinship datasets
- building cleaner ablations, reports, and new algorithm variants
- providing a strong base for future publication-quality experiments
The toolkit provides a single Python interface for four major algorithm families:
- `classical` - Handcrafted feature pipelines derived from classical HOG/LBP-style kinship verification work
  - Supports `random`, `kfold`, and `chisq`
- `kinver` - Metric-learning style pipeline over bundled precomputed feature representations
  - Supports feature fusion, dimensionality reduction, Fisher-style selection, and MNRML-style projection
- `family-deep` - Native deep learning kinship pipeline for `kinfacew` and `fiw`
  - Supports train, test, and demo-style execution through one CLI
- `gae` - Native Gated Autoencoder style feature-mapper for pairwise representation learning
  - Supports `standard` and `multiview`
Everything is wrapped in:
- one CLI
- one config system
- one reporting/output layout
- one test suite
- one repo-local data layout
This repository is important not just because it runs algorithms, but because it creates a shared experimental language across very different families of methods.
With this toolkit, we can:
- compare classical vs learned methods under one framework
- study representation transfer across KinFaceW and FIW-style settings
- stage and inspect private kinship datasets without mixing them into the public repository
- inspect how feature-level and embedding-level methods differ
- benchmark reproducibility without switching languages or toolchains
- extend the project with new backbones, new datasets, and new evaluation protocols
In other words, this is not just an implementation repo. It is a research infrastructure repo for kinship verification.
Beyond the public benchmark support, this project is backed by a substantial original dataset contribution.
The local private collection under data/mydataset was assembled from scratch and gives the project a much broader experimental base than public kinship benchmarks alone. On the current workstation copy, the toolkit detects:
10 subsets, 220 inferred family groups, 805 people, 2,318 images
What makes this collection especially valuable is its diversity:
- conventional parent-child and sibling family structure
- named-family archives collected as coherent family groups
- age-centric subsets for cross-age kinship analysis
- identical-twin material that opens the door to much harder resemblance studies
This matters because public kinship datasets are often narrow in scope. A richer private collection makes it possible to explore harder and more realistic questions around age variation, resemblance ambiguity, family composition, and twin-specific similarity.
For privacy, licensing, and repository-size reasons, this dataset remains local-only and is intentionally not pushed to GitHub. The toolkit therefore treats it as a first-class private research asset rather than a bundled public benchmark.
The diagram below shows how the toolkit turns datasets, pair metadata, and feature inputs into reproducible kinship-verification results.
```mermaid
flowchart TD
U[Researcher / CLI User]
C[run_kinship.py / kinship.cli]
CFG[Experiment Configs / Benchmark Presets]
R[Runner + Registry]
D1[KinFaceW Images + Pair Metadata]
D2[KinVer Feature Matrices]
D3[GAE Pair Feature Files]
D4[Local FIW FIDs Images and Metadata]
A1[Classical Pipeline]
A2[KinVer Pipeline]
A3[Family-Deep Pipeline]
A4[GAE Pipeline]
F1[Patch / HOG / LBP Pair Features]
F2[Feature Fusion + Selection + PCA + MNRML]
F3[Pairwise CNN / Siamese / Deep Embeddings]
F4[Gated Pair Representation Learning]
M1[SVM / Verification Score]
M2[Metric-Learning Classification]
M3[Kin / Non-Kin Probability]
M4[Mapped Pair Representations]
O[Outputs: JSON, CSV, Plots, Checkpoints, .mat Files]
U --> C
CFG --> C
C --> R
R --> A1
R --> A2
R --> A3
R --> A4
D1 --> A1
D1 --> A3
D2 --> A2
D3 --> A4
D4 --> A3
A1 --> F1 --> M1 --> O
A2 --> F2 --> M2 --> O
A3 --> F3 --> M3 --> O
A4 --> F4 --> M4 --> O
```
`classical`
- starts from face pairs
- extracts handcrafted pair descriptors
- uses classical verification models such as SVM-based decision boundaries
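As a shape-level illustration of this style of pipeline, here is a sketch with synthetic stand-ins for real HOG/LBP descriptors, an absolute-difference pair feature, and 5-fold SVM evaluation. This is not the toolkit's implementation, only the common pattern behind it.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-face descriptors (e.g. HOG/LBP histograms).
n_pairs, dim = 200, 64
parents = rng.normal(size=(n_pairs, dim))
kin_children = parents + 0.1 * rng.normal(size=(n_pairs, dim))   # resemble parents
nonkin_children = rng.normal(size=(n_pairs, dim))                # unrelated faces

# A common pair descriptor: element-wise absolute difference.
X = np.vstack([np.abs(parents - kin_children),
               np.abs(parents - nonkin_children)])
y = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])

# 5-fold verification accuracy with an RBF SVM decision boundary.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(X.shape, round(scores.mean(), 3))
```

On these easy synthetic pairs the SVM separates kin from non-kin almost perfectly; real face pairs are far harder, which is exactly what the fold scores measure.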
`kinver`
- starts from bundled feature matrices
- performs feature fusion, selection, projection, and fold-wise evaluation
- returns relationship-specific verification accuracy
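The fusion and projection stages can be pictured with a few lines of NumPy and scikit-learn. Shapes, weights, and block sizes here are illustrative only, not the pipeline's real configuration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 400  # number of face images

# Precomputed per-descriptor feature blocks for the same faces.
blocks = [rng.normal(size=(n, d)) for d in (100, 80, 60)]

# Weighted fusion: scale each descriptor block, then concatenate.
weights = np.array([0.5, 0.3, 0.2])
fused = np.hstack([w * b for w, b in zip(weights, blocks)])   # (n, 240)

# Dimensionality reduction ahead of the metric-learning projection step.
reduced = PCA(n_components=30).fit_transform(fused)           # (n, 30)
print(reduced.shape)
```

In the real pipeline the fusion weights are learned and the projection is MNRML-style rather than plain PCA, but the data flow is the same: fuse, reduce, project, then evaluate per fold.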
`family-deep`
- starts from paired face images
- learns similarity through native deep models
- produces kin / non-kin predictions, metrics, and checkpoints
`gae`
- starts from left/right pair feature matrices
- learns gated pairwise structure representations
- writes mapped features for downstream research workflows
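The gating idea can be sketched in a few lines: mapping units are driven by the product of two factor projections, so the representation encodes the relation between the two faces of a pair rather than either face alone. The weights below are random placeholders for illustration; the real mapper learns them.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_factors, n_maps = 64, 32, 16

# Factor projections for the left and right face of each pair.
W_left = rng.normal(scale=0.1, size=(n_factors, dim))
W_right = rng.normal(scale=0.1, size=(n_factors, dim))
W_map = rng.normal(scale=0.1, size=(n_maps, n_factors))

def map_pair(x_left: np.ndarray, x_right: np.ndarray) -> np.ndarray:
    # Multiplicative gating: a factor fires only when both projections
    # agree, which lets the mapping capture pairwise structure.
    factors = (W_left @ x_left) * (W_right @ x_right)
    return np.tanh(W_map @ factors)   # mapped pair representation

m = map_pair(rng.normal(size=dim), rng.normal(size=dim))
print(m.shape)
```

These mapped pair representations are what the `gae` path writes out for downstream use.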
```
kinship-python-toolkit/
|-- configs/
|   |-- benchmarks/
|   `-- experiments/
|-- data/
|   |-- family/
|   |-- kinface/
|   `-- kinver/
|-- src/
|   `-- kinship/
|-- tests/
|-- run_kinship.py
|-- pyproject.toml
`-- README.md
```
- `src/kinship/algorithms` - all maintained algorithm implementations
- `src/kinship/datasets` - dataset loading and metadata handling
- `src/kinship/features` - reusable feature extraction utilities
- `configs/experiments` - single experiment presets
- `configs/benchmarks` - grouped benchmark presets
- `data` - bundled runtime datasets and metadata used by the toolkit
- `outputs` - generated run artifacts, reports, checkpoints, and summaries
The repository already includes the runtime data needed for the maintained paths:
`classical`, `kinver`, `gae`, and `family-deep` on `kinfacew`
Bundled repo-local data:
- `data/kinface` - `KinFaceW-I`, `KinFaceW-II`, `traindata`, `testdata`
- `data/kinver` - `data-KinFaceW-I`, `data-KinFaceW-II`
- `data/family/data` - FIW metadata CSV files
- `data/FIDs` - local FIW FIDs image bundle and supporting FIW metadata when available
- `data/mydataset` - local private kinship dataset staging area with diverse family, age, and identical-twin subsets
FIW note:
- The maintained `family-deep` pipeline now supports the repo-local `data/FIDs/FIDs` layout directly
- These FIW assets are intentionally git-ignored because they are large and should stay local rather than being pushed to GitHub
- The loader resolves mismatched FIW face indices within the expected family folder when possible and skips unresolved pairs when a local export is incomplete
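The fallback idea behind that last point can be sketched as follows. The file layout, member prefix convention, and function name here are hypothetical; the real loader's logic may differ.

```python
from pathlib import Path
import tempfile

def resolve_face(family_dir: Path, expected: str):
    # Prefer the exact face file named in the pair metadata.
    candidate = family_dir / expected
    if candidate.exists():
        return candidate
    # Otherwise fall back to any exported face of the same member in this
    # family folder (mismatched index); return None so the caller can
    # skip the pair when the local export is incomplete.
    member = expected.split("_")[0]
    matches = sorted(family_dir.glob(f"{member}_*.jpg"))
    return matches[0] if matches else None

# Demo: metadata names MID1_face2.jpg but only MID1_face0.jpg was exported.
fam = Path(tempfile.mkdtemp())
(fam / "MID1_face0.jpg").touch()
print(resolve_face(fam, "MID1_face2.jpg").name)  # MID1_face0.jpg
print(resolve_face(fam, "MID2_face0.jpg"))       # None
```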
Private dataset note:
- The toolkit now includes a native `mydataset` adapter for local-only research collections under `data/mydataset`
- On the current local copy, the adapter discovers 10 subsets, 220 inferred family groups, 805 people, and 2,318 images
- The detected collection spans age-focused subsets, named-family archives, and an identical-twins subset
- This in-house dataset is a major research contribution of the project because it was built from scratch and broadens the problem setting beyond standard public benchmarks
- These assets are intentionally git-ignored and are meant for local experimentation only
See `data/README.md` for the bundled data note.
python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -U pip
python -m pip install numpy scipy scikit-learn pandas matplotlib pillow scikit-image pytest
python -m pip install torch torchvision tqdm facenet-pytorch torchfile

python run_kinship.py list

This prints:
- available algorithms
- available experiment presets
- available benchmark presets
python run_kinship.py classical --dataset KinFaceW-I --relation fs --method kfold

What this does:
- loads KinFaceW-I father-son pairs
- extracts the classical pair features
- runs 5-fold evaluation
- prints fold scores and mean accuracy
python run_kinship.py kinver --dataset KinFaceW-II --relation fs

What this does:
- loads precomputed feature matrices
- performs the KinVer-style fusion and projection workflow
- evaluates across folds
- returns fold scores, learned fusion weights, and mean accuracy
python run_kinship.py run-config gae-fs-train-p16-standard

What this does:
- loads a real bundled `fs` training pair matrix from `data/kinface/traindata`
- runs the native standard GAE mapper
- writes the resulting mapped representation to `outputs/gae-real`
To run the multiview variant:
python run_kinship.py run-config gae-fs-train-p16-multiview

python run_kinship.py run-config family-deep-kinfacew-small-siamese-train

What this does:
- trains the native `small_siamese_face_model`
- runs across KinFaceW-II folds
- writes logs to `outputs/family-deep-real/train-logs`
- saves fold checkpoints under `outputs/family-deep-real/checkpoints`
python run_kinship.py run-config family-deep-kinfacew-small-siamese-test

Important:
- this test preset expects the fold checkpoints produced by the train preset
- if checkpoints do not exist yet, run the train preset first
python run_kinship.py run-config family-deep-fiw-small-siamese-fs-train

What this does:
- loads FIW pair metadata from `data/family/data`
- resolves paired face images from `data/FIDs/FIDs`
- trains the native `small_siamese_face_model`
- writes logs to `outputs/family-deep-fiw-real/train-logs`
- saves pair-type checkpoints under `outputs/family-deep-fiw-real/checkpoints`
python run_kinship.py run-config family-deep-fiw-small-siamese-fs-test

Important:
- this test preset expects the checkpoints produced by the FIW train preset
- if checkpoints do not exist yet, run the FIW train preset first
For a lighter end-to-end FIW validation on CPU, you can run a single relationship type directly:
python run_kinship.py family-deep --mode train --dataset-name fiw --data-path data/FIDs/FIDs --model-name small_siamese_face_model --bs 256 --num-epochs 1 --pair-types fs --output-dir outputs/family-deep-fiw-fs/train-logs --checkpoints-dir outputs/family-deep-fiw-fs/checkpoints
python run_kinship.py family-deep --mode test --dataset-name fiw --data-path data/FIDs/FIDs --model-name small_siamese_face_model --bs 256 --pair-types fs --output-dir outputs/family-deep-fiw-fs/test-logs --checkpoints-dir outputs/family-deep-fiw-fs/checkpoints

python run_kinship.py benchmark native-ports

This benchmark runs representative native experiments and produces:
- per-run `result.json`
- benchmark `summary.json`
- benchmark `summary.csv`
python run_kinship.py mydataset summary

This scans `data/mydataset` and reports inferred subset, family, person, and image counts without moving or modifying the private data.
To export a JSON summary:
python run_kinship.py mydataset summary --output-path outputs/mydataset/mydataset_summary.json

Image-level inventory:
python run_kinship.py mydataset export-inventory --output-path outputs/mydataset/mydataset_inventory.csv

Pair manifest for a subset:
python run_kinship.py mydataset export-pairs --subset myDataSet_102 --output-path outputs/mydataset/mydataset_pairs.csv

On the current local copy, the `myDataSet_102` pair export produced 2,373 pairs: 2,044 positive kin pairs and 329 sampled non-kin pairs.
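A pair manifest of this shape can be built generically: positives are within-family image pairs, and negatives are sampled across families. The layout and counts below are a toy illustration of the idea behind `export-pairs`, not the adapter's actual code.

```python
import itertools
import random

# Toy family -> image-path mapping, as if discovered from a subset folder.
families = {
    "fam_a": ["a1.jpg", "a2.jpg", "a3.jpg"],
    "fam_b": ["b1.jpg", "b2.jpg"],
    "fam_c": ["c1.jpg", "c2.jpg"],
}

# Positive kin pairs: every within-family image combination, labeled 1.
positives = [(x, y, 1) for imgs in families.values()
             for x, y in itertools.combinations(imgs, 2)]

# Negative pairs: sample cross-family combinations, labeled 0.
rng = random.Random(0)
fam_ids = list(families)
negatives = []
while len(negatives) < len(positives) // 2:
    fa, fb = rng.sample(fam_ids, 2)
    negatives.append((rng.choice(families[fa]), rng.choice(families[fb]), 0))

manifest = positives + negatives
print(len(positives), len(negatives))
```

Sampling fewer negatives than positives mirrors the reported export, where non-kin pairs are a sampled subset rather than the full cross-product.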
If you want the fastest path to confirm the repo is healthy:
python run_kinship.py list
python run_kinship.py run-config classical-fs-kfold-smoke
python run_kinship.py run-config kinver-fs-smoke
python run_kinship.py run-config gae-fs-train-p16-standard
python run_kinship.py run-config family-deep-kinfacew-small-siamese-train
python run_kinship.py run-config family-deep-kinfacew-small-siamese-test

If you have local FIW assets under `data/FIDs`, you can also run:
python run_kinship.py run-config family-deep-fiw-small-siamese-fs-train
python run_kinship.py run-config family-deep-fiw-small-siamese-fs-test

Every config-driven run writes a timestamped folder under `outputs/`.
Typical artifacts include:
- `result.json`
- `summary.txt`
- `summary.json`
- `summary.csv`
- plots
- checkpoints
- generated `.mat` feature outputs
This structure is designed for:
- experiment traceability
- side-by-side comparison
- reproducible reruns
- easy export into papers, reports, and notebooks
- local private-dataset staging without contaminating the public Git history
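Because every run writes the same artifact names, cross-run aggregation takes only a few lines of standard Python. The `accuracy` key below is hypothetical; substitute whatever fields your runs actually emit in `result.json`.

```python
import json
import tempfile
from pathlib import Path

def collect_results(outputs_dir: str) -> list[dict]:
    # Walk timestamped run folders and gather each result.json payload.
    rows = []
    for result_file in sorted(Path(outputs_dir).glob("**/result.json")):
        payload = json.loads(result_file.read_text())
        rows.append({"run": result_file.parent.name, **payload})
    return rows

# Demo with a throwaway layout mimicking two timestamped runs.
root = Path(tempfile.mkdtemp())
for run, acc in [("run-001", 0.71), ("run-002", 0.74)]:
    run_dir = root / run
    run_dir.mkdir(parents=True)
    (run_dir / "result.json").write_text(json.dumps({"accuracy": acc}))

rows = collect_results(str(root))
print([r["accuracy"] for r in rows])  # [0.71, 0.74]
```

The same pattern feeds side-by-side comparisons, paper tables, and notebook plots without any per-algorithm glue code.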
python run_kinship.py run-config classical-fs-kfold-smoke
python run_kinship.py run-config kinver-fs-smoke
python run_kinship.py run-config family-deep-kinfacew-small-siamese-test

python run_kinship.py run-config gae-fs-train-p16-standard
python run_kinship.py run-config gae-fs-train-p16-multiview

python run_kinship.py benchmark native-ports

Run the test suite with:
$env:PYTHONPATH='src'
python -m pytest tests -q -p no:cacheprovider

Compile check:
python -m compileall src run_kinship.py

Kinship verification is a sensitive research area connected to biometrics, privacy, and family inference. This repository is intended for:
- academic research
- reproducible experimentation
- benchmarking and method development
It should be used thoughtfully and with appropriate ethical, legal, and institutional oversight.
- unified maintained Python codebase
- repo-local bundled runtime data
- local private-dataset adapter and manifest export tools
- native algorithm implementations
- config-driven experiments
- benchmark presets
- tested command-line interface
This repository is ready to serve as a clean foundation for:
- future kinship-verification publications
- reproducible experiments
- student onboarding
- model extensions
- comparative benchmarking
This toolkit transforms a difficult, fragmented research space into a coherent, extensible, and reproducible Python platform for kinship verification.
It is not just a port.
It is the base of a serious research project.
