A unified research toolkit for kinship verification from face imagery.
This repository brings together multiple strands of kinship-verification research into one maintainable Python codebase. Instead of scattered scripts, mixed runtimes, and dataset-specific entrypoints, the toolkit provides a single place to run, compare, reproduce, and extend experiments across classical feature pipelines, metric-learning methods, deep models, and Gated Autoencoder style representation learning.
In addition to the bundled public benchmarks, the toolkit now supports a local private dataset adapter for richer in-house collections, including age-variant families, named family archives, and identical-twin subsets.
One of the strongest assets behind this project is a fully curated in-house kinship dataset collected from scratch by the project author. It extends the research beyond standard public benchmarks by covering richer family structures, age variation, and identical-twin material, a combination that is difficult to find in any single public collection.
Kinship verification sits at the intersection of:
- face analysis
- representation learning
- metric learning
- family-structure modeling
- explainable and reproducible biometric research
It is a difficult problem because kinship cues are often subtle, noisy, age-dependent, and non-identical. Unlike identity verification, the model is not trying to match the same person across images. It is trying to detect inherited facial structure, family resemblance, and relationship-specific similarity under variation in pose, age, lighting, expression, and image quality.
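To make the pairwise framing concrete, here is a minimal sketch of verification-style scoring: a pair of face descriptors is reduced to a similarity score and thresholded. The descriptors, names, and threshold are illustrative only, not the toolkit's API.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity of two face descriptors, in [-1, 1].
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_kin(desc_a: np.ndarray, desc_b: np.ndarray, threshold: float = 0.5) -> bool:
    # Unlike identity matching, the decision is about the PAIR:
    # "related / not related", with the threshold tuned on held-out folds.
    return cosine_similarity(desc_a, desc_b) >= threshold

a = np.array([1.0, 0.2, 0.0])
b = np.array([0.9, 0.3, 0.1])   # a resembling descriptor
c = np.array([-0.2, 0.1, 1.0])  # an unrelated descriptor
print(verify_kin(a, b), verify_kin(a, c))  # True False
```

Real pipelines replace the raw descriptors with handcrafted or learned features, but the pair-scoring shape stays the same.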
That makes this toolkit valuable as a research platform for:
- benchmarking multiple kinship verification approaches in one place
- studying how handcrafted and learned features behave differently
- reproducing experiments on bundled kinship datasets
- building cleaner ablations, reports, and new algorithm variants
- providing a strong base for future publication-quality experiments
The toolkit provides a single Python interface for four major algorithm families:
- `classical` - Handcrafted feature pipelines derived from classical HOG/LBP-style kinship verification work
  - Supports `random`, `kfold`, and `chisq`
- `kinver` - Metric-learning style pipeline over bundled precomputed feature representations
  - Supports feature fusion, dimensionality reduction, Fisher-style selection, and MNRML-style projection
- `family-deep` - Native deep learning kinship pipeline for `kinfacew` and `fiw`
  - Supports train, test, and demo-style execution through one CLI
- `gae` - Native Gated Autoencoder style feature-mapper for pairwise representation learning
  - Supports `standard` and `multiview`
Everything is wrapped in:
- one CLI
- one config system
- one reporting/output layout
- one test suite
- one repo-local data layout
This repository is important not just because it runs algorithms, but because it creates a shared experimental language across very different families of methods.
With this toolkit, we can:
- compare classical vs learned methods under one framework
- study representation transfer across KinFaceW and FIW-style settings
- stage and inspect private kinship datasets without mixing them into the public repository
- inspect how feature-level and embedding-level methods differ
- benchmark reproducibility without switching languages or toolchains
- extend the project with new backbones, new datasets, and new evaluation protocols
In other words, this is not just an implementation repo. It is a research infrastructure repo for kinship verification.
Beyond the public benchmark support, this project is backed by a substantial original dataset contribution.
The local private collection under data/mydataset was assembled from scratch and gives the project a much broader experimental base than public kinship benchmarks alone. On the current workstation copy, the toolkit detects:
10 subsets, 220 inferred family groups, 805 people, 2,318 images
What makes this collection especially valuable is its diversity:
- conventional parent-child and sibling family structure
- named-family archives collected as coherent family groups
- age-centric subsets for cross-age kinship analysis
- identical-twin material that opens the door to much harder resemblance studies
This matters because public kinship datasets are often narrow in scope. A richer private collection makes it possible to explore harder and more realistic questions around age variation, resemblance ambiguity, family composition, and twin-specific similarity.
For privacy, licensing, and repository-size reasons, this dataset remains local-only and is intentionally not pushed to GitHub. The toolkit therefore treats it as a first-class private research asset rather than a bundled public benchmark.
The diagram below shows how the toolkit turns datasets, pair metadata, and feature inputs into reproducible kinship-verification results.
```mermaid
flowchart TD
U[Researcher / CLI User]
C[run_kinship.py / kinship.cli]
CFG[Experiment Configs / Benchmark Presets]
R[Runner + Registry]
D1[KinFaceW Images + Pair Metadata]
D2[KinVer Feature Matrices]
D3[GAE Pair Feature Files]
D4[Local FIW FIDs Images and Metadata]
A1[Classical Pipeline]
A2[KinVer Pipeline]
A3[Family-Deep Pipeline]
A4[GAE Pipeline]
F1[Patch / HOG / LBP Pair Features]
F2[Feature Fusion + Selection + PCA + MNRML]
F3[Pairwise CNN / Siamese / Deep Embeddings]
F4[Gated Pair Representation Learning]
M1[SVM / Verification Score]
M2[Metric-Learning Classification]
M3[Kin / Non-Kin Probability]
M4[Mapped Pair Representations]
O[Outputs: JSON, CSV, Plots, Checkpoints, .mat Files]
U --> C
CFG --> C
C --> R
R --> A1
R --> A2
R --> A3
R --> A4
D1 --> A1
D1 --> A3
D2 --> A2
D3 --> A4
D4 --> A3
A1 --> F1 --> M1 --> O
A2 --> F2 --> M2 --> O
A3 --> F3 --> M3 --> O
A4 --> F4 --> M4 --> O
```
`classical`
- starts from face pairs
- extracts handcrafted pair descriptors
- uses classical verification models such as SVM-based decision boundaries
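As a shape-level illustration of this style of pipeline, here is a sketch with synthetic stand-ins for real HOG/LBP descriptors, an absolute-difference pair feature, and 5-fold SVM evaluation. This is not the toolkit's implementation, only the common pattern behind it.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-face descriptors (e.g. HOG/LBP histograms).
n_pairs, dim = 200, 64
parents = rng.normal(size=(n_pairs, dim))
kin_children = parents + 0.1 * rng.normal(size=(n_pairs, dim))   # resemble parents
nonkin_children = rng.normal(size=(n_pairs, dim))                # unrelated faces

# A common pair descriptor: element-wise absolute difference.
X = np.vstack([np.abs(parents - kin_children),
               np.abs(parents - nonkin_children)])
y = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])

# 5-fold verification accuracy with an RBF SVM decision boundary.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(X.shape, round(scores.mean(), 3))
```

On these easy synthetic pairs the SVM separates kin from non-kin almost perfectly; real face pairs are far harder, which is exactly what the fold scores measure.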
`kinver`
- starts from bundled feature matrices
- performs feature fusion, selection, projection, and fold-wise evaluation
- returns relationship-specific verification accuracy
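The fusion and projection stages can be pictured with a few lines of NumPy and scikit-learn. Shapes, weights, and block sizes here are illustrative only, not the pipeline's real configuration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 400  # number of face images

# Precomputed per-descriptor feature blocks for the same faces.
blocks = [rng.normal(size=(n, d)) for d in (100, 80, 60)]

# Weighted fusion: scale each descriptor block, then concatenate.
weights = np.array([0.5, 0.3, 0.2])
fused = np.hstack([w * b for w, b in zip(weights, blocks)])   # (n, 240)

# Dimensionality reduction ahead of the metric-learning projection step.
reduced = PCA(n_components=30).fit_transform(fused)           # (n, 30)
print(reduced.shape)
```

In the real pipeline the fusion weights are learned and the projection is MNRML-style rather than plain PCA, but the data flow is the same: fuse, reduce, project, then evaluate per fold.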
`family-deep`
- starts from paired face images
- learns similarity through native deep models
- produces kin / non-kin predictions, metrics, and checkpoints
`gae`
- starts from left/right pair feature matrices
- learns gated pairwise structure representations
- writes mapped features for downstream research workflows
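The gating idea can be sketched in a few lines: mapping units are driven by the product of two factor projections, so the representation encodes the relation between the two faces of a pair rather than either face alone. The weights below are random placeholders for illustration; the real mapper learns them.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_factors, n_maps = 64, 32, 16

# Factor projections for the left and right face of each pair.
W_left = rng.normal(scale=0.1, size=(n_factors, dim))
W_right = rng.normal(scale=0.1, size=(n_factors, dim))
W_map = rng.normal(scale=0.1, size=(n_maps, n_factors))

def map_pair(x_left: np.ndarray, x_right: np.ndarray) -> np.ndarray:
    # Multiplicative gating: a factor fires only when both projections
    # agree, which lets the mapping capture pairwise structure.
    factors = (W_left @ x_left) * (W_right @ x_right)
    return np.tanh(W_map @ factors)   # mapped pair representation

m = map_pair(rng.normal(size=dim), rng.normal(size=dim))
print(m.shape)
```

These mapped pair representations are what the `gae` path writes out for downstream use.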
```
kinship-python-toolkit/
|-- configs/
|   |-- benchmarks/
|   `-- experiments/
|-- data/
|   |-- family/
|   |-- kinface/
|   `-- kinver/
|-- src/
|   `-- kinship/
|-- tests/
|-- run_kinship.py
|-- pyproject.toml
`-- README.md
```
- `src/kinship/algorithms` - all maintained algorithm implementations
- `src/kinship/datasets` - dataset loading and metadata handling
- `src/kinship/features` - reusable feature extraction utilities
- `configs/experiments` - single experiment presets
- `configs/benchmarks` - grouped benchmark presets
- `data` - bundled runtime datasets and metadata used by the toolkit
- `outputs` - generated run artifacts, reports, checkpoints, and summaries
The repository already includes the runtime data needed for the maintained paths:
`classical`, `kinver`, `gae`, and `family-deep` on `kinfacew`
Bundled repo-local data:
- `data/kinface` - `KinFaceW-I`, `KinFaceW-II`, `traindata`, `testdata`
- `data/kinver` - `data-KinFaceW-I`, `data-KinFaceW-II`
- `data/family/data` - FIW metadata CSV files
- `data/FIDs` - local FIW FIDs image bundle and supporting FIW metadata when available
- `data/mydataset` - local private kinship dataset staging area with diverse family, age, and identical-twin subsets
FIW note:
- The maintained `family-deep` pipeline now supports the repo-local `data/FIDs/FIDs` layout directly
- These FIW assets are intentionally git-ignored because they are large and should stay local rather than being pushed to GitHub
- The loader resolves mismatched FIW face indices within the expected family folder when possible and skips unresolved pairs when a local export is incomplete
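The fallback idea behind that last point can be sketched as follows. The file layout, member prefix convention, and function name here are hypothetical; the real loader's logic may differ.

```python
from pathlib import Path
import tempfile

def resolve_face(family_dir: Path, expected: str):
    # Prefer the exact face file named in the pair metadata.
    candidate = family_dir / expected
    if candidate.exists():
        return candidate
    # Otherwise fall back to any exported face of the same member in this
    # family folder (mismatched index); return None so the caller can
    # skip the pair when the local export is incomplete.
    member = expected.split("_")[0]
    matches = sorted(family_dir.glob(f"{member}_*.jpg"))
    return matches[0] if matches else None

# Demo: metadata names MID1_face2.jpg but only MID1_face0.jpg was exported.
fam = Path(tempfile.mkdtemp())
(fam / "MID1_face0.jpg").touch()
print(resolve_face(fam, "MID1_face2.jpg").name)  # MID1_face0.jpg
print(resolve_face(fam, "MID2_face0.jpg"))       # None
```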
Private dataset note:
- The toolkit now includes a native `mydataset` adapter for local-only research collections under `data/mydataset`
- On the current local copy, the adapter discovers 10 subsets, 220 inferred family groups, 805 people, and 2,318 images
- The detected collection spans age-focused subsets, named-family archives, and an identical-twins subset
- This in-house dataset is a major research contribution of the project because it was built from scratch and broadens the problem setting beyond standard public benchmarks
- These assets are intentionally git-ignored and are meant for local experimentation only
See `data/README.md` for the bundled data note.
python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -U pip
python -m pip install numpy scipy scikit-learn pandas matplotlib pillow scikit-image pytest
python -m pip install torch torchvision tqdm facenet-pytorch torchfile

python run_kinship.py list

This prints:
- available algorithms
- available experiment presets
- available benchmark presets
python run_kinship.py classical --dataset KinFaceW-I --relation fs --method kfold

What this does:
- loads KinFaceW-I father-son pairs
- extracts the classical pair features
- runs 5-fold evaluation
- prints fold scores and mean accuracy
python run_kinship.py kinver --dataset KinFaceW-II --relation fs

What this does:
- loads precomputed feature matrices
- performs the KinVer-style fusion and projection workflow
- evaluates across folds
- returns fold scores, learned fusion weights, and mean accuracy
python run_kinship.py run-config gae-fs-train-p16-standard

What this does:
- loads a real bundled `fs` training pair matrix from `data/kinface/traindata`
- runs the native standard GAE mapper
- writes the resulting mapped representation to `outputs/gae-real`
To run the multiview variant:
python run_kinship.py run-config gae-fs-train-p16-multiview

python run_kinship.py run-config family-deep-kinfacew-small-siamese-train

What this does:
- trains the native `small_siamese_face_model`
- runs across KinFaceW-II folds
- writes logs to `outputs/family-deep-real/train-logs`
- saves fold checkpoints under `outputs/family-deep-real/checkpoints`
python run_kinship.py run-config family-deep-kinfacew-small-siamese-test

Important:
- this test preset expects the fold checkpoints produced by the train preset
- if checkpoints do not exist yet, run the train preset first
python run_kinship.py run-config family-deep-fiw-small-siamese-fs-train

What this does:
- loads FIW pair metadata from `data/family/data`
- resolves paired face images from `data/FIDs/FIDs`
- trains the native `small_siamese_face_model`
- writes logs to `outputs/family-deep-fiw-real/train-logs`
- saves pair-type checkpoints under `outputs/family-deep-fiw-real/checkpoints`
python run_kinship.py run-config family-deep-fiw-small-siamese-fs-test

Important:
- this test preset expects the checkpoints produced by the FIW train preset
- if checkpoints do not exist yet, run the FIW train preset first
For a lighter end-to-end FIW validation on CPU, you can run a single relationship type directly:
python run_kinship.py family-deep --mode train --dataset-name fiw --data-path data/FIDs/FIDs --model-name small_siamese_face_model --bs 256 --num-epochs 1 --pair-types fs --output-dir outputs/family-deep-fiw-fs/train-logs --checkpoints-dir outputs/family-deep-fiw-fs/checkpoints
python run_kinship.py family-deep --mode test --dataset-name fiw --data-path data/FIDs/FIDs --model-name small_siamese_face_model --bs 256 --pair-types fs --output-dir outputs/family-deep-fiw-fs/test-logs --checkpoints-dir outputs/family-deep-fiw-fs/checkpoints

python run_kinship.py benchmark native-ports

This benchmark runs representative native experiments and produces:
- per-run `result.json`
- benchmark `summary.json`
- benchmark `summary.csv`
python run_kinship.py mydataset summary

This scans `data/mydataset` and reports inferred subset, family, person, and image counts without moving or modifying the private data.
To export a JSON summary:
python run_kinship.py mydataset summary --output-path outputs/mydataset/mydataset_summary.json

Image-level inventory:
python run_kinship.py mydataset export-inventory --output-path outputs/mydataset/mydataset_inventory.csv

Pair manifest for a subset:
python run_kinship.py mydataset export-pairs --subset myDataSet_102 --output-path outputs/mydataset/mydataset_pairs.csv

On the current local copy, the `myDataSet_102` pair export produced 2,373 pairs: 2,044 positive kin pairs and 329 sampled non-kin pairs.
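A pair manifest of this shape can be built generically: positives are within-family image pairs, and negatives are sampled across families. The layout and counts below are a toy illustration of the idea behind `export-pairs`, not the adapter's actual code.

```python
import itertools
import random

# Toy family -> image-path mapping, as if discovered from a subset folder.
families = {
    "fam_a": ["a1.jpg", "a2.jpg", "a3.jpg"],
    "fam_b": ["b1.jpg", "b2.jpg"],
    "fam_c": ["c1.jpg", "c2.jpg"],
}

# Positive kin pairs: every within-family image combination, labeled 1.
positives = [(x, y, 1) for imgs in families.values()
             for x, y in itertools.combinations(imgs, 2)]

# Negative pairs: sample cross-family combinations, labeled 0.
rng = random.Random(0)
fam_ids = list(families)
negatives = []
while len(negatives) < len(positives) // 2:
    fa, fb = rng.sample(fam_ids, 2)
    negatives.append((rng.choice(families[fa]), rng.choice(families[fb]), 0))

manifest = positives + negatives
print(len(positives), len(negatives))
```

Sampling fewer negatives than positives mirrors the reported export, where non-kin pairs are a sampled subset rather than the full cross-product.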
If you want the fastest path to confirm the repo is healthy:
python run_kinship.py list
python run_kinship.py run-config classical-fs-kfold-smoke
python run_kinship.py run-config kinver-fs-smoke
python run_kinship.py run-config gae-fs-train-p16-standard
python run_kinship.py run-config family-deep-kinfacew-small-siamese-train
python run_kinship.py run-config family-deep-kinfacew-small-siamese-test

If you have local FIW assets under `data/FIDs`, you can also run:
python run_kinship.py run-config family-deep-fiw-small-siamese-fs-train
python run_kinship.py run-config family-deep-fiw-small-siamese-fs-test

Every config-driven run writes a timestamped folder under `outputs/`.
Typical artifacts include:
- `result.json`
- `summary.txt`
- `summary.json`
- `summary.csv`
- plots
- checkpoints
- generated `.mat` feature outputs
This structure is designed for:
- experiment traceability
- side-by-side comparison
- reproducible reruns
- easy export into papers, reports, and notebooks
- local private-dataset staging without contaminating the public Git history
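Because every run writes the same artifact names, cross-run aggregation takes only a few lines of standard Python. The `accuracy` key below is hypothetical; substitute whatever fields your runs actually emit in `result.json`.

```python
import json
import tempfile
from pathlib import Path

def collect_results(outputs_dir: str) -> list[dict]:
    # Walk timestamped run folders and gather each result.json payload.
    rows = []
    for result_file in sorted(Path(outputs_dir).glob("**/result.json")):
        payload = json.loads(result_file.read_text())
        rows.append({"run": result_file.parent.name, **payload})
    return rows

# Demo with a throwaway layout mimicking two timestamped runs.
root = Path(tempfile.mkdtemp())
for run, acc in [("run-001", 0.71), ("run-002", 0.74)]:
    run_dir = root / run
    run_dir.mkdir(parents=True)
    (run_dir / "result.json").write_text(json.dumps({"accuracy": acc}))

rows = collect_results(str(root))
print([r["accuracy"] for r in rows])  # [0.71, 0.74]
```

The same pattern feeds side-by-side comparisons, paper tables, and notebook plots without any per-algorithm glue code.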
python run_kinship.py run-config classical-fs-kfold-smoke
python run_kinship.py run-config kinver-fs-smoke
python run_kinship.py run-config family-deep-kinfacew-small-siamese-test

python run_kinship.py run-config gae-fs-train-p16-standard
python run_kinship.py run-config gae-fs-train-p16-multiview

python run_kinship.py benchmark native-ports

Run the test suite with:
$env:PYTHONPATH='src'
python -m pytest tests -q -p no:cacheprovider

Compile check:
python -m compileall src run_kinship.py

Kinship verification is a sensitive research area connected to biometrics, privacy, and family inference. This repository is intended for:
- academic research
- reproducible experimentation
- benchmarking and method development
It should be used thoughtfully and with appropriate ethical, legal, and institutional oversight.
- unified maintained Python codebase
- repo-local bundled runtime data
- local private-dataset adapter and manifest export tools
- native algorithm implementations
- config-driven experiments
- benchmark presets
- tested command-line interface
This repository is ready to serve as a clean foundation for:
- future kinship-verification publications
- reproducible experiments
- student onboarding
- model extensions
- comparative benchmarking
This toolkit transforms a difficult, fragmented research space into a coherent, extensible, and reproducible Python platform for kinship verification.
It is not just a port.
It is the base of a serious research project.
