Viola-Jones face detector

From-scratch NumPy implementation of Viola & Jones (2001): Haar-like features, integral image, AdaBoost, and an attentional cascade with hard-negative mining. The detector itself uses no OpenCV.

Highlights

Pure NumPy, every piece built from scratch: integral image, Haar features, AdaBoost stumps, cascade, multi-scale sliding window, NMS.
Adaptive trainer (paper §3): stage depth and weak-classifier count both emerge from a two-condition early-stop (per-round recall plus FPR at the calibrated operating point), not from hard-coded sizes.
Vectorized inference: one integral image is computed per scale, and the whole cascade runs as a batched NumPy reduction over the surviving windows.
Honest benchmarking: scored on the CBCL patch set and on in-the-wild FDDB, with OpenCV cascades as an external baseline, including a native NumPy port of OpenCV's pretrained cascade that runs inside this same pipeline.
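
As a rough illustration of the first two building blocks listed above, the following is a minimal sketch of an integral image and a two-rectangle Haar-like feature. It is not the code in features.py; the function names (`integral_image`, `rect_sum`, `two_rect_horizontal`) and the zero-padded table layout are assumptions made for this example.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended,
    so that ii[y, x] == img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] using four table lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_horizontal(ii, y, x, h, w):
    """Two-rectangle feature: left half minus right half.
    `w` is the width of each half (hypothetical parameterisation)."""
    return rect_sum(ii, y, x, h, w) - rect_sum(ii, y, x + w, h, w)

# A random 19x19 patch stands in for a real training crop.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(19, 19))
ii = integral_image(patch)
feature = two_rect_horizontal(ii, y=2, x=1, h=8, w=4)
```

Because each rectangle costs four lookups regardless of its size, the same table can serve every feature evaluated on that image.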

Install

git clone https://github.com/salvacarrion/viola-jones.git
cd viola-jones
pip install -r requirements.txt

The training data is auto-downloaded on first run from the salvacarrion/face-detection HuggingFace dataset.

Quickstart

# 1. Prepare data (downloads + caches the dataset on first run)
python tools/prepare_data.py --face-source cbcl --neg-source mixed --resolution 19 --augment

# 2. Train the cascade (quick recipe, a few stages)
python main.py train --data-dir data/19_cbcl --max-stages 6 --max-wcs-per-stage 100

# 3. Evaluate on the CBCL benchmark
python main.py test --data-dir data/19_cbcl

# 4. Run detection on images
python main.py detect --detect-images images/people.png

Post-hoc threshold tuning (no retraining, only moves the per-stage cut points):

python tools/tune_thresholds.py --weights weights/19/cbcl__19_v1.pkl --data-dir data/19_cbcl --objective f1
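
To make the idea of moving a cut point without retraining concrete, here is a minimal sketch that sweeps one threshold over fixed classifier scores and keeps the value with the best F1. It handles a single score vector only; how tools/tune_thresholds.py actually searches the per-stage thresholds is not described here, and `best_f1_threshold` is a hypothetical helper.

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximising F1 when a sample is
    predicted positive for score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_f1 = None, -1.0
    for t in np.unique(scores):          # candidate cut points
        pred = scores >= t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1

# Toy scores and labels (1 = face), invented for illustration.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
threshold, f1 = best_f1_threshold(scores, labels)
```

The model weights stay fixed throughout; only the decision boundary applied to the scores changes.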

See docs/WORKFLOW.md for the full data-prep and training recipes.

Results

CBCL patch benchmark

Per-patch face / non-face classification on the CBCL benchmark (472 faces, 23 573 non-faces). F1_tuned is the best F1 after post-hoc threshold tuning on the same model. Only canonical models are listed.

Resolution	Faces	Version	Stages	F1	F1_tuned	Train time (approx.) †
19×19	CBCL	v1	11	0.570	0.634	~1 h
19×19	CBCL	v2	15	0.619	0.658	~4 h
19×19	CelebA_aligned	v1	3 (capped)	0.113	0.542	~1 h
19×19	CelebA_aligned	v2	6 (capped)	0.195	0.550	~4 h
19×19	CelebA_aligned (filtered)	v3	3 (capped)	0.106	0.603	~0.5 h
19×19	CelebA_aligned+CBCL	v1	11	0.596	0.639	~4 h
19×19	CelebA_aligned+CBCL	v2	16	0.628	0.661	~11 h
24×24	CBCL (smoke test)	smoke	10 (capped)	0.505	0.660	~5 h
24×24	CelebA_aligned	v1	9 (capped)	0.521	0.629	~31 h
24×24	CelebA_aligned ⭐	v2	11	0.571	0.661	~95 h

⭐ Project best: weights/24/celeba_aligned__24_v2_s11_tuned.pkl (tuned recall 0.625, specificity 0.995, precision 0.701, F1 0.661). CelebA-only training caps at 3 to 6 stages at 19×19 but reaches an 11-stage cascade at 24×24, which supports the resolution hypothesis. The benchmark F1 understates this model: the test set is CBCL, which the model never trains on, yet on real images it produces the cleanest, most diverse detections of any model here.
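
The reported F1 follows from the tuned precision and recall quoted above. A short check, using a hypothetical `f1_score` helper:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Tuned operating point of the project-best model, as quoted above.
f1 = f1_score(precision=0.701, recall=0.625)
print(round(f1, 3))  # 0.661
```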

† Times are approximate and normalized to the --precompute-sort-index regime, which is ~5x faster than the original runs (the 24×24 v2 deepening dropped round time from ~280 to ~50 s/round). Raw measured wall-clock and per-stage diagnostics are in docs/RESULTS.md.

In-the-wild detection (FDDB)

Full-image detection on all 10 FDDB folds (2845 images, 5171 faces), IoU-matched against ground truth. This is the fair common ground with OpenCV, since both detectors slide over the same images. AP and recall are reported at IoU 0.5 and at the more lenient 0.3.

Detector	AP@0.5	R@0.5	P@0.5	AP@0.3	R@0.3
OpenCV `alt` (cv2)	0.678	0.694	0.876	0.738	0.741
OpenCV `default` (cv2)	0.683	0.709	0.720	0.736	0.751
OpenCV `default` (our native port)	0.624	0.658	0.582	0.724	0.740
Ours 24×24 CelebA v2 ⭐ (min-face 80)	0.014	0.111	0.121	0.282	0.400
Ours 24×24 CelebA v2 ⭐ (min-face 40)	0.000	0.033	0.005	0.069	0.278
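
The IoU thresholds in the table decide whether a detection counts as a hit. The sketch below shows one common way to do this, greedy matching in descending score order with each ground-truth box used at most once. It assumes axis-aligned (x1, y1, x2, y2) boxes; the exact protocol in tools/eval_fddb.py may differ, and `iou` and `match_detections` are hypothetical names.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(dets, scores, gts, thr=0.5):
    """Return true-positive flags in descending score order."""
    used, flags = set(), []
    for i in np.argsort(scores)[::-1]:
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            if j in used:
                continue                  # each ground truth matches once
            v = iou(dets[i], gt)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
        flags.append(best_j >= 0)
    return flags

# One ground-truth face and one loosely aligned detection (toy values).
gts = [(10, 10, 50, 50)]
dets = [(20, 20, 60, 60)]
strict = match_detections(dets, [0.9], gts, thr=0.5)
lenient = match_detections(dets, [0.9], gts, thr=0.3)
```

In this toy case the overlap is about 0.39, so the detection is a miss at IoU 0.5 but a hit at 0.3, which is the kind of gap a box-convention mismatch produces.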

Each detector wins on the domain it was trained for. On tight CBCL crops our best model reaches F1 0.661 while OpenCV scores F1 0.000 (its cascade needs a context margin the crops do not have). On in-the-wild FDDB the relationship flips: OpenCV, trained on news photos of the same kind, reaches AP@0.5 of 0.68, while our CelebA-aligned model trails at AP@0.3 of 0.28 at best (the 19×19 models trail further still, with AP@0.3 near 0). Two things hold our model back on FDDB: a flood of false positives, because its training negatives were never in-the-wild scenes, and a box convention tighter than FDDB's ellipse boxes. Raising --detect-min-face from 40 to 80 removes the small-scale false positives (FDDB has almost no tiny faces) and lifts AP@0.3 from 0.07 to 0.28.

Figure: our best model (red) vs OpenCV default (blue), with ground truth in green.

Full analysis, protocol, and the native-port parity check are in docs/OPENCV_COMPARISON_FINDINGS.md.

OpenCV baseline and native port

OpenCV's pretrained cascades double as the external baseline above and can be converted into a native model that runs inside this pipeline (pure NumPy, no cv2 at inference):

python tools/baseline_opencv.py detect --images images/people.png          # run cv2 directly
python tools/convert_opencv_cascade.py --cascade default                   # -> weights/24/opencv_default.pkl
python main.py detect --weights-path weights/24/opencv_default.pkl --detect-min-score 150

The port reproduces OpenCV's default cascade with 100% window-level parity (alt: 99.97%). alt2 and alt_tree use CART trees instead of stumps and are not supported.
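
For readers unfamiliar with stump-based cascades, the following minimal sketch evaluates such a cascade over a batch of windows, dropping rejected windows after each stage. It is not the code in opencv_cascade.py: it assumes feature responses are already computed into a dense matrix, it omits per-window variance normalization, and the stage tuple layout and `run_stump_cascade` name are invented for this example.

```python
import numpy as np

def run_stump_cascade(feature_values, stages):
    """feature_values: (n_windows, n_features) feature responses.
    stages: list of (feat_idx, thresholds, left_vals, right_vals, stage_thr).
    Returns a boolean mask of windows that pass every stage."""
    n = feature_values.shape[0]
    alive = np.arange(n)                  # indices of surviving windows
    for feat_idx, thr, left, right, stage_thr in stages:
        if alive.size == 0:
            break
        vals = feature_values[np.ix_(alive, feat_idx)]
        # Each stump votes left_val below its threshold, right_val otherwise.
        votes = np.where(vals < thr, left, right)
        alive = alive[votes.sum(axis=1) >= stage_thr]
    mask = np.zeros(n, dtype=bool)
    mask[alive] = True
    return mask

# Three windows, two features, two stages (toy numbers).
feature_values = np.array([[5.0, 1.0], [-2.0, 3.0], [4.0, -1.0]])
stages = [
    (np.array([0]), np.array([0.0]), np.array([-1.0]), np.array([1.0]), 0.5),
    (np.array([0, 1]), np.array([3.0, 0.0]), np.array([0.0, -1.0]),
     np.array([1.0, 1.0]), 1.5),
]
mask = run_stump_cascade(feature_values, stages)
```

A CART-based cascade would replace the single comparison per weak classifier with a small tree traversal, which is why it does not fit this stump-only evaluation.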

Repo layout

main.py: CLI for train / test / detect.
violajones.py, adaboost.py, weakclassifier.py, features.py, utils.py: the detector.
opencv_cascade.py: native NumPy evaluator for an OpenCV cascade.
tools/: data prep, threshold tuning, per-stage diagnostics, hard-negative mining, OpenCV baseline (baseline_opencv.py), FDDB evaluation (eval_fddb.py), cascade conversion (convert_opencv_cascade.py), and the reeval.sh benchmark runner.

Docs

docs/FINDINGS.md: the technical narrative behind each design choice.
docs/RESULTS.md: full experimental log with per-stage diagnostics and raw timings.
docs/OPENCV_COMPARISON_FINDINGS.md: the OpenCV baseline, FDDB benchmark, and native port.
docs/WORKFLOW.md: data-prep and training recipes.

Citation

@inproceedings{viola2001rapid,
  author    = {Viola, Paul and Jones, Michael},
  title     = {Rapid object detection using a boosted cascade of simple features},
  booktitle = {Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2001},
  volume    = {1},
  pages     = {I-511--I-518},
  doi       = {10.1109/CVPR.2001.990517},
}
