Real-time people detection and counting on video, powered by a lightweight SSD model (SSDLite + MobileNetV3) trained on the WiderPeople dataset.
FastCrowdVision is an end-to-end MLOps project: training an object detector optimized for mobile/edge devices, serving it through a FastAPI + WebSocket API with a web interface, containerizing with Docker, and deploying on Kubernetes (SSP Cloud).
The model detects 3 classes from the WiderPeople dataset: pedestrians, riders, and partially-visible persons. The SSDLite + MobileNetV3 architecture is optimized to run on CPU.
The trained model is available on HuggingFace: aayrapet/SsdFastCrowdVision
Download the sample video below and upload it to the web interface you will set up in the next steps:
https://minio.lab.sspcloud.fr/aayrapetyan/FastCrowdVision/datasets/20260416_121332.mp4
Important SSP Cloud limitation: the SSP Cloud reverse proxy blocks WebSocket connections. The upload itself may succeed, but since video detection relies entirely on WebSocket streaming, detection will not work when the app is accessed through
https://fastcrowdvision.lab.sspcloud.fr. We therefore propose 3 alternative ways to run the project and get good results.
You can still access the interface at:
https://fastcrowdvision.lab.sspcloud.fr
As an alternative to SSP Cloud, the project is also hosted on another paid server. It operates behind a shared IP address with routing rules to minimise costs:
https://testwebmodel.natka.ovh/
Note: The hosted solution uses IP routing and may be slightly slower than a local deployment due to routing overhead.
FastCrowdVision/
├── .github/workflows/
│ └── docker-deploy.yml # CI/CD pipeline: build + push Docker image
├── argocd/
│ └── application.yaml # ArgoCD manifest for GitOps deployment
├── config/ # YAML backbone configuration files for SSD
├── datasets/
│ ├── WiderPeople/ # Download scripts for WiderPeople (Kaggle / S3)
│ └── voc/ # Download scripts for VOC2007 (Kaggle / S3)
├── kubernetes/ # Kubernetes manifests (deployment, service, ingress, pvc)
├── model/ # SSD architecture and components
│ ├── ssd.py # SSD / SSDLite
│ ├── mobilenetv2.py # MobileNetV2 backbone
│ ├── mobilenetv3.py # MobileNetV3 backbone
│ ├── detection.py # NMS post-processing
│ ├── priorbox.py # Anchor boxes
│ ├── l2norm.py # L2 normalization
│ └── utils.py # Utility functions (matching, decode, etc.)
├── training/ # Training pipeline
│ ├── train.py # Training loop
│ ├── eval.py # mAP evaluation + model loading
│ ├── multiloss.py # Multi-task loss function
│ ├── dataloader.py # PyTorch DataLoader
│ ├── transforms.py # Image transforms (train / test)
│ ├── multigpusetup.py # DDP multi-GPU setup
│ └── SsdTrainingPipelineVOC2007.py # CLI training script
├── serving/ # Detection API (what the Docker image runs)
│ ├── server.py # FastAPI + WebSocket
│ ├── inference.py # Model loading and per-frame detection
│ └── draw_inference.py # Inference visualization
├── scripts/ # Standalone CLI tools
│ └── SsdFastCrowdVision.py # Image inference + FPS benchmark
├── website/ # Static frontend (HTML/CSS/JS) served by FastAPI
├── tests/ # Unit tests (backbone, SSD forward, HNM)
├── requirements/
│ ├── requirements.txt # Full dependencies (training + dev)
│ └── requirements-api.txt # Minimal dependencies (API / inference only)
├── Dockerfile # Multi-stage image (builder + runtime slim)
├── .dockerignore
├── .env.example
├── pyproject.toml # Linter configuration (ruff)
└── README.md
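To give an idea of what the decode step in model/utils.py involves, here is a generic SSD decode sketch (not the repository's actual code; the variance values are assumptions). Predicted offsets are converted back to corner-format boxes relative to an anchor:

```python
import math

def decode_box(loc, prior, variances=(0.1, 0.2)):
    """Convert one predicted SSD offset back to a corner-format box.

    loc    -- (dx, dy, dw, dh) offsets predicted by the network
    prior  -- (cx, cy, w, h) anchor box in center-size format
    Generic SSD decode sketch; the repo's utils.py may differ in details.
    """
    cx = prior[0] + loc[0] * variances[0] * prior[2]
    cy = prior[1] + loc[1] * variances[0] * prior[3]
    w = prior[2] * math.exp(loc[2] * variances[1])
    h = prior[3] * math.exp(loc[3] * variances[1])
    # center-size back to (x1, y1, x2, y2) corners
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# A zero offset decodes back to the anchor itself:
box = decode_box((0.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.2, 0.2))
```

NMS (model/detection.py) then prunes the overlapping decoded boxes before counting.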
Because SSP Cloud blocks WebSocket traffic, you need to run FastCrowdVision on your own machine. There are two options:
- Python 3.11+
- (Optional) CUDA GPU for training
git clone https://github.com/aayrapet/FastCrowdVision.git
cd FastCrowdVision
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements/requirements-api.txt
uvicorn serving.server:app --reload

Then open http://localhost:8000, upload a video, and start detection.
You can either build the image yourself:
docker build -t fastcrowdvision .
docker run -p 8000:8000 fastcrowdvision

Or pull and run the pre-built image directly from GitHub Container Registry:
docker run -d \
--name fastcrowdvision \
-p 8000:8000 \
-e HF_HOME=/app/.cache/huggingface \
-v hf-cache:/app/.cache/huggingface \
ghcr.io/josiepierr/fastcrowdvision:latest

The -v hf-cache:... volume caches the model weights locally so they are only downloaded once.
The API is then accessible at http://localhost:8000.
| Endpoint | Method | Description |
|---|---|---|
| `GET /health` | HTTP | Check that the server and model are ready |
| `POST /upload` | HTTP | Upload a video, returns a `session_id` |
| `WS /ws/detect` | WebSocket | Frame-by-frame detection, results streamed as JSON |
curl -X POST http://localhost:8000/upload \
-F "file=@my_video.mp4"
# Returns: {"session_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"}

The WebSocket client then sends the configuration:
{
"session_id": "xxxxxxxx-...",
"score_thr": 0.25,
"frame_skip": 2
}

And receives for each processed frame:
{
"type": "detection",
"frame": 42,
"time": 1.4,
"boxes": [[x1, y1, x2, y2]],
"track_ids": [1, 3],
"scores": [0.87, 0.72],
"classes": ["pedestrians", "riders"],
"current_count": 2,
"total_unique": 5
}

Performance tip: without a GPU, set `frame_skip=2` or `frame_skip=3` in the web interface to process only every 3rd or 4th frame and speed up detection.
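As a sketch of how a client might consume these messages (a hypothetical helper, not part of the repo), counting comes down to accumulating track IDs across frames:

```python
import json

def update_counts(message_text, seen_ids):
    """Fold one WebSocket detection message into the running counts.

    seen_ids is a set of track IDs observed so far; the message schema
    follows the JSON shown above.
    """
    msg = json.loads(message_text)
    if msg.get("type") != "detection":
        return None  # ignore non-detection messages
    seen_ids.update(msg["track_ids"])
    return {"current_count": len(msg["boxes"]), "total_unique": len(seen_ids)}

seen = set()
sample = json.dumps({
    "type": "detection", "frame": 42, "time": 1.4,
    "boxes": [[10, 10, 50, 90], [60, 20, 100, 95]],
    "track_ids": [1, 3], "scores": [0.87, 0.72],
    "classes": ["pedestrians", "riders"],
    "current_count": 2, "total_unique": 5,
})
counts = update_counts(sample, seen)  # -> {"current_count": 2, "total_unique": 2}
```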
Training requires downloading the dataset first, then running the training pipeline.
Data is stored on S3 (SSP Cloud) and on Kaggle — code/data separation following MLOps best practices.
# From S3 (SSP Cloud)
python datasets/WiderPeople/s3/download.py
python datasets/voc/s3/download.py
# Or from Kaggle
python datasets/WiderPeople/kaggle/first_download.py
python datasets/voc/kaggle/first_download.py

cp .env.example .env

Edit .env and fill in your WandB credentials:
WANDB_API_KEY=<your_wandb_key>
ENTITY=<your_wandb_entity>
PROJECT=<your_wandb_project>
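For illustration, the training code can be expected to pick these settings up from the environment; a minimal sketch (the repo's actual variable handling may differ):

```python
import os

def load_wandb_config(env=None):
    """Collect the WandB settings named in .env; report any missing keys."""
    env = os.environ if env is None else env
    keys = ("WANDB_API_KEY", "ENTITY", "PROJECT")
    cfg = {k: env.get(k) for k in keys}
    cfg["missing"] = [k for k in keys if not env.get(k)]
    return cfg
```

Failing fast on missing credentials avoids losing a long training run to an unauthenticated WandB logger.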
python training/SsdTrainingPipelineVOC2007.py \
'train-images_dir' \
'train-label_dir' \
'val-images_dir' \
'val-label_dir' \
'mobilenetv3large' \
nb_classes \
'ssd_voc2007_mv3large_aug' \
--optimizer "adam" \
--N_epochs 160 \
--lr_schedule_epochs 156 170

Metrics (loss, mAP) are tracked in WandB throughout training. You can also resume training from the last epoch; see SsdTrainingPipelineVOC2007.py for details.
The .github/workflows/docker-deploy.yml workflow triggers automatically on every push (all branches):
- Builds the Docker image
- Pushes to Docker Hub with a tag matching the branch name
- The `latest` tag is only applied on pushes to `main`
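The branch-name tagging described above can be expressed with `docker/metadata-action`; a sketch only (the actual workflow file may differ, and `<your-user>` is a placeholder):

```yaml
- uses: docker/metadata-action@v5
  id: meta
  with:
    images: <your-user>/fastcrowdvision
    tags: |
      type=ref,event=branch
      type=raw,value=latest,enable={{is_default_branch}}
- uses: docker/build-push-action@v6
  with:
    push: true
    tags: ${{ steps.meta.outputs.tags }}
```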
Required GitHub Secrets:
| Secret | Description |
|---|---|
| `DOCKERHUB_USERNAME` | Docker Hub username |
| `DOCKERHUB_TOKEN` | Docker Hub Access Token (hub.docker.com > Account Settings > Security) |
After each CI build, redeploy the pod from the SSP Cloud terminal:
kubectl rollout restart deployment/fastcrowdvision
kubectl rollout status deployment/fastcrowdvision

The argocd/application.yaml file defines an ArgoCD application configured to watch the kubernetes/ folder in this repo with automatic sync (prune + self-heal).
Note: Access to the `argocd` namespace on the SSP Cloud cluster is restricted to platform admins. Continuous deployment is therefore handled manually via `kubectl rollout restart` after each CI build.
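For reference, an ArgoCD Application of the kind argocd/application.yaml describes might look like this (a sketch with assumed field values, not the repo's actual manifest):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: fastcrowdvision
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/aayrapet/FastCrowdVision.git
    path: kubernetes
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true      # delete resources removed from kubernetes/
      selfHeal: true   # revert manual drift on the cluster
```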
pytest tests/

Tests cover: SSD forward pass, MobileNetV2/V3 backbones, and Hard Negative Mining.
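As a flavour of what the Hard Negative Mining test exercises, here is a generic pure-Python sketch of the technique (the repo's training/multiloss.py operates on tensors and may differ): keep all positive anchors and only the highest-loss negatives, at a fixed negative-to-positive ratio.

```python
def hard_negative_mining(losses, labels, neg_pos_ratio=3):
    """Select anchor indices to keep: all positives plus the hardest
    (highest-loss) negatives, at most neg_pos_ratio per positive."""
    positives = [i for i, lab in enumerate(labels) if lab > 0]
    negatives = sorted(
        (i for i, lab in enumerate(labels) if lab == 0),
        key=lambda i: losses[i],
        reverse=True,  # hardest negatives first
    )
    keep_neg = negatives[: neg_pos_ratio * max(len(positives), 1)]
    return sorted(positives + keep_neg)

# 2 positives at ratio 1:1 -> keep the 2 hardest negatives (indices 2 and 5)
idx = hard_negative_mining(
    losses=[0.9, 0.1, 0.8, 0.2, 0.7, 0.3],
    labels=[1, 0, 0, 0, 1, 0],
    neg_pos_ratio=1,
)  # -> [0, 2, 4, 5]
```

Without this step the loss would be dominated by the many easy background anchors, which is why a unit test for it is worthwhile.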