A production-ready facial recognition backend built with FastAPI, PostgreSQL, and NVIDIA Triton. This system detects faces, extracts feature vectors, performs similarity matching, and handles user enrollment with a complete REST API.
- Docker and Docker Compose installed on your system
- (Optional) NVIDIA GPU with CUDA support for faster inference
The system runs on CPU by default. GPU speeds up inference by ~3-5x but isn't required. Model files are included in the repository (model_repository/facenet/).
- Clone the repository:

      git clone <your-repo-url>
      cd face-matrix-backend

- Set up environment variables:

      cp .env.example .env
      # Edit .env with your preferred settings

- Build and run with Docker Compose:

      docker build -t fastapi-server .
      docker-compose up -d

- The API will be available at http://localhost:8080

Test the recognition endpoint with any face image:

    curl -X POST http://localhost:8080/api/v1/recognise_user \
      -F "image=@/path/to/your/image.jpg"

If no match is found, you'll get a req_id. Use it to enroll:

    curl -X POST http://localhost:8080/api/v1/add_user \
      -H "Content-Type: application/json" \
      -d '{
        "name": "John Doe",
        "birthdate": "1990-01-01",
        "req_id": "your-req-id-here"
      }'

- FastAPI Backend: http://localhost:8080
- PostgreSQL Database: localhost:5432
- Triton Inference Server: http://localhost:8000 (HTTP), localhost:8001 (gRPC), http://localhost:8002 (Metrics)

- Accept an image as input.
- Detect a face in the image.
- Extract feature vectors using a pre-trained model.
- Search for a similar face in the database.
- Return user information if a match is found.
- Handle user enrollment by storing vectors and details in the database.
- Performance: API response time < 1 second.
- Scalability: Handle 10,000 faces at 1 req/sec.
- Availability: 99.99% uptime for up to 5 reqs/sec.
- Maintainability: Maintainability index ≥ 85 (≥ 20 by Visual Studio standards).
- Image Input: User uploads an image.
- Face Detection: Detect a face in the image.
  - If no face is detected, raise an exception.
- Feature Extraction: Extract facial features from the detected face.
- Similarity Search: Perform a similarity search in the database for matching facial vectors.
  - If a match is found, return user information.
  - If no match is found:
    - Store the facial vectors in the cache.
    - Return a cache key to the user.
- User Uploads Info?: Check if the user provides additional information.
  - If yes, add the user info and facial vectors to the database.
  - If no, expire the cached facial vectors.
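
The flow above can be sketched with in-memory stand-ins for the database and the cache. This is an illustration rather than the project's implementation: the `FaceStore` class, the distance threshold, and the cache lifetime are all assumptions, and face detection and feature extraction are taken as already done (the sketch starts from a feature vector).

```python
import math
import time
import uuid

MATCH_THRESHOLD = 0.6     # assumed Euclidean-distance cut-off, not from the project
CACHE_TTL_SECONDS = 300   # assumed lifetime of a cached vector


class FaceStore:
    """In-memory stand-in for the users table and the vector cache."""

    def __init__(self):
        self.users = []   # list of (user_info, vector) pairs
        self.cache = {}   # req_id -> (vector, expiry timestamp)

    def recognise(self, vector, now=None):
        """Return a match, or cache the vector and hand back a req_id."""
        if vector is None:
            raise ValueError("Face not detected in the image")
        now = time.time() if now is None else now
        # Linear scan for the closest enrolled vector.
        best = min(self.users, key=lambda u: math.dist(u[1], vector), default=None)
        if best is not None and math.dist(best[1], vector) <= MATCH_THRESHOLD:
            return {"status": "success", "message": "Match found", "user_info": best[0]}
        req_id = f"req-{uuid.uuid4().hex[:8]}"
        self.cache[req_id] = (vector, now + CACHE_TTL_SECONDS)
        return {"status": "failure", "message": "no_match", "req_id": req_id}

    def add_user(self, user_info, req_id, now=None):
        """Promote a cached vector to an enrolled user if it has not expired."""
        now = time.time() if now is None else now
        vector, expires_at = self.cache.pop(req_id, (None, 0.0))
        if vector is None or expires_at < now:
            raise KeyError("req_id is unknown or has expired")
        self.users.append((user_info, vector))
        return len(self.users)   # stand-in for the database-assigned id
```

Calling `recognise` on an empty store returns a `req_id`; passing that `req_id` to `add_user` before it expires enrolls the face, and a later `recognise` call with a nearby vector then returns the stored user info.
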
- Face Detection:
  - Model: MTCNN
  - Role: Detects, crops, and aligns faces for feature extraction.
- Feature Extraction:
  - Model: FaceNet (InceptionResNetV1 pre-trained on the "vggface2" dataset)
  - Role: Generates 512-D vectors.
- Similarity Search:
  - Tech: PostgreSQL with pgvector.
  - Role: Performs similarity lookups.
- Database:
  - Tech: PostgreSQL.
  - Role: Persistent storage for user info and feature vectors; also temporarily caches unmatched feature vectors.
- Model Inference Server:
  - Tech: NVIDIA Triton Inference Server
  - Role: Serves ML models via HTTP/gRPC. Handles batching and model versioning, and can run on CPU or GPU. The model configuration is in model_repository/facenet/config.pbtxt.
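
As an illustration of the similarity lookup, the sketch below builds a parameterized nearest-neighbour query for pgvector. The table and column names (`users`, `face_vector`) come from the schema in this README; the helper names, the choice of the cosine-distance operator `<=>`, and the `%s` placeholder style (as used by psycopg) are assumptions, not a description of the project's actual query.

```python
def to_pgvector_literal(vector):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# `<=>` is pgvector's cosine-distance operator; `<->` (L2) would also work.
NEAREST_SQL = (
    "SELECT id, name, face_vector <=> %s::vector AS distance "
    "FROM users ORDER BY distance LIMIT %s"
)


def build_nearest_query(vector, limit=1, expected_dim=512):
    """Return (sql, params) for fetching the closest enrolled faces."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return NEAREST_SQL, (to_pgvector_literal(vector), limit)
```

The returned pair can be passed to a database cursor's `execute` method. Comparing the returned `distance` against a threshold then decides whether the closest row counts as a match.
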
The REST API exposes two key endpoints:
- Image Recognition API: Recognises a user from their uploaded image.
- Add User API: Allows new users to add themselves to the system.
- API Endpoint

      POST /api/v1/recognise_user
      Content-Type: multipart/form-data

- Request Parameters

  image: The image file containing the face to be recognised.

      { "image": [binary data] }

- Response

  The API will respond with a JSON object containing the following fields:

  - status: Indicates whether the recognition attempt was successful (`"success"` or `"error"`).
  - message: A message describing the result of the recognition attempt (e.g., "Match found", "No match found").
  - user_info: Contains user details if a match is found. If no match is found, this will be `null`.
  - enrollmentKey: (only if no match is found) Includes:
    - req_id: A unique identifier that the user can use later for enrollment without re-uploading the image.
    - expiresAt: The expiration time of the temporary data in the cache.
  - metadata: Information about the quality and confidence of the detected face in the image.

  Example successful response:

      {
        "status": "success",
        "message": "Match found",
        "user_info": {
          "id": "12345",
          "name": "John Doe",
          "email": "john.doe@example.com"
        }
      }

  Example response when no match is found:

      {
        "status": "failure",
        "message": "no_match",
        "req_id": "req-67890"
      }

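
For callers that prefer Python over curl, here is a minimal sketch of building the multipart request with only the standard library. The helper name `build_recognise_request`, the default file name, and the `image/jpeg` part type are assumptions; the endpoint path and the `image` field name come from the description above.

```python
import urllib.request
import uuid


def build_recognise_request(image_bytes, filename="face.jpg",
                            base_url="http://localhost:8080"):
    """Build (but do not send) a multipart/form-data POST carrying one image."""
    boundary = uuid.uuid4().hex   # random delimiter between form parts
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="image"; filename="{filename}"'.encode(),
        b"Content-Type: image/jpeg",
        b"",
        image_bytes,
        f"--{boundary}--".encode(),
        b"",
    ])
    return urllib.request.Request(
        f"{base_url}/api/v1/recognise_user",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON body of the response; a `req_id` in the reply can then be reused for enrollment.
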
- API Endpoint

      POST /api/v1/add_user
      Content-Type: application/json

- Request Parameters

  The request should include the following parameters:

  - name (string, required): The user's full name.
  - birthdate (string, required): The user's birthdate in YYYY-MM-DD format.
  - req_id (string, optional): Key to the cached facial vectors.

  Request Example (Using Cached Request ID for Enrollment):

      {
        "name": "John Doe",
        "birthdate": "2002-10-19",
        "req_id": "req-67890"
      }

- Response

  The API will respond with a JSON object containing the following fields:

  - status: Indicates whether the enrollment attempt was successful (`"success"` or `"error"`).
  - message: A descriptive message explaining the result (e.g., "User added successfully", "User already exists").
  - user_id: The unique identifier assigned to the new user, returned if the user is successfully enrolled.
  - error_code: Optional, included in case of errors.

  Example Successful Response:

      {
        "status": "success",
        "message": "User added successfully",
        "user_id": "12345"
      }

  Example Error Response:

      {
        "status": "error",
        "message": "User already exists with this email",
        "error_code": 409
      }

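
The request parameters above could be checked before touching the database. The sketch below is one hypothetical way to do that with the standard library; the function name `validate_add_user`, the error messages, and the future-date check are assumptions rather than the project's actual validation.

```python
from datetime import date


def validate_add_user(payload, today=None):
    """Return a cleaned copy of the payload or raise ValueError listing problems."""
    today = today or date.today()
    errors = []

    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("name is required")

    birthdate = None
    try:
        birthdate = date.fromisoformat(payload.get("birthdate"))
    except (TypeError, ValueError):
        errors.append("birthdate must be an ISO date such as 2002-10-19")
    if birthdate is not None and birthdate > today:
        errors.append("birthdate cannot be in the future")

    req_id = payload.get("req_id")   # optional per the parameter list
    if req_id is not None and not isinstance(req_id, str):
        errors.append("req_id must be a string")

    if errors:
        raise ValueError("; ".join(errors))
    return {"name": name.strip(), "birthdate": birthdate, "req_id": req_id}
```

A failed validation would naturally map to a 400 response, since it is an input validation failure.
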

    CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        name VARCHAR,
        email VARCHAR(255) UNIQUE NOT NULL,
        birthdate DATE NOT NULL,
        face_vector VECTOR(512) NOT NULL
    );

    CREATE TABLE IF NOT EXISTS "vectors_cache" (
        timestamp timestamp NOT NULL DEFAULT NOW(),
        key VARCHAR PRIMARY KEY,
        face_vectors vector(512) NOT NULL
    );

Each service layer follows a standard approach to managing errors.
Error Categories
- Client Errors (4XX):
  - 400 - Bad Request: Input validation failures. (Face Not Detected, Similarity Threshold Not Met)
  - 404 - Not Found: Resource not found.
  - 409 - Conflict: Duplicate entries.
- Server Errors (5XX):
  - 500 - Internal Server Error: Generic server issues.
  - 503 - Service Unavailable: When external dependencies are down.
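
One way to keep these categories consistent across service layers is a small exception hierarchy that carries the status code. The class names and the `to_error_response` helper below are hypothetical; the payload keys (`status`, `error_code`, `message`, `requestId`) follow the error response structure used in this README.

```python
import uuid


class ApiError(Exception):
    """Base class; subclasses pin the HTTP status for one error category."""
    status_code = 500
    default_message = "Internal server error"

    def __init__(self, message=None):
        super().__init__(message or self.default_message)


class FaceNotDetected(ApiError):
    status_code = 400
    default_message = "Face not detected in the image"


class NotFound(ApiError):
    status_code = 404
    default_message = "Resource not found"


class DuplicateEntry(ApiError):
    status_code = 409
    default_message = "Duplicate entry"


class DependencyUnavailable(ApiError):
    status_code = 503
    default_message = "An external dependency is unavailable"


def to_error_response(exc, request_id=None):
    """Map any exception to (HTTP status, JSON-serialisable error body)."""
    known = isinstance(exc, ApiError)
    code = exc.status_code if known else 500
    # Unknown exceptions get a generic message so internals are not leaked.
    message = str(exc) if known else ApiError.default_message
    body = {
        "status": "error",
        "error_code": code,
        "message": message,
        "requestId": request_id or uuid.uuid4().hex,
    }
    return code, body
```

In a FastAPI application a function like this would typically be called from a registered exception handler, so individual routes only need to raise the appropriate error.
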
Error Response Structure
All error responses are structured like this:
- error_code: error code as per the above categories.
- message: error message.
- requestId: Unique identifier to trace the error to specific requests.

Example error response:

    {
      "status": "error",
      "error_code": 400,
      "message": "Face not detected in the image"
    }

All benchmarks and load tests were conducted on the following hardware:
- Instance Type: AWS g4dn.xlarge
- GPU: 1x NVIDIA Tesla T4 (16 GB GPU memory)
- vCPUs: 4 vCPUs
- RAM: 16 GB
- Storage: Amazon EBS
This configuration represents a typical mid-range GPU instance suitable for production facial recognition workloads.
- Name: Labeled Faces in the Wild (LFW)
- Link: Dataset Link
- Details:
  - Total number of unique identities: 5,749
  - Images per identity: Varies, with multiple images for some identities.
- False Rejection Rate (FRR):
  - Objective: Measure how often the system fails to recognise the correct match for a given identity.
  - Setup:
    - Extracted unique identities with more than one image.
    - Used the first image to create the database of embeddings.
    - Tested the second image of each identity for recognition.
- False Acceptance Rate (FAR):
  - Objective: Measure how often the system incorrectly matches an input image to a different identity.
  - Setup:
    - Divided the dataset into two halves.
    - Used the first half to populate the database of embeddings.
    - Tested the second half for recognition.

| Metric | DB Size | Test data size | Errors | Rate(%) |
|---|---|---|---|---|
| FRR | 5650 | 1680 | 108 | 6.43 |
| FAR | 2796 | 3557 | 149 | 4.19 |

- False Rejection Rate (FRR): 6.43% means roughly 1 in 16 genuine users may need to retry authentication.
- False Acceptance Rate (FAR): 4.19% means roughly 4 in 100 test images were matched to a different identity.
- Both rates can be tuned by adjusting the similarity threshold in the configuration: a stricter threshold lowers FAR at the cost of a higher FRR, and vice versa.
- Results may vary based on image quality, lighting conditions, and face angles.
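
The rates in the table follow directly from the error counts and test-set sizes. A short check, with `error_rate` as a hypothetical helper name:

```python
def error_rate(errors, trials):
    """Percentage of trials that ended in an error."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return 100.0 * errors / trials


frr = error_rate(108, 1680)   # genuine probes that were rejected
far = error_rate(149, 3557)   # probes matched to a different identity

print(f"FRR: {frr:.2f}% (about 1 in {round(100 / frr)})")
print(f"FAR: {far:.2f}%")
```

This prints an FRR of 6.43% (about 1 in 16) and a FAR of 4.19%, matching the table.
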
Note: The benchmarking scripts and test data are available in the project-data/benchmarking/ directory for reproducibility.
The load test was conducted on the same hardware environment (AWS g4dn.xlarge with NVIDIA T4 GPU) using the following parameters:
- Execution using the `constant-vus` executor:
  - 1st stage: 4 VUs (duration: 5 minutes)
  - 2nd stage: 8 VUs (duration: 5 minutes)
  - 3rd stage: 12 VUs (duration: 5 minutes)
- Thresholds:
  - 95% of requests must have a response time < 3000 ms.
  - Failure rate must be < 5%.
- Dataset: Link
- Requests: Each request contained a randomly selected image from the enrolled dataset.
| Metric | Value |
|---|---|
| Max Throughput | 5.44 req/s |
| HTTP Failures | 0.0120 req/s |
| Avg. Response Time | 1.46 s |
| 95% Response Time | 2.86 s |
| Total Requests | 4,968 |
| Failed Requests | 11 (0.22%) |
| Unique URLs Tested | 6 |
| Endpoint | Scenario | Method | Status | Count | Avg | P95 | Max |
|---|---|---|---|---|---|---|---|
| /api/recognise_user | scenario_12_users | POST | 200 | 1.72K | 2.11s | 3.44s | 5.80s |
| /api/recognise_user | scenario_12_users | POST | 400 | 4 | 1.66s | 2.63s | 2.76s |
| /api/recognise_user | scenario_8_users | POST | 200 | 1.65K | 1.46s | 2.59s | 4.06s |
| /api/recognise_user | scenario_8_users | POST | 400 | 3 | 1.62s | 2.16s | 2.26s |
| /api/recognise_user | scenario_4_users | POST | 400 | 4 | 1.09s | 2.03s | 2.13s |
| /api/recognise_user | scenario_4_users | POST | 200 | 1.59K | 753ms | 1.33s | 2.73s |
A total of 9,936 checks were evaluated with 22 failures, resulting in a 99.8% overall success rate.
| Check Name | Success Rate | Success Count | Fail Count |
|---|---|---|---|
| status is 200 (scenario_8_users) | 99.8% | 1,649 | 3 |
| response matches schema (scenario_8_users) | 99.8% | 1,649 | 3 |
| status is 200 (scenario_12_users) | 99.8% | 1,715 | 4 |
| status is 200 (scenario_4_users) | 99.7% | 1,593 | 4 |
| response matches schema (scenario_4_users) | 99.7% | 1,593 | 4 |
| response matches schema (scenario_12_users) | 99.8% | 1,715 | 4 |
Both defined thresholds were met over the full run:
- http_req_duration: p(95) < 3000ms ✓
- http_req_failed: rate < 0.05 ✓
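
The threshold results can be re-derived from the summary figures reported above. The sketch below is only a sanity check on those numbers; the function name and dictionary keys are made up for the example and are not part of the load-test tooling.

```python
def check_thresholds(total_requests, failed_requests, p95_seconds,
                     p95_limit_ms=3000, max_failure_rate=0.05):
    """Evaluate the two load-test thresholds against summary figures."""
    failure_rate = failed_requests / total_requests
    return {
        "failure_rate": failure_rate,
        "http_req_duration": p95_seconds * 1000 < p95_limit_ms,
        "http_req_failed": failure_rate < max_failure_rate,
    }


# Figures taken from the summary table: 4,968 requests, 11 failed, p95 of 2.86 s.
result = check_thresholds(total_requests=4968, failed_requests=11, p95_seconds=2.86)
print(f"failure rate: {result['failure_rate']:.2%}")
print(f"p95 under limit: {result['http_req_duration']}")
print(f"failure rate under limit: {result['http_req_failed']}")
```

With the reported figures the failure rate works out to about 0.22%, well under the 5% limit, and the overall p95 of 2.86 s stays under 3000 ms.
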
- Maximum response time: 6s (at 12 VUs)
- Average response time at peak: 3s
- 95% of requests at peak: < 5s
- Overall average: 5.5 req/s
- Peak: 7.3 req/s (at 12 VUs)
- Data sent peak: 66.5 KB/s (at 12 VUs)
- Data received peak: 1.79 KB/s (at 12 VUs)
- Total data sent: 44.1 MB
- Total data received: 1.18 MB
- CPU usage: started at ~250% and held steady at ~300% as the load reached its peak.
- Memory usage: started at ~150 MB and peaked at ~300 MB.

- Triton won't start: Check that model_repository/facenet/1/model.onnx exists. The model file should be ~94MB.
- Database connection errors: Wait 10-15 seconds after docker-compose up for Postgres to initialize. Check logs with docker-compose logs postgres.
- Port conflicts: If 8080, 5432, or 8000-8002 are in use, modify the ports in docker-compose.yml.
- Slow inference on CPU: First request is always slower (~2-3s). Subsequent requests should be under 1s. For faster performance, enable GPU support by uncommenting the runtime settings in docker-compose.yml.

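
Instead of waiting a fixed 10-15 seconds, a script can poll until each service port accepts connections. This is a minimal sketch with a hypothetical helper name; note that an open TCP port only shows that the process is listening, not that Postgres or Triton has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("localhost", 5432)` after `docker-compose up -d` blocks until Postgres is listening, and the same call with 8000 or 8080 covers Triton and the API.
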
The system is designed with modularity in mind:
- Face detection and feature extraction models are served via Triton and can be swapped out
- Vector similarity search uses PostgreSQL with pgvector, but the database layer is abstracted
- The FastAPI backend handles orchestration and can be extended with additional endpoints
Contributions are welcome. Feel free to fork this project and adapt it to your needs.


