Context
Find currently has basic health checks via Dockerflow middleware (__heartbeat__, __lbheartbeat__) and an internal self-test framework (core/selftests.py) that collects timing data for database, cache, and OpenSearch connectivity. However, there is no Prometheus-compatible metrics endpoint for scraping operational metrics.
For production observability, we need a /metrics endpoint that exposes standard application metrics in Prometheus format.
Requirements
- Add the prometheus-client library to the dependencies in src/backend/pyproject.toml
- Create a /metrics endpoint that exposes:
  - Request counts and latencies by endpoint (documents/index/, documents/search/, documents/delete/)
  - HTTP response status code distribution
  - OpenSearch query latencies
  - Document indexing throughput (count, bulk size)
  - Search query performance (query time, result counts)
- Consider adding middleware or decorators to instrument existing views in core/views.py
- Ensure the metrics endpoint is excluded from authentication requirements
- Document the available metrics and scrape configuration
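As a starting point for discussion, the request-count and latency requirements could be covered by a small Django-style middleware. This is only a sketch under assumptions: the metric names (find_http_requests_total, find_http_request_duration_seconds), the dedicated registry, and the suffix-based endpoint matching are all hypothetical and not taken from the Find codebase. It deliberately avoids importing Django so it can be run on its own.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# Hypothetical metric names; a dedicated registry keeps this sketch isolated.
REGISTRY = CollectorRegistry()
REQUESTS = Counter(
    "find_http_requests_total",
    "HTTP requests by endpoint, method and status code.",
    ["endpoint", "method", "status"],
    registry=REGISTRY,
)
LATENCY = Histogram(
    "find_http_request_duration_seconds",
    "HTTP request latency by endpoint.",
    ["endpoint"],
    registry=REGISTRY,
)

# Endpoints named in the requirements. Matching by path suffix is an
# assumption, since the URL prefix is not stated in this issue.
TRACKED = ("documents/index/", "documents/search/", "documents/delete/")


def endpoint_label(path):
    """Map a request path to a bounded set of label values."""
    for suffix in TRACKED:
        if path.endswith(suffix):
            return suffix
    return "other"


class PrometheusMiddleware:
    """Django-style middleware that wraps get_response and records metrics."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.perf_counter()
        response = self.get_response(request)
        elapsed = time.perf_counter() - start
        endpoint = endpoint_label(request.path)
        REQUESTS.labels(endpoint, request.method, str(response.status_code)).inc()
        LATENCY.labels(endpoint).observe(elapsed)
        return response


def render_metrics():
    """Return the bytes payload that a /metrics view would serve."""
    return generate_latest(REGISTRY)
```

Grouping unknown paths under "other" keeps label cardinality bounded. Requests that raise an exception are not counted in this sketch; a real implementation would probably want a try/finally around the get_response call.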
Technical notes
- The existing self-test framework in core/selftests_builtin.py already collects duration_ms for each health check; these could be exposed as gauges
- Gunicorn configuration is in docker/files/usr/local/etc/gunicorn/find.py if process-level metrics are needed
- Consider using prometheus_client.multiprocess mode for multi-worker Gunicorn deployments
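The notes above could translate into something like the following sketch. It makes several assumptions: the gauge name find_selftest_duration_seconds and the helper names are hypothetical, and the input is assumed to be a plain mapping of check name to duration_ms, which may not match what core/selftests_builtin.py actually returns. The multiprocess part uses prometheus_client's MultiProcessCollector and mark_process_dead, which rely on the PROMETHEUS_MULTIPROC_DIR environment variable being set before the workers start.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest, multiprocess

# Hypothetical gauge for self-test timings, kept on its own registry here.
SELFTEST_REGISTRY = CollectorRegistry()
SELFTEST_DURATION = Gauge(
    "find_selftest_duration_seconds",
    "Duration of the most recent self-test run, per check.",
    ["check"],
    registry=SELFTEST_REGISTRY,
)


def record_selftest_durations(durations_ms):
    """Publish self-test timings as gauges.

    durations_ms is assumed to map a check name to its duration_ms value.
    Values are converted to seconds, the Prometheus base unit for time.
    """
    for check, duration_ms in durations_ms.items():
        SELFTEST_DURATION.labels(check).set(duration_ms / 1000.0)


def render_multiprocess_metrics(path=None):
    """Aggregate metrics written by all Gunicorn workers.

    When path is None, the collector reads PROMETHEUS_MULTIPROC_DIR.
    A fresh registry is used so that only the aggregated files are exposed.
    """
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry, path=path)
    return generate_latest(registry)


def child_exit(server, worker):
    """Gunicorn hook: clean up live gauge files for a worker that exited."""
    multiprocess.mark_process_dead(worker.pid)
```

The child_exit function would live in the Gunicorn configuration file rather than in application code. In multiprocess mode the choice of multiprocess_mode for gauges (for example whether values are summed or reported per worker) would still need to be decided.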