
Add Cert-Manager Prometheus Monitoring Dashboard #230

Open
qiridigital wants to merge 1 commit into SigNoz:main from qiridigital:add-cert-manager-dashboard

@qiridigital

Summary

Adds a comprehensive Cert-Manager Prometheus monitoring dashboard with 6 sections and 19 panels, covering the full certificate lifecycle.

Closes SigNoz/signoz#6023

Dashboard Sections

1. General Overview (4 value panels)

  • Total Certificates — Count of all managed certificates
  • Active Certificates — Certificates in Ready state
  • Certificate Requests — Controller sync calls processed
  • Uptime — Time since last cert-manager restart
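The Total and Active counts can be reasoned about from `certmanager_certificate_ready_status`. The sketch below assumes each certificate exposes one series per `condition` label value (`True`, `False`, `Unknown`) and that only the series matching its current state has value 1; the tuple shape and sample data are hypothetical, not taken from the dashboard JSON.

```python
from typing import Iterable, Set, Tuple

# Hypothetical sample shape: (certificate name, namespace, condition label, value)
Sample = Tuple[str, str, str, float]


def certificate_counts(samples: Iterable[Sample]) -> Tuple[int, int]:
    """Return (total, active) certificate counts.

    Assumes the series whose `condition` label matches a certificate's
    current state carries value 1 and the others carry 0.
    """
    total: Set[Tuple[str, str]] = set()
    active: Set[Tuple[str, str]] = set()
    for name, namespace, condition, value in samples:
        key = (namespace, name)
        total.add(key)  # every certificate seen counts toward the total
        if condition == "True" and value == 1:
            active.add(key)  # Ready=True means the certificate is active
    return len(total), len(active)


samples = [
    ("web-tls", "default", "True", 1.0),
    ("web-tls", "default", "False", 0.0),
    ("api-tls", "default", "True", 0.0),
    ("api-tls", "default", "False", 1.0),
]
print(certificate_counts(samples))  # (2, 1)
```

Under the same assumption, the difference between the two counts corresponds to the not-ready certificates shown in Certificates Pending Renewal.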

2. Certificate Issuance (2 graphs + 1 value)

  • Certificates Issued per Issuer — Breakdown by issuer name
  • Issuance Rate — Certificate issuance rate over time
  • Issuance Success Rate — Success percentage computed as `(sync_calls - sync_errors) / sync_calls * 100`
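The success-rate formula is simple arithmetic, but the zero-call case needs a decision. A minimal sketch, assuming the inputs are per-window increases of the sync call and sync error counters rather than lifetime totals (an assumption about how the panel aggregates):

```python
import math


def issuance_success_rate(sync_calls: float, sync_errors: float) -> float:
    """Success percentage: (sync_calls - sync_errors) / sync_calls * 100."""
    if sync_calls <= 0:
        # No sync calls in the window: report "no data" instead of dividing by zero.
        return math.nan
    return (sync_calls - sync_errors) / sync_calls * 100


print(issuance_success_rate(400, 100))  # 75.0
```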

3. Certificate Renewal (2 values + 1 graph)

  • Certificates Pending Renewal — Count of certificates not in Ready state
  • Renewal Success Rate — Renewal success percentage
  • Renewal Duration — ACME client request duration over time
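Assuming `certmanager_http_acme_client_request_duration_seconds` exposes the usual `_sum` and `_count` series (as Prometheus summaries and histograms do), a mean request duration over a window can be derived from two scrapes, much like `rate(x_sum[w]) / rate(x_count[w])` where the window length cancels out. This sketch ignores counter resets:

```python
import math


def mean_request_duration(sum_start: float, sum_end: float,
                          count_start: float, count_end: float) -> float:
    """Average request duration in seconds between two scrapes of _sum/_count."""
    requests = count_end - count_start
    if requests <= 0:
        # No requests completed in the window, so there is no meaningful mean.
        return math.nan
    return (sum_end - sum_start) / requests


# 20 requests took 3 seconds in total -> 0.15 s each on average
print(mean_request_duration(10.0, 13.0, 100, 120))
```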

4. Error Metrics (3 graphs)

  • Certificate Issuance Errors — Error rate for issuance controller
  • Renewal Errors — Error rate for readiness controller
  • API Server Errors — ACME API requests with 4xx/5xx status codes
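Selecting 4xx/5xx requests amounts to filtering on the status label; in PromQL this is typically a regex matcher such as `status=~"4..|5.."`. A sketch assuming the ACME request counter carries a three-digit `status` label (the label name and counts are assumptions for illustration):

```python
from typing import Dict


def is_error_status(status: str) -> bool:
    """True for HTTP 4xx/5xx status labels such as "404" or "503"."""
    return len(status) == 3 and status.isdigit() and status[0] in "45"


def failed_requests(counts_by_status: Dict[str, float]) -> float:
    """Sum request counts whose status label is a client or server error."""
    return sum(v for s, v in counts_by_status.items() if is_error_status(s))


print(failed_requests({"200": 120.0, "201": 4.0, "429": 3.0, "500": 1.0}))  # 4.0
```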

5. Resource Usage (3 graphs)

  • CPU Usage — Process CPU consumption
  • Memory Usage — Resident memory bytes
  • Workqueue Depth — Work queue depth by queue name
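`process_cpu_seconds_total` is a cumulative counter, so CPU usage is its increase divided by elapsed wall-clock time, which reads as "cores in use". A minimal sketch with hypothetical sample values, ignoring counter resets:

```python
def cpu_cores_used(cpu_seconds_start: float, cpu_seconds_end: float,
                   window_seconds: float) -> float:
    """Average CPU cores consumed over a window, from process_cpu_seconds_total."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (cpu_seconds_end - cpu_seconds_start) / window_seconds


# 15 CPU-seconds consumed over 60 s of wall-clock time = a quarter of one core
print(cpu_cores_used(120.0, 135.0, 60.0))  # 0.25
```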

6. API and Event Metrics (3 graphs)

  • API Request Rate — ACME HTTP request rate by method/status
  • Event Processing Rate — Workqueue additions rate
  • Failed API Requests — Failed ACME requests by host/status
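Event Processing Rate is a per-second rate of the `workqueue_adds_total` counter, which drops to zero whenever cert-manager restarts. The sketch below shows the reset handling with hypothetical points; unlike PromQL's `rate()`, it does not extrapolate to the window edges.

```python
from typing import List, Tuple


def counter_rate(points: List[Tuple[float, float]]) -> float:
    """Per-second increase of a monotonic counter such as workqueue_adds_total.

    points: (unix_timestamp, counter_value) pairs sorted by time.
    A drop in value is treated as a counter reset, so the post-reset value
    is counted as fresh increase.
    """
    if len(points) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = points[-1][0] - points[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# +100 adds, then a restart resets the counter and it climbs to 50: 150 adds / 100 s
print(counter_rate([(0, 100), (50, 200), (100, 50)]))  # 1.5
```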

Template Variables

| Variable | Description |
|---|---|
| `namespace` | Kubernetes namespace where cert-manager is deployed |
| `issuer` | Certificate issuer name (e.g., letsencrypt, vault) |
| `certificate_name` | Specific certificate name filter |
| `cluster` | Kubernetes cluster (multi-cluster setups) |
| `deployment_environment` | Environment (production, staging) |
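Template variables act as label filters applied to every panel query, with an empty variable meaning "all". The sketch renders them as a PromQL-style selector for illustration only; SigNoz's Query Builder stores the same filters in its own format, and the variable-to-label mapping shown is hypothetical (the real label a variable filters on depends on the metric).

```python
from typing import Dict


def label_selector(metric: str, filters: Dict[str, str]) -> str:
    """Build a PromQL-style selector from template variable values.

    Empty values are skipped, i.e. treated as "match everything".
    """
    parts = []
    for label, value in filters.items():
        if value:
            # Escape backslashes and double quotes inside the label value.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{label}="{escaped}"')
    return f"{metric}{{{','.join(parts)}}}"


print(label_selector(
    "certmanager_certificate_ready_status",
    {"namespace": "cert-manager", "issuer": "", "cluster": "prod-eu"},
))
# certmanager_certificate_ready_status{namespace="cert-manager",cluster="prod-eu"}
```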

Key Metrics Used

  • `certmanager_certificate_ready_status` — Certificate readiness state
  • `certmanager_controller_sync_call_count` / `certmanager_controller_sync_error_count` — Controller operations
  • `certmanager_http_acme_client_request_count` / `certmanager_http_acme_client_request_duration_seconds` — ACME client
  • `process_cpu_seconds_total` / `process_resident_memory_bytes` — Resource usage
  • `workqueue_adds_total` / `workqueue_depth` — Event processing
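These metrics reach the collector in the Prometheus text exposition format. To show what a scrape contains, here is a minimal stdlib parser sketch for simple sample lines. It assumes no timestamps or exemplars, does not unescape label values, and the sample text is hypothetical rather than captured from a real cert-manager endpoint.

```python
import re
from typing import Dict, List, Tuple

# metric name, optional {labels}, whitespace, value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs (escaped quotes are matched but not unescaped)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Return (metric name, labels, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


EXAMPLE = """\
# TYPE certmanager_certificate_ready_status gauge
certmanager_certificate_ready_status{condition="True",name="web-tls",namespace="default"} 1
certmanager_certificate_ready_status{condition="False",name="web-tls",namespace="default"} 0
process_resident_memory_bytes 4.194304e+07
"""

samples = parse_exposition(EXAMPLE)
ready = [labels["name"] for name, labels, value in samples
         if name == "certmanager_certificate_ready_status"
         and labels.get("condition") == "True" and value == 1]
print(ready)  # ['web-tls']
```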

Implementation Notes

  • Uses SigNoz Query Builder format for 18 of 19 panels (PromQL is used only for the uptime calculation, which requires the `time()` function)
  • Dashboard schema version: v4
  • Follows the naming convention `cert-manager-prometheus-v1.json`
  • Includes an OpenTelemetry Collector configuration in the README for metrics ingestion via the Prometheus receiver on port 9402
  • References: cert-manager Prometheus Metrics and Grafana Dashboard #11001
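The PR does not show the uptime expression itself. A common PromQL form is `time() - process_start_time_seconds`, using the `process_start_time_seconds` gauge that Prometheus client libraries usually export alongside the `process_*` metrics listed above. Assuming that form, the arithmetic is:

```python
import time
from typing import Optional


def uptime_seconds(process_start_time_seconds: float,
                   now: Optional[float] = None) -> float:
    """Seconds since process start, mirroring time() - process_start_time_seconds."""
    if now is None:
        now = time.time()  # current Unix time, the equivalent of PromQL time()
    # Clamp at zero in case of clock skew between the scrape target and the viewer.
    return max(0.0, now - process_start_time_seconds)


print(uptime_seconds(1_700_000_000.0, now=1_700_086_400.0))  # 86400.0 (one day)
```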

Files

  • `cert-manager/cert-manager-prometheus-v1.json` — Dashboard JSON (102KB, 19 panels)
  • `cert-manager/readme.md` — Documentation with OTel Collector config

- 6 sections: General Overview, Certificate Issuance, Certificate Renewal,
  Error Metrics, Resource Usage, API and Event Metrics
- 19 monitoring panels covering certificate lifecycle, controller operations,
  ACME client requests, and process health
- 5 template variables: namespace, issuer, certificate_name, cluster,
  deployment_environment
- SigNoz Query Builder format with one PromQL panel for uptime calculation
- README with OpenTelemetry Collector configuration for cert-manager metrics
  ingestion via Prometheus receiver on port 9402

Development

Successfully merging this pull request may close these issues.

[Dashboard] cert-manager