feat(OFF-178): Grafana synthetic health check probes #385

Open
noahwhite wants to merge 1 commit into develop from feature/OFF-178-grafana-synthetic-health-probes
@noahwhite

Owner

Summary

  • Establishes Grafana Synthetic Monitoring on the Grafana Cloud stack via grafana_synthetic_monitoring_installation
  • Creates an HTTP health check probe (grafana_synthetic_monitoring_check) that authenticates against https://<tenant_domain>/ with an X-Health-Check-Token header
  • Probes from 3 US locations (Atlanta, New York, San Francisco) every 2 minutes with medium alert sensitivity
  • Adds a dedicated PagerDuty service (ghost-stack-dev-01-health-check) with Events API V2 integration and service dependencies
  • Routes health check alerts to a new "PagerDuty - Ghost Stack Health Check" contact point by matching the service=health-check label in the notification policy
  • Wires HEALTH_CHECK_TOKEN through CI workflows and infra-shell.sh as TF_VAR_health_check_token
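
For reviewers unfamiliar with the Synthetic Monitoring resources, the first three bullets could look roughly like the sketch below. This is an illustration, not the code in this PR: the resource labels, `var.stack_id`, the `job` name, the `service` label placement, and the probe map keys (`"Atlanta"`, `"NewYork"`, `"SanFrancisco"`) are assumptions and should be checked against the module and the probes data source output.

```hcl
# Sketch only. Labels, var.stack_id, the job name and probe keys are assumed.

# Enable Synthetic Monitoring on the Grafana Cloud stack.
resource "grafana_synthetic_monitoring_installation" "this" {
  stack_id              = var.stack_id # hypothetical variable
  metrics_publisher_key = var.metrics_publisher_key
}

# Aliased provider that talks to the SM API created by the installation.
provider "grafana" {
  alias           = "sm"
  sm_access_token = grafana_synthetic_monitoring_installation.this.sm_access_token
  sm_url          = grafana_synthetic_monitoring_installation.this.stack_sm_api_url
}

# Map of probe name to probe ID.
data "grafana_synthetic_monitoring_probes" "all" {
  provider = grafana.sm
}

resource "grafana_synthetic_monitoring_check" "health" {
  provider          = grafana.sm
  job               = "ghost-health-check" # hypothetical job name
  target            = "https://${var.tenant_domain}/"
  enabled           = true
  frequency         = 120000 # milliseconds, i.e. every 2 minutes
  alert_sensitivity = "medium"

  # Assumed map keys; verify against the data source before relying on them.
  probes = [
    data.grafana_synthetic_monitoring_probes.all.probes["Atlanta"],
    data.grafana_synthetic_monitoring_probes.all.probes["NewYork"],
    data.grafana_synthetic_monitoring_probes.all.probes["SanFrancisco"],
  ]

  # Assumed to be the label the notification policy matches on.
  labels = {
    service = "health-check"
  }

  settings {
    http {
      method             = "GET"
      headers            = ["X-Health-Check-Token: ${var.health_check_token}"]
      valid_status_codes = [200]
    }
  }
}
```

Because the header value interpolates `var.health_check_token`, declaring that variable as `sensitive = true` should keep the token out of plan output.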

Files changed

| File | Change |
| --- | --- |
| opentofu/modules/grafana-cloud/main.tofu | SM installation, provider, probes data source, check resource, contact point, notification policy routing |
| opentofu/modules/grafana-cloud/variables.tofu | metrics_publisher_key, pagerduty_health_check_integration_key, health_check_token, tenant_domain |
| opentofu/modules/pagerduty/main.tofu | PagerDuty service, integration, service dependencies, output |
| opentofu/envs/dev/main.tofu | Pass new variables to grafana-cloud module |
| opentofu/envs/dev/variables.tofu | health_check_token variable |
| opentofu/envs/dev/tests/grafana-cloud.tofutest.hcl | Mock resources, override data, 8 new assertions |
| .github/workflows/deploy-dev.yml | Pass HEALTH_CHECK_TOKEN and TF_VAR_health_check_token through all steps |
| .github/workflows/pr-tofu-plan-develop.yml | Same CI wiring |
| docker/scripts/infra-shell.sh | Fetch/prompt/export TF_VAR_health_check_token |
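
The alert routing split across the pagerduty and grafana-cloud modules could be sketched as follows. This is a hedged illustration, not the PR's code: the resource labels, the output name, `var.escalation_policy_id`, `var.default_contact_point`, and the `group_by` value are assumptions, and the service dependency resources are left out.

```hcl
# Sketch only. Labels, the output name and the hypothetical variables are assumed.

# --- opentofu/modules/pagerduty ---
resource "pagerduty_service" "health_check" {
  name              = "ghost-stack-dev-01-health-check"
  escalation_policy = var.escalation_policy_id # hypothetical variable
}

# Events API V2 integration; its key is what Grafana sends alerts to.
resource "pagerduty_service_integration" "health_check" {
  name    = "Grafana Health Check" # hypothetical name
  service = pagerduty_service.health_check.id
  type    = "events_api_v2_inbound_integration"
}

output "health_check_integration_key" { # hypothetical output name
  value     = pagerduty_service_integration.health_check.integration_key
  sensitive = true
}

# --- opentofu/modules/grafana-cloud ---
resource "grafana_contact_point" "pagerduty_health_check" {
  name = "PagerDuty - Ghost Stack Health Check"

  pagerduty {
    integration_key = var.pagerduty_health_check_integration_key
  }
}

resource "grafana_notification_policy" "main" {
  contact_point = var.default_contact_point # hypothetical default route
  group_by      = ["alertname"]             # assumed grouping

  # Child route: alerts labelled service=health-check go to PagerDuty.
  policy {
    contact_point = grafana_contact_point.pagerduty_health_check.name

    matcher {
      label = "service"
      match = "="
      value = "health-check"
    }
  }
}
```

In this layout the dev environment would pass the pagerduty module's integration key output into the grafana-cloud module as `pagerduty_health_check_integration_key`.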

Prerequisites

The GC_ACCESS_TOK cloud access policy token needs these scopes for Synthetic Monitoring installation: stacks:read, metrics:write, logs:write, traces:write. Verify the existing token has these scopes before applying.

Test plan

  • tofu fmt -check -recursive passes
  • tofu test passes (14/14)
  • tofu plan shows expected new resources (SM installation, check, PD service, contact point, notification policy update)
  • tofu apply creates SM infrastructure successfully
  • Health check probe appears in Grafana Synthetic Monitoring UI
  • Probe successfully authenticates and returns HTTP 200
  • Simulated failure (wrong token or stopped Ghost) triggers PagerDuty alert

Closes OFF-178

…ites

Establishes Grafana Synthetic Monitoring infrastructure to probe tenant
health check endpoints. Creates an SM installation on the Grafana Cloud
stack, configures HTTP checks that authenticate with X-Health-Check-Token
from 3 US probe locations (Atlanta, New York, San Francisco), and routes
alerts through a dedicated PagerDuty service.

- grafana_synthetic_monitoring_installation enables SM on the stack
- grafana_synthetic_monitoring_check probes https://<domain>/ every 2min
- PagerDuty service + contact point + notification policy routing
- CI workflows and infra-shell pass HEALTH_CHECK_TOKEN as TF_VAR
- OpenTofu tests cover all new resources (14/14 passing)