feat(infra): add status dashboard for devnet deployments #733

Merged
ktechmidas merged 2 commits into v1.0-dev from infra/status
Mar 2, 2026

@ktechmidas (Contributor) commented Feb 28, 2026

Summary

  • Adds automatic status dashboard deployment to web-1 for devnets
  • Accessible at https://status.<devnet-name>.networks.dash.org
  • Uses the dashpay/status Docker image (built via CI on dashpay/status)

Terraform

  • New Classic ELB with ACM certificate for the status.* subdomain
  • Route53 CNAME record pointing to the status ELB
  • Security group ingress rule for port 3010

Ansible

  • status_monitoring role: deploys the dashmon-check.sh monitoring script to all masternodes and HP masternodes
  • status_dashboard role: pulls and runs the status Docker container on web-1, with the inventory file and SSH deploy key mounted as volumes
  • Conditionally enabled only for devnet networks (when: dash_network == "devnet")

Usage

# Included automatically in full devnet deploy
./bin/deploy devnet-<name>

# Deploy status dashboard only (on existing devnet)
./bin/deploy -p --tags=status_dashboard devnet-<name>
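After either deploy, a quick smoke check can confirm the page is reachable. This is a hypothetical helper, not part of this PR: `status_url` and `check_status_page` are illustrative names, the hostname pattern is the one given in the summary, and the check only requests the dashboard root with plain `curl`.

```shell
#!/usr/bin/env bash
# Hypothetical post-deploy smoke check (not part of this PR).
set -euo pipefail

# Build the dashboard URL from the devnet name as it appears in the hostname,
# following the pattern https://status.<devnet-name>.networks.dash.org.
status_url() {
  local devnet_name="$1"
  printf 'https://status.%s.networks.dash.org\n' "$devnet_name"
}

# Print the HTTP status code returned by the dashboard root.
# -f is deliberately not used, so 4xx/5xx codes are printed instead of
# failing; curl still exits non-zero on DNS, TLS, or connection errors.
check_status_page() {
  local url
  url="$(status_url "$1")"
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "$url"
}
```

Running `check_status_page <devnet-name>` and getting `200` suggests the Route53 record, ELB, certificate, and container are all serving.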

Test plan

  • Deploy a devnet and verify status page comes up at https://status.<devnet>.networks.dash.org
  • Verify all masternodes and HP masternodes appear on the dashboard
  • Verify real-time SSE updates are working
  • Verify standalone --tags=status_dashboard deployment works
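For the SSE item, one rough check is to capture the event stream briefly and count its `data:` lines. This is a sketch under assumptions: the PR does not name the dashboard's SSE endpoint, so `SSE_URL` is a placeholder you must supply, and `count_sse_data_lines` / `sse_smoke_check` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical SSE check; the endpoint path is NOT taken from the PR.
set -euo pipefail

# Count lines starting with "data:" on stdin. In the Server-Sent Events
# format, each event's payload is sent on one or more "data:" lines.
# grep -c prints 0 (and exits 1) when nothing matches; `|| true` keeps it non-fatal.
count_sse_data_lines() {
  grep -c '^data:' || true
}

# Capture the stream for up to 15 seconds (-N disables output buffering).
# An open stream always hits --max-time, so curl's timeout exit is ignored.
sse_smoke_check() {
  local url="${SSE_URL:?set SSE_URL to the dashboard SSE endpoint}"
  { curl -sN --max-time 15 "$url" || true; } | count_sse_data_lines
}
```

Usage: `SSE_URL=https://status.<devnet-name>.networks.dash.org/<sse-path> sse_smoke_check`. A non-zero count indicates updates are arriving; zero means either no events were pushed in the window or the path is wrong.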

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Added comprehensive Testnet Infrastructure Operations Guide covering architecture, deployment procedures, and troubleshooting.
  • New Features

    • Introduced status dashboard for monitoring masternode health and network metrics.
    • Added automated status monitoring across all masternodes with real-time system metrics.
  • Chores

    • Enhanced infrastructure automation to support status dashboard and monitoring deployment.

@coderabbitai bot commented Feb 28, 2026

Warning

Rate limit exceeded

@ktechmidas has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 57 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between b325f89 and 8ac1272.

📒 Files selected for processing (10)
  • .gitignore
  • ansible/deploy.yml
  • ansible/roles/status_dashboard/defaults/main.yml
  • ansible/roles/status_dashboard/tasks/main.yml
  • ansible/roles/status_dashboard/templates/docker-compose.yml.j2
  • ansible/roles/status_monitoring/files/dashmon-check.sh
  • ansible/roles/status_monitoring/tasks/main.yml
  • terraform/aws/main.tf
  • terraform/aws/security_groups.tf
  • terraform/aws/variables.tf
📝 Walkthrough

These changes introduce a complete status monitoring and dashboard infrastructure for Dash devnet deployments. This includes new Ansible roles for deploying a status dashboard service and monitoring scripts, Terraform AWS infrastructure for the dashboard frontend (ELB, ACM certificate, Route53 DNS), and comprehensive operational documentation. The monitoring script collects node metrics, and the dashboard serves them through a web interface on port 3010.

Changes

  • Documentation (INFRA.MD): Adds comprehensive testnet infrastructure operations guide detailing architecture, deployment procedures, troubleshooting, node management, and ProTx lifecycle.
  • Ansible Deployment Configuration (ansible/deploy.yml): Adds new play to deploy status monitoring to masternodes, integrates status_dashboard role into faucet setup, and enables HP masternode deployment tuning with fact gathering and serial execution.
  • Status Dashboard Role (ansible/roles/status_dashboard/defaults/main.yml, ansible/roles/status_dashboard/tasks/main.yml, ansible/roles/status_dashboard/templates/docker-compose.yml.j2): Introduces new Ansible role to deploy status dashboard service via Docker Compose, including configuration variables, directory setup, file copying, and container orchestration.
  • Status Monitoring Role (ansible/roles/status_monitoring/files/dashmon-check.sh, ansible/roles/status_monitoring/tasks/main.yml): Introduces new monitoring script that collects blockchain and system metrics from masternodes (HP and regular variants) and deployment task to install it system-wide.
  • Terraform AWS Infrastructure (terraform/aws/main.tf, terraform/aws/security_groups.tf, terraform/aws/variables.tf): Adds AWS infrastructure for status dashboard: ELB frontend, ACM certificate with DNS validation, Route53 DNS record, and security group ingress rule; mirrors existing insight/web deployment patterns.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ 3 passed

  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the primary change: adding a status dashboard feature for devnet deployments. It is concise, specific, and directly reflects the main objective of the pull request.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
INFRA.MD (1)

283-293: Consider moving “Current Node Status” to a time-scoped ops log.

This static table will go stale quickly in the repo; linking to a live source or runbook entry reduces the risk of outdated operational guidance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@INFRA.MD` around lines 283 - 293, The static "Current Node Status" table
under the "Current Node Status" heading in INFRA.MD should be removed and
replaced with a short paragraph pointing readers to the time-scoped operations
log or runbook (e.g., "Node Status: see ops log/runbook for live status") plus a
link and last-updated timestamp; move the existing table content into the ops
log entry identified for node status so historical snapshots remain there, and
ensure INFRA.MD's "Current Node Status" heading now references that live source
(include the runbook/ops-log identifier or URL and an indication of refresh
frequency).
ansible/roles/status_dashboard/tasks/main.yml (1)

27-32: Make compose pull/recreate behavior controllable via force flags.

Hard-coding pull: always and recreate: always re-pulls the image and recreates the container on every run, even when nothing changed; wire these to the manual override flags instead.

Suggested fix
 - name: Start status dashboard
   community.docker.docker_compose_v2:
     project_src: "{{ status_dashboard_path }}"
     state: present
-    pull: always
-    recreate: always
+    pull: "{{ (skip_dashmate_image_update | default(false)) | ternary('never', 'always') }}"
+    recreate: "{{ (force_dashmate_rebuild | default(false)) | ternary('always', 'auto') }}"

As per coding guidelines ansible/**/*.yml: "Use force flags (force_dashmate_rebuild, force_dashmate_reinstall, force_ssl_regenerate, force_logs_config, skip_dashmate_image_update) as manual overrides in Ansible playbooks and tasks when needed".
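To see what the suggested Jinja expressions resolve to, the same mapping can be mirrored in a small shell function. This is illustrative only: `compose_policies` is a hypothetical helper, and it treats the flags as the literal strings `true`/`false`, whereas Jinja truthiness is broader.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Mirrors the suggested docker_compose_v2 settings:
#   pull:     skip_dashmate_image_update ? never  : always
#   recreate: force_dashmate_rebuild     ? always : auto
# Missing arguments default to false, like `| default(false)` in the Jinja.
compose_policies() {
  local skip_update="${1:-false}" force_rebuild="${2:-false}"
  local pull=always recreate=auto
  if [ "$skip_update" = true ]; then pull=never; fi
  if [ "$force_rebuild" = true ]; then recreate=always; fi
  printf 'pull=%s recreate=%s\n' "$pull" "$recreate"
}
```

With no flags set, the result is `pull=always recreate=auto`: the image is still pulled on every run, but Compose should only recreate the container when its image or configuration changed, which removes most of the churn.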

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ansible/roles/status_dashboard/tasks/main.yml` around lines 27 - 32, The task
"Start status dashboard" using community.docker.docker_compose_v2 currently
hard-codes pull: always and recreate: always; change these to honor the repo
force flags: replace pull with a conditional using force_dashmate_reinstall and
skip_dashmate_image_update (e.g. pull enabled when force_dashmate_reinstall is
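true or skip_dashmate_image_update is false) and replace recreate with a value
driven by force_dashmate_rebuild (e.g. recreate: "{{ (force_dashmate_rebuild |
default(false)) | ternary('always', 'auto') }}", because the module's recreate
option takes always, never or auto rather than a boolean), ensuring you use the
existing variable names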
force_dashmate_rebuild, force_dashmate_reinstall and skip_dashmate_image_update
so runs only pull/recreate when explicitly forced.
ansible/deploy.yml (1)

241-248: Consider adding gather_facts: false for faster deployment.

This new play doesn't explicitly set gather_facts, defaulting to true. If the status_monitoring role doesn't require host facts, adding gather_facts: false would improve deployment speed consistent with the project's optimization patterns.

♻️ Suggested improvement
 - name: Deploy status monitoring to masternodes
   hosts: masternodes,hp_masternodes
   become: true
+  gather_facts: false
   roles:
     - status_monitoring
   tags:
     - full_deploy
     - status_dashboard

As per coding guidelines: "Add dashmate_deploy tag, set gather_facts: false, and use strategy: free in ansible/deploy.yml to enable fast, parallel deployments".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ansible/deploy.yml` around lines 241 - 248, The play deploying "Deploy status
monitoring to masternodes" (hosts: masternodes,hp_masternodes) should be updated
to follow deployment guidelines: add gather_facts: false, add strategy: free,
and include the dashmate_deploy tag alongside existing tags so the
status_monitoring role runs faster and in parallel; locate the play block that
declares the role status_monitoring and modify it to include these three fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ansible/roles/status_dashboard/tasks/main.yml`:
- Around line 15-19: The "Copy SSH deploy key for status dashboard" task exposes
sensitive key data and lacks explicit ownership; update the ansible.builtin.copy
task to include no_log: true and set owner and group to the appropriate runtime
user (use existing variables such as status_dashboard_user or a deploy user) so
the copied file at "{{ status_dashboard_path }}/ssh_key" is owned correctly and
not logged; keep mode "0600" as-is to preserve permissions.
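After a deploy, the key's mode and owner can be spot-checked on web-1. A minimal sketch, assuming the key lands at `{{ status_dashboard_path }}/ssh_key` as described above; `key_is_locked_down` is a hypothetical helper, and the owner to pass is whichever runtime user the role ends up using (`status_dashboard_user` is only the reviewer's example name).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds only if $1 is a regular file owned by user $2 whose permission
# bits are exactly 0600 (`-perm 600` without a leading - or / is an exact match).
key_is_locked_down() {
  local key="$1" owner="$2"
  [ -f "$key" ] && [ -n "$(find "$key" -perm 600 -user "$owner")" ]
}
```

Usage: `key_is_locked_down /path/to/status/ssh_key <runtime-user> && echo ok`.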

In `@ansible/roles/status_monitoring/files/dashmon-check.sh`:
- Around line 6-11: The script currently uses set -euo pipefail so a failing
call to "dashmate status" in dashmon-check.sh can abort the script before
emitting metrics; change the "sudo -u dashmate dashmate status 2>&1" invocation
to be non-fatal (e.g., capture its output and exit code, or append "|| true") so
errors don’t trigger script exit, still emit the "===SYSMETRICS===" marker and
any collected output; ensure you preserve the sudo invocation and the echo
"===SYSMETRICS===" line (or its surrounding logic) so telemetry is always
printed even if dashmate status fails.
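The non-fatal pattern can be sketched as below. This is not the actual dashmon-check.sh (its contents are not shown here): `collect_status` is hypothetical, the status command is passed as arguments so it can be stubbed, and the kernel line stands in for whatever system metrics the real script gathers. It is a slightly stricter variant of a bare `|| true`, since it also reports the exit code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run the status command passed as arguments without letting a failure abort
# the script under `set -e`: capture output and exit code, then always print
# the ===SYSMETRICS=== marker and the metrics section.
collect_status() {
  local out rc=0
  out="$("$@" 2>&1)" || rc=$?
  printf '%s\n' "$out"
  if [ "$rc" -ne 0 ]; then
    echo "status command exited with code $rc"
  fi
  echo "===SYSMETRICS==="
  # Placeholder metric; the real script gathers its own system metrics.
  echo "kernel=$(uname -r)"
}

# Hypothetical invocation mirroring the command named in the review:
# collect_status sudo -u dashmate dashmate status
```

Declaring `local out` separately from the assignment matters: `local out="$(cmd)"` would return the status of `local` itself and hide the command's failure.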

In `@terraform/aws/security_groups.tf`:
- Around line 214-225: The ingress rule for the Status Dashboard currently opens
var.status_port to all cidr_blocks from aws_subnet.public and the VPN EIP
(aws_eip.vpn); replace that broad CIDR-based rule with a
security-group-restricted rule so only the ELB can access the dashboard. Modify
the ingress block (or replace it with an aws_security_group_rule) to remove
cidr_blocks and instead use source_security_group_id = aws_security_group.elb.id
(or security_groups = [aws_security_group.elb.id] if using the inline ingress
block) targeting var.status_port, and keep the existing description; this
ensures only the ELB security group (aws_security_group.elb) can reach the
dashboard rather than the whole public subnets.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cf17de and b325f89.

📒 Files selected for processing (10)
  • INFRA.MD
  • ansible/deploy.yml
  • ansible/roles/status_dashboard/defaults/main.yml
  • ansible/roles/status_dashboard/tasks/main.yml
  • ansible/roles/status_dashboard/templates/docker-compose.yml.j2
  • ansible/roles/status_monitoring/files/dashmon-check.sh
  • ansible/roles/status_monitoring/tasks/main.yml
  • terraform/aws/main.tf
  • terraform/aws/security_groups.tf
  • terraform/aws/variables.tf

ktechmidas and others added 2 commits February 28, 2026 15:27
Add automatic status dashboard deployment to web-1 for devnets.
Accessible at https://status.<devnet>.networks.dash.org.

Terraform:
- New Classic ELB with ACM cert for status subdomain
- Route53 CNAME record
- Security group rule for port 3010

Ansible:
- status_monitoring role: deploys dashmon-check.sh to all masternodes
- status_dashboard role: runs dashpay/status Docker container on web-1
  with inventory file and SSH key mounted
- Conditionally enabled for devnet networks only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add no_log and explicit ownership to SSH key copy task
- Make dashmate status non-fatal in dashmon-check.sh (|| true)
- Restrict status dashboard SG to ELB only (drop subnet/VPN CIDRs)
- Add gather_facts: false, strategy: free, dashmate_deploy tag to
  status monitoring play
- Honor force flags for docker-compose pull/recreate
- Gitignore INFRA.MD (local ops doc, not part of the project)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ktechmidas merged commit 4c40c4f into v1.0-dev on Mar 2, 2026
2 checks passed
@ktechmidas deleted the infra/status branch on March 2, 2026 16:14