feat(infra): add status dashboard for devnet deployments #733

ktechmidas merged 2 commits into v1.0-dev
Conversation
📝 Walkthrough

These changes introduce a complete status monitoring and dashboard infrastructure for the Dash testnet. This includes new Ansible roles for deploying a status dashboard service and monitoring scripts, Terraform AWS infrastructure for the dashboard frontend (ELB, ACM certificate, Route53 DNS), and comprehensive operational documentation. The monitoring collects node metrics while the dashboard serves them via a web interface on port 3010.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🧹 Nitpick comments (3)

INFRA.MD (1)

283-293: Consider moving "Current Node Status" to a time-scoped ops log.

This static table will age quickly in-repo; linking to a live source or runbook entry reduces the risk of stale operational guidance.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@INFRA.MD` around lines 283-293: remove the static "Current Node Status" table and replace it with a short paragraph pointing readers to the time-scoped operations log or runbook (e.g., "Node Status: see ops log/runbook for live status"), plus a link and a last-updated timestamp. Move the existing table content into the ops log entry identified for node status so historical snapshots remain there, and ensure INFRA.MD's "Current Node Status" heading now references that live source (include the runbook/ops-log identifier or URL and an indication of refresh frequency).

ansible/roles/status_dashboard/tasks/main.yml (1)
27-32: Make compose pull/recreate behavior controllable via force flags.

Hard-coding `pull: always` + `recreate: always` causes unnecessary churn on every run; wire these to manual override flags.

Suggested fix:

```diff
 - name: Start status dashboard
   community.docker.docker_compose_v2:
     project_src: "{{ status_dashboard_path }}"
     state: present
-    pull: always
-    recreate: always
+    pull: "{{ (skip_dashmate_image_update | default(false)) | ternary('never', 'always') }}"
+    recreate: "{{ (force_dashmate_rebuild | default(false)) | ternary('always', 'auto') }}"
```

As per coding guidelines for `ansible/**/*.yml`: "Use force flags (`force_dashmate_rebuild`, `force_dashmate_reinstall`, `force_ssl_regenerate`, `force_logs_config`, `skip_dashmate_image_update`) as manual overrides in Ansible playbooks and tasks when needed".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ansible/roles/status_dashboard/tasks/main.yml` around lines 27-32: the "Start status dashboard" task using community.docker.docker_compose_v2 currently hard-codes `pull: always` and `recreate: always`. Replace `pull` with a conditional using `force_dashmate_reinstall` and `skip_dashmate_image_update` (e.g. pull enabled when `force_dashmate_reinstall` is true or `skip_dashmate_image_update` is false) and replace `recreate` with a boolean driven by `force_dashmate_rebuild` (e.g. `recreate: "{{ force_dashmate_rebuild | default(false) }}"`), using the existing variable names so runs only pull/recreate when explicitly forced.

ansible/deploy.yml (1)
241-248: Consider adding `gather_facts: false` for faster deployment.

This new play doesn't explicitly set `gather_facts`, so it defaults to `true`. If the `status_monitoring` role doesn't require host facts, adding `gather_facts: false` would improve deployment speed, consistent with the project's optimization patterns.

♻️ Suggested improvement:

```diff
 - name: Deploy status monitoring to masternodes
   hosts: masternodes,hp_masternodes
   become: true
+  gather_facts: false
   roles:
     - status_monitoring
   tags:
     - full_deploy
     - status_dashboard
```

As per coding guidelines: "Add `dashmate_deploy` tag, set `gather_facts: false`, and use `strategy: free` in `ansible/deploy.yml` to enable fast, parallel deployments".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ansible/deploy.yml` around lines 241-248: the "Deploy status monitoring to masternodes" play (hosts: masternodes,hp_masternodes) should be updated to follow deployment guidelines: add `gather_facts: false`, add `strategy: free`, and include the `dashmate_deploy` tag alongside the existing tags so the status_monitoring role runs faster and in parallel.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ansible/roles/status_dashboard/tasks/main.yml`:
- Around line 15-19: The "Copy SSH deploy key for status dashboard" task exposes
sensitive key data and lacks explicit ownership; update the ansible.builtin.copy
task to include no_log: true and set owner and group to the appropriate runtime
user (use existing variables such as status_dashboard_user or a deploy user) so
the copied file at "{{ status_dashboard_path }}/ssh_key" is owned correctly and
not logged; keep mode "0600" as-is to preserve permissions.
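A hardened version of that copy task might look like the following sketch. The `status_dashboard_user` variable and the `src` path are assumptions taken from the comment's suggestion, not the PR's actual code.

```yaml
# Sketch only: src path and status_dashboard_user are assumed names.
- name: Copy SSH deploy key for status dashboard
  ansible.builtin.copy:
    src: files/status_dashboard_deploy_key   # assumed source path
    dest: "{{ status_dashboard_path }}/ssh_key"
    owner: "{{ status_dashboard_user | default('root') }}"
    group: "{{ status_dashboard_user | default('root') }}"
    mode: "0600"
  no_log: true   # keep key material out of Ansible logs
```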
In `@ansible/roles/status_monitoring/files/dashmon-check.sh`:
- Around line 6-11: The script currently uses set -euo pipefail so a failing
call to "dashmate status" in dashmon-check.sh can abort the script before
emitting metrics; change the "sudo -u dashmate dashmate status 2>&1" invocation
to be non-fatal (e.g., capture its output and exit code, or append "|| true") so
errors don’t trigger script exit, still emit the "===SYSMETRICS===" marker and
any collected output; ensure you preserve the sudo invocation and the echo
"===SYSMETRICS===" line (or its surrounding logic) so telemetry is always
printed even if dashmate status fails.
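A minimal sketch of that non-fatal pattern under `set -euo pipefail`: the `dashmate` invocation and the `===SYSMETRICS===` marker come from the comment, while the helper function, the `-n` flag on sudo (to avoid an interactive password prompt), and the fallback message are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

collect_status() {
  # The real script runs: sudo -u dashmate dashmate status 2>&1
  sudo -n -u dashmate dashmate status 2>&1
}

# Capture output and swallow the exit code so a failure does not
# abort the script under set -e.
status_output=$(collect_status) || status_output="dashmate status unavailable"

printf '%s\n' "$status_output"
# The marker must always be emitted so telemetry parsing still works.
echo "===SYSMETRICS==="
```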
In `@terraform/aws/security_groups.tf`:
- Around line 214-225: The ingress rule for the Status Dashboard currently opens
var.status_port to all cidr_blocks from aws_subnet.public and the VPN EIP
(aws_eip.vpn); replace that broad CIDR-based rule with a
security-group-restricted rule so only the ELB can access the dashboard. Modify
the ingress block (or replace it with an aws_security_group_rule) to remove
cidr_blocks and instead use source_security_group_id = aws_security_group.elb.id
(or security_groups = [aws_security_group.elb.id] if using the inline ingress
block) targeting var.status_port, and keep the existing description; this
ensures only the ELB security group (aws_security_group.elb) can reach the
dashboard rather than the whole public subnets.
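Concretely, the restricted rule could be sketched as below. `aws_security_group.elb` and `var.status_port` come from the comment; the dashboard security group's resource name is an assumption.

```hcl
# Sketch: allow the dashboard port only from the ELB security group.
# aws_security_group.status_dashboard is an assumed resource name.
resource "aws_security_group_rule" "status_dashboard_from_elb" {
  type                     = "ingress"
  description              = "Status Dashboard"
  from_port                = var.status_port
  to_port                  = var.status_port
  protocol                 = "tcp"
  security_group_id        = aws_security_group.status_dashboard.id
  source_security_group_id = aws_security_group.elb.id
}
```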
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)

- INFRA.MD
- ansible/deploy.yml
- ansible/roles/status_dashboard/defaults/main.yml
- ansible/roles/status_dashboard/tasks/main.yml
- ansible/roles/status_dashboard/templates/docker-compose.yml.j2
- ansible/roles/status_monitoring/files/dashmon-check.sh
- ansible/roles/status_monitoring/tasks/main.yml
- terraform/aws/main.tf
- terraform/aws/security_groups.tf
- terraform/aws/variables.tf
Add automatic status dashboard deployment to web-1 for devnets. Accessible at https://status.<devnet>.networks.dash.org.

Terraform:
- New Classic ELB with ACM cert for status subdomain
- Route53 CNAME record
- Security group rule for port 3010

Ansible:
- status_monitoring role: deploys dashmon-check.sh to all masternodes
- status_dashboard role: runs dashpay/status Docker container on web-1 with inventory file and SSH key mounted
- Conditionally enabled for devnet networks only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add no_log and explicit ownership to SSH key copy task
- Make dashmate status non-fatal in dashmon-check.sh (|| true)
- Restrict status dashboard SG to ELB only (drop subnet/VPN CIDRs)
- Add gather_facts: false, strategy: free, dashmate_deploy tag to status monitoring play
- Honor force flags for docker-compose pull/recreate
- Gitignore INFRA.MD (local ops doc, not part of the project)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dd62661 to 8ac1272
Summary

Adds an automatic status dashboard deployment to web-1 for devnets, accessible at https://status.<devnet-name>.networks.dash.org and served by the dashpay/status Docker image (built via CI on dashpay/status).

Terraform

- status.* subdomain

Ansible

- status_monitoring role: deploys the dashmon-check.sh monitoring script to all masternodes and HP masternodes
- status_dashboard role: pulls and runs the status Docker container on web-1, with the inventory file and SSH deploy key mounted as volumes
- Conditionally enabled for devnets only (when: dash_network == "devnet")

Usage

Test plan

- https://status.<devnet>.networks.dash.org
- --tags=status_dashboard deployment works

🤖 Generated with Claude Code
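The Route53 piece of the Terraform changes described above could be sketched as follows. Every name here (the zone variable, the devnet_name variable, and the ELB resource) is an assumption for illustration, not the PR's actual code.

```hcl
# Sketch of the status subdomain CNAME; all identifiers are assumed.
resource "aws_route53_record" "status_dashboard" {
  zone_id = var.networks_dash_org_zone_id
  name    = "status.${var.devnet_name}.networks.dash.org"
  type    = "CNAME"
  ttl     = 300
  records = [aws_elb.status_dashboard.dns_name]
}
```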
Summary by CodeRabbit

- Documentation
- New Features
- Chores