# atnine-guard

DeFi Portfolio Guardian — real-time monitoring, risk alerting, and decision orchestration for multi-chain DeFi positions.
- System Overview
- Architecture
- Supported Chains & Protocols
- Environment Variables
- Mockup / Local Development Setup
- Production Deployment
- Operational Runbook
- Admin Scripts
- Testing
- Project Structure
## System Overview

atnine-guard is a Node.js backend system that continuously monitors DeFi portfolio positions across multiple blockchains, evaluates configurable risk rules against live on-chain data, and dispatches alerts to stakeholders via Telegram. It supports multi-tenant isolation through a file-based vault and coordinates work across three independent processes using Redis.
```
┌──────────────────────────────────────────────────────────────────┐
│ atnine-guard │
│ │
│ ┌───────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ API │ │ Scheduler │ │ Worker │ │
│ │ (Fastify) │ │ (5s tick) │ │ (job loop) │ │
│ └─────┬─────┘ └──────┬──────┘ └──────┬───────┘ │
│ │ │ │ │
│ │ ┌───────▼────────┐ │ │
│ │ │ Redis │◄──────────┘ │
│ │ │ (job queue + │ │
│ │ │ locks + │ │
│ │ │ state) │ │
│ │ └────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Vault (filesystem) │ │
│ │ vault/tenants/TEN-*/ vault/public/ │ │
│ └─────────────────────────────────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ Ethereum │ │ BSC │ │ Osmosis │ │ Zigchain │ ... │
│ └──────────┘ └──────────┘ └───────────┘ └──────────┘ │
└──────────────────────────────────────────────────────────────────┘
```
## Architecture

The system runs as three independent OS processes that communicate exclusively through Redis. Each can be scaled, restarted, or deployed independently.
```mermaid
graph LR
subgraph Processes
API["API Server<br/><i>Fastify · port 3000</i>"]
SCH["Scheduler<br/><i>5s tick loop</i>"]
WRK["Worker Pool<br/><i>blocking dequeue</i>"]
end
subgraph Redis
Q["q:jobs<br/>(normal queue)"]
PQ["q:jobs:priority<br/>(critical queue)"]
DLQ["q:jobs:dlq<br/>(dead-letter)"]
LOCK["lock:*<br/>(distributed locks)"]
STATE["refresh:ts:*<br/>delta:hash:*<br/>heartbeat:*"]
end
subgraph Storage
VAULT["Vault<br/>(filesystem)"]
end
SCH -- "enqueue jobs" --> Q
SCH -- "enqueue critical" --> PQ
SCH -- "write state" --> STATE
WRK -- "dequeue (BLPOP)" --> Q
WRK -- "dequeue (LPOP)" --> PQ
WRK -- "failed jobs" --> DLQ
WRK -- "acquire/release" --> LOCK
WRK -- "read/write positions,<br/>incidents, decisions" --> VAULT
API -- "read" --> VAULT
API -- "cache reads" --> STATE
```
| Process | Entry point | Role |
|---|---|---|
| API Server | `src/api/server.js` | Fastify v5 HTTP server. Serves portfolio data, incidents, decisions, playbooks. Exposes `/health` and `/metrics` (Prometheus). All data routes require Telegram-based authentication. |
| Scheduler | `src/scheduler/main.js` | Ticks every 5 seconds. Scans tenant directories, determines which tenants are due for a refresh based on tier cadence, and enqueues jobs. Manages CEX mid-price refresh (every 60s). |
| Worker Pool | `src/workers/main.js` | Blocking dequeue loop. Processes six job types under distributed locks with heartbeat-extended TTLs. Retries transient errors up to 3 times; sends fatal failures to the dead-letter queue. |
```mermaid
sequenceDiagram
participant S as Scheduler
participant R as Redis Queue
participant W as Worker
participant C as Blockchain RPCs
participant V as Vault (filesystem)
participant T as Telegram
loop Every 5 seconds
S->>S: Scan vault/tenants/ for TEN-* dirs
S->>S: Compute tenant config hash (SHA-256)
S->>R: Compare delta hash + check cadence
alt Tenant is due OR config changed
S->>R: Enqueue SYNC_BALANCES
S->>R: Enqueue SYNC_PROTOCOL_POSITIONS
S->>R: Enqueue EVAL_RULES
end
end
loop Every 60 seconds
S->>R: Enqueue FETCH_CEX_MIDS
end
loop Worker dequeue loop
W->>R: BLPOP q:jobs:priority, q:jobs (5s timeout)
R-->>W: Job payload
W->>R: Acquire lock (SET NX PX 60000)
alt SYNC_BALANCES / SYNC_PROTOCOL_POSITIONS
W->>C: Fetch on-chain data via adapters
C-->>W: Balances / positions
W->>V: Write vault/.../20_positions/latest.json
end
alt EVAL_RULES
W->>V: Read positions + CEX mids
W->>W: Run rule evaluators (drift, out-of-range, oracle, RPC health)
W->>V: Append incidents to 30_alerts/incidents/{date}.md
alt Urgent incident detected
W->>R: Enqueue DISPATCH_ALERTS (critical priority)
end
end
alt DISPATCH_ALERTS
W->>V: Read tenant members
W->>T: Send Telegram notifications (rate-limited, deduped)
end
alt FETCH_CEX_MIDS
W->>C: Fetch reference prices from CEX APIs
W->>V: Write vault/public/market/cex_mids_latest.json
end
W->>R: Release lock (Lua atomic compare-and-delete)
end
```
Redis serves four roles. No other inter-process communication mechanism is used.
| Role | Keys | Mechanism |
|---|---|---|
| Job Queue | `q:jobs` (normal), `q:jobs:priority` (critical), `q:jobs:dlq` (dead-letter) | Redis lists. Workers dequeue via `BLPOP` with 5s timeout. Priority queue checked first via non-blocking `LPOP`. |
| Distributed Locks | `lock:{scope}` | `SET key uuid PX ttl NX`. Released atomically via Lua script that checks ownership. Workers extend lock TTL every 10s via heartbeat (`PEXPIRE`). Max job runtime enforced at 360s. |
| Scheduling State | `refresh:ts:{tenantId}`, `delta:hash:{tenantId}`, `refresh:ts:cex_mids` | Tracks last refresh timestamps and config hashes for delta detection. |
| Heartbeats | `heartbeat:scheduler`, `heartbeat:worker:{pid}` | Written every tick/loop iteration with 30s TTL. Absence signals a dead process. |

Connection config: single ioredis client with automatic retry (exponential backoff, max 5s delay). URL from the `REDIS_URL` env var (default `redis://127.0.0.1:6379`).
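
A compact sketch of how these pieces fit together, using ioredis. This is illustrative only (not the actual `src/workers/main.js`), but it follows the documented mechanics: non-blocking `LPOP` on the priority queue, `BLPOP` with a 5s timeout on the normal queue, `SET NX PX` lock acquisition, `PEXPIRE` heartbeats, and a Lua compare-and-delete release.

```js
import Redis from 'ioredis';
import { randomUUID } from 'node:crypto';

const redis = new Redis(process.env.REDIS_URL || 'redis://127.0.0.1:6379');

// Atomic release: delete the lock only if this worker still owns it.
const RELEASE_LUA = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
end
return 0`;

// Priority queue first (non-blocking), then block up to 5s on the normal queue.
async function dequeue() {
  const critical = await redis.lpop('q:jobs:priority');
  if (critical) return JSON.parse(critical);
  const res = await redis.blpop('q:jobs', 5); // [key, payload] or null on timeout
  return res ? JSON.parse(res[1]) : null;
}

// SET key uuid PX 60000 NX, heartbeat PEXPIRE every 10s, Lua release.
async function withLock(scope, fn) {
  const key = `lock:${scope}`;
  const token = randomUUID();
  const acquired = await redis.set(key, token, 'PX', 60_000, 'NX');
  if (!acquired) return false; // another worker holds the lock
  const heartbeat = setInterval(() => redis.pexpire(key, 60_000), 10_000);
  try {
    await fn();
    return true;
  } finally {
    clearInterval(heartbeat);
    await redis.eval(RELEASE_LUA, 1, key, token);
  }
}
```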
Blockchain access is abstracted behind a registry of chain and protocol adapters. Adapters are lazy-loaded on first use and cached for the process lifetime.
graph TD
REG["Adapter Registry<br/><code>src/adapters/registry.js</code>"]
subgraph Chain Adapters
EVM["EVM Adapter<br/><i>Ethereum, BSC, Base</i><br/>Multicall batching"]
COSMOS["Cosmos Adapter<br/><i>Zigchain, Osmosis</i><br/>LCD REST API"]
end
subgraph Protocol Adapters
UV2["UniV2<br/><i>DEX V2 LPs</i>"]
UV3["UniV3<br/><i>Concentrated liquidity</i>"]
OSCL["OsmosisCL<br/><i>Osmosis CL pools</i>"]
MARS["Mars Protocol<br/><i>Cosmos lending</i>"]
AAVE["Aave V3<br/><i>EVM lending</i>"]
end
subgraph Resilience
FB["RPC Fallback Manager<br/><i>Primary → Secondary</i>"]
CB["Circuit Breaker<br/><i>CLOSED → OPEN → HALF_OPEN</i>"]
end
REG --> EVM
REG --> COSMOS
REG --> UV2
REG --> UV3
REG --> OSCL
REG --> MARS
REG --> AAVE
EVM --> FB
COSMOS --> FB
FB --> CB
```
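
A minimal sketch of the lazy-load-and-cache pattern described above. The module paths and factory function are assumptions for illustration, not the actual `src/adapters/registry.js`:

```js
// Adapters are loaded on first use and cached for the process lifetime.
const adapterCache = new Map();

export async function getAdapter(kind, name) {
  const key = `${kind}:${name}`;
  if (!adapterCache.has(key)) {
    const mod = await import(`./adapters/${kind}/${name}.js`); // hypothetical path
    adapterCache.set(key, mod.createAdapter());
  }
  return adapterCache.get(key);
}
```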
RPC Fallback: Each chain has a primary and secondary RPC URL. The circuit breaker monitors failures per chain scope. When the failure count hits the threshold (default 5), the circuit opens and the fallback manager switches to the secondary endpoint. After a cooldown (60s), the circuit enters half-open state and tests the primary with up to 3 probe requests. Three consecutive successes close the circuit and restore the primary.
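
For illustration, a simplified 3-state breaker with the documented defaults (5 failures to open, 60s cooldown, 3 probe successes to close). This is a sketch, not the actual `src/util/` implementation; in particular, the "max 3 probe requests" bookkeeping is reduced to counting consecutive probe successes:

```js
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 60_000, successesToClose = 3 } = {}) {
    Object.assign(this, { failureThreshold, cooldownMs, successesToClose });
    this.state = 'CLOSED';
    this.failures = 0;
    this.probeSuccesses = 0;
    this.openedAt = 0;
  }

  async exec(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit open: call rejected immediately');
      }
      this.state = 'HALF_OPEN'; // cooldown elapsed: allow probe requests
      this.probeSuccesses = 0;
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  onSuccess() {
    if (this.state === 'HALF_OPEN') {
      if (++this.probeSuccesses >= this.successesToClose) {
        this.state = 'CLOSED'; // consecutive probe successes restore the primary
        this.failures = 0;
      }
    } else {
      this.failures = 0; // a success in CLOSED resets the failure count
    }
  }

  onFailure() {
    // Any half-open failure, or hitting the threshold in CLOSED, opens the circuit.
    if (this.state === 'HALF_OPEN' || ++this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }
}
```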
Cost Tracking: Every RPC call is counted per chain with a configurable costPerCall. Daily cost reports are written to vault/public/metrics/.
Rule evaluators live in `src/engine/rules/`. Each evaluator receives the tenant's current positions and reference data, and returns an array of incidents.
```mermaid
graph LR
POS["Positions<br/>(from vault)"] --> EVAL["Rule Evaluator"]
CEX["CEX Mid-Prices<br/>(from vault/public)"] --> EVAL
THR["Thresholds<br/>(per-tenant + defaults)"] --> EVAL
EVAL --> R1["dexOutOfRange<br/><i>current_tick outside<br/>lower..upper range</i><br/>Severity: URGENT"]
EVAL --> R2["dexDrift<br/><i>DEX price vs CEX<br/>reference deviation</i>"]
EVAL --> R3["oracleDrift<br/><i>Stale or divergent<br/>price feeds</i>"]
EVAL --> R4["rpcHealth<br/><i>RPC endpoint<br/>degradation</i>"]
R1 --> INC["Incidents"]
R2 --> INC
R3 --> INC
R4 --> INC
```
Thresholds (src/engine/thresholds.js): Per-tenant overrides stored in the vault. Falls back to system defaults when tenant-specific config is absent.
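
The fallback behavior amounts to overlaying tenant overrides on system defaults; a minimal sketch (the real `src/engine/thresholds.js` may validate or merge more deeply):

```js
// Tenant-specific values win; anything missing falls back to defaults.
function resolveThresholds(defaults, tenantOverrides = {}) {
  return { ...defaults, ...tenantOverrides };
}
```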
Playbooks (src/engine/playbooks.js): Ordered response steps triggered by incident types. Loaded from config/default-playbooks.json with per-tenant customization support. Playbook types:
- DEX Out of Range Response
- DEX Drift Response
- Lending Health Low Response
- Oracle Drift Response
- RPC Degraded Response
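
For illustration, an entry in `config/default-playbooks.json` might look like the following. The key and field names here are assumptions, not the actual schema:

```json
{
  "DEX_OUT_OF_RANGE": {
    "title": "DEX Out of Range Response",
    "steps": [
      "Verify the position's current tick against its lower/upper range",
      "Compare the DEX price with the CEX mid-price reference",
      "Propose a rebalance decision for operator approval",
      "Notify tenant admins via Telegram"
    ]
  }
}
```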
All persistent data is stored as JSON and Markdown files on the local filesystem. No external database is required.
```
vault/
├── tenants/
│ └── TEN-{tenantId}/
│ ├── 00_meta/
│ │ ├── tenant.json # tenant_id, refresh_interval_sec, telegram_user_id
│ │ └── members.json # team members with roles (admin/operator/viewer)
│ ├── 10_wallets/
│ │ └── wallets.json # tracked wallet addresses per chain
│ ├── 20_positions/
│ │ ├── latest.json # current positions (written by SYNC jobs)
│ │ └── posture/latest.json # portfolio summary (NAV, risk score, chain breakdown)
│ ├── 30_alerts/
│ │ ├── incidents/{date}.md # daily incident logs (append-only)
│ │ └── decisions/ # logged alert decisions
│ ├── 40_policies/ # risk policies
│ ├── 50_decisions/ # pending and executed decisions
│ │ └── latest.json
│ ├── 60_proposals/ # playbook proposals
│ ├── 70_reports/ # generated reports
│ └── 80_agent/ # agent state
│
└── public/
├── market/
│ └── cex_mids_latest.json # latest CEX reference prices
├── health/ # chain health metrics
    └── incidents/                # public incident logs
```
Write safety: All JSON writes use atomic write (write to .tmp, then rename) to prevent corruption from crashes. Path traversal is blocked at the tenantPath() and publicPath() functions via null-byte detection, .. segment rejection, and resolve() boundary checks.
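
A minimal sketch of the write-to-temp-then-rename pattern (illustrative; the actual logic lives in `src/vault/`):

```js
import { writeFile, rename } from 'node:fs/promises';

// Write to a sibling .tmp file, then rename into place. rename() is atomic
// within a filesystem, so readers never observe a half-written file.
export async function atomicWriteJson(path, data) {
  const tmp = `${path}.tmp`;
  await writeFile(tmp, JSON.stringify(data, null, 2), 'utf8');
  await rename(tmp, path);
}
```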
```mermaid
graph TD
subgraph "Failure Handling Pipeline"
JOB["Job Execution"] --> |success| DONE["Complete"]
JOB --> |TransientError| RETRY{"attempts < 3?"}
RETRY --> |yes| REQUEUE["Re-enqueue<br/>attempts + 1"]
RETRY --> |no| DLQ["Dead-Letter Queue"]
JOB --> |FatalError| DLQ
JOB --> |timeout > 360s| DLQ
JOB --> |unknown type| DLQ
end
subgraph "Circuit Breaker States"
CLOSED["CLOSED<br/><i>all calls pass through</i>"]
CLOSED --> |"failures ≥ 5"| OPEN["OPEN<br/><i>calls rejected immediately</i>"]
OPEN --> |"60s cooldown"| HALF["HALF_OPEN<br/><i>probe requests (max 3)</i>"]
HALF --> |"3 consecutive successes"| CLOSED
HALF --> |"any failure"| OPEN
end
subgraph "Lock Safety"
ACQ["Acquire Lock<br/><code>SET NX PX 60000</code>"]
HB["Heartbeat<br/><code>PEXPIRE</code> every 10s"]
REL["Release Lock<br/><i>Lua: compare owner UUID<br/>then DEL</i>"]
ACQ --> HB --> REL
end
```
| Pattern | Implementation | Parameters |
|---|---|---|
| Retry with backoff | `TransientError` triggers re-enqueue | Max 3 attempts |
| Dead-letter queue | `FatalError` or max retries exceeded | Inspect via `scripts/dlq.js` |
| Circuit breaker | Per-chain scope, 3-state FSM | 5 failures to open, 60s reset, 3 successes to close |
| RPC failover | Primary/secondary endpoint switching | Triggered by circuit breaker OPEN state |
| Distributed locks | Redis `SET NX` with Lua release | 60s TTL, 10s heartbeat, 360s max runtime |
| Rate limiting | Token bucket (API + Telegram) | Configurable tokens/interval |
| Cold-start stagger | Scheduler delays initial enqueues | 50 RPC calls/sec max on first tick |
| Jitter | ±20% randomization on cadence | Prevents thundering herd |
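
As an illustration of the rate-limiting row above, a minimal token bucket. Parameter names are assumptions, not the actual `src/util/` implementation:

```js
class TokenBucket {
  constructor({ tokens = 10, intervalMs = 1000 } = {}) {
    this.capacity = tokens;
    this.tokens = tokens;
    this.refillPerMs = tokens / intervalMs; // continuous refill rate
    this.last = Date.now();
  }

  // Returns true and consumes a token if one is available; false otherwise.
  tryTake() {
    const now = Date.now();
    this.tokens = Math.min(this.capacity, this.tokens + (now - this.last) * this.refillPerMs);
    this.last = now;
    if (this.tokens < 1) return false; // caller should back off or queue
    this.tokens -= 1;
    return true;
  }
}
```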
Base URL: `http://localhost:3000` (configurable via `API_PORT`)
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/health` | None | Liveness check. Returns `{ status, checks: { redis, vault } }`. Returns 503 if degraded. |
| GET | `/metrics` | None | Prometheus metrics in text exposition format. |
| GET | `/posture` | Telegram | Portfolio summary: NAV, risk score, chain allocation. |
| GET | `/positions` | Telegram | Detailed position list across all chains/protocols. |
| GET | `/decisions` | Telegram | Pending and executed decisions. |
| GET | `/incidents` | Telegram | Incident log for the tenant. |
| GET | `/thresholds` | Telegram | Risk thresholds (tenant-specific or defaults). |
| GET | `/devices` | Telegram | Registered device fingerprints. |
| GET | `/exports` | Telegram | Data export endpoints. |
| GET | `/playbooks` | Telegram | Configured automation playbooks. |
| GET | `/benchmarks` | Telegram | Portfolio benchmark comparisons. |
Middleware pipeline (applied in order for authenticated routes):
1. `telegramAuthMiddleware` — validates Telegram bot API credentials, extracts user ID
2. `deviceBindingMiddleware` — binds and verifies device fingerprints (requires Redis)
3. `rateLimitMiddleware` — token-bucket rate limiting per user
4. `tenantScopeMiddleware` — resolves tenant from user ID, enforces isolation
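
In Fastify, `preHandler` hooks run in registration order, so the pipeline above could be wired roughly as follows. This is illustrative; the actual wiring lives in `src/api/server.js` and `src/api/middleware/`, and the barrel import path is hypothetical:

```js
import Fastify from 'fastify';
import {
  telegramAuthMiddleware,
  deviceBindingMiddleware,
  rateLimitMiddleware,
  tenantScopeMiddleware,
} from './middleware/index.js'; // hypothetical barrel module

const app = Fastify();

// preHandler hooks execute in the order they are registered.
app.addHook('preHandler', telegramAuthMiddleware);  // 1. identity from Telegram
app.addHook('preHandler', deviceBindingMiddleware); // 2. device fingerprint check
app.addHook('preHandler', rateLimitMiddleware);     // 3. per-user token bucket
app.addHook('preHandler', tenantScopeMiddleware);   // 4. tenant resolution + isolation
```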
Response caching: Read-through cache backed by Redis with configurable TTLs per route. Falls back gracefully if Redis is unavailable.
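
A sketch of the read-through pattern with graceful degradation; key names and TTL handling are illustrative:

```js
// Serve from Redis when possible; on a miss (or Redis outage) load from the
// vault and best-effort populate the cache.
async function cachedRead(redis, key, ttlSec, loadFromVault) {
  try {
    const hit = await redis.get(key);
    if (hit !== null) return JSON.parse(hit);
  } catch {
    // Redis unavailable: fall through to the vault.
  }
  const fresh = await loadFromVault();
  try {
    await redis.set(key, JSON.stringify(fresh), 'EX', ttlSec);
  } catch {
    // Cache population failures are non-fatal.
  }
  return fresh;
}
```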
The Telegram layer lives in `src/telegram/` and handles all user-facing notifications.
| Module | Purpose |
|---|---|
| `bot.js` | Bot initialization, polling setup |
| `send.js` | Single message sending with error handling |
| `broadcast.js` | Multi-recipient message dispatch |
| `commands.js` | Bot command handlers |
| `rateLimiter.js` | Per-user send rate limiting |
| `cooldown.js` | Cooldown tracking between alerts |
| `dedup.js` | Prevents duplicate alert delivery |
| `batcher.js` | Batches multiple messages into consolidated sends |
| `acknowledge.js` | Tracks message acknowledgment status |
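
One common way to implement the `dedup.js` check is a Redis `SET ... NX` with a TTL window; the following is a sketch under that assumption (the key scheme and window are not taken from the actual implementation):

```js
// Returns true the first time a fingerprint is seen inside the window;
// repeats within windowSec are suppressed.
async function shouldSend(redis, incidentFingerprint, windowSec = 300) {
  const res = await redis.set(`dedup:${incidentFingerprint}`, '1', 'EX', windowSec, 'NX');
  return res === 'OK'; // ioredis returns null when the key already exists
}
```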
- Authentication: Telegram bot API-based. User identity derived from Telegram user ID.
- Authorization: Role-based — `admin`, `operator`, `viewer` per tenant.
- Tenant isolation: Each tenant has an independent vault directory (`TEN-{id}`). Path traversal blocked at the vault path layer.
- Device binding: Optional device fingerprint binding stored in Redis. Prevents session hijacking.
- Advisor sandbox (`src/advisor/sandbox.js`): LLM-assisted analysis runs in a restricted context with glob-pattern file whitelisting, shell injection detection (`;`, `|`, `&`, `` ` ``, `$()`, `${}`), rate limits, and prompt length caps; see the sketch after this list.
- Secrets: the `.env` file should be `chmod 600`. Never committed (in `.gitignore`).
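
A sketch of the shell-injection screen, covering the metacharacters listed above. The pattern and the length cap are illustrative, not the actual `sandbox.js`:

```js
// Rejects inputs containing ; | & ` $( or ${, and enforces a prompt length cap.
const SHELL_METACHARS = /[;|&`]|\$\(|\$\{/;

function assertSafePrompt(input, maxLength = 4000) {
  if (input.length > maxLength) throw new Error('prompt exceeds length cap');
  if (SHELL_METACHARS.test(input)) throw new Error('shell metacharacters rejected');
  return input;
}
```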
## Supported Chains & Protocols

| Chain | Type | RPC | Multicall | Protocols |
|---|---|---|---|---|
| Ethereum | EVM | `eth.llamarpc.com` | `0xcA11bde...CA11` | UniV2, UniV3, Aave V3 |
| BSC | EVM | `bsc-dataseed.binance.org` | `0xcA11bde...CA11` | — |
| Base | EVM | `mainnet.base.org` | `0xcA11bde...CA11` | — |
| Zigchain | Cosmos (LCD) | `rpc.zigchain.com` | N/A | Mars (lending) |
| Osmosis | Cosmos (LCD) | `rpc.osmosis.zone` | N/A | OsmosisCL (concentrated liquidity) |

All chains have secondary/fallback RPC endpoints configured in `config/chains.json`.
## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `REDIS_URL` | No | `redis://127.0.0.1:6379` | Redis connection string |
| `TELEGRAM_BOT_TOKEN` | Yes | — | Telegram Bot API token (from @BotFather) |
| `VAULT_ROOT` | No | `./vault` | Filesystem path for vault storage |
| `NODE_ENV` | No | `development` | `development` / `production` |
| `API_PORT` | No | `3000` | HTTP port for the API server |
| `LOG_LEVEL` | No | `info` | Pino log level (trace, debug, info, warn, error, fatal) |
Copy `.env.example` to `.env` and fill in the values:

```bash
cp .env.example .env
```

## Mockup / Local Development Setup

Prerequisites:

- Node.js ≥ 20 (see `.nvmrc`)
- Docker (for Redis)
- A Telegram Bot Token (optional for local testing without alerts)
```bash
# 1. Clone and install
git clone <repo-url> && cd atnine-guard
nvm use # switches to Node 20 via .nvmrc
npm install
# 2. Start Redis
docker compose up -d
# 3. Configure environment
cp .env.example .env
# Edit .env — at minimum set TELEGRAM_BOT_TOKEN for alert testing
# 4. Seed the vault with mock data
npm run seed-vault
# Creates:
# - Tenant "seed-001" with 3 members (admin/operator/viewer)
# - 2 mock wallets (Ethereum EOA + Cosmos)
# - Mock positions (UniV3 LP + Mars lending)
# - Mock posture (NAV, risk score)
# - Sample incident log
# - Pending rebalance decision
# - CEX mid-prices for BTC, ETH, USDC, USDT, DAI
# 5. Start all three processes in watch mode
npm run dev
# Runs concurrently with --watch:
# - API server → http://localhost:3000
# - Scheduler → enqueuing jobs every 5s
# - Worker → processing jobs
# 6. Verify
curl http://localhost:3000/health
# → { "status": "ok", "checks": { "redis": "ok", "vault": "ok" } }
curl http://localhost:3000/metrics
# → Prometheus text format metrics
```

Run a single process at a time:

```bash
npm run start:api        # API only
npm run start:scheduler  # Scheduler only
npm run start:worker     # Worker only
```

Lint and format:

```bash
npm run lint    # ESLint (import ordering, strict equality, prefer-const)
npm run format  # Prettier (single quotes, semicolons, 100 char width)
```

Pre-commit hooks (via Husky + lint-staged) run `eslint --fix` and `prettier --write` automatically on staged `.js` files.
## Production Deployment

```mermaid
graph TB
subgraph "VPS (Ubuntu/Debian)"
subgraph "systemd services"
API["defi-api.service<br/><code>node src/api/server.js</code>"]
SCH["defi-scheduler.service<br/><code>node src/scheduler/main.js</code>"]
WRK["defi-worker.service<br/><code>node src/workers/main.js</code>"]
end
REDIS["redis.service<br/><i>bound to 127.0.0.1</i>"]
VAULT["Vault directory<br/><code>/opt/atnine-guard/vault/</code>"]
ENV[".env<br/><code>chmod 600</code>"]
API --> REDIS
SCH --> REDIS
WRK --> REDIS
API --> VAULT
WRK --> VAULT
end
subgraph "External"
TG["Telegram API"]
RPC["Blockchain RPCs<br/>(Ethereum, BSC, Base,<br/>Osmosis, Zigchain)"]
end
WRK --> TG
WRK --> RPC
API --> TG
```
```bash
# Create service user (no login shell)
sudo useradd --system --create-home --shell /usr/sbin/nologin defi
# Install Node.js 20
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
# Install Redis
sudo apt-get install -y redis-server
# Firewall
sudo ufw allow 22/tcp # SSH
sudo ufw allow 3000/tcp # API (or put behind reverse proxy)
sudo ufw enable
# SSH hardening (edit /etc/ssh/sshd_config)
# PasswordAuthentication no
# PermitRootLogin no
sudo systemctl restart sshd
# Optional: fail2ban + unattended-upgrades
sudo apt-get install -y fail2ban unattended-upgrades
sudo systemctl enable fail2ban
```

Harden Redis (`/etc/redis/redis.conf`):

```bash
sudo vim /etc/redis/redis.conf

# Bind to localhost only
bind 127.0.0.1
# Set a password (update REDIS_URL in .env accordingly)
requirepass <strong-password>
# Disable dangerous commands
rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command DEBUG ""
sudo systemctl restart redis
```

If using a Redis password, update `REDIS_URL` in `.env`:
```
REDIS_URL=redis://:your-password@127.0.0.1:6379
```
```bash
# Deploy code
sudo mkdir -p /opt/atnine-guard
sudo chown defi:defi /opt/atnine-guard
sudo -u defi git clone <repo-url> /opt/atnine-guard
cd /opt/atnine-guard
sudo -u defi npm ci --production
# Configure secrets
sudo -u defi cp .env.example .env
sudo -u defi vim .env # Set TELEGRAM_BOT_TOKEN, REDIS_URL, NODE_ENV=production
sudo chmod 600 /opt/atnine-guard/.env
# Initialize vault
sudo -u defi npm run seed-vault
# Set vault permissions
sudo chown -R defi:defi /opt/atnine-guard/vault
sudo chmod -R 700 /opt/atnine-guard/vault
```

Three unit files are provided in `systemd/`. All run as the `defi` user, restart on failure (5s delay), and depend on `redis.service`.
```bash
# Install service files
sudo cp systemd/defi-api.service /etc/systemd/system/
sudo cp systemd/defi-scheduler.service /etc/systemd/system/
sudo cp systemd/defi-worker.service /etc/systemd/system/
# Reload and enable
sudo systemctl daemon-reload
sudo systemctl enable defi-api defi-scheduler defi-worker
# Start all services
sudo systemctl start defi-api defi-scheduler defi-worker
# Check status
sudo systemctl status defi-api defi-scheduler defi-worker
```

Service details:
| Service | Unit file | ExecStart | Depends on |
|---|---|---|---|
| `defi-api` | `systemd/defi-api.service` | `node src/api/server.js` | `network.target`, `redis.service` |
| `defi-scheduler` | `systemd/defi-scheduler.service` | `node src/scheduler/main.js` | `network.target`, `redis.service` |
| `defi-worker` | `systemd/defi-worker.service` | `node src/workers/main.js` | `network.target`, `redis.service` |
All services use `EnvironmentFile=/opt/atnine-guard/.env` and log to the systemd journal:
```bash
# View logs
sudo journalctl -u defi-api -f
sudo journalctl -u defi-scheduler -f
sudo journalctl -u defi-worker -f
# View all services combined
sudo journalctl -u defi-api -u defi-scheduler -u defi-worker --since "1 hour ago"
```

Run the included verification script to check security posture:
```bash
sudo bash scripts/verify-hardening.sh
```

This checks:
- SSH: password auth disabled, root login disabled
- Firewall: UFW active
- Redis: bound to localhost, password set
- Process security: Node.js not running as root
- Secrets: `.env` permissions are 600
- Services: fail2ban running, unattended-upgrades enabled
## Operational Runbook

```bash
# Run a vault backup
node scripts/backup.js run [--backup-dir /path/to/backups]
# List existing backups
node scripts/backup.js list [--backup-dir /path/to/backups]
# Prune old backups (default: 30 day retention)
node scripts/backup.js prune [--retention-days 30]
# Restore from backup
node scripts/backup.js restore <backup-path> <target-path>
```

Health endpoint:
```bash
# Returns 200 if healthy, 503 if degraded
curl http://localhost:3000/health
```

Prometheus metrics at `/metrics`:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `jobs_processed_total` | Counter | `type`, `status` | Total jobs processed |
| `job_duration_seconds` | Histogram | `type` | Job execution duration |
| `queue_depth` | Gauge | `queue` | Current queue depth |
| `rpc_requests_total` | Counter | `chain`, `method` | RPC calls made |
| `rpc_latency_seconds` | Histogram | `chain` | RPC call latency |
| `alerts_sent_total` | Counter | `channel` | Alerts dispatched |
| `circuit_breaker_state` | Gauge | `scope` | Circuit breaker state (0=closed, 1=open, 2=half-open) |
Structured logging: All processes use Pino with JSON output in production. Development mode uses pino-pretty with colorized, human-readable output.
Heartbeat monitoring: External monitors can check:
- `heartbeat:scheduler` (Redis key, 30s TTL) — scheduler is alive
- `heartbeat:worker:{pid}` (Redis key, 30s TTL) — worker is alive
- `/health` endpoint — API is alive with Redis and vault connectivity
```bash
# Inspect failed jobs
node scripts/dlq.js inspect [--limit 50]
# Replay a single job
node scripts/dlq.js replay <job-id>
# Replay all DLQ jobs (resets attempt counts)
node scripts/dlq.js replay-all
# Purge the DLQ
node scripts/dlq.js purge
```

Tenant refresh tiers:

| Tier | Cadence | Trigger |
|---|---|---|
| Hot | 30s | `refresh_interval_sec` ≤ 30 in `tenant.json` |
| Warm | 120s | 30 < `refresh_interval_sec` ≤ 120 |
| Cold | 600s | `refresh_interval_sec` > 120 |
Cadences have ±20% jitter applied. Tenants are force-refreshed immediately when their config hash changes (delta detection), regardless of cadence.
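
A sketch of the tier cadence, jitter, and delta-detection logic described above. Tier boundaries follow the table; the helper names are illustrative, not the actual `src/scheduler/main.js`:

```js
import { createHash } from 'node:crypto';

function cadenceSec(refreshIntervalSec) {
  if (refreshIntervalSec <= 30) return 30;   // Hot
  if (refreshIntervalSec <= 120) return 120; // Warm
  return 600;                                // Cold
}

// ±20% jitter prevents all tenants from refreshing on the same tick.
function withJitter(sec) {
  return sec * (0.8 + Math.random() * 0.4);
}

function configHash(tenantConfig) {
  return createHash('sha256').update(JSON.stringify(tenantConfig)).digest('hex');
}

// Due when the config hash changed (delta detection, refresh immediately)
// or the jittered cadence has elapsed since the last refresh timestamp.
async function isDue(redis, tenantId, tenantConfig) {
  const [lastTs, lastHash] = await redis.mget(
    `refresh:ts:${tenantId}`,
    `delta:hash:${tenantId}`
  );
  if (configHash(tenantConfig) !== lastHash) return true;
  const cadenceMs = withJitter(cadenceSec(tenantConfig.refresh_interval_sec)) * 1000;
  return Date.now() - Number(lastTs || 0) >= cadenceMs;
}
```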
```bash
# Restart a single service
sudo systemctl restart defi-worker
# Scale workers (run additional instances)
# Each worker uses its own PID for lock scoping
sudo systemctl start defi-worker # additional instance
# Check Redis queue depth
redis-cli LLEN q:jobs
redis-cli LLEN q:jobs:priority
redis-cli LLEN q:jobs:dlq
# Check heartbeats
redis-cli GET heartbeat:scheduler
redis-cli KEYS "heartbeat:worker:*"
# Force-refresh a tenant (delete its refresh timestamp)
redis-cli DEL refresh:ts:<tenantId>
# Clear a stale lock
redis-cli DEL lock:job:SYNC_BALANCES:<tenantId>
```

## Admin Scripts

All scripts are in `scripts/` and run via `node scripts/<name>.js`.
| Script | Purpose |
|---|---|
| `seed-vault.js` | Initialize vault with mock tenant, wallets, positions, incidents |
| `backup.js` | Vault backup with run/list/prune/restore subcommands |
| `dlq.js` | Manage the dead-letter queue |
| `audit-chain.js` | Audit blockchain state |
| `audit-export.js` | Export audit logs |
| `migrate.js` | Data migration utilities |
| `rules.js` | Manage risk rules |
| `rpc-costs.js` | RPC cost tracking and reporting |
| `log-rotation.js` | Log rotation management |
| `load-test.js` | Generate load for testing |
| `tmp-cleanup.js` | Clean up temporary files |
| `restore-test.js` | Test backup restore process |
| `verify-hardening.sh` | Check VPS security posture |
## Testing

Uses the Node.js native test runner (`node --test`). No external test framework.
```bash
# Run all tests (unit + integration + smoke + load)
npm test
# Run a single test file
node --test tests/unit/scheduler.test.js
# Smoke tests only (60s timeout)
npm run test:smoke
# Load tests only (120s timeout)
npm run test:load
```

Test layout:

```
tests/
├── unit/ # Fast, isolated tests (mocked dependencies)
├── integration/ # Tests with real adapters / Redis
├── smoke/ # End-to-end system verification
├── load/ # Performance and concurrency tests
├── fixtures/ # Mock data (positions, wallets, configs)
└── helpers/              # Test utilities and mock setup
```
## Project Structure

```
atnine-guard/
├── config/
│ ├── chains.json # Chain definitions (RPC URLs, type, costs)
│ ├── protocols.json # Protocol definitions (addresses, types)
│ ├── majors.json # Major token symbols for CEX tracking
│ └── default-playbooks.json # Default incident response playbooks
│
├── src/
│ ├── api/
│ │ ├── server.js # Fastify app setup and startup
│ │ ├── cache.js # Redis-backed read-through cache
│ │ ├── middleware/ # Auth, rate limit, device binding, tenant scope
│ │ └── routes/ # posture, positions, decisions, incidents, etc.
│ │
│ ├── scheduler/
│ │ └── main.js # Tick loop, tier scheduling, delta detection
│ │
│ ├── workers/
│ │ ├── main.js # Job dequeue loop with lock management
│ │ └── handlers/ # Per-job-type handler implementations
│ │
│ ├── adapters/
│ │ ├── registry.js # Lazy-loading adapter factory
│ │ ├── rpcFallback.js # Primary/secondary endpoint failover
│ │ ├── interfaces.js # Adapter interface contracts
│ │ ├── chains/ # EVM (Multicall), Cosmos (LCD)
│ │ └── protocols/ # UniV2, UniV3, OsmosisCL, Mars, AaveV3
│ │
│ ├── engine/
│ │ ├── rules/ # Risk rule evaluators
│ │ ├── thresholds.js # Per-tenant threshold management
│ │ ├── playbooks.js # Incident response orchestration
│ │ └── benchmarks.js # Portfolio benchmarking
│ │
│ ├── telegram/ # Bot, send, broadcast, dedup, rate limit, etc.
│ ├── vault/ # Filesystem storage (paths, init, fs, append, checksum)
│ ├── advisor/ # LLM sandbox for analysis
│ ├── config/ # Config loader
│ └── util/ # Redis, queue, lock, errors, logger, metrics,
│ # circuit breaker, rate limiter, HTTP, env
│
├── scripts/ # Admin and operational scripts
├── systemd/ # Service unit files for production
├── tests/ # unit, integration, smoke, load
├── docs/ # Architecture docs, ADRs, runbooks
├── docker-compose.yml # Redis for local dev
├── package.json # Scripts, dependencies, engine constraints
├── .env.example # Environment variable template
└── .nvmrc                    # Node 20
```