# atnine-guard

DeFi Portfolio Guardian — real-time monitoring, risk alerting, and decision orchestration for multi-chain DeFi positions.
- System Overview
- Architecture
- Supported Chains & Protocols
- Environment Variables
- Mockup / Local Development Setup
- Production Deployment
- Operational Runbook
- Admin Scripts
- Testing
- Project Structure
## System Overview

atnine-guard is a Node.js backend system that continuously monitors DeFi portfolio positions across multiple blockchains, evaluates configurable risk rules against live on-chain data, and dispatches alerts to stakeholders via Telegram. It supports multi-tenant isolation through a file-based vault and coordinates work across three independent processes using Redis.
```
┌──────────────────────────────────────────────────────────────────┐
│ atnine-guard │
│ │
│ ┌───────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ API │ │ Scheduler │ │ Worker │ │
│ │ (Fastify) │ │ (5s tick) │ │ (job loop) │ │
│ └─────┬─────┘ └──────┬──────┘ └──────┬───────┘ │
│ │ │ │ │
│ │ ┌───────▼────────┐ │ │
│ │ │ Redis │◄──────────┘ │
│ │ │ (job queue + │ │
│ │ │ locks + │ │
│ │ │ state) │ │
│ │ └────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Vault (filesystem) │ │
│ │ vault/tenants/TEN-*/ vault/public/ │ │
│ └─────────────────────────────────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ Ethereum │ │ BSC │ │ Osmosis │ │ Zigchain │ ... │
│ └──────────┘ └──────────┘ └───────────┘ └──────────┘ │
└──────────────────────────────────────────────────────────────────┘
```
## Architecture

The system runs as three independent OS processes that communicate exclusively through Redis. Each can be scaled, restarted, or deployed independently.
```mermaid
graph LR
subgraph Processes
API["API Server<br/><i>Fastify · port 3000</i>"]
SCH["Scheduler<br/><i>5s tick loop</i>"]
WRK["Worker Pool<br/><i>blocking dequeue</i>"]
end
subgraph Redis
Q["q:jobs<br/>(normal queue)"]
PQ["q:jobs:priority<br/>(critical queue)"]
DLQ["q:jobs:dlq<br/>(dead-letter)"]
LOCK["lock:*<br/>(distributed locks)"]
STATE["refresh:ts:*<br/>delta:hash:*<br/>heartbeat:*"]
end
subgraph Storage
VAULT["Vault<br/>(filesystem)"]
end
SCH -- "enqueue jobs" --> Q
SCH -- "enqueue critical" --> PQ
SCH -- "write state" --> STATE
WRK -- "dequeue (BLPOP)" --> Q
WRK -- "dequeue (LPOP)" --> PQ
WRK -- "failed jobs" --> DLQ
WRK -- "acquire/release" --> LOCK
WRK -- "read/write positions,<br/>incidents, decisions" --> VAULT
API -- "read" --> VAULT
API -- "cache reads" --> STATE
```
| Process | Entry point | Role |
|---|---|---|
| API Server | `src/api/server.js` | Fastify v5 HTTP server. Serves portfolio data, incidents, decisions, playbooks. Exposes `/health` and `/metrics` (Prometheus). All data routes require Telegram-based authentication. |
| Scheduler | `src/scheduler/main.js` | Ticks every 5 seconds. Scans tenant directories, determines which tenants are due for a refresh based on tier cadence, and enqueues jobs. Manages CEX mid-price refresh (every 60s). |
| Worker Pool | `src/workers/main.js` | Blocking dequeue loop. Processes six job types under distributed locks with heartbeat-extended TTLs. Retries transient errors up to 3 times; sends fatal failures to the dead-letter queue. |
```mermaid
sequenceDiagram
participant S as Scheduler
participant R as Redis Queue
participant W as Worker
participant C as Blockchain RPCs
participant V as Vault (filesystem)
participant T as Telegram
loop Every 5 seconds
S->>S: Scan vault/tenants/ for TEN-* dirs
S->>S: Compute tenant config hash (SHA-256)
S->>R: Compare delta hash + check cadence
alt Tenant is due OR config changed
S->>R: Enqueue SYNC_BALANCES
S->>R: Enqueue SYNC_PROTOCOL_POSITIONS
S->>R: Enqueue EVAL_RULES
end
end
loop Every 60 seconds
S->>R: Enqueue FETCH_CEX_MIDS
end
loop Worker dequeue loop
W->>R: BLPOP q:jobs:priority, q:jobs (5s timeout)
R-->>W: Job payload
W->>R: Acquire lock (SET NX PX 60000)
alt SYNC_BALANCES / SYNC_PROTOCOL_POSITIONS
W->>C: Fetch on-chain data via adapters
C-->>W: Balances / positions
W->>V: Write vault/.../20_positions/latest.json
end
alt EVAL_RULES
W->>V: Read positions + CEX mids
W->>W: Run rule evaluators (drift, out-of-range, oracle, RPC health)
W->>V: Append incidents to 30_alerts/incidents/{date}.md
alt Urgent incident detected
W->>R: Enqueue DISPATCH_ALERTS (critical priority)
end
end
alt DISPATCH_ALERTS
W->>V: Read tenant members
W->>T: Send Telegram notifications (rate-limited, deduped)
end
alt FETCH_CEX_MIDS
W->>C: Fetch reference prices from CEX APIs
W->>V: Write vault/public/market/cex_mids_latest.json
end
W->>R: Release lock (Lua atomic compare-and-delete)
end
```
Redis serves four roles. No other inter-process communication mechanism is used.
| Role | Keys | Mechanism |
|---|---|---|
| Job Queue | `q:jobs` (normal), `q:jobs:priority` (critical), `q:jobs:dlq` (dead-letter) | Redis lists. Workers dequeue via `BLPOP` with 5s timeout. Priority queue checked first via non-blocking `LPOP`. |
| Distributed Locks | `lock:{scope}` | `SET key uuid PX ttl NX`. Released atomically via Lua script that checks ownership. Workers extend lock TTL every 10s via heartbeat (`PEXPIRE`). Max job runtime enforced at 360s. |
| Scheduling State | `refresh:ts:{tenantId}`, `delta:hash:{tenantId}`, `refresh:ts:cex_mids` | Tracks last refresh timestamps and config hashes for delta detection. |
| Heartbeats | `heartbeat:scheduler`, `heartbeat:worker:{pid}` | Written every tick/loop iteration with 30s TTL. Absence signals a dead process. |

Connection config: single ioredis client with automatic retry (exponential backoff, max 5s delay). URL from the `REDIS_URL` env var (default `redis://127.0.0.1:6379`).
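
A compact sketch of how these pieces fit together, using ioredis. This is illustrative only (not the actual `src/workers/main.js`), but it follows the documented mechanics: non-blocking `LPOP` on the priority queue, `BLPOP` with a 5s timeout on the normal queue, `SET NX PX` lock acquisition, `PEXPIRE` heartbeats, and a Lua compare-and-delete release.

```js
import Redis from 'ioredis';
import { randomUUID } from 'node:crypto';

const redis = new Redis(process.env.REDIS_URL || 'redis://127.0.0.1:6379');

// Atomic release: delete the lock only if this worker still owns it.
const RELEASE_LUA = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
end
return 0`;

// Priority queue first (non-blocking), then block up to 5s on the normal queue.
async function dequeue() {
  const critical = await redis.lpop('q:jobs:priority');
  if (critical) return JSON.parse(critical);
  const res = await redis.blpop('q:jobs', 5); // [key, payload] or null on timeout
  return res ? JSON.parse(res[1]) : null;
}

// SET key uuid PX 60000 NX, heartbeat PEXPIRE every 10s, Lua release.
async function withLock(scope, fn) {
  const key = `lock:${scope}`;
  const token = randomUUID();
  const acquired = await redis.set(key, token, 'PX', 60_000, 'NX');
  if (!acquired) return false; // another worker holds the lock
  const heartbeat = setInterval(() => redis.pexpire(key, 60_000), 10_000);
  try {
    await fn();
    return true;
  } finally {
    clearInterval(heartbeat);
    await redis.eval(RELEASE_LUA, 1, key, token);
  }
}
```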
Blockchain access is abstracted behind a registry of chain and protocol adapters. Adapters are lazy-loaded on first use and cached for the process lifetime.
graph TD
REG["Adapter Registry<br/><code>src/adapters/registry.js</code>"]
subgraph Chain Adapters
EVM["EVM Adapter<br/><i>Ethereum, BSC, Base</i><br/>Multicall batching"]
COSMOS["Cosmos Adapter<br/><i>Zigchain, Osmosis</i><br/>LCD REST API"]
end
subgraph Protocol Adapters
UV2["UniV2<br/><i>DEX V2 LPs</i>"]
UV3["UniV3<br/><i>Concentrated liquidity</i>"]
OSCL["OsmosisCL<br/><i>Osmosis CL pools</i>"]
MARS["Mars Protocol<br/><i>Cosmos lending</i>"]
AAVE["Aave V3<br/><i>EVM lending</i>"]
end
subgraph Resilience
FB["RPC Fallback Manager<br/><i>Primary → Secondary</i>"]
CB["Circuit Breaker<br/><i>CLOSED → OPEN → HALF_OPEN</i>"]
end
REG --> EVM
REG --> COSMOS
REG --> UV2
REG --> UV3
REG --> OSCL
REG --> MARS
REG --> AAVE
EVM --> FB
COSMOS --> FB
FB --> CB
```
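
A minimal sketch of the lazy-load-and-cache pattern described above. The module paths and factory function are assumptions for illustration, not the actual `src/adapters/registry.js`:

```js
// Adapters are loaded on first use and cached for the process lifetime.
const adapterCache = new Map();

export async function getAdapter(kind, name) {
  const key = `${kind}:${name}`;
  if (!adapterCache.has(key)) {
    const mod = await import(`./adapters/${kind}/${name}.js`); // hypothetical path
    adapterCache.set(key, mod.createAdapter());
  }
  return adapterCache.get(key);
}
```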
RPC Fallback: Each chain has a primary and secondary RPC URL. The circuit breaker monitors failures per chain scope. When the failure count hits the threshold (default 5), the circuit opens and the fallback manager switches to the secondary endpoint. After a cooldown (60s), the circuit enters half-open state and tests the primary with up to 3 probe requests. Three consecutive successes close the circuit and restore the primary.
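
For illustration, a simplified 3-state breaker with the documented defaults (5 failures to open, 60s cooldown, 3 probe successes to close). This is a sketch, not the actual `src/util/` implementation; in particular, the "max 3 probe requests" bookkeeping is reduced to counting consecutive probe successes:

```js
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 60_000, successesToClose = 3 } = {}) {
    Object.assign(this, { failureThreshold, cooldownMs, successesToClose });
    this.state = 'CLOSED';
    this.failures = 0;
    this.probeSuccesses = 0;
    this.openedAt = 0;
  }

  async exec(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit open: call rejected immediately');
      }
      this.state = 'HALF_OPEN'; // cooldown elapsed: allow probe requests
      this.probeSuccesses = 0;
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  onSuccess() {
    if (this.state === 'HALF_OPEN') {
      if (++this.probeSuccesses >= this.successesToClose) {
        this.state = 'CLOSED'; // consecutive probe successes restore the primary
        this.failures = 0;
      }
    } else {
      this.failures = 0; // a success in CLOSED resets the failure count
    }
  }

  onFailure() {
    // Any half-open failure, or hitting the threshold in CLOSED, opens the circuit.
    if (this.state === 'HALF_OPEN' || ++this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }
}
```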
Cost Tracking: Every RPC call is counted per chain with a configurable costPerCall. Daily cost reports are written to vault/public/metrics/.
Rule evaluators live in `src/engine/rules/`. Each evaluator receives the tenant's current positions and reference data, and returns an array of incidents.
```mermaid
graph LR
POS["Positions<br/>(from vault)"] --> EVAL["Rule Evaluator"]
CEX["CEX Mid-Prices<br/>(from vault/public)"] --> EVAL
THR["Thresholds<br/>(per-tenant + defaults)"] --> EVAL
EVAL --> R1["dexOutOfRange<br/><i>current_tick outside<br/>lower..upper range</i><br/>Severity: URGENT"]
EVAL --> R2["dexDrift<br/><i>DEX price vs CEX<br/>reference deviation</i>"]
EVAL --> R3["oracleDrift<br/><i>Stale or divergent<br/>price feeds</i>"]
EVAL --> R4["rpcHealth<br/><i>RPC endpoint<br/>degradation</i>"]
R1 --> INC["Incidents"]
R2 --> INC
R3 --> INC
R4 --> INC
```
Thresholds (src/engine/thresholds.js): Per-tenant overrides stored in the vault. Falls back to system defaults when tenant-specific config is absent.
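
The fallback behavior amounts to overlaying tenant overrides on system defaults; a minimal sketch (the real `src/engine/thresholds.js` may validate or merge more deeply):

```js
// Tenant-specific values win; anything missing falls back to defaults.
function resolveThresholds(defaults, tenantOverrides = {}) {
  return { ...defaults, ...tenantOverrides };
}
```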
Playbooks (src/engine/playbooks.js): Ordered response steps triggered by incident types. Loaded from config/default-playbooks.json with per-tenant customization support. Playbook types:
- DEX Out of Range Response
- DEX Drift Response
- Lending Health Low Response
- Oracle Drift Response
- RPC Degraded Response
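
For illustration, an entry in `config/default-playbooks.json` might look like the following. The key and field names here are assumptions, not the actual schema:

```json
{
  "DEX_OUT_OF_RANGE": {
    "title": "DEX Out of Range Response",
    "steps": [
      "Verify the position's current tick against its lower/upper range",
      "Compare the DEX price with the CEX mid-price reference",
      "Propose a rebalance decision for operator approval",
      "Notify tenant admins via Telegram"
    ]
  }
}
```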
All persistent data is stored as JSON and Markdown files on the local filesystem. No external database is required.
```
vault/
├── tenants/
│ └── TEN-{tenantId}/
│ ├── 00_meta/
│ │ ├── tenant.json # tenant_id, refresh_interval_sec, telegram_user_id
│ │ └── members.json # team members with roles (admin/operator/viewer)
│ ├── 10_wallets/
│ │ └── wallets.json # tracked wallet addresses per chain
│ ├── 20_positions/
│ │ ├── latest.json # current positions (written by SYNC jobs)
│ │ └── posture/latest.json # portfolio summary (NAV, risk score, chain breakdown)
│ ├── 30_alerts/
│ │ ├── incidents/{date}.md # daily incident logs (append-only)
│ │ └── decisions/ # logged alert decisions
│ ├── 40_policies/ # risk policies
│ ├── 50_decisions/ # pending and executed decisions
│ │ └── latest.json
│ ├── 60_proposals/ # playbook proposals
│ ├── 70_reports/ # generated reports
│ └── 80_agent/ # agent state
│
└── public/
├── market/
│ └── cex_mids_latest.json # latest CEX reference prices
├── health/ # chain health metrics
    └── incidents/                # public incident logs
```
Write safety: All JSON writes use atomic write (write to .tmp, then rename) to prevent corruption from crashes. Path traversal is blocked at the tenantPath() and publicPath() functions via null-byte detection, .. segment rejection, and resolve() boundary checks.
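
A minimal sketch of the write-to-temp-then-rename pattern (illustrative; the actual logic lives in `src/vault/`):

```js
import { writeFile, rename } from 'node:fs/promises';

// Write to a sibling .tmp file, then rename into place. rename() is atomic
// within a filesystem, so readers never observe a half-written file.
export async function atomicWriteJson(path, data) {
  const tmp = `${path}.tmp`;
  await writeFile(tmp, JSON.stringify(data, null, 2), 'utf8');
  await rename(tmp, path);
}
```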
```mermaid
graph TD
subgraph "Failure Handling Pipeline"
JOB["Job Execution"] --> |success| DONE["Complete"]
JOB --> |TransientError| RETRY{"attempts < 3?"}
RETRY --> |yes| REQUEUE["Re-enqueue<br/>attempts + 1"]
RETRY --> |no| DLQ["Dead-Letter Queue"]
JOB --> |FatalError| DLQ
JOB --> |timeout > 360s| DLQ
JOB --> |unknown type| DLQ
end
subgraph "Circuit Breaker States"
CLOSED["CLOSED<br/><i>all calls pass through</i>"]
CLOSED --> |"failures ≥ 5"| OPEN["OPEN<br/><i>calls rejected immediately</i>"]
OPEN --> |"60s cooldown"| HALF["HALF_OPEN<br/><i>probe requests (max 3)</i>"]
HALF --> |"3 consecutive successes"| CLOSED
HALF --> |"any failure"| OPEN
end
subgraph "Lock Safety"
ACQ["Acquire Lock<br/><code>SET NX PX 60000</code>"]
HB["Heartbeat<br/><code>PEXPIRE</code> every 10s"]
REL["Release Lock<br/><i>Lua: compare owner UUID<br/>then DEL</i>"]
ACQ --> HB --> REL
end
```
| Pattern | Implementation | Parameters |
|---|---|---|
| Retry with backoff | `TransientError` triggers re-enqueue | Max 3 attempts |
| Dead-letter queue | `FatalError` or max retries exceeded | Inspect via `scripts/dlq.js` |
| Circuit breaker | Per-chain scope, 3-state FSM | 5 failures to open, 60s reset, 3 successes to close |
| RPC failover | Primary/secondary endpoint switching | Triggered by circuit breaker OPEN state |
| Distributed locks | Redis `SET NX` with Lua release | 60s TTL, 10s heartbeat, 360s max runtime |
| Rate limiting | Token bucket (API + Telegram) | Configurable tokens/interval |
| Cold-start stagger | Scheduler delays initial enqueues | 50 RPC calls/sec max on first tick |
| Jitter | ±20% randomization on cadence | Prevents thundering herd |
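
As an illustration of the rate-limiting row above, a minimal token bucket. Parameter names are assumptions, not the actual `src/util/` implementation:

```js
class TokenBucket {
  constructor({ tokens = 10, intervalMs = 1000 } = {}) {
    this.capacity = tokens;
    this.tokens = tokens;
    this.refillPerMs = tokens / intervalMs; // continuous refill rate
    this.last = Date.now();
  }

  // Returns true and consumes a token if one is available; false otherwise.
  tryTake() {
    const now = Date.now();
    this.tokens = Math.min(this.capacity, this.tokens + (now - this.last) * this.refillPerMs);
    this.last = now;
    if (this.tokens < 1) return false; // caller should back off or queue
    this.tokens -= 1;
    return true;
  }
}
```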
Base URL: `http://localhost:3000` (configurable via `API_PORT`)
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/health` | None | Liveness check. Returns `{ status, checks: { redis, vault } }`. Returns 503 if degraded. |
| GET | `/metrics` | None | Prometheus metrics in text exposition format. |
| GET | `/posture` | Telegram | Portfolio summary: NAV, risk score, chain allocation. |
| GET | `/positions` | Telegram | Detailed position list across all chains/protocols. |
| GET | `/decisions` | Telegram | Pending and executed decisions. |
| GET | `/incidents` | Telegram | Incident log for the tenant. |
| GET | `/thresholds` | Telegram | Risk thresholds (tenant-specific or defaults). |
| GET | `/devices` | Telegram | Registered device fingerprints. |
| GET | `/exports` | Telegram | Data export endpoints. |
| GET | `/playbooks` | Telegram | Configured automation playbooks. |
| GET | `/benchmarks` | Telegram | Portfolio benchmark comparisons. |
Middleware pipeline (applied in order for authenticated routes):
1. `telegramAuthMiddleware` — validates Telegram bot API credentials, extracts user ID
2. `deviceBindingMiddleware` — binds and verifies device fingerprints (requires Redis)
3. `rateLimitMiddleware` — token-bucket rate limiting per user
4. `tenantScopeMiddleware` — resolves tenant from user ID, enforces isolation
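
In Fastify, `preHandler` hooks run in registration order, so the pipeline above could be wired roughly as follows. This is illustrative; the actual wiring lives in `src/api/server.js` and `src/api/middleware/`, and the barrel import path is hypothetical:

```js
import Fastify from 'fastify';
import {
  telegramAuthMiddleware,
  deviceBindingMiddleware,
  rateLimitMiddleware,
  tenantScopeMiddleware,
} from './middleware/index.js'; // hypothetical barrel module

const app = Fastify();

// preHandler hooks execute in the order they are registered.
app.addHook('preHandler', telegramAuthMiddleware);  // 1. identity from Telegram
app.addHook('preHandler', deviceBindingMiddleware); // 2. device fingerprint check
app.addHook('preHandler', rateLimitMiddleware);     // 3. per-user token bucket
app.addHook('preHandler', tenantScopeMiddleware);   // 4. tenant resolution + isolation
```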
Response caching: Read-through cache backed by Redis with configurable TTLs per route. Falls back gracefully if Redis is unavailable.
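
A sketch of the read-through pattern with graceful degradation; key names and TTL handling are illustrative:

```js
// Serve from Redis when possible; on a miss (or Redis outage) load from the
// vault and best-effort populate the cache.
async function cachedRead(redis, key, ttlSec, loadFromVault) {
  try {
    const hit = await redis.get(key);
    if (hit !== null) return JSON.parse(hit);
  } catch {
    // Redis unavailable: fall through to the vault.
  }
  const fresh = await loadFromVault();
  try {
    await redis.set(key, JSON.stringify(fresh), 'EX', ttlSec);
  } catch {
    // Cache population failures are non-fatal.
  }
  return fresh;
}
```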
The Telegram layer lives in `src/telegram/` and handles all user-facing notifications.
| Module | Purpose |
|---|---|
| `bot.js` | Bot initialization, polling setup |
| `send.js` | Single message sending with error handling |
| `broadcast.js` | Multi-recipient message dispatch |
| `commands.js` | Bot command handlers |
| `rateLimiter.js` | Per-user send rate limiting |
| `cooldown.js` | Cooldown tracking between alerts |
| `dedup.js` | Prevents duplicate alert delivery |
| `batcher.js` | Batches multiple messages into consolidated sends |
| `acknowledge.js` | Tracks message acknowledgment status |
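
One common way to implement the `dedup.js` check is a Redis `SET ... NX` with a TTL window; the following is a sketch under that assumption (the key scheme and window are not taken from the actual implementation):

```js
// Returns true the first time a fingerprint is seen inside the window;
// repeats within windowSec are suppressed.
async function shouldSend(redis, incidentFingerprint, windowSec = 300) {
  const res = await redis.set(`dedup:${incidentFingerprint}`, '1', 'EX', windowSec, 'NX');
  return res === 'OK'; // ioredis returns null when the key already exists
}
```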
- Authentication: Telegram bot API-based. User identity derived from Telegram user ID.
- Authorization: Role-based — `admin`, `operator`, `viewer` per tenant.
- Tenant isolation: Each tenant has an independent vault directory (`TEN-{id}`). Path traversal blocked at the vault path layer.
- Device binding: Optional device fingerprint binding stored in Redis. Prevents session hijacking.
- Advisor sandbox (`src/advisor/sandbox.js`): LLM-assisted analysis runs in a restricted context with glob-pattern file whitelisting, shell injection detection (`;`, `|`, `&`, `` ` ``, `$()`, `${}`), rate limits, and prompt length caps; see the sketch after this list.
- Secrets: the `.env` file should be `chmod 600`. Never committed (in `.gitignore`).
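
A sketch of the shell-injection screen, covering the metacharacters listed above. The pattern and the length cap are illustrative, not the actual `sandbox.js`:

```js
// Rejects inputs containing ; | & ` $( or ${, and enforces a prompt length cap.
const SHELL_METACHARS = /[;|&`]|\$\(|\$\{/;

function assertSafePrompt(input, maxLength = 4000) {
  if (input.length > maxLength) throw new Error('prompt exceeds length cap');
  if (SHELL_METACHARS.test(input)) throw new Error('shell metacharacters rejected');
  return input;
}
```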
## Supported Chains & Protocols

| Chain | Type | RPC | Multicall | Protocols |
|---|---|---|---|---|
| Ethereum | EVM | `eth.llamarpc.com` | `0xcA11bde...CA11` | UniV2, UniV3, Aave V3 |
| BSC | EVM | `bsc-dataseed.binance.org` | `0xcA11bde...CA11` | — |
| Base | EVM | `mainnet.base.org` | `0xcA11bde...CA11` | — |
| Zigchain | Cosmos (LCD) | `rpc.zigchain.com` | N/A | Mars (lending) |
| Osmosis | Cosmos (LCD) | `rpc.osmosis.zone` | N/A | OsmosisCL (concentrated liquidity) |

All chains have secondary/fallback RPC endpoints configured in `config/chains.json`.
## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `REDIS_URL` | No | `redis://127.0.0.1:6379` | Redis connection string |
| `TELEGRAM_BOT_TOKEN` | Yes | — | Telegram Bot API token (from @BotFather) |
| `VAULT_ROOT` | No | `./vault` | Filesystem path for vault storage |
| `NODE_ENV` | No | `development` | `development` / `production` |
| `API_PORT` | No | `3000` | HTTP port for the API server |
| `LOG_LEVEL` | No | `info` | Pino log level (trace, debug, info, warn, error, fatal) |
Copy `.env.example` to `.env` and fill in the values:

```bash
cp .env.example .env
```

## Mockup / Local Development Setup

Prerequisites:

- Node.js ≥ 20 (see `.nvmrc`)
- Docker (for Redis)
- A Telegram Bot Token (optional for local testing without alerts)
```bash
# 1. Clone and install
git clone <repo-url> && cd atnine-guard
nvm use # switches to Node 20 via .nvmrc
npm install
# 2. Start Redis
docker compose up -d
# 3. Configure environment
cp .env.example .env
# Edit .env — at minimum set TELEGRAM_BOT_TOKEN for alert testing
# 4. Seed the vault with mock data
npm run seed-vault
# Creates:
# - Tenant "seed-001" with 3 members (admin/operator/viewer)
# - 2 mock wallets (Ethereum EOA + Cosmos)
# - Mock positions (UniV3 LP + Mars lending)
# - Mock posture (NAV, risk score)
# - Sample incident log
# - Pending rebalance decision
# - CEX mid-prices for BTC, ETH, USDC, USDT, DAI
# 5. Start all three processes in watch mode
npm run dev
# Runs concurrently with --watch:
# - API server → http://localhost:3000
# - Scheduler → enqueuing jobs every 5s
# - Worker → processing jobs
# 6. Verify
curl http://localhost:3000/health
# → { "status": "ok", "checks": { "redis": "ok", "vault": "ok" } }
curl http://localhost:3000/metrics
# → Prometheus text format metrics
```

Run a single process at a time:

```bash
npm run start:api        # API only
npm run start:scheduler  # Scheduler only
npm run start:worker     # Worker only
```

Lint and format:

```bash
npm run lint    # ESLint (import ordering, strict equality, prefer-const)
npm run format  # Prettier (single quotes, semicolons, 100 char width)
```

Pre-commit hooks (via Husky + lint-staged) run `eslint --fix` and `prettier --write` automatically on staged `.js` files.
## Production Deployment

```mermaid
graph TB
subgraph "VPS (Ubuntu/Debian)"
subgraph "systemd services"
API["defi-api.service<br/><code>node src/api/server.js</code>"]
SCH["defi-scheduler.service<br/><code>node src/scheduler/main.js</code>"]
WRK["defi-worker.service<br/><code>node src/workers/main.js</code>"]
end
REDIS["redis.service<br/><i>bound to 127.0.0.1</i>"]
VAULT["Vault directory<br/><code>/opt/atnine-guard/vault/</code>"]
ENV[".env<br/><code>chmod 600</code>"]
API --> REDIS
SCH --> REDIS
WRK --> REDIS
API --> VAULT
WRK --> VAULT
end
subgraph "External"
TG["Telegram API"]
RPC["Blockchain RPCs<br/>(Ethereum, BSC, Base,<br/>Osmosis, Zigchain)"]
end
WRK --> TG
WRK --> RPC
API --> TG
```
```bash
# Create service user (no login shell)
sudo useradd --system --create-home --shell /usr/sbin/nologin defi
# Install Node.js 20
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
# Install Redis
sudo apt-get install -y redis-server
# Firewall
sudo ufw allow 22/tcp # SSH
sudo ufw allow 3000/tcp # API (or put behind reverse proxy)
sudo ufw enable
# SSH hardening (edit /etc/ssh/sshd_config)
# PasswordAuthentication no
# PermitRootLogin no
sudo systemctl restart sshd
# Optional: fail2ban + unattended-upgrades
sudo apt-get install -y fail2ban unattended-upgrades
sudo systemctl enable fail2ban
```

Harden Redis (`/etc/redis/redis.conf`):

```bash
sudo vim /etc/redis/redis.conf

# Bind to localhost only
bind 127.0.0.1
# Set a password (update REDIS_URL in .env accordingly)
requirepass <strong-password>
# Disable dangerous commands
rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command DEBUG ""
sudo systemctl restart redis
```

If using a Redis password, update `REDIS_URL` in `.env`:
```
REDIS_URL=redis://:your-password@127.0.0.1:6379
```
```bash
# Deploy code
sudo mkdir -p /opt/atnine-guard
sudo chown defi:defi /opt/atnine-guard
sudo -u defi git clone <repo-url> /opt/atnine-guard
cd /opt/atnine-guard
sudo -u defi npm ci --production
# Configure secrets
sudo -u defi cp .env.example .env
sudo -u defi vim .env # Set TELEGRAM_BOT_TOKEN, REDIS_URL, NODE_ENV=production
sudo chmod 600 /opt/atnine-guard/.env
# Initialize vault
sudo -u defi npm run seed-vault
# Set vault permissions
sudo chown -R defi:defi /opt/atnine-guard/vault
sudo chmod -R 700 /opt/atnine-guard/vault
```

Three unit files are provided in `systemd/`. All run as the `defi` user, restart on failure (5s delay), and depend on `redis.service`.
```bash
# Install service files
sudo cp systemd/defi-api.service /etc/systemd/system/
sudo cp systemd/defi-scheduler.service /etc/systemd/system/
sudo cp systemd/defi-worker.service /etc/systemd/system/
# Reload and enable
sudo systemctl daemon-reload
sudo systemctl enable defi-api defi-scheduler defi-worker
# Start all services
sudo systemctl start defi-api defi-scheduler defi-worker
# Check status
sudo systemctl status defi-api defi-scheduler defi-worker
```

Service details:
| Service | Unit file | ExecStart | Depends on |
|---|---|---|---|
| `defi-api` | `systemd/defi-api.service` | `node src/api/server.js` | `network.target`, `redis.service` |
| `defi-scheduler` | `systemd/defi-scheduler.service` | `node src/scheduler/main.js` | `network.target`, `redis.service` |
| `defi-worker` | `systemd/defi-worker.service` | `node src/workers/main.js` | `network.target`, `redis.service` |
All services use `EnvironmentFile=/opt/atnine-guard/.env` and log to the systemd journal:
```bash
# View logs
sudo journalctl -u defi-api -f
sudo journalctl -u defi-scheduler -f
sudo journalctl -u defi-worker -f
# View all services combined
sudo journalctl -u defi-api -u defi-scheduler -u defi-worker --since "1 hour ago"
```

Run the included verification script to check security posture:
```bash
sudo bash scripts/verify-hardening.sh
```

This checks:
- SSH: password auth disabled, root login disabled
- Firewall: UFW active
- Redis: bound to localhost, password set
- Process security: Node.js not running as root
- Secrets: `.env` permissions are 600
- Services: fail2ban running, unattended-upgrades enabled
## Operational Runbook

```bash
# Run a vault backup
node scripts/backup.js run [--backup-dir /path/to/backups]
# List existing backups
node scripts/backup.js list [--backup-dir /path/to/backups]
# Prune old backups (default: 30 day retention)
node scripts/backup.js prune [--retention-days 30]
# Restore from backup
node scripts/backup.js restore <backup-path> <target-path>
```

Health endpoint:
```bash
# Returns 200 if healthy, 503 if degraded
curl http://localhost:3000/health
```

Prometheus metrics at `/metrics`:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `jobs_processed_total` | Counter | `type`, `status` | Total jobs processed |
| `job_duration_seconds` | Histogram | `type` | Job execution duration |
| `queue_depth` | Gauge | `queue` | Current queue depth |
| `rpc_requests_total` | Counter | `chain`, `method` | RPC calls made |
| `rpc_latency_seconds` | Histogram | `chain` | RPC call latency |
| `alerts_sent_total` | Counter | `channel` | Alerts dispatched |
| `circuit_breaker_state` | Gauge | `scope` | Circuit breaker state (0=closed, 1=open, 2=half-open) |
Structured logging: All processes use Pino with JSON output in production. Development mode uses pino-pretty with colorized, human-readable output.
Heartbeat monitoring: External monitors can check:
- `heartbeat:scheduler` (Redis key, 30s TTL) — scheduler is alive
- `heartbeat:worker:{pid}` (Redis key, 30s TTL) — worker is alive
- `/health` endpoint — API is alive with Redis and vault connectivity
```bash
# Inspect failed jobs
node scripts/dlq.js inspect [--limit 50]
# Replay a single job
node scripts/dlq.js replay <job-id>
# Replay all DLQ jobs (resets attempt counts)
node scripts/dlq.js replay-all
# Purge the DLQ
node scripts/dlq.js purge
```

Tenant refresh tiers:

| Tier | Cadence | Trigger |
|---|---|---|
| Hot | 30s | `refresh_interval_sec` ≤ 30 in `tenant.json` |
| Warm | 120s | 30 < `refresh_interval_sec` ≤ 120 |
| Cold | 600s | `refresh_interval_sec` > 120 |
Cadences have ±20% jitter applied. Tenants are force-refreshed immediately when their config hash changes (delta detection), regardless of cadence.
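
A sketch of the tier cadence, jitter, and delta-detection logic described above. Tier boundaries follow the table; the helper names are illustrative, not the actual `src/scheduler/main.js`:

```js
import { createHash } from 'node:crypto';

function cadenceSec(refreshIntervalSec) {
  if (refreshIntervalSec <= 30) return 30;   // Hot
  if (refreshIntervalSec <= 120) return 120; // Warm
  return 600;                                // Cold
}

// ±20% jitter prevents all tenants from refreshing on the same tick.
function withJitter(sec) {
  return sec * (0.8 + Math.random() * 0.4);
}

function configHash(tenantConfig) {
  return createHash('sha256').update(JSON.stringify(tenantConfig)).digest('hex');
}

// Due when the config hash changed (delta detection, refresh immediately)
// or the jittered cadence has elapsed since the last refresh timestamp.
async function isDue(redis, tenantId, tenantConfig) {
  const [lastTs, lastHash] = await redis.mget(
    `refresh:ts:${tenantId}`,
    `delta:hash:${tenantId}`
  );
  if (configHash(tenantConfig) !== lastHash) return true;
  const cadenceMs = withJitter(cadenceSec(tenantConfig.refresh_interval_sec)) * 1000;
  return Date.now() - Number(lastTs || 0) >= cadenceMs;
}
```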
```bash
# Restart a single service
sudo systemctl restart defi-worker
# Scale workers (run additional instances)
# Each worker uses its own PID for lock scoping
sudo systemctl start defi-worker # additional instance
# Check Redis queue depth
redis-cli LLEN q:jobs
redis-cli LLEN q:jobs:priority
redis-cli LLEN q:jobs:dlq
# Check heartbeats
redis-cli GET heartbeat:scheduler
redis-cli KEYS "heartbeat:worker:*"
# Force-refresh a tenant (delete its refresh timestamp)
redis-cli DEL refresh:ts:<tenantId>
# Clear a stale lock
redis-cli DEL lock:job:SYNC_BALANCES:<tenantId>
```

## Admin Scripts

All scripts are in `scripts/` and run via `node scripts/<name>.js`.
| Script | Purpose |
|---|---|
| `seed-vault.js` | Initialize vault with mock tenant, wallets, positions, incidents |
| `backup.js` | Vault backup with run/list/prune/restore subcommands |
| `dlq.js` | Manage the dead-letter queue |
| `audit-chain.js` | Audit blockchain state |
| `audit-export.js` | Export audit logs |
| `migrate.js` | Data migration utilities |
| `rules.js` | Manage risk rules |
| `rpc-costs.js` | RPC cost tracking and reporting |
| `log-rotation.js` | Log rotation management |
| `load-test.js` | Generate load for testing |
| `tmp-cleanup.js` | Clean up temporary files |
| `restore-test.js` | Test backup restore process |
| `verify-hardening.sh` | Check VPS security posture |
## Testing

Uses the Node.js native test runner (`node --test`). No external test framework.
```bash
# Run all tests (unit + integration + smoke + load)
npm test
# Run a single test file
node --test tests/unit/scheduler.test.js
# Smoke tests only (60s timeout)
npm run test:smoke
# Load tests only (120s timeout)
npm run test:load
```

Test layout:

```
tests/
├── unit/ # Fast, isolated tests (mocked dependencies)
├── integration/ # Tests with real adapters / Redis
├── smoke/ # End-to-end system verification
├── load/ # Performance and concurrency tests
├── fixtures/ # Mock data (positions, wallets, configs)
└── helpers/              # Test utilities and mock setup
```
## Project Structure

```
atnine-guard/
├── config/
│ ├── chains.json # Chain definitions (RPC URLs, type, costs)
│ ├── protocols.json # Protocol definitions (addresses, types)
│ ├── majors.json # Major token symbols for CEX tracking
│ └── default-playbooks.json # Default incident response playbooks
│
├── src/
│ ├── api/
│ │ ├── server.js # Fastify app setup and startup
│ │ ├── cache.js # Redis-backed read-through cache
│ │ ├── middleware/ # Auth, rate limit, device binding, tenant scope
│ │ └── routes/ # posture, positions, decisions, incidents, etc.
│ │
│ ├── scheduler/
│ │ └── main.js # Tick loop, tier scheduling, delta detection
│ │
│ ├── workers/
│ │ ├── main.js # Job dequeue loop with lock management
│ │ └── handlers/ # Per-job-type handler implementations
│ │
│ ├── adapters/
│ │ ├── registry.js # Lazy-loading adapter factory
│ │ ├── rpcFallback.js # Primary/secondary endpoint failover
│ │ ├── interfaces.js # Adapter interface contracts
│ │ ├── chains/ # EVM (Multicall), Cosmos (LCD)
│ │ └── protocols/ # UniV2, UniV3, OsmosisCL, Mars, AaveV3
│ │
│ ├── engine/
│ │ ├── rules/ # Risk rule evaluators
│ │ ├── thresholds.js # Per-tenant threshold management
│ │ ├── playbooks.js # Incident response orchestration
│ │ └── benchmarks.js # Portfolio benchmarking
│ │
│ ├── telegram/ # Bot, send, broadcast, dedup, rate limit, etc.
│ ├── vault/ # Filesystem storage (paths, init, fs, append, checksum)
│ ├── advisor/ # LLM sandbox for analysis
│ ├── config/ # Config loader
│ └── util/ # Redis, queue, lock, errors, logger, metrics,
│ # circuit breaker, rate limiter, HTTP, env
│
├── scripts/ # Admin and operational scripts
├── systemd/ # Service unit files for production
├── tests/ # unit, integration, smoke, load
├── docs/ # Architecture docs, ADRs, runbooks
├── docker-compose.yml # Redis for local dev
├── package.json # Scripts, dependencies, engine constraints
├── .env.example # Environment variable template
└── .nvmrc                    # Node 20
```