Welcome to the SoroTask Keeper network! This guide provides step-by-step instructions on how to set up and run a SoroTask Keeper bot. By running a keeper, you help ensure tasks in the SoroTask network are executed reliably and on time.
See the centralized Glossary for definitions of domain-specific terms like Keeper, Resolver, and TaskConfig.
- Prerequisites
- Environment Variables
- Setup Instructions
- Dead-Letter Queue
- Mock Soroban RPC
- Chaos Testing
- Docker Deployment
- Troubleshooting
Before you begin, ensure you have the following installed on your machine:
The keeper bot requires certain configuration details to interact with the Stellar/Soroban network.
Create a .env file in the keeper directory and configure the following variables:
# The URL of the Soroban RPC server you are connecting to
SOROBAN_RPC_URL="https://rpc-futurenet.stellar.org"
# The network passphrase for the network you are targeting
NETWORK_PASSPHRASE="Test SDF Future Network ; October 2022"
# The secret key of the keeper account that will submit the transactions
KEEPER_SECRET="S..."
# The contract ID of the deployed SoroTask contract
CONTRACT_ID="C..."
# Polling interval in milliseconds (default: 10000ms = 10 seconds)
POLLING_INTERVAL_MS=10000
# Maximum number of concurrent task reads during polling (default: 10)
MAX_CONCURRENT_READS=10
# Maximum number of concurrent task executions (default: 3)
MAX_CONCURRENT_EXECUTIONS=3
# Maximum task ID to check (default: 100) - or use TASK_IDS for specific tasks
MAX_TASK_ID=100
# Optional: Comma-separated list of specific task IDs to monitor
# TASK_IDS="1,2,3,5,8"
# Wait for transaction confirmation (default: true, set to 'false' to disable)
WAIT_FOR_CONFIRMATION=true
# Structured logging
LOG_LEVEL=info
# Optional: pretty console output for local development only
# LOG_FORMAT=pretty
# Optional: folder/file for keeper execution idempotency state
# Default file: ./data/execution_locks.json
# KEEPER_STATE_DIR=./data
# IDEMPOTENCY_STATE_FILE=./data/execution_locks.json
# Optional: lock expiration controls (milliseconds)
# EXECUTION_LOCK_TTL_MS=120000
# EXECUTION_COMPLETED_MARKER_TTL_MS=30000
# Metrics / admin API
METRICS_PORT=3000
HEALTH_STALE_THRESHOLD_MS=60000
# KEEPER_ADMIN_TOKEN=replace-with-strong-random-token
# Stable work partitioning across keeper instances
KEEPER_SHARD_INDEX=0
KEEPER_SHARD_COUNT=1
# KEEPER_SHARD_LABEL=keeper-a
# Recurring schedule drift thresholds (seconds)
DRIFT_WARNING_SECONDS=60
DRIFT_CRITICAL_SECONDS=300SOROBAN_RPC_URL: This is the endpoint the bot uses to communicate with the network. You can use public nodes provided by Stellar or set up your own.NETWORK_PASSPHRASE: This ensures your bot is talking to the right network (e.g., Futurenet, Testnet, or Public Network).KEEPER_SECRET: Your keeper wallet's secret key. Keep this private and never commit it to version control (we've ensured.envis ignored by git).CONTRACT_ID: The deployed SoroTask contract address that the keeper will monitor and execute tasks from.POLLING_INTERVAL_MS: How often (in milliseconds) the keeper checks for due tasks. Lower values mean more frequent checks but higher RPC usage.MAX_CONCURRENT_READS: Maximum number of tasks to query in parallel during each poll. Higher values speed up polling but increase RPC load.MAX_CONCURRENT_EXECUTIONS: Maximum number of tasks that can be executed simultaneously. Controls execution throughput.MAX_TASK_ID: The keeper will check task IDs from 1 to this value. Alternatively, useTASK_IDSto specify exact task IDs.TASK_IDS: Optional comma-separated list of specific task IDs to monitor (e.g., "1,2,3,5"). If set, overridesMAX_TASK_ID.WAIT_FOR_CONFIRMATION: Whether to wait for transaction confirmation after submitting. Set to 'false' for fire-and-forget mode.LOG_LEVEL: Minimum log severity to emit (trace,debug,info,warn,error,fatal).LOG_FORMAT: Optional log renderer. Leave unset for JSON logs; set toprettyfor local human-readable output.KEEPER_STATE_DIR/IDEMPOTENCY_STATE_FILE: Location of persisted execution idempotency locks used to prevent duplicate submissions.EXECUTION_LOCK_TTL_MS: How long an in-progress execution lock is considered valid before stale recovery allows new work.EXECUTION_COMPLETED_MARKER_TTL_MS: Short-lived post-success marker to reduce accidental immediate duplicate submissions.KEEPER_ADMIN_TOKEN: Bearer token required to call the keeper admin pause/resume API.KEEPER_SHARD_INDEX/KEEPER_SHARD_COUNT: Stable shard assignment controls so multiple keeper instances can partition work without ambiguous ownership.KEEPER_SHARD_LABEL: Optional human-readable shard identifier used in metrics and logs.DRIFT_WARNING_SECONDS/DRIFT_CRITICAL_SECONDS: Thresholds for recurring execution drift classification.
The keeper includes a Dead-Letter Queue (DLQ) for handling repeatedly failing tasks:
# Maximum number of failures before a task is quarantined (default: 5)
DLQ_MAX_FAILURES=5
# Time window for counting failures in milliseconds (default: 3600000 = 1 hour)
DLQ_FAILURE_WINDOW_MS=3600000
# Enable automatic quarantine of repeatedly failing tasks (default: true)
DLQ_AUTO_QUARANTINE=true
# Maximum number of dead-letter records to keep (default: 1000)
DLQ_MAX_RECORDS=1000The DLQ automatically isolates tasks that fail repeatedly, preventing resource waste and providing diagnostic information for operators. See Dead-Letter Queue Documentation for details.
Docker provides a reproducible, portable deployment that works on any cloud VM, VPS, or container orchestrator. This is the recommended approach for production deployments.
-
Navigate to the Repository Root
cd /path/to/sorotask -
Configure Environment Variables
Copy the example environment file and configure it:
cp keeper/.env.example keeper/.env
Edit
keeper/.envwith your configuration (see Environment Variables section below). -
Start the Keeper
From the repository root:
docker compose up -d
This will:
- Build the Docker image with multi-stage optimization
- Start the keeper container in detached mode
- Mount
./keeper/datafor task registry persistence - Expose port 3001 for health checks and metrics
- Automatically restart the container unless explicitly stopped
-
View Logs
docker compose logs -f keeper
-
Check Health Status
curl http://localhost:3001/health
-
Stop the Keeper
docker compose down
Build the image:
cd keeper
npm run docker:buildRun standalone container:
cd keeper
npm run docker:runManual Docker commands:
# Build
docker build -t sorotask-keeper ./keeper
# Run with environment file and volume
docker run -d \
--name sorotask-keeper \
--env-file ./keeper/.env \
-p 3001:3001 \
-v $(pwd)/keeper/data:/app/data \
--restart unless-stopped \
sorotask-keeper
# View logs
docker logs -f sorotask-keeper
# Stop and remove
docker stop sorotask-keeper
docker rm sorotask-keeper- Multi-stage build: Optimized image size with separate dependency and runtime stages
- Security hardening: Runs as non-root user (
node) - Health checks: Built-in health monitoring via
/healthendpoint - Data persistence: Task registry persisted in
./keeper/datavolume - Automatic restart: Container restarts automatically on failure
- Log rotation: Configured with 10MB max size and 3 file retention
- Minimal base: Uses
node:20-alpinefor smallest footprint
AWS EC2 / DigitalOcean / Linode:
# SSH into your VM
ssh user@your-server-ip
# Clone repository
git clone https://github.com/your-org/sorotask.git
cd sorotask
# Configure environment
cp keeper/.env.example keeper/.env
nano keeper/.env # Edit with your settings
# Start with Docker Compose
docker compose up -d
# Verify it's running
docker compose ps
curl http://localhost:3001/healthKubernetes:
# Build and push to registry
docker build -t your-registry/sorotask-keeper:latest ./keeper
docker push your-registry/sorotask-keeper:latest
# Create ConfigMap from .env
kubectl create configmap keeper-config --from-env-file=keeper/.env
# Deploy (create your k8s manifests based on docker-compose.yml)
kubectl apply -f k8s/keeper-deployment.yamlOnce you have your prerequisite software and environment variables ready, follow these steps on a clean environment:
-
Navigate to the Keeper Directory
Open your terminal and navigate to thekeeperfolder if you haven't already:cd keeper -
Install Dependencies
Run the following command to install the required Node.js packages (soroban-client,dotenv, andnode-fetch):npm install
-
Run the Keeper Bot
Start the Node.js application to begin listening for and executing SoroTask tasks:node index.js
If successful, you will see output indicating that the Keeper has started, along with logs of its periodic checks for due tasks!
The keeper includes a Dead-Letter Queue (DLQ) system that automatically handles repeatedly failing tasks:
The DLQ captures and isolates tasks that fail repeatedly due to:
- Invalid configuration
- Broken target contracts
- Persistent permission problems
- Insufficient gas balance
- Automatic Quarantine: Tasks exceeding failure thresholds are automatically isolated
- Diagnostic Context: Full error history, stack traces, and task configuration preserved
- Operator Visibility: HTTP endpoints and Prometheus metrics for monitoring
- Recovery Mechanism: Manual recovery after issues are resolved
# Get DLQ overview
curl http://localhost:3000/dead-letter
# Get specific task details
curl http://localhost:3000/dead-letter/123keeper_quarantined_tasks_count # Current quarantined tasks
keeper_tasks_quarantined_total # Total tasks quarantined
keeper_tasks_recovered_total # Total tasks recovered
keeper_tasks_quarantined_skipped_total # Tasks skipped due to quarantine
Run the interactive demo to see the DLQ in action:
node examples/dead-letter-demo.jsFor complete documentation, see Dead-Letter Queue Guide.
If you want to test keeper flows without a full Soroban node, the keeper includes a lightweight mock JSON-RPC server.
cd keeper
npm run mock-rpcThen point the keeper at it:
export SOROBAN_RPC_URL=http://127.0.0.1:4100
export NETWORK_PASSPHRASE="Test SDF Future Network ; October 2022"Detailed usage, supported methods, and test examples are in docs/mock-soroban-rpc.md.
- Cause: The account associated with your
KEEPER_SECRETdoes not exist on the network you are trying to use. - Solution: Fund your keeper account. If you are on Testnet or Futurenet, use the Stellar Laboratory Friendbot to fund the public key associated with your secret. Ensure you've set the correct
NETWORK_PASSPHRASEand match the network on Stellar Laboratory.
- Cause: The bot cannot reach the specified RPC endpoint, or the endpoint rejected the request due to rate-limiting or an invalid URL setup.
- Solution:
- Double-check your
SOROBAN_RPC_URLin the.envfile for any typos. Ensure it includes the proper protocol (e.g.,https://). - If you're using a public RPC, you might be rate-limited. Wait a few moments and try again, or switch to a dedicated/private RPC provider node.
- Double-check your
- Cause: Application dependencies were not correctly or fully installed.
- Solution: Ensure you ran
npm installinside thekeeper/directory correctly. Try clearing cache or removingnode_modules(rm -rf node_modules) and runningnpm installagain.
The keeper includes a comprehensive chaos testing framework to validate resilience under realistic failure conditions. This helps ensure the keeper can handle network issues, RPC failures, and degraded performance.
# Run all chaos scenarios
npm run chaos-test
# Run specific scenarios
npm run chaos-test -- --scenario=latency,ratelimit
# List available scenarios
npm run chaos-test:list
# Run single scenario
npm run chaos-test:single latency
# Save report to file
npm run chaos-test -- --output=json --file=chaos-report.json- Latency Spikes - Test timeout handling with random delays
- Partial RPC Failure - Test graceful degradation when some methods fail
- Rate Limiting - Test backoff behavior under throttling
- Flaky Network - Test circuit breaker recovery
- Gradual Degradation - Test adaptive behavior to worsening conditions
- Complete Outage - Test worst-case scenario handling
Chaos tests are integrated into the Jest test suite:
# Run all tests including chaos tests
npm test
# Run only chaos tests
npm test -- chaos.test.jsSee Chaos Testing Documentation for detailed information.
The Keeper ships with a multi-stage Dockerfile and a docker-compose.yml at the repo root so you can run it on any server with a single command — no local Node.js installation required.
- Docker Engine ≥ 24 (tested on 29.x)
- Docker Compose v2 (
docker compose— space, not hyphen)
# 1. From the repo root, copy and fill in the environment file
cp keeper/.env.example keeper/.env
# Edit keeper/.env with your KEEPER_SECRET, CONTRACT_ID, etc.
# 2. Build the image and start the keeper in the background
docker compose up --build -d
# 3. Tail logs
docker compose logs -f keeperThe keeper's health and metrics endpoint will be reachable at http://localhost:3000/health.
Prometheus-formatted metrics are available at http://localhost:3000/metrics/prometheus. See docs/prometheus-metrics.md for detailed metric descriptions and Grafana dashboard examples.
# Should return {"status":"ok","uptime":...}
curl http://localhost:3000/health
# Or let Docker tell you (after ~30 s start_period)
docker compose ps
# Look for "healthy" in the STATUS columnThe task registry (data/tasks.json) is stored in ./keeper/data/ on the host and mounted into the container. It survives container restarts and upgrades automatically.
If you prefer to manage the container yourself without Compose, two npm convenience scripts are available. Run them from inside the keeper/ directory:
# Build the image
npm run docker:build
# Run the container (reads .env from the current directory)
npm run docker:run# Stop (data volume is preserved)
docker compose down
# Restart after config changes
docker compose up -d --buildIf you're still running into issues, feel free to open a GitHub issue or reach out to our community channels.