Complete chronological record of the human-agent collaboration that built AutoResearch. Agent: Darksol 🌑 (OpenClaw, Claude Opus 4) | Human: Meta
Meta: Requested autonomous Base DEX trading strategy discovery system using Karpathy-style autoresearch with LCM memory, as second Synthesis Hackathon submission.
Darksol: Built complete project from scratch:
- 10 source modules (backtest engine, 10 indicators, LCM memory, controller loop, data layer, reporter, config, bankr integration, exports)
- 4 test suites, 17 tests passing
- 3 benchmark strategies (VWAP reversion, mean reversion, momentum)
- 4 Base DEX pairs configured (ETH/USDC × 2 fee tiers, cbETH/WETH, AERO/USDC)
- Composite scoring system:
sharpe × √(min(trades/50, 1.0)) − drawdown_penalty − turnover_penalty - Hackathon-ready README with architecture diagrams, prize track mapping
- Karpathy attribution + OpenClaw LCM acknowledgment
Key design decision: Single strategy file as "source of truth" — LLM mutates one file, backtest validates, keep or revert. No strategy database, no version branching. Simple and auditable.
Darksol: Ran 30 experiments autonomously, zero human intervention.
- Baseline score: 0.421 (VWAP reversion)
- After 30 experiments: 0.610 (+44.9%)
- Key breakthrough: ATR inverse-volatility position sizing — let volatility determine bet size
- Discovered local optimum: deviation 0.03 + RSI 40/60
- 7 kept, 23 rejected
Meta: Confirmed Bankr credits available ($9.85). Approved wiring Bankr LLM Gateway for mutations.
Darksol: Integrated Bankr LLM Gateway at llm.bankr.bot:
- Tested multiple models: gemini-3-flash (unreliable code blocks), gemini-2.5-flash (also unreliable), settled on claude-haiku-4.5 (100% valid code generation)
- Added system prompt for instruction-following
- Fixed batch reporter bug (wrong argument order in callback)
- Improved mutation parser with fallback hypothesis extraction
- 30 more experiments via Bankr LLM: Score 0.610 → 0.740 (+75.8% total)
- Hit rate: 4/30 kept (13%)
- ~$0.30 credits used
Evolution chain discovered by LLM:
- exp037: ATR period 14→7 (score 0.615)
- exp053: exitThreshold 0.01→0.015 (score 0.671) — hold positions longer
- exp065: deviationThreshold 0.025→0.022 (score 0.714) — earlier entries
- exp070: RSI period 14→10 (score 0.726) — faster entry timing
- exp074: ATR period 7→5 (score 0.740) — most responsive volatility scaling
Key insight: Each kept experiment built on the previous one. The LLM learned from the full history and made progressively smaller, more targeted changes. This is the power of LCM memory — convergent research, not random search.
Meta: Requested full production pipeline — real data, regime detection, live execution via Bankr.
Darksol: Built 4 new major modules:
src/regime.js(8.2KB) — Hurst exponent (R/S analysis), EMA dual crossover trend detection, ATR volatility percentile ranking, combined regime classifierstrategies/strategy-regime.js(7.4KB) — Regime-adaptive: momentum in trends, VWAP reversion in mean-reverting markets, reduced size in high vol, sit-out in low volsrc/executor.js(14KB) — Production execution via Bankr wallet. Paper + live modes, position clamping, risk limits (15% max position, 5% daily loss), pair allowlistsrc/datafeed.js(12.5KB) — Three-tier data: DeFiLlama → CoinGecko → Base RPC swap events → synthetic fallbackscripts/run-live.js(3.1KB) — Live/paper trading launcher- 21 new tests (regime + executor), total: 38/38 passing
Paper trading test: CoinGecko returned 703 hourly bars per pair. Regime detection opened AERO/USDC short. Risk management clamped $28,800 position to $500. Cache system working.
Darksol: Ran 30 more experiments (exp082–exp106). None beat 0.740 — parameter space confirmed exhausted for vanilla VWAP reversion.
Live Bankr trade on Base:
- Swapped 1 USDC → 0.000464 ETH
- TX:
0x752f73935fa93862fb37d14c09054785fdd983ce9bcc928af7ece91d3d69b4b8 - Verify on Basescan
Moltbook posts published:
- /m/builds — Full technical showcase (verified with math challenge 32+14=46.00)
- /m/algotrading — Audience-targeted post (verified with math challenge 23+15=38.00)
GitLab mirror: gitlab.com/darks0l/autoresearch created and synced.
Meta: Requested persistent daemon that grinds experiments autonomously using better models from Bankr LLM Gateway.
Darksol: Built daemon service:
scripts/daemon.js— Background runner with state tracking, credit monitoring, batch reporting- Upgraded default model to claude-sonnet-4.5 (from haiku)
- Fixed
getLLMCreditsto parsebalanceUsdfield - Fixed data cache TTL (7 days for historical, was 1 hour)
- Key discovery: Strategy was overfit to synthetic data. Real CoinGecko data shows -1.46 baseline — needs structural changes, not parameter tweaks. Updated mutation prompt to tell LLM this.
- Set up OpenClaw cron job: every 1h, 15 experiments per batch
Final state: 116 experiments logged, best score 0.740 (on synthetic data). Daemon running, grinding toward structural improvements on real data.
Darksol: Discovered the VWAP reversion strategy was overfit — scored 0.740 on synthetic data but -1.460 on real CoinGecko data. Complete strategy redesign:
- Built
strategy-trend.js(trend-following attempt, -0.753) - Built
strategy-adaptive.js— Donchian breakout + EMA trend filter + RSI dip-buying + ATR trailing stops - Score: -1.46 → 2.838 on real data. The agent learned that real crypto markets trend.
Darksol:
- Updated Devfolio submission with real data results
- Security catch: Found hardcoded Synthesis API key in scripts — removed, rewrote git history, force-pushed
- Created
agent.jsonwith ERC-8004 identity (token #31929) - Generated
agent_log.jsonfrom experiment index - Built auto-sync pipeline (daemon auto-commits to GitHub)
- Opened PR #262 to BankrBot/skills repo
- Built pair discovery module (manual + auto-scan by TVL)
- Built report generator with ASCII score progression charts
- 45 tests passing, 13 source modules
Meta: "Get the daemon running."
Darksol: Found root cause of silent daemon failures — BANKR_API_KEY wasn't in environment, causing all mutations to return no-op text. Fixed config to auto-load from ~/.bankr/config.json.
Daemon run #4 (first real mutations, claude-sonnet-4.5):
- 10 experiments, 3 kept (30% hit rate)
- Score progression: 2.838 → 2.919 → 2.923 → 3.668
- Key discovery: Regime-based position sizing via Hurst exponent + tighter ATR trailing stops (2.0→1.5)
- Sharpe reached 4.002
Daemon run #5:
- 15 experiments, 2 kept (13% hit rate)
- exp137 (3.741): ROC momentum replaces Hurst + ATR profit-taking exit, DD 7.1%
- exp151 (3.777): Multi-timeframe trend filter (50-EMA slope + short-term EMA cross alignment)
- New all-time best: 3.777
Daemon run #6 (crashed mid-batch):
- 8 of 15 experiments ran (exp153–exp160), 0 kept
- Closest: exp153 at 3.663 (volatility breakout filter)
- Two experiments produced zero trades (over-filtering)
Meta: "Kill it after 250 experiments."
Current state: 160 experiments, best score 3.777, daemon running toward 250-experiment cutoff. ~6 more hourly runs.
- Single-file strategy mutation — Keep it simple. One file, one source of truth. LLM rewrites the whole file each time.
- Keep/revert with full logging — Every experiment recorded, even failures. LCM indexes everything.
- Haiku for fast iteration, Sonnet for quality — Started with haiku (cheap, fast), upgraded to sonnet (better mutations) as we found diminishing returns.
- Real data → overfitting discovery — Strategy tuned on synthetic data scored -1.46 on real data. Complete redesign required. This is itself a valuable finding.
- Zero runtime dependencies — All indicators, backtest, memory implemented in pure Node.js. Nothing to install, nothing to break.
- 250-experiment cutoff — Diminishing returns observed above 3.8. Strategy architecture may need fundamental rethinking to break through.
0.421 ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ Baseline (VWAP reversion)
0.610 ██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 30 experiments (ATR sizing)
0.740 ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 75 experiments (parameter plateau)
-1.46 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ Real data (overfitting exposed)
2.838 ██████████████████████████░░░░░░░░░░░░░░░ Strategy redesign (trend-following)
3.668 ████████████████████████████████████░░░░░░ Regime sizing + tighter stops
3.777 █████████████████████████████████████░░░░░ Multi-TF trend filter (current best)
5.000 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ Target (not yet reached)
- Autonomous coding — 13 modules, 45 tests, zero copy-paste
- Self-directed research — 160+ experiments with no human intervention per experiment
- Overfitting detection — Caught synthetic data overfitting, redesigned from scratch
- API integration — Bankr LLM Gateway, CoinGecko, DeFiLlama, Moltbook, Devfolio
- On-chain execution — Live DEX swap on Base via natural language
- Persistent memory — LCM-powered experiment history that makes each mutation smarter
- Testing discipline — 45 tests, all passing, covering all major modules
- Multi-platform publishing — GitHub, GitLab, Moltbook, Discord, Devfolio
- Security awareness — Caught leaked API key in git, rewrote history
- Daemon orchestration — Autonomous hourly experiment cycles with auto-sync to GitHub
Built with teeth. 🌑