Architecture, Patterns & Internals of Anthropic's AI Coding Agent
When Anthropic shipped Claude Code on npm, the .js.map source maps contained a sourcesContent field with the full original TypeScript — nearly two thousand files comprising the complete architecture of a production AI agent used by hundreds of thousands of developers. This book is the result of reading every one of those files.
18 chapters across 7 parts. ~400 pages in print equivalent.
Every chapter has layered depth: a narrative flow for technical leaders, deep-dive sections for implementers, and an "Apply This" closing that extracts transferable patterns you can steal for your own systems. Diagrams use Mermaid and render natively on GitHub.
- Senior engineers building agentic systems — steal the patterns, understand the trade-offs, implement in your own stack
- Technical leaders evaluating architectures — follow the narrative without reading every code block
- Anyone curious about how production AI tools actually work under the hood
Before the agent can think, the process must exist.
| # | Chapter | What You'll Learn |
|---|---|---|
| 1 | The Architecture of an AI Agent | The 6 key abstractions, data flow, permission system, build system |
| 2 | Starting Fast — The Bootstrap Pipeline | 5-phase init, module-level I/O parallelism, trust boundary |
| 3 | State — The Two-Tier Architecture | Bootstrap singleton, AppState store, sticky latches, cost tracking |
| 4 | Talking to Claude — The API Layer | Multi-provider client, prompt cache, streaming, error recovery |
The heartbeat of the agent: stream, act, observe, repeat.
| # | Chapter | What You'll Learn |
|---|---|---|
| 5 | The Agent Loop | query.ts deep dive, 4-layer compression, error recovery, token budgets |
| 6 | Tools — From Definition to Execution | Tool interface, 14-step pipeline, permission system |
| 7 | Concurrent Tool Execution | Partition algorithm, streaming executor, speculative execution |
One agent is powerful. Many agents working together are transformative.
| # | Chapter | What You'll Learn |
|---|---|---|
| 8 | Spawning Sub-Agents | AgentTool, 15-step runAgent lifecycle, built-in agent types |
| 9 | Fork Agents and the Prompt Cache | Byte-identical prefix trick, cache sharing, cost optimization |
| 10 | Tasks, Coordination, and Swarms | Task state machine, coordinator mode, swarm messaging |
An agent without memory makes the same mistakes forever.
| # | Chapter | What You'll Learn |
|---|---|---|
| 11 | Memory — Learning Across Conversations | File-based memory, 4-type taxonomy, LLM recall, staleness |
| 12 | Extensibility — Skills and Hooks | Two-phase skill loading, lifecycle hooks, snapshot security |
Everything the user sees passes through this layer.
| # | Chapter | What You'll Learn |
|---|---|---|
| 13 | The Terminal UI | Custom Ink fork, rendering pipeline, double-buffer, pools |
| 14 | Input and Interaction | Key parsing, keybindings, chord support, vim mode |
The agent reaches beyond localhost.
| # | Chapter | What You'll Learn |
|---|---|---|
| 15 | MCP — The Universal Tool Protocol | 8 transports, OAuth for MCP, tool wrapping |
| 16 | Remote Control and Cloud Execution | Bridge v1/v2, CCR, upstream proxy |
Making it all fast enough that humans don't notice the machinery.
| # | Chapter | What You'll Learn |
|---|---|---|
| 17 | Performance — Every Millisecond and Token Counts | Startup, context window, prompt cache, rendering, search |
| 18 | Epilogue — What We Learned | The 5 architectural bets, what transfers, where agents are heading |
If you read nothing else:
- AsyncGenerator as agent loop — yields Messages, typed Terminal return, natural backpressure and cancellation
- Speculative tool execution — start read-only tools during model streaming, before the response completes
- Concurrent-safe batching — partition tools by safety, run reads in parallel, serialize writes
- Fork agents for cache sharing — parallel children share byte-identical prompt prefixes, saving ~95% input tokens
- 4-layer context compression — snip, microcompact, collapse, autocompact — each lighter than the next
- File-based memory with LLM recall — Sonnet side-query selects relevant memories, not keyword matching
- Two-phase skill loading — frontmatter only at startup, full content on invocation
- Sticky latches for cache stability — once a beta header is sent, never unset mid-session
- Slot reservation — 8K default output cap, escalate to 64K on hit (saves context in 99% of requests)
- Hook config snapshot — freeze at startup to prevent runtime injection attacks
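
To make one of these patterns concrete, here is a minimal TypeScript sketch of concurrent-safe batching. The `ToolCall` shape, the `readOnly` flag, and the `partition` and `runAll` names are assumptions made for this illustration, not identifiers from the Claude Code source. The real partition algorithm is the subject of Chapter 7 and may differ in detail.

```typescript
// Hypothetical shape for illustration; not taken from the Claude Code source.
interface ToolCall {
  name: string;
  readOnly: boolean; // assumed flag: true means the call is safe to run concurrently
  run: () => Promise<string>;
}

// Group consecutive read-only calls into one batch; every write gets a batch of its own.
function partition(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.readOnly && last !== undefined && last.every((c) => c.readOnly)) {
      last.push(call); // extend the current run of reads
    } else {
      batches.push([call]); // a write, or the first read after a write
    }
  }
  return batches;
}

// Run batches strictly in order; calls inside one batch run in parallel.
async function runAll(calls: ToolCall[]): Promise<string[]> {
  const results: string[] = [];
  for (const batch of partition(calls)) {
    const batchResults = await Promise.all(batch.map((c) => c.run()));
    results.push(...batchResults);
  }
  return results;
}
```

Because `Promise.all` preserves input order and batches run sequentially, results come back in the same order the calls were issued, even though reads inside a batch overlap in time. A write never overlaps with anything else, which is the property the pattern is meant to guarantee.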
The source was extracted from npm source maps. 36 AI agents analyzed nearly two thousand TypeScript files in four phases:
- Exploration: 6 parallel agents read every file in the source tree
- Analysis: 12 agents wrote 494 KB of raw technical documentation
- Writing: 15 agents rewrote everything from scratch as narrative chapters
- Review & Revision: 3 editorial reviewers produced 900 lines of feedback; 3 revision agents applied all fixes
The entire process — from source extraction to final revised book — took approximately 6 hours.
This is an independent analysis of publicly accessible source code. Claude Code is a product of Anthropic. This book is not affiliated with, endorsed by, or sponsored by Anthropic.
