This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Thread is a service-library dual architecture for safe, fast, flexible code analysis and parsing built in Rust. It operates as both:
- Reusable Library Ecosystem - Modular crates (ast-engine, language, rule-engine) for AST-based pattern matching and transformation using tree-sitter parsers
- Persistent Service Platform - Long-lived service with incremental intelligence, content-addressed caching, and real-time code analysis
The project is forked from ast-grep and enhanced with the ReCoco dataflow framework (a Rust-only fork of CocoIndex) for production use as a code analysis engine for AI context generation. Thread's `thread-flow` crate serves as the implementation layer for ReCoco capabilities.
Key Differentiators:
- ✅ Content-Addressed Caching: 50x+ performance gains on repeated analysis via automatic incremental updates
- ✅ Dual Deployment: Single codebase compiles to both CLI (Rayon parallelism) and Edge (tokio async, Cloudflare Workers)
- ✅ Persistent Storage: Native integration with Postgres (local), D1 (edge), Qdrant (vectors)
- ✅ Dataflow Orchestration: Declarative pipelines for ETL and dependency tracking via ReCoco (implemented in `thread-flow`)
Thread follows a service-library dual architecture (Constitution v2.0.0, Principle I) with seven main crates plus service layer:
- `thread-ast-engine` - Core AST parsing, pattern matching, and transformation engine (forked from ast-grep-core)
- `thread-language` - Language definitions and tree-sitter parser integrations (supports 20+ languages)
- `thread-rule-engine` - Rule-based scanning and transformation system with YAML configuration support
- `thread-flow` - Dataflow orchestration layer implementing ReCoco (Rust-only fork of CocoIndex) with heavy feature-gating for modular builds
- `thread-utilities` - Shared utilities including SIMD optimizations and hash functions
- `thread-wasm` - WebAssembly bindings for browser and edge deployment
- `thread-services` - High-level service interfaces, API abstractions, and ReCoco integration
- ReCoco Dataflow - Content-addressed caching, incremental ETL, and dependency tracking via the `thread-flow` crate (public Rust-only dependency with heavy feature-gating per Constitution v2.0.0, Principle IV)
- Storage Backends:
- Postgres (local CLI) - Persistent caching and analysis results
- D1 (Cloudflare Edge) - Distributed caching across CDN nodes
- Qdrant (optional) - Vector similarity search for semantic analysis
- Concurrency Models:
- Rayon (CLI) - CPU-bound parallelism for local multi-core utilization
- tokio (Edge) - Async I/O for horizontal scaling and Cloudflare Workers
- `xtask` - Custom build tasks, primarily for WASM compilation with optimization
ReCoco is a public Rust-only fork of the CocoIndex dataflow framework, maintained as a separate open-source crate with heavy feature-gating for modular builds.
Development Control: While ReCoco lives in a separate repository, Thread maintainers have full control over ReCoco development. If Thread requires changes, new features, or bug fixes in ReCoco, these can be implemented directly without waiting on external maintainers.
- ReCoco (external dependency) - Public Rust crate providing dataflow abstractions, content-addressed caching primitives, and incremental computation framework
- `thread-flow` (Thread crate) - Thread's implementation layer for ReCoco, providing:
  - Storage backend integrations (Postgres, D1, Qdrant)
  - Thread-specific dataflow pipelines and ETL operations
  - Incremental analysis orchestration for code graph updates
ReCoco uses Cargo feature flags to enable modular builds:
- Core dataflow primitives available without features
- Storage backends gated behind `postgres`, `d1`, and `qdrant` features
- Thread selectively enables only required features via `thread-flow`
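A minimal sketch of how `thread-flow` might express this selective feature forwarding in its `Cargo.toml` (the dependency version and the exact feature names on the Thread side are assumptions; `postgres`, `d1`, and `qdrant` are the ReCoco features named above):

```toml
# Hypothetical Cargo.toml fragment for thread-flow (illustrative only)
[dependencies]
# Pull in only ReCoco's core dataflow primitives by default
recoco = { version = "0.1", default-features = false }

[features]
# Re-export ReCoco's storage backends as thread-flow features,
# so downstream crates opt in per deployment target
postgres = ["recoco/postgres"]
d1 = ["recoco/d1"]
qdrant = ["recoco/qdrant"]
```

With this shape, a CLI build would enable only `postgres`, while an edge build enables only `d1`, keeping each artifact free of unused backend code.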
```text
thread-services → thread-flow → recoco (public crate)
                       ↓
           [Postgres | D1 | Qdrant]
```
Thread maintains a clear separation between core library functionality and deployment-specific machinery:
The D1 storage backend is a first-class library feature in `crates/flow/src/incremental/backends/d1.rs`:
- ✅ Part of Thread's multi-backend storage abstraction
- ✅ API documentation in `docs/api/D1_INTEGRATION_API.md`
- ✅ Integration tests in `crates/flow/tests/incremental_d1_tests.rs`
- ✅ SQL migrations embedded in binary via `include_str!()` from `crates/flow/migrations/`
Why D1 is core: D1 is SQLite-based storage that can be used in any environment (Cloudflare Workers, edge runtimes, embedded systems), not just Cloudflare-specific deployments.
Cloudflare Workers deployment materials are segregated in the gitignored `crates/cloudflare/` directory:
- 🔒 Configuration: `config/wrangler.production.toml.example` - Production Wrangler configuration
- 📚 Documentation: `docs/EDGE_DEPLOYMENT.md` - Comprehensive deployment guide (17KB)
- 🚀 Scripts: `scripts/deploy.sh` - Automated deployment script (5.9KB)
- 🏗️ Worker Implementation: `worker/` - Complete Cloudflare Worker codebase
Access: The crates/cloudflare/ directory is gitignored (line 266 of .gitignore) to prevent accidental commits of proprietary deployment configurations and credentials.
Documentation: See crates/cloudflare/docs/README.md for complete inventory of deployment materials, workflows, secrets management, and troubleshooting guides.
- CLI Deployment (Postgres + Rayon): `docs/deployment/CLI_DEPLOYMENT.md`
- Edge Deployment (D1 + Cloudflare Workers): `crates/cloudflare/docs/EDGE_DEPLOYMENT.md` (segregated)
- D1 Backend API: `docs/api/D1_INTEGRATION_API.md` (core library documentation)
```bash
# Build everything (except WASM)
mise run build
# or: cargo build --workspace

# Build in release mode
mise run build-release
# or: cargo build --workspace --release --features inline

# Build WASM for development
mise run build-wasm
# or: cargo run -p xtask build-wasm

# Build WASM in release mode
mise run build-wasm-release
# or: cargo run -p xtask build-wasm --release
```

```bash
# Run all tests
mise run test
# or: hk run test
# or: cargo nextest run --all-features --no-fail-fast -j 1

# Full linting
mise run lint
# or: hk run check

# Auto-fix formatting and linting issues
mise run fix
# or: hk fix

# Run CI pipeline locally
mise run ci
```

```bash
# Run specific test
cargo nextest run --manifest-path Cargo.toml test_name --all-features

# Run tests for specific crate
cargo nextest run -p thread-ast-engine --all-features

# Run benchmarks
cargo bench -p thread-rule-engine
```

```bash
# Update dependencies
mise run update
# or: cargo update && cargo update --workspace

# Clean build artifacts
mise run clean

# Update license headers
mise run update-licenses
# or: ./scripts/update-licenses.py
```

The `thread-language` crate provides built-in support for major programming languages via tree-sitter:
Tier 1 Languages (primary focus):
- Rust, JavaScript/TypeScript, Python, Go, Java
Tier 2 Languages (full support):
- C/C++, C#, PHP, Ruby, Swift, Kotlin, Scala
Tier 3 Languages (basic support):
- Bash, CSS, HTML, JSON, YAML, Lua, Elixir, Haskell
Thread's core strength is AST-based pattern matching using meta-variables:
- `$VAR` - Captures a single AST node
- `$$$ITEMS` - Captures multiple consecutive nodes (ellipsis)
- `$_` - Matches any node without capturing
```rust
// Find function declarations
root.find("function $NAME($$$PARAMS) { $$$BODY }")

// Find variable assignments
root.find_all("let $VAR = $VALUE")

// Complex pattern matching
root.find("if ($COND) { $$$THEN } else { $$$ELSE }")
```

The `thread-rule-engine` supports YAML-based rule definitions for code analysis:
```yaml
id: no-var-declarations
message: "Use 'let' or 'const' instead of 'var'"
language: JavaScript
rule:
  pattern: "var $NAME = $VALUE"
fix: "let $NAME = $VALUE"
```

- SIMD optimizations in `thread-utilities` for fast string operations
- Parallel processing capabilities with rayon
- Memory-efficient AST representation
- Content-addressable storage for deduplication
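The content-addressed caching and deduplication idea above can be sketched as follows. This is an illustrative example only: the `ContentCache` type is hypothetical, and it uses the standard library hasher for brevity where Thread's actual implementation uses `rapidhash` and persistent storage backends.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical content-addressed cache: results are keyed by a hash of
/// the input content, so identical inputs are analyzed at most once.
struct ContentCache {
    entries: HashMap<u64, String>, // content hash -> analysis result
}

impl ContentCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return the cached result for `source`, running `analyze` only on a miss.
    fn get_or_insert_with(
        &mut self,
        source: &str,
        analyze: impl FnOnce(&str) -> String,
    ) -> &String {
        let mut hasher = DefaultHasher::new();
        source.hash(&mut hasher);
        let key = hasher.finish();
        self.entries.entry(key).or_insert_with(|| analyze(source))
    }
}
```

Repeated analysis of unchanged content becomes a hash lookup rather than a re-parse, which is the mechanism behind the incremental performance gains described above.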
- dev: Fast compilation with basic optimizations
- dev-debug: Cranelift backend for faster debug builds
- release: Full LTO optimization
- wasm-release: Size-optimized for WebAssembly
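The profiles above would typically be declared along these lines in the workspace `Cargo.toml`. This is an illustrative sketch, not Thread's actual manifest; note that the Cranelift codegen backend requires a nightly toolchain.

```toml
# Illustrative profile sketch (assumed settings, not Thread's actual manifest)
cargo-features = ["codegen-backend"]  # nightly-only, needed for Cranelift

[profile.dev-debug]
inherits = "dev"
codegen-backend = "cranelift"  # faster debug builds

[profile.release]
lto = "fat"                    # full LTO optimization

[profile.wasm-release]
inherits = "release"
opt-level = "z"                # optimize for size
strip = true
```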
Thread compiles to WebAssembly for edge deployment:
```bash
# Basic WASM build (for Cloudflare Workers)
cargo run -p xtask build-wasm

# Multi-threading WASM (for browsers)
cargo run -p xtask build-wasm --multi-threading

# Optimized release build
cargo run -p xtask build-wasm --release
```

- Unit tests: In each crate's `src/` directory
- Integration tests: In `tests/` directories
- Benchmarks: In `benches/` directories
- Test data: In `test_data/` directories
- cargo-nextest: Parallel test execution
- hk: Git hooks and linting orchestration
- mise: Development environment management
- typos: Spell checking
- reuse: License compliance
- `tree-sitter`: AST parsing foundation
- `regex`: Pattern matching support
- `serde`: Configuration serialization
- `bit-set`: Efficient set operations
- `rayon`: Parallel processing
- `rapidhash`: Fast non-cryptographic hashing
- `memchr`: SIMD string searching
- `simdeez`: SIMD abstractions
- Run `mise run install-tools` to set up development environment
- Make changes following existing patterns
- Run `mise run fix` to apply formatting and linting
- Run `mise run test` to verify functionality
- Use `mise run ci` to run full CI pipeline locally
All development MUST adhere to the Thread Constitution v2.0.0 (.specify/memory/constitution.md)
- Service-Library Architecture (Principle I)
  - Features MUST consider both library API design AND service deployment
  - Libraries remain self-contained; services leverage ReCoco (via `thread-flow`) for orchestration
  - Dual architecture is non-negotiable—both aspects are first-class citizens
- Test-First Development (Principle III - NON-NEGOTIABLE)
  - TDD mandatory: Tests → Approve → Fail → Implement
  - All tests execute via `cargo nextest`
  - No exceptions, no justifications accepted
- Service Architecture & Persistence (Principle VI)
  - Content-addressed caching MUST achieve >90% hit rate
  - Storage targets: Postgres <10ms, D1 <50ms, Qdrant <100ms p95 latency
  - Incremental updates MUST trigger only affected component re-analysis
Before any PR merge, verify:
- ✅ `mise run lint` passes (zero warnings)
- ✅ `cargo nextest run --all-features` passes (100% success)
- ✅ `mise run ci` completes successfully
- ✅ Public APIs have rustdoc documentation
- ✅ Performance-sensitive changes include benchmarks
- ✅ Service features meet storage/cache/incremental requirements
- CLI Target: Test on Linux, macOS, Windows with Rayon parallelism
- Edge Target: `mise run build-wasm-release` succeeds for Cloudflare Workers
- Storage: Integration tests pass for all backends (Postgres, D1, Qdrant)
Review Process: All PRs MUST have constitution compliance verified by reviewers. Violations require explicit technical justification or rejection.
See .specify/memory/constitution.md for complete governance framework.
- Main codebase: AGPL-3.0-or-later
- Forked ast-grep components: AGPL-3.0-or-later AND MIT
- Documentation and config: MIT OR Apache-2.0
- See `VENDORED.md` files for specific attribution
The repository provides multiple tools to help AI assistants work more efficiently:
- MCP Tools:
  - You always have access to `sequential-thinking`. Use this to plan out tasks before executing and document things you learn along the way. Regularly refer back to it.
  - `context7` provides a library of up-to-date code examples and API documentation for almost any library.
- The `llm-edit.sh` script:
  - The script in `scripts/llm-edit.sh` gives you an easy interface for providing multiple file edits in one go. Full details on how to use it are in `scripts/README-llm-edit.md`.
- When the user mentions "multi-file output", "generate files as json", or similar requests for bundled file generation, use the multi-file output system
- Execute using: `./llm-edit.sh <json_file>`
- Provide output as a single JSON object following the schema in `./README-llm-edit.md`
- The JSON must include an array of files, each with file_name, file_type, and file_content fields
- For binary files, encode content as base64 and set file_type to "binary"
- NEVER include explanatory text or markdown outside the JSON structure
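A minimal sketch of the expected JSON shape under the schema described above. The file names, contents, and the `"text"` file_type value are illustrative assumptions; `./README-llm-edit.md` is the authoritative schema.

```json
{
  "files": [
    {
      "file_name": "src/lib.rs",
      "file_type": "text",
      "file_content": "pub fn answer() -> u32 { 42 }\n"
    },
    {
      "file_name": "assets/logo.png",
      "file_type": "binary",
      "file_content": "iVBORw0KGgo...(base64-encoded bytes)"
    }
  ]
}
```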
- Rust (edition 2024, aligning with Thread's existing codebase) (001-realtime-code-graph)
- Multi-backend architecture with deployment-specific primaries: (001-realtime-code-graph)
- 001-realtime-code-graph: Added Rust (edition 2024, aligning with Thread's existing codebase)