CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Thread is a service-library dual architecture for safe, fast, flexible code analysis and parsing built in Rust. It operates as both:

Reusable Library Ecosystem - Modular crates (ast-engine, language, rule-engine) for AST-based pattern matching and transformation using tree-sitter parsers
Persistent Service Platform - Long-lived service with incremental intelligence, content-addressed caching, and real-time code analysis

The project is forked from ast-grep and enhanced with ReCoco (Rust-only fork of CocoIndex) dataflow framework for production use as a code analysis engine for AI context generation. Thread's thread-flow crate serves as the implementation layer for ReCoco capabilities.

Key Differentiators:

✅ Content-Addressed Caching: 50x+ performance gains on repeated analysis via automatic incremental updates
✅ Dual Deployment: Single codebase compiles to both CLI (Rayon parallelism) and Edge (tokio async, Cloudflare Workers)
✅ Persistent Storage: Native integration with Postgres (local), D1 (edge), Qdrant (vectors)
✅ Dataflow Orchestration: Declarative pipelines for ETL and dependency tracking via ReCoco (implemented in thread-flow)
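Content-addressed caching means results are keyed by a hash of the input's content, so unchanged input is never re-analyzed. The sketch below shows this with the standard library alone. It is not ReCoco's or thread-flow's API. `ContentCache`, `get_or_analyze`, and the line-counting "analysis" are hypothetical. `DefaultHasher` stands in for the stable, collision-resistant hash a persistent cache would need.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory cache: content hash -> analysis result.
struct ContentCache {
    entries: HashMap<u64, usize>,
    hits: u64,
    misses: u64,
}

impl ContentCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Hash the raw source text; identical content always yields the same key.
    fn key(source: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        source.hash(&mut hasher);
        hasher.finish()
    }

    /// Return the cached result for `source`, running `analyze` only on a miss.
    fn get_or_analyze(&mut self, source: &str, analyze: impl FnOnce(&str) -> usize) -> usize {
        let key = Self::key(source);
        if let Some(&result) = self.entries.get(&key) {
            self.hits += 1;
            return result;
        }
        self.misses += 1;
        let result = analyze(source);
        self.entries.insert(key, result);
        result
    }
}

fn main() {
    let mut cache = ContentCache::new();
    // Stand-in "analysis": count lines.
    let count_lines = |s: &str| s.lines().count();

    let a = "fn main() {}\n";
    let b = "fn main() {\n    println!(\"hi\");\n}\n";
    cache.get_or_analyze(a, count_lines); // miss: analyzed
    cache.get_or_analyze(b, count_lines); // miss: analyzed
    cache.get_or_analyze(a, count_lines); // hit: same content, not re-analyzed
    println!("hits={} misses={}", cache.hits, cache.misses); // hits=1 misses=2
}
```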

Architecture

Thread follows a service-library dual architecture (Constitution v2.0.0, Principle I) with seven main crates plus service layer:

Library Core (Reusable Components)

thread-ast-engine - Core AST parsing, pattern matching, and transformation engine (forked from ast-grep-core)
thread-language - Language definitions and tree-sitter parser integrations (supports 20+ languages)
thread-rule-engine - Rule-based scanning and transformation system with YAML configuration support
thread-flow - Dataflow orchestration layer implementing ReCoco (Rust-only fork of CocoIndex) with heavy feature-gating for modular builds
thread-utilities - Shared utilities including SIMD optimizations and hash functions
thread-wasm - WebAssembly bindings for browser and edge deployment

Service Layer (Orchestration & Persistence)

thread-services - High-level service interfaces, API abstractions, and ReCoco integration
ReCoco Dataflow - Content-addressed caching, incremental ETL, and dependency tracking via thread-flow crate (public Rust-only dependency with heavy feature-gating per Constitution v2.0.0, Principle IV)
Storage Backends:
- Postgres (local CLI) - Persistent caching and analysis results
- D1 (Cloudflare Edge) - Distributed caching across CDN nodes
- Qdrant (optional) - Vector similarity search for semantic analysis
Concurrency Models:
- Rayon (CLI) - CPU-bound parallelism for local multi-core utilization
- tokio (Edge) - Async I/O for horizontal scaling and Cloudflare Workers
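The CLI's CPU-bound model splits independent files across cores. The sketch below uses `std::thread::scope` from the standard library as a stand-in for Rayon, so it runs without external crates. With Rayon the same idea would typically be a parallel iterator over the files. `analyze_all` and its line-counting "analysis" are hypothetical, and the tokio edge path is not shown.

```rust
use std::thread;

/// Split `files` into roughly one chunk per available core and analyze the
/// chunks in parallel. Results are returned in the original file order.
fn analyze_all(files: &[&str]) -> Vec<usize> {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // `max(1)` keeps `chunks` from panicking when `files` is empty.
    let chunk = files.len().div_ceil(workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|src| src.lines().count()).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order preserves the input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let files = ["a\nb", "one line", "x\ny\nz"];
    println!("{:?}", analyze_all(&files)); // [2, 1, 3]
}
```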

Build System

xtask - Custom build tasks, primarily for WASM compilation with optimization

ReCoco Integration

ReCoco is a public Rust-only fork of the CocoIndex dataflow framework, maintained as a separate open-source crate with heavy feature-gating for modular builds.

Development Control: While ReCoco lives in a separate repository, Thread maintainers have full control over ReCoco development. If Thread requires changes, new features, or bug fixes in ReCoco, these can be implemented directly without waiting on external maintainers.

Architectural Relationship

ReCoco (external dependency) - Public Rust crate providing dataflow abstractions, content-addressed caching primitives, and incremental computation framework
thread-flow (Thread crate) - Thread's implementation layer for ReCoco, providing:
- Storage backend integrations (Postgres, D1, Qdrant)
- Thread-specific dataflow pipelines and ETL operations
- Incremental analysis orchestration for code graph updates
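Incremental analysis means invalidating only what depends on a changed file. Below is a minimal sketch that computes the affected set by breadth-first search over a reverse-dependency map (file to the files that depend on it). The map shape and the `affected` function are hypothetical. They do not represent thread-flow's actual graph types.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Given `dependents[f]` = files that directly depend on `f`, return every
/// file that must be re-analyzed when `changed` is edited (including itself).
fn affected<'a>(dependents: &HashMap<&'a str, Vec<&'a str>>, changed: &'a str) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([changed]);
    while let Some(file) = queue.pop_front() {
        // Skip files already scheduled; this also guards against import cycles.
        if !seen.insert(file) {
            continue;
        }
        if let Some(next) = dependents.get(file) {
            queue.extend(next.iter().copied());
        }
    }
    seen
}

fn main() {
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    dependents.insert("util.rs", vec!["parser.rs", "cli.rs"]);
    dependents.insert("parser.rs", vec!["cli.rs"]);
    dependents.insert("docs.rs", vec![]);

    // Editing util.rs invalidates parser.rs and cli.rs, but never docs.rs.
    println!("{:?}", affected(&dependents, "util.rs")); // {"cli.rs", "parser.rs", "util.rs"}
}
```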

Feature Gating

ReCoco uses Cargo feature flags to enable modular builds:

Core dataflow primitives available without features
Storage backends gated behind postgres, d1, qdrant features
Thread selectively enables only required features via thread-flow
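In Rust, this kind of gating is usually written with `#[cfg(feature = "...")]`, so disabled backends are not compiled into the binary at all. Below is a minimal sketch that assumes the feature names `postgres`, `d1`, and `qdrant` from the list above. The `enabled_backends` function is hypothetical and does not show thread-flow's actual module layout.

```rust
/// Report which optional storage backends were compiled in. Each push only
/// exists in the binary when the matching Cargo feature is enabled
/// (e.g. by passing `--features d1` to the crate that declares it).
fn enabled_backends() -> Vec<&'static str> {
    // `mut` is only exercised when at least one feature is enabled.
    #[allow(unused_mut)]
    let mut backends = Vec::new();
    #[cfg(feature = "postgres")]
    backends.push("postgres");
    #[cfg(feature = "d1")]
    backends.push("d1");
    #[cfg(feature = "qdrant")]
    backends.push("qdrant");
    backends
}

fn main() {
    // With no features enabled, only the core (ungated) code is present.
    println!("storage backends: {:?}", enabled_backends());
}
```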

Dependency Chain

thread-services → thread-flow → recoco (public crate)
                              ↓
                    [Postgres | D1 | Qdrant]

Deployment Architecture Separation

Thread maintains a clear separation between core library functionality and deployment-specific machinery:

Core Library (Open Source)

The D1 storage backend is a first-class library feature in crates/flow/src/incremental/backends/d1.rs:

✅ Part of Thread's multi-backend storage abstraction
✅ API documentation in docs/api/D1_INTEGRATION_API.md
✅ Integration tests in crates/flow/tests/incremental_d1_tests.rs
✅ SQL migrations embedded in binary via include_str!() from crates/flow/migrations/

Why D1 is core: D1 is SQLite-based storage that can be used in any environment (Cloudflare Workers, edge runtimes, embedded systems), not just Cloudflare-specific deployments.

Deployment Machinery (Segregated)

Cloudflare Workers deployment materials are segregated in the gitignored crates/cloudflare/ directory:

🔒 Configuration: config/wrangler.production.toml.example - Production Wrangler configuration
📚 Documentation: docs/EDGE_DEPLOYMENT.md - Comprehensive deployment guide (17KB)
🚀 Scripts: scripts/deploy.sh - Automated deployment script (5.9KB)
🏗️ Worker Implementation: worker/ - Complete Cloudflare Worker codebase

Access: The crates/cloudflare/ directory is gitignored (line 266 of .gitignore) to prevent accidental commits of proprietary deployment configurations and credentials.

Documentation: See crates/cloudflare/docs/README.md for complete inventory of deployment materials, workflows, secrets management, and troubleshooting guides.

Deployment Documentation

CLI Deployment (Postgres + Rayon): docs/deployment/CLI_DEPLOYMENT.md
Edge Deployment (D1 + Cloudflare Workers): crates/cloudflare/docs/EDGE_DEPLOYMENT.md (segregated)
D1 Backend API: docs/api/D1_INTEGRATION_API.md (core library documentation)

Development Commands

Building

# Build everything (except WASM)
mise run build
# or: cargo build --workspace

# Build in release mode
mise run build-release
# or: cargo build --workspace --release --features inline

# Build WASM for development
mise run build-wasm
# or: cargo run -p xtask build-wasm

# Build WASM in release mode
mise run build-wasm-release
# or: cargo run -p xtask build-wasm --release

Testing and Quality

# Run all tests
mise run test
# or: hk run test
# or: cargo nextest run --all-features --no-fail-fast -j 1

# Full linting
mise run lint
# or: hk run check

# Auto-fix formatting and linting issues
mise run fix
# or: hk fix

# Run CI pipeline locally
mise run ci

Single Test Execution

# Run specific test
cargo nextest run --manifest-path Cargo.toml test_name --all-features

# Run tests for specific crate
cargo nextest run -p thread-ast-engine --all-features

# Run benchmarks
cargo bench -p thread-rule-engine

Utility Commands

# Update dependencies
mise run update
# or: cargo update && cargo update --workspace

# Clean build artifacts
mise run clean

# Update license headers
mise run update-licenses
# or: ./scripts/update-licenses.py

Key Language Support

The thread-language crate provides built-in support for major programming languages via tree-sitter:

Tier 1 Languages (primary focus):

Rust, JavaScript/TypeScript, Python, Go, Java

Tier 2 Languages (full support):

C/C++, C#, PHP, Ruby, Swift, Kotlin, Scala

Tier 3 Languages (basic support):

Bash, CSS, HTML, JSON, YAML, Lua, Elixir, Haskell

Pattern Matching System

Thread's core strength is AST-based pattern matching using meta-variables:

Meta-Variable Syntax

$VAR - Captures a single AST node
$$$ITEMS - Captures zero or more consecutive nodes (ellipsis)
$_ - Matches any node without capturing
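To make the three meta-variable forms concrete, here is a toy backtracking matcher. It works on whitespace-separated tokens rather than tree-sitter AST nodes. It is a simplified model for illustration only, not thread-ast-engine's matching algorithm. `Pat`, `parse`, and `matches` are hypothetical names. Unlike a real matcher, it does not require a repeated `$VAR` to match the same text each time.

```rust
use std::collections::HashMap;

/// One element of a toy pattern, split on whitespace.
enum Pat<'a> {
    Lit(&'a str),   // must equal the token exactly
    Var(&'a str),   // $VAR: exactly one token, captured
    Multi(&'a str), // $$$ITEMS: zero or more tokens, captured
    Any,            // $_: exactly one token, not captured
}

type Caps<'a> = HashMap<&'a str, Vec<&'a str>>;

fn parse(pattern: &str) -> Vec<Pat<'_>> {
    pattern
        .split_whitespace()
        .map(|t| {
            if t == "$_" {
                Pat::Any
            } else if let Some(name) = t.strip_prefix("$$$") {
                Pat::Multi(name)
            } else if let Some(name) = t.strip_prefix('$') {
                Pat::Var(name)
            } else {
                Pat::Lit(t)
            }
        })
        .collect()
}

/// Backtracking match of the whole pattern against the whole token list.
fn matches<'a>(pat: &[Pat<'a>], toks: &[&'a str], caps: Caps<'a>) -> Option<Caps<'a>> {
    let Some((head, rest)) = pat.split_first() else {
        return toks.is_empty().then_some(caps);
    };
    match head {
        Pat::Lit(lit) => {
            let (first, tail) = toks.split_first()?;
            if first != lit {
                return None;
            }
            matches(rest, tail, caps)
        }
        Pat::Any => matches(rest, toks.split_first()?.1, caps),
        Pat::Var(name) => {
            let (first, tail) = toks.split_first()?;
            let mut caps = caps;
            caps.insert(*name, vec![*first]);
            matches(rest, tail, caps)
        }
        // Try the shortest capture first, backtracking to longer ones.
        Pat::Multi(name) => (0..=toks.len()).find_map(|n| {
            let mut caps = caps.clone();
            caps.insert(*name, toks[..n].to_vec());
            matches(rest, &toks[n..], caps)
        }),
    }
}

fn main() {
    let pattern = parse("let $VAR = $_ ( $$$ARGS ) ;");
    let tokens: Vec<&str> = "let total = sum ( a , b ) ;".split_whitespace().collect();
    let caps = matches(&pattern, &tokens, Caps::new()).expect("pattern should match");
    println!("VAR  = {:?}", caps["VAR"]); // ["total"]
    println!("ARGS = {:?}", caps["ARGS"]); // ["a", ",", "b"]
    // `$_` consumed `sum` without creating a capture.
    assert!(!caps.contains_key("_"));
}
```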

Example Usage

// `root` is assumed to be the root node of a parsed JavaScript source
// Find function declarations
root.find("function $NAME($$$PARAMS) { $$$BODY }")

// Find variable assignments
root.find_all("let $VAR = $VALUE")

// Complex pattern matching
root.find("if ($COND) { $$$THEN } else { $$$ELSE }")

Rule System

The thread-rule-engine supports YAML-based rule definitions for code analysis:

id: no-var-declarations
message: "Use 'let' or 'const' instead of 'var'"
language: JavaScript
rule:
  pattern: "var $NAME = $VALUE"
fix: "let $NAME = $VALUE"
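Applying the `fix` template amounts to substituting each captured meta-variable's source text into the replacement string. The sketch below assumes the captures are already available as a map from name to text. `apply_fix` is a hypothetical helper, not thread-rule-engine's API. It only handles single-node `$NAME` variables.

```rust
use std::collections::HashMap;

/// Replace every `$NAME` in `template` with its captured text. Names are runs of
/// uppercase ASCII letters, digits, or `_`; unknown names are left as written.
fn apply_fix(template: &str, captures: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut chars = template.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // Consume the variable name that follows `$`.
        let start = i + 1;
        let mut end = start;
        while let Some(&(j, n)) = chars.peek() {
            if n.is_ascii_uppercase() || n.is_ascii_digit() || n == '_' {
                end = j + n.len_utf8();
                chars.next();
            } else {
                break;
            }
        }
        match captures.get(&template[start..end]) {
            Some(text) => out.push_str(text),
            None => out.push_str(&template[i..end]),
        }
    }
    out
}

fn main() {
    // Captures as a matcher might report them for `var count = 0`.
    let captures = HashMap::from([("NAME", "count"), ("VALUE", "0")]);
    println!("{}", apply_fix("let $NAME = $VALUE", &captures)); // let count = 0
}
```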

Performance Considerations

Optimization Features

SIMD optimizations in thread-utilities for fast string operations
Parallel processing capabilities with rayon
Memory-efficient AST representation
Content-addressable storage for deduplication

Build Profiles

dev: Fast compilation with basic optimizations
dev-debug: Cranelift backend for faster debug builds
release: Full LTO optimization
wasm-release: Size-optimized for WebAssembly

WASM Deployment

Thread compiles to WebAssembly for edge deployment:

# Basic WASM build (for Cloudflare Workers)
cargo run -p xtask build-wasm

# Multi-threading WASM (for browsers)
cargo run -p xtask build-wasm --multi-threading

# Optimized release build
cargo run -p xtask build-wasm --release

Testing Infrastructure

Test Organization

Unit tests: In each crate's src/ directory
Integration tests: In tests/ directories
Benchmarks: In benches/ directories
Test data: In test_data/ directories
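For reference, a unit test that lives next to the code in `src/` conventionally sits in a `#[cfg(test)]` module. `cargo nextest run` picks such modules up just as `cargo test` does. The `count_metavars` function below is a hypothetical example, not part of any Thread crate.

```rust
/// Hypothetical helper used only to illustrate test placement:
/// counts whitespace-separated tokens that start with `$`.
pub fn count_metavars(pattern: &str) -> usize {
    pattern.split_whitespace().filter(|t| t.starts_with('$')).count()
}

fn main() {
    println!("{}", count_metavars("let $VAR = $VALUE")); // 2
}

// Compiled only when building tests (`cargo test` / `cargo nextest run`).
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn counts_single_and_multi_metavars() {
        assert_eq!(count_metavars("let $VAR = $VALUE"), 2);
        assert_eq!(count_metavars("f ( $$$ARGS )"), 1);
        assert_eq!(count_metavars("no captures here"), 0);
    }
}
```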

Quality Tooling

cargo-nextest: Parallel test execution
hk: Git hooks and linting orchestration
mise: Development environment management
typos: Spell checking
reuse: License compliance

Dependencies

Core Dependencies

tree-sitter: AST parsing foundation
regex: Pattern matching support
serde: Configuration serialization
bit-set: Efficient set operations
rayon: Parallel processing

Performance Dependencies

rapidhash: Fast non-cryptographic hashing
memchr: SIMD string searching
simdeez: SIMD abstractions

Contributing Workflow

Run mise run install-tools to set up the development environment
Make changes following existing patterns
Run mise run fix to apply formatting and linting
Run mise run test to verify functionality
Use mise run ci to run the full CI pipeline locally

Constitutional Compliance

All development MUST adhere to the Thread Constitution v2.0.0 (.specify/memory/constitution.md)

Core Governance Principles

Service-Library Architecture (Principle I)
- Features MUST consider both library API design AND service deployment
- Libraries remain self-contained; services leverage ReCoco (via thread-flow) for orchestration
- Dual architecture is non-negotiable—both aspects are first-class citizens
Test-First Development (Principle III - NON-NEGOTIABLE)
- TDD mandatory: Tests → Approve → Fail → Implement
- All tests execute via cargo nextest
- No exceptions, no justifications accepted
Service Architecture & Persistence (Principle VI)
- Content-addressed caching MUST achieve >90% hit rate
- Storage targets: Postgres <10ms, D1 <50ms, Qdrant <100ms p95 latency
- Incremental updates MUST trigger only affected component re-analysis
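The two measurable targets above, cache hit rate and p95 latency, can be checked with simple arithmetic over recorded counters and samples. Below is a minimal sketch that assumes the nearest-rank percentile definition; the constitution does not specify how p95 is computed. The counters and latency samples are made up for illustration.

```rust
/// Fraction of lookups served from cache (0.0 when there were no lookups).
fn hit_rate(hits: u64, misses: u64) -> f64 {
    let total = hits + misses;
    if total == 0 { 0.0 } else { hits as f64 / total as f64 }
}

/// Nearest-rank percentile: the smallest sample such that at least `p`% of
/// all samples are less than or equal to it. Sorts `samples` in place.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_by(|a, b| a.total_cmp(b));
    let rank = (p * samples.len() as f64 / 100.0).ceil() as usize;
    samples[rank.clamp(1, samples.len()) - 1]
}

fn main() {
    // Hypothetical counters and Postgres lookup latencies in milliseconds.
    let rate = hit_rate(940, 60);
    let mut latencies_ms = vec![3.1, 4.0, 2.7, 9.5, 5.2, 3.3, 4.8, 6.1, 2.9, 12.4];
    let p95 = percentile(&mut latencies_ms, 95.0);

    println!("hit rate {:.1}% meets >90%: {}", rate * 100.0, rate > 0.90);
    println!("Postgres p95 {p95} ms meets <10 ms: {}", p95 < 10.0);
}
```

With only ten samples, the nearest-rank p95 is the slowest sample. A single outlier therefore fails the target, so meaningful p95 checks need considerably more samples.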

Quality Gates (Constitutional Requirements)

Before any PR merge, verify:

✅ mise run lint passes (zero warnings)
✅ cargo nextest run --all-features passes (100% success)
✅ mise run ci completes successfully
✅ Public APIs have rustdoc documentation
✅ Performance-sensitive changes include benchmarks
✅ Service features meet storage/cache/incremental requirements

Deployment Validation

CLI Target: Test on Linux, macOS, Windows with Rayon parallelism
Edge Target: mise run build-wasm-release succeeds for Cloudflare Workers
Storage: Integration tests pass for all backends (Postgres, D1, Qdrant)

Review Process: All PRs MUST have constitution compliance verified by reviewers. Violations require explicit technical justification or rejection.

See .specify/memory/constitution.md for complete governance framework.

License Structure

Main codebase: AGPL-3.0-or-later
Forked ast-grep components: AGPL-3.0-or-later AND MIT
Documentation and config: MIT OR Apache-2.0
See VENDORED.md files for specific attribution

Tools for AI Assistants

Several tools are available to make AI assistants more efficient:

MCP Tools:
- You always have access to sequential-thinking. Use this to plan out tasks before executing and document things you learn along the way. Regularly refer back to it.
- context7 provides a library of up-to-date code examples and API documentation for almost any library.
The llm-edit.sh script:
- Script in scripts/llm-edit.sh gives you an easy interface for providing multiple file edits in one go. Full details on how to use it are in scripts/README-llm-edit.md

Multi-File Output System (llm-edit)

When the user mentions "multi-file output", "generate files as json", or similar requests for bundled file generation, use the multi-file output system.
Execute using: ./scripts/llm-edit.sh <json_file>
Provide output as a single JSON object following the schema in scripts/README-llm-edit.md
The JSON must include an array of files, each with file_name, file_type, and file_content fields
For binary files, encode content as base64 and set file_type to "binary"
NEVER include explanatory text or markdown outside the JSON structure

Active Technologies

Rust (edition 2024, aligning with Thread's existing codebase) (001-realtime-code-graph)
Multi-backend architecture with deployment-specific primaries: (001-realtime-code-graph)

Recent Changes

001-realtime-code-graph: Added Rust (edition 2024, aligning with Thread's existing codebase)
