An MCP server for Rocq (formerly Coq) proof development. It exposes compilation, verification, querying, and interactive tactic stepping as MCP tools, so that LLM agents can write and check Rocq proofs.
- Rocq / Coq --
coqcmust be on yourPATH(needed by all tools). If the workspace contains a_RocqProjector_CoqProjectfile, the server parses it for load-path flags (-Q,-R,-I). For dune projects (no_CoqProjectbut adune-projectfile present), the server auto-detects load paths viadune coq topand writes a_RocqProjectfile in the workspace so that coq-lsp also picks them up. This generated file stays in the workspace and should be added to.gitignore. Otherwise it defaults to-Q <workspace> Test. - pet (from coq-lsp) -- optional, needed only for the interactive tools (
rocq_query,rocq_assumptions,rocq_start,rocq_check,rocq_step_multi,rocq_toc,rocq_notations). Ifpetis not installed, the compile and verify tools still work. - Python 3.11+
Using uv:
# Install (includes pytanque for interactive tools)
uv pip install -e .For development (includes pytest):
uv pip install -e ".[dev]"The server exposes eleven MCP tools:
| Tool | Description |
|---|---|
rocq_compile |
Batch-compile Rocq source code via coqc. Best for checking a finished proof. On error, returns error positions for jumping to interactive mode. For iterative development, prefer rocq_check. |
rocq_compile_file |
Like rocq_compile but takes a file path instead of source string. More efficient for large files (avoids transmitting full source over MCP). Cleans up compilation artifacts but preserves the source file. |
rocq_verify |
Verify that a proof actually proves the original statement. Wraps in a Module M. sandbox to catch type redefinition, Admitted/Abort, custom axioms, and statement mismatches. Run after rocq_compile succeeds. |
| Tool | Description |
|---|---|
rocq_query |
Search the Rocq environment — find lemmas, check types, inspect definitions. Two context modes: preamble (import commands as a string) or file (a .v file path whose definitions are in scope). Optional max_results parameter limits output for broad searches. Does not modify any proof state. |
rocq_assumptions |
Check what axioms a theorem depends on. Takes a required file parameter (path to the .v file where the theorem is defined) to set up the full environment. Returns "closed" (no axioms), "standard_only" (classical logic, Reals, etc.), or "suspicious" (unproved assumptions). |
rocq_start |
Start an interactive proof session and return proof goals. Three modes: (1) by theorem name, (2) by position — jump to any point in a file to inspect proof goals there (e.g., error positions from rocq_compile), (3) from imports. Returns a state_id for use with rocq_check and rocq_step_multi. Optional force_restart=True kills the PET process and clears all cached state before starting (use when PET is in a bad state). |
rocq_check |
Run proof commands with cached imports — fast iterative checking. On error, returns last_valid_state_id for immediate recovery via rocq_check(from_state=...) or rocq_step_multi(from_state=...). Includes stale_warning if the source file was modified since session start. |
rocq_step_multi |
Try multiple tactics at once — find what works without guessing. Useful for auto-solving subgoals (pass standard automation tactics) or exploring proof structure. Does not advance the state; commit the winner with rocq_check. Max 20 tactics per call. |
rocq_toc |
Get the structure of a .v file: all definitions, lemmas, theorems, and sections as a hierarchical outline. Does not require an active session. |
rocq_notations |
List all notations in a Rocq statement and how they resolve (which scope, which module). Helps debug notation ambiguity (e.g., is + in nat_scope or Z_scope?). |
Stale file warning: Interactive sessions (
rocq_start/rocq_check/rocq_step_multi) read the.vfile at session start and do not track subsequent edits. If another process or agent modifies the file while a session is active, the proof state becomes stale and tactics may fail or produce wrong results. In multi-agent setups, work on a copy of the file for interactive proving, or restart the session withrocq_startafter edits. Astale_warningfield is returned when a file modification is detected.
| Variable | Default | Description |
|---|---|---|
ROCQ_WORKSPACE |
current directory | Working directory for Rocq compilation. When set, all workspace parameters are constrained to this directory or its subdirectories. |
ROCQ_COQC_TIMEOUT |
60 |
Timeout (seconds) for rocq_compile |
ROCQ_VERIFY_TIMEOUT |
120 |
Timeout (seconds) for rocq_verify |
ROCQ_PET_TIMEOUT |
30 |
Timeout (seconds) for pytanque-based tools |
ROCQ_COQC_BINARY |
coqc |
Path to the coqc binary |
ROCQ_MAX_SOURCE_SIZE |
1000000 |
Maximum source size in bytes |
The verification tool (rocq_verify) uses defense in depth with three verification phases and multiple security layers.
rocq_verify tries up to three phases in sequence, falling back to the next if the previous one times out:
-
Phase 1 -- Module M sandbox. The proof is wrapped inside
Module M. ... End M.. The theorem is re-stated outside and proved viaexact M.<name>. This is the strongest sandbox but can time out on compute-heavy proofs. -
Phase 2 -- Shared-defs template. For problems with Inductive/Record/Definition types, type definitions are placed outside Module M to avoid nominal typing mismatches, while the proof stays inside the sandbox. Uses pytanque's
tocto extract problem structure. Falls back from Phase 1 when type incompatibilities are detected. -
Phase 3 -- Direct verification. When Phase 1 or Phase 2 times out or fails, the proof is compiled standalone (no Module M) with the full original timeout budget. Correctness is verified by comparing
Check <name>.output against the problem statement's expected type after normalization. Additional security checks compensate for the lack of a sandbox (see below). This phase handles compute-heavy proofs that are too slow under Module M wrapping.
The Module M sandbox prevents:
- Type redefinition cheating -- Inductive/Record types are generative in Rocq, so redefining
natasboolinside Module M creates an incompatible type that cannot unify with the realnatoutside. - Axiom spoofing -- User-declared axioms receive an
M.prefix inPrint Assumptionsoutput, which the stdlib whitelist rejects. Admitted/Abortusage -- Caught byPrint Assumptions.- Module escape --
End M.andReset/Back/Undoare forbidden commands (see Layer 2).
Source code is scanned for dangerous commands after stripping comments. The comment scanner matches Rocq's lexer exactly, including string literal tracking inside comments (preventing desynchronization attacks like (* " (* " *) End M.). Comments are replaced with spaces to preserve word boundaries.
Forbidden commands:
| Category | Commands |
|---|---|
| Filesystem | Redirect, Extraction "...", Separate Extraction, Recursive Extraction, Extraction Library, Cd, Load |
| Code loading | Declare ML Module, Add LoadPath, Add Rec LoadPath, Add ML Path |
| Sandbox escape | End M., Reset, Back, Undo |
| Safety bypass | bypass_check, Unset Guard Checking, Unset Positivity Checking, Unset Universe Checking |
| Escape hatches | Drop (OCaml toplevel) |
After compilation, Print Assumptions is checked against a whitelist of standard library axioms (classical logic, functional extensionality, Reals axioms, etc.). Axioms with qualified names must have a recognized stdlib prefix (Coq.*, Rocq.*, Stdlib.*, or known module prefixes like ClassicalDedekindReals.*). The M. prefix on user-declared axioms ensures they are always rejected.
Printing flags (Set Printing All, Set Printing Universes, Set Printing Width) are reset after End M. to prevent corruption of Print Assumptions output format.
Without the Module M sandbox, Phase 3 applies additional checks to compensate:
- Forbidden commands -- Same scanning as Phases 1 & 2 (Layer 2).
- Incomplete proof rejection --
Admitted,admit, andgive_upin the proof source are rejected outright. - Axiom-introducing commands blocked --
Axiom,Parameter, andConjecturedeclarations are rejected. (VariableandHypothesisare allowed since they are section-local and become parameters afterEnd Section, not global axioms.) - Print Assumptions check -- Same axiom whitelist as Phases 1 & 2 (Layer 3). However, without the
M.prefix from Module M, user-declared axioms could potentially spoof whitelisted names. - Type comparison -- The proven type (via
Check @<name>.withSet Printing All) is normalized and compared to the expected type from the problem statement. Universe annotations are stripped before comparison.
Known limitations of Phase 3:
- Without Module M, type redefinition attacks are not caught (e.g., redefining
natasboolthen proving a trivially true statement). - Notation/scope redefinition before identically-texted definitions can change kernel semantics without being detected by type comparison.
- Stdlib function shadowing (redefining functions called by the problem's definition) is not covered.
The verification_method field in the result indicates which phase was used ("module_m", "shared_defs", or "direct").
Important: The problem_statement parameter is treated as a trusted anchor. The server verifies that the proof proves the given statement, but does NOT verify that the statement itself is the correct problem. Callers must ensure problem_statement comes from a trusted source (e.g., a file on disk), not from the LLM being evaluated.
All tools that accept file paths validate that resolved paths stay within the configured workspace directory (preventing path traversal attacks).
When _RocqProject or _CoqProject is present, the server parses it for coqc load-path flags (-Q, -R, -I). For dune projects (no project file but a dune-project file exists), the server runs dune coq top to auto-detect load paths and writes a _RocqProject in the workspace so that coq-lsp also picks them up. This generated file (marked with a # Auto-generated by rocq-mcp from dune header) stays in the workspace and should be added to .gitignore. Existing user-created project files are never overwritten. For safety:
-argallowlist -- Only known-safe flags are passed through (e.g.,-noinit,-w,-impredicative-set). Dangerous flags like-load-vernac-sourceare silently dropped.- Path containment -- For
_RocqProject/_CoqProject, directories in-Q/-R/-Imust resolve within the workspace. Absolute paths and../traversals outside the workspace are rejected. For dune-detected paths, containment is checked against the dune project root (the directory containingdune-project), since build artifacts typically live in_build/at the project root.
The server uses stdio transport:
rocq-mcpAdd to your MCP client configuration (e.g., Claude Desktop, Claude Code):
{
"mcpServers": {
"rocq-mcp": {
"command": "rocq-mcp",
"env": {
"ROCQ_WORKSPACE": "/path/to/your/rocq/project"
}
}
}
}uv run pytestTests for pytanque-based tools (rocq_query, rocq_assumptions, rocq_start, rocq_check, rocq_step_multi, rocq_toc, rocq_notations) require pet to be installed. Integration tests will be skipped automatically if it is not available.
src/rocq_mcp/
__init__.py Package init
server.py MCP server, 11 tool definitions, pet subprocess management
compile.py coqc-based tools: compile, compile_file, verify
interactive.py pytanque-based tools: start, check, step_multi, query, assumptions, toc, notations
verify.py Rocq lexer scanner, Module M. verification, Print Assumptions parsing
tests/
conftest.py Shared fixtures
test_compile.py Tests for rocq_compile
test_compile_file.py Tests for rocq_compile_file
test_verify.py Tests for rocq_verify
test_assumptions.py Tests for rocq_assumptions
test_auto_solve.py Tests for sentence utilities and step_multi auto-solving
test_server.py Tests for server helpers (_format_error, _parse_project_flags, etc.)
test_format_error.py Tests for error formatting
test_query.py Tests for rocq_query (requires pet)
test_start.py Tests for rocq_start (requires pet)
test_check.py Tests for rocq_check (requires pet)
test_step_multi.py Tests for rocq_step_multi
test_toc.py Tests for rocq_toc
test_notations.py Tests for rocq_notations
test_timeout.py Tests for timeout handling
test_integration.py Integration tests
Apache 2.0 -- see LICENSE for details.