This file provides guidance for AI coding agents working on the nf-docs project.
nf-docs is a CLI tool that generates API documentation for Nextflow pipelines by querying the
Nextflow Language Server (LSP) and extracting information from nextflow_schema.json,
nextflow.config, meta.yml, and README.md. Output formats: HTML, JSON, YAML, Markdown.
pip install -e ".[dev]"pytest # all tests
pytest -v # verbose
pytest tests/test_models.py # single file
pytest tests/test_models.py::TestProcess # single class
pytest tests/test_models.py::TestProcess::test_creation # single test
pytest --cov=nf_docs --cov-report=term-missing # with coverageruff check src/ tests/ # lint
ruff check --fix src/ tests/ # lint + autofix
ruff format src/ tests/ # format
ty check src/ # type checking (NOT mypy)
pre-commit run --all-files # all hooks (ruff, ty, prettier, tailwind)nf-docs generate . -f html -o ./nf-docs/ # HTML output (also: json, yaml, markdown)
nf-docs generate . -f html --no-cache # skip cache
nf-docs generate . -f html -v # debug logging- Line length: 100 characters (enforced by ruff)
- Python version: 3.10+ (use native union types, builtin generics)
- Type hints: Required on all function signatures
- Docstrings: Required on all public modules, classes, and functions (Google style)
- Builtin generics:
list[str],dict[str, Any],tuple[int, str] - Union syntax:
X | None(neverOptional[X]) - Import
Anyfromtypingwhen needed - Use
TYPE_CHECKINGguard for circular imports
Sorted by ruff isort rules. Order: stdlib, third-party, local. Separate each group with a blank line:
import logging
from pathlib import Path
from typing import Any
from jinja2 import Environment
from nf_docs.models import Pipeline, Process- Modules:
snake_case(lsp_client.py,git_utils.py) - Classes:
PascalCase(PipelineExtractor,LSPClient) - Functions/methods:
snake_case(parse_hover_content,get_entry_workflow) - Private methods:
_prefix(_extract_from_lsp,_parse_symbol_name) - Constants:
UPPER_SNAKE_CASE(LANGUAGE_SERVER_JAR,NEXTFLOW_URL) - Tests:
TestXxxclasses,test_xxxmethods
Use @dataclass for all models (never pydantic/attrs). Every model should have a to_dict()
method:
@dataclass
class Example:
"""Brief description."""
name: str
description: str = ""
items: list[str] = field(default_factory=list)
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary representation."""
return {"name": self.name, "description": self.description, "items": self.items}Each module defines its own exception class (e.g., ExtractionError, LSPError,
SchemaParseError, ConfigParseError). Use specific exceptions, log recoverable issues as
warnings:
logger = logging.getLogger(__name__)
try:
result = some_operation()
except SpecificError as e:
logger.warning(f"Operation failed: {e}")
return NoneThe CLI layer catches domain exceptions and shows user-friendly messages via rich.
- Define
logger = logging.getLogger(__name__)at module level - Use f-strings in log calls:
logger.info(f"Parsed {count} parameters") - debug: Internal state, LSP communication, cache details
- info: User-relevant progress ("Parsed config with N parameters")
- warning: Recoverable issues ("Failed to parse schema")
- error: Failures that affect output
Tests use pytest with class-based organization. Shared fixtures in conftest.py include
sample_schema, sample_pipeline, sample_nextflow_file, etc.
class TestProcess:
"""Tests for Process model."""
def test_creation(self):
process = Process(name="FASTQC", docstring="QC tool")
assert process.name == "FASTQC"
def test_to_dict(self):
result = Process(name="TEST").to_dict()
assert result["name"] == "TEST"Use unittest.mock.patch to mock LSP and external calls:
with patch.object(PipelineExtractor, "_extract_from_lsp"):
extractor = PipelineExtractor(workspace_path=sample_pipeline)
pipeline = extractor.extract()Async test mode is auto (via asyncio_mode = "auto" in pyproject.toml).
src/nf_docs/
├── __init__.py # Version from importlib.metadata, re-exports models
├── cache.py # XDG-compliant caching with content hashing
├── cli.py # CLI entry point (rich-click)
├── config.py # User config from ~/.config/nf-docs/config.yaml
├── config_parser.py # Parse nextflow.config via `nextflow config -flat`
├── extractor.py # Main extraction orchestration
├── generation_info.py # Generation metadata (version, timestamp)
├── git_utils.py # Git info for source URLs
├── lsp_client.py # JSON-RPC over stdio to Nextflow Language Server
├── meta_parser.py # nf-core meta.yml parser
├── models.py # Data models (all @dataclass)
├── nextflow_env.py # Isolated NXF_HOME for clean Nextflow runs
├── nf_parser.py # Parse process/workflow hover content from LSP
├── progress.py # Callback-based progress reporting
├── schema_parser.py # nextflow_schema.json parser
├── tailwind.py # Tailwind CSS build utility + CLI
├── renderers/
│ ├── __init__.py # get_renderer() factory
│ ├── base.py # BaseRenderer ABC
│ ├── html.py # HTML renderer (Jinja2 + Tailwind)
│ ├── json.py # JSON renderer
│ ├── markdown.py # Markdown renderer (multi-file)
│ └── yaml.py # YAML renderer
└── templates/
├── html.html # Jinja2 HTML template
└── tailwind.css # Pre-built Tailwind CSS
- LSP Integration: JSON-RPC over stdio. Symbol names are prefixed (
"process FASTQC","workflow MAIN"). Hover results have a signature block then---then Groovydoc. Workspace must be indexed viadidChangeConfigurationbefore queries work. - Caching: Stored in
~/.cache/nf-docs/, keyed by pipeline path, package version, and SHA256 content hash of all.nf+ config + schema + README files. - Rendering: Strategy pattern via
BaseRendererABC withrender(),render_to_file(),render_to_directory(). Factory functionget_renderer(format)returns the appropriate renderer. HTML uses a single Jinja2 template with inline CSS (Tailwind) and JavaScript for search.