Complete Sphinx documentation for the Agent-Tunix GRPO training framework: comprehensive guides covering installation, configuration, training, evaluation, and advanced topics.
- Installation (`docs/getting_started/installation.rst`) - System requirements, CUDA setup, installation methods, troubleshooting
- Quick Start (`docs/getting_started/quick_start.rst`) - First training run, basic configuration, evaluation examples
- Configuration (`docs/getting_started/configuration.rst`) - YAML-based configuration system, Hydra framework, override patterns
- Training (`docs/guide/training.rst`) - GRPO training process, monitoring, memory optimization, distributed training, checkpointing
- Evaluation (`docs/guide/evaluation.rst`) - Model evaluation, metrics, inference strategies, checkpoint selection
- Hyperparameter Tuning (`docs/guide/hyperparameter_tuning.rst`) - Tuning strategies, parameter sweeps, monitoring, best practices
- Experiments (`docs/guide/experiments.rst`) - Creating and using experiment presets for reproducibility
- Overview (`docs/config/overview.rst`) - Configuration system architecture, composition, Hydra concepts
- Model Configuration (`docs/config/model.rst`) - Available models (270M, 1B, 4B), LoRA settings, mesh shapes, memory requirements
- Optimizer Configuration (`docs/config/optimizer.rst`) - AdamW optimizer, learning rate, warmup, weight decay, tuning strategies
- Training Configuration (`docs/config/training.rst`) - Batch sizes, checkpointing, evaluation intervals, seed management
- Training API (`docs/api/train.rst`) - `train()` function, configuration classes, GRPO algorithm details, example usage
- Evaluation API (`docs/api/evaluate.rst`) - `evaluate()` function, inference configs, metrics, checkpoint selection
- Models API (`docs/api/models.rst`) - Model architectures, LoRA configuration, distributed training setup
- Data API (`docs/api/data.rst`) - Data loading, preprocessing, tokenization, custom datasets
- Rewards API (`docs/api/rewards.rst`) - Reward functions, built-in rewards, custom reward design
- Distributed Training (`docs/advanced/distributed_training.rst`) - Multi-GPU setup, FSDP, tensor parallelism, multi-node training
- Custom Rewards (`docs/advanced/custom_rewards.rst`) - Designing custom reward functions, reward shaping, curriculum learning, debugging
- Troubleshooting (`docs/advanced/troubleshooting.rst`) - Common issues and solutions, debugging tips, error diagnosis
- FAQ (`docs/references/faq.rst`) - Common questions covering setup, configuration, training, evaluation, hyperparameter tuning
- Glossary (`docs/references/glossary.rst`) - 100+ terms including GRPO, LoRA, FSDP, tensor parallelism, optimization concepts
Using uv:

```bash
uv pip install -e ".[docs]"
```

Or using pip:

```bash
pip install -e ".[docs]"
```

The documentation dependencies are defined in `pyproject.toml` under `[project.optional-dependencies.docs]`.
```bash
cd docs
make html
```

Output will be in `docs/_build/html/`.

```bash
# Open in browser
open docs/_build/html/index.html

# Or serve locally
cd docs
make serve
# Then visit http://localhost:8000
```

```bash
make clean   # Remove build artifacts
make pdf     # Build PDF (requires LaTeX)
make epub    # Build EPUB ebook
make man     # Build man pages
make text    # Build plain text
```

See `docs/BUILD.md` for complete build instructions.
- 22 documentation pages covering all aspects
- ~15,000+ lines of documentation
- 100+ code examples and command references
- Complete API documentation with practical examples
- Clear hierarchical structure (Getting Started → Guides → Advanced)
- Consistent formatting with reStructuredText
- Cross-references between related topics
- Quick navigation with toctree
- Real command-line examples for all features
- Configuration examples for different scenarios
- Troubleshooting with solutions
- Step-by-step workflows
- Detailed parameter documentation
- Configuration option explanations
- API function references
- Glossary with 100+ ML/training terms
- Hydra configuration framework
- YAML-based configuration structure
- Configuration composition and overrides
- Nested parameter overrides with dot notation
- Experiment presets for reproducibility
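To illustrate how dot-notation overrides update a nested configuration, here is a minimal self-contained sketch using plain dicts (the framework itself composes configs via Hydra/OmegaConf; the keys below are hypothetical examples, not the project's actual parameter names):

```python
def apply_override(config: dict, dotted_key: str, value):
    """Set a nested config value from a dotted path, e.g. 'optimizer.lr'."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # walk/create intermediate nodes
    node[keys[-1]] = value
    return config

# Hypothetical base config; the real defaults live in the project's YAML files.
base = {"optimizer": {"lr": 1e-4, "weight_decay": 0.01},
        "training": {"batch_size": 8}}
apply_override(base, "optimizer.lr", 3e-4)        # mirrors `optimizer.lr=3e-4` on the CLI
apply_override(base, "training.batch_size", 16)   # mirrors `training.batch_size=16`
```

The same dotted paths are what you pass as command-line overrides; Hydra merges them onto the composed YAML defaults in exactly this leaf-replacement fashion.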
- GRPO algorithm explanation
- Training process overview
- LoRA (Low-Rank Adaptation) details
- Gradient clipping and optimization
- Checkpoint management
- Monitoring with Weights & Biases and TensorBoard
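The core GRPO idea covered in the training guide can be sketched in a few lines: rewards for a group of completions sampled from the same prompt are standardized within the group, and the standardized scores serve as advantages. This is a simplified illustration, not the framework's actual implementation:

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its group.

    Each prompt yields a group of sampled completions; a completion's
    advantage is its reward minus the group mean, divided by the group std.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One group of four sampled completions for a single prompt
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the advantages are centered within each group, they sum to zero: above-average completions are reinforced and below-average ones are penalized, with no learned value network required.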
- Evaluation metrics (accuracy, partial accuracy, format accuracy)
- Multiple inference strategies (greedy, standard, liberal)
- Temperature and sampling parameters
- Checkpoint evaluation and comparison
- Multiple-pass evaluation for uncertainty
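Multiple-pass evaluation amounts to repeating a sampling-based evaluation run and reporting the spread of the metric. A minimal sketch of the aggregation step (the accuracy values are made up for illustration; see the evaluation guide for the real workflow):

```python
import statistics

def summarize_passes(accuracies: list[float]) -> dict:
    """Aggregate per-pass accuracies into a mean / std summary."""
    return {
        "mean": statistics.fmean(accuracies),
        "std": statistics.pstdev(accuracies),
        "passes": len(accuracies),
    }

# e.g. accuracy from three independent sampled evaluation passes
summary = summarize_passes([0.62, 0.58, 0.60])
```

Reporting mean ± std across passes makes checkpoint comparisons more trustworthy when sampling temperature is nonzero, since a single pass can over- or under-state accuracy by chance.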
- Learning rate search strategies
- Batch size tuning
- Model capacity adjustment
- LoRA rank tuning
- Parameter sweep patterns
- Grid search and random search
- Data Parallelism (FSDP)
- Tensor Parallelism (TP)
- Hybrid parallelism strategies
- Multi-node setup
- Communication optimization
- Performance monitoring
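To make the interaction of these axes concrete, here is a small sketch of the batch arithmetic on a two-axis (data-parallel × tensor-parallel) device mesh. The mesh shapes and numbers are illustrative, not the framework's defaults:

```python
def per_device_batch(global_batch: int, mesh: tuple[int, int]) -> int:
    """Split a global batch across the data-parallel axis of a (dp, tp) mesh.

    Tensor parallelism shards the model within a TP group, so every device
    in a group sees the same examples; only the data-parallel size divides
    the batch.
    """
    dp, tp = mesh
    if global_batch % dp:
        raise ValueError(f"global batch {global_batch} not divisible by dp={dp}")
    return global_batch // dp

# e.g. 8 GPUs arranged as a (4, 2) mesh: 4-way FSDP x 2-way tensor parallel
shard = per_device_batch(32, (4, 2))
```

The same arithmetic explains a common multi-GPU error: if the global batch size is not divisible by the data-parallel mesh dimension, the run fails before the first step.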
- Custom reward function design
- Reward shaping techniques
- Curriculum learning
- Custom data loaders
- Multi-aspect evaluation
- Reward normalization
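A multi-aspect reward of the kind described above typically combines a small format reward with a larger correctness reward. The sketch below uses a hypothetical signature and tag format for illustration; the interface the framework actually expects is documented in `docs/api/rewards.rst`:

```python
import re

def reward_fn(completion: str, target: str) -> float:
    """Score one completion: format adherence plus answer correctness.

    Hypothetical example: 0.2 for producing well-formed <answer> tags,
    plus 0.8 if the tagged answer matches the target.
    """
    score = 0.0
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match:
        score += 0.2                          # format reward: tags present
        if match.group(1).strip() == target:
            score += 0.8                      # correctness reward
    return score
```

Splitting the reward this way gives the model a learnable gradient toward correct formatting even before it produces correct answers, which is the usual motivation for reward shaping.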
- `docs/index.rst` - Main index
- `docs/getting_started/` - 3 files
- `docs/guide/` - 4 files
- `docs/config/` - 4 files
- `docs/api/` - 5 files
- `docs/advanced/` - 3 files
- `docs/references/` - 2 files

- `docs/conf.py` - Sphinx configuration with autodoc, napoleon, RTD theme
- `docs/Makefile` - Build automation (HTML, PDF, EPUB, etc.)
- `docs/BUILD.md` - Build instructions
- `requirements-docs.txt` - Python dependencies

- `docs/_static/` - Static assets (CSS, JavaScript, images)
- `docs/_templates/` - Custom Sphinx templates
- `docs/_build/` - Generated documentation (created by build)
- autodoc - Auto-extract from Python docstrings
- napoleon - Google-style docstring support
- intersphinx - Cross-reference external docs (Python, NumPy, JAX)
- viewcode - Link to source code from API docs
- sphinx_rtd_theme - Read the Docs theme for professional appearance
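The extension list above corresponds to a `conf.py` block along these lines (a sketch; the project's actual `docs/conf.py` is authoritative, and the intersphinx URLs shown are the standard public inventories):

```python
# docs/conf.py (excerpt) -- sketch of the extension setup described above
extensions = [
    "sphinx.ext.autodoc",      # pull API docs from Python docstrings
    "sphinx.ext.napoleon",     # Google-style docstring support
    "sphinx.ext.intersphinx",  # cross-reference external docs
    "sphinx.ext.viewcode",     # "source" links in API pages
    "sphinx_rtd_theme",
]
html_theme = "sphinx_rtd_theme"
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "jax": ("https://jax.readthedocs.io/en/latest/", None),
}
```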
- Local HTML (`make html` → `_build/html/`)
- Read the Docs (automatic builds on GitHub push)
- Static hosting (GitHub Pages, etc.)
- Read Quick Start → Get first training running
- Read Configuration → Understand parameter system
- Read Training Guide → Learn training process
- Experiment with examples
- Read Configuration Overview → Understand system
- Read specific config guides (Model, Optimizer, Training)
- Review examples for different GPU/scale scenarios
- Use `--cfg job` to preview before training
- Search FAQ for common questions
- Check Troubleshooting guide for specific issues
- Review error messages with solution suggestions
- Check Glossary for terminology
- Read Distributed Training for multi-GPU setup
- Read Custom Rewards for reward function design
- Review API reference for implementation details
- Check examples and patterns section
- Memory-constrained setups (11GB GPU)
- Balanced setups (48GB GPU)
- High-performance setups (80GB GPU)
- Multi-GPU distributed training
- Quick test (10 steps)
- Single GPU training
- Ablation studies
- Production runs
- Latest checkpoint evaluation
- Specific checkpoint evaluation
- Multiple inference strategies
- Batch evaluation workflows
- Learning rate search
- Batch size tuning
- Model size comparison
- Parameter sweeps
- Edit `.rst` files in appropriate directories
- Run `make clean && make html` to rebuild
- Review in `_build/html/` before committing
- Commit changes to documentation files

- Create new `.rst` file in appropriate directory
- Add entry to parent `index.rst` or relevant `toctree`
- Build with `make html`
- Verify in browser
- Keep examples synchronized with actual configuration files
- Test command examples before documenting
- Update when configuration changes
- Total Documentation Pages: 22
- Total Lines: 15,000+
- Configuration Examples: 50+
- Command Examples: 100+
- API Reference Functions: 10+
- Glossary Terms: 100+
- Cross-References: 200+
- Install dependencies: `pip install -r requirements-docs.txt`
- Build HTML: `cd docs && make html`
- View documentation: `open docs/_build/html/index.html`
- Deploy: Push to Read the Docs or static hosting
For documentation issues:
- Check FAQ section in documentation
- Search Troubleshooting guide
- Review Glossary for terminology
- Check GitHub issues for known problems
Documentation is included with Agent-Tunix project license.