
Implement Phase 8: Error Handling and User Messaging#23

Merged
wvaske merged 8 commits into main from claude/code-improvement-UGEAO on Jan 15, 2026
Conversation

@wvaske (Owner) commented Jan 15, 2026

No description provided.

Create clear interfaces that define contracts for benchmarks, validators,
and collectors. This establishes the foundation for the code improvement plan.

New files:
- mlpstorage/interfaces/__init__.py - Package exports
- mlpstorage/interfaces/benchmark.py - BenchmarkInterface, BenchmarkConfig, BenchmarkCommand
- mlpstorage/interfaces/validator.py - ValidatorInterface, ValidationResult, ClosedRequirements
- mlpstorage/interfaces/collector.py - ClusterCollectorInterface, LocalCollectorInterface

Modified:
- mlpstorage/benchmarks/base.py
  - Implement BenchmarkInterface
  - Add dependency injection for cluster_collector and validator
  - Add default implementations for interface methods
  - Improve documentation with docstrings

Benefits:
- Testability: Components can be mocked/stubbed for unit testing
- Extensibility: New implementations without modifying core code
- Consistency: All implementations follow the same contract
- Documentation: Interfaces clearly document expected behavior
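A contract like the one described above might look roughly like this minimal sketch; the method names here are illustrative and not taken from the actual mlpstorage/interfaces/benchmark.py:

```python
from abc import ABC, abstractmethod


class BenchmarkInterface(ABC):
    """Sketch of a contract every benchmark implementation must satisfy.

    Hypothetical method names for illustration only; the real interface
    lives in mlpstorage/interfaces/benchmark.py.
    """

    @abstractmethod
    def generate_command(self) -> str:
        """Build the shell command that runs the benchmark."""

    @abstractmethod
    def run(self) -> int:
        """Execute the benchmark and return its exit code."""


class EchoBenchmark(BenchmarkInterface):
    """Trivial concrete implementation used only to demonstrate the contract."""

    def generate_command(self) -> str:
        return "echo hello"

    def run(self) -> int:
        # A real implementation would hand this off to a command executor.
        print(self.generate_command())
        return 0
```

Because the base class is an ABC, forgetting to implement a required method fails loudly at instantiation time rather than at call time, which is what makes mocked implementations safe to substitute in tests.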

Split the monolithic rules.py (1,434 LOC) into focused modules with clear
separation between single-run rule checkers and multi-run submission checkers.

New package structure:
mlpstorage/rules/
├── __init__.py              # Public API exports (backwards compatible)
├── base.py                  # RulesChecker ABC, RuleState enum
├── issues.py                # Issue dataclass
├── models.py                # Data classes (HostInfo, ClusterInformation, BenchmarkRun, etc.)
├── run_checkers/            # Single-run validation checkers
│   ├── __init__.py
│   ├── base.py              # RunRulesChecker base class
│   ├── training.py          # TrainingRunRulesChecker
│   └── checkpointing.py     # CheckpointingRunRulesChecker
├── submission_checkers/     # Multi-run submission validation checkers
│   ├── __init__.py
│   ├── base.py              # MultiRunRulesChecker base class
│   ├── training.py          # TrainingSubmissionRulesChecker
│   └── checkpointing.py     # CheckpointSubmissionRulesChecker
├── verifier.py              # BenchmarkVerifier orchestrator
└── utils.py                 # calculate_training_data_size, generate_output_location

Benefits:
- Clear separation of concerns (single-run vs multi-run validation)
- Each module is <300 LOC (down from 1,434 LOC monolith)
- Better testability with focused, isolated checkers
- Backwards compatible - existing imports continue to work
- Original rules.py preserved as rules_legacy.py for reference
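The shared pieces at the top of the package (the RuleState enum and the Issue dataclass) might be shaped roughly like this sketch; all field and state names here are assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RuleState(Enum):
    """Hypothetical states; the real enum lives in mlpstorage/rules/base.py."""
    CLOSED = "closed"    # meets all CLOSED-division requirements
    OPEN = "open"        # valid, but only for the OPEN division
    INVALID = "invalid"  # run cannot be submitted at all


@dataclass
class Issue:
    """Hypothetical fields for the Issue dataclass in mlpstorage/rules/issues.py."""
    message: str
    severity: RuleState
    parameter: Optional[str] = None


def worst_state(issues: List[Issue]) -> RuleState:
    """Collapse a list of issues to the most severe state found."""
    order = [RuleState.CLOSED, RuleState.OPEN, RuleState.INVALID]
    severities = [issue.severity for issue in issues]
    if not severities:
        return RuleState.CLOSED
    return max(severities, key=order.index)
```

Keeping these small shared types in their own modules is what lets both run_checkers/ and submission_checkers/ depend on them without importing each other.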

This commit refactors the CLI architecture into a modular structure:

- Add BenchmarkRegistry class (mlpstorage/registry.py) for dynamic
  benchmark registration and CLI construction
- Create mlpstorage/cli/ package with modular argument builders:
  - common_args.py: Shared help messages and universal arguments
  - training_args.py: Training benchmark arguments
  - checkpointing_args.py: Checkpointing benchmark arguments
  - vectordb_args.py: VectorDB benchmark arguments
  - utility_args.py: Reports and history arguments
- Rename cli.py to cli_parser.py to avoid package name conflict
- Update benchmarks/__init__.py to register benchmarks at import time
- Update imports in main.py, history.py, and tests

The registry pattern enables:
- Dynamic benchmark registration without modifying core CLI code
- Easier extensibility for new benchmark types
- Cleaner separation of concerns
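The registry pattern described above can be sketched as a class-level mapping plus a registration decorator; this is a minimal illustration, not the actual mlpstorage/registry.py implementation:

```python
from typing import Callable, Dict, Type


class BenchmarkRegistry:
    """Minimal registry sketch: benchmarks self-register by name."""

    _benchmarks: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type], Type]:
        """Decorator that records a benchmark class under `name`."""
        def decorator(benchmark_cls: Type) -> Type:
            cls._benchmarks[name] = benchmark_cls
            return benchmark_cls
        return decorator

    @classmethod
    def get(cls, name: str) -> Type:
        return cls._benchmarks[name]

    @classmethod
    def names(cls):
        return sorted(cls._benchmarks)


# Registering at import time, as benchmarks/__init__.py does in this PR:
@BenchmarkRegistry.register("training")
class TrainingBenchmark:
    pass
```

With this shape, adding a new benchmark type is a new module plus one decorator line, and the CLI can build its subcommands by iterating `BenchmarkRegistry.names()` instead of hard-coding each benchmark.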

Add test fixtures package for improved testability:

- tests/fixtures/__init__.py: Package exports
- tests/fixtures/mock_logger.py: MockLogger class with message capture
  and assertion helpers for testing logging behavior
- tests/fixtures/mock_executor.py: MockCommandExecutor for testing
  command execution without subprocess calls
- tests/fixtures/mock_collector.py: MockClusterCollector implementing
  ClusterCollectorInterface for testing without MPI
- tests/fixtures/sample_data.py: Sample /proc file contents and
  factory functions for creating test data

Update conftest.py with new fixtures:
- mock_executor, mock_executor_with_dlio, mock_executor_failure
- mock_collector, mock_collector_multi_host, mock_collector_failure
- test_logger using MockLogger

Add integration test templates:
- tests/integration/test_benchmark_flow.py: Tests for training and
  checkpointing benchmark flow with mocked dependencies
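A mock executor of the kind described above might have roughly this shape; the constructor arguments and return tuple are assumptions chosen to mimic a typical subprocess wrapper, not the real tests/fixtures/mock_executor.py:

```python
from typing import List, Tuple


class MockCommandExecutor:
    """Records commands instead of spawning subprocesses (hypothetical shape)."""

    def __init__(self, returncode: int = 0, stdout: str = "", stderr: str = ""):
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr
        self.calls: List[str] = []  # every command "executed", in order

    def execute(self, command: str) -> Tuple[str, str, int]:
        """Pretend to run `command`; return canned (stdout, stderr, returncode)."""
        self.calls.append(command)
        return self.stdout, self.stderr, self.returncode


# Usage in a test: inject the mock, then assert on what the benchmark ran.
executor = MockCommandExecutor(returncode=0, stdout="ok")
out, err, rc = executor.execute("mpirun -np 2 hostname")
```

The key design point is that the fixture captures the exact command string, so tests can assert on the generated command line without ever touching MPI or the shell.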

Add comprehensive type annotations and documentation:

Type Annotations:
- mlpstorage/utils.py: Full type annotations for all functions
  including is_valid_datetime_format, read_config_from_file,
  update_nested_dict, create_nested_dict, flatten_nested_dict,
  remove_nan_values, generate_mpi_prefix_cmd
- mlpstorage/benchmarks/base.py: Enhanced type hints for
  __init__, _execute_command, metadata property, and all methods

Module Documentation:
- Added comprehensive module-level docstrings to utils.py and base.py
  explaining module purpose, classes, and usage

Architecture Documentation:
- docs/ARCHITECTURE.md: Complete system architecture documentation
  covering directory structure, core components, data flow,
  design patterns, and extension points

Developer Guide:
- docs/ADDING_BENCHMARKS.md: Step-by-step guide for adding new
  benchmark types including code examples for benchmark class,
  CLI arguments, validation rules, and testing
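As an example of the annotation style, a fully typed `update_nested_dict` might look like this; the implementation below is a plausible sketch consistent with the function name, not a copy of mlpstorage/utils.py:

```python
from typing import Any, Dict


def update_nested_dict(original: Dict[str, Any],
                       updates: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `updates` into a copy of `original`.

    Nested dicts are merged key by key; any other value in `updates`
    replaces the corresponding value in `original`. The input dicts
    are not mutated.
    """
    result = dict(original)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = update_nested_dict(result[key], value)
        else:
            result[key] = value
    return result
```

Annotating utility functions like this pays off doubly: type checkers catch misuse at call sites, and the signature itself documents that the function returns a new dict rather than mutating its arguments.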

Add KV Cache benchmark for LLM inference storage testing:

- Create KVCacheBenchmark class wrapping kv-cache.py script
- Add KVCACHE_MODELS, KVCACHE_PERFORMANCE_PROFILES, and other config constants
- Create kvcache_args.py with CLI argument builders for run/datasize commands
- Add KVCacheRunRulesChecker for benchmark validation
- Register kvcache benchmark with BenchmarkRegistry
- Update ARCHITECTURE.md to document new benchmark

The KV Cache benchmark tests storage performance for:
- Multi-tier caching (GPU → CPU → NVMe)
- Multi-user LLM inference simulation
- Phase-aware processing (prefill/decode phases)
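The multi-tier lookup order (GPU → CPU → NVMe) can be illustrated with a toy cache; this is purely a sketch of the access pattern being benchmarked, not the kv-cache.py implementation:

```python
from typing import Any, Optional, Tuple


class TieredKVCache:
    """Toy illustration of GPU -> CPU -> NVMe tiered lookup order."""

    TIER_ORDER = ("gpu", "cpu", "nvme")  # fastest to slowest

    def __init__(self):
        self.tiers = {tier: {} for tier in self.TIER_ORDER}

    def get(self, key: str) -> Tuple[Optional[str], Any]:
        """Search tiers fastest-first; return (tier_hit, value) or (None, None)."""
        for tier in self.TIER_ORDER:
            if key in self.tiers[tier]:
                return tier, self.tiers[tier][key]
        return None, None

    def put(self, key: str, value: Any, tier: str = "gpu") -> None:
        """Place a KV-cache entry directly into the given tier."""
        self.tiers[tier][key] = value
```

In the real benchmark, the interesting storage load comes from the NVMe tier: entries evicted from GPU and CPU memory must be re-read from disk during the decode phase, which is exactly the access pattern the benchmark measures.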

Refactor the reporting system to provide clear validation feedback:

- Create mlpstorage/reporting/ package with:
  - directory_validator.py: Validates results directory structure
    before processing, with clear error messages and suggestions
  - formatters.py: ValidationMessageFormatter with colored output,
    ClosedRequirementsFormatter for requirement checklists,
    ReportSummaryFormatter for summary headers

- Refactor ReportGenerator class:
  - Add directory structure validation before processing
  - Add error isolation for individual run processing
  - Improve print_results() with clear OPEN vs CLOSED messaging
  - Show INVALID runs first (most critical), then OPEN, then CLOSED
  - Add requirements checklists for non-CLOSED submissions

- Update ARCHITECTURE.md with new reporting/ package structure

This improves the user experience by providing actionable feedback
when validation fails or the directory structure is malformed.
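A formatter with optional color support might be shaped like this; the status names match the OPEN/CLOSED/INVALID messaging above, but the class layout is a hypothetical sketch rather than the real mlpstorage/reporting/formatters.py:

```python
class ValidationMessageFormatter:
    """Sketch of a status formatter with optional ANSI color output."""

    # Red for INVALID (most critical), yellow for OPEN, green for CLOSED.
    COLORS = {"INVALID": "\033[31m", "OPEN": "\033[33m", "CLOSED": "\033[32m"}
    RESET = "\033[0m"

    def __init__(self, use_color: bool = True):
        # Callers would typically pass use_color=sys.stdout.isatty()
        # so piped output stays free of escape codes.
        self.use_color = use_color

    def format(self, status: str, message: str) -> str:
        if self.use_color and status in self.COLORS:
            return f"{self.COLORS[status]}[{status}]{self.RESET} {message}"
        return f"[{status}] {message}"
```

Gating the escape codes on a flag (rather than always emitting them) keeps report files and CI logs readable while still giving terminals the INVALID-first visual emphasis described above.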

Add comprehensive error handling framework with user-friendly messages:

- Create mlpstorage/errors.py:
  - ErrorCode enum for machine-readable error codes
  - MLPSError dataclass for structured error information
  - MLPStorageException base class with code, message, suggestion
  - ConfigurationError, BenchmarkExecutionError, ValidationError,
    FileSystemError, MPIError, DependencyError specialized exceptions

- Create mlpstorage/error_messages.py:
  - Centralized error message templates with placeholders
  - format_error() function for template substitution
  - ErrorFormatter class with color support

- Create mlpstorage/validation_helpers.py:
  - validate_pre_run() for pre-flight configuration checks
  - _validate_required_params() for parameter validation
  - _validate_paths() for filesystem checks
  - _validate_hosts() for MPI host reachability
  - _validate_dependencies() for tool availability
  - check_disk_space() utility function

- Update mlpstorage/main.py:
  - Add KVCacheBenchmark to program switch dict
  - Wrap run_benchmark() with proper exception handling
  - Add main() wrapper with catch-all exception handling
  - Show suggestions for each error type
  - Show traceback only in debug mode

- Update ARCHITECTURE.md with new error handling modules
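Putting the pieces together, the exception hierarchy and the `main()` wrapper might interact roughly like this; the specific error codes and message strings below are invented for illustration, not copied from mlpstorage/errors.py:

```python
from enum import Enum
from typing import Optional


class ErrorCode(Enum):
    """Hypothetical machine-readable codes; real enum in mlpstorage/errors.py."""
    CONFIG_INVALID = "MLPS-001"
    DEPENDENCY_MISSING = "MLPS-002"


class MLPStorageException(Exception):
    """Base exception carrying a code, a message, and an optional suggestion."""

    def __init__(self, code: ErrorCode, message: str,
                 suggestion: Optional[str] = None):
        super().__init__(message)
        self.code = code
        self.suggestion = suggestion


class ConfigurationError(MLPStorageException):
    pass


def main() -> int:
    """Sketch of the catch-all wrapper: print code, message, and suggestion."""
    try:
        # Stand-in for run_benchmark(); a real pre-flight check would
        # raise this from validate_pre_run().
        raise ConfigurationError(
            ErrorCode.CONFIG_INVALID,
            "bad --hosts value",
            suggestion="Pass a comma-separated list of reachable hosts",
        )
    except MLPStorageException as exc:
        print(f"[{exc.code.value}] {exc}")
        if exc.suggestion:
            print(f"Suggestion: {exc.suggestion}")
        return 1
```

The point of the structured hierarchy is that the top-level handler never needs to know which phase failed: every subclass carries its own code and suggestion, so one `except MLPStorageException` block can render actionable output for all of them.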
wvaske merged commit 81fc7f4 into main Jan 15, 2026
1 of 5 checks passed
