Implement Phase 8: Error Handling and User Messaging#23
Merged
Conversation
Create clear interfaces that define contracts for benchmarks, validators, and collectors. This establishes the foundation for the code improvement plan. New files: - mlpstorage/interfaces/__init__.py - Package exports - mlpstorage/interfaces/benchmark.py - BenchmarkInterface, BenchmarkConfig, BenchmarkCommand - mlpstorage/interfaces/validator.py - ValidatorInterface, ValidationResult, ClosedRequirements - mlpstorage/interfaces/collector.py - ClusterCollectorInterface, LocalCollectorInterface Modified: - mlpstorage/benchmarks/base.py - Implement BenchmarkInterface - Add dependency injection for cluster_collector and validator - Add default implementations for interface methods - Improve documentation with docstrings Benefits: - Testability: Components can be mocked/stubbed for unit testing - Extensibility: New implementations without modifying core code - Consistency: All implementations follow the same contract - Documentation: Interfaces clearly document expected behavior
Split the monolithic rules.py (1,434 LOC) into focused modules with clear separation between single-run rule checkers and multi-run submission checkers. New package structure: mlpstorage/rules/ ├── __init__.py # Public API exports (backwards compatible) ├── base.py # RulesChecker ABC, RuleState enum ├── issues.py # Issue dataclass ├── models.py # Data classes (HostInfo, ClusterInformation, BenchmarkRun, etc.) ├── run_checkers/ # Single-run validation checkers │ ├── __init__.py │ ├── base.py # RunRulesChecker base class │ ├── training.py # TrainingRunRulesChecker │ └── checkpointing.py # CheckpointingRunRulesChecker ├── submission_checkers/ # Multi-run submission validation checkers │ ├── __init__.py │ ├── base.py # MultiRunRulesChecker base class │ ├── training.py # TrainingSubmissionRulesChecker │ └── checkpointing.py # CheckpointSubmissionRulesChecker ├── verifier.py # BenchmarkVerifier orchestrator └── utils.py # calculate_training_data_size, generate_output_location Benefits: - Clear separation of concerns (single-run vs multi-run validation) - Each module is <300 LOC (down from 1,434 LOC monolith) - Better testability with focused, isolated checkers - Backwards compatible - existing imports continue to work - Original rules.py preserved as rules_legacy.py for reference
This commit refactors the CLI architecture into a modular structure: - Add BenchmarkRegistry class (mlpstorage/registry.py) for dynamic benchmark registration and CLI construction - Create mlpstorage/cli/ package with modular argument builders: - common_args.py: Shared help messages and universal arguments - training_args.py: Training benchmark arguments - checkpointing_args.py: Checkpointing benchmark arguments - vectordb_args.py: VectorDB benchmark arguments - utility_args.py: Reports and history arguments - Rename cli.py to cli_parser.py to avoid package name conflict - Update benchmarks/__init__.py to register benchmarks at import time - Update imports in main.py, history.py, and tests The registry pattern enables: - Dynamic benchmark registration without modifying core CLI code - Easier extensibility for new benchmark types - Cleaner separation of concerns
Add test fixtures package for improved testability: - tests/fixtures/__init__.py: Package exports - tests/fixtures/mock_logger.py: MockLogger class with message capture and assertion helpers for testing logging behavior - tests/fixtures/mock_executor.py: MockCommandExecutor for testing command execution without subprocess calls - tests/fixtures/mock_collector.py: MockClusterCollector implementing ClusterCollectorInterface for testing without MPI - tests/fixtures/sample_data.py: Sample /proc file contents and factory functions for creating test data Update conftest.py with new fixtures: - mock_executor, mock_executor_with_dlio, mock_executor_failure - mock_collector, mock_collector_multi_host, mock_collector_failure - test_logger using MockLogger Add integration test templates: - tests/integration/test_benchmark_flow.py: Tests for training and checkpointing benchmark flow with mocked dependencies
Add comprehensive type annotations and documentation: Type Annotations: - mlpstorage/utils.py: Full type annotations for all functions including is_valid_datetime_format, read_config_from_file, update_nested_dict, create_nested_dict, flatten_nested_dict, remove_nan_values, generate_mpi_prefix_cmd - mlpstorage/benchmarks/base.py: Enhanced type hints for __init__, _execute_command, metadata property, and all methods Module Documentation: - Added comprehensive module-level docstrings to utils.py and base.py explaining module purpose, classes, and usage Architecture Documentation: - docs/ARCHITECTURE.md: Complete system architecture documentation covering directory structure, core components, data flow, design patterns, and extension points Developer Guide: - docs/ADDING_BENCHMARKS.md: Step-by-step guide for adding new benchmark types including code examples for benchmark class, CLI arguments, validation rules, and testing
Add KV Cache benchmark for LLM inference storage testing: - Create KVCacheBenchmark class wrapping kv-cache.py script - Add KVCACHE_MODELS, KVCACHE_PERFORMANCE_PROFILES, and other config constants - Create kvcache_args.py with CLI argument builders for run/datasize commands - Add KVCacheRunRulesChecker for benchmark validation - Register kvcache benchmark with BenchmarkRegistry - Update ARCHITECTURE.md to document new benchmark The KV Cache benchmark tests storage performance for: - Multi-tier caching (GPU → CPU → NVMe) - Multi-user LLM inference simulation - Phase-aware processing (prefill/decode phases)
Refactor the reporting system to provide clear validation feedback:
- Create mlpstorage/reporting/ package with:
- directory_validator.py: Validates results directory structure
before processing, with clear error messages and suggestions
- formatters.py: ValidationMessageFormatter with colored output,
ClosedRequirementsFormatter for requirement checklists,
ReportSummaryFormatter for summary headers
- Refactor ReportGenerator class:
- Add directory structure validation before processing
- Add error isolation for individual run processing
- Improve print_results() with clear OPEN vs CLOSED messaging
- Show INVALID runs first (most critical), then OPEN, then CLOSED
- Add requirements checklists for non-CLOSED submissions
- Update ARCHITECTURE.md with new reporting/ package structure
This improves user experience by providing actionable feedback
when validation fails or directory structure is malformed.
Add comprehensive error handling framework with user-friendly messages:
- Create mlpstorage/errors.py:
- ErrorCode enum for machine-readable error codes
- MLPSError dataclass for structured error information
- MLPStorageException base class with code, message, suggestion
- ConfigurationError, BenchmarkExecutionError, ValidationError,
FileSystemError, MPIError, DependencyError specialized exceptions
- Create mlpstorage/error_messages.py:
- Centralized error message templates with placeholders
- format_error() function for template substitution
- ErrorFormatter class with color support
- Create mlpstorage/validation_helpers.py:
- validate_pre_run() for pre-flight configuration checks
- _validate_required_params() for parameter validation
- _validate_paths() for filesystem checks
- _validate_hosts() for MPI host reachability
- _validate_dependencies() for tool availability
- check_disk_space() utility function
- Update mlpstorage/main.py:
- Add KVCacheBenchmark to program switch dict
- Wrap run_benchmark() with proper exception handling
- Add main() wrapper with catch-all exception handling
- Show suggestions for each error type
- Show traceback only in debug mode
- Update ARCHITECTURE.md with new error handling modules
wvaske
added a commit
that referenced
this pull request
Feb 26, 2026
wvaske
added a commit
that referenced
this pull request
Feb 26, 2026
wvaske
added a commit
that referenced
this pull request
Mar 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.