Skip to content

[misc] fix: harden parser validation and stabilize cross-platform test behavior#56

Merged
tardis-key merged 10 commits into
verl-project:mainfrom
FightingZhen:offline_code_optim
May 19, 2026
Merged

[misc] fix: harden parser validation and stabilize cross-platform test behavior#56
tardis-key merged 10 commits into
verl-project:mainfrom
FightingZhen:offline_code_optim

Conversation

@FightingZhen
Copy link
Copy Markdown
Collaborator

@FightingZhen FightingZhen commented May 16, 2026

Summary

This PR improves path handling, parser robustness, and test coverage across data validation and profiling parsers. It also updates the quickstart profiling link to the current VeRL docs path.

Changes

  • rl_insight/data/rules.py: added shared path coercion, accepted Path inputs, and improved MSTX/NVTX validation errors for missing paths, empty files, and JSON parse failures.
  • rl_insight/parser/gmm_parser.py: normalized path parsing for rank/step extraction so GMM parsing works on Windows-style paths too.
  • rl_insight/parser/mstx_parser.py: extracted the legacy timestamp key helper and kept directory sorting stable.
  • rl_insight/parser/parser.py: tracked failed ranks during parallel parsing and added clearer logging when all ranks fail.
  • docs/overview/RL_Timeline_quickstart.md: fixed the NPU profiling tutorial link.
  • tests/data/test_paths.py, tests/data/test_data_checker.py, tests/data/test_rules.py: centralized shared test paths.
  • tests/doc/test_docs_urls.py: made URL checks more resilient by falling back to GET for blocked HEAD responses.
  • tests/parser/test_cluster_analysis.py: added coverage for legacy underscore-based MSTX directory ordering.
  • tests/parser/test_gmm_parser.py: added a cross-platform path parsing test for GMM inputs.

Tests

  • Added/updated unit tests for path coercion, parser path extraction, MSTX ordering, and docs URL validation.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces cross-platform path handling for GMM parsing, improves error reporting for parallel processing, and transitions the GMM heatmap visualization from Matplotlib to Pillow. While the Pillow implementation addresses stability, feedback highlights that it currently lacks essential visualization elements such as axes labels, titles, and colorbars, and uses fixed cell dimensions that may cause memory issues with large traces. Additionally, a suggestion was made to reduce code duplication in the GMM parser by reusing helper methods for step extraction.

Comment thread rl_insight/visualizer/gmm_visualizer.py Outdated
Comment thread rl_insight/visualizer/gmm_visualizer.py Outdated
Comment thread rl_insight/visualizer/gmm_visualizer.py Outdated
Comment thread rl_insight/parser/gmm_parser.py Outdated
@FightingZhen
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances cross-platform path handling, improves error reporting for missing or malformed profiling data, and replaces matplotlib with PIL in the GmmVisualizer to better manage large heatmap outputs. Test infrastructure was also updated with centralized paths and improved URL validation. Reviewer feedback identifies an opportunity to remove redundant exception handling in path coercion and suggests dynamically calculating text buffer sizes in the visualizer to avoid potential clipping issues.

Comment thread rl_insight/data/rules.py Outdated
Comment thread rl_insight/visualizer/gmm_visualizer.py Outdated
@FightingZhen
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances path handling, validation rules, and cross-platform compatibility across the rl_insight package. Key improvements include refactoring validation rules to support both string and Path objects, adding robust error tracking for parallel processing, and standardizing path extraction in the GMM parser. Test infrastructure was also reorganized for better maintainability. Feedback highlights a need to refine timestamp sorting for multi-day logs in the MSTX parser and to use context managers for streamed HTTP requests in documentation tests to avoid potential resource leaks.

Comment thread rl_insight/parser/mstx_parser.py
Comment thread tests/doc/test_docs_urls.py
@FightingZhen
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves path handling robustness and cross-platform compatibility across the codebase. Key changes include the introduction of path coercion and normalization helpers in the validation rules and GMM parser, enhanced error reporting for parallel processing, and more resilient URL validation in tests. Feedback suggests expanding the use of the new path coercion helper to remaining validation rules and adopting more idiomatic pathlib methods, such as using .glob(), .stat().st_size, and .as_posix() for file operations and path formatting.

Comment thread rl_insight/data/rules.py
Comment thread rl_insight/data/rules.py Outdated
Comment thread rl_insight/data/rules.py Outdated
Comment thread rl_insight/parser/gmm_parser.py
@tardis-key
Copy link
Copy Markdown
Collaborator

It seems the tests/doc/test_docs_urls.py failed to detect https://github.com/verl-project/verl/blob/main/docs/ascend_tutorial/profiling/ascend_profiling_zh.rst is invalid.

Copy link
Copy Markdown
Collaborator

@tardis-key tardis-key left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Please pass all checks before merging

@FightingZhen FightingZhen changed the title fix: harden parser validation and stabilize cross-platform test behavior [misc] fix: harden parser validation and stabilize cross-platform test behavior May 18, 2026
@FightingZhen
Copy link
Copy Markdown
Collaborator Author

/genimi review

@FightingZhen
Copy link
Copy Markdown
Collaborator Author

PR is ready for review @tardis-key

@tardis-key tardis-key merged commit 70494be into verl-project:main May 19, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants