## Online Monitor

### RL-Insight Online Monitoring System https://github.com/verl-project/rl-insight/issues/46

- [ ] Experimental feature @mengchengtang https://github.com/verl-project/rl-insight/pull/53
  - [ ] fix: distinguish inference metrics by rank
  - [ ] Develop guidance
  - [ ] High-concurrency testing
- [ ] Application in verl
  - [ ] Basic migration & key function records
    - [ ] Rollout trace records https://github.com/verl-project/rl-insight/issues/45
    - [ ] Time consumption and performance records on ckpt engine & tq https://github.com/verl-project/rl-insight/issues/44
    - [ ] Hardware metric monitor
    - [ ] Valuable training metrics in verl
  - [ ] Gateway adaptation
- [ ] Support for different collector backends
  - [ ] BaseCollector abstraction
  - [ ] Support for other high-concurrency backends @fightingzhen
- [ ] Support for different service-oriented visualization platforms
  - [ ] mlflow

## Offline Feature

### Expert Load Visualization Based on Dump/Profiling Data https://github.com/verl-project/rl-insight/issues/25

### [RFC] Memory Analysis Feature https://github.com/verl-project/rl-insight/issues/42

- [ ] Parse memory trace data collected by MSTX/torch
  - [ ] Add memory data checker
  - [x] Add memory parser https://github.com/verl-project/rl-insight/pull/54
  - [ ] Refactor memory parser and MSTX parser
- [ ] Support memory visualization (generate an HTML file)
  - [ ] Support plotting memory histograms (line charts)
  - [ ] Support plotting memory Gantt charts
  - [ ] Support displaying memory detail information