Derive MultiJsonData, a universal data format shared by nvtx/mstx/torch_profile. It consists of json/json.gz files gathered from different directories, where the timeline data contains role/rank/start_time/end_time...
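As an illustration of the idea only, here is a minimal sketch of reading such files into one timeline. It assumes each file holds a JSON list of records with exactly the fields named above; `TimelineEvent` and `load_events` are hypothetical names, not the project's API, and the real MultiJsonData layout may differ.

```python
import gzip
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass
class TimelineEvent:
    # Hypothetical record type; field names follow the description above.
    role: str
    rank: int
    start_time: float
    end_time: float


def _read_json(path: Path):
    # Pick the gzip reader for *.gz files and the plain reader otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def load_events(directories: Iterable) -> List[TimelineEvent]:
    """Collect events from every .json / .json.gz file under the given directories."""
    events: List[TimelineEvent] = []
    for directory in directories:
        for path in sorted(Path(directory).rglob("*")):
            if not path.name.endswith((".json", ".json.gz")):
                continue
            # Assumption: each file is a JSON list of flat records.
            for record in _read_json(path):
                events.append(
                    TimelineEvent(
                        role=record["role"],
                        rank=int(record["rank"]),
                        start_time=float(record["start_time"]),
                        end_time=float(record["end_time"]),
                    )
                )
    return events
```

Calling `load_events(["dir_a", "dir_b"])` would return one flat list covering both directories, which is the property that lets different profiler backends share downstream processing.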
Cross-process data collection capability. Because the config scope of DistProfiler is limited, obtaining configurations for different scenarios currently requires complex parameter passing, which typically results in long control chains and usage restrictions. A global instance for cross-process data sharing would allow more flexible data collection.
Metric
Quantify key metrics to evaluate training efficiency.
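The issue does not name specific metrics. As one hypothetical example, the sketch below computes a busy ratio from (start_time, end_time) intervals: the fraction of wall-clock time covered by at least one recorded event. The function name and the choice of metric are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def busy_ratio(intervals: Iterable[Tuple[float, float]]) -> float:
    """Fraction of the observed wall-clock span covered by at least one interval."""
    # Drop empty or inverted intervals, then sort by start time.
    spans = sorted((s, e) for s, e in intervals if e > s)
    if not spans:
        return 0.0

    busy = 0.0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start <= cur_end:
            # Overlapping or touching: extend the current merged interval.
            cur_end = max(cur_end, end)
        else:
            # Gap found: close the current merged interval and start a new one.
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
    busy += cur_end - cur_start

    wall = max(e for _, e in spans) - spans[0][0]
    return busy / wall
```

Applied per rank, a low ratio would point at idle gaps between phases; whether that is the right measure of training efficiency depends on the workload.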
Core
Data
Offline pipeline
[mstx, torch_profile] feat: Refactor main pipeline to pipeline_runner #16
Online pipeline
CollectorController
Metric