Evaluate the Parquet reader path for the FLUX workload, after the optimizations tracked in #356 are in place, to determine whether Parquet can remain the data format for FLUX while still meeting the per-accelerator target throughput.
Dependency: This evaluation can only be carried out after #356 is implemented, since the persistent file handle and preserved row group cache are prerequisites for a meaningful Parquet performance assessment in FLUX.
Motivation
FLUX currently uses Parquet for sample storage. Before considering alternative on-disk formats, we need a clear, quantitative answer to a single question: can the Parquet reader path — once optimized as proposed in #356 (persistent file handle and preserved row group cache) — sustain the FLUX throughput target on representative storage backends?
A clean evaluation here will either confirm Parquet as the long-term format for FLUX or provide the evidence needed to motivate exploring alternatives.
Proposed methodology
Define a baseline FLUX dataset and a fixed accelerator/host configuration.
Run FLUX on the current Parquet reader as a baseline.
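The baseline run above could be scored with a small harness along these lines. This is only a sketch: `measure_throughput`, `meets_target`, and the numbers in the usage note are hypothetical names and values chosen for illustration, not part of FLUX or of #356. It assumes the reader can be consumed as an iterable of batches that support `len()`.

```python
import time
from typing import Callable, Iterable, Sized


def measure_throughput(
    batches: Iterable[Sized],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Consume every batch and return the observed samples per second.

    `batches` stands in for the data loader under test (an assumption);
    `clock` is injectable so the measurement can be checked deterministically.
    """
    start = clock()
    total_samples = 0
    for batch in batches:
        total_samples += len(batch)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_samples / elapsed


def meets_target(
    samples_per_sec: float,
    num_accelerators: int,
    target_per_accelerator: float,
) -> bool:
    """Compare per-accelerator throughput against the (hypothetical) target."""
    if num_accelerators <= 0:
        raise ValueError("num_accelerators must be positive")
    return samples_per_sec / num_accelerators >= target_per_accelerator
```

For example, `meets_target(measure_throughput(loader), num_accelerators=8, target_per_accelerator=100.0)` would report whether an eight-accelerator host sustains a placeholder target of 100 samples per second per accelerator. The actual target and host configuration are the ones fixed in the first methodology step.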
Success criteria
Related