Checkpointing Benchmark: Invalid Submission Due to Incorrect Operation Count #365

## Overview

When the MLPStorage checkpointing benchmark is run as two separate phases (write-only, then read-only), the report generator counts 20 write and 20 read operations instead of the expected 10 each, and marks the submission `INVALID`.

## Test Script

```bash
export TEST_TIME=05082006

# Phase 1: Write 10 checkpoints (read disabled)
~/storage/mlpstorage checkpointing run --model llama3-8b \
  --closed \
  --hosts localhost \
  --num-processes 8 \
  --num-checkpoints-read 0 \
  --checkpoint-folder /mnt/alluxio/ckpts/8acc-${TEST_TIME} \
  --results-dir ~/mlps_results \
  --client-host-memory-in-gb 768 --file

# Clear page cache between phases
sudo sync
sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"

sleep 10

# Phase 2: Read 10 checkpoints (write disabled)
~/storage/mlpstorage checkpointing run --model llama3-8b \
  --closed \
  --hosts localhost \
  --num-processes 8 \
  --num-checkpoints-write 0 \
  --checkpoint-folder /mnt/alluxio/ckpts/8acc-${TEST_TIME} \
  --results-dir ~/mlps_results \
  --client-host-memory-in-gb 768 --file
```

### Intent

- Run **10 checkpoint writes** in the first phase (with `--num-checkpoints-read 0` to skip reads).
- Clear the OS page cache to ensure cold-cache read performance.
- Run **10 checkpoint reads** in the second phase (with `--num-checkpoints-write 0` to skip writes).

## Results Directory Structure

The two runs produce two separate result directories:

```
.
└── checkpointing
    └── llama3-8b
        ├── 20260508_103731    # Phase 1 (write)
        │   ├── {0..7}_output.json
        │   ├── {0..7}_per_epoch_stats.json
        │   ├── checkpointing_20260508_103731_metadata.json
        │   ├── checkpointing_20260508_103731_timeseries.json
        │   ├── checkpointing_run.stderr.log
        │   ├── checkpointing_run.stdout.log
        │   ├── dlio.log
        │   ├── dlio_config/
        │   └── summary.json
        └── 20260508_104334    # Phase 2 (read)
            ├── {0..7}_output.json
            ├── {0..7}_per_epoch_stats.json
            ├── checkpointing_20260508_104334_metadata.json
            ├── checkpointing_20260508_104334_timeseries.json
            ├── checkpointing_run.stderr.log
            ├── checkpointing_run.stdout.log
            ├── dlio.log
            ├── dlio_config/
            └── summary.json
```

## Report Generation & Error

### Command

```bash
~/storage/mlpstorage reports reportgen --file --results-dir ./mlps_results_test
```

### Output

```
2026-05-08 12:56:45|INFO: Directory validation passed: found 2 runs in 1 benchmark types
2026-05-08 12:56:45|INFO: Created benchmark run: checkpointing_run_llama3-8b_20260508_103731
2026-05-08 12:56:45|INFO: Created benchmark run: checkpointing_run_llama3-8b_20260508_104334
2026-05-08 12:56:45|INFO: Accumulating results from 2 runs
...
2026-05-08 12:56:45|ERROR: INVALID: [INVALID] Expected 10 total read operations, but found 20
2026-05-08 12:56:45|ERROR: INVALID: [INVALID] Expected 10 total write operations, but found 20
```

### Validation Report (Excerpt)

```
[INVALID] Checkpointing - llama3-8b
    Issues:
      [INVALID] checkpoint.num_checkpoints_read: Expected 10 total read operations, but found 20
      [INVALID] checkpoint.num_checkpoints_write: Expected 10 total write operations, but found 20

Checkpointing Benchmark CLOSED Requirements
===========================================
Requirements:
  [ ] 10 checkpoint write operations total
  [ ] 10 checkpoint read operations total
  [ ] Valid LLM model (llama3-8b, llama3-70b, llama3-405b)
  [ ] Only allowed parameter overrides used
```

## Problem Analysis

The benchmark expects a combined total of **10 writes** and **10 reads** across all runs. However, setting `--num-checkpoints-read 0` or `--num-checkpoints-write 0` does **not** appear to record 0 operations for the disabled operation. Instead, each run still seems to report the default of 10 for both reads and writes.

Since there are 2 runs, the report generator sums them up:

| Operation | Run 1 (Write Phase) | Run 2 (Read Phase) | Total | Expected |
|-----------|--------------------|--------------------|-------|----------|
| Writes    | 10                 | 10                 | 20    | 10       |
| Reads     | 10                 | 10                 | 20    | 10       |

**Suspected root cause:** The `--num-checkpoints-read 0` and `--num-checkpoints-write 0` flags appear to be ignored, or at least not reflected in the recorded metadata. Each run therefore contributes 10 reads and 10 writes to the aggregate, which doubles both totals to 20.
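
One way to check this hypothesis is to look at what each run directory actually recorded. The sketch below searches each run's metadata file and `dlio_config/` for the `num_checkpoints_read` and `num_checkpoints_write` keys named in the validation report. It assumes those key names appear verbatim in the files, which has not been confirmed. `show_checkpoint_counts` is a hypothetical helper, not part of `mlpstorage`.

```bash
# Hypothetical helper: print every line that mentions the checkpoint-count
# keys in each run directory under a results root.
# Assumption: the keys from the validation report (num_checkpoints_read /
# num_checkpoints_write) appear verbatim in the metadata JSON or DLIO config.
show_checkpoint_counts() {
  local results_root="$1"
  local run_dir
  for run_dir in "$results_root"/checkpointing/*/*/; do
    # Skip the unexpanded glob pattern when no run directories exist.
    [ -d "$run_dir" ] || continue
    echo "== ${run_dir}"
    # -r: recurse into dlio_config/, -H: print file names, -E: extended regex
    grep -rHE 'num_checkpoints_(read|write)' \
      "$run_dir"checkpointing_*_metadata.json "$run_dir"dlio_config 2>/dev/null \
      || echo "   (no matching keys found)"
  done
}

# Example invocation (path taken from the test script above):
#   show_checkpoint_counts ~/mlps_results
```

If both run directories show non-zero values for both keys, that would support the explanation that the `0` overrides are not reaching the recorded metadata.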

## Expected Behavior

When `--num-checkpoints-read 0` is set, the run should record 0 read operations (and vice versa for writes). The correct totals should be:

| Operation | Run 1 (Write Phase) | Run 2 (Read Phase) | Total | Expected |
|-----------|--------------------|--------------------|-------|----------|
| Writes    | 10                 | 0                  | 10    | 10 ✅    |
| Reads     | 0                  | 10                 | 10    | 10 ✅    |
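
The arithmetic in the two tables can be sketched as a small check. This mirrors only the sums shown above, not the report generator's actual code. `check_total` is a hypothetical helper, and the per-run counts are entered by hand from the tables.

```bash
# Hypothetical helper: sum per-run operation counts and compare the total
# with the expected number of operations.
# Usage: check_total <label> <expected> <count_run1> [<count_run2> ...]
check_total() {
  local label="$1" expected="$2"
  shift 2
  local total=0 count
  for count in "$@"; do
    total=$((total + count))
  done
  if [ "$total" -eq "$expected" ]; then
    echo "${label}: total=${total} expected=${expected} OK"
  else
    echo "${label}: total=${total} expected=${expected} INVALID"
    return 1
  fi
}

# Expected behavior: the disabled operation contributes 0.
check_total writes 10 10 0           # writes: total=10 expected=10 OK
check_total reads  10 0 10           # reads: total=10 expected=10 OK

# Observed behavior: both runs report 10 for both operations.
check_total writes 10 10 10 || true  # writes: total=20 expected=10 INVALID
check_total reads  10 10 10 || true  # reads: total=20 expected=10 INVALID
```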
