Bench Shuffle Pinned Memory Debugging #1010

When running the current bench_shuffle benchmarks on an NVL72, I'm seeing an out-of-memory error when pinned memory is enabled (this is the default). The arguments used:

>APP_ARGS="-C ucxx -w 3 -r 10 -g -s -x -l $((40 * 1024)) -p ${NUM_PARTITIONS} -o 4 -c ${NUM_COLUMNS} -n ${NUM_ROWS} -x -m async"

```
  -c ucxx (communicator)
  -r 10 (number of runs)
  -w 3 (number of warmup runs)
  -c 4 (number of columns)
  -n 67108864 (number of rows per rank)
  -p 50 (number of input partitions per rank)
  -o 4 (number of output partitions per rank)
  -m async (RMM memory resource)
  -l 40960 (device memory limit in MiB)
  -s (enable output discard to simulate streaming)
  -x (enable memory profiling)
  -g (use pre-partitioned input tables)
Local size: 50 GiB

terminate called after throwing an instance of 'cuda::__4::cuda_error'
  what():  /opt/conda/envs/rapidsmpf/include/rapids/cuda/__driver/driver_api.h:152 out of memory(2): Failed to allocate memory from a memory pool
[presto-gb200-gcn-07:1222606] *** Process received signal ***
[presto-gb200-gcn-07:1222606] Signal: Aborted (6)
```
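As a sanity check on the numbers in the output above, the reported `Local size: 50 GiB` can be reproduced with a little arithmetic. This is only a sketch under two assumptions that the log does not state: each element is 4 bytes wide (`BYTES_PER_ELEMENT` is a hypothetical name), and the `-n` row count applies to each input partition rather than to the whole rank. Under those assumptions, the per-rank input is larger than the device memory limit set with `-l`.

```shell
#!/usr/bin/env bash
# Values taken from the benchmark output above.
NUM_COLUMNS=4
NUM_ROWS=67108864
NUM_PARTITIONS=50

# Assumption: 4-byte elements; the log does not state the element width.
BYTES_PER_ELEMENT=4

# Assumption: NUM_ROWS is the row count of each input partition.
PARTITION_BYTES=$((NUM_COLUMNS * NUM_ROWS * BYTES_PER_ELEMENT))
LOCAL_BYTES=$((PARTITION_BYTES * NUM_PARTITIONS))
LOCAL_GIB=$((LOCAL_BYTES / 1024 / 1024 / 1024))

# Device memory limit as passed with -l, in MiB, converted to GiB.
LIMIT_MIB=$((40 * 1024))
LIMIT_GIB=$((LIMIT_MIB / 1024))

echo "local size: ${LOCAL_GIB} GiB, device limit: ${LIMIT_GIB} GiB"
```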

When I disable pinned memory with `-L`, the job completes without issue. One interesting note: this does not seem to reproduce on a cluster made up of DGX H100s.
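For anyone trying to reproduce this, the two configurations can be built side by side as below. This is a sketch: the values for `NUM_PARTITIONS`, `NUM_COLUMNS`, and `NUM_ROWS` are read back from the benchmark output, `APP_ARGS_NO_PINNED` is a name introduced here for illustration, and the launcher that consumes these arguments is not shown because the issue does not specify it.

```shell
#!/usr/bin/env bash
# Values read back from the benchmark output in this issue.
NUM_PARTITIONS=50
NUM_COLUMNS=4
NUM_ROWS=67108864

# Failing configuration: pinned memory enabled (the default).
APP_ARGS="-C ucxx -w 3 -r 10 -g -s -x -l $((40 * 1024)) -p ${NUM_PARTITIONS} -o 4 -c ${NUM_COLUMNS} -n ${NUM_ROWS} -x -m async"

# Working configuration: same arguments with pinned memory disabled via -L.
# APP_ARGS_NO_PINNED is a hypothetical variable name, not part of the benchmark.
APP_ARGS_NO_PINNED="${APP_ARGS} -L"

echo "pinned (default): ${APP_ARGS}"
echo "pinned disabled:  ${APP_ARGS_NO_PINNED}"
```

Keeping every other argument identical means any difference between the two runs can be attributed to the pinned memory setting alone.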
