Multi-GPU fan-out #9

@SolidRegardless

Description

Summary

Add support for distributing benchmark workloads across multiple GPUs, splitting batches and aggregating results to demonstrate multi-GPU scaling.

Motivation

Many workstations and servers have multiple GPUs installed. The current implementation targets a single accelerator at a time. Multi-GPU fan-out would demonstrate near-linear scaling for embarrassingly parallel workloads and provide a realistic picture of what production GPU-accelerated systems look like. This is especially relevant for domains like financial simulation and AI inference where multi-GPU setups are common.

Acceptance Criteria

  • Detect all available GPUs at startup and list them (name, memory, compute capability)
  • Add a --multi-gpu CLI flag that distributes work across all available GPUs of the same type
  • Split each batch evenly across available GPUs, execute in parallel, and merge results
  • Report per-GPU and aggregate throughput metrics
  • Report scaling efficiency (e.g. 2 GPUs achieving 1.9× = 95% efficiency)
  • Gracefully handle systems with only one GPU (run normally with a note)
  • Handle heterogeneous GPU configurations (different models) with appropriate warnings
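
The scaling-efficiency criterion above is measured speedup divided by ideal speedup (the GPU count). A minimal sketch of that metric — `ScalingMetrics` and `Efficiency` are hypothetical names, not from the codebase:

```csharp
using System;

static class ScalingMetrics
{
    // Scaling efficiency = measured speedup / ideal speedup (the GPU count).
    // Throughputs can be in any unit (items/s, GB/s) as long as both match.
    public static double Efficiency(double singleGpuThroughput,
                                    double aggregateThroughput,
                                    int gpuCount)
        => (aggregateThroughput / singleGpuThroughput) / gpuCount;
}

static class Demo
{
    static void Main()
    {
        // Two GPUs delivering 1.9x the single-GPU throughput -> 0.95 (95%).
        Console.WriteLine(ScalingMetrics.Efficiency(100.0, 190.0, 2));
    }
}
```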

Technical Notes

  • ILGPU's Context can enumerate multiple devices — use this for discovery
  • Each GPU will need its own Accelerator instance and memory buffers
  • Synchronisation between GPUs happens on the host side (no direct GPU-to-GPU needed for this use case)
  • Consider using Task.WhenAll or similar for parallel dispatch across GPUs
  • Batch splitting should account for uneven division (last GPU gets remainder)
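
The host-side pattern from the notes above (remainder-aware batch splitting, parallel dispatch via Task.WhenAll, in-order merge) can be sketched without ILGPU. `FanOut`, `Split`, `DispatchAsync`, and `runOnGpu` are illustrative names only; the delegate stands in for a per-accelerator upload + kernel launch + download, which in the real implementation would run against one ILGPU `Accelerator` per device:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class FanOut
{
    // Split `total` work items into `parts` contiguous slices.
    // Each slice gets total/parts items; the last slice absorbs the remainder.
    public static (int Offset, int Count)[] Split(int total, int parts)
    {
        int baseSize = total / parts;
        var slices = new (int Offset, int Count)[parts];
        for (int i = 0; i < parts; i++)
        {
            int offset = i * baseSize;
            int count = (i == parts - 1) ? total - offset : baseSize;
            slices[i] = (offset, count);
        }
        return slices;
    }

    // Launch one slice per GPU in parallel and merge partial results in order.
    public static async Task<double[]> DispatchAsync(
        double[] batch,
        int gpuCount,
        Func<int, double[], Task<double[]>> runOnGpu)
    {
        var tasks = Split(batch.Length, gpuCount)
            .Select((slice, gpu) =>
                runOnGpu(gpu, batch.Skip(slice.Offset).Take(slice.Count).ToArray()));
        double[][] partials = await Task.WhenAll(tasks); // host-side sync point
        return partials.SelectMany(p => p).ToArray();    // merge in slice order
    }
}

static class Demo
{
    static async Task Main()
    {
        var batch = Enumerable.Range(0, 10).Select(i => (double)i).ToArray();

        // Stand-in "GPU": doubles every element of its slice on a worker thread.
        Task<double[]> Fake(int gpu, double[] slice) =>
            Task.Run(() => slice.Select(x => x * 2).ToArray());

        double[] merged = await FanOut.DispatchAsync(batch, 3, Fake);
        Console.WriteLine(string.Join(",", merged)); // prints 0,2,4,6,8,10,12,14,16,18
    }
}
```

Note that `Task.WhenAll` preserves task order, so the merged array lines up with the original batch regardless of which GPU finishes first.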
