Add NUMA-aware allocator support (Linux: libnuma, cross-platform strategy) #466

## Summary

Introduce a NUMA-aware allocator for Morpheus so that memory placement can follow the machine's Non-Uniform Memory Access (NUMA) topology. The initial implementation will target Linux using `libnuma`, behind a cross-platform abstraction that also supports Windows and falls back gracefully on macOS.

## Motivation

Modern multi-socket and multi-core systems frequently exhibit NUMA characteristics, where memory access latency depends on the proximity of memory to the executing CPU core.

Without NUMA awareness, allocations may occur on remote nodes, leading to:

* **Increased memory access latency**
* **Reduced cache locality**
* **Cross-node memory traffic (e.g. QPI/UPI or Infinity Fabric penalties)**
* **Unpredictable performance in multi-threaded workloads**

For Morpheus use cases (e.g. high-throughput pipelines, parsing, transformation, and low-latency systems), these effects can be significant.

A NUMA-aware allocator enables:

* **Thread-local allocation on the correct NUMA node**
* **Explicit placement strategies (bind, interleave, preferred node)**
* **Improved scalability under parallel workloads**
* **Better alignment with CPU affinity strategies**

## Linux Implementation (libnuma)

Leverage `libnuma` to implement a NUMA-aware allocator:

### Features

* Allocate memory on a specific NUMA node (`numa_alloc_onnode`)
* Interleaved allocation across nodes (`numa_alloc_interleaved`)
* Preferred node allocation (`numa_set_preferred`)
* Query system topology (`numa_available`, `numa_num_configured_nodes`)
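
The calls listed above could be combined roughly as follows. This is a sketch under assumptions, not the Morpheus implementation: the `MORPHEUS_HAVE_LIBNUMA` macro, the `numa_backend` namespace and its function names are hypothetical. Only the `numa_*` functions come from libnuma (link with `-lnuma`). Without the macro the code falls back to `std::malloc`, so it still builds on machines that lack libnuma.

```cpp
#include <cstddef>
#include <cstdlib>

#if defined(MORPHEUS_HAVE_LIBNUMA) // hypothetical opt-in macro; requires -lnuma
#include <numa.h>
#endif

namespace numa_backend // hypothetical namespace, for illustration only
{
    enum class placement { on_node, interleaved, local };

    // True when libnuma is compiled in and the kernel reports NUMA support.
    inline bool available()
    {
#if defined(MORPHEUS_HAVE_LIBNUMA)
        // numa_available() returns -1 when the other numa_* calls must not be used.
        static const bool ok = numa_available() != -1;
        return ok;
#else
        return false;
#endif
    }

    // Number of configured nodes; 1 when NUMA is unavailable.
    inline int node_count()
    {
#if defined(MORPHEUS_HAVE_LIBNUMA)
        if (available())
            return numa_num_configured_nodes();
#endif
        return 1;
    }

    // Allocate `bytes` with the requested placement. Returns nullptr on failure.
    inline void* allocate(std::size_t bytes, placement how, int node)
    {
#if defined(MORPHEUS_HAVE_LIBNUMA)
        if (available())
        {
            switch (how)
            {
            case placement::on_node:     return numa_alloc_onnode(bytes, node);
            case placement::interleaved: return numa_alloc_interleaved(bytes);
            case placement::local:       return numa_alloc_local(bytes);
            }
            return nullptr; // not reached for valid enumerators
        }
#endif
        (void)how;
        (void)node;
        return std::malloc(bytes); // stand-in when libnuma is not used
    }

    // libnuma needs the original size back; numa_alloc_* memory must not go to free().
    inline void deallocate(void* p, std::size_t bytes)
    {
#if defined(MORPHEUS_HAVE_LIBNUMA)
        if (available())
        {
            numa_free(p, bytes);
            return;
        }
#endif
        (void)bytes;
        std::free(p);
    }
}
```

The `numa_alloc_*` functions work at page granularity, so a backend like this probably suits large or pooled allocations better than many small ones.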

### Example API (conceptual)

```cpp
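#include <cstddef>

// Placement strategies; the per-value notes are the presumed semantics.
enum class numa_policy
{
    local,      // node of the calling thread
    preferred,  // try the given node first, fall back to others
    interleave, // spread pages across nodes
    bind        // restrict allocation to the given node
};

template <typename T>
class numa_allocator
{
public:
    using value_type = T;

    numa_allocator(int node, numa_policy policy) noexcept;

    // Converting constructor so containers can rebind the allocator.
    template <typename U>
    numa_allocator(const numa_allocator<U>& other) noexcept;

    T* allocate(std::size_t n);
    void deallocate(T* p, std::size_t n) noexcept;

    int node() const noexcept { return node_; }
    numa_policy policy() const noexcept { return policy_; }

private:
    int node_;
    numa_policy policy_;
};

// Allocators compare equal when memory from one can be freed through the other.
template <typename T, typename U>
bool operator==(const numa_allocator<T>& a, const numa_allocator<U>& b) noexcept;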
```

### Integration

* Works with Morpheus allocator-aware types
* Can be wrapped in `std::pmr::memory_resource` for runtime polymorphism

## Windows Support

Windows does not provide `libnuma`, but exposes NUMA functionality via the Win32 API:

### Relevant APIs

* `VirtualAllocExNuma`
* `GetNumaNodeProcessorMaskEx`
* `GetCurrentProcessorNumberEx`
* `SetThreadGroupAffinity`

### Strategy

* Implement a Windows-specific backend using `VirtualAllocExNuma`
* Map Morpheus `numa_policy` to:

  * Preferred node allocation
  * Explicit node binding where possible

### Limitations

* Less flexible than Linux `libnuma` (e.g. interleaving is less direct)
* Requires careful handling of processor groups on large systems
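
A Windows backend along these lines might look like the sketch below. The `win_numa` namespace and its function names are hypothetical. `GetNumaHighestNodeNumber`, `VirtualAllocExNuma`, `GetCurrentProcess` and `VirtualFree` are the Win32 calls. On other platforms the sketch substitutes `std::malloc` purely so that it compiles. The node passed to `VirtualAllocExNuma` is a preference, which is why this maps most naturally to the preferred-node policy.

```cpp
#include <cstddef>
#include <cstdlib>

#if defined(_WIN32)
#include <windows.h>
#endif

namespace win_numa // hypothetical namespace, for illustration only
{
    // Highest NUMA node number reported by the OS; 0 on single-node or non-Windows hosts.
    inline unsigned long highest_node()
    {
#if defined(_WIN32)
        ULONG highest = 0;
        if (GetNumaHighestNodeNumber(&highest))
            return highest;
#endif
        return 0;
    }

    // Reserve and commit `bytes`, asking the OS to prefer physical pages from `node`.
    inline void* allocate_on_node(std::size_t bytes, unsigned long node)
    {
#if defined(_WIN32)
        return VirtualAllocExNuma(GetCurrentProcess(), nullptr, bytes,
                                  MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE, node);
#else
        (void)node;
        return std::malloc(bytes); // stand-in so the sketch builds elsewhere
#endif
    }

    inline void deallocate(void* p)
    {
#if defined(_WIN32)
        VirtualFree(p, 0, MEM_RELEASE); // the size argument must be 0 with MEM_RELEASE
#else
        std::free(p);
#endif
    }
}
```

This sketch ignores processor groups entirely; systems with more than 64 logical processors would need the `...Ex` variants listed under Relevant APIs.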

## macOS (Apple Silicon / Intel)

macOS does **not expose NUMA APIs** in a meaningful or controllable way:

* Apple Silicon uses a unified memory architecture (UMA)
* Intel macOS systems abstract NUMA details away from user-space

### Strategy

* Provide a **no-op / fallback allocator**
* Behaves like a standard allocator while preserving API compatibility
* Optionally:

  * Use thread affinity hints (limited impact)
  * Document that NUMA policies are ignored
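
A fallback of this kind could be as small as the sketch below. It assumes the conceptual `numa_allocator` interface from the Linux section, stores the node and policy only for API compatibility, and delegates every allocation to `std::allocator`. The accessor names `node()` and `policy()` are illustrative.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

enum class numa_policy { local, preferred, interleave, bind };

// Fallback: keeps node and policy so calling code compiles unchanged, but ignores both.
template <typename T>
class numa_allocator
{
public:
    using value_type = T;

    numa_allocator(int node, numa_policy policy) noexcept : node_(node), policy_(policy) {}

    // Converting constructor so containers can rebind to their internal node types.
    template <typename U>
    numa_allocator(const numa_allocator<U>& other) noexcept
        : node_(other.node()), policy_(other.policy()) {}

    T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>{}.deallocate(p, n); }

    int node() const noexcept { return node_; }
    numa_policy policy() const noexcept { return policy_; }

private:
    int node_;
    numa_policy policy_;
};

// All fallback allocators draw from the same global heap, so any two are interchangeable.
template <typename T, typename U>
bool operator==(const numa_allocator<T>&, const numa_allocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const numa_allocator<T>&, const numa_allocator<U>&) noexcept { return false; }

// Usage: the requested policy is accepted and silently ignored.
inline std::size_t fallback_demo()
{
    numa_allocator<int> alloc(0, numa_policy::interleave);
    std::vector<int, numa_allocator<int>> values(alloc);
    for (int i = 0; i < 100; ++i)
        values.push_back(i);
    return values.size();
}
```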

## Cross-Platform Abstraction

Introduce a unified interface:

```cpp
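#include <cstddef>
#include <memory_resource>

// Uses numa_policy from the allocator example above.
class numa_memory_resource : public std::pmr::memory_resource
{
public:
    numa_memory_resource(int node, numa_policy policy);

private:
    // Overrides of the std::pmr::memory_resource customization points.
    void* do_allocate(std::size_t bytes, std::size_t alignment) override;
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override;
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override;

    int node_;
    numa_policy policy_;
};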
```
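
To show how the three overrides fit together, here is a minimal sketch of the interface with a placeholder backend (aligned global `operator new`). A real backend would call libnuma or the Win32 NUMA APIs at the marked points. Comparing node and policy in `do_is_equal` is one possible design choice, not a requirement of the proposal, and the `node()` / `policy()` accessors and `pmr_demo` are illustrative names.

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

enum class numa_policy { local, preferred, interleave, bind };

class numa_memory_resource : public std::pmr::memory_resource
{
public:
    numa_memory_resource(int node, numa_policy policy) noexcept
        : node_(node), policy_(policy) {}

    int node() const noexcept { return node_; }
    numa_policy policy() const noexcept { return policy_; }

private:
    // Placeholder backend: a platform backend would allocate on node_ here.
    void* do_allocate(std::size_t bytes, std::size_t alignment) override
    {
        return ::operator new(bytes, std::align_val_t(alignment));
    }

    // Must mirror do_allocate: same alignment, same underlying backend.
    void do_deallocate(void* p, std::size_t, std::size_t alignment) override
    {
        ::operator delete(p, std::align_val_t(alignment));
    }

    // Conservative choice: equal only when both resources place memory the same way.
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override
    {
        const auto* rhs = dynamic_cast<const numa_memory_resource*>(&other);
        return rhs != nullptr && rhs->node_ == node_ && rhs->policy_ == policy_;
    }

    int node_;
    numa_policy policy_;
};

// Usage: any std::pmr container can draw from the resource at runtime.
inline std::size_t pmr_demo()
{
    numa_memory_resource resource(0, numa_policy::preferred);
    std::pmr::vector<int> values(&resource);
    values.assign(1000, 7);
    return values.size();
}
```

The resource must outlive every container that uses it, which is the usual `std::pmr` lifetime rule rather than anything NUMA-specific.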

### Backend Selection

* Linux → `libnuma`
* Windows → Win32 NUMA APIs
* macOS → fallback (standard allocation)
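
One way to express this mapping is compile-time selection on the predefined platform macros, sketched below. The enum and function names are hypothetical. A real build would probably also check that libnuma is actually installed on Linux (for example through a build-system option) and drop to the fallback when it is not.

```cpp
#include <string_view>

// Hypothetical names; only the predefined compiler macros are real.
enum class numa_backend_kind { libnuma, win32, fallback };

// Picks the backend from the target platform at compile time.
constexpr numa_backend_kind selected_backend() noexcept
{
#if defined(__linux__)
    return numa_backend_kind::libnuma;
#elif defined(_WIN32)
    return numa_backend_kind::win32;
#else
    return numa_backend_kind::fallback; // macOS and any other platform
#endif
}

// Human-readable name, e.g. for logging which backend is active.
constexpr std::string_view backend_name(numa_backend_kind kind) noexcept
{
    switch (kind)
    {
    case numa_backend_kind::libnuma:  return "libnuma";
    case numa_backend_kind::win32:    return "win32";
    case numa_backend_kind::fallback: break;
    }
    return "fallback";
}
```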

## Benefits

* Improved locality and reduced latency in multi-threaded workloads
* Better scalability on multi-socket systems
* Alignment with thread pinning / CPU affinity strategies
* Enables advanced users to tune performance-critical paths

## Risks / Considerations

* **Portability complexity**
  Requires multiple platform-specific implementations

* **Testing difficulty**
  NUMA effects are hardware-dependent; CI coverage may be limited

* **Misuse potential**
  Incorrect node selection can degrade performance

* **API design**
  Needs to balance flexibility with ease of use

## Alternatives Considered

* Ignore NUMA entirely
  → Leaves significant performance on the table for target use cases

* Rely on OS default policies
  → Often suboptimal for tightly controlled workloads

## Next Steps

1. Implement Linux prototype using `libnuma`
2. Design abstraction layer for cross-platform support
3. Add Windows backend using `VirtualAllocExNuma`
4. Provide macOS fallback implementation
5. Integrate with Morpheus allocator framework
6. Benchmark impact on representative workloads

---

**Open Questions**

* Should NUMA policy be compile-time or runtime configurable?
* Do we expose low-level controls or provide higher-level presets?
* Should thread affinity utilities be included alongside allocator support?

---

**Examples**

* https://github.com/ReidAtcheson/numaallocator
