Hello from AICR: exploring integration between ai-cloud-validation and AICR #525

Description

@mchmarny

Hi team 👋

I'm a maintainer on NVIDIA/aicr (AI Cluster Runtime). I came across ai-cloud-validation and was intrigued by it: the provider-agnostic stub model, the breadth of checks (NCCL, NVLink, InfiniBand, NIM, BMC/Redfish posture, Slurm + K8s workloads), and the requirements-driven test plan are exactly the kind of real-hardware validation the ecosystem needs.

A bit about AICR

AICR generates and validates GPU-accelerated Kubernetes configurations (Snapshot → Recipe → Validate → Bundle). Alongside the recipe workflow, we've built a general-purpose trust substrate: every artifact we emit gets a signed, tamper-evident, offline-verifiable attestation using Sigstore keyless signing + Fulcio + Rekor + in-toto/DSSE, with a content-addressed manifest and a small pointer file committed to the repo (so git log is the audit trail). It answers three questions about any artifact: where it came from (provenance), whether it actually worked on real hardware (validity), and whether it's been altered (falsification).
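
To make the manifest, pointer, and attestation idea concrete, here is a minimal Python sketch. It is not AICR's actual code or file layout. Every function name, the pointer format, the `manifest.json` subject name, and the predicate type URI are hypothetical, and treating the manifest as the attestation subject is an assumption. Only the in-toto Statement v1 field names, the `application/vnd.in-toto+json` payload type, and the DSSE pre-authentication encoding (PAE) follow the public specs. Signing is left out: in a Sigstore flow the signature would be computed over the PAE bytes.

```python
import base64
import hashlib
import json


def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def canonical_bytes(obj):
    """Serialize to deterministic JSON so equal content hashes equally."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_manifest(artifacts):
    """Content-addressed manifest: artifact name -> digest of its bytes.

    Hypothetical layout, chosen only for illustration.
    """
    return {
        "artifacts": {
            name: {"sha256": sha256_hex(blob)} for name, blob in artifacts.items()
        }
    }


def build_pointer(manifest_bytes):
    """Small pointer record that could be committed to a repo (hypothetical format)."""
    return {"manifest": "sha256:" + sha256_hex(manifest_bytes)}


def build_statement(manifest_bytes, predicate_type, predicate):
    """in-toto Statement v1 whose subject is the manifest (an assumption)."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": "manifest.json", "digest": {"sha256": sha256_hex(manifest_bytes)}}
        ],
        "predicateType": predicate_type,
        "predicate": predicate,
    }


def dsse_pae(payload_type, payload):
    """DSSE pre-authentication encoding: the bytes a signer would actually sign."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join(
        [
            b"DSSEv1",
            str(len(type_bytes)).encode("ascii"),
            type_bytes,
            str(len(payload)).encode("ascii"),
            payload,
        ]
    )


def unsigned_envelope(statement):
    """DSSE envelope with an empty signature list (signing is out of scope here)."""
    payload = canonical_bytes(statement)
    return {
        "payloadType": "application/vnd.in-toto+json",
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [],
    }


def artifact_matches(manifest, name, blob):
    """Offline check: do these bytes match the digest recorded in the manifest?"""
    entry = manifest["artifacts"].get(name)
    return entry is not None and entry["sha256"] == sha256_hex(blob)


if __name__ == "__main__":
    manifest = build_manifest({"recipe.yaml": b"example recipe bytes"})
    manifest_bytes = canonical_bytes(manifest)
    statement = build_statement(
        manifest_bytes,
        "https://example.com/validation-result/v0",  # hypothetical predicate type
        {"passed": True},
    )
    print(json.dumps(build_pointer(manifest_bytes)))
    print(json.dumps(unsigned_envelope(statement), indent=2))
```

In this sketch the pointer carries only the manifest digest, so changing any artifact changes the manifest, which changes the pointer, and the change shows up in the repo history.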

Where I see complementarity

We seem to sit on the same axis from different ends: ai-cloud-validation (ACV) runs a broad, deep library of real-hardware validation checks, while AICR provides a cryptographic trust and portability layer for validation results. A few areas that might be worth exploring together:

  • Portable, verifiable evidence: giving validation results a signed, tamper-evident, offline-verifiable form that travels beyond any single reporting service, which is useful for air-gapped and sovereign contexts.
  • Shared validation coverage: our matrices overlap, so there may be room to reuse checks across both projects rather than duplicating them.
  • Contributor-driven coverage: reaching hardware no central lab owns by letting whoever has the hardware run the validation and publish results.
  • A complete trust story: combining hardware/device attestation with software supply-chain attestation toward an end-to-end "trustworthy bytes on trustworthy silicon" picture.

Just wanted to make sure you're aware of the trust primitives we've built (all OSS, reusable independent of AICR), and see if any of the above resonates. Happy to jam on a call, share a short design sketch, or prototype something if useful.

Either way, nice work on this. 🙌

Mark
github.com/NVIDIA/aicr
