Hi team,
I'm a maintainer on NVIDIA/aicr (AI Cluster Runtime). I came across ai-cloud-validation (ACV) and was intrigued by it: the provider-agnostic stub model, the breadth of checks (NCCL, NVLink, InfiniBand, NIM, BMC/Redfish posture, Slurm + K8s workloads), and the requirements-driven test plan are exactly the kind of real-hardware validation the ecosystem needs.
A bit about AICR
AICR generates and validates GPU-accelerated Kubernetes configurations (Snapshot → Recipe → Validate → Bundle). Alongside the recipe workflow, we've built a general-purpose trust substrate: every artifact we emit gets a signed, tamper-evident, offline-verifiable attestation using Sigstore keyless signing + Fulcio + Rekor + in-toto/DSSE, with a content-addressed manifest and a small pointer file committed to the repo (so git log is the audit trail). It answers three questions about any artifact: where it came from (provenance), whether it actually worked on real hardware (validity), and whether it's been altered (falsification).
Where I see complementarity
We seem to sit on the same axis from different ends: ACV runs a broad, deep library of real-hardware validation checks, while AICR provides a cryptographic trust and portability layer for validation results. A few areas that might be worth exploring together:
- Portable, verifiable evidence: giving validation results a signed, tamper-evident, offline-verifiable form that travels beyond any single reporting service, which is useful for air-gapped and sovereign contexts.
- Shared validation coverage: our matrices overlap, so there may be room to reuse checks across both projects rather than duplicating them.
- Contributor-driven coverage: reaching hardware no central lab owns by letting whoever has the hardware run the validation and publish results.
- A complete trust story: combining hardware/device attestation with software supply-chain attestation toward an end-to-end "trustworthy bytes on trustworthy silicon" picture.
Just wanted to make sure you're aware of the trust primitives we've built (all OSS, reusable independent of AICR), and see if any of the above resonates. Happy to jam on a call, share a short design sketch, or prototype something if useful.
Either way, nice work on this.
Mark
github.com/NVIDIA/aicr