Add Kubernetes API client with optional dependency [2/3] #65

Open
gustcol wants to merge 3 commits into facebookresearch:main from gustcol:feature/k8s-api-client

Conversation

gustcol (Contributor) commented Feb 27, 2026

Summary

  • Implement KubernetesApiClient using the official kubernetes Python library as an optional dependency (pip install gpucm[kubernetes])
  • Support both in-cluster (ServiceAccount) and kubeconfig authentication
  • Extract slurm.coreweave.com/job-id annotations for Slurm-K8s job correlation
  • Add KubernetesFakeClient for testing with injectable pod/node data
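The optional-dependency and dual-authentication behaviour listed above could look roughly like the sketch below. It is not the PR's code: the helper name `make_core_v1_api` and the "try in-cluster first, then kubeconfig" order are assumptions. The `kubernetes` calls used (`config.load_incluster_config`, `config.load_kube_config`, `config.ConfigException`, `client.CoreV1Api`) are part of the official Python client.

```python
from typing import Any


def make_core_v1_api() -> Any:
    """Build a CoreV1Api client (hypothetical helper, not from the PR)."""
    try:
        # Deferred import: the package stays importable without the extra.
        from kubernetes import client, config  # type: ignore[import-not-found]
    except ImportError as exc:
        raise RuntimeError(
            "Kubernetes support is optional; install it with "
            "`pip install gpucm[kubernetes]`"
        ) from exc

    try:
        # Works inside a pod that has a mounted ServiceAccount token.
        config.load_incluster_config()
    except config.ConfigException:
        # Assumed fallback: read ~/.kube/config (or $KUBECONFIG) instead.
        config.load_kube_config()
    return client.CoreV1Api()
```

Importing inside the function means a missing library only fails when Kubernetes collection is actually requested, which is one way to get the graceful import-error behaviour mentioned in the test plan.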

Stacked PR series: [1/3] #64 → [2/3] (this PR) → [3/3]

Note: This PR is stacked on #64. Please merge #64 first.

Ref: #63

Test plan

  • All 13 client unit tests pass (test_kubernetes_client.py)
  • Tests cover: fake client filtering, API client with mocked CoreV1Api, error handling, import error graceful failure
  • ufmt formatting clean
  • flake8 linting clean

Introduce KubernetesPodRow/Payload and KubernetesNodeConditionRow/Payload
schemas following the existing Row+Payload(DerivedCluster) pattern, plus
a KubernetesClient Protocol mirroring SlurmClient for pluggable K8s data
sources. This is the foundation for Kubernetes-layer observability in
SUNK (Slurm-on-K8s) clusters.

Ref: facebookresearch#63
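A minimal sketch of the Row plus Protocol idea from the commit message above, with a fake client that satisfies the Protocol structurally. The `PodRow` fields, the `list_pods` method and its `namespace` filter, and the `pods_per_node` helper are all assumptions made for illustration; the PR's actual schemas and Protocol methods may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class PodRow:
    """Illustrative stand-in for KubernetesPodRow; fields are assumptions."""

    namespace: str
    pod_name: str
    node_name: Optional[str]


class KubernetesClient(Protocol):
    # Hypothetical method; the real Protocol may expose different calls.
    def list_pods(self, namespace: Optional[str] = None) -> Iterable[PodRow]: ...


class KubernetesFakeClient:
    """Returns injected rows, so tests need no cluster and no kubernetes library."""

    def __init__(self, pods: List[PodRow]) -> None:
        self._pods = list(pods)

    def list_pods(self, namespace: Optional[str] = None) -> Iterable[PodRow]:
        return [p for p in self._pods if namespace is None or p.namespace == namespace]


def pods_per_node(client: KubernetesClient, namespace: Optional[str] = None) -> Dict[str, int]:
    """Count pods per node using any object that satisfies the Protocol."""
    return dict(Counter(p.node_name or "<unscheduled>" for p in client.list_pods(namespace)))


fake = KubernetesFakeClient(
    [
        PodRow("slurm", "slurmd-0", "node-a"),
        PodRow("slurm", "slurmd-1", "node-a"),
        PodRow("kube-system", "coredns-0", "node-b"),
    ]
)
counts = pods_per_node(fake, namespace="slurm")
```

Because `Protocol` uses structural typing, the fake client does not inherit from `KubernetesClient`; it only has to provide a matching method.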
Implement KubernetesApiClient using the official kubernetes Python
library as an optional dependency. Supports both in-cluster and
kubeconfig auth, extracts slurm.coreweave.com/job-id annotations for
Slurm-K8s correlation, and emits one KubernetesPodRow per container.
Includes KubernetesFakeClient for testing with injectable data.

Ref: facebookresearch#63
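The "one row per container" and annotation-extraction behaviour described above can be sketched as follows. The function name `rows_for_pod` and the output keys are hypothetical. The attribute paths (`metadata.annotations`, `spec.containers`, `spec.node_name`, `status.phase`) match the `V1Pod` model of the official client, and `SimpleNamespace` objects stand in for it here so the example runs without the library installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterator, Optional

SLURM_JOB_ID_ANNOTATION = "slurm.coreweave.com/job-id"


def rows_for_pod(pod: Any) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per container in the pod (keys are illustrative)."""
    # metadata.annotations is None when the pod carries no annotations.
    annotations = pod.metadata.annotations or {}
    job_id = annotations.get(SLURM_JOB_ID_ANNOTATION)
    for container in pod.spec.containers:
        yield {
            "namespace": pod.metadata.namespace,
            "pod": pod.metadata.name,
            "container": container.name,
            "node": pod.spec.node_name,
            "phase": pod.status.phase,
            "slurm_job_id": job_id,
        }


# Stand-in for a V1Pod with two containers and a Slurm job annotation.
pod = SimpleNamespace(
    metadata=SimpleNamespace(
        name="slurmd-0",
        namespace="slurm",
        annotations={SLURM_JOB_ID_ANNOTATION: "12345"},
    ),
    spec=SimpleNamespace(
        node_name="node-a",
        containers=[SimpleNamespace(name="slurmd"), SimpleNamespace(name="sidecar")],
    ),
    status=SimpleNamespace(phase="Running"),
)
rows = list(rows_for_pod(pod))
```

Each row repeats the pod-level fields, so the Slurm job ID is available on every container row without a second lookup.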
@github-actions

CI Commands

The following CI workflows run automatically on every push and pull request:

| Workflow | What it runs |
|---|---|
| GPU Cluster Monitoring Python CI | lint, tests, typecheck, format, deb build, pyoxidizer builds |
| Go packages CI | shelper tests, format, lint |

The following commands can be used by maintainers to trigger additional tests that require access to secrets:

| Command | Description | Requires approval? |
|---|---|---|
| /metaci tests | Runs Meta internal integration tests (pytest) | Yes: a maintainer must trigger the command and approve the deployment request |
| /metaci integration tests | Same as above (alias) | Yes |

Note: Only repository maintainers (OWNER association) can trigger /metaci commands. After commenting the command, a maintainer must also navigate to the Actions tab and approve the deployment to the graph-api-access environment before the jobs will run. See the approval guidelines for what to approve or reject.

Add type: ignore[import-not-found] comments to kubernetes imports
since the library lacks type stubs and is an optional dependency.