feat(adapter): replace kubelet HTTP polling with SharedIndexInformer #110

Merged
btwshivam merged 4 commits into optiqor:main from huynna12:feat/kubernetes-shared-index-informer
Jun 4, 2026
Conversation

@huynna12
Contributor

@huynna12 huynna12 commented May 30, 2026

What

Replaces the periodic kubelet HTTP polling in internal/adapter/kubernetes.go
with a SharedIndexInformer scoped to the local node, giving immediate pod
visibility and removing the dependency on the kubelet read-only API.

Why

Fixes #19

How

  • SharedInformerFactory filtered via fields.OneTermEqualSelector("spec.nodeName",
    nodeName)
  • Two in-memory indexes (uidIndex + cgroupIndex) updated via event handlers for O(1)
    lookups
  • cgroupIndex covers pod UID variants (dashes/underscores) + container ID (full + 12-char
    prefix)
  • Exponential backoff (1s→2min) on API server failure; stale entries preserved so
    enrichment continues degraded
  • buildClientset() tries in-cluster service account first, falls back to KUBECONFIG for
    dev
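The two indexes described above can be sketched with the standard library alone. This is a minimal illustration, not the code in internal/adapter/kubernetes.go: podInfo, podIndex, cgroupKeys, and the method names are hypothetical, and the real handlers would receive pod objects from the informer rather than plain strings.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// podInfo is a hypothetical, trimmed-down record of what enrichment needs.
type podInfo struct {
	Namespace, Name, UID string
}

// cgroupKeys lists every key a pod should be reachable under: the UID with
// dashes and with underscores, plus each container ID in full and as a
// 12-character prefix.
func cgroupKeys(uid string, containerIDs []string) []string {
	seen := make(map[string]struct{})
	var keys []string
	add := func(k string) {
		if _, dup := seen[k]; k == "" || dup {
			return
		}
		seen[k] = struct{}{}
		keys = append(keys, k)
	}
	add(strings.ReplaceAll(uid, "_", "-"))
	add(strings.ReplaceAll(uid, "-", "_"))
	for _, id := range containerIDs {
		// Pod status reports IDs as "<runtime>://<hex>"; index the hex part.
		if i := strings.Index(id, "://"); i >= 0 {
			id = id[i+3:]
		}
		add(id)
		if len(id) > 12 {
			add(id[:12])
		}
	}
	return keys
}

// podIndex keeps both maps behind one RWMutex so readers do not block each other.
type podIndex struct {
	mu       sync.RWMutex
	byUID    map[string]podInfo
	byCgroup map[string]podInfo
}

func newPodIndex() *podIndex {
	return &podIndex{byUID: map[string]podInfo{}, byCgroup: map[string]podInfo{}}
}

// upsert would be called from the informer's add and update handlers.
func (x *podIndex) upsert(p podInfo, containerIDs []string) {
	x.mu.Lock()
	defer x.mu.Unlock()
	x.byUID[p.UID] = p
	for _, k := range cgroupKeys(p.UID, containerIDs) {
		x.byCgroup[k] = p
	}
}

// remove would be called from the informer's delete handler.
func (x *podIndex) remove(p podInfo, containerIDs []string) {
	x.mu.Lock()
	defer x.mu.Unlock()
	delete(x.byUID, p.UID)
	for _, k := range cgroupKeys(p.UID, containerIDs) {
		delete(x.byCgroup, k)
	}
}

// lookup resolves any indexed cgroup key with a single map read.
func (x *podIndex) lookup(key string) (podInfo, bool) {
	x.mu.RLock()
	defer x.mu.RUnlock()
	p, ok := x.byCgroup[key]
	return p, ok
}

func main() {
	idx := newPodIndex()
	pod := podInfo{Namespace: "default", Name: "web-0", UID: "8f2c1a6e-0b1d-4c57-9a3e-5d7f2b9c4e10"}
	idx.upsert(pod, []string{"containerd://3b1f0c9d2e4a5f6b7c8d9e0f1a2b3c4d"})
	for _, key := range []string{"8f2c1a6e_0b1d_4c57_9a3e_5d7f2b9c4e10", "3b1f0c9d2e4a"} {
		p, ok := idx.lookup(key)
		fmt.Println(key, ok, p.Name)
	}
}
```

Writing every variant at insert time is what keeps the read path to one map access; the cost is a few extra entries per pod.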

Testing

  • go build ./... passes
  • go test ./... passes
  • go vet ./... passes
  • golangci-lint run ./... passes
  • Tested locally with:
  • N/A — pure docs/refactor
  • sudo ./bin/bpf-verify --read 5s confirms 6/6 programs still load
  • ./scripts/verify.sh passes (or specific phase: ./scripts/verify.sh quality)

Checklist

  • PR title follows Conventional Commits (feat(scope): subject)
  • All commits are DCO-signed (git commit -s)
  • No unrelated changes pulled in
  • Documentation updated where user-visible behavior changed
  • Added/updated tests for new code paths
  • If a new doctor rule, paired with a chaos scenario in scripts/verify.sh

  SharedIndexInformer

  Closes optiqor#19

  - Use SharedInformerFactory scoped to local node via spec.nodeName field
  selector
  - In-cluster service account auth by default, kubeconfig fallback for dev
  - Index pods by UID variants and container ID (full + 12-char prefix) for O(1)
   lookups
  - Exponential backoff retry (1s→2min); stale index preserved during reconnect
  - Remove all kubelet HTTP polling code paths entirely
  - Add 1000-pod fixture: correctness + p99 < 10µs latency assertions
  - Trim ClusterRole to pods get/list/watch only
  - Inject NODE_NAME via Downward API in DaemonSet

Signed-off-by: Heidi Ho <honhuhuynh1210@gmail.com>
@huynna12 huynna12 requested a review from btwshivam as a code owner May 30, 2026 00:17
@github-actions

🚀 First PR — welcome aboard!

A few things to expect:

  1. CI: every PR runs build + race tests + lint + (eventually) the kernel matrix. If something fails, the log will tell you exactly which gate.
  2. DCO: every commit needs a Signed-off-by: line; git commit -s adds it automatically.
  3. Conventional Commits: PR titles like feat(doctor): add new rule or fix(bpf): handle X. We squash-merge by default.
  4. Review: a maintainer will review within 72 hours. Suggestions are conversations, not orders — push back if something doesn't fit your context.

If you get stuck, reply here or jump to Discussions. We want this PR to land.

@github-actions github-actions Bot added the level:advanced (200+ lines or 6+ files, auto-applied), testing (Tests and test coverage), and area/k8s (Kubernetes integration) labels May 30, 2026
@btwshivam
Member

@huynna12 lints are failing..

Member

@btwshivam btwshivam left a comment

ci's red from the go.mod bump to 1.26.0 (ci runs 1.25), and the title scope adapter/k8s isn't allowed (use feat(adapter): or feat(k8s):). fix both and it's close, the informer locking and cache-sync look right.

Comment thread go.mod Outdated
module github.com/optiqor/kerno

- go 1.25.4
+ go 1.26.0
Member

this bumps the go directive to 1.26.0, but ci runs 1.25.x, so go test dies with go.mod requires go >= 1.26.0 (running go 1.25.10) and golangci-lint fails because it's built with go1.25 and can't target 1.26.0. that's both red checks. looks like an accidental bump from go mod tidy on a 1.26 toolchain. set it back to go 1.25.4.

hostPort: {{ .Values.prometheus.port }}
protocol: TCP
env:
- name: NODE_NAME
Member

this duplicates KERNO_NODE_NAME four lines down, both injected from spec.nodeName. the adapter already falls back to KERNO_NODE_NAME (kubernetes.go:63-66), so NODE_NAME is redundant and it breaks the KERNO_ env convention the rest of the daemonset uses. drop it and rely on KERNO_NODE_NAME.
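The fallback the comment refers to might be shaped like this sketch. The lookup order (NODE_NAME first, then KERNO_NODE_NAME) and the function name resolveNodeName are assumptions; the actual lines at kubernetes.go:63-66 are not shown in this thread. With the redundant variable dropped from the DaemonSet, only the second lookup would ever match.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// resolveNodeName returns the first non-empty value among the candidate
// environment variables. getenv is injected so the logic can be tested
// without mutating the process environment.
func resolveNodeName(getenv func(string) string) (string, error) {
	// Assumed precedence; the DaemonSet fills these from spec.nodeName.
	for _, key := range []string{"NODE_NAME", "KERNO_NODE_NAME"} {
		if v := strings.TrimSpace(getenv(key)); v != "" {
			return v, nil
		}
	}
	return "", errors.New("node name not set: expected KERNO_NODE_NAME from the Downward API")
}

func main() {
	name, err := resolveNodeName(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("scoping informer to node", name)
}
```

Failing loudly when neither variable is set matters here, because an empty spec.nodeName selector would not scope the watch to the local node.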

@btwshivam
Member

rest everything looks good.. well done.. fix them.. then we can merge!!

@huynna12 huynna12 changed the title from "feat(adapter/k8s): replace kubelet HTTP polling with SharedIndexInformer" to "feat(adapter): replace kubelet HTTP polling with SharedIndexInformer" May 31, 2026
Signed-off-by: Heidi Ho <honhuhuynh1210@gmail.com>
@huynna12 huynna12 requested a review from btwshivam June 1, 2026 00:20
Signed-off-by: Heidi Ho <honhuhuynh1210@gmail.com>
@huynna12
Contributor Author

huynna12 commented Jun 1, 2026

rest everything looks good.. well done.. fix them.. then we can merge!!

I fixed it all. Please check it again and let me know if there is anything else I need to change or fix. Thank you.

@btwshivam
Member

btwshivam commented Jun 2, 2026

@huynna12 I can still see conflicts.. fix it please

@huynna12
Contributor Author

huynna12 commented Jun 4, 2026

@btwshivam there is no conflict now

@btwshivam btwshivam merged commit e28f1dd into optiqor:main Jun 4, 2026
10 checks passed
@btwshivam btwshivam added the gssoc:approved Counted toward leaderboard label Jun 4, 2026
Labels

area/k8s Kubernetes integration gssoc:approved Counted toward leaderboard level:advanced 200+ lines or 6+ files (auto-applied) testing Tests and test coverage

Development

Successfully merging this pull request may close these issues.

K8s: Replace kubelet HTTP polling with SharedIndexInformer for pod metadata

2 participants