feat(k8s): add Kubernetes and Kustomize indexing support #87
halindrome wants to merge 12 commits into DeusData:main
QA Round 1
Model used: Claude Opus 4.6
Findings
[Finding 1] MAJOR: Cached YAML result passed to handle_k8s_manifest produces zero Resource nodes
[Finding 2] MAJOR: Double file read for uncached K8s manifests
[Finding 3] MINOR: CONTRIBUTING.md states wrong extractor path
[Finding 4] MINOR: Case-sensitivity inconsistency in kustomization detection
[Finding 5] MINOR: No test for multi-document YAML — only first document processed
[Finding 6] MINOR: strlen on full content in cbm_is_k8s_manifest
[Finding 7] MINOR: Missing
[Finding 8] NIT: Redundant block_node unwrap in extract_k8s_scalars
[Finding 9] NIT: get_scalar_text has unbounded recursion on flow_node chains
[Finding 10] NIT: def.label = "Resource" is a string literal, not arena-allocated
SUMMARY: 0 critical, 2 major, 5 minor, 3 nit findings.
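Finding 9 concerns recursion depth when unwrapping nested wrapper nodes. One way to bound it, sketched here iteratively with a depth cap. The SketchNode struct, the cap of 8, and the function names are assumptions made for illustration; the real get_scalar_text works on tree-sitter nodes and may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed YAML node (not the tree-sitter type). */
typedef struct SketchNode {
    const char *type;                /* e.g. "flow_node", "plain_scalar" */
    const char *text;                /* scalar text, NULL for wrappers */
    const struct SketchNode *child;  /* single wrapped child, or NULL */
} SketchNode;

enum { SKETCH_MAX_UNWRAP = 8 };      /* assumed cap on wrapper depth */

static bool sketch_is_wrapper(const char *type) {
    return strcmp(type, "flow_node") == 0 || strcmp(type, "block_node") == 0;
}

/* Unwrap at most SKETCH_MAX_UNWRAP nodes; give up with NULL beyond that. */
static const char *sketch_scalar_text(const SketchNode *node) {
    for (int depth = 0; node != NULL && depth < SKETCH_MAX_UNWRAP; depth++) {
        assert(node->type != NULL);
        if (!sketch_is_wrapper(node->type))
            return node->text;
        node = node->child;
    }
    return NULL;
}
```

A loop with a fixed cap keeps stack use constant no matter how deeply wrappers are nested, at the cost of returning NULL for chains longer than the cap.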
QA Round 1: Fixes Applied
Commit: f2678e7
- [Major 1 + 2] Cached YAML result / double read: K8s manifests are discovered as CBM_LANG_YAML, so their cached result has no Resource defs. Fix: always re-extract with CBM_LANG_K8S for files identified as K8s manifests, passing the already-loaded source buffer directly into the handler. This eliminates the cache misuse and the double file read in a single change.
- [Minor 3] CONTRIBUTING.md path: corrected to internal/cbm/extract_k8s.c.
- [Minor 4] Case-sensitivity: added a comment in language.c FILENAME_TABLE documenting the intentional behaviour. Table entries are case-sensitive; mixed-case variants fall through to CBM_LANG_YAML and are re-classified by cbm_is_kustomize_file() (case-insensitive) in pass_k8s.
- [Minor 5] Multi-doc YAML test: added k8s_extract_manifest_multidoc, pinning single-document-per-file extraction behaviour.
- [Minor 6] strnlen: replaced strlen with strnlen(content, 4096) in cbm_is_k8s_manifest to bound the scan.
- [Minor 7] crds field: added "crds" to is_kustomize_list_key.
- [Nit 8] Dead block_node unwrap: removed the unreachable second check in extract_k8s_scalars.
- [Nit 10] Arena-allocated label: changed def.label "Resource" to use cbm_arena_strdup.
All 2051 tests pass.
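The two-stage classification behind Major 1 + 2 and Minor 4 can be sketched as follows: discovery uses an exact, case-sensitive filename match, and a later pass re-classifies anything that was discovered as plain YAML. All names here (SketchLang, sk_discover, sk_reclassify) are hypothetical, and the content check is reduced to a plain strstr, so this is an illustration of the idea rather than the code in pass_k8s.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { SK_LANG_YAML, SK_LANG_KUSTOMIZE, SK_LANG_K8S } SketchLang;

/* Case-insensitive equality of two NUL-terminated strings. */
static int sk_ci_equal(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* true only if both strings ended together */
}

/* Discovery stage: exact, case-sensitive filename table. */
static SketchLang sk_discover(const char *basename) {
    assert(basename != NULL);
    if (strcmp(basename, "kustomization.yaml") == 0 ||
        strcmp(basename, "kustomization.yml") == 0)
        return SK_LANG_KUSTOMIZE;
    return SK_LANG_YAML; /* every other file in this sketch is generic YAML */
}

/* Pass stage: re-classify files that discovery labelled as generic YAML.
 * The kustomize check runs first because kustomization files usually
 * contain an apiVersion: key as well. */
static SketchLang sk_reclassify(SketchLang discovered, const char *basename,
                                const char *content) {
    if (discovered != SK_LANG_YAML)
        return discovered;
    if (sk_ci_equal(basename, "kustomization.yaml") ||
        sk_ci_equal(basename, "kustomization.yml"))
        return SK_LANG_KUSTOMIZE;
    if (content != NULL && strstr(content, "apiVersion:") != NULL)
        return SK_LANG_K8S; /* caller re-extracts instead of reusing YAML cache */
    return SK_LANG_YAML;
}
```

A file that comes back as SK_LANG_K8S would then be extracted again from the buffer already in memory, which is the point of the fix: the YAML-language cache entry carries no Resource definitions.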
QA Round 2
QA Round 2: Fixes Applied
Commit: ca3a29e
- [Minor 1] CONTRIBUTING.md: fixed the CBMLanguage enum file reference from the non-existent cbm_language.h to internal/cbm/cbm.h.
- [Nit 2] CONTRIBUTING.md tree-sitter contradiction: changed "do not use tree-sitter grammars" to "do not require a new tree-sitter grammar", with a note that they reuse the existing tree-sitter YAML grammar.
- [Nit 3] pass_k8s.c header comment: updated to "emit one Resource node per file (first document only — multi-document YAML is not yet supported)".
All tests pass.
QA Round 3: Final Review
Model used: Claude Opus 4.6
Previous Fix Verification
Code Analysis
All changed files reviewed. Code is well-structured, follows existing infra-pass patterns, and memory management is correct throughout.
Remaining Issues
None.
Verdict
PASS: Ready to merge. All Round 1 and Round 2 fixes correctly applied. 2051 tests pass. Code is clean and follows project conventions.
…tries
- Add CBM_LANG_KUSTOMIZE and CBM_LANG_K8S to CBMLanguage enum (before CBM_LANG_COUNT)
- Add kustomization.yaml/yml to FILENAME_TABLE mapped to CBM_LANG_KUSTOMIZE
- Add Kustomize and Kubernetes entries to LANG_NAMES
- Implement cbm_is_kustomize_file() with to_lower+strcmp pattern
- Implement cbm_is_k8s_manifest() scanning first 4KB for apiVersion: via ci_strstr()
- Declare both helpers in pipeline_internal.h Infrascan helpers section
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
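The two detection helpers described in this commit can be approximated as below. This is a sketch under assumptions, not the PR's code: the function names, the single-argument signatures, the 32-byte scratch buffer, and the inlined case-insensitive search (standing in for ci_strstr) are all illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Lower-case the basename, then strcmp against the two accepted names. */
static bool sketch_is_kustomize_file(const char *path) {
    if (path == NULL)
        return false;
    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;

    char lower[32];
    size_t len = strlen(base);
    if (len >= sizeof lower)
        return false; /* longer than either accepted name */
    for (size_t i = 0; i < len; i++)
        lower[i] = (char)tolower((unsigned char)base[i]);
    lower[len] = '\0';

    return strcmp(lower, "kustomization.yaml") == 0 ||
           strcmp(lower, "kustomization.yml") == 0;
}

/* Case-insensitive search for "apiVersion:" in the first 4 KB only. */
static bool sketch_is_k8s_manifest(const char *content) {
    static const char needle[] = "apiversion:";
    const size_t nlen = sizeof needle - 1;
    assert(nlen > 0);
    if (content == NULL)
        return false;

    /* Bounded length: never reads past 4096 bytes or the terminator. */
    size_t limit = 0;
    while (limit < 4096 && content[limit] != '\0')
        limit++;

    for (size_t i = 0; i + nlen <= limit; i++) {
        size_t j = 0;
        while (j < nlen &&
               tolower((unsigned char)content[i + j]) == needle[j])
            j++;
        if (j == nlen)
            return true;
    }
    return false;
}
```

For example, sketch_is_kustomize_file("overlays/prod/Kustomization.YAML") returns true, while sketch_is_k8s_manifest on a file without an apiVersion: key in its first 4 KB returns false.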
- lang_specs.c: add LangSpec entries for CBM_LANG_KUSTOMIZE and CBM_LANG_K8S (both reuse tree_sitter_yaml()); add cbm_ts_language() switch cases
- extract_k8s.c: new file implementing cbm_extract_k8s(); kustomize path walks block_sequence items under resources/bases/patches/components/patchesStrategicMerge and emits CBMImport per scalar; k8s path extracts apiVersion/kind/metadata.name and emits CBMDefinition with label "Resource" and name "Kind/metadata-name"; malformed manifests (missing kind or name) produce zero definitions
- cbm.h: declare cbm_extract_k8s() alongside other sub-extractor entry points
- cbm.c: call cbm_extract_k8s() after unified extraction for the two new langs
- Makefile.cbm: add extract_k8s.c to EXTRACTION_SRCS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
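The naming rule for Resource definitions ("Kind/metadata-name", nothing emitted when kind or name is missing) could look roughly like this. The function name, the caller-supplied buffer, and the boolean return are assumptions for the sketch; the actual extractor allocates from an arena and fills a CBMDefinition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes "Kind/name" into out. Returns false, leaving out empty, when kind
 * or name is missing or the buffer is too small; the caller then emits no
 * definition at all. */
static bool sketch_resource_name(const char *kind, const char *name,
                                 char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (kind == NULL || name == NULL || kind[0] == '\0' || name[0] == '\0')
        return false;

    size_t need = strlen(kind) + 1 + strlen(name) + 1; /* '/' and NUL */
    if (need > cap)
        return false;

    snprintf(out, cap, "%s/%s", kind, name);
    return true;
}
```

With kind "Deployment" and name "my-app" the buffer holds "Deployment/my-app", which matches the example given under Node labels.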
…sion
- Add pass_k8s.c: cbm_pipeline_pass_k8s() iterates files, classifies kustomize overlays via cbm_is_kustomize_file() and k8s manifests via cbm_is_k8s_manifest(), emits Module/Resource nodes and IMPORTS/DEFINES edges
- Kustomize files emit Module node (cbm_infra_qn) + IMPORTS edges per resources entry
- K8s manifest files emit Resource nodes per top-level document with DEFINES edge
- Falls back to file re-read + re-extraction when result_cache is unavailable
- Declare cbm_pipeline_pass_k8s() prototype in pipeline_internal.h
- Add pass_k8s.c to PIPELINE_SRCS in Makefile.cbm
- Call pass after definitions pass in pipeline.c sequential path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add TEST(infra_is_kustomize_file): positive/negative/NULL cases
- Add TEST(infra_is_k8s_manifest): apiVersion present/absent, kustomize file returns false, NULL guards
- Add TEST(k8s_extract_kustomize): asserts 2 imports (deployment.yaml, service.yaml) from Kustomization resources list
- Add TEST(k8s_extract_manifest): asserts Resource def with label "Resource" and name containing "Deployment"
- Add TEST(k8s_extract_manifest_no_name): no crash, has_error==false
- Fix extract_k8s.c get_scalar_text() to unwrap flow_node wrappers (tree-sitter YAML grammar wraps plain_scalar in flow_node)
- Fix pass_k8s.c missing #include "foundation/compat.h" for CBM_TLS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix alignment violations in extract_k8s.c, pass_k8s.c, pass_infrascan.c
- Fix spacing in language.c filename table rows (kustomization.yml entry)
- Fix alignment in cbm.h CBMLanguage enum comments
- Fix pipeline.c empty-body brace style
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- [Major 1+2] pass_k8s: always re-extract K8s manifests with CBM_LANG_K8S, discarding any cached YAML result; pass already-read source buffer to handle_k8s_manifest to eliminate the double file read
- [Minor 3] CONTRIBUTING.md: fix extractor path to internal/cbm/extract_k8s.c and clarify tree-sitter YAML grammar usage
- [Minor 4] language.c: add comment documenting intentional case-sensitive FILENAME_TABLE and case-insensitive cbm_is_kustomize_file() split behaviour
- [Minor 5] test_pipeline.c: add k8s_extract_manifest_multidoc test pinning single-document-per-file extraction behaviour
- [Minor 6] pass_infrascan: replace strlen with strnlen(content, 4096) in cbm_is_k8s_manifest to bound the scan
- [Minor 7] extract_k8s: add "crds" to is_kustomize_list_key
- [Nit 8] extract_k8s: remove dead second block_node unwrap in extract_k8s_scalars metadata descent
- [Nit 10] extract_k8s: use cbm_arena_strdup for def.label "Resource"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
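After the Minor 7 fix, the set of kustomization keys whose list entries become imports covers the five keys named in the extractor commit plus crds. A minimal sketch of such a key check, assuming an exact, case-sensitive match; the function name and the table layout are illustrative rather than taken from extract_k8s.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true for kustomization keys whose sequence items are file
 * references that should each produce an import. */
static bool sketch_is_kustomize_list_key(const char *key) {
    static const char *const keys[] = {
        "resources", "bases", "patches",
        "components", "patchesStrategicMerge", "crds",
    };
    if (key == NULL)
        return false;
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++) {
        if (strcmp(key, keys[i]) == 0)
            return true;
    }
    return false;
}
```

Keeping the keys in one table means a future addition is a one-line change, and a key such as namePrefix is rejected because it is simply not listed.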
- CONTRIBUTING.md: fix CBMLanguage enum file reference from non-existent cbm_language.h to internal/cbm/cbm.h
- CONTRIBUTING.md: rephrase "do not use tree-sitter grammars" to "do not require a new tree-sitter grammar", clarifying they reuse the existing YAML grammar
- pass_k8s.c: correct header comment from "per top-level resource document" to "first document only — multi-document YAML is not yet supported"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2672dbb to 340862c
cbm_pipeline_pass_k8s() was called in the full pipeline (pipeline.c) but absent from pipeline_incremental.c. This meant k8s Resource nodes and kustomize Module nodes were never created or updated during incremental re-indexing, only after a fresh full index.
Add the pass after the semantic pass, following the same timing-log pattern as the other incremental passes. Pass changed_files (not the full file list) so only modified/added YAML files are re-processed.
Add two regression tests:
- incremental_k8s_manifest_indexed: full index + add manifest via incremental, verifies Resource node appears in the DB
- incremental_kustomize_module_indexed: same for kustomization.yaml Module node
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Check return value of cbm_pipeline_pass_k8s() in incremental path; log warning on failure (fail-open, matches full pipeline behaviour)
- Add comment documenting pass ordering and the two known structural limitations: File→Resource DEFINES edges and cross-file kustomize IMPORTS edges are not emitted in incremental (gbuf only contains nodes for changed files; File nodes from pass_structure are absent)
- Fix clang-format violation in incremental_kustomize_module_indexed test (single-line if condition)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Testing against a real gitops repository
After opening this PR I tested the indexer against a live gitops monorepo (~10k files, git submodules, 213
Bug found: k8s pass missing from incremental pipeline
Fix: added the k8s pass to
Known structural limitation (documented in code comment): In the incremental path,
Clarification: InfraFile nodes are shell scripts, not YAML manifests
During testing I noticed the gitops repo had 35 k8s YAML manifests get
The parallel execution path (used for repos above MIN_FILES_FOR_PARALLEL) was missing the cbm_pipeline_pass_k8s() call. The pass existed only in the sequential fallback path, so any repo large enough to trigger parallel indexing produced zero k8s Resource and kustomize Module nodes.
Discovered via live testing against a 954-file gitops repo: the parallel path was taken, k8s pass was silently skipped, and search_graph returned 0 Resource nodes despite 213 kustomize overlays and hundreds of manifests.
Add the pass after cbm_parallel_resolve() in the parallel branch, matching the same pattern (fail-open, cancel check, timing log) as the sequential path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
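The "fail-open, cancel check, timing log" pattern mentioned in this commit can be illustrated with a small wrapper. Everything below is hypothetical: the pass signature, the cancellation flag, the use of clock() for timing, and the two demo passes are assumptions for the sketch and do not mirror the pipeline's real types or logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

typedef int (*SketchPassFn)(void *ctx); /* 0 on success, non-zero on failure */

/* Runs one pass. Returns false only if the run was cancelled beforehand;
 * a failing pass is logged and tolerated (fail-open). */
static bool sketch_run_pass(const char *name, SketchPassFn fn, void *ctx,
                            const bool *cancelled) {
    assert(name != NULL && fn != NULL);
    if (cancelled != NULL && *cancelled)
        return false; /* cancel check: skip the pass entirely */

    clock_t start = clock();
    int rc = fn(ctx);
    double ms = 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;

    if (rc != 0)
        fprintf(stderr, "warn: pass %s failed (rc=%d), continuing\n", name, rc);
    fprintf(stderr, "pass %s: %.1f ms\n", name, ms); /* timing log */
    return true;
}

/* Demo passes: each counts its invocations through ctx. */
static int sketch_pass_ok(void *ctx) {
    (*(int *)ctx)++;
    return 0;
}

static int sketch_pass_fail(void *ctx) {
    (*(int *)ctx)++;
    return -1;
}
```

Routing every code path (sequential, parallel, incremental) through one wrapper like this is one way to keep the three call sites behaving the same, though whether the project does so is not stated here.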
Closes #86
Summary
Adds first-class indexing support for Kubernetes manifests and Kustomize files, following the infra-pass pattern used by Dockerfile and docker-compose.
- cbm_is_k8s_manifest, cbm_is_kustomize_file: name-based for Kustomize, "apiVersion:" content heuristic for generic K8s manifests
- extract_k8s.c: parses manifests and kustomization files using the vendored YAML grammar; no new tree-sitter grammar needed
- pass_k8s.c: emits Resource nodes for K8s manifests (Kind/metadata.name) and Module nodes for Kustomize files with IMPORTS edges to each entry in resources:
- CBM_LANG_K8S and CBM_LANG_KUSTOMIZE added; language table updated in lang_specs.c
- Tests in tests/test_pipeline.c covering detection, extraction correctness, and null/empty-name guard
- Resource added to Node Labels list
What changed
- src/pipeline/pass_infrascan.c: cbm_is_k8s_manifest(), cbm_is_kustomize_file() helpers
- internal/cbm/extract_k8s.c
- src/pipeline/pass_k8s.c
- internal/cbm/cbm.h: CBM_LANG_K8S, CBM_LANG_KUSTOMIZE enum values
- internal/cbm/lang_specs.c
- tests/test_pipeline.c
- README.md: Resource added to Node Labels list
- CONTRIBUTING.md
How to test
All three pass locally (2050 tests, clang-format clean, security 23/23).
Node labels
K8s manifests → Resource nodes (e.g. Deployment/my-app). Query: search_graph(label="Resource")
Kustomize files → Module nodes with IMPORTS edges to each referenced resource file.