feat(agents): Add verifier agent package#5
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: yuwenma The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new devops_bench.agents.verifier package that provides a structured verification engine (via Pydantic specs and concrete verifiers) for validating Kubernetes cluster state during/after scenarios and disruptions.
Changes:
- Added
VerifierAgent.wait_for_condition()to execute single or compound verification specs with aggregated results. - Implemented two initial verifiers: pod health (
pod_healthy) and deployment replica convergence (scaling_complete). - Added package documentation and unit tests for the verifiers.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| devops_bench/agents/verifier/init.py | Adds verifier package marker. |
| devops_bench/agents/verifier/base.py | Defines BaseVerifier and the recursive VerificationResult model. |
| devops_bench/agents/verifier/spec.py | Defines Pydantic VerificationSpec and the supported spec union. |
| devops_bench/agents/verifier/verifier.py | Adds VerifierAgent orchestrator for single/compound specs and aggregated results. |
| devops_bench/agents/verifier/pod_healthy.py | Implements pod readiness verification via kubectl wait with polling fallback. |
| devops_bench/agents/verifier/scaling_complete.py | Implements deployment scaling convergence verification via polling kubectl get. |
| devops_bench/agents/verifier/test_verifier.py | Adds unit tests covering core verifier behaviors and compound specs. |
| devops_bench/agents/verifier/README.md | Documents the spec/result APIs, examples, and extension workflow. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| SingleVerificationSpec = Union[PodHealthyVerifier, ScalingCompleteVerifier] | ||
|
|
||
| # Top-level VerificationSpec which can parse a dict, a list, or a single checker spec. | ||
| class VerificationSpec(RootModel[Union[Dict[str, SingleVerificationSpec], List[SingleVerificationSpec], SingleVerificationSpec]]): |
There was a problem hiding this comment.
Wondering if we need to support all these variants. Wouldn't it be simpler to just support Dict? We can model the other types as a dict with some constraint on the caller and can be much easier to maintain.
There was a problem hiding this comment.
One big advantage of K8s API (spec and status) is its strong structured format even in YAML/JSON config. Golang enforces that by defining the structures and built it own k8s/yaml to parse the type. But Python does not have corrsponding type check if simply using the built in json parser lib. So I apply a structured type to enforce the same level of K8s YAML validaiton.
There was a problem hiding this comment.
My question was from an API perspective, do we want to support something so complex. Can we not define Dict[str, SingleVerificationSpec] as the interface? We enforce an id for any spec that we validate and it keeps interface and validation much simpler. Unless there's a strong reason for us to handle all these different data structures as input, I propose we keep the API interface very simple.
There was a problem hiding this comment.
Good point. If we want to follow the K8s API convension https://github.com/kubernetes/community/blob/main/contributors/devel/sig-architecture/api-conventions.md#spec-and-status, we should define the Dict[str, SingleVerificationSpec] because the interface has to be deterministic. If we don't plan to eventually converge to k8s CRD, I'm fine to loose the constraints here.
There was a problem hiding this comment.
I don't think we aim to converge to k8s CRD. That assumes that k8s event streams are the only source of verification which is not likely to be the case. We might have to verify a variety of conditions for state convergence.
I consider this more akin to unstructured.Unstructured which is essentially map[[string]interface{}. At this point we do not have clarity on the full API range of the verifier so I suggest we keep it simple and lean on a bit more specification on client side.
There was a problem hiding this comment.
Thanks for the clarification. Simplified the verifier Spec.
e0d6f5d to
7e4eafd
Compare
|
@yuwenma are the Copilot review comments addressed? If so, would you resolve them to make it clear? |
They are all resolved and marked now. |
|
@yuwenma Can you please squash the changes to keep commits atomic. |
7e4eafd to
dccb67f
Compare
| # Catch timeout or error and fallback to polling check | ||
| # Check status details | ||
| details = self._get_pods_details(timeout=min(KUBECTL_DEFAULT_TIMEOUT, remaining)) | ||
| last_details = details |
There was a problem hiding this comment.
When remaining is very small (e.g., 0.01 seconds) near the end of the timeout budget, kubectl get pods will fail unconditionally.
Consider doing something similar as ScalingCompleteVerifier (which uses timeout=max(1.0, remaining)) for the fallback check, and adding an explicit final check after the loop to ensure a reliable final evaluation.
| dep_data = json.loads(result.stdout) | ||
| ready_replicas = dep_data.get("status", {}).get("readyReplicas", 0) | ||
| success = ready_replicas >= self.min_replicas | ||
| reason = ( |
There was a problem hiding this comment.
Verify status.observedGeneration >= metadata.generation to be fully robust against race conditions where kubectl get deployment is queried before the controller has observed the latest scaling action.
|
Just noticed #8 depends on this PR. Is there a specific reason why the verifier agent doesn't use kubewatch directly, instead of breaking up into 2 commits/PRs? |
There was a problem hiding this comment.
Can you also lock the dependencies and add uv.lock here?
If you ensure you are enabling the pre-commit hooks as described in the AGENTS.md, it will block you from pushing automatically.
There was a problem hiding this comment.
Woohoo, +1 to AGENTS.md.
Done
| SingleVerificationSpec = Union[PodHealthyVerifier, ScalingCompleteVerifier] | ||
|
|
||
| # Top-level VerificationSpec which can parse a dict, a list, or a single checker spec. | ||
| class VerificationSpec(RootModel[Union[Dict[str, SingleVerificationSpec], List[SingleVerificationSpec], SingleVerificationSpec]]): |
There was a problem hiding this comment.
I don't think we aim to converge to k8s CRD. That assumes that k8s event streams are the only source of verification which is not likely to be the case. We might have to verify a variety of conditions for state convergence.
I consider this more akin to unstructured.Unstructured which is essentially map[[string]interface{}. At this point we do not have clarity on the full API range of the verifier so I suggest we keep it simple and lean on a bit more specification on client side.
These two PRs separate the pure migration (previously approved in gke-labs) and the new features to avoid dup-review. |
There was a problem hiding this comment.
These two PRs separate the pure migration (previously approved in gke-labs) and the new features to avoid dup-review.
@yuwenma let's just keep PR #8 (with 2 commits) and squash at merge time (I've applied the squash-merge label on PR #8).
If they're merged as 2 distinct commits, the main branch history will contain an intermediate commit with legacy kubectl wait polling loops that are immediately deleted in the next commit. The main branch should only reflect the final, working architectural state.
Would you rebase PR #8 to resolve the current merge conflicts with main, ensure the review feedback from this PR is carried over, and we move the review there?
dccb67f to
bb8e962
Compare
bb8e962 to
cc2c404
Compare
|
As suggested above, let's move the review / discussion to #8. |
This PR adds the verification engine. It provides a structured framework used by scenario managers and task evaluators (will be added separately) to validate cluster state (e.g., workload readiness, replica convergence) during and after chaos disruptions or operational tasks (will be added separately).