feat(synthetic): add rds_upstream suite with scenario 001 (part 1/6 for #1437) #2523
Devesh36 wants to merge 1 commit into
Greptile Summary
This PR introduces the
Confidence Score: 5/5
Safe to merge: test-only additions with no production behavior change and no regressions to existing suites.
All changes are confined to a new synthetic test suite directory. The fixture data is static JSON/YAML, the Python code reuses the existing, tested rds_postgres loader and correlation helpers, and the __init__.py files, pytestmark placement, and envelope namespace issues raised in the previous review round have all been corrected. No production code paths are touched.
No files require special attention. The aws_cloudwatch_metrics_envelope.json namespace field is worth monitoring if future agent code branches on that value per metric, but it is a known design constraint of the split-file approach.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[pytest collects rds_upstream suite] --> B[test_suite.py]
A --> C[correlation/test_001_request_burst.py]
B --> B1[test_load_all_upstream_scenarios]
B --> B2[test_upstream_scenario_metadata_and_evidence]
B --> B3[test_request_burst_answer_key_expects_cross_system_evidence]
C --> D[Load 3 time-series from fixture JSON]
D --> E[Construct TopologyNodes for rds, web, worker]
E --> F1[score_time_window_correlation RDS vs Web CPU - high correlation]
E --> F2[score_time_window_correlation RDS vs Worker CPU - near zero]
F1 --> G[score_candidate_correlation - web_score]
F2 --> H[score_candidate_correlation - worker_score]
G --> I[rank_upstream_candidates]
H --> I
I --> J{Assert web-asg ranks first with higher confidence}
style J fill:#d4edda,stroke:#28a745
Reviews (2): Last reviewed commit: "feat(synthetic): add rds_upstream suite ..."
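The ranking flow in the flowchart above can be illustrated with a minimal, self-contained sketch. It does not call the repository's score_time_window_correlation, score_candidate_correlation, or rank_upstream_candidates helpers, whose signatures are not shown on this page. The function names, the use of plain Pearson correlation as the score, the worker ASG name, and the sample CPU series below are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_candidates(symptom, candidates):
    """Return (name, score) pairs sorted by correlation with the symptom, best first."""
    scored = [(name, pearson(symptom, series)) for name, series in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical CPU samples over one shared time window (not the PR's fixture data).
rds_cpu = [20, 22, 35, 60, 85, 90]
candidates = {
    "orders-web-asg": [15, 18, 30, 55, 80, 88],     # climbs together with RDS
    "orders-worker-asg": [10, 11, 10, 11, 10, 11],  # hypothetical name; stays flat
}

ranked = rank_candidates(rds_cpu, candidates)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this sketch the web tier ranks first because its series rises with the RDS series, while the near-flat worker series scores close to zero. The real test additionally factors in topology, which this example leaves out.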
…#2523)

Greptile P1: add __init__.py under tests/synthetic/rds_upstream/ and tests/synthetic/rds_upstream/correlation/ so pytest resolves absolute imports when the suite is run in isolation.
Greptile P2: move pytestmark below all imports in correlation/test_001_request_burst.py (fixes E402 / isort ordering).
Greptile P2: set CloudWatch envelope namespace to AWS/RDS,AWS/EC2 so RDS and EC2 Auto Scaling metrics are not all labeled under AWS/RDS alone.
1a51074 to 3a83a99
…Cloud#1437

Scaffolds a new synthetic RCA suite (tests/synthetic/rds_upstream/) that exercises cross-system "RDS symptom -> EC2/app upstream cause" attribution without live infrastructure, per EPIC Tracer-Cloud#1433.

Scenario 001 (request-burst-ec2-app-tier):
- RDS shows elevated CPU and connection count.
- Real upstream cause is a web-tier EC2 fleet request burst; worker tier remains idle (decoy).
- Fixtures include CloudWatch envelopes (RDS + EC2), RDS events, EC2 instances-by-tag, and ELB target health.

Tests:
- test_suite.py: loader integrity, metadata, evidence, and answer-key expectations.
- correlation/test_001_request_burst.py: asserts web tier ranks above worker tier in cross-system correlation.

Follow-up PRs will add scenarios 002-006 and the Makefile target.

Refs: Tracer-Cloud#1437, Tracer-Cloud#1433
Co-authored-by: Cursor <cursoragent@cursor.com>
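As a rough illustration of the "loader integrity" check described in the commit message above, the sketch below verifies that a scenario directory contains one file per evidence source. It is not the suite's actual loader: the one-JSON-file-per-source layout, the REQUIRED_EVIDENCE tuple, and the missing_evidence helper are assumptions, and the real fixtures split the CloudWatch metrics across several files plus an envelope.

```python
import json
import tempfile
from pathlib import Path

# Evidence source names as listed in the PR description; the
# "<name>.json" file layout is a simplifying assumption.
REQUIRED_EVIDENCE = (
    "aws_cloudwatch_metrics",
    "aws_rds_events",
    "ec2_instances_by_tag",
    "elb_target_health",
)


def missing_evidence(scenario_dir: Path) -> list[str]:
    """Return the evidence names that have no JSON file in scenario_dir."""
    return [
        name
        for name in REQUIRED_EVIDENCE
        if not (scenario_dir / f"{name}.json").is_file()
    ]


# Build a throwaway scenario directory that lacks the last evidence file.
with tempfile.TemporaryDirectory() as tmp:
    scenario_dir = Path(tmp) / "001-request-burst-ec2-app-tier"
    scenario_dir.mkdir()
    for name in REQUIRED_EVIDENCE[:-1]:
        (scenario_dir / f"{name}.json").write_text(json.dumps({}))
    missing = missing_evidence(scenario_dir)

print(missing)
```

A test built on this idea would assert that the returned list is empty for every scenario directory in the suite.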
3a83a99 to 483ae69
@greptile-apps review again
Scope is pretty big, so I think this is a good start. Maybe your API limit got cooked too, like mine. It would still be nice to test this later with real investigations / "run_suite" though; we usually end up changing quite a bit after trying things with real LLM providers. Adding correlation pathway/report assertions later could also help for the full #1437 acceptance. One thing that felt a bit hacky to me was the "AWS/RDS,AWS/EC2" namespace value, but it's probably okay for now.
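If agent code ever needs to branch on the combined namespace value mentioned above, one option would be to split it into individual namespaces first. This is only a sketch: the namespace key and the comma-separated value come from this thread, while parse_namespaces and the sample envelope dict are hypothetical and not part of the PR.

```python
def parse_namespaces(envelope: dict) -> list[str]:
    """Split a comma-separated namespace field into a clean list of namespaces."""
    raw = envelope.get("namespace", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


# Hypothetical envelope carrying the combined value used in this PR.
envelope = {"namespace": "AWS/RDS,AWS/EC2"}
namespaces = parse_namespaces(envelope)
print(namespaces)
```

A per-metric namespace field would avoid the parsing step entirely, at the cost of changing the split-file fixture layout.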
@Devesh36 address the issues from ceren.
Summary
This PR is part 1 of 6 for #1437 (epic #1433: DB symptom → upstream cause without trace IDs).
It introduces a new synthetic test suite at tests/synthetic/rds_upstream/ for cross-system incident fixtures: RDS/CloudWatch symptoms correlated with EC2 app tier + ELB evidence, without relying on trace or request IDs in logs.
What's included
- tests/synthetic/rds_upstream/: separate from DB-only tests/synthetic/rds_postgres/
- 001-request-burst-ec2-app-tier: RDS CPU + DatabaseConnections spike driven by sustained load on the web EC2 tier (worker tier stays flat; red-herring tier)
- scenario_loader.py: thin wrapper around rds_postgres.scenario_loader with suite-local SUITE_DIR
- test_suite.py: fixture load, metadata/evidence validation, answer-key checks
- correlation/test_001_request_burst.py: deterministic ranking, web ASG ranks above worker ASG using time-window + topology scoring (reuses rds_postgres.correlation helpers)
Scenario 001 (maps to #1437 test case 1)
- …orders-prod): climbing connections + CPU.
- orders-web-asg behind ALB target group orders-web-tg.
- aws_cloudwatch_metrics, aws_rds_events, ec2_instances_by_tag, elb_target_health.
- …scenario.yml).
What's intentionally not in this PR
- run_suite CLI wrapper / Makefile target: PR keeps scope to pytest-only validation (no live LLM required for CI/review).
- docs/ changes: test-suite-only; no product behavior change.
Design notes
- …rds_postgres/015-mysql-ec2-load-attribution (split CloudWatch metric JSON files + envelope) for consistency with existing mock AWS/Grafana backends.
- failure_mode remains application_load_spike (allowed by tests/synthetic/schemas.py); scenario identity is via scenario_id and directory name.
- …make test-cov) ignores tests/synthetic/; reviewers should run the commands below locally (same as other synthetic suites).
Motivation
tests/synthetic/rds_postgres/ is largely DB-centric. #1437 requires scored, deterministic scenarios that prove the agent can attribute RDS symptoms to upstream EC2/ALB/app layers using correlation and topology, not only in-database root causes.
Related issues
Test plan
Required (no API keys)