- Replaced scrapli with Netmiko for all pytest SSH connections; fixes `AttributeError` on RouterOS and VyOS (scrapli's `GenericDriver` does not have `send_configs`)
- `send_configs()` → `send_config_set()`: Netmiko's `NoConfig` mixin makes this work uniformly across all platforms, including RouterOS (no config mode)
- Added `COMMIT_PLATFORMS` set (`junos`, `vyos`): JunOS and VyOS require explicit `conn.commit()` after config changes to apply candidate config
- Platform mapping updated to Netmiko device_type strings: `cisco_ios`, `arista_eos`, `juniper_junos`, `aruba_aoscx`, `mikrotik_routeros`, `vyos`
- `conn.send_command(cmd).result` → `conn.send_command(cmd)`: Netmiko returns string directly
- `conn.close()` → `conn.disconnect()`
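
A minimal sketch of how the migrated calls might fit together. The `apply_config` helper and the keys of `DEVICE_TYPES` (other than `junos` and `vyos`) are hypothetical names for illustration; only `send_config_set()`, `commit()`, and the device_type strings come from the entries above.

```python
# Platforms whose candidate config needs an explicit commit (from the entry above).
COMMIT_PLATFORMS = {"junos", "vyos"}

# Netmiko device_type strings from the entry above.
# The dictionary keys are assumed platform labels, not confirmed by the project.
DEVICE_TYPES = {
    "ios": "cisco_ios",
    "eos": "arista_eos",
    "junos": "juniper_junos",
    "aoscx": "aruba_aoscx",
    "routeros": "mikrotik_routeros",
    "vyos": "vyos",
}


def apply_config(conn, platform: str, commands: list[str]) -> str:
    """Push config lines, committing only on platforms that need it.

    `conn` is expected to behave like a Netmiko connection object.
    """
    output = conn.send_config_set(commands)
    if platform in COMMIT_PLATFORMS:
        # JunOS and VyOS stage changes in a candidate config until commit.
        output += conn.commit()
    return output
```

With a real device, `conn` would come from `netmiko.ConnectHandler(device_type=DEVICE_TYPES[platform], host=..., username=..., password=...)` and be released with `conn.disconnect()`.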
- ID naming convention: fixed contradiction; IDs are direction-encoded (`<setupDevice>_<verifyDevice>`), not alphabetical. Updated in spec-schema.md, SKILL.md QC-6, WORKFLOW.md
- device/peer semantics: `device` = setup target, `peer` = verify target (now explicit in schema comments)
- peer.rid: clarified as the peer device's own router_id from INTENT.json
- QC-7 strengthened: teardown verify_cli/verify_field/verify_expected MUST equal setup snapshot_cli/snapshot_field/snapshot_expected — ensures rollback verification checks the same parameter that was changed
- CLAUDE.md bidirectional rule: added same-vendor exception to match QC-8 (same `cli_style` pairs → one direction only)
- Scope guard: SKILL.md Step 1 warns for broad multi-protocol/multi-device requests
- conftest.py MUST use `spec_dir.glob("*.yaml")`; never hardcode spec filenames (shareability fix)
- Parametrization MUST use loaded spec list; never hardcode indices
- Emergency rollback playbook: all tasks MUST have `ignore_errors: true` (best-effort recovery)
- `deregister_rollback` catches `ValueError` on mismatch (robustness fix)
- spec-renderers.md template: fixed `match_by` access from `test_entry.get("match_by")` to `test_entry["assertion"].get("match_by")`
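
The spec discovery and `match_by` rules above could look roughly like this. The helper names (`discover_specs`, `spec_ids`, `get_match_by`) are hypothetical; only the `glob("*.yaml")` call and the `test_entry["assertion"]` access path come from the entries above.

```python
from pathlib import Path


def discover_specs(spec_dir: Path) -> list[Path]:
    """Return every YAML spec in spec_dir; sorted so test IDs are stable."""
    return sorted(spec_dir.glob("*.yaml"))


def spec_ids(specs: list[Path]) -> list[str]:
    """Derive parametrize IDs from the loaded list, never from fixed indices."""
    return [path.stem for path in specs]


def get_match_by(test_entry: dict, default=None):
    """Read match_by from the nested assertion block (the corrected access path)."""
    return test_entry["assertion"].get("match_by", default)


# In conftest.py this would typically feed pytest, for example:
#   SPECS = discover_specs(spec_dir)
#   @pytest.mark.parametrize("spec_path", SPECS, ids=spec_ids(SPECS))
```

Because the list is built from whatever the directory contains, the same conftest.py works unchanged when the suite is shared with a different set of specs.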
- WORKFLOW.md: all scrapli references updated to Netmiko; QC-6 and QC-7 updated; C1J RID corrected (10.10.10.10 → 22.22.22.11)
- Removed the Tier 1 (read-only) / Tier 2 (active) distinction; all tests now follow configure-wait-check-teardown
- `tier` field removed from YAML spec metadata and per-entry schema
- `rollback_risk` field removed (no behavioral effect; `description` field is sufficient)
- `setup`, `wait`, `teardown` blocks are now mandatory for every test entry
- `try/finally` (pytest) and `block/always` (Ansible) are the only rendering patterns
- Emergency rollback playbook always generated alongside every Ansible playbook
- Session rollback registry always present in `conftest.py`
- Step 1 tier detection logic (keyword-based, three-branch heuristic) removed entirely
- Step 6 simplified: single plan table with ⚠️ warning, "Proceed?" (no tier choice)
- QC-8 (was QC-9) directionality rule now applies to all tests universally
- Old QC-8 (`rollback_risk`) removed; QC-9 renumbered to QC-8
- Requests for read-only state checks (e.g., "show me OSPF neighbor states") → agent asks user to be more specific about what condition to test
- Test scenarios updated: Q1–Q10, legacy tier-specific scenarios removed
- Added `rfc4271_bgp.md`: BGP-4 core (FSM, message types, path attributes, decision process, NOTIFICATION errors, timers)
- Added `rfc4760_mpbgp.md`: MP-BGP extensions (MP_REACH_NLRI, MP_UNREACH_NLRI, AFI/SAFI, capability advertisement)
- Added `rfc7868_eigrp.md`: EIGRP (DUAL algorithm, packet types, RTP, classic/wide metrics, stub routing, SIA handling)
- `_RFC_PROTOCOL_MAP` updated: `rfc4271` → bgp, `rfc4760` → bgp, `rfc7868` → eigrp
- Added vendor BGP guides: Cisco IOS, Arista EOS, Juniper JunOS, Aruba AOS-CX, MikroTik RouterOS, VyOS
- Added vendor EIGRP guide: Cisco IOS
- Fixed `extract_metadata` to handle single-part vendor names (e.g., `vyos`); protocol is now always the last filename segment
- KB document count: 8 → 18 (5 RFCs + 6 OSPF guides + 6 BGP guides + 1 EIGRP guide)
- KB chunk count: ~199 → ~531 after re-ingestion
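
A sketch of the "protocol is the last filename segment" rule, assuming vendor guides are named `<vendor>_<protocol>.md` (that naming pattern, the returned dictionary shape, and `vendor=None` for RFCs are assumptions; the map entries are the ones listed above, and the real `extract_metadata` may differ).

```python
from pathlib import Path

# RFC entries added in this release; older RFCs are omitted from this sketch.
_RFC_PROTOCOL_MAP = {"rfc4271": "bgp", "rfc4760": "bgp", "rfc7868": "eigrp"}


def extract_metadata(filename: str) -> dict:
    """Derive vendor and protocol metadata from a knowledge base filename."""
    stem = Path(filename).stem
    parts = stem.split("_")
    if parts[0] in _RFC_PROTOCOL_MAP:
        # RFC documents take their protocol from the map, not the filename,
        # so rfc4760_mpbgp.md is still tagged as bgp.
        return {"source": parts[0], "vendor": None,
                "protocol": _RFC_PROTOCOL_MAP[parts[0]]}
    if len(parts) < 2:
        raise ValueError(f"expected <vendor>_<protocol>, got {filename!r}")
    # Everything before the last segment is the vendor, however many parts it has.
    *vendor_parts, protocol = parts
    return {"source": stem, "vendor": "_".join(vendor_parts), "protocol": protocol}
```

Splitting from the right is what lets `vyos_bgp.md` (one vendor segment) and `cisco_ios_bgp.md` (two vendor segments) go through the same code path.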
- New general-purpose skill replacing the protocol-specific `/ospf-adj` skill
- Handles any protocol, any feature, any test type from a natural language request
- Protocol, feature, device scope, and tier all derived dynamically from free-form input
- Tier determination logic (Step 1):
  - Explicit active keywords ("active", "configure", "inject", "force", "simulate", "mismatch test", "negative test") → Tier 2 only, no clarification
  - Explicit read-only keywords ("verify", "check", "confirm", "read-only") → Tier 1 only, no clarification
  - Otherwise → assess tier-2 eligibility; if eligible, offer both tiers at Step 6 (Case B)
- Step 6 mandatory pause: test plan presented to user before any files are generated
- 9 quality controls (QC-1 through QC-9): RFC grounding, bidirectionality, vendor CLI, assertion specificity, tier 2 completeness, and tier 2 directionality
- Legacy `/ospf-adj` skill archived to `metadata/legacy/ospf-adj/SKILL.md`
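
The three-branch tier heuristic can be illustrated in code, although the skill presumably applies it as agent instructions rather than as a function. The name `determine_tier`, the return labels, whole-word matching, and giving active keywords precedence over read-only ones are all assumptions of this sketch.

```python
import re

ACTIVE_KEYWORDS = ("active", "configure", "inject", "force", "simulate",
                   "mismatch test", "negative test")
READ_ONLY_KEYWORDS = ("verify", "check", "confirm", "read-only")


def _mentions(request: str, keywords: tuple[str, ...]) -> bool:
    """True if any keyword appears as a whole word or phrase in the request."""
    text = request.lower()
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)


def determine_tier(request: str) -> str:
    """Return 'tier2', 'tier1', or 'assess' for a free-form test request."""
    if _mentions(request, ACTIVE_KEYWORDS):
        return "tier2"   # explicit active wording: no clarification needed
    if _mentions(request, READ_ONLY_KEYWORDS):
        return "tier1"   # explicit read-only wording: no clarification needed
    return "assess"      # ambiguous: check tier-2 eligibility, offer both at Step 6
```

Whole-word matching keeps a request such as "enforce" from triggering the "force" keyword.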
- New test tier: configure-wait-check-teardown lifecycle
- `setup` block: target device, config command, pre-flight snapshot (CLI + field + expected value)
- `wait` block: post-config convergence delay (type: convergence / fixed / poll, seconds)
- `teardown` block: revert command, verify rollback succeeded (CLI + field + expected value)
- `teardown.verify_expected` must equal `setup.snapshot_expected`; both sourced from INTENT.json
- `rollback_risk` field per test: `low` (timers, cost, priority), `medium` (auth, area type), `high` (redistribution, route policy, router-id)
- Pytest renderer: `try/finally` wrapper; teardown always runs regardless of test outcome
- Session-level rollback registry in `conftest.py`: emergency rollback on interrupted test runs
- Ansible renderer: `block/always` pattern; teardown in `always` block
- Emergency rollback playbook (`playbook_<skill>_rollback.yml`) generated alongside any tier 2 playbook
- QC-9 tier 2 directionality: cross-vendor pairs (different `cli_style`) generate tests in BOTH directions; same-vendor pairs generate ONE direction only
- Added "Configuration Revert Patterns" section to all 6 vendor guides
- Cisco IOS / IOS-XE: `no <command>` pattern with exception table: auth requires two-step removal, `area stub no-summary` distinct from `area stub`, router-id change requires `clear ip ospf process` on some IOS-XE versions
- Arista EOS: `no <command>` OR `default <command>` (factory reset to platform default)
- Juniper JunOS: `delete <config-path>` + `commit`; built-in `rollback <n>` config versioning; `commit confirmed <minutes>` safety net; ABR default-metric required for stub/NSSA default route
- Aruba AOS-CX (verified from official docs): `no area <id> nssa no-summary` is partial revert only (use `no area <id> nssa` for full removal); `ip ospf shutdown` ≠ remove interface from area; `no ip ospf message-digest-key` requires KEY-ID argument
- MikroTik RouterOS: `set <id> <param>=` (empty string = revert to default); `remove <id>` for object-model entries
- VyOS: `delete <config-path>` + `commit`; `commit-confirm <minutes>` safety net; `passive [disable]` is a per-interface override when `passive-interface default` is active
- KB chunk count: ~170 → ~199 after re-ingestion (pre-BGP/EIGRP vendor guides)
- Scoped intent queries: `query_intent("<device>")` per named device instead of full topology dump; 2-device scoped run: ~24 KB → ~2.4 KB intent input (−90%)
- Schema / renderer split: `spec-format.md` (14.5 KB) split into:
  - `spec-schema.md` (4 KB): loaded at Step 7, before YAML generation
  - `spec-renderers.md` (5.3 KB): loaded at Step 9, before rendering
- Lazy loading: each file gated to the step that needs it, so renderer patterns are absent during spec generation (highest hallucination risk step)
- Total spec + intent overhead: −70% for scoped runs; estimated cost −60% on Sonnet 4.6
- `OPTIMIZATIONS.md`: new section, Context Window and Token Cost Optimizations (before/after numbers, recommended next steps)
- `testing/test_scenarios.md`: rebuilt; S1–S10 legacy `/ospf-adj` scenarios archived; Q1–Q8 `/qa` scenarios covering Tier 1, Tier 2, dual-tier offering, QC-9 cross-vendor and same-vendor directionality
- `metadata/workflow/WORKFLOW.md`: updated to reflect `/qa` skill, Tier 2 lifecycle, current output pipeline, and file loading sequence
- `metadata/legacy/ospf-adj/SKILL.md`: legacy skill preserved as reference
Initial release.
- 3 tools: `search_knowledge_base`, `query_intent`, `list_devices`
- Pydantic-validated inputs with typed enums for vendor, topic, protocol, cli_style
- Lazy-init ChromaDB with thread-safe singleton
- Compound metadata filtering (`$and` for multi-field queries)
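
One way the compound filter and the lazy, thread-safe initialisation might be structured. `build_where` and `LazySingleton` are hypothetical names, not the server's actual code; the filter dictionaries follow Chroma's `where` syntax, where `$and` needs at least two clauses and a single field is passed bare.

```python
import threading


def build_where(**filters):
    """Build a Chroma-style metadata filter, skipping fields left as None."""
    clauses = [{key: value} for key, value in filters.items() if value is not None]
    if not clauses:
        return None            # no filtering at all
    if len(clauses) == 1:
        return clauses[0]      # a single field needs no $and wrapper
    return {"$and": clauses}


class LazySingleton:
    """Create an expensive object once, on first use, safely across threads."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._instance = None

    def get(self):
        if self._instance is None:            # fast path, no lock taken
            with self._lock:
                if self._instance is None:    # re-check while holding the lock
                    self._instance = self._factory()
        return self._instance
```

A query would then pass the result straight through, for example `collection.query(query_texts=[question], n_results=5, where=build_where(vendor="cisco_ios", protocol="ospf"))`.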
- 8 knowledge base documents: 2 RFCs (2328, 3101) + 6 vendor guides (Cisco IOS, Arista EOS, Juniper JunOS, Aruba AOS-CX, MikroTik RouterOS, VyOS)
- `all-MiniLM-L6-v2` embeddings (384 dimensions), ChromaDB HNSW index
- Metadata: `vendor`, `topic`, `source`, `protocol` (auto-derived from filenames)
- Contextual chunk headers prepended for embedding quality
- `RecursiveCharacterTextSplitter` with Markdown header separators (800 chars, 100 overlap)
- 16 devices across 4 autonomous systems
- 11 OSPF routers (4 leaf, 2 ABR, 2 core, 1 DC, 2 edge), 19 adjacency pairs
- 2 EIGRP-only routers, 3 BGP-only routers
- 6 platforms: Cisco IOS-XE, Arista EOS, Juniper JunOS, Aruba AOS-CX, MikroTik RouterOS, VyOS (FRR)
- Dual-area ABRs (D1C, D2B) with `area_types` dict; leaf routers with `area_type` string
- 8-step workflow: preflight, spec format load, pair extraction, KB research, YAML spec generation, pytest render, Ansible render, summary
- 8 test criteria (ADJ-01 through ADJ-08): Interface Up, Neighbor Presence, State FULL, Area ID Match, Timer Match, Stub Agreement, MTU Match, Router ID Unique
- Scoped output: 1 device = OR (all pairs for that device), 2+ devices = AND (pairs between listed devices only)
- Canonical filenames with sorted device suffix
- cli_style to KB vendor mapping table for accurate RAG queries
- Per-device test ID pattern for ADJ-08; pair-based for all others
- 6 quality controls (QC-1 through QC-6) enforced during generation
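
The scoping rule (one device = OR, two or more = AND) and the sorted filename suffix can be sketched as follows. The function names and the exact filename pattern are assumptions; only the selection semantics and the "sorted device suffix" idea come from the list above.

```python
def select_pairs(pairs: list[tuple[str, str]], devices: list[str]) -> list[tuple[str, str]]:
    """Filter adjacency pairs by the devices named in the request."""
    wanted = set(devices)
    if len(wanted) == 1:
        # OR: every pair that touches the single named device.
        return [pair for pair in pairs if wanted & set(pair)]
    # AND: only pairs whose two ends are both in the named set.
    return [pair for pair in pairs if set(pair) <= wanted]


def canonical_filename(prefix: str, devices: list[str], ext: str = "yaml") -> str:
    """Build a filename whose device suffix is sorted, so order of mention is irrelevant."""
    return f"{prefix}_{'_'.join(sorted(devices))}.{ext}"
```

Sorting the suffix means a request for "D2B and D1C" and one for "D1C and D2B" resolve to the same file instead of producing two copies of the same spec.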
- YAML test spec schema with device, peer, context, query, assertion fields
- 8 assertion types: interface_up, neighbor_presence, neighbor_state, timer_match, area_match, stub_agreement, mtu_match, router_id_unique
- Pytest renderer guidance: scrapli platform mapping, session-scoped fixtures, JUnit XML output
- Ansible renderer guidance: `cli_command` module, `ansible_network_os` mapping, RFC traceability via `vars.rfc`
- `output/spec/`: canonical YAML test specifications
- `output/pytest/`: scrapli-based pytest suites with `conftest.py`
- `output/ansible/`: Ansible playbooks with `inventory.yml`
- `CLAUDE.md`: agent system prompt (tools, quality standards, data model)
- `metadata/workflow/WORKFLOW.md`: end-to-end test generation flow
- `metadata/scalability/OPTIMIZATIONS.md`: RAG precision roadmap (9 optimizations, 2 implemented)
- `testing/test_scenarios.md`: 10 manual test scenarios with expected pair/test counts
- `Makefile` with `setup`, `install`, `ingest`, `clean` targets
- Python 3.11+ venv (`aiqa/`)
- CPU-only PyTorch for local embeddings