Commit a790b72

Update to v1.2.0
1 parent 9c0e990 commit a790b72

29 files changed

Lines changed: 795 additions & 1450 deletions

.claude.example/settings.local.json

Lines changed: 0 additions & 10 deletions
@@ -7,16 +7,6 @@
 "Bash(ls:*)"
 ],
 "deny": [
-"Read(.env)",
-"Read(**/.env)",
-"Read(**/.env.*)",
-"Bash(cat .env*)",
-"Bash(less .env*)",
-"Bash(head .env*)",
-"Bash(tail .env*)",
-"Bash(more .env*)",
-"Bash(env)",
-"Bash(printenv *)",
 "Bash(ssh *)",
 "Bash(sshpass *)",
 "Bash(rm -rf *)",

.claude.example/skills/qa/SKILL.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ Then stop.
 
 Parse the XML structure:
 - Each `<testcase>` is one test scenario. The `name` attribute is the scenario name.
-- `<properties>` contain key-value pairs: `device`, `rfc_ref`, `description`, `rfc_citation`.
+- `<properties>` contain key-value pairs: `device`, `rfc_ref`, `description`.
 - A `<testcase>` with a `<failure>` child is a failed test. The `message` attribute and text content describe the failure.
 - A `<testcase>` without `<failure>` is a pass.
 
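The parsing rules in this hunk can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document and the `parse_testcases` helper are assumptions made for illustration only; the repository's skill is a set of instructions to the agent, not this code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape the skill describes (not a file from the repo).
SAMPLE = """<testsuite name="network_qa" tests="2" failures="1">
  <testcase name="ospf_adj_e1c_c1j">
    <properties>
      <property name="device" value="E1C"/>
      <property name="rfc_ref" value="RFC 2328 section 10.3"/>
      <property name="description" value="Verify FULL adjacency with C1J"/>
    </properties>
    <failure message="adjacency not FULL">Neighbor state is INIT, not FULL.</failure>
  </testcase>
  <testcase name="route_map_e1c_to_c1j">
    <properties>
      <property name="device" value="C1J"/>
    </properties>
  </testcase>
</testsuite>"""


def parse_testcases(xml_text):
    """Return one dict per <testcase>, applying the pass/fail rule above."""
    root = ET.fromstring(xml_text)
    cases = []
    for tc in root.iter("testcase"):
        # <properties> holds key-value pairs such as device, rfc_ref, description.
        props = {p.get("name"): p.get("value") for p in tc.iter("property")}
        failure = tc.find("failure")  # a <failure> child marks a failed test
        cases.append({
            "name": tc.get("name"),
            "properties": props,
            "failed": failure is not None,
            "message": failure.get("message") if failure is not None else None,
            "detail": (failure.text or "").strip() if failure is not None else None,
        })
    return cases


cases = parse_testcases(SAMPLE)
failed = [c["name"] for c in cases if c["failed"]]
```

Because `root.iter("testcase")` walks the whole tree, the same helper should work whether the root element is `<testsuites>` or a single `<testsuite>`.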
@@ -73,6 +73,6 @@ Produce a concise report for the investigated failure:
 5. **RFC basis**: the protocol rule that explains the failure
 6. **Recovery status**: is the network still broken or has it been fixed?
 
-If there are remaining uninvestigated failures, re-present the list (without the one just investigated) and ask the user to pick the next one. Repeat until all failures are investigated or the user stops.
+**IMPORTANT — always loop back.** After the report, if there are remaining uninvestigated failures, you MUST immediately re-present the remaining failure list and ask the user to pick the next one — do not wait for the user to ask. The user acknowledging a fix ("ok", "got it", "I'll do that") is NOT a signal to stop. Only stop looping if the user explicitly declines (e.g. "that's all", "no more", "skip the rest") or all failures have been investigated.
 
 After investigating a failure, if its root cause likely explains other failures still on the list, say so — the user may choose to skip those.
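The loop-back rule added in this hunk amounts to a small stopping condition. The sketch below is only an illustration under stated assumptions: in the skill the agent interprets the user's reply in natural language, whereas here the decline phrases are a fixed, hypothetical set taken from the examples in the instruction, and `should_stop` and `remaining_after` are made-up helper names.

```python
# Hypothetical decline phrases, copied from the examples in the instruction text.
DECLINE_PHRASES = {"that's all", "no more", "skip the rest"}


def remaining_after(failures, investigated):
    """Failures still to present, in their original order."""
    return [f for f in failures if f not in investigated]


def should_stop(reply, remaining):
    """Stop only when nothing is left or the user explicitly declines."""
    if not remaining:
        return True
    # Acknowledgements such as "ok" or "got it" do not end the loop.
    return reply.strip().lower() in DECLINE_PHRASES


failures = ["route_to_a2a", "ospf_adj_e1c_c1j"]
left = remaining_after(failures, investigated={"route_to_a2a"})
```

With one failure still uninvestigated, `should_stop("ok", left)` is false, so the remaining list would be presented again.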

.gitignore

Lines changed: 1 addition & 2 deletions
@@ -7,5 +7,4 @@ yana/
 .pytest_cache/
 .ruff_cache/
 ansible_test_cases/
-ansible/collections/ansible_collections/
-results/
+ansible/collections/ansible_collections/

CLAUDE.md

Lines changed: 4 additions & 0 deletions
@@ -66,6 +66,10 @@ State clearly:
 - What the root cause is (or what further information is needed)
 - What the recommended fix is (configuration direction only — never push changes)
 
+### Multi-failure investigations
+
+When investigating multiple failures (e.g. via `/qa`), always loop back after each finding. Present remaining uninvestigated failures and ask the user to pick the next one. The user acknowledging a fix is not a signal to stop — only stop when the user explicitly declines or all failures are covered.
+
 ## Constraints
 
 - **Read-only.** Never suggest commands that change device configuration. Diagnosis and direction only.

README.md

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ Run your tests with any framework. When something fails, YANA investigates - it
 
 **Step 1 - Install and ingest:**
 ```bash
+sudo apt install git make python3.12-venv
 cd ~ && git clone https://github.com/pdudotdev/YANA
 cd YANA && make setup
 ```

core/settings.py

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 
 USERNAME = os.getenv("ROUTER_USERNAME", "")
 PASSWORD = os.getenv("ROUTER_PASSWORD", "")
+PASSWORD_JUNOS = os.getenv("ROUTER_PASSWORD_JUNOS", "") or PASSWORD
 
 SSH_TIMEOUT_OPS = 30
 SSH_TIMEOUT_OPS_LONG = 90
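The `or PASSWORD` fallback in the added line means that both an unset and an empty `ROUTER_PASSWORD_JUNOS` resolve to the generic password. The sketch below reproduces that expression against a plain dict instead of the process environment; `resolve_passwords` is a hypothetical helper, not a function in `core/settings.py`.

```python
def resolve_passwords(env):
    """Mirror the settings logic using a dict in place of os.environ."""
    password = env.get("ROUTER_PASSWORD", "")
    # An empty string is falsy, so `or` falls back to the generic password.
    password_junos = env.get("ROUTER_PASSWORD_JUNOS", "") or password
    return password, password_junos


unset = resolve_passwords({"ROUTER_PASSWORD": "generic"})
empty = resolve_passwords({"ROUTER_PASSWORD": "generic", "ROUTER_PASSWORD_JUNOS": ""})
both = resolve_passwords({"ROUTER_PASSWORD": "generic", "ROUTER_PASSWORD_JUNOS": "junos"})
```

One consequence of this design is that a deliberately empty Junos password cannot be expressed: it is indistinguishable from "not set".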

ingest.py

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@
 from dotenv import load_dotenv
 load_dotenv(Path(__file__).parent / ".env")
 
-from langchain.schema import Document
-from langchain_community.embeddings import HuggingFaceEmbeddings
-from langchain_community.vectorstores import Chroma
+from langchain_core.documents import Document
+from langchain_huggingface import HuggingFaceEmbeddings
+from langchain_chroma import Chroma
 from langchain_text_splitters import RecursiveCharacterTextSplitter
 
 from tools.rag import _CHROMA_DIR, _COLLECTION, _EMBEDDING_MODEL

metadata/workflow/WORKFLOW.md

Lines changed: 8 additions & 113 deletions
@@ -48,85 +48,13 @@ See [OPTIMIZATIONS.md](../scalability/OPTIMIZATIONS.md) for the full RAG optimiz
 
 ---
 
-## Interactive Investigation
-
-The user asks a question in Claude Code. The agent follows the diagnostic workflow defined in `CLAUDE.md`:
-
-### Step 0 — Preflight
-
-```
-get_status()
-```
-
-Confirms which backends are active: inventory (device count), intent (router count), and ChromaDB availability. Displayed as a table before any investigation begins.
-
-### Step 1 — Load the Protocol Skill
-
-The agent reads the relevant skill file before starting. Skill files contain decision trees and query sequences — the agent follows them, it does not improvise.
-
-| When to use | Skill file |
-|-------------|-----------|
-| Adjacency, neighbor state, LSDB, area type | `skills/ospf/SKILL.md` |
-| Path selection, PBR, route-maps, prefix-lists, AD conflicts | `skills/routing/SKILL.md` |
-| Reachability ("can't reach X from Y") | Start with `traceroute` to find the breaking hop, then load the appropriate skill |
-
-### Step 2 — Search the Knowledge Base
-
-```
-search_knowledge_base(query="OSPF neighbor stuck in INIT", topic="rfc", protocol="ospf")
-```
-
-Returns RFC text and vendor documentation relevant to the issue. The `protocol` filter eliminates cross-protocol noise. The embedding model maps the question to nearby chunks even when the exact words differ.
-
-### Step 3 — Query Live Devices
-
-The agent queries the devices involved in the issue:
-
-```
-query_intent(device="D1C") # what SHOULD the network look like?
-get_ospf("D1C", "neighbors") # what DOES it look like?
-get_ospf("D1C", "interfaces") # check timers, area, passive, auth
-traceroute("E1C", "192.168.42.1") # where does the path break?
-```
-
-The skill file dictates which queries to run and in what order. For OSPF adjacency issues, the checklist is: timers → area type → network type → auth → passive → MTU → interface state. Stop at the first mismatch.
-
-### Step 4 — Synthesize
-
-The agent combines knowledge base context with live data. When they conflict, live data wins. The report states:
-
-- What the data shows
-- Root cause with RFC citation
-- Fix direction (configuration guidance only — YANA never pushes config)
-
-### Example
-
-```
-User: "Why can't E1C reach A2A's loopback?"
-
-Agent:
-1. get_status() → inventory, intent, ChromaDB all active
-2. Reads skills/ospf/SKILL.md
-3. get_routing("E1C", "ip_route") → 192.168.42.1 missing from VRF1
-4. get_ospf("E1C", "database") → No Type 3 LSA for 192.168.42.1
-5. query_intent() → A2A should be in Area 1 (stub), connected via D1C/D2B
-6. get_ospf("D1C", "neighbors") → D1C has no adjacency with A2A
-7. get_ospf("A2A", "interfaces") → A2A's Area 1 is "normal", not stub
-8. search_knowledge_base("E-bit mismatch stub area", topic="rfc", protocol="ospf")
-
-Report: A2A is missing `area 1 stub`. RFC 2328 §10.5: E-bit mismatch
-causes Hellos to be silently discarded. Fix: add stub config to A2A.
-```
-
----
-
 ## QA Investigation
 
 Run your tests with any framework. When something fails, YANA investigates.
 
 ### Test Results
 
-YANA reads JUnit XML results from `results/`. JUnit XML is the de facto standard — produced by pytest (`--junitxml`), pyATS (`--xunit`), Robot Framework (`--xunit`), Ansible (junit callback), and most other test runners.
+YANA reads JUnit XML results from `results/`. JUnit XML is the de facto standard — produced by pytest (`--junitxml`), pyATS (`--xunit`), Robot Framework (`--xunit`), and most other test runners.
 
 Place your test results in `results/` as `.xml` files. YANA doesn't care how the tests were run — it only needs the results.
 

@@ -142,52 +70,19 @@ When tests fail, the user runs `/qa` in Claude Code. The skill (`.claude/skills/
 4. Present numbered failure list to the user
 5. User picks a failure to investigate
 6. Agent reads test context from <properties> (device, rfc_ref, description)
-7. Agent runs the same diagnostic workflow as interactive mode:
+7. Agent investigates:
+- get_status() → confirm backends are active
+- Load protocol skill (skills/ospf/SKILL.md or skills/routing/SKILL.md)
 - query_intent() → expected state
-- get_ospf/get_routing/get_interfaces → live state
-- Follows skill decision trees to trace the root cause
+- get_ospf/get_routing/get_interfaces/traceroute → live state
+- Follow skill decision trees to trace the root cause
 - search_knowledge_base → RFC context
 8. Reports findings (scenario, observed, current state, root cause, RFC basis)
 9. Re-presents remaining failures — user picks next, or stops
 ```
 
 If multiple failures share a root cause, the agent says so after investigating the first one — the user can skip the rest.
 
----
-
-## Architecture Summary
+### Interactive Mode
 
-```
-┌─────────────────────────────────────────┐
-│ Claude Code (UI) │
-│ │
-│ Interactive: User asks a question │
-│ QA: User runs /qa after tests │
-└──────────────┬──────────────────────────┘
-│ MCP protocol
-┌──────────────▼──────────────────────────┐
-│ YANA MCP Server │
-│ server/MCPServer.py │
-│ │
-│ 8 tools registered via FastMCP │
-└──┬───────┬───────┬───────┬──────────────┘
-│ │ │ │
-┌────────▼──┐ ┌──▼────┐ ┌▼─────┐ ┌▼──────────┐
-│ SSH tools │ │ RAG │ │Intent│ │ Status │
-│ get_ospf │ │search │ │query │ │ get_status│
-│ get_routing│ │_kb │ │_intent││ list_dev │
-│ get_intf │ │ │ │ │ │ │
-│ traceroute │ │ │ │ │ │ │
-└─────┬──────┘ └───┬───┘ └──┬───┘ └───────────┘
-│ │ │
-┌──────────▼──┐ ┌─────▼───┐ ┌──▼──────────┐
-│ Scrapli SSH │ │ChromaDB │ │ JSON files │
-│ 6 vendors │ │ + MiniLM│ │ data/*.json │
-│ env creds │ │ │ │ │
-└──────────────┘ └─────────┘ └─────────────┘
-
-Test runners (separate process, not MCP):
-pytest, pyATS, Ansible, Robot Framework, etc.
-→ JUnit XML results in results/
-→ Consumed by /qa skill in Claude
-```
+YANA also handles ad-hoc questions outside the QA workflow. The user asks a question directly (e.g. "Why can't E1C reach A2A's loopback?") and the agent follows the same diagnostic process: preflight check via `get_status()`, load the relevant protocol skill, query live devices, search the knowledge base, and synthesize a report with root cause and RFC citation. The full interactive workflow is defined in `CLAUDE.md`.

requirements.txt

Lines changed: 4 additions & 3 deletions
@@ -2,9 +2,10 @@
 fastmcp>=3.0,<4.0
 
 # RAG pipeline
-langchain>=0.3,<0.4
-langchain-community>=0.3,<0.4
-chromadb>=0.6,<1.0
+langchain-core>=1.0,<2.0
+langchain-huggingface>=1.0,<2.0
+langchain-chroma>=1.0,<2.0
+langchain-text-splitters>=0.3,<1.0
 sentence-transformers>=3.0,<4.0
 
 # Environment

results/network_qa_example.xml

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<testsuites name="network_qa"
+tests="3"
+failures="2"
+timestamp="2026-03-30T06:15:42Z">
+<testsuite name="network_qa"
+tests="3"
+failures="2">
+<testcase name="route_to_a2a" classname="route_to_a2a">
+<properties>
+<property name="device" value="E1C"/>
+<property name="rfc_ref" value="RFC 2328 &#167;16"/>
+<property name="description" value="Verify E1C has route to A2A loopback 192.168.42.1 in VRF1"/>
+</properties>
+<failure message="Verify E1C has route to A2A loopback 192.168.42.1 in VRF1">Assertion route_exists(&quot;192.168.42.1&quot;) returned False. NETCONF response contains VRF1 routing entries but 192.168.42.1/32 is not present in the RIB.</failure>
+</testcase>
+<testcase name="ospf_adj_e1c_c1j" classname="ospf_adj_e1c_c1j">
+<properties>
+<property name="device" value="E1C"/>
+<property name="rfc_ref" value="RFC 2328 &#167;10.3"/>
+<property name="description" value="Verify E1C has FULL OSPF adjacency with C1J (router-id 22.22.22.11)"/>
+</properties>
+<failure message="Verify E1C has FULL OSPF adjacency with C1J (router-id 22.22.22.11)">Assertion ospf_neighbor_full(&quot;22.22.22.11&quot;) returned False. Neighbor 22.22.22.11 found but adjacency state is INIT, not FULL.</failure>
+</testcase>
+<testcase name="route_map_e1c_to_c1j" classname="route_map_e1c_to_c1j">
+<properties>
+<property name="device" value="C1J"/>
+<property name="rfc_ref" value="RFC 2328 &#167;16.4"/>
+<property name="description" value="Verify route-map on E1C redistributes static route 10.99.99.0/24 to C1J via OSPF"/>
+</properties>
+</testcase>
+</testsuite>
+</testsuites>
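The example file declares `tests="3"` and `failures="2"` on its suite, which matches two `<testcase>` elements with a `<failure>` child and one without. A quick consistency check along those lines might look like the sketch below; the trimmed sample and the `check_counters` helper are assumptions for illustration and are not part of the commit.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for results/network_qa_example.xml.
SAMPLE = """<testsuites name="network_qa" tests="3" failures="2">
<testsuite name="network_qa" tests="3" failures="2">
<testcase name="route_to_a2a"><failure message="route missing"/></testcase>
<testcase name="ospf_adj_e1c_c1j"><failure message="adjacency is INIT"/></testcase>
<testcase name="route_map_e1c_to_c1j"/>
</testsuite>
</testsuites>"""


def check_counters(xml_text):
    """Compare each suite's declared counters with its actual test cases."""
    root = ET.fromstring(xml_text)
    problems = []
    for suite in root.iter("testsuite"):
        cases = suite.findall("testcase")
        failed = [tc for tc in cases if tc.find("failure") is not None]
        if int(suite.get("tests", "0")) != len(cases):
            problems.append(f"{suite.get('name')}: tests attribute does not match")
        if int(suite.get("failures", "0")) != len(failed):
            problems.append(f"{suite.get('name')}: failures attribute does not match")
    return problems


problems = check_counters(SAMPLE)
```

An empty `problems` list indicates the declared counters agree with the test cases in the file.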
