Commit 66a2913

Merge pull request #24 from QianCyrus/docs/doctor-arch-readme-version
docs: add doctor architecture doc, update README, bump version
2 parents aebbb7b + 6f03aed commit 66a2913

4 files changed

Lines changed: 97 additions & 2 deletions

README.md

Lines changed: 17 additions & 0 deletions
@@ -4,6 +4,14 @@ A lightweight personal AI assistant framework built on Python.

 ## Changelog

+### v0.4 — Observability Surfaces & Doctor Mode
+
+- **Event Backbone (Track 0)**: standardized diagnostic events with `session_key` / `run_id` / `turn_id`.
+- **Health Surface (Track 1)**: machine-readable health/readiness snapshots (`health`, `status --deep --json`).
+- **Logging Surface (Track 2)**: structured diagnostic logs with session/run filters and follow mode.
+- **Doctor Surface**: chat-driven diagnosis commands (`/doctor`, `/doctor status`, `/doctor cancel`, `/doctor resume`) with provider precheck guidance and evidence-first diagnostics through `doctor_check`.
+- Architecture doc: [`docs/architecture/doctor-observability-architecture.md`](docs/architecture/doctor-observability-architecture.md).
+
 ### v0.3 — ReAct Core, Prompt Security & Async Interrupt Handling

 - **ReAct tracing core**: orchestrator now records thought/action/observation style steps with stronger iteration-cap handling.
@@ -379,6 +387,15 @@ User Message → [Channel] → [MessageBus] → [AgentLoop] → [ConversationOrc
 | **SubagentManager** | `agent/subagent.py` | Background sub-task execution |
 | **ProviderRegistry** | `providers/registry.py` | Provider metadata (17 specs) |

+### Observability And Doctor
+
+See [`docs/architecture/doctor-observability-architecture.md`](docs/architecture/doctor-observability-architecture.md) for the full design of:
+
+- Event Backbone (Track 0)
+- Health Surface (Track 1)
+- Logging Surface (Track 2)
+- Doctor Surface (Codex-driven diagnosis)
+
 ## Project Structure

 ```

docs/architecture/doctor-observability-architecture.md

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+# SnapAgent Observability + Doctor Architecture
+
+## Goal
+
+Build a lightweight but operable observability architecture so production issues can be diagnosed directly from chat channels (`/doctor`) without coupling diagnosis logic to core runtime internals.
+
+## Design Principles
+
+1. Thin connector, strong evidence.
+2. Session-scoped control, no global pause.
+3. Read-only diagnostics first, no risky auto-fix by default.
+4. Separate deterministic data collection from model reasoning.
+
+## Four Surfaces
+
+### 1) Event Backbone (Track 0)
+
+- Unified event model: `DiagnosticEvent`.
+- Correlation fields: `session_key`, `run_id`, `turn_id`.
+- Message bus emits structured inbound/outbound/runtime events.
+
+Role:
+- Foundation for cross-surface correlation and postmortem timeline reconstruction.
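
A minimal sketch of how the three correlation fields could support timeline reconstruction. Only `DiagnosticEvent`, `session_key`, `run_id`, and `turn_id` appear in the doc; the `name`, `ts`, and `payload` fields and the `timeline` helper are assumptions made for illustration and may not match SnapAgent's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class DiagnosticEvent:
    """Hypothetical event shape; only the correlation fields come from the doc."""

    name: str                     # assumed: event type, e.g. "inbound.message"
    session_key: str              # correlation field named in the doc
    run_id: str | None = None     # correlation field named in the doc
    turn_id: str | None = None    # correlation field named in the doc
    ts: float = 0.0               # assumed: epoch seconds
    payload: dict[str, Any] = field(default_factory=dict)  # assumed


def timeline(events: list[DiagnosticEvent], session_key: str, run_id: str) -> list[str]:
    """Return the event names for one run of one session, ordered by timestamp."""
    selected = [e for e in events if e.session_key == session_key and e.run_id == run_id]
    return [e.name for e in sorted(selected, key=lambda e: e.ts)]


events = [
    DiagnosticEvent("outbound.message", "telegram:42", "run-1", "t1", ts=3.0),
    DiagnosticEvent("inbound.message", "telegram:42", "run-1", "t1", ts=1.0),
    DiagnosticEvent("inbound.message", "cli:local", "run-9", "t1", ts=2.0),
]
run_timeline = timeline(events, "telegram:42", "run-1")
```

Because every event carries the same keys, the same filter works whether the events come from the message bus, the health snapshot, or the log sink.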
+
+### 2) Health Surface (Track 1)
+
+- CLI: `snapagent health --json`, `snapagent status --deep --json`.
+- Aggregates provider/config/workspace/channel/runtime queue evidence.
+
+Role:
+- Fast readiness/liveness check and root-cause narrowing.
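
One way the aggregation described above could work is to collect a status per component and report the worst one as the overall result. The sketch below assumes this; the status values, the `SEVERITY` ordering, `build_snapshot`, and the snapshot layout are all hypothetical, since the doc does not specify the JSON schema emitted by `snapagent health --json`.

```python
import json

# Assumed severity ordering; the doc does not define status values.
SEVERITY = {"ok": 0, "degraded": 1, "fail": 2}


def build_snapshot(checks: dict) -> dict:
    """Combine per-component evidence into one snapshot with a worst-case status."""
    overall = max(
        (check["status"] for check in checks.values()),
        key=SEVERITY.__getitem__,
        default="ok",  # no checks registered: nothing is known to be wrong
    )
    return {"status": overall, "checks": checks}


snapshot = build_snapshot(
    {
        "provider": {"status": "ok"},
        "config": {"status": "ok"},
        "workspace": {"status": "ok"},
        "channel": {"status": "degraded", "detail": "example: reconnecting"},
        "runtime_queue": {"status": "ok", "depth": 0},
    }
)
print(json.dumps(snapshot, indent=2))
```

Keeping the per-component entries alongside the overall status is what makes root-cause narrowing possible: a caller sees both that something is degraded and which component reported it.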
+
+### 3) Logging Surface (Track 2)
+
+- Structured JSONL sink (`diagnostic.jsonl`) with rotation/follow.
+- CLI: `snapagent logs --json --session ... --run ... --follow`.
+
+Role:
+- Session/run-scoped evidence retrieval for operational debugging.
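
The session/run filtering described above can be sketched as a pass over JSONL lines. This assumes each record in `diagnostic.jsonl` is a JSON object carrying the `session_key` and `run_id` fields from the Event Backbone; the record layout and the `filter_records` helper are hypothetical, and rotation and follow mode are not shown.

```python
import json
from typing import Iterable, Iterator, Optional


def filter_records(
    lines: Iterable[str],
    session: Optional[str] = None,
    run: Optional[str] = None,
) -> Iterator[dict]:
    """Yield parsed JSONL records that match the given session and/or run."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that were only partially written
        if not isinstance(record, dict):
            continue  # only JSON objects are treated as log records
        if session is not None and record.get("session_key") != session:
            continue
        if run is not None and record.get("run_id") != run:
            continue
        yield record


sample = [
    '{"session_key": "cli:local", "run_id": "r1", "name": "turn.start"}',
    '{"session_key": "cli:local", "run_id": "r2", "name": "turn.start"}',
    '{"session_key": "telegram:42", "run_id": "r3", "name": "turn.start"}',
    '{"session_key": "cli:local", "run_id": "r1", "na',  # truncated write
]
matches = list(filter_records(sample, session="cli:local", run="r1"))
```

Passing an open file object as `lines` would work the same way, since the function only iterates over strings.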
+
+### 4) Doctor Surface (Codex-Driven)
+
+- Chat commands:
+  - `/doctor`
+  - `/doctor status`
+  - `/doctor cancel`
+  - `/doctor resume`
+- `/doctor` first pauses current session tasks (reuse stop/cancel path).
+- Provider precheck before diagnostics:
+  - if provider not ready, return setup guidance and block doctor mode.
+  - guidance includes OAuth/API-key paths and validation command.
+- Diagnostic execution is model-driven via read-only tool:
+  - `doctor_check(check=health|status|logs|events, session_key?, run_id?, lines?)`
+
+Role:
+- Turn observability data into interactive diagnosis in user channels (Feishu/Telegram/CLI).
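
A sketch of how a read-only `doctor_check` tool could dispatch on its `check` argument. The parameter names and the four allowed values come from the signature in the doc; the collector stubs, the returned dictionaries, and the error shape are assumptions, and the real tool presumably calls into the health, status, logging, and event surfaces instead.

```python
from typing import Callable, Optional

# The four check kinds listed in the doc's tool signature.
ALLOWED_CHECKS = ("health", "status", "logs", "events")


def _stub_collector(kind: str) -> Callable[..., dict]:
    """Stand-in for a real read-only collector (assumption for this sketch)."""

    def collect(session_key=None, run_id=None, lines=None) -> dict:
        return {"check": kind, "session_key": session_key, "run_id": run_id, "lines": lines}

    return collect


COLLECTORS = {kind: _stub_collector(kind) for kind in ALLOWED_CHECKS}


def doctor_check(
    check: str,
    session_key: Optional[str] = None,
    run_id: Optional[str] = None,
    lines: Optional[int] = None,
) -> dict:
    """Route a model-issued check to a collector; reject anything off the allow-list."""
    if check not in COLLECTORS:
        return {"error": f"unknown check {check!r}", "allowed": list(ALLOWED_CHECKS)}
    return COLLECTORS[check](session_key=session_key, run_id=run_id, lines=lines)


result = doctor_check("logs", session_key="cli:local", lines=50)
rejected = doctor_check("restart")
```

Restricting the tool to a fixed allow-list is one way to keep the model's side of the diagnosis read-only: a request for anything else returns an error rather than reaching the runtime.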
+
+## End-to-End Flow
+
+1. User sends `/doctor` in a chat session.
+2. Agent cancels active tasks for this session only.
+3. Agent runs provider precheck.
+4. If precheck fails: return setup guidance and stop.
+5. If precheck passes: enter doctor mode and start diagnostic turn.
+6. Codex decides which `doctor_check` calls to run and synthesizes conclusions.
+7. User can continue with follow-up questions, or `/doctor cancel`/`/doctor resume`.
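
Steps 1 through 5 of the flow above can be sketched as a single handler. Everything here is hypothetical: the `Session` class, the `handle_doctor` function, the `provider_ready` flag, and the reply strings are placeholders, not SnapAgent's actual types or messages. The model-driven steps 6 and 7 are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-chat session state."""

    key: str
    active_tasks: list = field(default_factory=list)
    doctor_mode: bool = False


def handle_doctor(session: Session, provider_ready: bool) -> str:
    """Handle a `/doctor` command for one session (steps 1-5 of the flow)."""
    # Step 2: cancel active tasks for this session only.
    session.active_tasks.clear()
    # Steps 3-4: provider precheck; on failure, return guidance and stop.
    if not provider_ready:
        return "Provider not ready: complete OAuth or API-key setup, then retry /doctor."
    # Step 5: enter doctor mode so the diagnostic turn can start.
    session.doctor_mode = True
    return "Doctor mode started."


blocked = Session("telegram:42", active_tasks=["summarize"])
blocked_reply = handle_doctor(blocked, provider_ready=False)

ready = Session("cli:local", active_tasks=["fetch"])
ready_reply = handle_doctor(ready, provider_ready=True)
```

Note that in this sketch the session's tasks are cancelled before the precheck runs, matching the order of steps 2 and 3, so a failed precheck still leaves the session idle.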
+
+## Safety Boundaries
+
+- Session-local interruption only; other sessions/cron are unaffected by default.
+- Diagnostics are read-only (`health/status/logs/events`).
+- No automatic code mutation/restart in M0.
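
The session-local interruption boundary can be illustrated with `asyncio` tasks grouped by session key. This is a sketch under the assumption that background work runs as `asyncio` tasks; the doc does not say how tasks are tracked, and `TaskRegistry`, `spawn`, and `cancel_session` are hypothetical names.

```python
import asyncio
from collections import defaultdict


class TaskRegistry:
    """Hypothetical registry that tracks running tasks per session key."""

    def __init__(self) -> None:
        self._tasks = defaultdict(set)

    def spawn(self, session_key: str, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks[session_key].add(task)
        # Forget the task once it finishes or is cancelled.
        task.add_done_callback(self._tasks[session_key].discard)
        return task

    async def cancel_session(self, session_key: str) -> int:
        """Cancel only the tasks that belong to one session; return how many."""
        tasks = list(self._tasks.get(session_key, ()))
        for task in tasks:
            task.cancel()
        # Wait for the cancellations to land without raising CancelledError here.
        await asyncio.gather(*tasks, return_exceptions=True)
        return len(tasks)


async def demo():
    registry = TaskRegistry()
    a = registry.spawn("telegram:42", asyncio.sleep(60))
    b = registry.spawn("cli:local", asyncio.sleep(60))
    cancelled = await registry.cancel_session("telegram:42")
    result = (cancelled, a.cancelled(), b.cancelled())
    b.cancel()  # clean up the untouched task before the loop closes
    await asyncio.gather(b, return_exceptions=True)
    return result


outcome = asyncio.run(demo())
```

In the demo, the task for the other session keeps running after `cancel_session` returns, which is the property the first bullet above describes.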
+
+## Why This Shape
+
+- Deterministic observability primitives stay in SnapAgent.
+- Dynamic diagnosis stays in Codex reasoning layer.
+- Keeps code volume low while preserving operational control and auditability.

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "snapagent-ai"
-version = "0.1.4.post2"
+version = "0.1.4.post3"
 description = "A lightweight personal AI assistant framework"
 requires-python = ">=3.11"
 license = {text = "MIT"}

snapagent/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 """SnapAgent - A lightweight AI agent framework."""

-__version__ = "0.1.4.post2"
+__version__ = "0.1.4.post3"
 __logo__ = "🐈"
 __app_name__ = "SnapAgent"
