Merged
17 changes: 17 additions & 0 deletions README.md
@@ -4,6 +4,14 @@ A lightweight personal AI assistant framework built on Python.

## Changelog

### v0.4 — Observability Surfaces & Doctor Mode

- **Event Backbone (Track 0)**: standardized diagnostic events with `session_key` / `run_id` / `turn_id`.
- **Health Surface (Track 1)**: machine-readable health/readiness snapshots (`health`, `status --deep --json`).
- **Logging Surface (Track 2)**: structured diagnostic logs with session/run filters and follow mode.
- **Doctor Surface**: chat-driven diagnosis commands (`/doctor`, `/doctor status`, `/doctor cancel`, `/doctor resume`) with provider precheck guidance and evidence-first diagnostics through `doctor_check`.
- Architecture doc: [`docs/architecture/doctor-observability-architecture.md`](docs/architecture/doctor-observability-architecture.md).

### v0.3 — ReAct Core, Prompt Security & Async Interrupt Handling

- **ReAct tracing core**: orchestrator now records thought/action/observation style steps with stronger iteration-cap handling.
@@ -379,6 +387,15 @@ User Message → [Channel] → [MessageBus] → [AgentLoop] → [ConversationOrc
| **SubagentManager** | `agent/subagent.py` | Background sub-task execution |
| **ProviderRegistry** | `providers/registry.py` | Provider metadata (17 specs) |

### Observability And Doctor

See [`docs/architecture/doctor-observability-architecture.md`](docs/architecture/doctor-observability-architecture.md) for the full design of:

- Event Backbone (Track 0)
- Health Surface (Track 1)
- Logging Surface (Track 2)
- Doctor Surface (Codex-driven diagnosis)

## Project Structure

```
78 changes: 78 additions & 0 deletions docs/architecture/doctor-observability-architecture.md
@@ -0,0 +1,78 @@
# SnapAgent Observability + Doctor Architecture

## Goal

Build a lightweight but operable observability architecture so production issues can be diagnosed directly from chat channels (`/doctor`) without coupling diagnosis logic to core runtime internals.

## Design Principles

1. Thin connector, strong evidence.
2. Session-scoped control, no global pause.
3. Read-only diagnostics first, no risky auto-fix by default.
4. Separate deterministic data collection from model reasoning.

## Four Surfaces

### 1) Event Backbone (Track 0)

- Unified event model: `DiagnosticEvent`.
- Correlation fields: `session_key`, `run_id`, `turn_id`.
- Message bus emits structured inbound/outbound/runtime events.

Role:
- Foundation for cross-surface correlation and postmortem timeline reconstruction.
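
As a rough illustration of the correlation idea, the sketch below models an event carrying the three correlation fields and rebuilds a per-session timeline from a mixed list. Only `DiagnosticEvent`, `session_key`, `run_id`, and `turn_id` come from this document; the `name`, `ts`, and `data` fields and the `timeline` helper are assumptions made for the example, not the actual SnapAgent model.

```python
from __future__ import annotations

import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class DiagnosticEvent:
    """Hypothetical event shape; only the correlation fields are from the doc."""

    name: str                       # assumed, e.g. "inbound.received"
    session_key: str                # correlation field (from the doc)
    run_id: str | None = None       # correlation field (from the doc)
    turn_id: str | None = None      # correlation field (from the doc)
    ts: float = field(default_factory=time.time)        # assumed timestamp
    data: dict[str, Any] = field(default_factory=dict)  # assumed payload

    def to_json(self) -> str:
        # One JSON object per event, suitable for a JSONL sink.
        return json.dumps(asdict(self), sort_keys=True)


def timeline(
    events: list[DiagnosticEvent],
    session_key: str,
    run_id: str | None = None,
) -> list[DiagnosticEvent]:
    """Return events for one session (optionally one run), oldest first."""
    picked = [
        e
        for e in events
        if e.session_key == session_key and (run_id is None or e.run_id == run_id)
    ]
    return sorted(picked, key=lambda e: e.ts)
```

Because every event carries the same three keys, a postmortem timeline reduces to a filter plus a sort, whichever surface produced the event.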

### 2) Health Surface (Track 1)

- CLI: `snapagent health --json`, `snapagent status --deep --json`.
- Aggregates provider/config/workspace/channel/runtime queue evidence.

Role:
- Fast readiness/liveness check and root-cause narrowing.
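
One way to picture the aggregation step is a set of independent probes folded into a single machine-readable snapshot. This is a minimal sketch under assumptions: the `Probe` type, the `health_snapshot` function, and the `ready` / `checks` keys are hypothetical and do not describe the real output of `snapagent health --json`.

```python
import json
from typing import Callable

# Each probe returns (ok, detail). Probe names would mirror the evidence areas
# listed above (provider, config, workspace, ...); the probes are placeholders.
Probe = Callable[[], tuple[bool, str]]


def health_snapshot(probes: dict[str, Probe]) -> dict:
    """Run every probe and fold the results into one JSON-serializable dict."""
    checks = {}
    for name, probe in probes.items():
        try:
            ok, detail = probe()
        except Exception as exc:
            # A crashing probe is recorded as failed evidence, not a fatal error.
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        checks[name] = {"ok": ok, "detail": detail}
    return {"ready": all(c["ok"] for c in checks.values()), "checks": checks}


if __name__ == "__main__":
    snapshot = health_snapshot({"config": lambda: (True, "loaded")})
    print(json.dumps(snapshot, indent=2))
```

Keeping each probe independent means one failing area narrows the root cause without hiding the state of the others.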

### 3) Logging Surface (Track 2)

- Structured JSONL sink (`diagnostic.jsonl`) with rotation/follow.
- CLI: `snapagent logs --json --session ... --run ... --follow`.

Role:
- Session/run-scoped evidence retrieval for operational debugging.
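
Session/run-scoped retrieval over a JSONL file can be sketched as a streaming filter. The record keys `session_key` and `run_id` are assumed to match the event correlation fields; this document does not specify the schema of `diagnostic.jsonl`, and `filter_records` is a hypothetical helper rather than the CLI's implementation.

```python
import json
from typing import Iterable, Iterator, Optional


def filter_records(
    lines: Iterable[str],
    session: Optional[str] = None,
    run: Optional[str] = None,
) -> Iterator[dict]:
    """Yield parsed JSONL records that match the optional session/run filters."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip torn lines, e.g. a record caught mid-write
        if not isinstance(record, dict):
            continue
        if session is not None and record.get("session_key") != session:
            continue
        if run is not None and record.get("run_id") != run:
            continue
        yield record
```

Since the function consumes any iterable of lines, the same filter would work over a finished file or over a generator that keeps yielding lines as they are appended.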

### 4) Doctor Surface (Codex-Driven)

- Chat commands:
  - `/doctor`
  - `/doctor status`
  - `/doctor cancel`
  - `/doctor resume`
- `/doctor` first pauses the current session's tasks (reusing the stop/cancel path).
- Provider precheck before diagnostics:
  - If the provider is not ready, return setup guidance and block doctor mode.
  - Guidance includes OAuth/API-key paths and a validation command.
- Diagnostic execution is model-driven via a read-only tool:
  - `doctor_check(check=health|status|logs|events, session_key?, run_id?, lines?)`

Role:
- Turn observability data into interactive diagnosis in user channels (Feishu/Telegram/CLI).
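
The read-only tool boundary could look roughly like the dispatcher below. The `doctor_check` parameters follow the signature quoted in this section; everything else (`make_doctor_check`, the `collectors` mapping, the default of 50 lines, and the `max_lines` cap) is an assumption for illustration, not the project's code.

```python
from typing import Any, Callable, Optional

# The four check names come from the doc; nothing outside this set is reachable.
ALLOWED_CHECKS = ("health", "status", "logs", "events")
Collector = Callable[..., Any]


def make_doctor_check(collectors: dict[str, Collector], max_lines: int = 200):
    """Build a doctor_check tool bound to read-only collector functions."""

    def doctor_check(
        check: str,
        session_key: Optional[str] = None,
        run_id: Optional[str] = None,
        lines: Optional[int] = None,
    ) -> dict:
        if check not in ALLOWED_CHECKS:
            return {"ok": False, "error": f"unsupported check: {check}"}
        collector = collectors.get(check)
        if collector is None:
            return {"ok": False, "error": f"no collector registered for: {check}"}
        kwargs: dict[str, Any] = {}
        if check in ("logs", "events"):
            # Scope evidence to the session/run and cap how much the model can pull.
            kwargs = {
                "session_key": session_key,
                "run_id": run_id,
                "lines": min(lines or 50, max_lines),
            }
        return {"ok": True, "check": check, "evidence": collector(**kwargs)}

    return doctor_check
```

With an allowlist in front of the collectors, the model chooses which evidence to fetch but cannot reach anything that mutates state.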

## End-to-End Flow

1. User sends `/doctor` in a chat session.
2. Agent cancels active tasks for this session only.
3. Agent runs provider precheck.
4. If precheck fails: return setup guidance and stop.
5. If precheck passes: enter doctor mode and start diagnostic turn.
6. Codex decides which `doctor_check` calls to run and synthesizes conclusions.
7. User can continue with follow-up questions, or issue `/doctor cancel` or `/doctor resume`.
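
Steps 2 through 5 of the flow can be sketched as a single handler. `Session`, `handle_doctor`, and the two callables are hypothetical names chosen for this example; the real agent loop may structure this differently.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    """Hypothetical per-session state used only for this sketch."""

    key: str
    active_tasks: list[str] = field(default_factory=list)
    doctor_mode: bool = False


def handle_doctor(
    session: Session,
    precheck: Callable[[], tuple[bool, str]],
    start_diagnostic_turn: Callable[[Session], str],
) -> str:
    # Step 2: cancel active tasks for this session only; other sessions are untouched.
    session.active_tasks.clear()
    # Step 3: deterministic provider precheck, run before any model reasoning.
    ready, guidance = precheck()
    if not ready:
        # Step 4: return setup guidance and do not enter doctor mode.
        return guidance
    # Step 5: enter doctor mode and start the diagnostic turn.
    session.doctor_mode = True
    return start_diagnostic_turn(session)
```

The precheck sits ahead of the mode switch so that a misconfigured provider never leaves the session stuck in doctor mode.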

## Safety Boundaries

- Session-local interruption only; other sessions/cron are unaffected by default.
- Diagnostics are read-only (`health/status/logs/events`).
- No automatic code mutation/restart in M0.

## Why This Shape

- Deterministic observability primitives stay in SnapAgent.
- Dynamic diagnosis stays in Codex reasoning layer.
- Keeps code volume low while preserving operational control and auditability.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "snapagent-ai"
-version = "0.1.4.post2"
+version = "0.1.4.post3"
description = "A lightweight personal AI assistant framework"
requires-python = ">=3.11"
license = {text = "MIT"}
2 changes: 1 addition & 1 deletion snapagent/__init__.py
@@ -1,5 +1,5 @@
"""SnapAgent - A lightweight AI agent framework."""

-__version__ = "0.1.4.post2"
+__version__ = "0.1.4.post3"
__logo__ = "🐈"
__app_name__ = "SnapAgent"