Commit 57ed475

docs: sync with goclaw source changes e7afa832..a47d7f9f (EN+VI+ZH) (#42)

- knowledge-graph: comprehensive rewrite with REST API reference, data model, 3-tier search, D3 force layout, dedup management, shared KG, auto-extract
- upgrading: schema version 32→33, add migration 033
- database-schema: add 5 new cron_jobs columns, migration 033
- environment-variables: add GOCLAW_ALLOWED_ORIGINS
- scheduling-cron: add stateless field
- openai: add developer role mapping for GPT-4o+
- telegram/whatsapp: document [From:] annotation with display name
- context-files: IDENTITY.md Name auto-sync on agent rename
1 parent d965a15 commit 57ed475

27 files changed

Lines changed: 765 additions & 50 deletions

advanced/knowledge-graph.md

Lines changed: 224 additions & 5 deletions
@@ -28,6 +28,36 @@ Each entity and relation has a **confidence score** (0.0–1.0). Only items at o
2828
- Descriptions are one sentence maximum
2929
- Temperature 0.0 for deterministic results
3030

31+
### Extract API
32+
33+
Trigger extraction manually via the REST API:
34+
35+
```bash
36+
POST /v1/agents/{agentID}/kg/extract
37+
Content-Type: application/json
38+
Authorization: Bearer <token>
39+
40+
{
41+
"text": "Conversation text to extract from...",
42+
"user_id": "user-123",
43+
"provider": "anthropic",
44+
"model": "claude-sonnet-4-20250514",
45+
"min_confidence": 0.75
46+
}
47+
```
48+
49+
Response:
50+
```json
51+
{
52+
"entities": 5,
53+
"relations": 3,
54+
"dedup_merged": 1,
55+
"dedup_flagged": 0
56+
}
57+
```
58+
59+
After extraction, inline dedup runs automatically on newly upserted entities — near-certain duplicates are merged immediately, and possible duplicates are flagged for review.
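A minimal Go client sketch for the extract call, assuming only the request and response fields shown above. The type names `ExtractRequest` and `ExtractResponse`, the helpers `newExtractRequest` and `decodeExtractResponse`, and the `http://localhost:8080` base URL and `TOKEN` value are hypothetical placeholders, not part of GoClaw.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ExtractRequest mirrors the JSON body shown above (hypothetical type name).
type ExtractRequest struct {
	Text          string  `json:"text"`
	UserID        string  `json:"user_id,omitempty"`
	Provider      string  `json:"provider,omitempty"`
	Model         string  `json:"model,omitempty"`
	MinConfidence float64 `json:"min_confidence,omitempty"`
}

// ExtractResponse mirrors the counts returned by the endpoint.
type ExtractResponse struct {
	Entities     int `json:"entities"`
	Relations    int `json:"relations"`
	DedupMerged  int `json:"dedup_merged"`
	DedupFlagged int `json:"dedup_flagged"`
}

// newExtractRequest builds POST {baseURL}/v1/agents/{agentID}/kg/extract
// with the JSON body and bearer token set.
func newExtractRequest(baseURL, token, agentID string, body ExtractRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf("%s/v1/agents/%s/kg/extract", baseURL, agentID)
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

// decodeExtractResponse parses the JSON counts from a response body.
func decodeExtractResponse(data []byte) (ExtractResponse, error) {
	var out ExtractResponse
	err := json.Unmarshal(data, &out)
	return out, err
}

func main() {
	req, err := newExtractRequest("http://localhost:8080", "TOKEN", "agent-1", ExtractRequest{
		Text:          "Alice works on Project Alpha.",
		UserID:        "user-123",
		MinConfidence: 0.75,
	})
	if err != nil {
		panic(err)
	}
	// Against a real gateway you would send it with http.DefaultClient.Do(req).
	fmt.Println(req.Method, req.URL.Path)
}
```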
60+
3161
### Relation types
3262

3363
The extractor uses a fixed set of relation types:
@@ -73,15 +103,48 @@ After extraction, GoClaw automatically checks new entities for duplicates using
73103

74104
**Flagged candidates** are stored in `kg_dedup_candidates` with status `pending`. You can list, dismiss, or manually merge them via the API.
75105

76-
### Bulk duplicate scan
106+
### Dedup Management Workflow
107+
108+
**1. Scan for duplicates** — Run a full scan across all entities:
109+
110+
```bash
111+
POST /v1/agents/{agentID}/kg/dedup/scan
112+
Content-Type: application/json
113+
114+
{"threshold": 0.90, "limit": 100}
115+
```
116+
117+
Useful after bulk imports or initial onboarding. Results are added to the review queue.
118+
119+
**2. Review candidates:**
77120

78-
You can trigger a full scan across all entities:
121+
```bash
122+
GET /v1/agents/{agentID}/kg/dedup?user_id=xxx
123+
```
124+
125+
Returns `DedupCandidate[]` with fields: `entity_a`, `entity_b`, `similarity`, `status`.
126+
127+
**3. Merge:**
79128

80129
```bash
81-
POST /v1/agents/{agentID}/kg/scan-duplicates
130+
POST /v1/agents/{agentID}/kg/merge
131+
Content-Type: application/json
132+
133+
{"target_id": "john-doe-uuid", "source_id": "j-doe-uuid"}
82134
```
83135

84-
This runs a self-join similarity scan and adds candidates to the review queue. Useful after bulk imports or initial onboarding.
136+
Re-points all relations from `source_id` to `target_id`, then deletes the source entity.
137+
138+
**4. Dismiss:**
139+
140+
```bash
141+
POST /v1/agents/{agentID}/kg/dedup/dismiss
142+
Content-Type: application/json
143+
144+
{"candidate_id": "candidate-uuid"}
145+
```
146+
147+
Marks the pair as not-duplicate — it won't appear in future review queues.
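The review step between listing candidates and calling merge or dismiss can be sketched in Go as a client-side triage pass. This is a hypothetical policy, not GoClaw behavior: it assumes each candidate also carries an `id` (the value dismiss expects as `candidate_id`), that `entity_a`/`entity_b` are entity IDs, and that `entity_a` is kept as the merge target. Pairs between the two thresholds are left pending for a human.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DedupCandidate mirrors the documented fields; ID is an ASSUMED field
// supplying the candidate_id that the dismiss endpoint expects.
type DedupCandidate struct {
	ID         string  `json:"id"`
	EntityA    string  `json:"entity_a"`
	EntityB    string  `json:"entity_b"`
	Similarity float64 `json:"similarity"`
	Status     string  `json:"status"`
}

// MergeRequest and DismissRequest match the request bodies shown above.
type MergeRequest struct {
	TargetID string `json:"target_id"`
	SourceID string `json:"source_id"`
}

type DismissRequest struct {
	CandidateID string `json:"candidate_id"`
}

// triage is a hypothetical review policy: pending pairs at or above mergeAt
// become merges (keeping entity_a), pairs below dismissBelow are dismissed,
// and everything in between stays pending for manual review.
func triage(cands []DedupCandidate, mergeAt, dismissBelow float64) ([]MergeRequest, []DismissRequest) {
	var merges []MergeRequest
	var dismisses []DismissRequest
	for _, c := range cands {
		if c.Status != "pending" {
			continue
		}
		switch {
		case c.Similarity >= mergeAt:
			merges = append(merges, MergeRequest{TargetID: c.EntityA, SourceID: c.EntityB})
		case c.Similarity < dismissBelow:
			dismisses = append(dismisses, DismissRequest{CandidateID: c.ID})
		}
	}
	return merges, dismisses
}

func main() {
	raw := `[{"id":"c1","entity_a":"john-doe-uuid","entity_b":"j-doe-uuid","similarity":0.97,"status":"pending"},
	         {"id":"c2","entity_a":"alpha-uuid","entity_b":"alpha-api-uuid","similarity":0.91,"status":"pending"}]`
	var cands []DedupCandidate
	if err := json.Unmarshal([]byte(raw), &cands); err != nil {
		panic(err)
	}
	merges, dismisses := triage(cands, 0.95, 0.85)
	body, _ := json.Marshal(merges[0])
	fmt.Println(string(body), len(dismisses))
}
```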
85148

86149
---
87150

@@ -96,6 +159,16 @@ This runs a self-join similarity scan and adds candidates to the review queue. U
96159
| `entity_id` | string | Start point for relationship traversal |
97160
| `max_depth` | int | Traversal depth (default 2, max 3) |
98161

162+
### 3-Tier Search Fallback
163+
164+
The tool uses a 3-tier fallback strategy to ensure results are always returned:
165+
166+
1. **Traversal** (when `entity_id` provided) — BFS outgoing traversal up to `max_depth`, returns up to 20 results with path info and relation types
167+
2. **Direct connections** (fallback if traversal returns nothing) — Bidirectional 1-hop relations, capped at 10
168+
3. **Text search** (fallback if no connections) — Full-text search on entity names/descriptions, returns up to 10 results with their relations (5 per entity)
169+
170+
When all three tiers return nothing, the tool returns the top 10 existing entities as hints so the model knows what's available in the graph.
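The fallback order above can be sketched in Go as follows. The `Graph` interface, its method names, and the `Result` shape are hypothetical stand-ins for GoClaw's store; the caps (20, 10, 10, 10) and the depth rule (default 2, max 3) come from the text. The fake graph exists only so the ordering can be exercised.

```go
package main

import "fmt"

// Result is a simplified search hit (hypothetical shape).
type Result struct {
	EntityName string
	Via        string // which tier produced the hit
}

// Graph abstracts the four lookups; the method names are hypothetical.
type Graph interface {
	Traverse(entityID string, maxDepth, limit int) []Result
	DirectConnections(entityID string, limit int) []Result
	TextSearch(query string, limit int) []Result
	TopEntities(limit int) []Result
}

// clampDepth applies the documented default (2) and maximum (3).
func clampDepth(d int) int {
	if d <= 0 {
		return 2
	}
	if d > 3 {
		return 3
	}
	return d
}

// search tries traversal, then 1-hop connections, then text search,
// and finally returns the top entities as hints.
func search(g Graph, query, entityID string, maxDepth int) []Result {
	if entityID != "" {
		if r := g.Traverse(entityID, clampDepth(maxDepth), 20); len(r) > 0 {
			return r
		}
		if r := g.DirectConnections(entityID, 10); len(r) > 0 {
			return r
		}
	}
	if r := g.TextSearch(query, 10); len(r) > 0 {
		return r
	}
	return g.TopEntities(10)
}

// fakeGraph returns canned results so the fallback order can be exercised.
type fakeGraph struct{ traverse, direct, text, top []Result }

func (f fakeGraph) Traverse(string, int, int) []Result     { return f.traverse }
func (f fakeGraph) DirectConnections(string, int) []Result { return f.direct }
func (f fakeGraph) TextSearch(string, int) []Result        { return f.text }
func (f fakeGraph) TopEntities(int) []Result               { return f.top }

func main() {
	g := fakeGraph{text: []Result{{EntityName: "Project Alpha", Via: "text"}}}
	fmt.Println(search(g, "alpha", "e1", 0))
}
```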
171+
99172
### Search modes
100173

101174
**Text search** — Find entities by name or keyword:
@@ -119,6 +192,75 @@ Results include entity names, types, descriptions, depth, traversal path, and th
119192

120193
---
121194

195+
## REST API Reference
196+
197+
All endpoints require authentication (`Authorization: Bearer <token>`). Add `?user_id=<id>` to scope results to a specific user.
198+
199+
| Method | Path | Description |
200+
|--------|------|-------------|
201+
| `GET` | `/v1/agents/{agentID}/kg/entities` | List or search entities |
202+
| `GET` | `/v1/agents/{agentID}/kg/entities/{entityID}` | Get entity with its relations |
203+
| `POST` | `/v1/agents/{agentID}/kg/entities` | Upsert entity |
204+
| `DELETE` | `/v1/agents/{agentID}/kg/entities/{entityID}` | Delete entity (cascades relations) |
205+
| `POST` | `/v1/agents/{agentID}/kg/traverse` | Traverse the graph from an entity |
206+
| `POST` | `/v1/agents/{agentID}/kg/extract` | LLM-powered extraction from text |
207+
| `GET` | `/v1/agents/{agentID}/kg/stats` | Graph statistics |
208+
| `GET` | `/v1/agents/{agentID}/kg/graph` | Full graph for visualization |
209+
| `POST` | `/v1/agents/{agentID}/kg/dedup/scan` | Scan for duplicate candidates |
210+
| `GET` | `/v1/agents/{agentID}/kg/dedup` | List dedup candidates |
211+
| `POST` | `/v1/agents/{agentID}/kg/merge` | Merge two entities |
212+
| `POST` | `/v1/agents/{agentID}/kg/dedup/dismiss` | Dismiss a dedup candidate |
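Every path in the table shares the `/v1/agents/{agentID}/kg/` prefix and the optional `user_id` query parameter. A small Go helper can build these URLs; `kgURL` is a hypothetical name, and it assumes agent IDs are URL-safe (for example UUIDs).

```go
package main

import (
	"fmt"
	"net/url"
)

// kgURL builds {baseURL}/v1/agents/{agentID}/kg/{path} and appends
// ?user_id= only when a user scope is requested. Hypothetical helper;
// assumes agentID and path contain no characters needing escaping.
func kgURL(baseURL, agentID, path, userID string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/agents/" + agentID + "/kg/" + path
	if userID != "" {
		q := u.Query()
		q.Set("user_id", userID)
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	s, err := kgURL("http://localhost:8080", "agent-1", "dedup", "user-123")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```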
213+
214+
---
215+
216+
## Data Model
217+
218+
### Entity
219+
220+
```json
221+
{
222+
"id": "uuid",
223+
"agent_id": "agent-uuid",
224+
"user_id": "optional-user-id",
225+
"external_id": "john-doe",
226+
"name": "John Doe",
227+
"entity_type": "person",
228+
"description": "Backend engineer on the platform team",
229+
"properties": {"team": "platform"},
230+
"source_id": "optional-source-ref",
231+
"confidence": 0.95,
232+
"created_at": 1711900000,
233+
"updated_at": 1711900000
234+
}
235+
```
236+
237+
| Field | Description |
238+
|-------|-------------|
239+
| `external_id` | Human-readable slug (e.g., `john-doe`). Used for upsert dedup. |
240+
| `properties` | Arbitrary key-value metadata from extraction |
241+
| `source_id` | Optional reference to the source conversation or document |
242+
| `confidence` | Extraction confidence (0.0–1.0); surviving entity in merges keeps the higher value |
243+
244+
### Relation
245+
246+
```json
247+
{
248+
"id": "uuid",
249+
"agent_id": "agent-uuid",
250+
"user_id": "optional-user-id",
251+
"source_entity_id": "john-doe-uuid",
252+
"relation_type": "works_on",
253+
"target_entity_id": "project-alpha-uuid",
254+
"confidence": 0.9,
255+
"properties": {},
256+
"created_at": 1711900000
257+
}
258+
```
259+
260+
Relations are directional: `source --relation_type--> target`. Deleting an entity cascades and removes all its relations.
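The directionality and cascade rules can be illustrated with an in-memory Go sketch. The real cascade happens in the database; `Relation` here drops the `agent_id`/`user_id`/`properties` fields for brevity, and `deleteEntity` is a hypothetical helper.

```go
package main

import "fmt"

// Relation is a directed edge: source --relation_type--> target.
type Relation struct {
	ID             string  `json:"id"`
	SourceEntityID string  `json:"source_entity_id"`
	RelationType   string  `json:"relation_type"`
	TargetEntityID string  `json:"target_entity_id"`
	Confidence     float64 `json:"confidence"`
}

// String renders the edge in the arrow notation used above.
func (r Relation) String() string {
	return fmt.Sprintf("%s --%s--> %s", r.SourceEntityID, r.RelationType, r.TargetEntityID)
}

// deleteEntity removes an entity (map of ID to name) and returns the
// relations that survive: any edge touching the entity at either end goes too.
func deleteEntity(entities map[string]string, rels []Relation, id string) []Relation {
	delete(entities, id)
	kept := make([]Relation, 0, len(rels))
	for _, r := range rels {
		if r.SourceEntityID != id && r.TargetEntityID != id {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	entities := map[string]string{"john": "John Doe", "alpha": "Project Alpha"}
	rels := []Relation{{ID: "r1", SourceEntityID: "john", RelationType: "works_on", TargetEntityID: "alpha"}}
	fmt.Println(rels[0])
	fmt.Println(len(deleteEntity(entities, rels, "alpha")))
}
```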
261+
262+
---
263+
122264
## Entity Types
123265

124266
| Type | Examples |
@@ -133,6 +275,83 @@ Results include entity names, types, descriptions, depth, traversal path, and th
133275

134276
---
135277

278+
## Graph Statistics & Visualization
279+
280+
### Statistics
281+
282+
```bash
283+
GET /v1/agents/{agentID}/kg/stats?user_id=xxx
284+
```
285+
286+
```json
287+
{
288+
"entity_count": 42,
289+
"relation_count": 87,
290+
"entity_types": {
291+
"person": 15,
292+
"project": 8,
293+
"concept": 12,
294+
"task": 7
295+
}
296+
}
297+
```
298+
299+
### Full Graph for Visualization
300+
301+
```bash
302+
GET /v1/agents/{agentID}/kg/graph?user_id=xxx&limit=200
303+
```
304+
305+
Returns all entities and relations suitable for rendering in a graph UI. Default limit is 200 entities; relations are capped at 3× the entity limit.
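The two caps can be expressed as a tiny Go function. `graphCaps` is hypothetical, and treating a zero or negative `limit` as "use the default" is an assumption; the source does not say how invalid values are handled or whether there is an upper bound.

```go
package main

import "fmt"

const defaultGraphLimit = 200

// graphCaps returns the entity cap (default 200) and the relation cap,
// which is 3x the entity limit. Non-positive limits fall back to the
// default (assumed behavior).
func graphCaps(limit int) (entityCap, relationCap int) {
	if limit <= 0 {
		limit = defaultGraphLimit
	}
	return limit, 3 * limit
}

func main() {
	e, r := graphCaps(0)
	fmt.Println(e, r)
}
```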
306+
307+
The web dashboard renders the graph using **ReactFlow** with **D3 Force Simulation** (`d3-force`) for automatic node positioning:
308+
309+
- **Force layout** — `forceSimulation` computes node positions using link distance, charge repulsion (`forceManyBody`), centering (`forceCenter`), and collision avoidance (`forceCollide`). Forces scale by node count (tighter for small graphs, spread for large).
310+
- **Node sizing by type** — Each entity type has a different mass (organization=8, project=6, person=4, etc.), so hub entities naturally sit at the center.
311+
- **Degree centrality** — When entities exceed the display limit (50), the graph keeps the most-connected hub nodes. Nodes with ≥4 connections get a glow highlight.
312+
- **Interactive selection** — Clicking a node highlights its connected edges with labels, dims unrelated edges, and opens the entity detail dialog.
313+
- **Theme support** — Dual-theme color palette (dark/light) with per-entity-type colors. Theme changes update colors without re-running the layout.
314+
- **Performance** — Node components are `memo`-ized, layout runs in `setTimeout(0)` to avoid blocking, and edge updates use `useTransition` for responsive interaction.
315+
316+
---
317+
318+
## Shared Knowledge Graph
319+
320+
By default, the knowledge graph is scoped per agent **and** per user — each user builds their own graph. When `share_knowledge_graph` is enabled in the agent's workspace sharing config, the graph becomes agent-level (shared across all users):
321+
322+
```yaml
323+
workspace_sharing:
324+
share_knowledge_graph: true
325+
```
326+
327+
In shared mode, `user_id` is ignored for all KG operations — entities and relations from all users are stored and queried together. This is useful for team agents where everyone should see the same entity graph.
328+
329+
> **Note:** `share_knowledge_graph` is independent of `share_memory`. You can share memory without sharing the graph, or vice versa.
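The scoping rule can be sketched in Go as a function that decides which user ID a KG query filters by. `WorkspaceSharing` and `kgScope` are hypothetical names mirroring the YAML keys; an empty return value stands for "no user filter". The second call shows that enabling `share_memory` alone leaves the graph per-user.

```go
package main

import "fmt"

// WorkspaceSharing mirrors the YAML keys mentioned above (hypothetical Go type).
type WorkspaceSharing struct {
	ShareMemory         bool `yaml:"share_memory"`
	ShareKnowledgeGraph bool `yaml:"share_knowledge_graph"`
}

// kgScope returns the user ID to filter KG operations by. In shared mode
// the caller's user ID is ignored, so all users hit the agent-level graph.
func kgScope(cfg WorkspaceSharing, userID string) string {
	if cfg.ShareKnowledgeGraph {
		return "" // agent-level graph: no per-user filter
	}
	return userID
}

func main() {
	fmt.Printf("%q\n", kgScope(WorkspaceSharing{ShareKnowledgeGraph: true}, "user-123"))
	fmt.Printf("%q\n", kgScope(WorkspaceSharing{ShareMemory: true}, "user-123"))
}
```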
330+
331+
---
332+
333+
## Automatic Extraction on Memory Write
334+
335+
When an agent writes to its memory files (e.g., `MEMORY.md` or files under `memory/`), GoClaw automatically triggers KG extraction on the written content. This happens via the `MemoryInterceptor`, which calls the configured LLM to extract entities and relations from the new memory text.
336+
337+
This means agents continuously build their knowledge graph as they learn — no manual `/kg/extract` calls needed for normal conversations. The extract API is available for bulk imports or external integrations.
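A rough Go sketch of the interception decision, assuming workspace-relative slash-separated paths. `isMemoryWrite` and `onWrite` are hypothetical; `MemoryInterceptor` may match paths differently, and the `extract` callback stands in for the LLM-backed extractor.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isMemoryWrite reports whether a path is MEMORY.md or lives under memory/.
// path.Clean normalizes "./" prefixes and ".." segments first.
func isMemoryWrite(p string) bool {
	clean := path.Clean(p)
	return clean == "MEMORY.md" || strings.HasPrefix(clean, "memory/")
}

// onWrite sketches the interceptor flow: only memory writes trigger extraction.
func onWrite(p, content string, extract func(string)) {
	if isMemoryWrite(p) {
		extract(content)
	}
}

func main() {
	onWrite("memory/2026-03-31.md", "Alice now leads Project Alpha.", func(text string) {
		fmt.Println("extract:", text)
	})
}
```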
338+
339+
---
340+
341+
## Confidence Pruning
342+
343+
Remove low-confidence entities and relations in bulk using `PruneByConfidence`:
344+
345+
```go
346+
// Internal service call: prunes items below threshold
347+
// Returns count of pruned entities and relations
348+
PruneByConfidence(agentID, userID, minConfidence)
349+
```
350+
351+
This is useful after bulk imports where many low-confidence items accumulate. Items with `confidence < minConfidence` are deleted; their relations cascade automatically.
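An in-memory Go sketch of the same behavior. This is not the `PruneByConfidence` implementation; `pruneByConfidence` and the trimmed `Entity`/`Relation` types are hypothetical, and counting cascaded relations in the pruned-relation total is an assumption.

```go
package main

import "fmt"

type Entity struct {
	ID         string
	Confidence float64
}

type Relation struct {
	Source, Target string
	Confidence     float64
}

// pruneByConfidence drops entities and relations with confidence below
// minConfidence; relations touching a removed entity are dropped as well
// (the cascade). It returns survivors and the pruned counts.
func pruneByConfidence(ents []Entity, rels []Relation, minConfidence float64) ([]Entity, []Relation, int, int) {
	removed := make(map[string]bool)
	var keptE []Entity
	for _, e := range ents {
		if e.Confidence < minConfidence {
			removed[e.ID] = true
			continue
		}
		keptE = append(keptE, e)
	}
	var keptR []Relation
	for _, r := range rels {
		if r.Confidence < minConfidence || removed[r.Source] || removed[r.Target] {
			continue
		}
		keptR = append(keptR, r)
	}
	return keptE, keptR, len(ents) - len(keptE), len(rels) - len(keptR)
}

func main() {
	ents := []Entity{{"a", 0.9}, {"b", 0.4}}
	rels := []Relation{{"a", "b", 0.95}}
	_, _, pe, pr := pruneByConfidence(ents, rels, 0.75)
	fmt.Printf("pruned %d entities, %d relations\n", pe, pr)
}
```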
352+
353+
---
354+
136355
## Example
137356

138357
After several conversations about a project, an agent's knowledge graph might contain:
@@ -159,4 +378,4 @@ An agent can then answer questions like *"Who is working on Project Alpha?"* by
159378
- [Memory System](/memory-system) — Vector-based long-term memory
160379
- [Sessions & History](/sessions-and-history) — Conversation storage
161380

162-
<!-- goclaw-source: e7afa832 | updated: 2026-03-30 -->
381+
<!-- goclaw-source: a47d7f9f | updated: 2026-03-31 -->

advanced/scheduling-cron.md

Lines changed: 2 additions & 1 deletion
@@ -103,6 +103,7 @@ goclaw cron delete <jobId>
103103
| `schedule.expr` | string | 5-field cron expression (for `cron`) |
104104
| `schedule.tz` | string | IANA timezone for cron expressions; omit to use the gateway default timezone |
105105
| `message` | string | Text the agent receives as its input |
106+
| `stateless` | bool | Run without session history — saves tokens for simple scheduled tasks. Default `false` |
106107
| `deliver` | bool | `true` = deliver result to a channel; `false` = agent processes silently. Auto-defaults to `true` when the job is created from a real channel (Telegram, etc.) |
107108
| `channel` | string | Target channel: `telegram`, `discord`, etc. Auto-filled from context when `deliver` is `true` |
108109
| `to` | string | Chat ID or recipient identifier. Auto-filled from context when `deliver` is `true` |
@@ -317,4 +318,4 @@ When a session's conversation history exceeds **60% of the context window**, the
317318
- [Skills](/skills) — inject domain knowledge so scheduled agents are more effective
318319
- [Sandbox](/sandbox) — isolate code execution during scheduled agent runs
319320

320-
<!-- goclaw-source: 941a965 | updated: 2026-03-19 -->
321+
<!-- goclaw-source: a47d7f9f | updated: 2026-03-31 -->

agents/context-files.md

Lines changed: 3 additions & 1 deletion
@@ -125,6 +125,8 @@ _(Domain-specific knowledge goes here: coding standards, image generation techni
125125
**Open agent:** Per-user (generated on first chat)
126126
**Predefined agent:** Agent-level (optionally generated via LLM summoning)
127127

128+
> **Auto-sync:** When you rename an agent, the `Name:` field in IDENTITY.md is automatically updated to match. Other fields remain unchanged.
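A Go sketch of a name-only rewrite. The exact IDENTITY.md layout is an assumption: the pattern accepts either a plain `Name: …` line or a `- **Name:** …` bullet, and `syncIdentityName` is a hypothetical helper. Every other line passes through unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameLine matches "Name: <value>" or "- **Name:** <value>" at line start
// (assumed layouts) and captures the label prefix.
var nameLine = regexp.MustCompile(`(?m)^([ \t]*(?:- )?(?:\*\*)?Name:(?:\*\*)?)[ \t]*.*$`)

// syncIdentityName replaces only the value on the Name line, keeping
// the original label prefix and leaving other fields untouched.
func syncIdentityName(identity, newName string) string {
	return nameLine.ReplaceAllStringFunc(identity, func(line string) string {
		prefix := nameLine.FindStringSubmatch(line)[1]
		return prefix + " " + newName
	})
}

func main() {
	doc := "- **Name:** Old Bot\n- **Vibe:** calm\n"
	fmt.Print(syncIdentityName(doc, "Nova"))
}
```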
129+
128130
### TOOLS.md
129131

130132
**Purpose:** Local tool notes. Camera names, SSH hosts, TTS voice preferences, device nicknames.
@@ -372,4 +374,4 @@ FAQ bot creation with summoning:
372374
- [Summoning & Bootstrap](/summoning-bootstrap) — how SOUL.md and IDENTITY.md are LLM-generated
373375
- [Creating Agents](/creating-agents) — step-by-step agent creation
374376

375-
<!-- goclaw-source: 57754a5 | updated: 2026-03-23 -->
377+
<!-- goclaw-source: a47d7f9f | updated: 2026-03-31 -->

channels/telegram.md

Lines changed: 17 additions & 1 deletion
@@ -143,6 +143,22 @@ flowchart TD
143143
BUFFER --> NEXT["Next mention:<br/>history included"]
144144
```
145145

146+
### Group Message Annotation
147+
148+
In group chats, each message is prefixed with a `[From:]` annotation so the agent knows who is speaking:
149+
150+
```
151+
[From: @username (Display Name)]
152+
Message content here
153+
```
154+
155+
The label format depends on available user data:
156+
- Username + display name: `@username (Display Name)`
157+
- Username only: `@username`
158+
- Display name only: `Display Name`
159+
160+
This annotation is also added to DM messages for consistent sender identification.
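The three label formats can be sketched in Go. `senderLabel` and `annotate` are hypothetical helpers; what happens when both username and display name are missing is not specified above, and this sketch simply yields an empty label.

```go
package main

import "fmt"

// senderLabel follows the three documented label formats.
func senderLabel(username, displayName string) string {
	switch {
	case username != "" && displayName != "":
		return fmt.Sprintf("@%s (%s)", username, displayName)
	case username != "":
		return "@" + username
	default:
		return displayName
	}
}

// annotate prefixes message text with the [From:] line.
func annotate(username, displayName, text string) string {
	return fmt.Sprintf("[From: %s]\n%s", senderLabel(username, displayName), text)
}

func main() {
	fmt.Println(annotate("alice", "Alice Nguyen", "Can someone review my PR?"))
}
```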
161+
146162
### Group Concurrency
147163

148164
Group sessions support up to **3 concurrent agent runs**. When this limit is reached, additional messages are queued. This applies to all group and forum topic contexts.
@@ -275,4 +291,4 @@ Each Telegram instance maintains an isolated HTTP transport — no shared connec
275291
- [Browser Pairing](/channel-browser-pairing) — Pairing flow
276292
- [Sessions & History](/sessions-and-history) — Conversation history
277293

278-
<!-- goclaw-source: 0dab087f | updated: 2026-03-26 -->
294+
<!-- goclaw-source: a47d7f9f | updated: 2026-03-31 -->

channels/whatsapp.md

Lines changed: 3 additions & 1 deletion
@@ -79,6 +79,8 @@ Bridge detects group chats via `@g.us` suffix in chat ID:
7979

8080
Policies apply accordingly (DM policy for DMs, group policy for groups).
8181

82+
In group chats, messages include a `[From:]` annotation with the sender's display name, allowing the agent to distinguish between participants.
83+
8284
### Message Format
8385

8486
Messages are JSON objects:
@@ -142,4 +144,4 @@ isGroup := strings.HasSuffix(chatID, "@g.us")
142144
- [Larksuite](/channel-feishu) — Larksuite integration
143145
- [Browser Pairing](/channel-browser-pairing) — Pairing flow
144146

145-
<!-- goclaw-source: 57754a5 | updated: 2026-03-18 -->
147+
<!-- goclaw-source: a47d7f9f | updated: 2026-03-31 -->

deployment/upgrading.md

Lines changed: 3 additions & 2 deletions
@@ -9,7 +9,7 @@ A GoClaw upgrade has two parts:
99
1. **SQL migrations** — schema changes applied by `golang-migrate` (idempotent, versioned)
1010
2. **Data hooks** — optional Go-based data transformations that run after schema migrations (e.g. backfilling a new column)
1111

12-
The `./goclaw upgrade` command handles both in the correct order. It is safe to run multiple times — it is fully idempotent. The current required schema version is **32**.
12+
The `./goclaw upgrade` command handles both in the correct order. It is safe to run multiple times — it is fully idempotent. The current required schema version is **33**.
1313

1414
```mermaid
1515
graph LR
@@ -225,6 +225,7 @@ These five migrations are auto-applied on startup when upgrading to v2.x. No man
225225
| 030 | Adds GIN indexes on `spans.metadata` (partial, `span_type = 'llm_call'`) and `sessions.metadata` JSONB columns for query performance |
226226
| 031 | Adds `tsv tsvector` generated column + GIN index to `kg_entities` for full-text search; creates `kg_dedup_candidates` table for entity deduplication review |
227227
| 032 | Creates `secure_cli_user_credentials` for per-user CLI credential injection; adds `contact_type` column to `channel_contacts` |
228+
| 033 | Promotes `stateless`, `deliver`, `deliver_channel`, `deliver_to`, `wake_heartbeat` from `payload` JSONB to dedicated columns on `cron_jobs` |
228229

229230
### Breaking Changes in v2.x
230231

@@ -277,4 +278,4 @@ Before each upgrade, check the release notes for:
277278
- [Database Setup](/deploy-database) — PostgreSQL and pgvector setup
278279
- [Observability](/deploy-observability) — monitor your gateway post-upgrade
279280

280-
<!-- goclaw-source: e7afa832 | updated: 2026-03-30 -->
281+
<!-- goclaw-source: a47d7f9f | updated: 2026-03-31 -->

providers/openai.md

Lines changed: 7 additions & 1 deletion
@@ -95,10 +95,16 @@ OpenAI function calling works out of the box. GoClaw converts internal tool defi
9595
| `HTTP 400` on o-series | Unsupported parameter | Avoid setting `temperature` with o-series models |
9696
| Vision not working | Model doesn't support images | Use gpt-4o or gpt-4o-mini |
9797

98+
### Developer Role (GPT-4o+)
99+
100+
For native OpenAI endpoints (`api.openai.com`), GoClaw automatically maps the `system` role to `developer` when sending requests. The `developer` role has higher instruction priority than `system` for GPT-4o and newer models.
101+
102+
This mapping only applies to native OpenAI infrastructure. Other OpenAI-compatible backends (Azure OpenAI, proxies, Qwen, DeepSeek, etc.) continue to use the standard `system` role.
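The host check can be sketched in Go as follows. `mapRole` is a hypothetical helper that keys only on the `api.openai.com` hostname, as described above; whether GoClaw also checks the model name is not stated, so the sketch does not.

```go
package main

import (
	"fmt"
	"net/url"
)

// mapRole rewrites "system" to "developer" only for the native OpenAI
// host; every other backend keeps the role as-is.
func mapRole(role, baseURL string) string {
	u, err := url.Parse(baseURL)
	if err != nil || u.Hostname() != "api.openai.com" {
		return role
	}
	if role == "system" {
		return "developer"
	}
	return role
}

func main() {
	fmt.Println(mapRole("system", "https://api.openai.com/v1"))
	fmt.Println(mapRole("system", "https://example-proxy.local/v1"))
}
```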
103+
98104
## What's Next
99105

100106
- [OpenRouter](/provider-openrouter) — access 100+ models through one API key
101107
- [Anthropic](/provider-anthropic) — native Claude integration
102108
- [Overview](/providers-overview) — provider architecture and retry logic
103109

104-
<!-- goclaw-source: 57754a5 | updated: 2026-03-18 -->
110+
<!-- goclaw-source: a47d7f9f | updated: 2026-03-31 -->
