You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: advanced/knowledge-graph.md
+224-5Lines changed: 224 additions & 5 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -28,6 +28,36 @@ Each entity and relation has a **confidence score** (0.0–1.0). Only items at o
28
28
- Descriptions are one sentence maximum
29
29
- Temperature 0.0 for deterministic results
30
30
31
+
### Extract API
32
+
33
+
Trigger extraction manually via the REST API:
34
+
35
+
```bash
36
+
POST /v1/agents/{agentID}/kg/extract
37
+
Content-Type: application/json
38
+
Authorization: Bearer <token>
39
+
40
+
{
41
+
"text": "Conversation text to extract from...",
42
+
"user_id": "user-123",
43
+
"provider": "anthropic",
44
+
"model": "claude-sonnet-4-20250514",
45
+
"min_confidence": 0.75
46
+
}
47
+
```
48
+
49
+
Response:
50
+
```json
51
+
{
52
+
"entities": 5,
53
+
"relations": 3,
54
+
"dedup_merged": 1,
55
+
"dedup_flagged": 0
56
+
}
57
+
```
58
+
59
+
After extraction, inline dedup runs automatically on newly upserted entities — near-certain duplicates are merged immediately, possible duplicates are flagged for review.
60
+
31
61
### Relation types
32
62
33
63
The extractor uses a fixed set of relation types:
@@ -73,15 +103,48 @@ After extraction, GoClaw automatically checks new entities for duplicates using
73
103
74
104
**Flagged candidates** are stored in `kg_dedup_candidates` with status `pending`. You can list, dismiss, or manually merge them via the API.
75
105
76
-
### Bulk duplicate scan
106
+
### Dedup Management Workflow
107
+
108
+
**1. Scan for duplicates** — Run a full scan across all entities:
109
+
110
+
```bash
111
+
POST /v1/agents/{agentID}/kg/dedup/scan
112
+
Content-Type: application/json
113
+
114
+
{"threshold": 0.90, "limit": 100}
115
+
```
116
+
117
+
Useful after bulk imports or initial onboarding. Results are added to the review queue.
118
+
119
+
**2. Review candidates:**
77
120
78
-
You can trigger a full scan across all entities:
121
+
```bash
122
+
GET /v1/agents/{agentID}/kg/dedup?user_id=xxx
123
+
```
124
+
125
+
Returns `DedupCandidate[]` with fields: `entity_a`, `entity_b`, `similarity`, `status`.
This runs a self-join similarity scan and adds candidates to the review queue. Useful after bulk imports or initial onboarding.
136
+
Re-points all relations from `source_id` to `target_id`, then deletes the source entity.
137
+
138
+
**4. Dismiss:**
139
+
140
+
```bash
141
+
POST /v1/agents/{agentID}/kg/dedup/dismiss
142
+
Content-Type: application/json
143
+
144
+
{"candidate_id": "candidate-uuid"}
145
+
```
146
+
147
+
Marks the pair as not-duplicate — it won't appear in future review queues.
85
148
86
149
---
87
150
@@ -96,6 +159,16 @@ This runs a self-join similarity scan and adds candidates to the review queue. U
96
159
|`entity_id`| string | Start point for relationship traversal |
97
160
|`max_depth`| int | Traversal depth (default 2, max 3) |
98
161
162
+
### 3-Tier Search Fallback
163
+
164
+
The tool uses a 3-tier fallback strategy to ensure results are always returned:
165
+
166
+
1.**Traversal** (when `entity_id` provided) — BFS outgoing traversal up to `max_depth`, returns up to 20 results with path info and relation types
167
+
2.**Direct connections** (fallback if traversal returns nothing) — Bidirectional 1-hop relations, capped at 10
168
+
3.**Text search** (fallback if no connections) — Full-text search on entity names/descriptions, returns up to 10 results with their relations (5 per entity)
169
+
170
+
When all three tiers return nothing, the tool returns the top 10 existing entities as hints so the model knows what's available in the graph.
171
+
99
172
### Search modes
100
173
101
174
**Text search** — Find entities by name or keyword:
@@ -119,6 +192,75 @@ Results include entity names, types, descriptions, depth, traversal path, and th
119
192
120
193
---
121
194
195
+
## REST API Reference
196
+
197
+
All endpoints require authentication (`Authorization: Bearer <token>`). Add `?user_id=<id>` to scope results to a specific user.
198
+
199
+
| Method | Path | Description |
200
+
|--------|------|-------------|
201
+
|`GET`|`/v1/agents/{agentID}/kg/entities`| List or search entities |
202
+
|`GET`|`/v1/agents/{agentID}/kg/entities/{entityID}`| Get entity with its relations |
|`GET`|`/v1/agents/{agentID}/kg/graph`| Full graph for visualization |
209
+
|`POST`|`/v1/agents/{agentID}/kg/dedup/scan`| Scan for duplicate candidates |
210
+
|`GET`|`/v1/agents/{agentID}/kg/dedup`| List dedup candidates |
211
+
|`POST`|`/v1/agents/{agentID}/kg/merge`| Merge two entities |
212
+
|`POST`|`/v1/agents/{agentID}/kg/dedup/dismiss`| Dismiss a dedup candidate |
213
+
214
+
---
215
+
216
+
## Data Model
217
+
218
+
### Entity
219
+
220
+
```json
221
+
{
222
+
"id": "uuid",
223
+
"agent_id": "agent-uuid",
224
+
"user_id": "optional-user-id",
225
+
"external_id": "john-doe",
226
+
"name": "John Doe",
227
+
"entity_type": "person",
228
+
"description": "Backend engineer on the platform team",
229
+
"properties": {"team": "platform"},
230
+
"source_id": "optional-source-ref",
231
+
"confidence": 0.95,
232
+
"created_at": 1711900000,
233
+
"updated_at": 1711900000
234
+
}
235
+
```
236
+
237
+
| Field | Description |
238
+
|-------|-------------|
239
+
|`external_id`| Human-readable slug (e.g., `john-doe`). Used for upsert dedup. |
240
+
|`properties`| Arbitrary key-value metadata from extraction |
241
+
|`source_id`| Optional reference to the source conversation or document |
242
+
|`confidence`| Extraction confidence (0.0–1.0); surviving entity in merges keeps the higher value |
243
+
244
+
### Relation
245
+
246
+
```json
247
+
{
248
+
"id": "uuid",
249
+
"agent_id": "agent-uuid",
250
+
"user_id": "optional-user-id",
251
+
"source_entity_id": "john-doe-uuid",
252
+
"relation_type": "works_on",
253
+
"target_entity_id": "project-alpha-uuid",
254
+
"confidence": 0.9,
255
+
"properties": {},
256
+
"created_at": 1711900000
257
+
}
258
+
```
259
+
260
+
Relations are directional: `source --relation_type--> target`. Deleting an entity cascades and removes all its relations.
261
+
262
+
---
263
+
122
264
## Entity Types
123
265
124
266
| Type | Examples |
@@ -133,6 +275,83 @@ Results include entity names, types, descriptions, depth, traversal path, and th
133
275
134
276
---
135
277
278
+
## Graph Statistics & Visualization
279
+
280
+
### Statistics
281
+
282
+
```bash
283
+
GET /v1/agents/{agentID}/kg/stats?user_id=xxx
284
+
```
285
+
286
+
```json
287
+
{
288
+
"entity_count": 42,
289
+
"relation_count": 87,
290
+
"entity_types": {
291
+
"person": 15,
292
+
"project": 8,
293
+
"concept": 12,
294
+
"task": 7
295
+
}
296
+
}
297
+
```
298
+
299
+
### Full Graph for Visualization
300
+
301
+
```bash
302
+
GET /v1/agents/{agentID}/kg/graph?user_id=xxx&limit=200
303
+
```
304
+
305
+
Returns all entities and relations suitable for rendering in a graph UI. Default limit is 200 entities; relations are capped at 3× the entity limit.
306
+
307
+
The web dashboard renders the graph using **ReactFlow** with **D3 Force Simulation** (`d3-force`) for automatic node positioning:
308
+
309
+
-**Force layout** — `forceSimulation` computes node positions using link distance, charge repulsion (`forceManyBody`), centering (`forceCenter`), and collision avoidance (`forceCollide`). Forces scale by node count (tighter for small graphs, spread for large).
310
+
-**Node sizing by type** — Each entity type has a different mass (organization=8, project=6, person=4, etc.), so hub entities naturally sit at the center.
311
+
-**Degree centrality** — When entities exceed the display limit (50), the graph keeps the most-connected hub nodes. Nodes with ≥4 connections get a glow highlight.
312
+
-**Interactive selection** — Clicking a node highlights its connected edges with labels, dims unrelated edges, and opens the entity detail dialog.
313
+
-**Theme support** — Dual-theme color palette (dark/light) with per-entity-type colors. Theme changes update colors without re-running the layout.
314
+
-**Performance** — Node components are `memo`-ized, layout runs in `setTimeout(0)` to avoid blocking, and edge updates use `useTransition` for responsive interaction.
315
+
316
+
---
317
+
318
+
## Shared Knowledge Graph
319
+
320
+
By default, the knowledge graph is scoped per agent **and** per user — each user builds their own graph. When `share_knowledge_graph` is enabled in the agent's workspace sharing config, the graph becomes agent-level (shared across all users):
321
+
322
+
```yaml
323
+
workspace_sharing:
324
+
share_knowledge_graph: true
325
+
```
326
+
327
+
In shared mode, `user_id` is ignored for all KG operations — entities and relations from all users are stored and queried together. This is useful for team agents where everyone should see the same entity graph.
328
+
329
+
> **Note:** `share_knowledge_graph` is independent of `share_memory`. You can share memory without sharing the graph, or vice versa.
330
+
331
+
---
332
+
333
+
## Automatic Extraction on Memory Write
334
+
335
+
When an agent writes to its memory files (e.g., `MEMORY.md` or files under `memory/`), GoClaw automatically triggers KG extraction on the written content. This happens via the `MemoryInterceptor`, which calls the configured LLM to extract entities and relations from the new memory text.
336
+
337
+
This means agents continuously build their knowledge graph as they learn — no manual `/kg/extract` calls needed for normal conversations. The extract API is available for bulk imports or external integrations.
338
+
339
+
---
340
+
341
+
## Confidence Pruning
342
+
343
+
Remove low-confidence entities and relations in bulk using `PruneByConfidence`:
344
+
345
+
```bash
346
+
# Internal service call — prunes items below threshold
347
+
# Returns count of pruned entities and relations
348
+
PruneByConfidence(agentID, userID, minConfidence)
349
+
```
350
+
351
+
This is useful after bulk imports where many low-confidence items accumulate. Items with `confidence < minConfidence` are deleted; their relations cascade automatically.
352
+
353
+
---
354
+
136
355
## Example
137
356
138
357
After several conversations about a project, an agent's knowledge graph might contain:
@@ -159,4 +378,4 @@ An agent can then answer questions like *"Who is working on Project Alpha?"* by
|`schedule.tz`| string | IANA timezone for cron expressions; omit to use the gateway default timezone |
105
105
|`message`| string | Text the agent receives as its input |
106
+
|`stateless`| bool | Run without session history — saves tokens for simple scheduled tasks. Default `false`|
106
107
|`deliver`| bool |`true` = deliver result to a channel; `false` = agent processes silently. Auto-defaults to `true` when the job is created from a real channel (Telegram, etc.) |
107
108
|`channel`| string | Target channel: `telegram`, `discord`, etc. Auto-filled from context when `deliver` is `true`|
108
109
|`to`| string | Chat ID or recipient identifier. Auto-filled from context when `deliver` is `true`|
@@ -317,4 +318,4 @@ When a session's conversation history exceeds **60% of the context window**, the
317
318
-[Skills](/skills) — inject domain knowledge so scheduled agents are more effective
318
319
-[Sandbox](/sandbox) — isolate code execution during scheduled agent runs
This annotation is also added to DM messages for consistent sender identification.
161
+
146
162
### Group Concurrency
147
163
148
164
Group sessions support up to **3 concurrent agent runs**. When this limit is reached, additional messages are queued. This applies to all group and forum topic contexts.
@@ -275,4 +291,4 @@ Each Telegram instance maintains an isolated HTTP transport — no shared connec
Copy file name to clipboardExpand all lines: deployment/upgrading.md
+3-2Lines changed: 3 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -9,7 +9,7 @@ A GoClaw upgrade has two parts:
9
9
1.**SQL migrations** — schema changes applied by `golang-migrate` (idempotent, versioned)
10
10
2.**Data hooks** — optional Go-based data transformations that run after schema migrations (e.g. backfilling a new column)
11
11
12
-
The `./goclaw upgrade` command handles both in the correct order. It is safe to run multiple times — it is fully idempotent. The current required schema version is **32**.
12
+
The `./goclaw upgrade` command handles both in the correct order. It is safe to run multiple times — it is fully idempotent. The current required schema version is **33**.
13
13
14
14
```mermaid
15
15
graph LR
@@ -225,6 +225,7 @@ These five migrations are auto-applied on startup when upgrading to v2.x. No man
225
225
| 030 | Adds GIN indexes on `spans.metadata` (partial, `span_type = 'llm_call'`) and `sessions.metadata` JSONB columns for query performance |
226
226
| 031 | Adds `tsv tsvector` generated column + GIN index to `kg_entities` for full-text search; creates `kg_dedup_candidates` table for entity deduplication review |
227
227
| 032 | Creates `secure_cli_user_credentials` for per-user CLI credential injection; adds `contact_type` column to `channel_contacts`|
228
+
| 033 | Cron payload columns | Promotes `stateless`, `deliver`, `deliver_channel`, `deliver_to`, `wake_heartbeat` from `payload` JSONB to dedicated columns on `cron_jobs`|
228
229
229
230
### Breaking Changes in v2.x
230
231
@@ -277,4 +278,4 @@ Before each upgrade, check the release notes for:
277
278
-[Database Setup](/deploy-database) — PostgreSQL and pgvector setup
278
279
-[Observability](/deploy-observability) — monitor your gateway post-upgrade
Copy file name to clipboardExpand all lines: providers/openai.md
+7-1Lines changed: 7 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -95,10 +95,16 @@ OpenAI function calling works out of the box. GoClaw converts internal tool defi
95
95
|`HTTP 400` on o-series | Unsupported parameter | Avoid setting `temperature` with o-series models |
96
96
| Vision not working | Model doesn't support images | Use gpt-4o or gpt-4o-mini |
97
97
98
+
### Developer Role (GPT-4o+)
99
+
100
+
For native OpenAI endpoints (`api.openai.com`), GoClaw automatically maps the `system` role to `developer` when sending requests. The `developer` role has higher instruction priority than `system` for GPT-4o and newer models.
101
+
102
+
This mapping only applies to native OpenAI infrastructure. Other OpenAI-compatible backends (Azure OpenAI, proxies, Qwen, DeepSeek, etc.) continue to use the standard `system` role.
103
+
98
104
## What's Next
99
105
100
106
-[OpenRouter](/provider-openrouter) — access 100+ models through one API key
101
107
-[Anthropic](/provider-anthropic) — native Claude integration
102
108
-[Overview](/providers-overview) — provider architecture and retry logic
0 commit comments