feat: fix memory conflict detection and persist tool inputs/outputs by cyyeh · Pull Request #91 · cyyeh/duckdb-data-agent

cyyeh · 2026-03-03T13:12:24Z

Summary

Add recall-before-save instructions to agent system prompt to prevent contradictory memories (e.g. "User likes X" and "User does not like X" coexisting)
Fix backend tool result parsing so non-SQL tool outputs (memory tools, etc.) are sent to the frontend instead of being silently dropped
Persist toolInput and output fields in SQLite metadata so tool call details survive conversation switches
Add vega-themes Power BI theme for Vega-Lite charts (light and dark mode)
Expand sidecar allowedTools with Bash, Read, Write, Edit, Glob, Grep

Test plan

Save a memory, then save a contradictory one — verify the agent recalls first, forgets the old, then saves the new
Check thinking block shows tool inputs and outputs for recall_memories, save_memory, forget_memory
Switch to another conversation and back — verify tool inputs/outputs are still visible
Verify Vega-Lite charts render with Power BI theme in both light and dark mode

🤖 Generated with Claude Code

- Add recall-before-save instruction to prevent contradictory memories - Fix tool result parsing to preserve non-SQL output (memory tools, etc.) - Persist toolInput and output fields in SQLite for conversation reload - Add vega-themes Power BI theme for Vega-Lite charts - Expand sidecar allowedTools with Bash, Read, Write, Edit, Glob, Grep Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 48fcc0da71d7

claude · 2026-03-03T13:12:41Z

Claude finished @cyyeh's task in 2m 48s —— View job

Code Review for PR #91

Gather context and understand the request
Review backend/app/agent.py changes
Review frontend/src/utils/exportConversation.ts changes
Review frontend/src/components/VegaLiteChartWidget.tsx changes
Review sidecar/src/server.ts changes
Review frontend/package.json and package-lock.json changes
Compile and post comprehensive review

Summary

This PR delivers three independent improvements: memory conflict prevention, tool I/O persistence, and a Vega-Lite theme upgrade. The changes are well-scoped and address real UX gaps. A few issues are worth addressing in follow-up work.

✅ What's Good

Memory conflict detection (backend/app/agent.py:164-170)
The recall-before-save instruction sequence is the right pattern. The natural language instructions are explicit and actionable, and the "never leave contradictory memories" constraint is clearly stated.

Non-SQL tool result fix (backend/app/agent.py:714-722)
The new else branch correctly identifies non-SQL success results (memory tools, etc.) and extracts a meaningful output string rather than silently dropping them. The layered fallback parsed.get("memories") or parsed.get("message") or text is pragmatic.

VegaLiteChartWidget.tsx cleanup
The cancelled flag pattern prevents the viewRef from being set on an already-unmounted component. The finalize() call on cleanup is correct and prevents Vega's internal event listeners from leaking. The refactor from an ad-hoc theme object to powerbi from vega-themes is cleaner and more maintainable.

toolInput persistence (agent.py:561-563)

if not sql and tool_input:
    tool_seg["toolInput"] = tool_input

Correctly captures inputs for non-SQL tool calls at segment creation time, making them survive conversation switches.

⚠️ Issues to Address

1. Redundant toolInput assignment for ask_user_question (minor)
agent.py:581-584 sets seg["toolInput"] = tool_input again for ask_user_question, but this was already set at line 562 when the segment was created. This is a no-op now and should be removed to avoid confusion.

Fix this →

2. Brittle field-name hard-coding for non-SQL tool outputs
agent.py:716:

result_data["output"] = parsed.get("memories") or parsed.get("message") or text

This works for current MCP tool responses, but any new non-SQL tool that returns neither memories nor message will silently fall back to the raw JSON string in text. A more defensive approach would be to fall back to text directly (the last or text already does this, but relying on magic field names in the middle is fragile). Consider making all non-SQL tools return a consistent message field in their MCP response contract, or document this assumption.

3. as never type cast in VegaLiteChartWidget
VegaLiteChartWidget.tsx:59:

embed(containerRef.current, fullSpec as never, {

as never is a full type safety escape hatch that hides potential mismatches between your spec shape and vega-embed's expected TopLevelSpec. Prefer as TopLevelSpec from vega-lite/build/src/spec to retain some type safety:

import type { TopLevelSpec } from 'vega-lite';
// ...
embed(containerRef.current, fullSpec as unknown as TopLevelSpec, { ... })

Fix this →

4. DRY violation — dark mode overrides duplicated
The dark mode colour overrides are defined verbatim in two places:

VegaLiteChartWidget.tsx:8-26 (TypeScript object)
exportConversation.ts:216-222 (inline JS string in a template literal)

Since exportConversation.ts embeds these as a JavaScript string fragment for the exported HTML, full module sharing isn't possible. But the hex values and property names should at least be consistent and ideally referenced from a shared constants file to avoid drift.

5. Missing version pin for vega-themes CDN
exportConversation.ts:176:

<script src="https://cdn.jsdelivr.net/npm/vega-themes"><\/script>

All other CDN scripts in the same function include explicit versions (plotly-2.35.2.min.js, vega@5, vega-lite@5, vega-embed@6). An unpinned vega-themes could silently break exported conversations if a major version is published. Pin it to match the installed version from package.json (currently ^3.0.0 → pin to vega-themes@3).

Fix this →

🧪 Testing Gaps

No new tests were added for the changed functionality:

build_system_prompt memory instructions — the new recall-before-save instructions could be verified with a simple string assertion in a test for build_system_prompt.
Non-SQL tool result parsing path — the else branch at agent.py:714-716 (the memory tool output extraction) is not covered by any existing test.
toolInput/output persistence in segments — the segment population logic (agent.py:555-563, 770-781) is not unit-tested; it would benefit from a test that mocks the SSE stream and verifies the resulting persisted_segments structure.

📝 Nit

agent.py:238: There's an extra blank line between _build_message_with_history and stream_chat (three blank lines where two would be PEP 8 standard).

Overall this is solid, focused work. The memory conflict fix in particular closes a real correctness gap. The items above are mostly refinements rather than blockers — the PR was already merged, so these can be tracked as follow-up improvements.

cyyeh merged commit 1c2df0e into main Mar 3, 2026
2 checks passed

cyyeh deleted the feature/memory-and-tool-persistence branch March 3, 2026 13:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: fix memory conflict detection and persist tool inputs/outputs#91

feat: fix memory conflict detection and persist tool inputs/outputs#91
cyyeh merged 1 commit into
mainfrom
feature/memory-and-tool-persistence

cyyeh commented Mar 3, 2026

Uh oh!

Uh oh!

claude Bot commented Mar 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cyyeh commented Mar 3, 2026

Summary

Test plan

Uh oh!

Uh oh!

claude Bot commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for PR #91

Summary

✅ What's Good

⚠️ Issues to Address

🧪 Testing Gaps

📝 Nit

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Mar 3, 2026 •

edited

Loading