Add content-fingerprint dedup on the write path by imonroe · Pull Request #61 · imonroe/memserv

imonroe · 2026-06-07T22:45:12Z

Summary

Adds cheap content-fingerprint dedup on the write path so that re-submitting content that is identical after normalization (case-insensitive, whitespace-collapsed) skips mem0's LLM fact-extraction (a Claude call per add). Implements the OB1 content-fingerprint-dedup recipe (backlog issue #48), the flagship cost-saver.

Flow: before add, the raw input is normalized (lowercase + collapse whitespace) and SHA-256'd; the fingerprint is stored in a content_fp Qdrant payload field and looked up on the next add. An exact repeat returns {"results": [], "deduplicated": true, "memory_id": "…"} without calling the LLM. This makes import re-runs and webhook/n8n retries cheap and idempotent.
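The normalize-then-hash step can be sketched as follows. This is a minimal illustration, not the code in app/memory.py: only the function name content_fingerprint() comes from this PR, and details such as stripping leading/trailing whitespace and UTF-8 encoding are assumptions.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the normalized text (illustrative sketch)."""
    # Normalize: lowercase, collapse every whitespace run to a single space.
    # Stripping the ends is an assumption, not confirmed by the PR.
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


a = content_fingerprint("Dedup  smoke\tTEST 12345\n")
b = content_fingerprint("dedup smoke test 12345")
print(a == b)  # True: case and whitespace differences are normalized away
print(len(a))  # 64 hex characters
```

Because the digest depends only on the normalized text, a retry that differs in casing or spacing maps to the same content_fp value, while any change to the words themselves produces a different one.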

Validated against a live Qdrant

Before building, I confirmed the two assumptions this depends on against a real deployment, and checked the filter construction locally:

mem0's _create_memory merges custom metadata into the payload top-level, so content_fp is directly queryable (confirmed from a real point's payload).
Qdrant executes a filter on the unindexed content_fp field fine (a filter on the non-indexed hash field returned HTTP 200) — so no payload index is required.
_create_filter({"content_fp": …}) builds a valid Qdrant FieldCondition/MatchValue (checked locally).
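For reference, the kind of filter these checks exercise looks roughly like the request body below, written in Qdrant's REST filter syntax for the points/scroll endpoint. The helper name fingerprint_filter is hypothetical, and the assumption that user_id sits next to content_fp at the payload top level is mine; the PR itself builds the filter through mem0's _create_filter rather than by hand.

```python
import json


def fingerprint_filter(user_id: str, content_fp: str) -> dict:
    """Build a scroll request body matching one user's fingerprint.

    Hypothetical helper for illustration only; field placement is assumed.
    """
    return {
        "filter": {
            "must": [
                # Both conditions must hold: same user, same fingerprint.
                {"key": "user_id", "match": {"value": user_id}},
                {"key": "content_fp", "match": {"value": content_fp}},
            ]
        },
        "limit": 1,            # one hit is enough to know it is a repeat
        "with_payload": True,
        "with_vector": False,  # the vector is not needed for a dedup check
    }


body = fingerprint_filter("alice", "0" * 64)
print(json.dumps(body, indent=2))
```

An exact-match condition like this does not need a vector search at all, which is why the lookup stays cheap compared with the LLM call it avoids.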

Safety

Fail-open: any error in the dedup lookup returns None, so the add proceeds normally. The check can only ever save work, never block a write.
Opt-out: REST AddMemoryRequest gains dedup: bool = True; pass false to force re-extraction.
Dedup is scoped by user_id only (not agent_id) — consistent with the one-shared-pool model; the same fact from two agents dedupes to one.
Distinct from mem0's semantic dedup, which still applies to similar-but-not-identical content that reaches the LLM.
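The fail-open and opt-out behaviour described above can be sketched like this. It is an illustration under stated assumptions: the lookup and add callables stand in for the vector-store query and mem0's add, the function names and signatures are hypothetical, and the fingerprint is passed in precomputed to keep the example short.

```python
from typing import Any, Callable, Optional


def existing_fingerprint_id(
    lookup: Callable[[str, str], list], user_id: str, fp: str
) -> Optional[str]:
    """Return the id of a stored memory with this fingerprint, or None.

    Fail-open: any exception is treated as "not found".
    """
    try:
        hits = lookup(user_id, fp)
        return hits[0]["id"] if hits else None
    except Exception:
        return None


def add_memory(
    content: str,
    fp: str,
    user_id: str,
    lookup: Callable[[str, str], list],
    add: Callable[..., Any],
    dedup: bool = True,
) -> Any:
    """Skip the expensive add when an exact repeat is found (hypothetical sketch)."""
    if dedup:
        existing = existing_fingerprint_id(lookup, user_id, fp)
        if existing is not None:
            return {"results": [], "deduplicated": True, "memory_id": existing}
    # Normal path: store the fingerprint so the next repeat can be detected.
    return add(content, user_id=user_id, metadata={"content_fp": fp})


def fake_add(content, user_id, metadata):
    return {"results": [{"memory": content}], "metadata": metadata}


def found(user_id, fp):
    return [{"id": "mem-1"}]


def broken(user_id, fp):
    raise RuntimeError("vector store unavailable")


print(add_memory("hi", "fp1", "alice", found, fake_add))               # short-circuits
print(add_memory("hi", "fp1", "alice", broken, fake_add))              # fail-open: adds
print(add_memory("hi", "fp1", "alice", found, fake_add, dedup=False))  # opt-out: adds
```

The lookup error is swallowed on purpose: a failed dedup check costs one extra extraction call, whereas propagating the error would turn a cost optimization into a write failure.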

Files

app/memory.py — content_fingerprint(), _existing_fingerprint_id() (fail-open), add_memory(content, dedup=True, **kwargs).
app/rest.py — dedup flag, route add through the wrapper.
app/mcp_server.py — add_memory tool routes through the wrapper (always dedups; the tool docstring tells the model repeats are safe).

Tests

tests/test_memory.py — fingerprint normalization (whitespace/case, message lists), _existing_fingerprint_id found/empty/fail-open-on-error, and add_memory store-new/skip-duplicate/dedup=False/metadata-merge.
tests/test_rest.py + tests/test_mcp.py — integration: fingerprint stored, exact repeat deduplicated (no .add), dedup=false bypasses the check.
conftest.py defaults the dedup lookup to empty so existing add tests are unaffected.
Full suite: 149 passed, ruff clean.

Docs

User Guide: "How memory works" (exact-duplicate add is free), the add endpoint (dedup flag + deduplicated response), and the import-toolkit idempotency note.
Developer Guide: memory.py description.

Post-merge verification (optional, on your live deploy)

After this deploys, the same content posted twice should return deduplicated: true the second time with no new Qdrant point:

curl -s -X POST https://mem0.rage5.com/api/v1/memories \
  -H "Authorization: Bearer $MEM0_API_KEY" -H 'Content-Type: application/json' \
  -d '{"content":"dedup smoke test 12345"}'   # run twice; 2nd → {"deduplicated": true, ...}

Closes #48.

https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue

Generated by Claude Code

Skip mem0's LLM fact-extraction when byte-identical content is re-submitted. Before add, the raw input is normalized (lowercase + collapse whitespace) and SHA-256'd; the fingerprint is stored in the `content_fp` Qdrant payload field and looked up (via the vector store's filter) on the next add. An exact repeat returns {"results": [], "deduplicated": true, "memory_id": ...} without calling the LLM. Adapts the OB1 content-fingerprint-dedup recipe.

Verified against a live Qdrant: metadata lands top-level in the payload and filtering on the (unindexed) content_fp field works, so no payload index is required.

The dedup check is fail-open: any lookup error just proceeds with a normal add, so it can only ever save work, never block a write.

- app/memory.py: content_fingerprint(), _existing_fingerprint_id() (fail-open), and add_memory(content, dedup=True, **kwargs) wrapper.
- app/rest.py: AddMemoryRequest.dedup flag (default true); route through wrapper.
- app/mcp_server.py: add_memory tool routes through the wrapper (always dedups).
- tests: fingerprint normalization, lookup found/empty/fail-open, wrapper store/skip/dedup-false/metadata-merge, plus REST + MCP integration cases. conftest defaults the dedup lookup to empty so existing add tests are unaffected.
- docs: USER_GUIDE (How memory works + add endpoint + import idempotency), DEVELOPER_GUIDE memory.py description.

Closes #48.

https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue

Copilot

Pull request overview

Adds a lightweight, deterministic content-fingerprint lookup on the write path to short-circuit exact re-submissions before mem0’s LLM fact-extraction runs, making repeated imports/webhook retries cheaper and more idempotent.

Changes:

Introduces content_fingerprint() + _existing_fingerprint_id() and an add_memory(..., dedup=True) wrapper in app/memory.py.
Routes REST and MCP “add memory” calls through the new wrapper and adds a REST opt-out flag (dedup: bool = True).
Expands tests and documentation to cover fingerprint storage, dedup short-circuiting, and the dedup=false bypass.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

File	Description
app/memory.py	Adds fingerprint computation, best-effort lookup, and add wrapper that can short-circuit duplicates.
app/rest.py	Adds `dedup` request flag and routes adds through `memory.add_memory()`.
app/mcp_server.py	Routes MCP `add_memory` tool through the wrapper and documents dedup behavior.
tests/test_memory.py	Adds unit tests for fingerprinting, lookup behavior, and wrapper behavior.
tests/test_rest.py	Adds REST-level tests for fingerprint storage and dedup responses.
tests/test_mcp.py	Adds MCP-level test asserting dedup avoids calling `.add()`.
tests/conftest.py	Sets default mock vector_store list result to “empty” to avoid impacting existing tests.
docs/USER_GUIDE.md	Documents dedup behavior and the new `dedup` flag across user guide sections.
docs/DEVELOPER_GUIDE.md	Updates module description to mention dedup wrapper in `memory.py`.

…cal" wording

- app/memory.py: the message-list fingerprint path previously json-dumped + lowercased, which did NOT collapse internal whitespace (newlines/tabs become escaped \n in JSON), contradicting the docstring. Normalize each message's role and text individually (lowercase + collapse whitespace) so equivalent transcripts dedupe. Extract a shared _normalize_text() helper.
- Wording: dedup matches a normalized fingerprint (case-insensitive, whitespace-collapsed), not raw bytes. Replace "byte-identical" everywhere (app/rest.py dedup-field comment, USER_GUIDE x3) with accurate phrasing.
- tests: assert message-transcript normalization (case/whitespace/newline equivalence, and that differing role/text fingerprint differently).

https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue

imonroe requested a review from Copilot June 7, 2026 22:47

Copilot started reviewing on behalf of imonroe June 7, 2026 22:47

Copilot AI reviewed Jun 7, 2026

Comment thread app/memory.py

Comment thread docs/USER_GUIDE.md Outdated

Comment thread docs/USER_GUIDE.md Outdated

Comment thread docs/USER_GUIDE.md Outdated

Comment thread app/rest.py Outdated

Comment thread tests/test_memory.py Outdated

imonroe merged commit 824651d into main Jun 7, 2026
1 check passed

imonroe merged 2 commits into main from claude/ob1-content-dedup

3 participants
