[feature] summarize_recall — LLM-synthesized recall for context-tight agents #43

@ramonlimaramos

Problem

recall returns top-K raw memories. for agents on tight context budgets (haiku, local models, multi-tool chains where memory is one of many tools), 10 raw memories can burn 2-3k tokens. the agent often only needs the synthesis: "what does synapto know about kafka in this project?" — not the verbatim memory texts.

Proposed Solution

new MCP tool summarize_recall that runs the standard hybrid retrieval, then synthesizes the top-K via LLM into a single answer with citations.

signature

summarize_recall(
    query: str,
    tenant: str | None = None,
    max_tokens: int = 500,        # synthesis budget
    k: int = 10,                  # candidates retrieved
    include_citations: bool = True,
    force_refresh: bool = False,  # bypass the cache (see caching)
) -> SummaryResult

returns:

class SummaryResult:
    summary: str                 # synthesized answer
    citations: list[Citation]    # [memory_id, snippet, score, depth_layer]
    sources_used: int            # how many of the k contributed
    cache_hit: bool
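
a minimal sketch of the retrieve-then-synthesize flow, to make the return shape concrete. `Memory`, `Retriever`, `Synthesizer`, the 120-character snippet cut, and the field types on `Citation` are assumptions for illustration, not part of synapto. the retriever and LLM call are injected so the sketch runs without a database or API key, and caching is left out here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Memory:
    """Hypothetical shape of one retrieved memory (not from the issue)."""
    id: str
    text: str
    score: float
    depth_layer: str


@dataclass
class Citation:
    memory_id: str
    snippet: str
    score: float
    depth_layer: str  # type is an assumption


@dataclass
class SummaryResult:
    summary: str
    citations: list[Citation] = field(default_factory=list)
    sources_used: int = 0
    cache_hit: bool = False


# Assumed callable shapes: (query, tenant, k) -> memories,
# and (query, memories, max_tokens) -> (summary, ids of memories actually used).
Retriever = Callable[[str, Optional[str], int], list[Memory]]
Synthesizer = Callable[[str, list[Memory], int], tuple[str, set[str]]]


def summarize_recall(
    query: str,
    retrieve: Retriever,
    synthesize: Synthesizer,
    tenant: str | None = None,
    max_tokens: int = 500,
    k: int = 10,
    include_citations: bool = True,
) -> SummaryResult:
    memories = retrieve(query, tenant, k)
    if not memories:
        # Nothing to synthesize from, so skip the LLM call entirely.
        return SummaryResult(summary="")
    summary, used_ids = synthesize(query, memories, max_tokens)
    used = [m for m in memories if m.id in used_ids]
    citations = (
        [Citation(m.id, m.text[:120], m.score, m.depth_layer) for m in used]
        if include_citations
        else []
    )
    return SummaryResult(summary, citations, sources_used=len(used))
```

`sources_used` counts only the memories the synthesizer reports using, so it can be smaller than `k`.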

caching

  • key = sha256(query | tenant | top_k_memory_ids | model)
  • TTL = ephemeral_max_age_hours / 4 (re-synth if memories likely changed)
  • stored in redis (already in deps)
  • bypass with force_refresh=True
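
a sketch of how the key and TTL above could be derived. the JSON encoding, the `summarize_recall:` prefix, and the function names are assumptions. note that because the key includes `top_k_memory_ids`, retrieval has to run before the cache lookup, so a hit saves the LLM call but not the retrieval.

```python
import hashlib
import json
from typing import Optional, Sequence


def cache_key(
    query: str,
    tenant: Optional[str],
    memory_ids: Sequence[str],
    model: str,
) -> str:
    # JSON-encode the parts so field boundaries stay unambiguous
    # (a plain "|" join could collide if a query contains "|").
    payload = json.dumps(
        [query, tenant, list(memory_ids), model],
        separators=(",", ":"),
    ).encode("utf-8")
    # Prefix is a hypothetical namespace for the redis keyspace.
    return "summarize_recall:" + hashlib.sha256(payload).hexdigest()


def cache_ttl_seconds(ephemeral_max_age_hours: float) -> int:
    # TTL = ephemeral_max_age_hours / 4, converted to whole seconds.
    return max(1, int(ephemeral_max_age_hours * 3600 / 4))
```

memory ids are kept in retrieval order here, so a re-ranking of the same set counts as a miss. sorting the ids first would trade that strictness for a higher hit rate. `max_tokens` and `include_citations` are not part of the key as proposed, so two calls with different budgets would share one cached summary.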

LLM provider

  • reuse synapto.embeddings.registry pattern: pluggable provider
  • default to claude-haiku-4-5 (fast + cheap)
  • fall back to disabled-with-error if no API key is configured (this tool requires an LLM)
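
a sketch of the pluggable-provider idea with the disabled-with-error fallback. this does not mirror the actual `synapto.embeddings.registry` API. `Summarizer`, `ProviderSpec`, `register_provider`, `get_provider`, and `SummarizeDisabledError` are all hypothetical names.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Callable, Mapping, Protocol


class Summarizer(Protocol):
    def complete(self, system: str, user: str, max_tokens: int) -> str: ...


class SummarizeDisabledError(RuntimeError):
    """Raised when summarize_recall cannot run because no LLM is configured."""


@dataclass(frozen=True)
class ProviderSpec:
    api_key_env: str                       # env var that must hold the API key
    factory: Callable[[str], Summarizer]   # builds a provider from the key


_PROVIDERS: dict[str, ProviderSpec] = {}


def register_provider(
    name: str, api_key_env: str, factory: Callable[[str], Summarizer]
) -> None:
    _PROVIDERS[name] = ProviderSpec(api_key_env, factory)


def get_provider(name: str, env: Mapping[str, str] | None = None) -> Summarizer:
    env = os.environ if env is None else env
    spec = _PROVIDERS.get(name)
    if spec is None:
        raise SummarizeDisabledError(f"unknown summarizer provider: {name!r}")
    api_key = env.get(spec.api_key_env, "")
    if not api_key:
        # Fail loudly instead of silently degrading to raw recall.
        raise SummarizeDisabledError(
            f"summarize_recall is disabled: set {spec.api_key_env} to enable it"
        )
    return spec.factory(api_key)
```

raising a dedicated error lets the MCP layer report the tool as unavailable, so the agent can fall back to plain recall on its own.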

prompt template

system prompt enforces: "only synthesize from provided memories, cite memory_id inline, refuse if memories don't support an answer". reduces hallucination risk.
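
one possible shape for the prompt, as a sketch. the exact wording, the `[memory_id]` bracket format, and the `INSUFFICIENT_MEMORIES` refusal sentinel are assumptions, not a tested template.

```python
from typing import Sequence, Tuple

# Hypothetical system prompt covering the three rules above:
# grounded-only synthesis, inline citations, and refusal.
SYSTEM_PROMPT = (
    "Answer using only the memories in the user message. "
    "After each claim, cite the supporting memory_id in square brackets, "
    "for example [m-12]. "
    "If the memories do not support an answer, reply exactly: "
    "INSUFFICIENT_MEMORIES"
)


def build_user_prompt(query: str, memories: Sequence[Tuple[str, str]]) -> str:
    """Render (memory_id, text) pairs so each memory is addressable by id."""
    lines = [f"Question: {query}", "", "Memories:"]
    for memory_id, text in memories:
        lines.append(f"[{memory_id}] {text}")
    return "\n".join(lines)
```

a fixed refusal sentinel is easy to detect in code, so the tool can return an empty summary instead of passing a hedged non-answer to the agent.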

Trade-offs

  • cost: each call = LLM invocation. mitigation: aggressive cache, opt-in tool (agent picks recall vs summarize_recall).
  • fidelity loss: summarization can lose nuance. mitigation: always return citations so caller can drill into raw if needed.
  • latency: adds ~500-1500ms vs raw recall. acceptable for the context-saving use case.
  • dependency: pulls anthropic SDK as optional extra (synapto[summarize]).

Out of scope

  • streaming summaries (return-and-be-done is fine for MCP)
  • cross-tenant summarization (same tenant only)

Success criteria

  • agents on haiku see meaningful token savings on memory-heavy chains
  • citations are accurate (every claim in summary maps to a memory_id)
  • cache hit rate > 60% on stable topics in a session
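
the citation and cache criteria can be checked mechanically. a sketch, assuming citations appear inline as `[memory_id]` before the sentence's closing punctuation. the function names and the sentence-splitting heuristic are illustrative only.

```python
import re
from typing import List, Set

CITATION_RE = re.compile(r"\[([^\[\]]+)\]")


def unsupported_citations(summary: str, provided_ids: Set[str]) -> Set[str]:
    """Cited ids that were never in the retrieved set (likely hallucinated)."""
    return set(CITATION_RE.findall(summary)) - provided_ids


def uncited_sentences(summary: str) -> List[str]:
    """Sentences with no citation at all, using a naive punctuation split."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary)]
    return [s for s in sentences if s and not CITATION_RE.search(s)]


def cache_hit_rate(hits: int, total: int) -> float:
    """Fraction of calls served from cache; 0.0 when there were no calls."""
    return hits / total if total else 0.0
```

this only verifies that each cited id exists and that each sentence carries a citation. whether the cited memory actually supports the claim still needs a human or model-graded check.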

References

  • byterover ships 6 compression strategies for context optimization, which suggests there is demand for this

Labels: enhancement
