[feature] summarize_recall — LLM-synthesized recall for context-tight agents

## Problem

`recall` returns top-K raw memories. for agents on tight context budgets (haiku, local models, multi-tool chains where memory is one of many tools), 10 raw memories can burn 2-3k tokens. the agent often only needs the **synthesis**: "what does synapto know about kafka in this project?" — not the verbatim memory texts.

## Proposed Solution

new MCP tool `summarize_recall` that runs the standard hybrid retrieval, then synthesizes the top-K via LLM into a single answer with citations.

### signature

```python
summarize_recall(
    query: str,
    tenant: str | None = None,
    max_tokens: int = 500,        # synthesis budget
    k: int = 10,                  # candidates retrieved
    include_citations: bool = True,
) -> SummaryResult
```

returns:

```python
class SummaryResult:
    summary: str                 # synthesized answer
    citations: list[Citation]    # [memory_id, snippet, score, depth_layer]
    sources_used: int            # how many of the k contributed
    cache_hit: bool
```

### caching

- key = `sha256(query | tenant | top_k_memory_ids | model)`
- TTL = `ephemeral_max_age_hours / 4` (re-synth if memories likely changed)
- stored in redis (already in deps)
- bypass with `force_refresh=True`

### LLM provider

- reuse `synapto.embeddings.registry` pattern: pluggable provider
- default to `claude-haiku-4-5` (fast + cheap)
- fall back to disabled-with-error if no API key (this tool requires LLM)

### prompt template

system prompt enforces: "only synthesize from provided memories, cite memory_id inline, refuse if memories don't support an answer". reduces hallucination risk.

## Trade-offs

- **cost**: each call = LLM invocation. mitigation: aggressive cache, opt-in tool (agent picks `recall` vs `summarize_recall`).
- **fidelity loss**: summarization can lose nuance. mitigation: always return citations so caller can drill into raw if needed.
- **latency**: adds ~500-1500ms vs raw recall. acceptable for context-saving usecase.
- **dependency**: pulls anthropic SDK as optional extra (`synapto[summarize]`).

## Out of scope

- streaming summaries (return-and-be-done is fine for MCP)
- cross-tenant summarization (same tenant only)

## Success criteria

- agents on haiku see meaningful token savings on memory-heavy chains
- citations are accurate (every claim in summary maps to a memory_id)
- cache hit rate > 60% on stable topics in a session

## References

- byterover ships 6 compression strategies for context optimization — confirms agents want this

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature] summarize_recall — LLM-synthesized recall for context-tight agents #43

Problem

Proposed Solution

signature

caching

LLM provider

prompt template

Trade-offs

Out of scope

Success criteria

References

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[feature] summarize_recall — LLM-synthesized recall for context-tight agents #43

Description

Problem

Proposed Solution

signature

caching

LLM provider

prompt template

Trade-offs

Out of scope

Success criteria

References

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions