feat: add durable identifiers to messages #2836

Open
opieter-aws wants to merge 2 commits into strands-agents:main from opieter-aws:opieter-aws/issue-2805-plan

Conversation

@opieter-aws opieter-aws commented Jun 16, 2026

Contributor

Description

Messages do not carry a durable identity. The only per-message tracking available today is ephemeral: the memory ExtractionCoordinator's in-session high-water-mark sequence number, and SessionMessage.message_id — an ordinal index that is not stable across conversation-manager truncation or session restore. Neither survives as a durable key, so a memory store has no way to build a (session_id, message_id) tuple to deduplicate extracted messages across sessions.

This change gives every message a durable, stable id in both SDKs. The id is assigned once, when a message is added to the conversation, and is preserved everywhere that message is later observed — MessageAddedEvent subscribers, session persistence, and snapshots. It is never sent to model providers (the existing role/content whitelist already strips everything else). Memory stores can now combine this id with a session id to identify a message uniquely across restarts.
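
A minimal sketch of the whitelist idea, assuming the provider payload is built by copying only `role` and `content` out of each message. `PROVIDER_KEYS` and `to_provider_message` are hypothetical names used for illustration; they are not the SDK's actual helpers.

```python
from typing import Any

# Assumption: only these keys are ever forwarded to a model provider.
PROVIDER_KEYS = ("role", "content")


def to_provider_message(message: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict holding only whitelisted keys; the input is untouched."""
    return {key: message[key] for key in PROVIDER_KEYS if key in message}


message = {
    "role": "user",
    "content": [{"text": "Hello"}],
    "id": "686b8abc-1db4-4145-aa60-9615c04f50ef",
    "metadata": {"source": "example"},
}
payload = to_provider_message(message)
```

Because the copy is built from an allow-list rather than by deleting known extras, any field added to the message later is excluded by default.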

Assignment happens at append time rather than at construction. This keeps it idempotent — a message that already has an id (restored from a session, supplied by a caller, or re-appended) keeps it — which is what makes the id stable through a save/restore cycle. It also means messages that were persisted before this change are left without an id rather than being silently backfilled with a fresh one on each load.
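
The idempotence described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `ensure_message_id` is a stand-in name, and returning the id is a convenience of this sketch.

```python
import uuid
from typing import Any


def ensure_message_id(message: dict[str, Any]) -> str:
    """Assign a UUID v4 in place only when the message has no usable id."""
    if not message.get("id"):  # missing, None, and "" all count as absent
        message["id"] = str(uuid.uuid4())
    return message["id"]


fresh = {"role": "user", "content": [{"text": "Hello"}]}
first = ensure_message_id(fresh)
second = ensure_message_id(fresh)  # re-append: the id is unchanged

restored = {
    "role": "assistant",
    "content": [{"text": "Hi"}],
    "id": "686b8abc-1db4-4145-aa60-9615c04f50ef",
}
kept = ensure_message_id(restored)  # restored id is preserved
```

Calling the helper any number of times on the same message yields the same id, which is the property a save/restore cycle relies on.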

Both SDKs generate a canonical (hyphenated) UUID v4: Python via str(uuid.uuid4()), TypeScript via crypto.randomUUID(). The shapes match deliberately, so a message id means the same thing regardless of which SDK produced it.
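
For readers who want to check the shape, here is a small sketch of a validator for the canonical form (lowercase, hyphenated, version 4). `is_canonical_uuid4` is a hypothetical helper and is not part of either SDK.

```python
import uuid


def is_canonical_uuid4(value: str) -> bool:
    """True only for lowercase, hyphenated UUID v4 strings."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts uppercase, braces, and un-hyphenated hex, so
    # compare against the canonical rendering to reject those forms.
    return parsed.version == 4 and str(parsed) == value


generated = str(uuid.uuid4())
```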

Wiring the id into the memory extraction pipeline (so stores receive it for deduplication) is intentionally out of scope here; this PR only establishes the durable id on the Message type so that work can build on it.
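
To make the intended follow-up concrete, here is a rough sketch of how a store could use the (session_id, message_id) pair once it receives the id. `InMemoryDedupStore` is entirely hypothetical; the real memory-store interface is not defined in this PR.

```python
from typing import Any


class InMemoryDedupStore:
    """Hypothetical store that records each (session_id, message_id) pair once."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], dict[str, Any]] = {}

    def add(self, session_id: str, message: dict[str, Any]) -> bool:
        """Store the message; return False when it is skipped."""
        message_id = message.get("id")
        if not message_id:
            return False  # legacy message without an id: no durable key
        key = (session_id, message_id)
        if key in self._records:
            return False  # already extracted for this session
        self._records[key] = message
        return True


store = InMemoryDedupStore()
msg = {
    "role": "user",
    "content": [{"text": "Hello"}],
    "id": "686b8abc-1db4-4145-aa60-9615c04f50ef",
}
results = [store.add("s1", msg), store.add("s1", msg), store.add("s2", msg)]
```

The same message id under a different session id is treated as a separate record, which matches the "unique when combined with a session id" framing above.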

This is a coordinated change across both SDKs to keep the Message shape consistent. Python and TypeScript are kept behaviorally identical: assign at the append chokepoint, preserve on redaction, exclude from provider payloads, and no backfill of legacy messages.

Public API Changes

Message gains an optional, durable id.

class Message(TypedDict):
    """A message in a conversation with the agent.

    Attributes:
        content: The message content.
        role: The role of the message sender.
        id: Durable, stable identifier for the message, assigned when the message is added to the
            conversation. Survives session save/restore and snapshots, and is stripped before model
            calls. Combined with a session id, it gives memory stores a key to deduplicate messages
            across sessions.
        metadata: Optional metadata, stripped before model calls.
    """

    content: list[ContentBlock]
    role: Role
    id: NotRequired[str]  # New in this PR
    metadata: NotRequired[MessageMetadata]

Python — new id field on the Message TypedDict, plus a null-safe accessor:

from strands import Agent
from strands.session.file_session_manager import FileSessionManager
from strands.types.content import get_message_id

# After a turn, every recorded message carries a stable id
agent("Hello")
message = agent.messages[-1]
message["id"]            # e.g. "686b8abc-1db4-4145-aa60-9615c04f50ef"
get_message_id(message)  # same value, or None if never assigned

# The id survives a session save/restore round-trip
agent_2 = Agent(session_manager=FileSessionManager(session_id="s1", storage_dir=...), agent_id="a1")
agent_2.messages[-1]["id"]  # identical to the persisted id

TypeScript — new optional id on MessageData and the Message class; it round-trips through toJSON/fromJSON/clone:

await agent.invoke('Hello')
const message = agent.messages.at(-1)!
message.id  // e.g. "0f8e1c2d-...-..." (crypto.randomUUID), stable across serialization

Both fields are optional and backward compatible: existing code that constructs messages without an id is unaffected, and messages persisted before this change deserialize with no id. The agent assigns an id on append only when the message does not already have one, so a caller-supplied id (e.g. on input messages) is always preserved.

Related Issues

Resolves: #2805

Documentation PR

No documentation changes. The id is assigned automatically and is primarily a building block for upcoming memory-store deduplication; there is no new user-facing workflow to document yet.

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce new warnings.

  • I ran hatch run prepare

Beyond the unit suites (both SDKs green), I exercised the change end to end: ran a real agent turn and confirmed every recorded message has a unique id, then persisted through a FileSessionManager and restored into a fresh agent, confirming the restored ids match the persisted ones. Regression tests assert the id never reaches the model-provider payload — in Python at the stream_messages whitelist, and in TypeScript at the Anthropic adapter's request formatting (where TS does its stripping). Note: hatch run prepare's static-analysis step couldn't bootstrap locally due to an unrelated native build failure in the optional [cedar] extra (rustc version); ruff/mypy on the changed files and the full test suites pass, and CI runs the gate cleanly.

Checklist

  • I have read the CONTRIBUTING document
  • I have reviewed and understand every line of code in this PR, including any generated by AI tools, and I can explain why it works
  • My change is focused and reasonably small; I have split unrelated work into separate PRs
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions github-actions Bot added the size/l, enhancement, area-sessions, and strands-running labels Jun 16, 2026
codecov Bot commented Jun 16, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@opieter-aws opieter-aws force-pushed the opieter-aws/issue-2805-plan branch from 83fae70 to fe16a56 Compare June 16, 2026 17:55
@opieter-aws opieter-aws marked this pull request as ready for review June 16, 2026 20:32

# Add the response message to the conversation
_ensure_message_id(message)
agent.messages.append(message)

Contributor

Issue: The PR description says ids are assigned "at the append chokepoint," but in practice _ensure_message_id is called at five separate Python call sites (three here in event_loop.py, one in agent._append_messages, and one in the bidi agent), and interventions/registry.py adds two inline _generate_message_id() calls. Any future code path that appends a message without calling the helper will silently produce an id-less message, breaking the durability invariant this PR exists to guarantee.

Suggestion: Consider funneling all of these through a single private append helper (e.g. have the three event_loop.py sites and the interventions sites go through agent._append_messages, which already assigns the id and fires MessageAddedEvent). If a single chokepoint genuinely isn't reachable for some of these paths, a short comment explaining why each site assigns the id independently would help the next person preserve the invariant. The TS side has the same shape (two ensureMessageId sites in agent.ts + an inline generateMessageId() in interventions/registry.ts).
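
A sketch of what a single chokepoint could look like, assuming the message list is only ever written through one method. `Conversation` and `on_added` are hypothetical names; the SDK's `_append_messages` and hook pipeline may be shaped differently.

```python
import uuid
from typing import Any, Callable, Optional

Message = dict[str, Any]


class Conversation:
    """Hypothetical container whose only write path is append()."""

    def __init__(self, on_added: Optional[Callable[[Message], None]] = None) -> None:
        self._messages: list[Message] = []
        self._on_added = on_added

    @property
    def messages(self) -> tuple[Message, ...]:
        # Read-only view, so callers cannot bypass append() with list.append().
        return tuple(self._messages)

    def append(self, message: Message) -> Message:
        """Assign an id if needed, record the message, then notify."""
        if not message.get("id"):
            message["id"] = str(uuid.uuid4())
        self._messages.append(message)
        if self._on_added is not None:
            self._on_added(message)
        return message
```

With the list hidden behind a read-only view, a new append path cannot skip id assignment by accident; it would have to go through `append()`.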

missing, ``None``, or empty-string id is treated as absent and replaced, since an empty id
cannot serve as a durable key.
"""
if not message.get("id"):

Contributor

Issue: _ensure_message_id mutates the caller-supplied Message in place, injecting an id into an object the caller still owns and may reuse. For input messages the user constructed, this is an observable side effect on their object that isn't obvious from the call site. (Same applies to the TS ensureMessageId.)

Suggestion: This is likely intentional — it's what makes a caller-supplied id stable — but a one-line note on the function that it intentionally mutates the passed message would prevent a future reader from "fixing" it into a copy and breaking stability. Worth confirming there's no path where the same Message dict is shared across two agents/turns where an injected id could leak unexpectedly.
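
The sharing concern can be illustrated with plain dicts and lists. This is a sketch only: `ensure_message_id` stands in for the PR's helper, and the two lists stand in for two agents' conversations.

```python
import uuid
from typing import Any


def ensure_message_id(message: dict[str, Any]) -> None:
    """Stand-in for the in-place helper: mutates the dict it is given."""
    if not message.get("id"):
        message["id"] = str(uuid.uuid4())


user_message = {"role": "user", "content": [{"text": "Hello"}]}
agent_a_messages: list[dict[str, Any]] = []
agent_b_messages: list[dict[str, Any]] = []

# Agent A appends the caller's dict, so the caller's own object gains an id.
ensure_message_id(user_message)
agent_a_messages.append(user_message)

# Agent B later receives the same dict and keeps the id Agent A injected.
ensure_message_id(user_message)
agent_b_messages.append(user_message)
shared = agent_a_messages[0]["id"] == agent_b_messages[0]["id"]

# A caller who wants independent ids has to drop the id before reuse.
isolated = {k: v for k, v in user_message.items() if k != "id"}
ensure_message_id(isolated)
```

Whether a shared id across two agents is desirable depends on the uniqueness scope; within a single session it is harmless, across sessions the session id disambiguates.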

return True
elif isinstance(action, Guide):
event.agent.messages.append({"role": "user", "content": [{"text": action.feedback}]})
event.agent.messages.append(

Contributor

Issue (minor): These two intervention sites call _generate_message_id() inline and build the id into the dict literal, rather than appending the message and letting _ensure_message_id assign it. This is a third id-assignment pattern (distinct from _ensure_message_id and from going through _append_messages), which adds to the scatter noted in event_loop.py.

Suggestion: If these direct messages.append(...) calls can't route through _append_messages (the comment in the TS twin says they intentionally bypass the hook pipeline / conversation manager), that's fine — but a brief inline note here explaining the bypass and that the id is assigned manually as a result would make the divergence intentional rather than looking like an inconsistency.

"""Assign a durable id to the message in place if it does not already have a usable one.

A message that already carries a non-empty id (e.g. restored from a session or supplied by a
caller) keeps it, so the same message has a stable identifier everywhere it is observed. A

Contributor

Issue: Python exposes a public, null-safe accessor get_message_id(message) (mirroring the existing get_message_metadata), but the TypeScript side exposes only the raw optional message.id property with no accessor. For a change whose stated goal is keeping the Message shape "behaviorally identical" across both SDKs, the public surface is asymmetric — a consumer reading durable ids writes get_message_id(m) in Python but m.id in TS.

Suggestion: Either add a matching getMessageId helper in TS, or note in the PR why the asymmetry is acceptable (e.g. TS optional-property + ?. access already covers the null-safe case idiomatically). Not blocking, but worth a deliberate decision since this is the public read path memory stores will use.
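
For reference, a null-safe accessor of this kind is only a few lines. The sketch below uses a hypothetical name, `read_message_id`, and assumes an empty id is reported as absent; the SDK's actual get_message_id may behave differently in that edge case.

```python
from typing import Any, Optional


def read_message_id(message: dict[str, Any]) -> Optional[str]:
    """Return the message id, or None when it is missing or empty."""
    return message.get("id") or None
```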

@github-actions

Contributor

Assessment: Comment (no blocking issues)

Solid, well-scoped change. The durable id is correctly stripped before model calls (verified against the role/content whitelist in streaming.py), survives session/snapshot round-trips, is preserved through redaction in both SDKs, and is not backfilled onto legacy messages. The cross-SDK id format is consistent (canonical UUID v4 in both), and empty/None ids are treated as absent in both SDKs. Test coverage is strong and follows the full-object-equality pattern rather than per-field assertions.

Review themes
  • Invariant durability (Important): Id assignment is spread across ~5 call sites rather than the single chokepoint the description implies. The main risk isn't today's code — it's that a future append path silently produces id-less messages with nothing to catch it. Consolidating through one append helper, or documenting why each site assigns independently, would protect the invariant. (Inline on event_loop.py.)
  • API process (Important): This adds a field to the public Message primitive plus a public get_message_id accessor. Per team/API_BAR_RAISING.md this looks like a "moderate" public-API change, but the PR carries no api/needs-review label. The description already includes the required API docs, so this is mainly about flagging it for an API reviewer.
  • Cross-SDK symmetry (Suggestion): Python ships a null-safe get_message_id; TS exposes only the raw optional message.id. Worth a deliberate call given the "behaviorally identical" goal.
  • Side effects & duplication (Suggestion): _ensure_message_id mutates the caller's message in place (intentional, but undocumented), and the interventions sites use a third id-assignment pattern. Brief comments would make both intentional.

Nicely done overall — the strip-before-provider regression test and the redaction-preservation tests are exactly the right things to lock down for this feature.

@JackYPCOnline

Contributor

My concern is scope and layer. As a DTO that memory stores read for dedup, this is fine. But the PR already persists it inside SessionMessage and snapshots, so it's effectively entering the session layer.

If the natural next step is for it to become the session manager's record key, I'd want us to explicitly not do that, and ideally keep durable identity on the persistence wrapper rather than on the provider-facing Message.

It also overlaps with the planned transcript work, which models messages as a DAG. A flat message.id that's preserved across redaction/truncation can't express edits or branches, so shipping it as "the" durable identity now will collide with transcript node identity later.

Finally, on correctness as a unique key: it's preserved across deep copies (as_tool, graph, swarm), assigned only at append chokepoints (the summarizer already needed a manual fix to keep the invariant), and never validated for uniqueness or format, with caller-supplied ids trusted as-is. Before we lean on (session_id, message_id) anywhere, I'd like the uniqueness scope defined (per session_id), copy/sub-agent semantics decided, and assignment enforced at the persistence boundary rather than by convention.
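
One possible reading of "enforced at the persistence boundary", sketched under assumptions: ids are validated and checked for uniqueness per session_id when a message is written, not where it is appended. `SessionWriter` and `DuplicateMessageIdError` are hypothetical, and treating a re-save of identical content as a no-op is a choice of this sketch, not something the thread has decided.

```python
import copy
import uuid
from typing import Any


class DuplicateMessageIdError(ValueError):
    """Same id, different content, within one session."""


class SessionWriter:
    """Hypothetical persistence wrapper that owns id enforcement."""

    def __init__(self) -> None:
        # session_id -> {message_id -> stored snapshot}
        self._sessions: dict[str, dict[str, dict[str, Any]]] = {}

    def write(self, session_id: str, message: dict[str, Any]) -> str:
        """Validate or assign the id, enforce per-session uniqueness, store a copy."""
        message_id = message.get("id") or str(uuid.uuid4())
        try:
            parsed = uuid.UUID(message_id)
        except (ValueError, AttributeError, TypeError) as exc:
            raise ValueError(f"malformed message id: {message_id!r}") from exc
        if parsed.version != 4 or str(parsed) != message_id:
            raise ValueError(f"not a canonical UUID v4: {message_id!r}")

        record = copy.deepcopy({**message, "id": message_id})
        stored = self._sessions.setdefault(session_id, {})
        if message_id in stored and stored[message_id] != record:
            raise DuplicateMessageIdError(message_id)
        stored[message_id] = record  # identical re-save is idempotent
        return message_id
```

The writer stores a snapshot and never mutates the caller's message, so identity lives on the persisted record, which is closer to the "persistence wrapper" placement argued for above.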

In short: can we scope this PR to "opaque durable id for memory dedup," document it as non-authoritative and not a session/transcript identity, and defer the canonical-identity decision to the transcript design?

Labels

area-sessions, enhancement, size/l

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add durable identifiers to messages

2 participants