This guide shows the canonical way to wire Engrava into an agent's turn loop: give a chat/agent long-term memory that persists across sessions and surfaces relevant context on every turn. It's the end-to-end pattern behind Engrava's one-line pitch — "the memory database for AI agents."
A complete, runnable version of everything here ships as examples/agent_loop.py. It needs no LLM or embedding API (it uses a canned responder and a deterministic embedder). This page walks through the shape of that loop.
New to the model (thought, edge, reflection, cycle)? Read Core Concepts first — this guide assumes those terms.
Per user turn:
  user message
       │
       ▼
  1. store it as a percept ──────────────► create_thought(OBSERVATION)
  2. retrieve relevant memory ───────────► search_hybrid(query, current_cycle)
  3. build prompt from retrieved essences ─► call your LLM
  4. store the reply as an utterance ─────► create_thought(OUTPUT_DRAFT)
  5. record the action taken ─────────────► create_action(ActionRecord)
  6. advance the cycle counter ───────────► cycle += 1 (you own this clock)
       │
       └─ every N turns ────────────────────► dreaming.run_consolidation(current_cycle)
Create one store for the lifetime of the agent. Configure an embedding provider
so retrieval is semantic (the example uses a deterministic stand-in; in
production pass a real provider such as SentenceTransformerProvider or
OpenAICompatibleProvider, configurable via engrava.yaml):
import aiosqlite
from engrava import SqliteEngravaCore, CallbackProvider

provider = CallbackProvider(
    callback=my_embed_fn,  # swap in a real provider in production
    dimension=64,
    model_name="demo",
)
conn = await aiosqlite.connect("agent-memory.db")  # a file persists across runs
conn.row_factory = aiosqlite.Row
store = SqliteEngravaCore(conn, embedding_provider=provider, auto_embed=True)
await store.ensure_schema()

auto_embed=True means thoughts are embedded on write. At search time you may
pass an explicit query_vector; if you omit it, the store embeds the query
text for you when an embedding provider is configured. Passing it yourself
is handy when you've already computed the vector or want a different query
representation.
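
For experimenting without an embedding API, a deterministic stand-in for my_embed_fn can be sketched as below. This is an illustrative assumption, not the embedder that ships in examples/agent_loop.py: it hashes each whitespace-separated token into one of 64 buckets and L2-normalizes the result. The synchronous text-in, vector-out shape matches how my_embed_fn(query) is called elsewhere in this guide, but check the CallbackProvider documentation for the exact callback signature it expects.

```python
import hashlib
import math


def my_embed_fn(text: str, dimension: int = 64) -> list[float]:
    """Hypothetical deterministic embedder: hash each token into a signed bucket."""
    vector = [0.0] * dimension
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimension
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    # An all-zero vector (e.g. empty input) is returned as-is to avoid dividing by zero.
    return [v / norm for v in vector] if norm else vector
```

The same text always maps to the same vector, which keeps test runs reproducible. Texts that share tokens share buckets, so the vectors carry crude lexical overlap rather than real semantics.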
Each user message becomes an OBSERVATION thought, tagged with percept(...)
metadata so its origin is recorded. Extend that metadata with a session_id
(which conversation) and turn_index (position within it) so every memory is
anchored to its conversation — these are the keys you'd later filter on (or
post-filter on) to scope retrieval to one session or user:
import uuid
from engrava import ThoughtRecord, ThoughtType, Priority, LifecycleStatus, percept

async def store_percept(store, text, cycle, user_id, session_id, turn_index):
    record = ThoughtRecord(
        thought_id=str(uuid.uuid4()),
        thought_type=ThoughtType.OBSERVATION,
        essence=text[:200],  # the prompt-facing one-liner
        content=text,  # the full message
        priority=Priority.P2,
        lifecycle_status=LifecycleStatus.ACTIVE,
        created_cycle=cycle,  # the agent clock (see step 6)
        updated_cycle=cycle,
        source=user_id,
        metadata={
            **percept(source_id=user_id, label="user"),
            "session_id": session_id,
            "turn_index": turn_index,
        },
    )
    return await store.create_thought(record)

Before calling the LLM, pull the most relevant prior memories with
search_hybrid. Pass current_cycle so the recency signal works, and turn the
returned (thought_id, score) tuples back into text via get_thought:
async def retrieve_context(store, query, cycle):
    result = await store.search_hybrid(
        query,
        query_vector=my_embed_fn(query),  # optional: omit to let the provider embed `query`
        top_k=3,
        current_cycle=cycle,
    )
    essences = []
    for thought_id, _score in result.results:
        record = await store.get_thought(thought_id)
        if record is not None:
            essences.append(record.essence)  # essence = prompt-ready text
    return essences

result.results is a list of (thought_id, score) tuples; Engrava returns IDs, not
records, so you fetch the ones you want. result.backends_used tells you which
signals contributed (e.g. {"fts5", "vector", "recency"}).
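
If you want to scope the retrieved memories to one conversation, post-filtering on the session_id key stored in step 1 could look roughly like this. The sketch uses a hypothetical StubThought stand-in rather than a real ThoughtRecord, and assumes fetched records expose their metadata as a dict attribute named metadata.

```python
from dataclasses import dataclass, field


@dataclass
class StubThought:
    """Hypothetical stand-in exposing only the fields this sketch reads."""
    essence: str
    metadata: dict = field(default_factory=dict)


def filter_by_session(records, session_id):
    """Keep only records whose metadata carries the given session_id."""
    return [r for r in records if r.metadata.get("session_id") == session_id]


records = [
    StubThought("likes tea", {"session_id": "s1", "turn_index": 0}),
    StubThought("lives in Oslo", {"session_id": "s2", "turn_index": 0}),
    StubThought("prefers email", {"session_id": "s1", "turn_index": 1}),
]
same_session = filter_by_session(records, "s1")
print([r.essence for r in same_session])
```

Because post-filtering can only shrink the result list, it may be worth asking search_hybrid for a larger top_k than you plan to put in the prompt.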
This is the only step that touches your model. Engrava is LLM-free; you own the call:
prompt = "Context:\n" + "\n".join(f"- {c}" for c in context)
prompt += f"\n\nUser: {user_message}\nAssistant:"
reply = await my_llm(prompt)  # your provider here

Persist what the agent said as an OUTPUT_DRAFT thought with utterance(...)
metadata, so the agent's own outputs are part of memory too:
from engrava import utterance

async def store_utterance(store, reply, cycle, session_id, turn_index):
    record = ThoughtRecord(
        thought_id=str(uuid.uuid4()),
        thought_type=ThoughtType.OUTPUT_DRAFT,
        essence=reply[:200],
        content=reply,
        priority=Priority.P3,
        lifecycle_status=LifecycleStatus.ACTIVE,
        created_cycle=cycle,
        updated_cycle=cycle,
        source="agent",
        metadata={  # same session + turn as the percept it answered
            **utterance(),
            "session_id": session_id,
            "turn_index": turn_index,
        },
    )
    return await store.create_thought(record)

If your agent does things (sends a message, calls a tool), record each as an
ActionRecord linked to the source thought. This is how the audit/action
surface tracks what the agent did and whether it succeeded:
from engrava import ActionRecord, ActionType, ActionStatus, VerificationStatus

await store.create_action(
    ActionRecord(
        action_id=str(uuid.uuid4()),
        source_thought_id=percept_thought.thought_id,
        action_type=ActionType.MESSAGE,  # or TOOL_CALL / CLI_OUTPUT / STATE_UPDATE
        intent="answered user",
        status=ActionStatus.CONFIRMED,
        verification_status=VerificationStatus.CONFIRMED,
    )
)

Read them back with await store.get_actions(thought_id).
A cycle is the agent's logical clock, and you own it — Engrava never
advances or persists it. Increment it once per turn and use it for
created_cycle/updated_cycle and the current_cycle you pass to search and
consolidation:
cycle = 0
while running:
    ...  # steps 1–5 use `cycle`
    cycle += 1

If you leave it at a constant, recency can't distinguish old memories from new and dreaming's age gate never opens (see Cycle (the agent clock)). On restart, recover it so it stays monotonic; see Persistence across restarts.
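
To see why a constant clock flattens the recency signal, consider the toy decay function below. It is purely illustrative: the half-life formula and the recency_score name are assumptions made for this sketch, not Engrava's actual recency scoring. The point is only that any score computed from current_cycle - created_cycle collapses to a single value when the clock never moves.

```python
def recency_score(created_cycle: int, current_cycle: int, half_life: float = 10.0) -> float:
    """Hypothetical decay: the score halves for every `half_life` cycles of age."""
    age = max(current_cycle - created_cycle, 0)
    return 0.5 ** (age / half_life)


# Clock advanced once per turn: memories created at cycles 0, 20 and 30,
# scored at cycle 30, come out ordered from oldest (lowest) to newest (highest).
advancing = [recency_score(c, current_cycle=30) for c in (0, 20, 30)]

# Clock stuck at 0: every memory has age 0, so all scores are identical.
stuck = [recency_score(0, current_cycle=0) for _ in range(3)]

print(advancing)
print(stuck)
```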
Dreaming turns accumulated observations into higher-order REFLECTIONs. In a long-running agent, run it on a cadence — e.g. every N turns — rather than every turn:
from engrava import DreamingExtension, DreamingConfig

dreaming = DreamingExtension(config=DreamingConfig(enabled=True))

# inside the loop, after advancing the cycle:
if cycle % 20 == 0:
    result = await dreaming.run_consolidation(store, current_cycle=cycle)

The cadence is yours to choose: every-N-turns (as above), a background asyncio task on a timer, or an out-of-process job. Engrava is single-writer, so run consolidation on the same writer that handles turns (or coordinate so they don't write concurrently). A brand-new store has little to consolidate; REFLECTIONs emerge as memories accumulate and repeat. See Dreaming for the knobs.
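
The timer-based option can be sketched with plain asyncio as follows. This is a minimal sketch under stated assumptions, not Engrava API: consolidation_loop, write_lock, get_cycle and fake_consolidate are hypothetical names. In a real agent the consolidate callable would wrap dreaming.run_consolidation(store, current_cycle=...), and your turn handler would hold the same asyncio.Lock around its own writes so the two never write concurrently.

```python
import asyncio


async def consolidation_loop(consolidate, write_lock, get_cycle, interval_s, stop):
    """Call `consolidate(cycle)` every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            # Sleep for one interval, but wake immediately when asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            # The interval elapsed without a stop request: consolidate now.
            async with write_lock:  # turn handlers take the same lock to write
                await consolidate(get_cycle())


async def demo():
    write_lock = asyncio.Lock()
    stop = asyncio.Event()
    seen = []

    async def fake_consolidate(cycle):  # stand-in for the real consolidation call
        seen.append(cycle)

    task = asyncio.create_task(
        consolidation_loop(fake_consolidate, write_lock, lambda: 7, 0.01, stop)
    )
    await asyncio.sleep(0.05)  # let a few intervals elapse
    stop.set()
    await task
    return seen
```

Running asyncio.run(demo()) returns the cycles that fake_consolidate was called with. Passing get_cycle as a callable lets the background task read the current value of the clock your turn loop owns instead of a stale copy.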
- Embeddings persist. They are stored in the database; you do not re-embed on a normal restart. (You only need engrava restore --re-embed when you deliberately change the embedding model.)
- The cycle counter does not persist; Engrava doesn't store it. Recover it on startup so it keeps increasing. list_thoughts returns rows ordered by updated_cycle descending, so the most recent thought carries the highest cycle you've used; resume one past it:

      recent = await store.list_thoughts(limit=1)  # ordered by updated_cycle desc
      cycle = (recent[0].updated_cycle + 1) if recent else 0

- Model lock. If you configured an embedding provider, the store remembers which model produced its vectors; calling store_embedding later with a different model raises EmbeddingModelMismatchError. Keep the same provider across restarts (or migrate deliberately).
The complete, runnable loop — including the deterministic embedder and the
mock LLM so it runs with zero external dependencies — is in
examples/agent_loop.py:
python examples/agent_loop.py

- Core Concepts: thought / edge / reflection / cycle.
- Hybrid Search: how the retrieval ranking works.
- Dreaming: consolidation in depth.
- Configuration: wiring an embedding provider via engrava.yaml.