✨(back) add management command to de-index inactive collections and reindex in conv#441
✨(back) add management command to de-index inactive collections and reindex in conv#441maxenceh wants to merge 4 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds scheduled de-indexing of inactive conversation collections and transparent re-index-on-resume, refactors document-context listing/rendering, introduces index-state model fields and migration, integrates reindex into the agent, provides a deindex management command + Helm CronJob, frontend resume UX, tests, settings, and docs. ChangesReindex & Deindex
Sequence Diagram sequenceDiagram
participant Client
participant reindex_conversation
participant DefaultStorage
participant DocumentStore
participant Database
Client->>reindex_conversation: start reindex(conversation, in_context_ids)
reindex_conversation->>Database: claim index_state -> INDEXING
reindex_conversation->>Client: emit ToolCallPart(conversation_resume)
reindex_conversation->>DocumentStore: acreate_collection() or reuse
loop per selected READY text attachment
reindex_conversation->>DefaultStorage: read attachment bytes (to_thread)
reindex_conversation->>DocumentStore: store_document(bytes) (thread)
DocumentStore-->>reindex_conversation: returns rag_document_id
reindex_conversation->>Database: mark attachment is_indexed & save rag_document_id
end
reindex_conversation->>Database: set index_state, set collection_id if applicable
reindex_conversation->>Client: emit ToolResultPart(state="done"/"partial"/"error")
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
b90789f to
7c8c601
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/chat/clients/pydantic_ai.py`:
- Around line 1156-1164: Two concurrent resumes can both see
conversation.collection_id empty and both call _reindex_conversation, producing
duplicate collections; make the claim-and-reindex atomic by adding a
single-claim step (e.g., a distributed/DB-backed lock or an atomic
compare-and-set) that verifies conversation.collection_id is still empty and
writes a placeholder/claim value before creating/filling the backend collection.
Update the flow so is_conversation_deindexed triggers _reindex_conversation only
when the claim succeeds (or move the claim logic into _reindex_conversation at
its start), and ensure the final save of conversation.collection_id uses the
same atomic operation (compare-and-swap or transactional update) so only the
claimer proceeds to populate and persist the collection id.
- Around line 640-716: The _reindex_conversation routine must report and persist
the true outcome: do not set self.conversation.collection_id when collection
creation failed or when no documents were successfully stored; on
collection-creation exception yield a ToolResultPart with state "error" (already
done) and return without modifying conversation; when processing attachments,
track successful_store_count and failed_documents, only set and save
self.conversation.collection_id if successful_store_count > 0 (use
document_store.collection_id), and yield ToolResultPart with state "done" when
failed_documents is empty and success_count > 0, state "partial" including
failed_documents and success_count when some succeeded, and state "none" or
"no_documents" (or "error") when zero succeeded so callers like _run_agent can
distinguish the zero-success path from a valid reindex; update code references
in _reindex_conversation, document_store, failed_documents,
self.conversation.collection_id, and the yielded ToolResultPart to implement
this logic.
In `@src/backend/chat/management/commands/deindex_inactive_collections.py`:
- Around line 34-40: The current flow calls backend_cls(...).delete_collection()
and then updates the DB with
ChatConversation.objects.filter(pk=conversation.pk).update(collection_id=None),
which can leave a stale collection_id if the DB update fails; change the flow to
persist a de-indexed state before deleting the external resource (or atomically
clear the DB id first and restore on failure): add or use a boolean flag (e.g.,
conversation.collection_deindexed or similar) and set it (via queryset.update to
avoid auto_now) before calling delete_collection(), then call
backend.delete_collection() inside try/except and on exception clear the
deindexed flag or log and surface the error; alternatively, perform
queryset.update(collection_id=None) first and if backend.delete_collection()
raises, re-set the collection_id in an except block so the system remains
recoverable; reference delete_collection(), backend_cls, and
ChatConversation.objects.filter(...).update(...) when making the change.
In `@src/frontend/apps/conversations/src/features/chat/components/Chat.tsx`:
- Around line 357-390: The effect only shows the modal when
result.failed_documents exists; update the logic in the useEffect (the block
that finds lastAssistant, resumePart and computes result) to also detect a full
re-index failure by checking result.state === 'error' or presence of
result.error and then call setChatErrorModal (same pattern used for
failed_documents) to surface the backend error; keep the existing de-duplication
using lastResumeErrorMessageIdRef.current === lastAssistant.id and include the
backend error text in the modal message (e.g., show result.error or a translated
generic message) so users see feedback when conversation_resume returned an
error.
- Around line 121-123: The modal currently wraps chatErrorModal.message (from
useState in Chat.tsx) inside a <Text> component which breaks rich React nodes
(like <ul>); update the Chat component's error modal render to render
chatErrorModal.message directly (or conditionally wrap only when the message is
a plain string) instead of always placing it inside <Text>, keeping the existing
state shape { title, message: React.ReactNode } and the setChatErrorModal usage
intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: eb1fae3e-895c-4f7c-a3b1-102b706c46c7
📒 Files selected for processing (16)
Makefilesrc/backend/chat/clients/pydantic_ai.pysrc/backend/chat/management/__init__.pysrc/backend/chat/management/commands/__init__.pysrc/backend/chat/management/commands/deindex_inactive_collections.pysrc/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.pysrc/backend/chat/tests/management/__init__.pysrc/backend/chat/tests/management/commands/__init__.pysrc/backend/chat/tests/management/commands/test_deindex_inactive_collections.pysrc/backend/conversations/settings.pysrc/frontend/apps/conversations/src/features/chat/components/Chat.tsxsrc/frontend/apps/conversations/src/features/chat/components/MessageItem.tsxsrc/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsxsrc/frontend/apps/conversations/src/i18n/translations.jsonsrc/helm/conversations/templates/backend_cronjob_deindex.yamlsrc/helm/conversations/values.yaml
31b8b16 to
7771e07
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
src/frontend/apps/conversations/src/features/chat/components/Chat.tsx (1)
121-124:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRender modal content without forcing a
<Text>wrapper.
chatErrorModal.messageis typed asReact.ReactNode, but it is still always wrapped in<Text>. Rich content (like the failed-documents<ul>) should be rendered directly.💡 Suggested fix
- <Text>{chatErrorModal?.message}</Text> + {typeof chatErrorModal?.message === 'string' ? ( + <Text>{chatErrorModal.message}</Text> + ) : ( + chatErrorModal?.message + )}#!/bin/bash # Verify that ReactNode modal content is still wrapped by <Text> in Chat.tsx. rg -n "message:\s*React\.ReactNode|<Text>\{chatErrorModal\?\.message\}</Text>" src/frontend/apps/conversations/src/features/chat/components/Chat.tsxAlso applies to: 901-901
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend/apps/conversations/src/features/chat/components/Chat.tsx` around lines 121 - 124, The modal state chatErrorModal is typed to allow React.ReactNode for message but the render uses a strict <Text>{chatErrorModal?.message}</Text> wrapper; change the Chat component's modal rendering (in Chat.tsx) to render chatErrorModal.message directly (e.g., {chatErrorModal?.message}) instead of wrapping it in <Text>, so rich nodes like <ul> are rendered correctly, and ensure any styling or layout previously provided by the <Text> wrapper is moved to the surrounding container if needed.
🧹 Nitpick comments (1)
src/backend/chat/management/commands/deindex_inactive_collections.py (1)
57-59: ⚡ Quick winSurface partial failures as command failure for cron observability.
When errors occur, the command still reports success and exits 0. Consider raising
CommandErrorwhencount_error > 0so Kubernetes CronJob status and alerts reflect partial failures.Suggested fix
-from django.core.management.base import BaseCommand +from django.core.management.base import BaseCommand, CommandError @@ - self.stdout.write( - self.style.SUCCESS(f"De-indexed {count_success} collection(s). {count_error} error(s).") - ) + message = f"De-indexed {count_success} collection(s). {count_error} error(s)." + if count_error: + self.stdout.write(self.style.WARNING(message)) + raise CommandError(message) + self.stdout.write(self.style.SUCCESS(message))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/chat/management/commands/deindex_inactive_collections.py` around lines 57 - 59, The final success message currently always returns exit 0; modify the command to raise a django.core.management.base.CommandError when count_error > 0 so cron systems see partial failures: import CommandError, then after computing count_success and count_error, if count_error > 0 raise CommandError(f"De-indexed {count_success} collection(s). {count_error} error(s).") else keep the existing self.stdout.write(self.style.SUCCESS(...)) call; reference the existing symbols count_success, count_error, self.stdout.write and self.style.SUCCESS when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/chat/clients/conversation_reindexer.py`:
- Around line 50-51: The early returns that occur after claiming the
"reindexing" sentinel (e.g., when ready_attachments is falsy) do not
release/reset the claim and can leave conversations stuck; modify the code so
the claim is always cleared before any no-op return by invoking the DB/claim
release/reset API (e.g., call the existing reset_claim/release_claim method or
unset the "reindexing" sentinel using the conversation id and
collection_id="reindexing") and wrap the work in a try/finally (or ensure both
return paths call the release) so both the ready_attachments early return and
the other return path also clear the claim.
In `@src/backend/chat/management/commands/deindex_inactive_collections.py`:
- Around line 34-51: The current clear/restore uses pk-only updates and can
overwrite a concurrent change; instead perform a conditional claim and
conditional restore: when iterating conversations, replace the initial
ChatConversation.objects.filter(pk=conversation.pk).update(collection_id=None)
with a conditional claim that includes the expected collection_id and the
inactivity predicate (e.g., ChatConversation.objects.filter(pk=conversation.pk,
collection_id=old_collection_id,
<inactivity_predicate>).update(collection_id=None)) and only proceed to
instantiate backend_cls(...) and call backend.delete_collection if the claim
returned 1; on failure do not touch the row. When restoring, only restore if the
row is still NULL by using ChatConversation.objects.filter(pk=conversation.pk,
collection_id=None).update(collection_id=old_collection_id) so you won't
overwrite a concurrent update.
In `@src/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.py`:
- Around line 75-89: The test asserts on an in-memory ChatConversation instance
that may be stale after the reindexing; refresh the conversation from the DB
before asserting collection_id is None. Wrap the synchronous refresh or fetch in
sync_to_async (e.g. call sync_to_async(conversation.refresh_from_db)() or
re-query via sync_to_async(ChatConversation.objects.get)(pk=conversation.pk))
after iterating reindex_conversation and before the final assertions; apply the
same change to the similar test at lines ~217-233.
In `@src/helm/conversations/templates/backend_cronjob_deindex.yaml`:
- Line 31: Locate the Helm template directives in backend_cronjob_deindex.yaml
that are written as "{{- end}}" (appears twice) and add a space before the
closing braces so they read "{{- end }}"; update each occurrence (e.g., the
directive tokens near the end blocks) to use the spaced form to satisfy
Helm/Sonar spacing rules.
---
Duplicate comments:
In `@src/frontend/apps/conversations/src/features/chat/components/Chat.tsx`:
- Around line 121-124: The modal state chatErrorModal is typed to allow
React.ReactNode for message but the render uses a strict
<Text>{chatErrorModal?.message}</Text> wrapper; change the Chat component's
modal rendering (in Chat.tsx) to render chatErrorModal.message directly (e.g.,
{chatErrorModal?.message}) instead of wrapping it in <Text>, so rich nodes like
<ul> are rendered correctly, and ensure any styling or layout previously
provided by the <Text> wrapper is moved to the surrounding container if needed.
---
Nitpick comments:
In `@src/backend/chat/management/commands/deindex_inactive_collections.py`:
- Around line 57-59: The final success message currently always returns exit 0;
modify the command to raise a django.core.management.base.CommandError when
count_error > 0 so cron systems see partial failures: import CommandError, then
after computing count_success and count_error, if count_error > 0 raise
CommandError(f"De-indexed {count_success} collection(s). {count_error}
error(s).") else keep the existing self.stdout.write(self.style.SUCCESS(...))
call; reference the existing symbols count_success, count_error,
self.stdout.write and self.style.SUCCESS when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4902684d-d9b1-4e21-a882-b8430d49d4e7
⛔ Files ignored due to path filters (1)
src/frontend/apps/conversations/src/features/chat/assets/chat-bubbles-illustration.svgis excluded by!**/*.svg
📒 Files selected for processing (22)
CHANGELOG.mdMakefiledocs/attachments.mdsrc/backend/chat/clients/conversation_reindexer.pysrc/backend/chat/clients/pydantic_ai.pysrc/backend/chat/document_context_builder.pysrc/backend/chat/management/__init__.pysrc/backend/chat/management/commands/__init__.pysrc/backend/chat/management/commands/deindex_inactive_collections.pysrc/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.pysrc/backend/chat/tests/management/__init__.pysrc/backend/chat/tests/management/commands/__init__.pysrc/backend/chat/tests/management/commands/test_deindex_inactive_collections.pysrc/backend/chat/tests/test_document_context_builder.pysrc/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.pysrc/backend/conversations/settings.pysrc/frontend/apps/conversations/src/features/chat/components/Chat.tsxsrc/frontend/apps/conversations/src/features/chat/components/MessageItem.tsxsrc/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsxsrc/frontend/apps/conversations/src/features/chat/components/__tests__/ToolInvocationItem.test.tsxsrc/helm/conversations/templates/backend_cronjob_deindex.yamlsrc/helm/conversations/values.yaml
providenz
left a comment
There was a problem hiding this comment.
Seems pretty solid to me. Recent model updates and mass deindexing need to be handled.
|
|
||
| ready_attachments = [ | ||
| attachment | ||
| async for attachment in models.ChatConversationAttachment.objects.filter( |
There was a problem hiding this comment.
rag_document_id should be updated (it has been added on ChatConversationAttachment since this code was written )
| old_collection_id = conversation.collection_id | ||
| try: | ||
| # Clear DB first so a crash during delete_collection doesn't leave a stale pointer. | ||
| ChatConversation.objects.filter(pk=conversation.pk).update(collection_id=None) |
There was a problem hiding this comment.
rag_document_id (ChatConversationAttachment) should be nulled
| claimed = await models.ChatConversation.objects.filter( | ||
| pk=conversation.pk, | ||
| collection_id__isnull=True, | ||
| ).aupdate(collection_id="reindexing") |
There was a problem hiding this comment.
We should avoid this kind of magic strings.
Any usage of the conversation carrying this magic string could feed it to the rag backend
We should maybe add new filed to carry the index state (and on the attachments too).
Maybe worth considering a kind of recovery for the sentinel ? what happens if the process die beetween the claim and the db save ?
| } | ||
| except (ValueError, IndexError): | ||
| in_context_ids = set() | ||
| async for event in reindex_conversation(self.conversation, in_context_ids): |
There was a problem hiding this comment.
what happens if we uplaod a doc to a deindexed conversation ? reindex runs before self._handle_input_documents() so it's worth checking (if not already)
There was a problem hiding this comment.
Good catch, that was a bug (creating another collection with the new doc). Fixed by updating collection_id after reindexing
There was a problem hiding this comment.
I added a test to cover this case : test_reindex_existing_and_new_attachment_creates_single_collection
Add management command and cron job Collections are deindexed after N days If user reopen the conversations, we reindex them
Loader while we reindex document Modal if some documents fail
Add a claim to prevent concurrent request to create orphan collections
7c9055b to
20281ca
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/chat/clients/pydantic_ai.py (1)
643-653:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't mark the conversation
INDEXEDbefore the first document is stored.If any later
parse_and_store_document(...)call fails,_handle_input_documents()aborts the turn but this block has already persistedindex_state=INDEXED. On the next request_run_agent()only retriesUNINDEXED/ERROR, so the conversation can get stuck in a false healthy state with an empty or partial collection.Suggested direction
if not document_store.collection_id: # Create a new collection for the conversation collection_id = document_store.create_collection( name=f"conversation-{self.conversation.pk}", ) self.conversation.collection_id = str(collection_id) - self.conversation.index_state = CollectionIndexState.INDEXED + self.conversation.index_state = CollectionIndexState.INDEXING await self.conversation.asave( update_fields=["collection_id", "index_state", "updated_at"] ) - for document in documents: - key = None - if isinstance(document, DocumentUrl): - ... - else: - ... + try: + for document in documents: + key = None + if isinstance(document, DocumentUrl): + ... + else: + ... + except Exception: + self.conversation.index_state = CollectionIndexState.ERROR + await self.conversation.asave(update_fields=["index_state", "updated_at"]) + raise + else: + self.conversation.index_state = CollectionIndexState.INDEXED + await self.conversation.asave(update_fields=["index_state", "updated_at"])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/chat/clients/pydantic_ai.py` around lines 643 - 653, The code currently sets self.conversation.index_state = CollectionIndexState.INDEXED immediately after creating an empty document collection (in the block using document_store_backend(self.conversation.collection_id) and document_store.create_collection), which can leave the conversation marked INDEXED even if subsequent parse_and_store_document calls fail; change this to not mark INDEXED on creation—set the index_state to UNINDEXED (or leave unset) when creating the empty collection and persist that, and instead update self.conversation.index_state = CollectionIndexState.INDEXED only after the first successful parse_and_store_document (or wherever documents are actually persisted) so the state accurately reflects a populated/fully indexed collection.
♻️ Duplicate comments (1)
src/frontend/apps/conversations/src/features/chat/components/Chat.tsx (1)
380-395:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRender rich resume-error content outside the typography wrapper.
This branch now stores a fragment with a
<ul>, but the modal still does<Text>{chatErrorModal?.message}</Text>on Line 901. That wraps block content inside the text component, so the failed-documents list won't render correctly.💡 Suggested fix
- <Modal - isOpen={!!chatErrorModal} - onClose={() => { - setChatErrorModal(null); - }} - title={chatErrorModal?.title} - hideCloseButton={true} - closeOnClickOutside={true} - closeOnEsc={true} - size={ModalSize.MEDIUM} - > - <Text>{chatErrorModal?.message}</Text> - </Modal> + <Modal + isOpen={!!chatErrorModal} + onClose={() => { + setChatErrorModal(null); + }} + title={chatErrorModal?.title} + hideCloseButton={true} + closeOnClickOutside={true} + closeOnEsc={true} + size={ModalSize.MEDIUM} + > + {typeof chatErrorModal?.message === 'string' ? ( + <Text>{chatErrorModal.message}</Text> + ) : ( + chatErrorModal?.message + )} + </Modal>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend/apps/conversations/src/features/chat/components/Chat.tsx` around lines 380 - 395, The chat error modal currently stores complex JSX (a <ul> list) via setChatErrorModal but the modal later wraps chatErrorModal?.message inside a <Text> component (see chatErrorModal?.message usage), which breaks rendering of block-level content; update the modal state shape so message is a ReactNode (not always string) and change the modal rendering to render chatErrorModal?.message directly (remove the surrounding <Text> wrapper) or conditionally wrap only when the message is a plain string; update usages around setChatErrorModal and the modal component that renders chatErrorModal?.message to handle both string and JSX nodes correctly.
🧹 Nitpick comments (2)
CHANGELOG.md (1)
21-21: ⚡ Quick winConsider mentioning re-indexing for completeness.
The entry focuses on de-indexing but the feature also includes automatic re-indexing when a user resumes an inactive conversation. Since re-indexing is a significant user-facing behavior (including new frontend UX), consider rewording for clarity.
Suggested rewording
-- ✨(back) de-index collections after inactivity +- ✨(back) de-index and re-index collections based on activityor
-- ✨(back) de-index collections after inactivity +- ✨(back) lifecycle management for RAG collections (de-index inactive, re-index on resume)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CHANGELOG.md` at line 21, Update the CHANGELOG entry that currently reads "✨(back) de-index collections after inactivity" to explicitly mention that collections are automatically re-indexed when a user resumes an inactive conversation and note the accompanying frontend UX change; replace or expand that line to something like "de-index collections after inactivity and automatically re-index when conversation resumes (includes new frontend UX for re-indexing)" so release notes clearly reflect both backend and user-facing behavior.src/backend/chat/clients/conversation_reindexer.py (1)
111-115: ⚡ Quick winTreat the reindexer’s
collection_idupdate as correct for current backends; optional to useacreate_collection()’s return for future-proofing.
BaseRagBackend.acreate_collection()delegates tocreate_collection(), and both shipped RAG backends assignself.collection_idbefore returning (AlbertRagBackend.acreate_collection()/create_collection(),FindRagBackend.create_collection()), so the currentdocument_store.collection_idusage insrc/backend/chat/clients/conversation_reindexer.py(lines 111-115, 164-165) matches reality.- Still, capturing the returned id in the reindexer would make this resilient to a future backend that doesn’t mutate
collection_id.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/chat/clients/conversation_reindexer.py` around lines 111 - 115, The reindexer currently relies on document_store.collection_id being set by backend implementations (via BaseRagBackend.acreate_collection delegating to create_collection), which works today but is fragile; modify the block in conversation_reindexer.py where document_store.acreate_collection(name=f"conversation-{conversation.pk}") is awaited to capture its return value (e.g., new_id = await document_store.acreate_collection(...)) and then use that returned id to set the collection_id used later (assign to document_store.collection_id or to the local existing_collection_id variable) so the code doesn't rely solely on backends mutating state (refer to document_store.acreate_collection, BaseRagBackend.acreate_collection, AlbertRagBackend, FindRagBackend, and the conversation reindex flow).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/backend/chat/clients/pydantic_ai.py`:
- Around line 643-653: The code currently sets self.conversation.index_state =
CollectionIndexState.INDEXED immediately after creating an empty document
collection (in the block using
document_store_backend(self.conversation.collection_id) and
document_store.create_collection), which can leave the conversation marked
INDEXED even if subsequent parse_and_store_document calls fail; change this to
not mark INDEXED on creation—set the index_state to UNINDEXED (or leave unset)
when creating the empty collection and persist that, and instead update
self.conversation.index_state = CollectionIndexState.INDEXED only after the
first successful parse_and_store_document (or wherever documents are actually
persisted) so the state accurately reflects a populated/fully indexed
collection.
---
Duplicate comments:
In `@src/frontend/apps/conversations/src/features/chat/components/Chat.tsx`:
- Around line 380-395: The chat error modal currently stores complex JSX (a <ul>
list) via setChatErrorModal but the modal later wraps chatErrorModal?.message
inside a <Text> component (see chatErrorModal?.message usage), which breaks
rendering of block-level content; update the modal state shape so message is a
ReactNode (not always string) and change the modal rendering to render
chatErrorModal?.message directly (remove the surrounding <Text> wrapper) or
conditionally wrap only when the message is a plain string; update usages around
setChatErrorModal and the modal component that renders chatErrorModal?.message
to handle both string and JSX nodes correctly.
---
Nitpick comments:
In `@CHANGELOG.md`:
- Line 21: Update the CHANGELOG entry that currently reads "✨(back) de-index
collections after inactivity" to explicitly mention that collections are
automatically re-indexed when a user resumes an inactive conversation and note
the accompanying frontend UX change; replace or expand that line to something
like "de-index collections after inactivity and automatically re-index when
conversation resumes (includes new frontend UX for re-indexing)" so release
notes clearly reflect both backend and user-facing behavior.
In `@src/backend/chat/clients/conversation_reindexer.py`:
- Around line 111-115: The reindexer currently relies on
document_store.collection_id being set by backend implementations (via
BaseRagBackend.acreate_collection delegating to create_collection), which works
today but is fragile; modify the block in conversation_reindexer.py where
document_store.acreate_collection(name=f"conversation-{conversation.pk}") is
awaited to capture its return value (e.g., new_id = await
document_store.acreate_collection(...)) and then use that returned id to set the
collection_id used later (assign to document_store.collection_id or to the local
existing_collection_id variable) so the code doesn't rely solely on backends
mutating state (refer to document_store.acreate_collection,
BaseRagBackend.acreate_collection, AlbertRagBackend, FindRagBackend, and the
conversation reindex flow).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e1fd1c21-f9f5-446e-ab5e-f36466d540e6
⛔ Files ignored due to path filters (1)
src/frontend/apps/conversations/src/features/chat/assets/chat-bubbles-illustration.svgis excluded by!**/*.svg
📒 Files selected for processing (25)
CHANGELOG.mdMakefiledocs/attachments.mdsrc/backend/chat/clients/conversation_reindexer.pysrc/backend/chat/clients/pydantic_ai.pysrc/backend/chat/document_context_builder.pysrc/backend/chat/enums.pysrc/backend/chat/management/__init__.pysrc/backend/chat/management/commands/__init__.pysrc/backend/chat/management/commands/deindex_inactive_collections.pysrc/backend/chat/migrations/0008_chatconversation_index_state_attachment_is_indexed.pysrc/backend/chat/models.pysrc/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.pysrc/backend/chat/tests/management/__init__.pysrc/backend/chat/tests/management/commands/__init__.pysrc/backend/chat/tests/management/commands/test_deindex_inactive_collections.pysrc/backend/chat/tests/test_document_context_builder.pysrc/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.pysrc/backend/conversations/settings.pysrc/frontend/apps/conversations/src/features/chat/components/Chat.tsxsrc/frontend/apps/conversations/src/features/chat/components/MessageItem.tsxsrc/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsxsrc/frontend/apps/conversations/src/features/chat/components/__tests__/ToolInvocationItem.test.tsxsrc/helm/conversations/templates/backend_cronjob_deindex.yamlsrc/helm/conversations/values.yaml
🚧 Files skipped from review as they are similar to previous changes (12)
- src/frontend/apps/conversations/src/features/chat/components/MessageItem.tsx
- src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx
- src/backend/chat/enums.py
- Makefile
- src/backend/chat/models.py
- src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py
- src/helm/conversations/values.yaml
- src/backend/chat/tests/management/commands/test_deindex_inactive_collections.py
- src/backend/conversations/settings.py
- docs/attachments.md
- src/backend/chat/document_context_builder.py
- src/backend/chat/management/commands/deindex_inactive_collections.py
20281ca to
3985d8a
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/chat/clients/pydantic_ai.py (1)
643-653:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't persist
INDEXEDbefore any document store succeeds.This marks the conversation as indexed right after
create_collection(), but the laterparse_and_store_document()calls can still fail. In that case_handle_input_documents()aborts the turn and leaves the conversation inINDEXED, so the next resume skips theDEINDEXED/ERRORrecovery path even though nothing may be searchable yet. SetINDEXEDonly after at least one document is stored, or flip the conversation toERRORin the exception path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/chat/clients/pydantic_ai.py` around lines 643 - 653, The code sets self.conversation.index_state = CollectionIndexState.INDEXED immediately after document_store.create_collection(), which can leave the conversation incorrectly marked INDEXED if subsequent parse_and_store_document() calls fail; change the flow so you only set INDEXED after at least one successful parse_and_store_document() (or alternatively set CollectionIndexState.ERROR in the exception path). Specifically, keep using document_store_backend(...) and create_collection(...) to obtain and persist self.conversation.collection_id, but delay setting index_state to INDEXED and calling asave(...) until after a successful parse/store step (the code that calls parse_and_store_document()/ _handle_input_documents()); also catch exceptions around the parse/store loop and set self.conversation.index_state = CollectionIndexState.ERROR and await asave(update_fields=["index_state","updated_at"]) if storing fails so recovery paths work correctly.
♻️ Duplicate comments (1)
src/backend/chat/clients/pydantic_ai.py (1)
1244-1257:⚠️ Potential issue | 🟠 Major | ⚡ Quick winExclude current-turn uploads from the resume reindex pass.
This reindex runs before
_handle_input_documents(). Sincereindex_conversation()reindexes any READY text attachment withnot a.is_indexed, a newly uploadedDocumentUrlcan be indexed here and then parsed again in_handle_input_documents(), duplicating chunks in the same collection. Filter the current request's attachments out of the resume pass, or move the resume reindex after input-document handling.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/chat/clients/pydantic_ai.py` around lines 1244 - 1257, The resume reindex step (reindex_conversation called before _handle_input_documents in the block around is_conversation_deindexed) can index newly uploaded DocumentUrl attachments from the current request and cause duplicate chunks; fix by excluding current-turn uploads from that resume pass: collect IDs (or another unique marker) of input_documents/document uploads from the current request and filter them out when building in_context_ids or when iterating reindex_conversation so any DocumentUrl whose id is in the current request is skipped, or alternatively move the reindex_conversation call to after _handle_input_documents; update the logic in the is_conversation_deindexed branch (functions/methods: _build_documents_listing, reindex_conversation, _handle_input_documents, conversation.arefresh_from_db) to implement one of these options.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/chat/management/commands/deindex_inactive_collections.py`:
- Around line 27-35: The deindex path currently sets
ChatConversation.index_state to CollectionIndexState.DEINDEXED and clears
attachment fields (ChatConversationAttachment.is_indexed and rag_document_id)
before calling delete_collection, but on delete_collection failure only
collection_id is restored; update the rollback to also restore index_state and
attachment metadata (or defer clearing attachments until after delete_collection
succeeds) so that on backend delete errors you either (a) don't clear
attachments until delete succeeds, or (b) when catching the delete exception
restore ChatConversation.index_state and restore
ChatConversationAttachment.is_indexed and rag_document_id together (apply the
same fix for the analogous block at the later occurrence around lines 58-61).
Use the existing identifiers _claim / ChatConversation,
ChatConversationAttachment, delete_collection, and CollectionIndexState to
locate the code.
- Around line 83-90: The queryset for ChatConversation currently filters only
for collection_id__isnull=False but still includes empty-string values; update
the queryset construction (the
ChatConversation.objects.filter(...).only(...).iterator(...) block) to exclude
empty strings as well (e.g., add .exclude(collection_id="") or change the filter
to include collection_id__gt="") so rows with collection_id == "" are skipped
and delete_collection("") is never called.
In
`@src/backend/chat/migrations/0008_chatconversation_index_state_attachment_is_indexed.py`:
- Around line 23-33: The migration's CharField choices for the conversation
index state are missing the "deindexed" option; update the choices list in the
models.CharField (the migration that defines choices
["unindexed","indexing","indexed","error"]) to include "deindexed" (matching
CollectionIndexState.DEINDEXED) so historical DB state aligns with the runtime
enum and the deindex command; ensure the choice string and human-readable label
follow the existing pattern (e.g., ("deindexed", "DEINDEXED")).
---
Outside diff comments:
In `@src/backend/chat/clients/pydantic_ai.py`:
- Around line 643-653: The code sets self.conversation.index_state =
CollectionIndexState.INDEXED immediately after
document_store.create_collection(), which can leave the conversation incorrectly
marked INDEXED if subsequent parse_and_store_document() calls fail; change the
flow so you only set INDEXED after at least one successful
parse_and_store_document() (or alternatively set CollectionIndexState.ERROR in
the exception path). Specifically, keep using document_store_backend(...) and
create_collection(...) to obtain and persist self.conversation.collection_id,
but delay setting index_state to INDEXED and calling asave(...) until after a
successful parse/store step (the code that calls parse_and_store_document()/
_handle_input_documents()); also catch exceptions around the parse/store loop
and set self.conversation.index_state = CollectionIndexState.ERROR and await
asave(update_fields=["index_state","updated_at"]) if storing fails so recovery
paths work correctly.
---
Duplicate comments:
In `@src/backend/chat/clients/pydantic_ai.py`:
- Around line 1244-1257: The resume reindex step (reindex_conversation called
before _handle_input_documents in the block around is_conversation_deindexed)
can index newly uploaded DocumentUrl attachments from the current request and
cause duplicate chunks; fix by excluding current-turn uploads from that resume
pass: collect IDs (or another unique marker) of input_documents/document uploads
from the current request and filter them out when building in_context_ids or
when iterating reindex_conversation so any DocumentUrl whose id is in the
current request is skipped, or alternatively move the reindex_conversation call
to after _handle_input_documents; update the logic in the
is_conversation_deindexed branch (functions/methods: _build_documents_listing,
reindex_conversation, _handle_input_documents, conversation.arefresh_from_db) to
implement one of these options.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 76e0308c-f446-43af-8e0d-f0efbf5f22ed
📒 Files selected for processing (12)
docs/attachments.mdsrc/backend/chat/clients/conversation_reindexer.pysrc/backend/chat/clients/pydantic_ai.pysrc/backend/chat/document_context_builder.pysrc/backend/chat/enums.pysrc/backend/chat/management/commands/deindex_inactive_collections.pysrc/backend/chat/migrations/0008_chatconversation_index_state_attachment_is_indexed.pysrc/backend/chat/models.pysrc/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.pysrc/backend/chat/tests/management/commands/test_deindex_inactive_collections.pysrc/backend/chat/tests/test_document_context_builder.pysrc/backend/conversations/settings.py
🚧 Files skipped from review as they are similar to previous changes (7)
- src/backend/chat/tests/management/commands/test_deindex_inactive_collections.py
- src/backend/chat/enums.py
- src/backend/chat/document_context_builder.py
- src/backend/chat/clients/conversation_reindexer.py
- src/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.py
- src/backend/chat/models.py
- docs/attachments.md
| claimed = ChatConversation.objects.filter( | ||
| pk=conv.pk, | ||
| collection_id=conv.collection_id, | ||
| updated_at__lt=threshold, | ||
| ).update(collection_id=None, index_state=CollectionIndexState.DEINDEXED) | ||
| if claimed: | ||
| ChatConversationAttachment.objects.filter(conversation_id=conv.pk).update( | ||
| is_indexed=False, rag_document_id=None | ||
| ) |
There was a problem hiding this comment.
Rollback the full deindex state on backend delete errors.
_claim() already flips the conversation to DEINDEXED and clears every attachment's is_indexed / rag_document_id, but the error path restores only collection_id. If delete_collection() fails transiently and the backend collection still exists, the next resume will see not a.is_indexed everywhere and reindex those documents into the existing collection, duplicating content. Either defer clearing attachment metadata until the delete succeeds, or restore index_state and attachment metadata together on failure.
Also applies to: 58-61
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/backend/chat/management/commands/deindex_inactive_collections.py` around
lines 27 - 35, The deindex path currently sets ChatConversation.index_state to
CollectionIndexState.DEINDEXED and clears attachment fields
(ChatConversationAttachment.is_indexed and rag_document_id) before calling
delete_collection, but on delete_collection failure only collection_id is
restored; update the rollback to also restore index_state and attachment
metadata (or defer clearing attachments until after delete_collection succeeds)
so that on backend delete errors you either (a) don't clear attachments until
delete succeeds, or (b) when catching the delete exception restore
ChatConversation.index_state and restore ChatConversationAttachment.is_indexed
and rag_document_id together (apply the same fix for the analogous block at the
later occurrence around lines 58-61). Use the existing identifiers _claim /
ChatConversation, ChatConversationAttachment, delete_collection, and
CollectionIndexState to locate the code.
| queryset = ( | ||
| ChatConversation.objects.filter( | ||
| collection_id__isnull=False, | ||
| updated_at__lt=threshold, | ||
| ) | ||
| .only("id", "collection_id") | ||
| .iterator(chunk_size=100) | ||
| ) |
There was a problem hiding this comment.
Skip blank collection_id values too.
collection_id is nullable and blankable, and the migration backfill already treats "" as "no collection". This queryset will still claim those rows and call delete_collection(""), which can create avoidable errors and mark never-indexed conversations as deindexed. Add an empty-string exclusion here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/backend/chat/management/commands/deindex_inactive_collections.py` around
lines 83 - 90, The queryset for ChatConversation currently filters only for
collection_id__isnull=False but still includes empty-string values; update the
queryset construction (the
ChatConversation.objects.filter(...).only(...).iterator(...) block) to exclude
empty strings as well (e.g., add .exclude(collection_id="") or change the filter
to include collection_id__gt="") so rows with collection_id == "" are skipped
and delete_collection("") is never called.
3985d8a to
6da02ff
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.py (1)
92-94:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRefresh conversation before asserting persisted no-op state.
These assertions read the in-memory instance only. Add
await conversation.arefresh_from_db()before checkingcollection_idto ensure DB-side regressions are caught.Also applies to: 243-245
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.py` around lines 92 - 94, The test is asserting on the in-memory Conversation instance instead of its persisted state; before asserting conversation.collection_id is None (and similarly at the other occurrence around lines 243-245), call await conversation.arefresh_from_db() to refresh the model from the DB so the assertion verifies the persisted no-op state—update the test to await conversation.arefresh_from_db() immediately before asserting collection_id and any other persisted fields.src/backend/chat/migrations/0008_chatconversation_index_state_attachment_is_indexed.py (1)
24-29:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd missing
deindexedchoice to migration state.The migration choices omit
("deindexed", "DEINDEXED")while runtime logic uses that state. This creates model-state drift and unnecessary futureAlterFieldchurn.Suggested fix
choices=[ ("unindexed", "UNINDEXED"), + ("deindexed", "DEINDEXED"), ("indexing", "INDEXING"), ("indexed", "INDEXED"), ("error", "ERROR"), ],🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/chat/migrations/0008_chatconversation_index_state_attachment_is_indexed.py` around lines 24 - 29, The migration's choices list for the state field in 0008_chatconversation_index_state_attachment_is_indexed.py is missing the ("deindexed", "DEINDEXED") option used at runtime; update the choices array (the tuple list assigned to choices) to include ("deindexed", "DEINDEXED") alongside ("unindexed","UNINDEXED"), ("indexing","INDEXING"), ("indexed","INDEXED"), ("error","ERROR") so the migration matches runtime model states and avoids future AlterField churn.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/chat/clients/pydantic_ai.py`:
- Around line 539-540: The early return that enables RAG when
conversation_has_own_documents is true is unsafe; update the guard in the
function containing that check so it only returns True if
conversation_has_own_documents AND the conversation has a searchable collection
(e.g., conversation.collection_id is non-empty) or a verified indexed-attachment
flag, rather than relying solely on attachment presence; apply the same fix to
the duplicate check around lines 1274-1276 (both places that reference
conversation_has_own_documents) so RAG tools are only enabled when a valid
collection_id or confirmed indexed state exists.
- Around line 649-653: Don't set conversation.index_state =
CollectionIndexState.INDEXED and persist it (self.conversation.asave(...))
before attempting to index; instead, perform parse_and_store_document (the
indexing call that may fail) first and only on its success assign
self.conversation.collection_id = str(collection_id), set
self.conversation.index_state = CollectionIndexState.INDEXED, and then await
self.conversation.asave(update_fields=["collection_id","index_state","updated_at"]);
also handle exceptions from parse_and_store_document to set a failure state
(e.g., CollectionIndexState.FAILED) or leave it unchanged and persist that state
so a failed indexing run doesn't leave the conversation marked INDEXED.
---
Duplicate comments:
In
`@src/backend/chat/migrations/0008_chatconversation_index_state_attachment_is_indexed.py`:
- Around line 24-29: The migration's choices list for the state field in
0008_chatconversation_index_state_attachment_is_indexed.py is missing the
("deindexed", "DEINDEXED") option used at runtime; update the choices array (the
tuple list assigned to choices) to include ("deindexed", "DEINDEXED") alongside
("unindexed","UNINDEXED"), ("indexing","INDEXING"), ("indexed","INDEXED"),
("error","ERROR") so the migration matches runtime model states and avoids
future AlterField churn.
In `@src/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.py`:
- Around line 92-94: The test is asserting on the in-memory Conversation
instance instead of its persisted state; before asserting
conversation.collection_id is None (and similarly at the other occurrence around
lines 243-245), call await conversation.arefresh_from_db() to refresh the model
from the DB so the assertion verifies the persisted no-op state—update the test
to await conversation.arefresh_from_db() immediately before asserting
collection_id and any other persisted fields.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b30dc389-3f87-464d-b822-97c5c8412451
📒 Files selected for processing (13)
docs/attachments.mdsrc/backend/chat/clients/conversation_reindexer.pysrc/backend/chat/clients/pydantic_ai.pysrc/backend/chat/document_context_builder.pysrc/backend/chat/enums.pysrc/backend/chat/management/commands/deindex_inactive_collections.pysrc/backend/chat/migrations/0008_chatconversation_index_state_attachment_is_indexed.pysrc/backend/chat/models.pysrc/backend/chat/tests/clients/pydantic_ai/test_reindex_conversation.pysrc/backend/chat/tests/management/commands/test_deindex_inactive_collections.pysrc/backend/chat/tests/test_document_context_builder.pysrc/backend/conversations/settings.pysrc/helm/conversations/templates/backend_cronjob_deindex.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
- src/backend/conversations/settings.py
- src/backend/chat/enums.py
- src/helm/conversations/templates/backend_cronjob_deindex.yaml
- src/backend/chat/clients/conversation_reindexer.py
- src/backend/chat/tests/management/commands/test_deindex_inactive_collections.py
- docs/attachments.md
- src/backend/chat/management/commands/deindex_inactive_collections.py
2a75d4c to
d637900
Compare
improve deindexing behaviour by adding index_state scale up deindexing script
d637900 to
8b93c21
Compare
|


Purpose
Automatic de-indexing and re-indexing of inactive conversation document collections.
Proposal
demo-reindexing.mov
Summary by CodeRabbit
New Features
UI
Documentation
Tests
Chores