Enhancement of Weaviate migration script #691

Open
sorphwer wants to merge 3 commits into langgenius:main from sorphwer:main

@sorphwer sorphwer commented Mar 1, 2026

This PR improves the Weaviate migration script based on our internal experiments.

Key improvement:
It fixes the following failure scenarios in the old script:

  1. Schema type mismatch: The old script copies properties as-is, preserving the uuid type for document_id/doc_id. Dify expects the text type, so the migration appears to succeed but Dify fails at runtime.

  2. UUID object insertion failure: When the source collection has uuid-typed fields, the Weaviate client returns Python UUID objects. Writing these into text-typed fields causes batch insert errors, leading to data loss or an aborted migration.

  3. moduleConfig rejection: A stale moduleConfig on chunk_index from older Weaviate versions can cause collection creation to fail on newer Weaviate versions, aborting the migration entirely.

  4. Partial migration blind spot: Collections already migrated for vectorConfig but still carrying the wrong property types were skipped with "NEW SCHEMA (skip)", leaving silent incompatibilities.
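Scenario 2 can be illustrated with a minimal sketch. It assumes each object's properties arrive as a plain dict and that only document_id and doc_id need converting. The helper name `normalize_properties` is hypothetical and is not taken from the script in this PR.

```python
import uuid

# Field names come from the PR description; the helper itself is hypothetical.
TEXT_ID_FIELDS = ("document_id", "doc_id")


def normalize_properties(props: dict) -> dict:
    """Return a copy of props with UUID objects in the ID fields turned into strings."""
    fixed = dict(props)
    for name in TEXT_ID_FIELDS:
        value = fixed.get(name)
        if isinstance(value, uuid.UUID):
            # A text-typed property needs a str, not a uuid.UUID instance.
            fixed[name] = str(value)
    return fixed
```

Applying a conversion like this to every object before the batch insert keeps the written values consistent with a text-typed target schema.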

Handle uuid→text conversion for document_id/doc_id and remove spurious
moduleConfig from chunk_index during schema migration. This fixes
property type incompatibilities that could cause issues even when
vectorConfig is already correct.
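A minimal sketch of the schema-side fix described above, assuming properties are the plain dicts found in Weaviate's schema JSON (`name`, `dataType`, optional `moduleConfig`). The function names are hypothetical, and the actual script may structure this differently.

```python
import copy


def fix_property_schema(properties: list) -> list:
    """Return corrected copies of the property definitions (hypothetical helper)."""
    fixed = []
    for prop in properties:
        prop = copy.deepcopy(prop)  # never mutate the source schema
        # uuid -> text for the ID fields named in the PR description
        if prop.get("name") in ("document_id", "doc_id") and prop.get("dataType") == ["uuid"]:
            prop["dataType"] = ["text"]
        # drop the stale moduleConfig carried over on chunk_index
        if prop.get("name") == "chunk_index":
            prop.pop("moduleConfig", None)
        fixed.append(prop)
    return fixed


def needs_property_fix(properties: list) -> bool:
    """True when a collection still carries property definitions the fix would change."""
    return fix_property_schema(properties) != properties
```

A check like `needs_property_fix` is one way to avoid skipping a collection whose vectorConfig is already correct but whose property types are not.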

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sorphwer sorphwer requested a review from RiskeyL as a code owner March 1, 2026 03:54
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and bug (Something isn't working) labels Mar 1, 2026
@sorphwer sorphwer requested a review from ZhouhaoJiang March 1, 2026 03:57
sorphwer and others added 2 commits March 1, 2026 11:57
Document how to configure Weaviate connection for both in-container
and local (port-forward) scenarios, and clarify derived values.
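As an illustration only, the sketch below shows one way connection values could be derived from a single endpoint URL. The environment variable name `WEAVIATE_ENDPOINT`, the default URL, and the function `derive_connection` are assumptions made for this example; the variables the script actually reads are documented in the PR itself.

```python
import os
from urllib.parse import urlparse


def derive_connection(endpoint: str) -> dict:
    """Split an endpoint URL into host, port, and TLS flag (hypothetical helper)."""
    parsed = urlparse(endpoint)
    secure = parsed.scheme == "https"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if secure else 80)
    return {"host": parsed.hostname, "port": port, "secure": secure}


# In-container, the endpoint would typically be a service name such as
# http://weaviate:8080; with a local port-forward it would be http://localhost:8080.
endpoint = os.environ.get("WEAVIATE_ENDPOINT", "http://localhost:8080")
print(derive_connection(endpoint))
```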

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Reorder replace_old_collection to prevent data loss on failure:
- Fetch schema BEFORE deleting anything
- Wrap data copy in try/except to preserve migrated collection on error
- Add count verification after copy, keep migrated as backup on mismatch
- Only delete the migrated collection after full verification passes
- Print recovery instructions (collection name) on every failure path
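The ordering listed above can be sketched as follows. This is not the code from the PR: `InMemoryStore` is a hypothetical stand-in for a Weaviate client so that the example runs on its own, and the sketch assumes the schema fetched up front is that of the migrated collection.

```python
class InMemoryStore:
    """Hypothetical stand-in for a vector store client; not a real Weaviate API."""

    def __init__(self):
        self.schemas = {}
        self.objects = {}

    def get_schema(self, name):
        return self.schemas[name]

    def create(self, name, schema):
        self.schemas[name] = schema
        self.objects[name] = []

    def delete(self, name):
        self.schemas.pop(name, None)
        self.objects.pop(name, None)

    def count(self, name):
        return len(self.objects[name])

    def copy_all(self, src, dst):
        self.objects[dst].extend(self.objects[src])


def replace_old_collection(store, old, migrated):
    """Replace `old` with the contents of `migrated`, keeping `migrated` on any failure."""
    # 1. Fetch everything needed BEFORE deleting anything.
    try:
        schema = store.get_schema(migrated)
        expected = store.count(migrated)
    except Exception as exc:
        print(f"Schema fetch failed ({exc}); nothing deleted. Data is in '{migrated}'.")
        return False

    store.delete(old)
    store.create(old, schema)

    # 2. Copy inside try/except so an error leaves the migrated collection intact.
    try:
        store.copy_all(migrated, old)
    except Exception as exc:
        print(f"Copy failed ({exc}); recover from '{migrated}'.")
        return False

    # 3. Verify counts; keep the migrated collection as a backup on mismatch.
    actual = store.count(old)
    if actual != expected:
        print(f"Count mismatch ({actual} != {expected}); backup kept in '{migrated}'.")
        return False

    # 4. Only now is it safe to delete the migrated collection.
    store.delete(migrated)
    return True
```

Every failure path returns before the final `delete`, so the migrated collection remains available as the recovery source and its name is printed.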

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

RiskeyL commented Mar 1, 2026

Hi @DhruvGorasiya, could you please take a look at this PR and review the updates to the migration script?

Author

sorphwer commented Mar 2, 2026

The updated code comes from our internal experiment: https://gist.github.com/sorphwer/a5ae5f2eab649d0913a5b7e811e95321
