
Add migrator tool #358

Open
caco3 wants to merge 36 commits into zevv:v1.5.0-rc2 from caco3:add-migrator-tool

caco3 commented Apr 3, 2026

As suggested, I created a migration script.
I also added some shell scripts to test it and documented the DB formats.
See the READMEs for details.

Notes

  • Use squash when merging!
  • When building the binaries, I used the source from https://github.com/caco3/duc/tree/fix-crashes-on-indexing, because duc would otherwise crash during indexing.
  • The test scripts index /usr/share/doc by default.
  • Using tkrzw as the migration source is slow (several minutes per destination format). Since tkrzw will be the new default, migrating from it to other formats is unlikely to be needed, so the tests exclude it as a source by default (pass --include-tkrzw-as-source to include it).
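The migration matrix behind the summary counts further down can be sketched as follows. This is not the code of test-migrator.sh; it is a minimal, hypothetical POSIX sh sketch that assumes the six backends from the build output and uses a yes/no argument as a stand-in for --include-tkrzw-as-source. Each source pairs with the five other backends, so six sources give 6 × 5 = 30 ordered migrations and five sources (tkrzw excluded) give 25.

```shell
#!/bin/sh
# Hypothetical sketch, not the real test-migrator.sh: enumerate ordered
# "source destination" pairs, skipping tkrzw as a source unless requested.
backends="tokyocabinet kyotocabinet sqlite3 lmdb leveldb tkrzw"

# list_migrations INCLUDE_TKRZW_SOURCE (yes|no)
# Prints one "src dst" line per migration to run.
list_migrations() {
    include_tkrzw=$1
    for src in $backends; do
        # tkrzw is slow to iterate as a source, so skip it by default
        if [ "$src" = tkrzw ] && [ "$include_tkrzw" != yes ]; then
            continue
        fi
        for dst in $backends; do
            [ "$src" = "$dst" ] && continue   # never migrate a backend onto itself
            printf '%s %s\n' "$src" "$dst"
        done
    done
}

echo "default:            $(list_migrations no | grep -c .) migrations"   # 25
echo "tkrzw as a source:  $(list_migrations yes | grep -c .) migrations"  # 30
```

With the tkrzw source included, the count matches the "Migrations possible : 30" line in the test-migrator.sh summary.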

Script runs

./build-all-backends.sh 

==> Building duc-tkrzw ...
    -> /home/me/duc/testing/duc-tkrzw OK

==> Building duc-tokyocabinet ...
    -> /home/me/duc/testing/duc-tokyocabinet OK

==> Building duc-sqlite3 ...
    -> /home/me/duc/testing/duc-sqlite3 OK

==> Building duc-lmdb ...
    -> /home/me/duc/testing/duc-lmdb OK

==> Building duc-leveldb ...
    -> /home/me/duc/testing/duc-leveldb OK

==> Building duc-kyotocabinet ...
    -> /home/me/duc/testing/duc-kyotocabinet OK

=== Build summary ===
  OK    tkrzw  (duc version: 1.5.0-rc2)
  OK    tokyocabinet  (duc version: 1.5.0-rc2)
  OK    sqlite3  (duc version: 1.5.0-rc2)
  OK    lmdb  (duc version: 1.5.0-rc2)
  OK    leveldb  (duc version: 1.5.0-rc2)
  OK    kyotocabinet  (duc version: 1.5.0-rc2)

./test-compare-backends.sh
Indexing path: /usr/share/doc
DB dir:        /home/me/duc/testing/dbs

[tkrzw] indexing ... opening tkzrw DB with compression: RECORD_COMP_ZSTD
done. dumping json ... done. (7137907 bytes)
[tokyocabinet] indexing ... done. dumping json ... done. (7137907 bytes)
[sqlite3] indexing ... done. dumping json ... done. (7137907 bytes)
[lmdb] indexing ... done. dumping json ... done. (7137907 bytes)
[leveldb] indexing ... done. dumping json ... done. (7137907 bytes)
[kyotocabinet] indexing ... done. dumping json ... done. (7137907 bytes)

=== Pairwise JSON comparison ===

  tkrzw == tokyocabinet  [identical]
  tkrzw == sqlite3  [identical]
  tkrzw == lmdb  [identical]
  tkrzw == leveldb  [identical]
  tkrzw == kyotocabinet  [identical]
  tokyocabinet == sqlite3  [identical]
  tokyocabinet == lmdb  [identical]
  tokyocabinet == leveldb  [identical]
  tokyocabinet == kyotocabinet  [identical]
  sqlite3 == lmdb  [identical]
  sqlite3 == leveldb  [identical]
  sqlite3 == kyotocabinet  [identical]
  lmdb == leveldb  [identical]
  lmdb == kyotocabinet  [identical]
  leveldb == kyotocabinet  [identical]

Result: all backends produce identical JSON output.
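A pairwise check like the one above compares every unordered pair of JSON dumps once, so six backends yield 15 comparisons. The sketch below is hypothetical: it assumes each backend's dump is stored as <dir>/<backend>.json (the real script's file layout is not shown here) and uses cmp -s for a byte-for-byte comparison.

```shell
#!/bin/sh
# Hypothetical sketch of a pairwise JSON comparison; the <dir>/<backend>.json
# layout is an assumption, not taken from test-compare-backends.sh.

# compare_pairs DIR NAME...
# Compares each name's dump with every later name's dump.
# Returns 0 if all pairs are identical, 1 otherwise.
compare_pairs() {
    dir=$1; shift
    differ=0
    while [ "$#" -gt 1 ]; do
        a=$1; shift                       # compare $a with all remaining names
        for b in "$@"; do
            if cmp -s "$dir/$a.json" "$dir/$b.json"; then
                echo "  $a == $b  [identical]"
            else
                echo "  $a != $b  [DIFFERENT]"
                differ=1
            fi
        done
    done
    return "$differ"
}

# Demo with three identical dummy dumps in a temporary directory
demo=$(mktemp -d)
for name in tkrzw sqlite3 lmdb; do echo '{"n":1}' > "$demo/$name.json"; done
compare_pairs "$demo" tkrzw sqlite3 lmdb && echo "Result: all identical."
rm -rf "$demo"
```

Comparing the first name against the rest and then shifting produces the pairs in the same order as the log above (tkrzw against all others first, then tokyocabinet, and so on).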

./test-migrator.sh --include-tkrzw-as-source
=== Phase 1: Migrate ===

  tokyocabinet   -> kyotocabinet   ... ok
  tokyocabinet   -> sqlite3        ... ok
  tokyocabinet   -> lmdb           ... ok
  tokyocabinet   -> leveldb        ... ok
  tokyocabinet   -> tkrzw          ... ok
  kyotocabinet   -> tokyocabinet   ... ok
  kyotocabinet   -> sqlite3        ... ok
  kyotocabinet   -> lmdb           ... ok
  kyotocabinet   -> leveldb        ... ok
  kyotocabinet   -> tkrzw          ... ok
  sqlite3        -> tokyocabinet   ... ok
  sqlite3        -> kyotocabinet   ... ok
  sqlite3        -> lmdb           ... ok
  sqlite3        -> leveldb        ... ok
  sqlite3        -> tkrzw          ... ok
  lmdb           -> tokyocabinet   ... ok
  lmdb           -> kyotocabinet   ... ok
  lmdb           -> sqlite3        ... ok
  lmdb           -> leveldb        ... ok
  lmdb           -> tkrzw          ... ok
  leveldb        -> tokyocabinet   ... ok
  leveldb        -> kyotocabinet   ... ok
  leveldb        -> sqlite3        ... ok
  leveldb        -> lmdb           ... ok
  leveldb        -> tkrzw          ... ok
  [tkrzw] WARNING: tkrzw source iteration is very slow — this may take several minutes per destination
  tkrzw          -> tokyocabinet   ... ok
  tkrzw          -> kyotocabinet   ... ok
  tkrzw          -> sqlite3        ... ok
  tkrzw          -> lmdb           ... ok
  tkrzw          -> leveldb        ... ok

=== Phase 2: Export JSON ===

  tokyocabinet-to-kyotocabinet   ... ok  (7137907 bytes)
  tokyocabinet-to-sqlite3        ... ok  (7137907 bytes)
  tokyocabinet-to-lmdb           ... ok  (7137907 bytes)
  tokyocabinet-to-leveldb        ... ok  (7137907 bytes)
  tokyocabinet-to-tkrzw          ... ok  (7137907 bytes)
  kyotocabinet-to-tokyocabinet   ... ok  (7137907 bytes)
  kyotocabinet-to-sqlite3        ... ok  (7137907 bytes)
  kyotocabinet-to-lmdb           ... ok  (7137907 bytes)
  kyotocabinet-to-leveldb        ... ok  (7137907 bytes)
  kyotocabinet-to-tkrzw          ... ok  (7137907 bytes)
  sqlite3-to-tokyocabinet        ... ok  (7137907 bytes)
  sqlite3-to-kyotocabinet        ... ok  (7137907 bytes)
  sqlite3-to-lmdb                ... ok  (7137907 bytes)
  sqlite3-to-leveldb             ... ok  (7137907 bytes)
  sqlite3-to-tkrzw               ... ok  (7137907 bytes)
  lmdb-to-tokyocabinet           ... ok  (7137907 bytes)
  lmdb-to-kyotocabinet           ... ok  (7137907 bytes)
  lmdb-to-sqlite3                ... ok  (7137907 bytes)
  lmdb-to-leveldb                ... ok  (7137907 bytes)
  lmdb-to-tkrzw                  ... ok  (7137907 bytes)
  leveldb-to-tokyocabinet        ... ok  (7137907 bytes)
  leveldb-to-kyotocabinet        ... ok  (7137907 bytes)
  leveldb-to-sqlite3             ... ok  (7137907 bytes)
  leveldb-to-lmdb                ... ok  (7137907 bytes)
  leveldb-to-tkrzw               ... ok  (7137907 bytes)
  tkrzw-to-tokyocabinet          ... ok  (7137907 bytes)
  tkrzw-to-kyotocabinet          ... ok  (7137907 bytes)
  tkrzw-to-sqlite3               ... ok  (7137907 bytes)
  tkrzw-to-lmdb                  ... ok  (7137907 bytes)
  tkrzw-to-leveldb               ... ok  (7137907 bytes)

=== Phase 3: Compare JSON ===

  tokyocabinet-to-kyotocabinet   ... match
  tokyocabinet-to-sqlite3        ... match
  tokyocabinet-to-lmdb           ... match
  tokyocabinet-to-leveldb        ... match
  tokyocabinet-to-tkrzw          ... match
  kyotocabinet-to-tokyocabinet   ... match
  kyotocabinet-to-sqlite3        ... match
  kyotocabinet-to-lmdb           ... match
  kyotocabinet-to-leveldb        ... match
  kyotocabinet-to-tkrzw          ... match
  sqlite3-to-tokyocabinet        ... match
  sqlite3-to-kyotocabinet        ... match
  sqlite3-to-lmdb                ... match
  sqlite3-to-leveldb             ... match
  sqlite3-to-tkrzw               ... match
  lmdb-to-tokyocabinet           ... match
  lmdb-to-kyotocabinet           ... match
  lmdb-to-sqlite3                ... match
  lmdb-to-leveldb                ... match
  lmdb-to-tkrzw                  ... match
  leveldb-to-tokyocabinet        ... match
  leveldb-to-kyotocabinet        ... match
  leveldb-to-sqlite3             ... match
  leveldb-to-lmdb                ... match
  leveldb-to-tkrzw               ... match
  tkrzw-to-tokyocabinet          ... match
  tkrzw-to-kyotocabinet          ... match
  tkrzw-to-sqlite3               ... match
  tkrzw-to-lmdb                  ... match
  tkrzw-to-leveldb               ... match

=== Summary ===
  Migrations possible  : 30
  Migrations skipped   : 0
  Migrations attempted : 30
  Migration failed     : 0
  JSON export failed   : 0
  JSON match           : 30
  JSON differ          : 0
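Phase 3 presumably checks each migrated DB's JSON export against the JSON exported from the original source DB and tallies the results. The following is a hypothetical sketch of that tally, not the script's actual code; the <src>.json and <src>-to-<dst>.json file names are assumptions chosen to mirror the labels in the log.

```shell
#!/bin/sh
# Hypothetical sketch of the Phase 3 tally; file naming is an assumption.

# tally_matches DIR SRC:DST...
# Compares DIR/SRC-to-DST.json with the reference DIR/SRC.json for each pair.
tally_matches() {
    dir=$1; shift
    match=0; differ=0
    for pair in "$@"; do
        src=${pair%%:*}; dst=${pair#*:}   # split "src:dst"
        if cmp -s "$dir/$src.json" "$dir/$src-to-$dst.json"; then
            match=$((match + 1))
        else
            differ=$((differ + 1))
        fi
    done
    printf '  JSON match           : %d\n' "$match"
    printf '  JSON differ          : %d\n' "$differ"
}

# Demo: two migrations from sqlite3 whose exports match the reference
demo=$(mktemp -d)
for f in sqlite3 sqlite3-to-lmdb sqlite3-to-leveldb; do
    echo '{"n":1}' > "$demo/$f.json"
done
tally_matches "$demo" sqlite3:lmdb sqlite3:leveldb
rm -rf "$demo"
```

Comparing against the source's own export, rather than against other migrations, means a single faulty destination backend shows up as exactly the migrations that target it.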

CaCO3 added 30 commits April 3, 2026 21:28
Example script output:
bash build-all-backends.sh 2>&1

Example:
./test-compare-backends.sh
Indexing path: /usr/share/doc
DB dir:        /home/gruinelli/temp/duc/testing/dbs

[tkrzw] indexing ... opening tkzrw DB with compression: RECORD_COMP_ZSTD
done. dumping json ... done. (7140333 bytes)
[tokyocabinet] indexing ... done. dumping json ... done. (7140333 bytes)
[sqlite3] indexing ... done. dumping json ... done. (7140333 bytes)
[lmdb] indexing ... done. dumping json ... done. (7140333 bytes)
[leveldb] indexing ... done. dumping json ... done. (7140333 bytes)
[kyotocabinet] indexing ... done. dumping json ... done. (7140333 bytes)

=== Pairwise JSON comparison ===

  tkrzw == tokyocabinet  [identical]
  tkrzw == sqlite3  [identical]
  tkrzw == lmdb  [identical]
  tkrzw == leveldb  [identical]
  tkrzw == kyotocabinet  [identical]
  tokyocabinet == sqlite3  [identical]
  tokyocabinet == lmdb  [identical]
  tokyocabinet == leveldb  [identical]
  tokyocabinet == kyotocabinet  [identical]
  sqlite3 == lmdb  [identical]
  sqlite3 == leveldb  [identical]
  sqlite3 == kyotocabinet  [identical]
  lmdb == leveldb  [identical]
  lmdb == kyotocabinet  [identical]
  leveldb == kyotocabinet  [identical]

Result: all backends produce identical JSON output.
caco3 marked this pull request as ready for review April 3, 2026 22:46
caco3 mentioned this pull request Apr 3, 2026
caco3 (author) commented Apr 3, 2026

@l8gravely Please review. Don't be confused by all the commits; if they bother you, I can create a new PR with a single commit.

caco3 changed the base branch from master to v1.5.0-rc2 April 4, 2026 07:02
caco3 marked this pull request as draft April 4, 2026 07:02
caco3 marked this pull request as ready for review April 4, 2026 07:06
caco3 (author) commented Apr 4, 2026

I had the PR pointing to master instead of v1.5.0-rc2. This is fixed now (and, again, there are no changes affecting duc itself).
