One-way synchronisation for the data management command line tool dtool.
The dtool-sync Python package provides a command line interface
for synchronisation between different dataset base URIs.
It introduces two new subcommands: dtool compare for comparing two
base URIs, and dtool sync for actually transferring datasets from
one base URI to the other.
Compare datasets at two different base URIs:
$ dtool compare all lhs rhs
Datasets equal on source and target:
lion
  file://path/to/lhs/lion
she
  file://path/to/lhs/she
cat
  file://path/to/lhs/cat
Datasets changed from source to target:
changed
  file://path/to/lhs/changed
Datasets missing on target:
people
  file://path/to/lhs/people
Datasets identified as equal by comparing their metadata appear first, followed by datasets that are present at both URIs but have changed. A common cause of differing datasets is an interrupted transfer. In that case, the source dataset has already been frozen, but its partial copy at the destination is still marked as a proto dataset. Finally, datasets present at the left-hand side URI but missing at the right-hand side URI are shown. Note that datasets present at rhs but missing at lhs are not shown. To identify those, invert the direction of the comparison.
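The classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual dtool-sync implementation: it assumes each base URI has already been listed into a dict that maps UUIDs to metadata dicts, and that two datasets count as equal when those metadata dicts match. The function name `compare_base_uris`, the shortened UUIDs, and the `only-on-rhs` entry are hypothetical.

```python
def compare_base_uris(lhs, rhs):
    """Classify the datasets of lhs relative to rhs (one-way comparison).

    lhs and rhs map dataset UUIDs to metadata dicts (assumed layout).
    """
    result = {"equal": [], "changed": [], "missing": []}
    for uuid, metadata in lhs.items():
        if uuid not in rhs:
            result["missing"].append(uuid)
        elif rhs[uuid] == metadata:
            result["equal"].append(uuid)
        else:
            result["changed"].append(uuid)
    # Datasets that exist only in rhs are never visited, so they are not reported.
    return result


# Shortened, illustrative UUIDs and metadata.
lhs = {
    "065d9fe0": {"name": "lion", "frozen_at": "2021-09-05"},
    "af16c00d": {"name": "changed", "frozen_at": "2021-09-05"},
    "534792bd": {"name": "people", "frozen_at": "2021-09-05"},
}
rhs = {
    "065d9fe0": {"name": "lion", "frozen_at": "2021-09-05"},
    "af16c00d": {"name": "changed"},  # partial copy, never frozen
    "0000aaaa": {"name": "only-on-rhs", "frozen_at": "2021-09-05"},  # made up
}

forward = compare_base_uris(lhs, rhs)
backward = compare_base_uris(rhs, lhs)
print(forward)
print(backward["missing"])
```

Swapping the arguments, as in `backward`, is what reveals the dataset that exists only at rhs.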
To actually sync from lhs to rhs, use
$ dtool sync all lhs rhs
Datasets equal on source and target:
lion
  file://path/to/lhs/lion
she
  file://path/to/lhs/she
cat
  file://path/to/lhs/cat
Datasets changed from source to target:
changed
  file://path/to/lhs/changed
Datasets missing on target:
people
  file://path/to/lhs/people
Resume copying of changed datasets, presuming their transfer had been interrupted in an earlier attempt.
Dataset copied to:
file://path/to/rhs/changed
Copy missing datasets.
Dataset copied to:
file://path/to/rhs/people
Datasets already partially present at rhs are transferred first,
then missing datasets. Again, this only syncs one way from lhs to
rhs.
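The transfer order can be illustrated with a short Python sketch. It is not taken from the dtool-sync source: the helper `plan_transfers` and the action labels `resume` and `copy` are hypothetical, and the input is assumed to be a dict with the keys equal, changed, and missing.

```python
def plan_transfers(comparison):
    """Return (action, dataset) pairs: changed datasets first, then missing ones."""
    plan = [("resume", dataset) for dataset in comparison["changed"]]
    plan += [("copy", dataset) for dataset in comparison["missing"]]
    # Datasets listed under "equal" need no transfer and are skipped.
    return plan


comparison = {
    "equal": ["lion", "she", "cat"],
    "changed": ["changed"],
    "missing": ["people"],
}

plan = plan_transfers(comparison)
for action, dataset in plan:
    print(action, dataset)
```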
Use --verbose or -v to show more metadata in the output:
$ dtool compare all -v lhs rhs
Datasets equal on source and target:
lion
  file://path/to/lhs/lion
  jotelha 2021-09-05 065d9fe0-9e41-4add-8a55-577dbcfe2149
she
  file://path/to/lhs/she
  jotelha 2021-09-05 9ee101a4-7d1a-45c0-8955-da779398a5ed
cat
  file://path/to/lhs/cat
  jotelha 2021-09-05 c2249963-6459-4901-8263-85610a7a2ac9
Datasets changed from source to target:
changed
  file://path/to/lhs/changed
  jotelha 2021-09-05 af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d
Datasets missing on target:
people
  file://path/to/lhs/people
  jotelha 2021-09-05 534792bd-d102-4efc-bc11-6af743959704
To emphasise the UUID instead of the name in the output of
compare, use --uuid or -u …
$ dtool compare all -u lhs rhs
Datasets equal on source and target:
065d9fe0-9e41-4add-8a55-577dbcfe2149
  file://path/to/lhs/lion
9ee101a4-7d1a-45c0-8955-da779398a5ed
  file://path/to/lhs/she
c2249963-6459-4901-8263-85610a7a2ac9
  file://path/to/lhs/cat
Datasets changed from source to target:
af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d
  file://path/to/lhs/changed
Datasets missing on target:
534792bd-d102-4efc-bc11-6af743959704
  file://path/to/lhs/people
… and combine with -v as you please:
$ dtool compare all -uv lhs rhs
Datasets equal on source and target:
065d9fe0-9e41-4add-8a55-577dbcfe2149
  file://path/to/lhs/lion
  jotelha 2021-09-05 lion
9ee101a4-7d1a-45c0-8955-da779398a5ed
  file://path/to/lhs/she
  jotelha 2021-09-05 she
c2249963-6459-4901-8263-85610a7a2ac9
  file://path/to/lhs/cat
  jotelha 2021-09-05 cat
Datasets changed from source to target:
af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d
  file://path/to/lhs/changed
  jotelha 2021-09-05 changed
Datasets missing on target:
534792bd-d102-4efc-bc11-6af743959704
  file://path/to/lhs/people
  jotelha 2021-09-05 people
Instead of all, list only the changed, equal or missing
datasets, and use --quiet or -q to identify the datasets by
name only …
$ dtool compare changed -q lhs rhs
file://path/to/lhs/changed
… JSON-formatted …
$ dtool compare missing -jq lhs rhs
[
"534792bd-d102-4efc-bc11-6af743959704"
]
… or by UUID:
$ dtool compare equal -qu lhs rhs
065d9fe0-9e41-4add-8a55-577dbcfe2149
9ee101a4-7d1a-45c0-8955-da779398a5ed
c2249963-6459-4901-8263-85610a7a2ac9
To print the comparison results in JSON, use --json or -j.
With the all command, the output is categorized into a dict with
keys equal, changed, and missing.
$ dtool compare all -j lhs rhs
{
"equal": [
{
"name": "lion",
"uuid": "065d9fe0-9e41-4add-8a55-577dbcfe2149",
"creator_username": "jotelha",
"frozen_at": "2021-09-05"
},
{
"name": "she",
"uuid": "9ee101a4-7d1a-45c0-8955-da779398a5ed",
"creator_username": "jotelha",
"frozen_at": "2021-09-05"
},
{
"name": "cat",
"uuid": "c2249963-6459-4901-8263-85610a7a2ac9",
"creator_username": "jotelha",
"frozen_at": "2021-09-05"
}
],
"changed": [
{
"name": "changed",
"uuid": "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d",
"creator_username": "jotelha",
"frozen_at": "2021-09-05"
}
],
"missing": [
{
"name": "people",
"uuid": "534792bd-d102-4efc-bc11-6af743959704",
"creator_username": "jotelha",
"frozen_at": "2021-09-05"
}
]
}
Again, --quiet or -q lists only the names (or UUIDs in
connection with -u).
$ dtool compare all -jq lhs rhs
{
"equal": [
"065d9fe0-9e41-4add-8a55-577dbcfe2149",
"9ee101a4-7d1a-45c0-8955-da779398a5ed",
"c2249963-6459-4901-8263-85610a7a2ac9"
],
"changed": [
"af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d"
],
"missing": [
"534792bd-d102-4efc-bc11-6af743959704"
]
}
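The quiet JSON output lends itself to scripting. The following Python sketch parses the sample output shown above with the standard json module; the variable names are illustrative. In practice the text could be captured from the command line (for example with subprocess.run), which is not shown here.

```python
import json

# Sample output of "dtool compare all -jq lhs rhs", copied from the example above.
output = """
{
  "equal": [
    "065d9fe0-9e41-4add-8a55-577dbcfe2149",
    "9ee101a4-7d1a-45c0-8955-da779398a5ed",
    "c2249963-6459-4901-8263-85610a7a2ac9"
  ],
  "changed": [
    "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d"
  ],
  "missing": [
    "534792bd-d102-4efc-bc11-6af743959704"
  ]
}
"""

comparison = json.loads(output)

# Number of datasets per category.
counts = {category: len(entries) for category, entries in comparison.items()}

# Everything that a one-way sync from lhs to rhs would still have to transfer.
needs_transfer = comparison["changed"] + comparison["missing"]

print(counts)
print(needs_transfer)
```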
As above, use --verbose or -v to show more metadata in the
JSON-formatted output. In this case, equal and changed are
shown as lists of pairs, each holding the source dataset and its
counterpart on the target.
$ dtool compare all -jv lhs rhs
{
"equal": [
[
{
"name": "lion",
"uuid": "065d9fe0-9e41-4add-8a55-577dbcfe2149",
"creator_username": "jotelha",
"uri": "file://path/to/lhs/lion",
"frozen_at": "2021-09-05"
},
{
"name": "lion",
"uuid": "065d9fe0-9e41-4add-8a55-577dbcfe2149",
"creator_username": "jotelha",
"uri": "file://path/to/rhs/lion",
"frozen_at": "2021-09-05"
}
],
[
{
"name": "she",
"uuid": "9ee101a4-7d1a-45c0-8955-da779398a5ed",
"creator_username": "jotelha",
"uri": "file://path/to/lhs/she",
"frozen_at": "2021-09-05"
},
{
"name": "she",
"uuid": "9ee101a4-7d1a-45c0-8955-da779398a5ed",
"creator_username": "jotelha",
"uri": "file://path/to/rhs/she",
"frozen_at": "2021-09-05"
}
],
[
{
"name": "cat",
"uuid": "c2249963-6459-4901-8263-85610a7a2ac9",
"creator_username": "jotelha",
"uri": "file://path/to/lhs/cat",
"frozen_at": "2021-09-05"
},
{
"name": "cat",
"uuid": "c2249963-6459-4901-8263-85610a7a2ac9",
"creator_username": "jotelha",
"uri": "file://path/to/rhs/cat",
"frozen_at": "2021-09-05"
}
]
],
"changed": [
[
{
"name": "changed",
"uuid": "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d",
"creator_username": "jotelha",
"uri": "file://path/to/lhs/changed",
"frozen_at": "2021-09-05"
},
{
"name": "*changed",
"uuid": "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d",
"creator_username": "jotelha",
"uri": "file://path/to/rhs/changed"
}
]
],
"missing": [
{
"name": "people",
"uuid": "534792bd-d102-4efc-bc11-6af743959704",
"creator_username": "jotelha",
"uri": "file://path/to/lhs/people",
"frozen_at": "2021-09-05"
}
]
}
Using the equal, changed, or missing subcommand directly
makes this top-level categorization unnecessary. The output is a list
of datasets:
$ dtool compare changed -j lhs rhs
[
{
"name": "changed",
"uuid": "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d",
"creator_username": "jotelha",
"frozen_at": "2021-09-05"
}
]
$ dtool compare changed -jv lhs rhs
[
[
{
"name": "changed",
"uuid": "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d",
"creator_username": "jotelha",
"uri": "file://path/to/lhs/changed",
"frozen_at": "2021-09-05"
},
{
"name": "*changed",
"uuid": "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d",
"creator_username": "jotelha",
"uri": "file://path/to/rhs/changed"
}
]
]
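The verbose JSON output above can be used to spot target copies that were never frozen. The sketch below is a minimal example under one assumption suggested by the sample output but not stated as a rule: that a target entry without a frozen_at key corresponds to a partially copied proto dataset. The variable names are illustrative.

```python
import json

# Sample output of "dtool compare changed -jv lhs rhs", copied from the example above.
output = """
[
  [
    {
      "name": "changed",
      "uuid": "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d",
      "creator_username": "jotelha",
      "uri": "file://path/to/lhs/changed",
      "frozen_at": "2021-09-05"
    },
    {
      "name": "*changed",
      "uuid": "af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d",
      "creator_username": "jotelha",
      "uri": "file://path/to/rhs/changed"
    }
  ]
]
"""

pairs = json.loads(output)

# Each entry is a [source, target] pair. Assumption: a target without
# "frozen_at" is an unfinished copy of a frozen source dataset.
unfrozen_targets = [
    target["uri"]
    for source, target in pairs
    if "frozen_at" in source and "frozen_at" not in target
]

print(unfrozen_targets)
```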
The --raw or -r flag displays metadata (in particular
timestamps) as stored, without any conversion or reformatting for
pretty output:
$ dtool compare all -rv lhs rhs
Datasets equal on source and target:
lion
  file://path/to/lhs/lion
  jotelha 1630851896.375779 065d9fe0-9e41-4add-8a55-577dbcfe2149
she
  file://path/to/lhs/she
  jotelha 1630851892.800604 9ee101a4-7d1a-45c0-8955-da779398a5ed
cat
  file://path/to/lhs/cat
  jotelha 1630851894.593098 c2249963-6459-4901-8263-85610a7a2ac9
Datasets changed from source to target:
changed
  file://path/to/lhs/changed
  jotelha 1630862808.395145 af16c00d-f60d-41ce-83c6-2a7d9c5e1b0d
Datasets missing on target:
people
  file://path/to/lhs/people
  jotelha 1630851899.345241 534792bd-d102-4efc-bc11-6af743959704
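The raw values look like Unix timestamps. Assuming they are seconds since the epoch, the following Python sketch converts one into the date format used in the non-raw output. The helper name `format_frozen_at` is hypothetical, and the choice of UTC is an assumption of this sketch; which time zone dtool itself applies is not stated here.

```python
from datetime import datetime, timezone


def format_frozen_at(raw):
    """Convert a raw epoch timestamp (seconds) to a YYYY-MM-DD string in UTC."""
    moment = datetime.fromtimestamp(float(raw), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


# Raw value of the "lion" dataset from the example above.
formatted = format_frozen_at("1630851896.375779")
print(formatted)  # 2021-09-05
```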
To install the dtool-sync package, run the following from the parent directory of the source tree:
cd dtool-sync
python setup.py install