Use UUID's for CFDataset's unique_id property by jameshiebert · Pull Request #78 · pacificclimate/nchelpers

jameshiebert · 2020-08-06T17:25:09Z

Constructing the unique_id from attributes of the dataset leads to problems. One can have multiple datasets that end up with the same unique_id (for example datasets that have a different spatial domain, but everything else is the same). When generated according to the standard methods, UUIDs are for practical purposes unique.

Constructing the unique_id from attributes of the dataset leads to problems. One can have multiple datasets that end up with the same unique_id (for example datasets that have a different spatial domain, but everything else is the same). When generated according to the standard methods, UUIDs are for practical purposes unique.

rod-glover

I think there is a problem here. See comment.

rod-glover · 2020-08-06T18:23:26Z

-
-        return unique_id.replace('+', '-')  # In original code, but why?
+        """A unique id for this file"""
+        return str(uuid.uuid4())


This generates a different random UUID every time it is executed. Which means that every time a program that uses nchelpers is run, a different UUID will be generated for any given dataset (file).

But I think the way we use unique_id relies on it being stable across time.

Possible problematic use case: Indexing. When a file is indexed in any given run, its unique_id will be generated (only once during that indexing run, but as a random value). Then when the same file is re-indexed in a later indexing run (it has moved location, say), its unique_id is generated, but it is by definition different than the first one. If the indexer (or anything else) tries to look up this file according to its unique_id, it will get incorrect results. Essentially, if anything (e.g., a modelmeta database) has a memory of a past unique_id for a file, it will be wrong the next time.

Hmm, however, this argument also can be used against using metadata to generate the unique_id, since reindexing could in principle include reindexing a data file with slightly different metadata. But I think that case is implicitly excluded; the idea was that the metadata uniquely identified the file. We have an identity crisis here (pun, but serious too). What is it that uniquely characterizes a file?

Yeah, I was concerned about that too, so thank you for pointing this out.

Admittedly, I filed this PR more as provocation than as a finished product.

Maybe it would be better to change this from a (assumed to be static) property to a method (e.g. create_unique_id()), so users of it have to be intentional about when they want a new one (for example, during indexing) and when they want the existent one.

That sounds more robust.

Question is how to generate the unique id and, if it is not a deterministic computation based on the file contents, how to persist it. One solution is to make method create_unique_id() write its value to the metadata (attribute unique_id?) in the file, and for property unique_id to read it from there. If we want to emphasize "don't screw with this", then maybe name the attribute __unique_id__ or something similarly repellent.

Yeah, that's a great idea.

jameshiebert requested review from corviday and rod-glover August 6, 2020 17:25

jameshiebert force-pushed the unique-newyork branch from 6fcbad1 to a2d400f Compare August 6, 2020 17:32

rod-glover requested changes Aug 6, 2020

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use UUID's for CFDataset's unique_id property#78

Use UUID's for CFDataset's unique_id property#78
jameshiebert wants to merge 1 commit into
masterfrom
unique-newyork

jameshiebert commented Aug 6, 2020

Uh oh!

rod-glover left a comment

Uh oh!

rod-glover Aug 6, 2020 •

edited

Loading

Uh oh!

rod-glover Aug 6, 2020 •

edited

Loading

Uh oh!

jameshiebert Aug 6, 2020

Uh oh!

rod-glover Aug 6, 2020 •

edited

Loading

Uh oh!

jameshiebert Aug 6, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jameshiebert commented Aug 6, 2020

Uh oh!

rod-glover left a comment

Choose a reason for hiding this comment

Uh oh!

rod-glover Aug 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rod-glover Aug 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jameshiebert Aug 6, 2020

Choose a reason for hiding this comment

Uh oh!

rod-glover Aug 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jameshiebert Aug 6, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rod-glover Aug 6, 2020 •

edited

Loading

rod-glover Aug 6, 2020 •

edited

Loading

rod-glover Aug 6, 2020 •

edited

Loading