It feels like it is time for this to happen, perhaps for SLiM 6.0. "How do I get my tag values in Python?" is an FAQ, the amount of on-disk space needed is not large (since the number of individuals in the individuals table is typically not large), the runtime overhead would be small (since the time/memory only needs to be spent at write time, and the memory usage is recovered as soon as the write is done), etc. So this issue is about adding this metadata. @petrelharp
However, doing this is a bit tricky. The reason is that all of the tag properties keep track of whether they have been given a value by the user or not. If not, and the user tries to access the tag value, they get a runtime error, "used before set", rather than getting random garbage or zero. This functionality has proved quite useful; it often catches bugs in the way that tag values are being used in a SLiM script. So we want to persist this "has been set" information on the Python side, and ideally we should even provide the same "used before set" error on the Python side, connecting the SLiM and Python sides with the same error-checking.
In SLiM's code, here's the declaration of the tag values in individual.h:

    unsigned tagL0_set_ : 1;      // T if tagL0 has been set by the user
    unsigned tagL0_value_ : 1;    // a user-defined tag value of logical type
    unsigned tagL1_set_ : 1;      // T if tagL1 has been set by the user
    unsigned tagL1_value_ : 1;    // a user-defined tag value of logical type
    unsigned tagL2_set_ : 1;      // T if tagL2 has been set by the user
    unsigned tagL2_value_ : 1;    // a user-defined tag value of logical type
    unsigned tagL3_set_ : 1;      // T if tagL3 has been set by the user
    unsigned tagL3_value_ : 1;    // a user-defined tag value of logical type
    unsigned tagL4_set_ : 1;      // T if tagL4 has been set by the user
    unsigned tagL4_value_ : 1;    // a user-defined tag value of logical type
    slim_usertag_t tag_value_;    // a user-defined tag value of integer type
    double tagF_value_;           // a user-defined tag value of float type

For tag and tagF there are special values used to indicate the "unset" state:

    #define SLIM_TAG_UNSET_VALUE  (INT64_MIN)   // for tags of type slim_usertag_t, the flag value for "unset"
    #define SLIM_TAGF_UNSET_VALUE (-DBL_MAX)    // for tags of type double (i.e. tagF), the flag value for "unset"

In both cases, these are the minimum representable values for their respective data types (int64_t and double; for double that means the most negative finite value). SLiM checks for these values on a "get":

    slim_usertag_t tag_value = tag_value_;
    if (tag_value == SLIM_TAG_UNSET_VALUE)
        EIDOS_TERMINATION << "ERROR (Trait::GetProperty): property tag accessed on trait before being set." << EidosTerminate();

(It really ought to check for them on a "set" and not allow them, also, but at present it doesn't; in practice it hasn't ever come up.)
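To make that concrete, here is a rough sketch of what the same sentinel check could look like on the Python side, including the "reject the sentinel on set" check that SLiM itself doesn't do yet. All of the Python names here (`TAG_UNSET`, `TAGF_UNSET`, `TagUnsetError`, `get_checked`, `set_checked`) are made up for illustration; nothing like this exists in pyslim today.

```python
import sys

# Python mirrors of the SLiM sentinels (the Python names are hypothetical).
TAG_UNSET = -(2**63)               # same value as INT64_MIN
TAGF_UNSET = -sys.float_info.max   # same value as -DBL_MAX


class TagUnsetError(RuntimeError):
    """Raised when a tag value is read before it has been set."""


def get_checked(raw_value, sentinel, name):
    """Return raw_value, or raise if it equals the 'unset' sentinel."""
    if raw_value == sentinel:
        raise TagUnsetError(f"property {name} accessed before being set")
    return raw_value


def set_checked(new_value, sentinel, name):
    """Return new_value for storage, refusing the reserved sentinel."""
    if new_value == sentinel:
        raise ValueError(f"{new_value!r} is reserved as the unset marker for {name}")
    return new_value
```

With this, `get_checked(raw, TAG_UNSET, "tag")` either hands back a real value or raises, so "random garbage" never leaks out to the user.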
For tagL0, tagL1, tagL2, tagL3, and tagL4, the scheme is a bit different. Taking tagL0 as an example, the tagL0_set_ data member of the Individual class tracks whether that tag has been set or not, and the tagL0_value_ data member holds the actual value of the tag. So the initial state has the set flag false, a "get" checks for set false and errors, and a "set" makes the set flag true. All of that works as expected, but I'm spelling it out to be quite explicit about the model being followed by SLiM, which ought to be followed on the Python side as well.
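Just to pin down that model, here is a tiny Python sketch of the set-flag-plus-value behavior for one logical tag. The class and exception names (`LogicalTag`, `TagUnsetError`) are hypothetical; this only illustrates the state machine and is not a proposal for the actual API.

```python
class TagUnsetError(RuntimeError):
    """Raised when a tag value is read before it has been set."""


class LogicalTag:
    """One logical tag, modeled as a 'set' flag plus a value."""

    def __init__(self, name):
        self.name = name
        self._is_set = False   # plays the role of tagL0_set_
        self._value = False    # plays the role of tagL0_value_; meaningless until set

    def get(self):
        # A "get" checks the set flag first and errors if it is false.
        if not self._is_set:
            raise TagUnsetError(f"property {self.name} accessed before being set")
        return self._value

    def set(self, value):
        # A "set" stores the value and makes the set flag true.
        self._value = bool(value)
        self._is_set = True
```

Note that setting the tag to False is different from leaving it unset: after `t.set(False)`, `t.get()` returns False instead of raising.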
So I'm not sure exactly how we want to handle persisting this. For tag and tagF we can probably just persist the value as it is; since individual metadata uses the struct codec, the special "unset" marker values should get persisted verbatim. For the tagLX values, do we persist each one as two separate boolean values? Or as a tri-state (unset, false, true) value?
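Here is a quick sketch of both points, using Python's standard `struct` module as a stand-in for the metadata codec (the `"<qd"` layout and the tri-state codes 0/1/2 are assumptions made up for this example, not the actual metadata schema). It checks that the sentinels survive a pack/unpack round trip, and shows the two candidate encodings for a logical tag.

```python
import struct
import sys

TAG_UNSET = -(2**63)               # same value as INT64_MIN
TAGF_UNSET = -sys.float_info.max   # same value as -DBL_MAX

# tag (int64) and tagF (double) round-trip bit-for-bit through a binary layout.
packed = struct.pack("<qd", TAG_UNSET, TAGF_UNSET)
tag_back, tagF_back = struct.unpack("<qd", packed)

# Option 1: two separate booleans per logical tag (one byte each here).
def pack_two_bools(is_set, value):
    return struct.pack("<??", is_set, value)

# Option 2: one tri-state byte per logical tag (codes are hypothetical).
TRI_UNSET, TRI_FALSE, TRI_TRUE = 0, 1, 2

def to_tristate(is_set, value):
    if not is_set:
        return TRI_UNSET   # the stored value is ignored when unset
    return TRI_TRUE if value else TRI_FALSE

def from_tristate(code):
    if code not in (TRI_UNSET, TRI_FALSE, TRI_TRUE):
        raise ValueError(f"invalid tri-state code {code}")
    return (code != TRI_UNSET, code == TRI_TRUE)
```

One difference worth noting: the two-boolean form has a fourth, redundant state (unset but value true), whereas the tri-state form collapses it into plain "unset".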
And for the end user on the Python side, since we want to provide the same "error on get if not set" functionality, we probably don't want the user to be able to see the raw metadata values for any of this; we want to wrap those raw values in some API that handles the "get when unset" functionality, and hide the raw metadata. Maybe. Or maybe we just document this, and let users shoot themselves in the foot if they don't follow the rules.
Pyslim will need to worry about all this. When adding metadata annotations to individuals, their tag metadata should be initialized to "unset", presumably; and there should be some supported way of changing that annotation to a particular value, etc.
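Something along these lines is what I have in mind, purely as a sketch. The function names (`default_tag_metadata`, `with_tag`) and the metadata keys (`tag`, `tagF`, `tagL0_set`, `tagL0_value`, ...) are all hypothetical, and this assumes the two-boolean representation for the logical tags; none of it is existing pyslim API.

```python
import sys

TAG_UNSET = -(2**63)               # same value as INT64_MIN
TAGF_UNSET = -sys.float_info.max   # same value as -DBL_MAX
LOGICAL_TAGS = tuple(f"tagL{i}" for i in range(5))


def default_tag_metadata():
    """Tag metadata for a newly annotated individual: everything unset."""
    md = {"tag": TAG_UNSET, "tagF": TAGF_UNSET}
    for name in LOGICAL_TAGS:
        md[f"{name}_set"] = False
        md[f"{name}_value"] = False
    return md


def with_tag(md, name, value):
    """Return a copy of md with one tag set to a particular value."""
    new = dict(md)
    if name == "tag":
        value = int(value)
        if value == TAG_UNSET:
            raise ValueError("INT64_MIN is reserved as the unset marker for tag")
        new["tag"] = value
    elif name == "tagF":
        value = float(value)
        if value == TAGF_UNSET:
            raise ValueError("-DBL_MAX is reserved as the unset marker for tagF")
        new["tagF"] = value
    elif name in LOGICAL_TAGS:
        new[f"{name}_set"] = True
        new[f"{name}_value"] = bool(value)
    else:
        raise KeyError(name)
    return new
```

Returning a modified copy instead of editing in place is just a choice made for the sketch; an in-place setter would work equally well.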
So it's a bit of a can of worms, and I think if it's going to happen I need a bit of guidance from you @petrelharp on how you'd like it to be designed. :-> Thanks!