Inconsistency of format for mention of terms in annotations #771
Replies: 10 comments
-
The practice of capitalizing the names of classes should be questioned altogether. They aren't proper names, which are the usual case where capitalization is used, and their subsequent capitalized use in annotations isn't proper, as I understand the rules of English grammar. The recommendation for OBO and in subsequent papers on ontology form is that lower case is used for term labels, except, of course, for words that are proper names or abbreviations. I happen to think that there's not going to be a workable automatic rule that will figure out that a phrase in an annotation is referring to a class unless, possibly, an LLM is used or trained for this task. Instead I think we will need some explicit marker, like braces or pipes, to mark them off, and we will want systems that display definitions to turn them into hyperlinks.
-
A few thoughts:
-
I agree with @swartik that the current (albeit inconsistently deployed) convention of capitalizing class labels when they're used in annotations is helpful, that markers might reduce readability, and that CCO could use a copy editor. (And I don't disagree with his other points.) Regarding the first point, though, another way of indicating class labels in annotations might work just as well.
I also agree with @alanruttenberg that capitalizing class labels in annotations is in tension with the rules of English grammar. However, notice that organizations are free to use a regimented version of English in which some standard grammatical rules don't apply. (Indeed, this happens often and often involves capitalization; a rental agreement might, for instance, capitalize the first letter of 'lessee' throughout.) This seems to be what we're doing when writing annotations, so I don't think this point of Alan's constitutes a problem with continuing to capitalize class labels. (Though it would be helpful to have the rules of the regimented version of English we're using in annotations explicitly stated somewhere.)
That said, I do think there's an issue with capitalizing class labels related to Alan's observation that they aren't proper names. In particular, because of the rules of standard English grammar regarding capitalization, capitalizing class names in annotations makes it seem as though they're functioning there as proper names of classes. But they're not; e.g., in the definition for Brake--
--'Material Artifact' is functioning not as a proper name of the class for which it's a label but as a sortal that applies to instances of that class, contrary to what's suggested by the rules of standard English grammar. These considerations seem to me to suggest that another way of indicating class labels in annotations might be preferable. (I think an all-caps route, though ugly, might work, since I don't think that there's anything in the grammatical rules indicating that a word or phrase in all-caps is functioning as a proper name.) @mark-jensen wrote:
I'd just like to note that such an unambiguous formatting method for object properties would serve another purpose as well. In particular, just as I find the current capitalization convention helpful as a CCO user, so too would I find a way of indicating object property labels helpful.
-
Design choices generally have constraints...
Class name capitalization in ontologies -
-
@BrendaBraitling BFO uses lower case, as is OBO policy and documented in this survey.
@swartik Using separators to delineate classes and properties is the worst possible solution, except for all the others. Relatively unobtrusive would be to use underscores or backticks, e.g.
Geospatial Location: A _geospatial region_ at which an _entity_ or event is located.
or
A `geospatial region` at which an `entity` or `event` is located.
I would intend that these only be visible in naive applications. We would document that new applications that show definitions should either display these as hyperlinks or without the separators.
As an aside, about that particular definition: "entity or event" is equivalent to just "entity", if event is meant to be a synonym for process. So mentioning event is redundant.
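To make the two display options above concrete, here is a minimal Python sketch, assuming the backtick convention is adopted. The function names `strip_markers` and `to_html`, the label-to-IRI mapping, and the example.org IRI are all hypothetical; nothing like this exists in CCO tooling as far as this thread indicates.

```python
import re
from html import escape
from typing import Dict

# Matches a backtick-delimited term mention such as `geospatial region`.
MENTION = re.compile(r"`([^`]+)`")


def strip_markers(annotation: str) -> str:
    """Return the annotation with the backtick markers removed."""
    return MENTION.sub(r"\1", annotation)


def to_html(annotation: str, iri_for_label: Dict[str, str]) -> str:
    """Render marked mentions as hyperlinks when the label is known.

    iri_for_label is a hypothetical mapping from term label to IRI.
    Mentions with no known IRI are shown as plain text without markers.
    """
    parts = []
    pos = 0
    for match in MENTION.finditer(annotation):
        # Escape the plain text between the previous mention and this one.
        parts.append(escape(annotation[pos:match.start()]))
        label = match.group(1)
        iri = iri_for_label.get(label)
        if iri is None:
            parts.append(escape(label))
        else:
            parts.append(f'<a href="{escape(iri)}">{escape(label)}</a>')
        pos = match.end()
    parts.append(escape(annotation[pos:]))
    return "".join(parts)


definition = "A `geospatial region` at which an `entity` or `event` is located."
print(strip_markers(definition))
print(to_html(definition, {"entity": "https://example.org/entity"}))
```

A naive application that does neither would simply show the backticks, which is the fallback behavior described above.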
-
@BrendaBraitling, regarding your third item: we are discussing annotations, which by definition have no semantic meaning. Their purpose is to aid human comprehension. I don't know of any ontology tools that restrict capitalization in natural language annotations. If they exist I'd question their correctness. rdfs:label annotations sometimes take on the semi-official meaning of the preferred name for a class in a document. I've also seen ontologies use annotations instead of data properties, but I consider that bad practice. A counterargument is that LLMs might be skewed if they're fed unusual capitalization. I can't speak to how annotations might get preprocessed. I don't know of any LLM-input tools that are both widely used and designed to read ontologies. Anybody have examples?
-
@alanruttenberg I didn't use the word "separators". I'm not sure what you thought I intended. Personally I'd vote for guillemets if they weren't so hard to type.
-
@swartik: I meant what you meant by "surrounding class names with markers"
-
@alanruttenberg What I meant was akin to your suggestion to use underscores or backticks.
-
I personally like the capitalization for the functional reason that it emphasizes that a word is special and that the developer should be aware that it is (in theory) defined elsewhere. However, back ticks also fulfill this functional requirement, and they seem the least obtrusive to me (underscores seem too obtrusive). But I would like to see a tool that can do this before coming down on any particular marker. I don't want to build around a tool that either doesn't exist in a stable/mature state or is not available to CCO users. Re LLMs and NLP, back ticks could be stripped more easily from annotations than arbitrary capitalization.
-
@BrendaBraitling notes in #573 and @gregfowlerphd in #590 that the convention of capitalizing the labels of classes when they are mentioned in annotations, such as definitions (most commonly) or comments and elucidations, is inconsistently deployed across CCO. It has been raised before in numerous contexts.
This issue shall serve as a starting point to resolve this problem before the next release.
@neilotte @johnbeve @APCox @alanruttenberg
The ideal is an unambiguous way of formatting references to the labels of classes or properties in annotations, so that a script can be run across annotations during the build process and any change to an element's label propagates across all annotations.
Important Note: an acceptable solution is to not use special formatting at all in the annotations if a solution cannot be reasonably implemented at this point.
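If an explicit marker were adopted, the build-time propagation step described above might look roughly like this Python sketch. It assumes backtick-delimited mentions, which is only one of the options raised in this thread, and it assumes annotations are available as a plain mapping from element identifier to text. The names `rename_mentions` and `propagate_rename`, the `ex:` identifier, and the example rename are hypothetical.

```python
import re
from typing import Dict

# Matches a backtick-delimited term mention such as `geospatial region`.
MENTION = re.compile(r"`([^`]+)`")


def rename_mentions(annotation: str, old_label: str, new_label: str) -> str:
    """Rewrite marked mentions of old_label; leave all other text untouched."""
    def swap(match):
        if match.group(1) == old_label:
            return "`" + new_label + "`"
        return match.group(0)
    return MENTION.sub(swap, annotation)


def propagate_rename(annotations: Dict[str, str],
                     old_label: str, new_label: str) -> Dict[str, str]:
    """Apply a label change to every annotation in the mapping."""
    return {
        element: rename_mentions(text, old_label, new_label)
        for element, text in annotations.items()
    }


# Hypothetical input: element identifier mapped to its definition text.
annotations = {
    "ex:GeospatialLocation":
        "A `geospatial region` at which an `entity` or `event` is located.",
}
updated = propagate_rename(annotations, "geospatial region", "geospatial area")
print(updated["ex:GeospatialLocation"])
```

Only marked mentions are rewritten, so an ordinary English use of the same words is left alone. That is the property that capitalization alone cannot guarantee.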