
duplicated breathing marks for some lemmas beginning with vowels, given both as a combining mark and as part of a precomposed character #37

@bcrowell


This error is easy to miss by eye, but it causes problems when processing the XML. The first example I came across was at Iliad 5.719, where the proper noun Ἀθήνη is lemmatized as the following Unicode string (decimal code points): 787 7936 952 ... Here 787 (U+0313) is a combining comma above, and 7936 (U+1F00) is an alpha with a smooth breathing. So the breathing mark appears twice: once as a combining character and once built into the precomposed ἀ. How the string looks on screen depends somewhat on the software rendering it. In the terminal program I use, for example, the combining comma lands almost on top of the breathing mark, so the result looks like a slightly fatter breathing mark.
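To confirm the duplicated mark, one can print the code points with their Unicode names and count breathings after decomposition. The Python sketch below uses only the standard `unicodedata` module; the first three code points come from the example above, while the remaining letters (ή, ν, η) are assumed here to complete the lemma. Decomposing to NFD splits ἀ into α plus U+0313, so the bad string carries two smooth breathings where the correct lemma carries one.

```python
import unicodedata

# 787, 7936, 952 are the code points quoted above; the trailing
# "\u03ae\u03bd\u03b7" (ή ν η) is assumed to complete the lemma.
bad = "\u0313\u1f00\u03b8\u03ae\u03bd\u03b7"
good = "\u1f00\u03b8\u03ae\u03bd\u03b7"  # ἀθήνη with a single breathing

for ch in bad[:3]:
    print(ord(ch), f"U+{ord(ch):04X}", unicodedata.name(ch))
# 787 U+0313 COMBINING COMMA ABOVE
# 7936 U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI
# 952 U+03B8 GREEK SMALL LETTER THETA

def count_smooth_breathings(s: str) -> int:
    # NFD splits precomposed letters such as U+1F00 into base + U+0313,
    # so every smooth breathing shows up as a separate U+0313.
    return unicodedata.normalize("NFD", s).count("\u0313")

print(count_smooth_breathings(bad), count_smooth_breathings(good))  # 2 1

# NFC does not repair the string: the leading combining mark has no base
# character to compose with, so it survives normalization.
print(unicodedata.normalize("NFC", bad) == good)  # False
```

Because normalization leaves the stray mark in place, string comparisons against a correctly encoded ἀθήνη fail even after both sides are normalized, which is one way the duplication breaks XML processing.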

This seems to happen repeatedly, though not in every occurrence, for the following lemmas of proper names (which appear in the XML in lowercase): ἀπόλλων, ἀλέξανδρος, ἀφροδίτη, ἀτρείδης, ἀθήνη, ἀχαιός, ἀνδρομάχη, ἰδομενεύς, ὠκεανός, ὀδυσσεύς, ἀσκληπιάδης, ἀντίλοχος, ἀχιλλεύς, ἀγχίσης, ἀλκίνοος, ἀρήτη, ὠγυγία.

It also affects some lemmas that are not proper names: ἐνψύω at Iliad 8.382, ἐννοσίγαιος, and ἀμφίμαχος.
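Since the problem does not show up in every occurrence of these lemmas, a fix probably has to scan every lemma rather than patch a fixed list. The Python sketch below shows one possible approach using only the standard library. The issue does not describe the XML schema, so the `lemma` attribute name, the `repair_lemmas` helper, and the sample document are all hypothetical. It also assumes the stray mark always sits at the start of the lemma, as in the Ἀθήνη example, and it removes the mark only when the next character already carries the same breathing.

```python
import unicodedata
import xml.etree.ElementTree as ET

SMOOTH, ROUGH = "\u0313", "\u0314"  # combining psili and dasia

def has_stray_breathing(lemma: str) -> bool:
    """True if the lemma starts with a combining breathing mark and the next
    character already has that same breathing composed into it."""
    if len(lemma) < 2 or lemma[0] not in (SMOOTH, ROUGH):
        return False
    return lemma[0] in unicodedata.normalize("NFD", lemma[1])

def strip_stray_breathing(lemma: str) -> str:
    """Drop the duplicated leading mark; leave anything else untouched."""
    return lemma[1:] if has_stray_breathing(lemma) else lemma

def repair_lemmas(xml_text: str, attr: str = "lemma") -> tuple[str, list[str]]:
    """Fix every `attr` value (hypothetical attribute name) in place and
    return the rewritten XML plus the original bad values, in order."""
    root = ET.fromstring(xml_text)
    fixed = []
    for el in root.iter():
        value = el.get(attr)
        if value is not None and has_stray_breathing(value):
            el.set(attr, strip_stray_breathing(value))
            fixed.append(value)
    return ET.tostring(root, encoding="unicode"), fixed

# Hypothetical sample: one bad and one good encoding of ἀθήνη.
sample = (
    "<sentence>"
    '<word lemma="\u0313\u1f00\u03b8\u03ae\u03bd\u03b7"/>'
    '<word lemma="\u1f00\u03b8\u03ae\u03bd\u03b7"/>'
    "</sentence>"
)
out, fixed = repair_lemmas(sample)
print(len(fixed))  # 1
```

Requiring the stray mark to match the composed breathing is deliberately conservative: a mismatch, such as a combining rough breathing in front of a smooth-breathing ἀ, is ambiguous about which mark is correct, so it is left for manual review rather than guessed.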
