Hello!
I was looking through this text (Aristotle's Physics) and found a few typos in the metadata. I intend to submit a pull request soon-ish, but I'm documenting the typos here.
Oooof, as I was checking possible errors against the source text, I was led down a rabbit hole of what I think are two systematic errors. The 'False/Extra Line Number at Chapter Start' should be easy enough to fix with a regex. The `<lb>` usage/numbering question, if it is an error and not just my poor assumption about vocabulary meanings, would be a bit more difficult to fix.

This metadata is useful for a project I am doing, but I am curious: how critical are the line-beginning marks for the Scaife Viewer and other interfaces that use this data? Are there other digitized-to-text sources of this Greek text (with line numbers) that I could use in the meantime?
I don't have the time to correct all this right now, but I wanted to document this and I also wanted to get feedback on the two bigger systematic errors/questions I had before sinking time into them.
Source Text
Going off of:
First1KGreek/data/tlg0086/tlg031/__cts__.xml
Line 5 in 6812dab
I am using this Internet Archive edition: https://archive.org/details/aristotlesphysic0000wdro/ (and also https://archive.org/details/aristotelisopera01arisrich as a rough check on text & line numbers)
Ok, I only noticed this link later: http://digital.slub-dresden.de/id416133894 at the head of the TEI file. That page is painfully slow, though, and I don't want to update all the links below now. A quick spot-check of a few pages shows the same Greek text pages there and on Internet Archive (but without the introduction etc. found in the Internet Archive edition).
✅ False/Extra Line Number at Chapter Start
First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Lines 1595 to 1597 in 6812dab
on line 1596:
`<lb n="6"/>` should be removed; it is the chapter number (which is already captured in the metadata one line above). See: https://archive.org/details/aristotlesphysic0000wdro/page/178/mode/1up

Oh, hmmm, this seems like it is a systematic error. I was looking for this error above to get the line permalink, and instead I first found the error below and others like it. I'd imagine most chapter number marks have this problem?
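A regex fix for this could look roughly like the Python sketch below. It is a hypothetical sketch, not the fix from the linked PR: it assumes chapter divs carry `subtype="chapter"` and an `n` attribute, and that the spurious tag is the first `<lb/>` after the div whose `n` equals the chapter number. The names `SPURIOUS_CHAPTER_LB` and `drop_chapter_lb` are made up for illustration.

```python
import re
import sys

# Hypothetical pattern, assuming chapter divs look like
#   <div type="textpart" subtype="chapter" n="6">
# The lookaheads let subtype and n appear in either order.
SPURIOUS_CHAPTER_LB = re.compile(
    r'(<div\b(?=[^>]*\bsubtype="chapter")(?=[^>]*\bn="(\d+)")[^>]*>'  # group 1: the div tag...
    r'(?:(?!<lb\b|<div\b)[\s\S])*?)'  # ...plus any text before the first <lb/> or the next <div>
    r'<lb n="\2"\s*/>'  # an <lb/> whose number repeats the chapter number (group 2)
)


def drop_chapter_lb(xml_text: str) -> str:
    """Remove an <lb/> that repeats the chapter number at the start of a chapter.

    Only the first <lb/> after each chapter <div> is considered, and only when
    its n equals the div's n. Every other <lb/> is left untouched.
    """
    return SPURIOUS_CHAPTER_LB.sub(r"\1", xml_text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python drop_chapter_lb.py tlg0086.tlg031.1st1K-grc1.xml > fixed.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(drop_chapter_lb(f.read()))
```

Because the heuristic only removes an `lb` whose number matches the chapter number, a chapter whose first marked line happens to share that number would also be affected, so the resulting diff still needs a look against the page images.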
TODO: look at other `subtype="chapter"` lines for spurious `lb` tags. This did affect many chapters; fixed with regex, see commit message in linked PR.

First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Lines 418 to 422 in 6812dab
on line 419:
`<lb n="6"/>` should be removed; it is the chapter number (which is already captured in the metadata one line above).

✅ Line number typo
First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Line 1669 in 6812dab
Should be line begin `20`, not `10`. This `<lb n="10"/>` appears between a line 15 and a line 25, so I assumed it should be 20. But I went to the source to make sure it wasn't a typo there or something. And indeed, it should be 20; see https://archive.org/details/aristotlesphysic0000wdro/page/181/mode/1up and https://archive.org/details/aristotelisopera01arisrich/page/207/mode/1up

But those sources led me to another question...

❓ `<lb>` usage/numbering?

Is `<lb>` 'line beginning', as here: https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-lb.html ? I ask because I think the `lb` tags in this section of text (same as above, just more lines of context here) are misplaced:

First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Lines 1661 to 1670 in 6812dab
The `lb` tag should be corrected and moved so that it reads like this instead (code lines 1668-1669):
I've included more context because this looks like an error in transcribing this source text on all right-hand pages that I checked (in the file, the even-numbered `pb`s).

Important
TODO: see if this end-of-line vs beginning-of-line error occurs on all right-hand pages of this text
A quick scan of this text versus source images shows:
- pb59: correct, has `lb` marks at the beginning of the lines. See: https://archive.org/details/aristotlesphysic0000wdro/page/180/mode/1up
- pb60: incorrect, has `lb` marks at the end of the line of text (said another way, all `lb` numbers are off by one, e.g. `lb` 15 would be accurate if it were instead `lb` 16). See: https://archive.org/details/aristotlesphysic0000wdro/page/181/mode/1up
- pb61: correct, has `lb` marks at the beginning of the lines. See: https://archive.org/details/aristotlesphysic0000wdro/page/182/mode/1up
- pb62: incorrect, has `lb` marks at the end of the line of text. See: https://archive.org/details/aristotlesphysic0000wdro/page/183/mode/1up

This error occurs with the `<note type="marginal">` marks on right-hand pages as well, depending on how those marks are used/interpreted. I was taking these to be an implied 'beginning of line number 1 of page 207 column a'. But as the TEI file is now, it is technically correct: the text of the marginal note is correct, and the marginal note is correctly placed according to the source text. It is only 'wrongly' placed if we assume it marks the beginning of the line. If all the 'incorrect' `lb` tags on the even `pb` pages were instead just `<note type="marginal">` elements, then they too would be in the correct place.

So the bigger question is: is the intent of this file to accurately encode the location of marginal notes according to the source text, or to accurately mark and number the beginnings of lines of text? Both could be done, but I'm not sure how these files are used/interpreted downstream. For example, on left-hand pages this TEI file could, for marked line numbers, say: `<note type="marginal">15</note><lb n="15"/>...` and right-hand pages would say: `<lb n="20"/> καὶ τὸ πᾶν ἐν ἑαυτῷ ἔχειν, διὰ τὸ ἔχειν τινὰ ὁμοιότητα <note type="marginal">20</note>`

Also ref: https://archive.org/details/aristotelisopera01arisrich/page/206/mode/1up, https://archive.org/details/aristotelisopera01arisrich/page/207/mode/1up
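For the TODO about checking all right-hand pages, a small script could at least list where the `lb` numbering stops increasing within a `pb`, which is how a typo like the `<lb n="10"/>` between lines 15 and 25 stands out. Below is a rough Python sketch, assuming `pb` and `lb` are self-closing tags with a plain `n` attribute; the names `scan_lb_sequence` and `is_even_page` are hypothetical. Two caveats: a decrease can also be a legitimate restart at a new Bekker column, and an off-by-one end-of-line placement keeps the sequence increasing, so this only narrows down which pages to compare against the images.

```python
import re
import sys
from typing import Iterator, NamedTuple

# Matches self-closing <pb .../> and <lb .../> tags, capturing the element name and n.
# Assumption: both are written as empty elements with a plain n attribute.
BREAK_TAG = re.compile(r'<(pb|lb)\b[^>]*?\bn="([^"]+)"[^>]*/>')


class Suspect(NamedTuple):
    pb: str       # n of the enclosing page break
    prev_lb: str  # previous line number seen on that page
    lb: str       # the line number that did not increase
    offset: int   # character offset of the <lb/> in the file


def is_even_page(pb: str) -> bool:
    """Even pb values are the right-hand pages observed to be affected in this file."""
    return pb.isdigit() and int(pb) % 2 == 0


def scan_lb_sequence(xml_text: str) -> Iterator[Suspect]:
    """Yield every <lb/> whose number is not larger than the previous one on the same page."""
    current_pb = "?"
    prev_lb = None
    for m in BREAK_TAG.finditer(xml_text):
        kind, n = m.group(1), m.group(2)
        if kind == "pb":
            current_pb, prev_lb = n, None  # comparisons restart at each page
            continue
        if not n.isdigit():
            continue  # leave non-numeric values for manual review
        if prev_lb is not None and int(n) <= int(prev_lb):
            yield Suspect(current_pb, prev_lb, n, m.start())
        prev_lb = n


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        text = f.read()
    for s in scan_lb_sequence(text):
        side = "even pb" if is_even_page(s.pb) else "odd pb"
        print(f"pb {s.pb} ({side}): lb {s.lb} follows lb {s.prev_lb} at offset {s.offset}")
```

Using a regex rather than an XML parser keeps the character offsets, which makes it quick to jump to each spot in the file.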
✅ Line Number in incorrect place
First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Lines 1648 to 1652 in 6812dab
should be:
This assumes that, if a word is split over a line ending, the `lb` mark should be placed at the beginning of the word (and the word un-split in this TEI file). If the `lb` mark should instead go after the end of the split word, then it should be: `ἀριθμόν<lb n="33"/>). <lg>`? See: https://archive.org/details/aristotlesphysic0000wdro/page/180/mode/1up

❓ Confusing Marginal Note
First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Line 1615 in 6812dab
This is technically a marginal note, but it is really a line number where the source text marks a deletion. I'm not sure anything needs to be done about this; it doesn't seem to mess up the Scaife Viewer display of this portion.
See: https://archive.org/details/aristotlesphysic0000wdro/page/179/mode/1up and the actual version used: https://digital.slub-dresden.de/werkansicht?tx_dlf%5Bid%5D=109594&tx_dlf%5Bpage%5D=67