Language
According to the TEI guidelines, the writing direction of a script
should be encoded through the @xml:lang attribute. Moreover, the
languages used in a TEI-encoded document should be listed in
teiHeader/profileDesc/langUsage.
Here is an example header:
<profileDesc xml:id="profileDesc">
  <langUsage xml:id="langUsage">
    <language ident="ar">Arabisch</language>
    <language ident="ar-Latn">Arabisch in Umschrift nach Brockelmann/Wehr</language>
    <language ident="de">Deutsch</language>
    <language ident="en">Englisch</language>
  </langUsage>
</profileDesc>
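To see which codes such a header declares, the list can be read out with a few lines of Python. This is only a minimal sketch using the standard library; the names HEADER and declared_languages are made up for illustration, and it is not how the framework itself reads the header. It matches elements by local name, so it should also work when the elements sit in the TEI namespace.

```python
import xml.etree.ElementTree as ET

# The example header from above, embedded as a string for illustration.
HEADER = """\
<profileDesc xml:id="profileDesc">
  <langUsage xml:id="langUsage">
    <language ident="ar">Arabisch</language>
    <language ident="ar-Latn">Arabisch in Umschrift nach Brockelmann/Wehr</language>
    <language ident="de">Deutsch</language>
    <language ident="en">Englisch</language>
  </langUsage>
</profileDesc>"""


def declared_languages(profile_desc_xml):
    """Return {ident: label} for every <language> element, in document order."""
    root = ET.fromstring(profile_desc_xml)
    result = {}
    for element in root.iter():
        # Compare the local name only, so a namespace prefix does not matter.
        if element.tag.rsplit("}", 1)[-1] == "language":
            result[element.get("ident")] = (element.text or "").strip()
    return result


LANGUAGES = declared_languages(HEADER)
for ident, label in LANGUAGES.items():
    print(ident, "=", label)
```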
The framework offers a function for setting the @xml:lang attribute
by selecting a language from the list of languages in the header.
- The "Change language" author mode action
  - is available in the Toolbar
    (Note: The icon was designed by Onur Mustak Cobanli and is
    distributed on http://languageicon.org/ under a CC licence with
    Relax-Attribution term.)
  - is available through content completion (Return)
  - is available in the TEI P5 menu
- Content completion is active in text mode
To get nice rendering in author mode, you should provide CSS rules for the languages you use in the project-specific CSS file. Here is an example:
@namespace xml "http://www.w3.org/XML/1998/namespace";
[xml|lang="ar"] {
  direction: rtl !important;
}
[xml|lang="de"] {
  direction: ltr !important;
}
[xml|lang="en"] {
  direction: ltr !important;
}
[xml|lang="ar-Latn"] {
  direction: ltr !important;
}
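Since every rule above follows the same pattern, the stylesheet could also be generated from the list of language codes. The following Python sketch shows one way to do that. The two lookup tables and the function names are assumptions made for this example and are far from complete. The script detection is deliberately simplified: the first four-letter subtag after the language is taken as the script.

```python
# Assumption: small, incomplete tables of right-to-left scripts and of
# languages whose default script is right-to-left.
RTL_SCRIPTS = {"Arab", "Hebr"}
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


def direction(tag):
    """Guess 'rtl' or 'ltr' for a language tag such as 'ar' or 'ar-Latn'."""
    subtags = tag.split("-")
    for subtag in subtags[1:]:
        # An explicit script subtag (four letters) overrides the language default.
        if len(subtag) == 4 and subtag.isalpha():
            return "rtl" if subtag.title() in RTL_SCRIPTS else "ltr"
    return "rtl" if subtags[0].lower() in RTL_LANGUAGES else "ltr"


def css_for(tags):
    """Build one direction rule per language tag."""
    lines = ['@namespace xml "http://www.w3.org/XML/1998/namespace";']
    for tag in tags:
        lines.append('[xml|lang="%s"] {' % tag)
        lines.append("  direction: %s !important;" % direction(tag))
        lines.append("}")
    return "\n".join(lines)


CSS = css_for(["ar", "de", "en", "ar-Latn"])
print(CSS)
```

For the four languages of the example header this should reproduce the hand-written rules above.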
Do you wonder how to get language codes right? Are ar-Latn,
he-Arab, got-Goth, DE-zyyy-de syntactically correct? (Yes, they
all are! Subtags are case-insensitive, but their order is fixed:
the script comes before the region.)
The specification is RFC 5646.
The important part is this excerpt of the grammar in ABNF:
langtag  = language
           ["-" script]
           ["-" region]
           *("-" variant)
           *("-" extension)
           ["-" privateuse]

language = 2*3ALPHA            ; shortest ISO 639 code
           ["-" extlang]       ; sometimes followed by
                               ; extended language subtags
         / 4ALPHA              ; or reserved for future use
         / 5*8ALPHA            ; or registered language subtag

extlang  = 3ALPHA              ; selected ISO 639 codes
           *2("-" 3ALPHA)      ; permanently reserved

script   = 4ALPHA              ; ISO 15924 code

region   = 2ALPHA              ; ISO 3166-1 code
         / 3DIGIT              ; UN M.49 code
...
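The excerpt can be turned into a regular expression for a quick syntax check. The Python sketch below covers only the rules quoted here (language, extlang, script, region). Variants, extensions and private-use subtags are left out because their definitions are not part of the excerpt, so tags that use them are rejected. The names LANGTAG and is_well_formed are chosen for this example.

```python
import re

# Only the quoted part of the grammar: language [-script] [-region].
LANGTAG = re.compile(
    r"""
    ^
    (?: [a-z]{2,3} (?: -[a-z]{3} ){0,3}   # 2*3ALPHA plus up to three extlang subtags
      | [a-z]{4}                          # reserved for future use
      | [a-z]{5,8}                        # registered language subtag
    )
    (?: -[a-z]{4} )?                      # script: 4ALPHA
    (?: -(?: [a-z]{2} | [0-9]{3} ) )?     # region: 2ALPHA or 3DIGIT
    $
    """,
    re.VERBOSE | re.IGNORECASE,           # subtags are case-insensitive
)


def is_well_formed(tag):
    """True if tag matches the language/script/region part of the grammar."""
    return LANGTAG.match(tag) is not None


for tag in ["ar-Latn", "he-Arab", "got-Goth", "he-Arab-Latn", "ar_Latn"]:
    print(tag, is_well_formed(tag))
```

Such a check only tests the syntax. A tag like xx-Abcd passes even though neither code is assigned.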
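So, to be sure that he-Arab (Hebrew language in Arabic script)
is correct, you will need at least two other documents: the code lists
that the grammar comments refer to, ISO 639 for the language subtag and
ISO 15924 for the script subtag.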
I recommend not using Unicode control codes for bidirectional embeddings or overrides in the TEI source file. Reason: they are not directly visible, so things can easily get broken. These codes impose an embedded context-free grammar (CFG) of their own (like parenthesis expressions). To me, it seems a better approach to place pairs of them into CSS or generated HTML, where it is much easier to assert that they are always used in pairs. That's how we do it in our edition of Arabic poems.
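Because these codes are invisible in an editor, a small script can help to find them in a source file or to check the pairing in generated output. The following Python sketch is an illustration only, with function names chosen for this example. It looks at the embedding and override codes (LRE, RLE, LRO, RLO) and their terminator PDF. The newer isolate codes are not covered. The pairing check is stricter than what a renderer requires.

```python
# Embedding and override controls, written as escapes so they stay visible here.
BIDI_OPENERS = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
}
BIDI_CLOSER = "\u202c"  # PDF, pop directional formatting


def find_bidi_controls(text):
    """Return (offset, name) for every embedding/override control in text."""
    found = []
    for offset, char in enumerate(text):
        if char in BIDI_OPENERS:
            found.append((offset, BIDI_OPENERS[char]))
        elif char == BIDI_CLOSER:
            found.append((offset, "PDF"))
    return found


def is_balanced(text):
    """True if every opener has a later PDF and no PDF comes without an opener."""
    depth = 0
    for char in text:
        if char in BIDI_OPENERS:
            depth += 1
        elif char == BIDI_CLOSER:
            if depth == 0:
                return False
            depth -= 1
    return depth == 0


sample = "abc \u202bxyz\u202c"
print(find_bidi_controls(sample))
print(is_balanced(sample))
```

Run over the TEI source, find_bidi_controls should return an empty list if the recommendation above is followed.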