Milestones

Feature extraction from PSG-tree data
Load the syntax structure data from the corpora into a graph database using the importer implementations. Then traverse the sub-graph of each sentence in various ways to collect tree features such as nesting depth, POS n-grams, and direct and indirect dominance relationships (with their occurrence counts), attaching the results to the nodes of each sentence. Finally, aggregate the per-sentence feature values over the set of sentences belonging to each corpus to obtain one feature vector per language.
No due date
0/2 issues closed (0% complete; 2 open, 0 closed)
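
The traversal and aggregation described in this milestone could be sketched roughly as follows. This is a minimal in-memory illustration, not the project's implementation: it uses a plain Python tree in place of the graph database, and the `Node` class, the function names, and the shape of the resulting feature vector are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a graph node: phrase type or POS tag plus children."""
    label: str
    children: list["Node"] = field(default_factory=list)


def depth(node):
    """Nesting depth: 0 for a terminal, plus 1 per level of non-terminals above it."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children)


def pos_sequence(node):
    """POS tags of the terminals, in left-to-right order."""
    if not node.children:
        return [node.label]
    return [tag for child in node.children for tag in pos_sequence(child)]


def pos_ngrams(node, n=2):
    """Occurrence counts of POS n-grams over the terminal sequence."""
    tags = pos_sequence(node)
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def dominance(node, ancestors=()):
    """Count direct (parent, child) and indirect (ancestor, descendant) label pairs."""
    direct, indirect = Counter(), Counter()
    for child in node.children:
        direct[(node.label, child.label)] += 1
        for ancestor in ancestors:
            indirect[(ancestor, child.label)] += 1
        sub_direct, sub_indirect = dominance(child, ancestors + (node.label,))
        direct.update(sub_direct)
        indirect.update(sub_indirect)
    return direct, indirect


def aggregate(corpora):
    """Combine per-sentence features into one feature record per corpus."""
    vectors = {}
    for name, trees in corpora.items():
        depths = [depth(tree) for tree in trees]
        ngrams, direct, indirect = Counter(), Counter(), Counter()
        for tree in trees:
            ngrams.update(pos_ngrams(tree))
            tree_direct, tree_indirect = dominance(tree)
            direct.update(tree_direct)
            indirect.update(tree_indirect)
        vectors[name] = {
            "mean_depth": sum(depths) / len(depths) if depths else 0.0,
            "pos_ngrams": ngrams,
            "direct_dominance": direct,
            "indirect_dominance": indirect,
        }
    return vectors


# Example sentence with the POS sequence DT NN VBZ DT NN (words omitted).
example = Node("S", [
    Node("NP", [Node("DT"), Node("NN")]),
    Node("VP", [Node("VBZ"), Node("NP", [Node("DT"), Node("NN")])]),
])
features = aggregate({"demo_corpus": [example]})
```

In a graph database the same counts would presumably come from path queries over the stored sentence sub-graphs; the recursion here only shows which pairs are counted as direct and which as indirect.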
Specify mappings from the annotation tags of the individual corpora to OLiA classes
For each input corpus, specify a mapping from the string tags used for POS and phrase types in that corpus to a previously selected subset of OLiA classes.
No due date
0/1 issues closed (0% complete; 1 open, 0 closed)
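
One possible shape for such a per-corpus mapping is a lookup table keyed by corpus and tag, checked against the selected subset. The corpus names, the tags, and the class names below are illustrative assumptions; the actual class names would need to be taken from the OLiA reference model.

```python
# Hypothetical subset of target classes selected for this project.
SELECTED_CLASSES = frozenset({
    "CommonNoun", "Determiner", "Verb", "NounPhrase", "VerbPhrase",
})

# Hypothetical corpora and tag sets, one mapping table per corpus.
TAG_MAPPINGS = {
    "corpus_a": {
        "NN": "CommonNoun", "DT": "Determiner", "VBZ": "Verb",
        "NP": "NounPhrase", "VP": "VerbPhrase",
    },
    "corpus_b": {
        "N": "CommonNoun", "ART": "Determiner", "V": "Verb",
        "NP": "NounPhrase", "VP": "VerbPhrase",
    },
}


def map_tag(corpus, tag):
    """Return the target class for a corpus-specific tag; KeyError if unmapped."""
    return TAG_MAPPINGS[corpus][tag]


def unmapped_tags(corpus, tags):
    """List the tags seen in a corpus that have no mapping yet."""
    known = TAG_MAPPINGS.get(corpus, {})
    return sorted(set(tags) - set(known))


def invalid_targets():
    """List (corpus, tag, target) entries whose target is outside the selected subset."""
    return [
        (corpus, tag, target)
        for corpus, table in TAG_MAPPINGS.items()
        for tag, target in table.items()
        if target not in SELECTED_CLASSES
    ]
```

Keeping the tables as data rather than code makes it straightforward to report gaps, for example by running `unmapped_tags` over every tag an importer encounters.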
Corpus importers
Create importer components that parse the different serialisation formats of the corpora, including the part-of-speech annotations for terminals and the phrase types of non-terminals in the PSG trees.
No due date
0/1 issues closed (0% complete; 1 open, 0 closed)
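
As an illustration of what a single importer might do, the sketch below parses one sentence in bracketed notation, such as `(S (NP (DT The) (NN dog)) (VP (VBZ barks)))`. The milestone does not say which serialisation formats are involved, so the bracketed format, the `Node` class, and the function name are assumptions chosen for the example, and the step of writing the result to the graph database is left out.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: phrase type or POS tag, plus the word for terminals."""
    label: str
    children: list["Node"] = field(default_factory=list)
    word: Optional[str] = None


# A token is an opening bracket, a closing bracket, or a run of other characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_bracketed(text):
    """Parse one bracketed tree into Node objects."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        node = Node(tokens[pos])  # label directly after the bracket
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())  # nested constituent
            else:
                node.word = tokens[pos]  # terminal word under a POS tag
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')' for node %r" % node.label)
        pos += 1  # consume ')'
        return node

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


tree = parse_bracketed("(S (NP (DT The) (NN dog)) (VP (VBZ barks)))")
```

Importers for other formats could return the same node structure, so that the later mapping and feature extraction steps do not depend on the source format.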
