Milestones

  • Load the syntax-structure data from the corpora into a graph database using the importer implementations. Then traverse the subgraph of each sentence in various ways to collect tree features such as nesting depth, POS n-grams, and direct and indirect dominance relationships (with their occurrence counts), attaching the results to the nodes of each sentence. Finally, aggregate the per-sentence feature values over the sets of sentences grouped by their enclosing corpus to obtain a feature vector per language.

    No due date
    0/2 issues closed
  • For each input corpus, specify a mapping from that corpus's string tags for POS and phrase types to a previously selected subset of OLiA classes.

    No due date
    0/1 issues closed
  • Create importer components that parse the different serialisation formats of the corpora, including the part-of-speech annotations on terminals and the types of non-terminals in the PSG (phrase-structure grammar) trees.

    No due date
    0/1 issues closed
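For the importer milestone, a minimal sketch of one importer follows. It assumes a Penn-Treebank-style bracketed format purely as an example, because the milestone does not name the corpora's actual serialisation formats. The `Node` class and `parse_bracketed` function are hypothetical names. POS tags end up on terminal nodes and phrase types on non-terminal nodes, which is the information the milestone asks the importers to preserve.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # POS tag for terminals, phrase type for non-terminals
    word: Optional[str] = None      # token text; None for non-terminals
    children: List["Node"] = field(default_factory=list)


def parse_bracketed(text: str) -> Node:
    """Parse one Penn-Treebank-style tree, e.g. '(S (NP (DT the) (NN dog)) ...)'.

    Hypothetical importer for an assumed format; other corpus formats would
    need their own parser producing the same Node structure.
    """
    # Pad parentheses with spaces so they become separate tokens
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def expect(tok: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise ValueError(f"expected {tok!r} at token {pos}")
        pos += 1

    def parse() -> Node:
        nonlocal pos
        expect("(")
        label = ""  # some treebanks wrap each tree in an unlabelled outer bracket
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        if label and pos < len(tokens) and tokens[pos] not in ("(", ")"):
            # Preterminal "(TAG word)": the POS tag labels a single token
            node = Node(label, word=tokens[pos])
            pos += 1
        else:
            # Non-terminal: the label is the phrase type, children follow
            node = Node(label)
            while pos < len(tokens) and tokens[pos] == "(":
                node.children.append(parse())
        expect(")")
        return node

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree
```

In a full pipeline, each `Node` would then be written to the graph database as a vertex with dominance edges to its children; that loading step is not shown here.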
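For the tag-mapping milestone, a per-corpus mapping could be a plain dictionary from each corpus's tag strings to OLiA class names, checked against the selected subset. The sketch below assumes this representation. The corpus name `example_ptb_style`, its tag inventory, and the `olia:` class identifiers are illustrative placeholders only. They should be verified against the OLiA ontology and the real corpora before use.

```python
from typing import Dict, Optional

# Hypothetical "previously selected subset" of OLiA classes (illustrative
# identifiers; confirm exact names against the OLiA reference model).
SELECTED_OLIA_CLASSES = frozenset({
    "olia:CommonNoun", "olia:ProperNoun", "olia:Determiner", "olia:Verb",
    "olia:NounPhrase", "olia:VerbPhrase", "olia:Clause",
})

# Per-corpus mapping: corpus name -> {corpus-specific tag -> OLiA class}.
# Both POS tags (terminals) and phrase types (non-terminals) are covered.
TAG_MAPPINGS: Dict[str, Dict[str, str]] = {
    "example_ptb_style": {  # hypothetical corpus
        "NN": "olia:CommonNoun", "NNS": "olia:CommonNoun",
        "NNP": "olia:ProperNoun", "DT": "olia:Determiner",
        "VBZ": "olia:Verb", "VBD": "olia:Verb",
        "NP": "olia:NounPhrase", "VP": "olia:VerbPhrase", "S": "olia:Clause",
    },
}


def validate_mapping(corpus: str) -> None:
    """Raise if a corpus mapping targets a class outside the selected subset."""
    unknown = {c for c in TAG_MAPPINGS[corpus].values()
               if c not in SELECTED_OLIA_CLASSES}
    if unknown:
        raise ValueError(f"{corpus}: classes outside subset: {sorted(unknown)}")


def to_olia(corpus: str, tag: str) -> Optional[str]:
    """Map one corpus tag to its OLiA class, or None if the tag is unmapped."""
    return TAG_MAPPINGS[corpus].get(tag)
```

Returning `None` for unmapped tags, rather than raising, lets an importer count and report gaps in a mapping before deciding how to treat them.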
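For the feature-extraction milestone, a small in-memory sketch of the per-sentence features and the per-corpus aggregation follows. It stands in for traversals of the graph database, which are not shown. The `Node` class and all function names are hypothetical. Nesting depth is taken here to mean the number of nodes on the longest root-to-leaf path. Indirect dominance is counted for ancestor/descendant pairs at distance two or more. Aggregation is assumed to be the mean per sentence. The milestone does not fix any of these choices.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Node:
    label: str                      # POS tag (terminal) or phrase type (non-terminal)
    word: Optional[str] = None      # set only for terminals
    children: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    # Nesting depth: number of nodes on the longest root-to-leaf path
    return 1 + max((depth(c) for c in node.children), default=0)


def pos_sequence(node: Node) -> List[str]:
    # POS tags of the terminals in left-to-right order
    if node.word is not None:
        return [node.label]
    return [t for c in node.children for t in pos_sequence(c)]


def pos_ngrams(tags: List[str], n: int) -> Counter:
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def dominance(node: Node, ancestors: Tuple[str, ...] = ()) -> Tuple[Counter, Counter]:
    # Direct: (parent, child) pairs; indirect: (ancestor, descendant) at distance >= 2
    direct, indirect = Counter(), Counter()
    if ancestors:
        direct[(ancestors[-1], node.label)] += 1
        for a in ancestors[:-1]:
            indirect[(a, node.label)] += 1
    for c in node.children:
        d, i = dominance(c, ancestors + (node.label,))
        direct.update(d)
        indirect.update(i)
    return direct, indirect


def sentence_features(tree: Node, n: int = 2) -> Counter:
    feats = Counter({("depth",): depth(tree)})
    for gram, k in pos_ngrams(pos_sequence(tree), n).items():
        feats[("pos",) + gram] += k
    direct, indirect = dominance(tree)
    for pair, k in direct.items():
        feats[("dom",) + pair] += k
    for pair, k in indirect.items():
        feats[("idom",) + pair] += k
    return feats


def corpus_vector(trees: Iterable[Node]) -> Dict[tuple, float]:
    # Assumed aggregation: mean feature value per sentence in the corpus
    total, count = Counter(), 0
    for t in trees:
        total.update(sentence_features(t))
        count += 1
    return {k: v / count for k, v in total.items()} if count else {}
```

Because features missing from a sentence contribute zero to the sum, the resulting dictionaries can be aligned across corpora by filling absent keys with 0.0 to form comparable per-language vectors.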