GarrettNicolai/GeneratorsAndAnalyzers

Description

Runnable Morphological Analysis and Generation Tools

Warning

This software is at an alpha stage.

Prerequisites

DirecTL+

Installation

The 900 DTL+ models are too large to be hosted on GitHub, so they are not included here; please contact gnicola2 AT jhu DOT edu for pre-trained models. Uncompress the DTL models into the models/DTL directory.

tar -xvzf DTLModel.tgz

Set environment variables to point to required binaries.

export DTL=<location of DTL binary>
export CTRANSLATE=<location of ctranslate binary>
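
Before running the tools, it can help to confirm that both variables are set and point to real files. The following is a minimal sketch, not part of this repository; the helper name missing_binaries is hypothetical, and it only assumes the two variable names given above.

```python
import os
from pathlib import Path

# Environment variables named in the installation step above.
REQUIRED_VARS = ("DTL", "CTRANSLATE")


def missing_binaries(env=os.environ):
    """Return (variable, reason) pairs for variables that are unset
    or that do not point to an existing file.

    Hypothetical helper for illustration; not part of the repository.
    """
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append((name, "not set"))
        elif not Path(value).is_file():
            problems.append((name, "no file at " + value))
    return problems


if __name__ == "__main__":
    for name, reason in missing_binaries():
        print(f"{name}: {reason}")
```

An empty result means both variables point to existing files; it does not check that the files are the right binaries.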

Usage

python src/analyze.py -i input.wordlist -a output.analyses -l language -n nBest -d dictionary -g

The input file contains either a list of words to analyze or a list of lemmas from which to generate inflected forms.

-n produces the n-best hypotheses for each input form; default is 5.

-d specifies a dictionary that can be used in both analysis and generation mode; note the usage examples below for how its contents differ between the two modes.

-g will run the system in generation mode, as opposed to analysis mode.

For example:

To analyze a list of Welsh words and restrict the predicted lemmas to the citation forms listed in the dictionary file WelshLemmas:

python analyze.py -i Welsh.toAnalyze -a WelshLemmaPredictions.out -l cym -d WelshLemmas

To generate inflected forms from a list of lemmas, add the -g flag. The dictionary option can still be used, but instead of lemmas the dictionary should now contain a list of attested forms, without frequency statistics. Note that the dictionary for the generation task can serve as the input to the analysis task, and vice versa.

python analyze.py -i WelshLemmas -a WelshInflectionPredictions.out -l cym -d Welsh.toAnalyze -g
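
If the attested forms come from a frequency list, the counts have to be removed before the file is used as a generation dictionary. The sketch below assumes each line holds a form followed by an integer count, separated by whitespace; that layout is an assumption, as is the helper name strip_frequencies. Adjust it to the actual format of your data.

```python
def strip_frequencies(lines):
    """Drop a trailing integer count from each line and remove duplicates.

    Assumes "form<whitespace>count" lines (an assumption, not a format
    defined by this repository). Lines without a count are kept as-is.
    """
    forms = []
    seen = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) > 1 and fields[-1].isdigit():
            fields = fields[:-1]  # discard the frequency column
        form = " ".join(fields)
        if form not in seen:
            seen.add(form)
            forms.append(form)
    return forms


if __name__ == "__main__":
    # Illustrative input only; not data shipped with the tools.
    sample = ["cathod\t12", "cath 40", "", "cathod\t3", "ci"]
    print("\n".join(strip_frequencies(sample)))
```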

It is not necessary to provide the location of a DTL model; this information is contained in the configuration file (models.in) in the src directory.
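
The invocations above can also be assembled from a script, for example when processing several languages in a loop. This is a sketch under stated assumptions: build_command is a hypothetical helper, it uses only the flags documented in this README, and the script path is a parameter because the usage line refers to src/analyze.py while the examples call analyze.py directly.

```python
import sys


def build_command(input_path, output_path, language,
                  n_best=None, dictionary=None, generate=False,
                  script="src/analyze.py"):
    """Build the argument list for one run of analyze.py.

    Hypothetical helper; only the flags described in this README
    (-i, -a, -l, -n, -d, -g) are used.
    """
    cmd = [sys.executable, script,
           "-i", input_path, "-a", output_path, "-l", language]
    if n_best is not None:
        cmd += ["-n", str(n_best)]  # omit to keep the tool's default
    if dictionary is not None:
        cmd += ["-d", dictionary]
    if generate:
        cmd.append("-g")  # generation mode instead of analysis mode
    return cmd


if __name__ == "__main__":
    # Mirrors the Welsh generation example above.
    print(build_command("WelshLemmas", "WelshInflectionPredictions.out",
                        "cym", dictionary="Welsh.toAnalyze", generate=True))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the standard library to execute the run.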

Supported Languages

We currently support more than 900 languages. To check whether your language is supported, see the file supportedLanguages, which lists the ISO-639 code of each supported language.
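
A language code can be checked against that file with a few lines of Python. The sketch below assumes the file holds one code per line, taken as the first whitespace-separated token; the layout is an assumption, and load_supported and is_supported are hypothetical names.

```python
from pathlib import Path


def load_supported(path="supportedLanguages"):
    """Read language codes from a file, one per line.

    Assumes the code is the first whitespace-separated token on each
    line (an assumption about the file layout).
    """
    codes = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if fields:
            codes.add(fields[0].lower())
    return codes


def is_supported(code, supported):
    """Case-insensitive membership test for a language code."""
    return code.strip().lower() in supported
```

For example, is_supported("cym", load_supported()) reports whether the Welsh code used in the usage examples appears in the file.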
