MorSeg study scope #11

@arubehn

@LinguList, as discussed, here is a list of algorithms I would like to compare in terms of how well they perform on small wordlists.

Baselines

Segmentation Algorithms

Morfessor and Linguistica are already available as Python packages that seem to be actively maintained, and there is an open-source Python implementation of MorphAGram as well. The other algorithms seem fairly easy to implement.

I am especially interested in MorphAGram and the "Square Entropy" methods, since they are the only ones I could find whose authors actually evaluate them on small wordlists of ~1,000 items. The other methods listed above are frequently mentioned in the literature, seem to be fairly well established, and have the obvious advantage of already being available as Python packages. There are some other methods that could be interesting later on, but I would focus on these first.
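The issue does not yet fix an evaluation metric. One common choice for comparing segmenters against a gold standard is boundary precision, recall, and F1, pooled over all words in the wordlist. Below is a minimal sketch, assuming that gold and predicted segmentations are both given as space-separated morphs; the function names and the toy wordlist are hypothetical and not part of Morfessor, Linguistica, or MorphAGram.

```python
from __future__ import annotations


def boundaries(segmented: str, sep: str = " ") -> set[int]:
    """Return the internal boundary positions (character offsets) of a segmented word.

    Hypothetical helper: "walk ing" -> {4}, "unkind" -> set().
    """
    positions = set()
    offset = 0
    morphs = segmented.split(sep)
    # Every morph except the last one ends at an internal boundary.
    for morph in morphs[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_scores(
    gold: dict[str, str], predicted: dict[str, str], sep: str = " "
) -> tuple[float, float, float]:
    """Pooled boundary precision, recall and F1 over all gold words (hypothetical helper)."""
    tp = fp = fn = 0
    for word, gold_seg in gold.items():
        # Words the segmenter did not return are treated as unsegmented.
        pred_seg = predicted.get(word, word)
        for seg in (gold_seg, pred_seg):
            if "".join(seg.split(sep)) != word:
                raise ValueError(f"segmentation {seg!r} does not spell {word!r}")
        g, p = boundaries(gold_seg, sep), boundaries(pred_seg, sep)
        tp += len(g & p)  # boundaries found in both
        fp += len(p - g)  # spurious predicted boundaries
        fn += len(g - p)  # missed gold boundaries
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example (made-up data, for illustration only).
gold = {"walking": "walk ing", "unkind": "un kind", "cats": "cat s"}
predicted = {"walking": "walk ing", "unkind": "unkind", "cats": "ca ts"}
print(boundary_scores(gold, predicted))
```

Pooling counts over the whole wordlist, rather than averaging per-word scores, keeps short unsegmented words from dominating the result; with only ~1,000 items, per-word averages can be quite noisy.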
