numeralbank/cosinus

CLDF Dataset presenting Rubehn et al.'s "Compositional Structures in Numeral Systems Across Languages" from 2025

How to cite

If you use these data, please cite:

  • the original source

    Rubehn, A., C. Rzymski, L. Ciucci, K. Bocklage, A. Kučerová, D. Snee, A. Stephen, K. P. van Dam, and J.-M. List (forthcoming): Annotating and Inferring Compositional Structures in Numeral Systems Across Languages. In: Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (SIGTYP 2025). 1-13. https://doi.org/10.48550/arXiv.2503.01625

  • the derived dataset, using the DOI of the particular released version you used

Description

CLDF dataset providing annotated numeral systems.

This dataset is licensed under a CC-BY-4.0 license.

Conceptlists in Concepticon:

Statistics

Glottolog: 100%, Concepticon: 100%, Source: 100%, BIPA: 96%, CLTS SoundClass: 96%

  • Varieties: 42 (linked to 41 different Glottocodes)
  • Concepts: 99 (linked to 99 different Concepticon concept sets)
  • Lexemes: 3,464
  • Sources: 62
  • Synonymy: 1.11
  • Invalid lexemes: 0
  • Tokens: 40,030
  • Segments: 211 (9 BIPA errors, 8 CLTS sound class errors, 199 CLTS modified)
  • Inventory size (avg): 23.93
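
Counts like the ones above can in principle be recomputed from a CLDF form table. The following is a minimal sketch, not the procedure used to generate this README: it assumes the default CLDF Wordlist column names (`Language_ID`, `Parameter_ID`, `Segments`), assumes that `+` and `_` in `Segments` are boundary markers rather than sounds, and uses a simple per-variety average for synonymy that may differ from the definition behind the 1.11 figure. The function name `summarize_forms` and the toy rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Assumption: these symbols mark morpheme/word boundaries and are not sounds.
BOUNDARY_MARKERS = {"+", "_"}


def summarize_forms(rows):
    """Compute simple wordlist statistics from CLDF-style form rows (dicts)."""
    languages, concepts, segment_types = set(), set(), set()
    # forms_per_concept[language][concept] -> number of forms
    forms_per_concept = defaultdict(lambda: defaultdict(int))
    lexemes = 0
    tokens = 0
    for row in rows:
        lexemes += 1
        languages.add(row["Language_ID"])
        concepts.add(row["Parameter_ID"])
        forms_per_concept[row["Language_ID"]][row["Parameter_ID"]] += 1
        segments = [s for s in row["Segments"].split() if s not in BOUNDARY_MARKERS]
        tokens += len(segments)
        segment_types.update(segments)
    # Assumed definition: mean over varieties of (forms / attested concepts).
    if forms_per_concept:
        synonymy = sum(
            sum(counts.values()) / len(counts) for counts in forms_per_concept.values()
        ) / len(forms_per_concept)
    else:
        synonymy = 0.0
    return {
        "varieties": len(languages),
        "concepts": len(concepts),
        "lexemes": lexemes,
        "tokens": tokens,
        "segments": len(segment_types),
        "synonymy": synonymy,
    }


# Toy data for illustration only; these rows are not part of the dataset.
SAMPLE = """ID,Language_ID,Parameter_ID,Form,Segments
1,lang_a,ONE,ta,t a
2,lang_a,TWO,pa,p a
3,lang_a,TWO,paki,p a k i
4,lang_b,ONE,tapa,t a + p a
"""

stats = summarize_forms(csv.DictReader(io.StringIO(SAMPLE)))
print(stats)
```

To try this on an actual form table, the same function could be passed a `csv.DictReader` opened on the forms file in the `cldf` directory (the file name, for example `forms.csv`, is an assumption here, opened with `encoding="utf-8"` and `newline=""`).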

Contributors

| Name | GitHub user | Description | Role |
| --- | --- | --- | --- |
| Arne Rubehn | @arubehn | data annotation, CLDF conversion | Author |
| Christoph Rzymski | @chrzyki | CLDF conversion | Author |
| Luca Ciucci | | data annotation | Author |
| Katja Bocklage | | data annotation | Author |
| Alžběta Kučerová | | data annotation | Author |
| David Snee | | data annotation | Author |
| Kellen Parker van Dam | | data annotation | Author |
| Johann-Mattis List | @lingulist | CLDF conversion, data annotation | Author |

CLDF Datasets

The following CLDF datasets are available in cldf: