CLDF Dataset presenting Rubehn et al.'s "Compositional Structures in Numeral Systems Across Languages" from 2025
If you use these data, please cite:
- the original source
Rubehn, A., C. Rzymski, L. Ciucci, K. Bocklage, A. Kučerová, D. Snee, A. Stephen, K. P. van Dam, and J.-M. List (forthcoming): Annotating and Inferring Compositional Structures in Numeral Systems Across Languages. In: Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (SIGTYP 2025). 1-13. https://doi.org/10.48550/arXiv.2503.01625
- the derived dataset using the DOI of the particular released version you were using
CLDF dataset providing annotated numeral systems.
This dataset is licensed under a CC-BY-4.0 license.
Statistics:
- Varieties: 42 (linked to 41 different Glottocodes)
- Concepts: 99 (linked to 99 different Concepticon concept sets)
- Lexemes: 3,464
- Sources: 62
- Synonymy: 1.11
- Invalid lexemes: 0
- Tokens: 40,030
- Segments: 211 (9 BIPA errors, 8 CLTS sound class errors, 199 CLTS modified)
- Inventory size (avg): 23.93
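A sketch for cross-checking some of these figures against the released files is below. It tallies varieties, concepts and lexemes from the rows of a CLDF FormTable. It assumes the table is a CSV file with the default CLDF column names `Language_ID` and `Parameter_ID`. The function names and the path `cldf/forms.csv` are hypothetical. It counts only varieties and concepts that actually have forms, so its numbers can differ from totals derived from the full language and concept tables.

```python
import csv
from pathlib import Path


def summarize_forms(rows):
    """Count distinct varieties, distinct concepts and total lexemes.

    `rows` is any iterable of dicts with the (assumed) default CLDF
    FormTable columns Language_ID and Parameter_ID.
    """
    languages = set()
    concepts = set()
    lexemes = 0
    for row in rows:
        languages.add(row["Language_ID"])
        concepts.add(row["Parameter_ID"])
        lexemes += 1  # every FormTable row is one lexeme
    return {"varieties": len(languages), "concepts": len(concepts), "lexemes": lexemes}


def summarize_form_table(path):
    """Read a FormTable CSV (e.g. the hypothetical cldf/forms.csv) and summarize it."""
    with Path(path).open(encoding="utf-8", newline="") as handle:
        return summarize_forms(csv.DictReader(handle))
```

Usage: `summarize_form_table("cldf/forms.csv")`, adjusting the path to wherever the metadata file declares the FormTable.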
| Name | GitHub user | Description | Role |
|---|---|---|---|
| Arne Rubehn | @arubehn | data annotation, CLDF conversion | Author |
| Christoph Rzymski | @chrzyki | CLDF conversion | Author |
| Luca Ciucci | | data annotation | Author |
| Katja Bocklage | | data annotation | Author |
| Alžběta Kučerová | | data annotation | Author |
| David Snee | | data annotation | Author |
| Kellen Parker van Dam | | data annotation | Author |
| Johann-Mattis List | @lingulist | CLDF conversion, data annotation | Author |
The following CLDF datasets are available in cldf:
- CLDF Wordlist at cldf/cldf-metadata.json
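The metadata file is the entry point to the Wordlist: it lists the dataset's tables, and each table declares its CLDF component via `dc:conformsTo`. The pycldf package (`Dataset.from_metadata`) is the more complete way to load it. The standard-library sketch below only shows how the FormTable can be located and read. The function names are hypothetical. It assumes table URLs are resolved relative to the metadata file, ignoring any CSVW `@base` override.

```python
import csv
import json
from pathlib import Path

# CLDF term URI identifying a FormTable component.
FORM_TABLE = "http://cldf.clld.org/v1.0/terms.rdf#FormTable"


def find_form_table(metadata_path):
    """Return the path of the FormTable CSV declared in a CLDF metadata file."""
    metadata_path = Path(metadata_path)
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    for table in metadata.get("tables", []):
        if table.get("dc:conformsTo") == FORM_TABLE:
            # Assumption: the table URL is relative to the metadata file's directory.
            return metadata_path.parent / table["url"]
    raise ValueError(f"no FormTable declared in {metadata_path}")


def iter_forms(metadata_path):
    """Yield FormTable rows as dicts keyed by the CSV column headers."""
    with find_form_table(metadata_path).open(encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)
```

Usage: `for row in iter_forms("cldf/cldf-metadata.json"): ...` iterates over all lexemes of the Wordlist.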