Conversation

@alzkuc
Collaborator

@alzkuc alzkuc commented Jan 5, 2026

Adding the updated telugu.tsv file with numbers till 99.

  • open a pull request to contribute your data

@alzkuc alzkuc requested a review from arubehn January 5, 2026 15:45
Collaborator

@LinguList LinguList left a comment

Thanks @alzkuc, I just eye-balled your annotations, and I'd argue that some things should be corrected. I could do that myself, but I'd first ask you to consider the following and see if my arguments are clear. You have heavy inline changes in the tens (fifty, sixty, etc.), where you always drop the sequence d i (in eight and nine), which is absent from the words for eighty and ninety. Do you really think that it is this linear? It rather seems to me that d i is some suffix that one could identify in eight and nine, reflecting some kind of contamination that makes consecutive numerals sound similar (elf, zwölf, and some even say drölf in German). In short, I think the analysis as it stands relies too much on inline alignments to drop entire sequences, and it could be more apt if the core numbers up to ten were not all treated as monomorphemic, as they are now.

@LinguList
Collaborator

What also helps is considering the contrast with ordinals: https://www.omniglot.com/language/numbers/telugu.htm

@LinguList
Collaborator

You have enimidi "eight" vs. enimi-dava "eighth", with -dava indicating ordinals. As most of these cases could only be resolved with detailed information on the history, which we seem to lack, I'd rather not mark them as internal cognates. This is similar to English -ty, which has a clear connection to ten, but which we argue against classifying as ten because it is not transparent enough.

@alzkuc alzkuc removed the request for review from arubehn January 6, 2026 10:11
@alzkuc
Collaborator Author

alzkuc commented Jan 6, 2026

Hi @LinguList, thanks for the comment. I agree that 8 and 9 would indeed be natural candidates for a polymorphemic structure, and that a strictly linear interpretation of the alignments might be incorrect. In other Dravidian languages, 9, for example, can clearly be analyzed as polymorphemic: oṉpatu (Tamil), ombattu (Kannada), onpatŭ (Malayalam), i.e. ONE (less than) TEN (e.g. Tamil ONE=onru, TEN=pattu).

Telugu is, however, synchronically much less transparent. There is no trace of okati/oka/on (1) in the form for 9, tommidi. We could, however, argue that -(i)di is a suffix that also relates to TEN (padi), resulting in something like NINE as a FRACTIONOFTEN? And the same for 8.

What about the other core numbers? Mudu (three) vs. Muppai (thirty); Nalugu (four) vs. Nalabai (forty); aidu (five) vs. jabbai (fifty). Would you suggest a polymorphemic analysis here as well, with suffixes -du and -gu? I am not sure that the distribution is systematic enough to justify segmenting these core numerals.
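
One rough way to check how systematic the putative suffixes are is to compare each core numeral with its decade form and look at what is left over after the shared initial material. The sketch below does this for the three pairs quoted above, using the romanized forms exactly as written in this comment. The function names `shared_prefix` and `residues` are made up for illustration, and a character-level prefix comparison is only a crude stand-in for a proper segment-level alignment.

```python
# Forms exactly as romanized in this thread; no other forms are assumed.
PAIRS = {
    "three/thirty": ("mudu", "muppai"),
    "four/forty": ("nalugu", "nalabai"),
    "five/fifty": ("aidu", "jabbai"),
}


def shared_prefix(a, b):
    """Return the longest initial string shared by a and b (character level)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def residues(pairs):
    """Map each label to (shared prefix, rest of unit form, rest of decade form)."""
    out = {}
    for label, (unit, decade) in pairs.items():
        stem = shared_prefix(unit, decade)
        out[label] = (stem, unit[len(stem):], decade[len(stem):])
    return out


for label, (stem, unit_rest, decade_rest) in residues(PAIRS).items():
    # An empty stem means the two forms share no initial material at all.
    print(f"{label}: stem={stem!r} unit_rest={unit_rest!r} decade_rest={decade_rest!r}")
```

On these forms the leftovers in the unit numerals are -du, -ugu and, for five, the entire word, so no single suffix falls out of the comparison.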

@LinguList
Collaborator

I checked the data again, and I would prefer to be much more careful in the analysis with respect to indicated internal cognates.

@LinguList
Collaborator

Right now, your analysis strongly suggests that e.g. 80 can be derived from 8. I'd suggest being much more careful here.

@LinguList
Collaborator

Maybe we could make this a test case for inter-annotator agreement, asking somebody else to analyse the data?

@alzkuc
Collaborator Author

alzkuc commented Jan 6, 2026

I think caution is a good idea, and I trust your instinct on this, @LinguList. I’m all in for the inter-annotator agreement test. Who should we ask?

@LinguList
Collaborator

I can also try to do it, but this would rather amount to annotation in collaboration. Still, I find that okay at this point of the process. I could then share the results on Thursday in our grad seminar?
