- A language classifier that predicts whether a given text is English or Dutch, using a decision tree and AdaBoost trained on text from Wikipedia.
- Features used to predict the language are determined by the pronouns, parts of speech, the frequency of the letters i, j, and k (higher in Dutch than in English), the frequency of consecutive repeated letters within a word, and the average word length in a given sentence.
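
The letter-based features above could be computed roughly as follows. This is a minimal sketch, not the repository's actual code: the function name `extract_features`, the feature names, and the exact definitions (for example, counting a word once if it contains any doubled letter) are assumptions made for illustration. The pronoun and part-of-speech features are left out.

```python
import re


def extract_features(sentence):
    """Return simple numeric features for one sentence (hypothetical helper)."""
    # Keep only runs of ASCII letters, lowercased; an assumption, since the
    # original tokenisation is not described.
    words = re.findall(r"[a-z]+", sentence.lower())
    letters = "".join(words)
    n_letters = len(letters)
    n_words = len(words)

    # Share of all letters that are i, j, or k.
    ijk_count = sum(letters.count(c) for c in "ijk")

    # Share of words that contain the same letter twice in a row ("oo", "ee").
    doubled_words = sum(1 for w in words if re.search(r"(.)\1", w))

    return {
        "ijk_freq": ijk_count / n_letters if n_letters else 0.0,
        "double_letter_freq": doubled_words / n_words if n_words else 0.0,
        "avg_word_len": n_letters / n_words if n_words else 0.0,
    }
```

A dictionary like this can be turned into a feature vector per sentence and passed to any decision tree or boosting implementation; thresholds on each value then act as the split conditions.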
Samridhi16/wikipedia-language-classifier