Languages found, never lost.
An open corpus for the endangered and underrepresented languages of East Africa, built by speakers, for speakers and for the tools that will serve them.
Loba is a community-built, openly licensed dataset of words, phrases, proverbs, and sentences in the languages of East Africa. It starts with Dholuo and is designed to grow.
The corpus exists so that developers, researchers, and educators can build spell checkers, translation tools, dictionaries, and AI assistants that actually work for Luo speakers — and eventually speakers of Kikuyu, Kamba, Kalenjin, and more.
| Language | Entries | Status |
|---|---|---|
| Dholuo | building | 🟢 Active |
| Kikuyu | — | 🔜 Planned |
| Kamba | — | 🔜 Planned |
- Code: MIT
- Data: CC BY 4.0
Read CONTRIBUTING.md to get started. No linguistics degree required: if you speak one of these languages, you qualify.
Questions, ideas, and discussions live in GitHub Discussions.