This repo contains various Georgian-language datasets for NLP and other purposes: the entire text of "The Knight with the Panther skin" (vefxistyaosani.txt), Georgian aphorisms (aforizmebi.txt), first and last names of Georgian poets and writers (poetswriters.txt), baby names in Georgian (names.csv, © kids.ge), and the full Georgian alphabet (anbani.csv) with descriptions of the letters as they appear in Unicode.
Some of these datasets were fed to a neural network (char-rnn by Andrej Karpathy) to generate fake data: fake-aforizmebi.txt, fake-poetswriters.txt, and fake-names.txt, the last of which was trained on the subset of names with Georgian origin.
| Name | Description | Source | Lines | URL |
|---|---|---|---|---|
| vefxistyaosani.csv | Labeled text of "The Knight with the Panther skin" | | 6678 | GET |
| quotes.csv | Quotes from 184 famous people in Georgian | ka.wikiquote.org | 3683 | GET |
| aforizmebi.txt | Georgian aphorisms | various sources | 132 | GET |
| poetswriters.txt | First and Last names of Georgian Poets and Writers | ka.wikipedia.org | 544 | GET |
| names.csv | Baby names in Georgian with various origins | kids.ge © | 2094 | GET |
| anbani.csv | Full Georgian alphabet with descriptions and char codes | unicode.org | 175 | GET |
| vefxistyaosani.txt | Raw text of "The Knight with the Panther skin" | | 8524 | GET |

| Name | Description | Source | Lines | URL |
|---|---|---|---|---|
| fake-aforizmebi.txt | Georgian aphorisms generated using char-rnn | anbani.db | 17047 | GET |
| fake-poetswriters.txt | Fake poetic names trained on Georgian poets and writers | anbani.db | 2514 | GET |
| fake-names.csv | Fake names trained on Georgian subset of baby names | anbani.db | 60961 | GET |
| fake-vefxistyaosani.txt | Char-RNN mimicking Shota Rustaveli (not well) | anbani.db | 26032 | GET |
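
The text files listed above are plain UTF-8 with one entry per line, so they can be read without any extra tooling. The following is a minimal sketch, assuming Node.js and that a file such as aforizmebi.txt has already been downloaded into the working directory. The helper names readLines and loadDataset are hypothetical, and the counts in the Lines column may have been computed differently (for example, by including blank lines).

```javascript
const fs = require('fs');

// Split raw text into trimmed, non-empty lines (handles both \n and \r\n).
function readLines(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical helper: load a local UTF-8 copy of a dataset and return its lines.
function loadDataset(path) {
  return readLines(fs.readFileSync(path, 'utf8'));
}

// Usage (assumes aforizmebi.txt is in the working directory):
// const aphorisms = loadDataset('aforizmebi.txt');
// console.log(aphorisms.length);
```

Keeping the parsing in readLines, separate from the file access, makes it easy to reuse on text fetched over HTTP instead of read from disk.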

Here are some resources you might like.
Generation of fake Georgian text and names is supported by anbani.js, a multifunctional JavaScript library for working with the Georgian alphabet. Read more about the package at [anbani / anbani.js].

```
npm install anbani
```

```javascript
var anbani = require('anbani')

anbani.core.convert("ანბანი", "მხედრული", "ასომთავრული")
// 'ႠႬႡႠႬႨ'

anbani.lorem.names(3)
// returns an array of 3 generated Georgian full names

anbani.lorem.sentences(10)
// returns a string of generated Georgian sentences
```

For other awesome Georgian datasets, visit [bumbeishvili / awesome-georgian-datasets].
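
To illustrate what a script conversion like the anbani.core.convert call shown earlier involves, here is a dependency-free sketch of one direction only, Mkhedruli to Asomtavruli. It is not the anbani.js implementation. It relies on the Unicode Georgian block, where the Mkhedruli letters U+10D0..U+10F5 sit exactly 0x30 code points above their Asomtavruli counterparts U+10A0..U+10C5. The function name mkhedruliToAsomtavruli is hypothetical.

```javascript
// Mkhedruli letters U+10D0..U+10F5 map to Asomtavruli U+10A0..U+10C5
// by subtracting a fixed offset of 0x30 in the Unicode Georgian block.
const MKHEDRULI_START = 0x10d0;
const MKHEDRULI_END = 0x10f5;
const OFFSET = 0x30;

// Hypothetical helper, not part of anbani.js.
function mkhedruliToAsomtavruli(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp >= MKHEDRULI_START && cp <= MKHEDRULI_END) {
      return String.fromCodePoint(cp - OFFSET);
    }
    return ch; // leave non-Mkhedruli characters untouched
  }).join('');
}

console.log(mkhedruliToAsomtavruli('ანბანი'));
// ႠႬႡႠႬႨ
```

Characters outside the mapped range, such as Latin letters, digits, and punctuation, pass through unchanged.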
Datasets are available freely for non-commercial purposes only. For commercial purposes, contact the corresponding source.