Copy cs to sk for prototyping #132

Draft
Adrijaned wants to merge 2 commits into common-voice:main from Adrijaned:patch-1

@Adrijaned

Mainly to get the extraction running and to get an idea how much more work will need to be done.

Comment thread src/rules/sk.toml Outdated
@@ -0,0 +1,17 @@
allowed_symbols_regex="[A-Za-zěščřžýáíéóďťňúůĚŠČŘŽÝÁÍÉÓĎŤŇäöüÚ‚–\\. \"„“]"

allowed_symbols_regex="[A-Za-zěščŕřžýáíéóôďťňúůĺľÁÄĚŠČŔŘŽÝÁÍÉÓÔĎŤŇĹĽäöüÚ‚–\. "„“]"
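
To see what the suggested character class accepts, here is a minimal Python sketch. It assumes the extractor treats `allowed_symbols_regex` as a per-character whitelist; that behaviour and the helper name `disallowed_symbols` are assumptions for illustration, not taken from this PR.

```python
import re

# Character class copied from the suggestion above, written as a Python raw string.
ALLOWED = re.compile(r'[A-Za-zěščŕřžýáíéóôďťňúůĺľÁÄĚŠČŔŘŽÝÁÍÉÓÔĎŤŇĹĽäöüÚ‚–\. "„“]')


def disallowed_symbols(sentence: str) -> list[str]:
    """Return every character of `sentence` that the class does not match.

    Hypothetical helper: it only shows which characters fall outside the class.
    """
    return [ch for ch in sentence if not ALLOWED.fullmatch(ch)]


if __name__ == "__main__":
    print(disallowed_symbols("Ľudia majú radi kávu."))
    print(disallowed_symbols("Cena je 5 €."))
```

Running candidate Slovak sentences through a check like this is a quick way to spot letters that are still missing from the class.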

Comment thread src/rules/sk.toml Outdated
needs_uppercase_start = true
even_symbols = ["\""]
broken_whitespace = [" ", " ,", " .", " ?", " !", " ;"]
abbreviation_patterns = ["[A-ZĚŠČŘŽÝÁÍÉĎŤŇÓÚ]+\\.*[a-z]*[A-ZĚŠČŘŽÝÁÍÉĎŤŇÓÚ]+", "atd\\.", "\\baj\\.", "tj\\.", "\\brec\\.", "[nN]apř\\.",

abbreviation_patterns = ["[A-ZĹĽĚŠČŔŘŽÝÁÍÉĎŤŇÓÔÚ]+\.[a-z][A-ZĹĽĚŠČŔŘŽÝÁÍÉĎŤŇÓÔÚ]+", "a i\.", "a pod\.", "atď\.", "\baj\.", "tj\.", "\brec\.", "[nN]apr\.",
""."", "\s[^aikosuvzáó]\s", "zkr\.", "[Tt]zv\.", "[dD]r\.", "\b[aAeE]d\.", "\b[sS]?[tT]r\.", "[aA]rch\.", "Inc\.", "Ltd\.", "[pP]opr\.",
"\b[fF]r\.", "\b[A-Z]+DR\b", "[pP]ozn\.", "[sS]rov\.", "\b[eE][a-z]\.", "[zZ]ejm\.", "[JS]r\.", "\b[lL][lL]",
"Mgr\.", "[mM]j\.", "\b[sS]tol\.", "\b[pP]ol\.", "Ing\.", "[cCkK]pt\.", "\b[lL]t\.", "Mr?s?\.", "\s[^\\s]{1,2}\.", "\bviz\.", "\b[sS]at\."]

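As a rough illustration of how the settings in this file might interact, the sketch below applies `needs_uppercase_start`, `even_symbols`, `broken_whitespace`, and a small subset of the suggested `abbreviation_patterns` to a sentence. It assumes that a sentence is rejected when any rule is violated; the function name `rule_violations` and the exact evaluation order are hypothetical and not taken from the extractor.

```python
import re

# Subset of the suggested patterns, written as Python raw strings.
ABBREVIATION_PATTERNS = [r"atď\.", r"\baj\.", r"[nN]apr\.", r"[dD]r\.", r"Ing\."]
EVEN_SYMBOLS = ['"']
# Subset of the broken_whitespace entries from the diff.
BROKEN_WHITESPACE = [" ,", " .", " ?", " !", " ;"]


def rule_violations(sentence: str) -> list[str]:
    """Return the names of the rules that `sentence` violates (assumed semantics)."""
    problems = []
    # needs_uppercase_start: the first character must be an uppercase letter.
    if not sentence[:1].isupper():
        problems.append("needs_uppercase_start")
    # even_symbols: each listed symbol must occur an even number of times.
    if any(sentence.count(symbol) % 2 != 0 for symbol in EVEN_SYMBOLS):
        problems.append("even_symbols")
    # broken_whitespace: none of the listed sequences may appear.
    if any(token in sentence for token in BROKEN_WHITESPACE):
        problems.append("broken_whitespace")
    # abbreviation_patterns: no pattern may match anywhere in the sentence.
    if any(re.search(pattern, sentence) for pattern in ABBREVIATION_PATTERNS):
        problems.append("abbreviation_patterns")
    return problems


if __name__ == "__main__":
    for candidate in ["Prišiel domov neskoro.", "Kúpil ovocie, napr. jablká."]:
        print(candidate, rule_violations(candidate))
```

A check like this makes it easy to try Slovak abbreviations one by one and see which patterns copied from the Czech rules still need adjusting.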
@Adrijaned
Author

The blocklist was generated from words with a frequency of 60 or lower.
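
For readers unfamiliar with the approach, a frequency-based blocklist can be sketched as follows. This is only an illustration: the tool actually used for this PR is not stated here, and the tokenization (`\w+`, lowercased) and the function name `build_blocklist` are assumptions.

```python
import re
from collections import Counter


def build_blocklist(sentences, max_frequency=60):
    """Return, sorted, every word that occurs at most `max_frequency` times.

    Assumed tokenization: runs of word characters, compared case-insensitively.
    """
    counts = Counter(
        word.lower()
        for sentence in sentences
        for word in re.findall(r"\w+", sentence)
    )
    return sorted(word for word, n in counts.items() if n <= max_frequency)


if __name__ == "__main__":
    sample = ["pes a mačka", "pes a vtáčik", "pes spí"]
    # A tiny threshold is used here only because the sample is tiny.
    print(build_blocklist(sample, max_frequency=1))
```

Rare words are often typos, foreign words, or obscure names, which is the usual motivation for blocking them.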

@Hrano

Hrano commented Dec 14, 2020

I downloaded the sentences and sent them for review to five native speakers.
They write corrections for erroneous sentences in the second column, next to the original, in XLS format.
Will it be OK like this?

MichaelKohler marked this pull request as draft on March 14, 2021, 13:49
@MichaelKohler
Member

Sorry, I missed that comment.

> They write corrections for erroneous sentences in the second column, next to the original, in XLS format.
> Will it be OK like this?

No. We can't accept corrected sentences, because we need to run a new, fresh export once the rules are added. This is needed to make sure that we fulfil all legal requirements. As sentences are picked at random, any changes to them would be lost.
