Commit ca71a78

trying with .rst...
1 parent b445726 commit ca71a78

3 files changed

Lines changed: 90 additions & 4 deletions

README.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-Statistics about word frequency in different languages based on a corpus of
+
+Statistics about word frequencies in different languages based on a corpus of
 movie subtitles as extracted by the Frequency Words (https://github.com/hermitdave/FrequencyWords) project.
 
 Currently supported languages:

README.rst

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+Statistics about word frequencies in different languages based on a
+corpus of movie subtitles as extracted by the `Frequency Words`_
+project.
+
+Currently supported languages:
+
+::
+
+    "da", "de", "el", "en", "es", "fr", "it", "nl", "no", "pl", "pt", "ro", "zh-CN"
+
+Usage Examples
+~~~~~~~~~~~~~~
+
+Getting the info about a given word
+'''''''''''''''''''''''''''''''''''
+
+::
+
+    >> from wordstats import Word
+    >> print (Word.stats('bleu', 'fr'))
+    bleu: (lang: fr, rank: 1521, freq: 9.42, imp: 9.42, diff: 0.03, klevel: 2)
+
+Comparing the difficulty of two German words
+''''''''''''''''''''''''''''''''''''''''''''
+
+::
+
+    >> from wordstats import Word
+    >> Word.stats('blauzungekrankenheit','de').difficulty > Word.stats('blau','de').difficulty
+    True
+
+Top 10 most used words in Dutch
+'''''''''''''''''''''''''''''''
+
+::
+
+    >> from wordstats import LanguageInfo
+    >> Dutch = LanguageInfo.load('nl')
+    >> print(Dutch.all_words()[:10])
+    ['ik', 'je', 'het', 'de', 'dat', 'is', 'een', 'niet', 'en', 'van']
+
+Words common across all the languages
+'''''''''''''''''''''''''''''''''''''
+
+Given that the corpus is based on subtitles, some common names have
+sliped in. The ``common_words()`` function returns a list.
+
+::
+
+    >> from wordstats.common_words import common_words
+    >> for each in common_words():
+    >>     if len(each) > 9:
+    >>         print(each)
+    washington
+    christopher
+    enterprise
+
+Words that are the same in Polish and Romanian
+''''''''''''''''''''''''''''''''''''''''''''''
+
+::
+
+    >> from wordstats import LanguageInfo
+    >> Polish = LanguageInfo.load("pl")
+    >> Romanian = LanguageInfo.load("ro")
+    >> for each in Polish.all_words():
+    >>     if each in Romanian.all_words():
+    >>         if len(each) > 5 and each not in common_words():
+    >>             print(each)
+    telefon
+    moment
+    prezent
+    interes
+    ...
+
+Installation
+~~~~~~~~~~~~
+
+::
+
+    pip install wordstats
+
+.
+
+.. _Frequency Words: https://github.com/hermitdave/FrequencyWords
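
Note on the Polish/Romanian example in this README: it calls `Romanian.all_words()` and `common_words()` inside the loop, and both appear to return lists, so every membership test rescans a list. A minimal sketch of the same filter using sets is given here. The word lists are made-up stand-ins rather than output of `wordstats`, and `shared_words` is a hypothetical helper name that does not exist in the package.

```python
def shared_words(first, second, excluded=(), min_length=6):
    """Return words present in both lists, preserving the order of `first`."""
    second_set = set(second)      # set lookups avoid rescanning a list
    excluded_set = set(excluded)
    return [
        word
        for word in first
        if len(word) >= min_length   # same condition as len(word) > 5
        and word in second_set
        and word not in excluded_set
    ]


# Stand-in data (assumption). With wordstats these would presumably come from
# LanguageInfo.load("pl").all_words(), LanguageInfo.load("ro").all_words()
# and common_words().
polish = ["telefon", "moment", "washington", "tak", "prezent"]
romanian = ["moment", "prezent", "telefon", "washington", "da"]
common = ["washington"]

result = shared_words(polish, romanian, excluded=common)
print(result)
```

The sets are built once before the loop, so the cost grows with the combined size of the lists rather than with their product.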

setup.py

Lines changed: 3 additions & 3 deletions
@@ -22,21 +22,21 @@ def package_files(directory):
 
 extra_files = package_files('wordstats/language_data/')
 
-with open('README.md') as f:
+with open('README.rst') as f:
     long_description = f.read()
 
 setuptools.setup(
     name="wordstats",
     packages=setuptools.find_packages(),
-    version="1.0.3",
+    version="1.0.4",
     license="MIT",
     description="Multilingual word frequency statistics for Python based on subtitles corpora",
     long_description=long_description,
     long_description_content_type='text/markdown',
     author="Mircea Lungu",
     author_email="me@mir.lu",
     url="https://github.com/zeeguu-ecosystem/Python-Wordstats",
-    download_url="https://github.com/zeeguu-ecosystem/Python-Wordstats/archive/v_1.0.3.tar.gz",
+    download_url="https://github.com/zeeguu-ecosystem/Python-Wordstats/archive/v_1.0.4.tar.gz",
     include_package_data=True,
     zip_safe=False,
     keywords="natural language processing, multilingual",
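
Note on setup.py: after this change `long_description` is read from `README.rst`, while the unchanged context line still declares `long_description_content_type='text/markdown'`. Package indexes render the description according to that field, and the value for reStructuredText is `text/x-rst`. A minimal sketch of deriving the value from the README's extension follows. `content_type_for` and `CONTENT_TYPES` are hypothetical names that are not part of this commit.

```python
import os

# Content types commonly accepted for a package long description.
CONTENT_TYPES = {
    ".md": "text/markdown",
    ".rst": "text/x-rst",
    ".txt": "text/plain",
}


def content_type_for(readme_path):
    """Map a README file name to its long-description content type."""
    extension = os.path.splitext(readme_path)[1].lower()
    return CONTENT_TYPES[extension]


print(content_type_for("README.rst"))
```

In `setup.py` the result could be passed as `long_description_content_type=content_type_for('README.rst')`, so the declared type follows the file that is actually read.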
