From a5e8496d0df216e7a2f1b26e185fccaf9e39f8ee Mon Sep 17 00:00:00 2001
From: David Alexander
Date: Fri, 20 Oct 2017 16:14:49 -0400
Subject: [PATCH 1/6] Adds default reviewers for pull requests

---
 CODEOWNERS | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 CODEOWNERS

diff --git a/CODEOWNERS b/CODEOWNERS
new file mode 100644
index 0000000..469c317
--- /dev/null
+++ b/CODEOWNERS
@@ -0,0 +1,9 @@
+# Each line is a file pattern followed by one or more owners.
+
+# These owners will be the default owners for everything in the repo.
+# Unless a later match takes precedence, @global-owner1 and @global-owner2
+# will be requested for review when someone opens a pull request.
+* @itsthejoker
+
+*.sh @thelonelyghost
+test/* @thelonelyghost

From 2eb1cc65647a757661f82964b72fabd79ad29529 Mon Sep 17 00:00:00 2001
From: David Alexander
Date: Fri, 20 Oct 2017 16:38:23 -0400
Subject: [PATCH 2/6] Updates docs to be more consistent across projects

---
 CONTRIBUTING.md | 52 ++++++++++++++++++++++++++++++++-----------------
 README.md       | 31 ++++++++++++++++-------------
 2 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 7dcfc8f..2142bc7 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,4 +1,4 @@
-[![Stories in Ready](https://badge.waffle.io/TranscribersOfReddit/ToR_OCR.png?label=ready&title=Ready)](http://waffle.io/TranscribersOfReddit/ToR_OCR)
+[![Waffle.io - Columns and their card count](https://badge.waffle.io/TranscribersOfReddit/TranscribersOfReddit.svg?columns=all)](http://waffle.io/TranscribersOfReddit/TranscribersOfReddit)
 
 # Contributing
 
@@ -14,43 +14,59 @@ Do your best to check as many of these boxes as you can and everything will be f
 ## Issues
 
 Any bugs you find, features you want to request, or questions you have should go in the
-repository's [issues section](https://github.com/TranscribersOfReddit/ToR_Archivist/issues).
+repository's [issues section](https://github.com/TranscribersOfReddit/ToR_OCR/issues).
 Please, be kind and search through both open and closed issues to make sure your question
 or bug report hasn't already been posted and resolved.
 
 ## Development
 
-After checking out the repo, run `bin/run setup` to install native dependencies.
+Initial setup:
 
-To install this package locally, setup a virtualenv environment and run `pip install -e .`
-from the project root. To make sure you have everything setup correctly, run `bin/run test`
-and it _should_ pass entirely.
+```bash
+# Clone the repository
+$ git clone git@github.com:TranscribersOfReddit/ToR_OCR.git tor_ocr
+$ cd ./tor_ocr
 
-In case you get tired of prefixing `bin/` to the `run` script here, [Tim Pope's method](https://twitter.com/tpope/status/165631968996900865)
-of safely adding a script to your PATH is recommended.
+# Setup sandbox
+$ virtualenv --no-site-packages --python=python3 venv
+$ source ./venv/bin/activate
+
+# Install the project in "editable" mode
+$ pip install --process-dependency-links -e .[dev]
+```
+
+In case there are any tests, they would be run by calling `python setup.py test`.
 
 ## Testing
 
-This project has (some) automated test coverage, so be sure to check that tests are passing
-_before_ you begin development. Our emphasis is on stability here, so if tests aren't passing,
-that's a bug.
+This project is expected to have automated test coverage, so be sure to check that tests
+are passing _before_ you begin development. Our emphasis is on stability here, so if tests
+aren't passing, that's a bug.
 
 ### Stability
 
 As noted before, make sure tests are passing before starting. If you have difficulty getting
 to that stable, initial state, reach out by opening an issue (see [Issues](#Issues) above).
-This is considered a failing by the maintainers if instructions are less than absolutely
-clear. Any feedback is helpful here!
+This is considered a failure by the maintainers if instructions are less than absolutely
+clear. Feedback is very helpful here!
 
 ### Writing tests
 
-Tests are written using `pytest` because it allows for simple decorators to modify when to
-run certain tests and the output is much prettier than `unittest`. We invoke the full test
-suite by calling `bin/run test`.
+Tests are written using `pytest` for a variety of reasons. Some of which are:
 
-At the moment, the test suite should run quickly, but that won't always be the case. Running
+- easy assertions that an exception will be thrown and the message it contains
+- skipping some tests for stated reasons
+- marking some tests as expected to fail
+- colorized output compared to `unittest`
+
+We should be able to invoke the full test suite by calling either `python setup.py test` or
+`pytest` from the terminal.
+
+The test suite should run quickly at the moment, but that won't always be the case. Running
 individual tests with `pytest path/to/test/file.py` is also acceptable while actively
-developing. Please note: a pull request should always have a fully passing test suite.
+developing.
+
+> **NOTE:** a pull request should always have a fully passing test suite.
 
 ## Pull Requests
 
diff --git a/README.md b/README.md
index 409a418..53896be 100644
--- a/README.md
+++ b/README.md
@@ -1,18 +1,20 @@
-[![Stories in Ready](https://badge.waffle.io/TranscribersOfReddit/ToR_OCR.png?label=ready&title=Ready)](http://waffle.io/TranscribersOfReddit/ToR_OCR)
+[![Waffle.io - Columns and their card count](https://badge.waffle.io/TranscribersOfReddit/TranscribersOfReddit.svg?columns=all)](http://waffle.io/TranscribersOfReddit/TranscribersOfReddit)
 [![BugSnag](https://img.shields.io/badge/errors--hosted--by-Bugsnag-blue.svg)](https://www.bugsnag.com/open-source/)
 
 # Apprentice Bot - Transcribers Of Reddit
 
-This is the source code for Apprentice Bot (`/u/transcribot`). It forms one part
-of the team that assists in the running or /r/TranscribersOfReddit (ToR), which
-is privileged to have the incredibly important job of organizing crowd-sourced
-transcriptions of images, video, and audio.
+This is the source code for a helper bot, making attempts at transcribing content as
+it is posted to the subreddit /r/TranscribersOfReddit, a community dedicated to
+transcribing images, audio, and video. It acts under the username "/u/transcribot".
 
-As a whole, the ToR bots are designed to be as light on local resources as they
-can be, though there are some external requirements.
+This bot is still in training and might not be able to recognize everything it
+attempts. Some transcriptions might be complete trash, but the hope is that it will
+be a start to a more legitimate, volunteer-written transcription.
 
-- Redis (tracking completed posts and queue system)
-- Tesseract (OCR solution)
+## Resources
+
+Redis (tracking completed posts and queue system)
+Tesseract (OCR solution)
 
 > **NOTE:**
 >
@@ -23,14 +25,15 @@ can be, though there are some external requirements.
 ## Installation
 
 ```
-$ git clone https://github.com/TranscribersOfReddit/ToR_OCR.git tor-ocr
-$ pip install --process-dependency-links tor-ocr/
+$ git clone https://github.com/TranscribersOfReddit/ToR_OCR.git tor_ocr
+$ cd tor_ocr/
+$ pip install --process-dependency-links .
 ```
 
 OR
 
 ```
-$ pip install --process-dependency-links 'git+https://github.com/TranscribersOfReddit/ToR_OCR.git@master#egg=tor_ocr'
+$ pip install --process-dependency-links 'git+https://github.com/TranscribersOfReddit/ToR_OCR.git@master#egg=tor_ocr-0'
 ```
 
 ## High-level functionality
@@ -41,9 +44,11 @@ Monitoring daemon (via Redis queue):
 - Download image
 - OCR the image
 - If OCR successful:
-  - Post OCR-ed content to post on /r/TranscribersOfReddit in 9000 character chunks
+  - Post OCR-ed content to post on /r/TranscribersOfReddit in 9000 character chunks, replying to previous comment when [over 9000][over-9000] characters
   - Delete local copy of image
 
+[over-9000]: https://tenor.com/view/dragonball-z-super-saiyan-charging-yelling-gif-4987448
+
 ### Running Apprentice Bot
 
 ```

From 51abd3bae29377fe34e12d3c690a6341d5cf3397 Mon Sep 17 00:00:00 2001
From: David Alexander
Date: Fri, 20 Oct 2017 16:40:15 -0400
Subject: [PATCH 3/6] Keeps consistent style of setup.py across projects

---
 setup.py | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/setup.py b/setup.py
index d1574cd..d339385 100644
--- a/setup.py
+++ b/setup.py
@@ -21,7 +21,7 @@ def initialize_options(self):
 
     def run_tests(self):
         import shlex
-        # import here, cause outside the eggs aren't loaded
+        # import here, because outside the eggs aren't loaded
         import pytest
         errno = pytest.main(shlex.split(self.pytest_args))
         sys.exit(errno)
@@ -35,10 +35,19 @@ def long_description():
         return f.read()
 
 
+test_deps = [
+    'pytest',
+    'pytest-cov',
+]
+dev_helper_deps = [
+    'better-exceptions',
+]
+
+
 setup(
     name='tor_ocr',
     version=__version__,
-    description='',
+    description='An AI attempting to transcribe contents of /r/TranscribersOfReddit',
     long_description=long_description(),
     url='https://github.com/TranscribersOfReddit/ToR_OCR',
     author='Joe Kaufeld',
@@ -56,29 +65,25 @@ def long_description():
         'Programming Language :: Python :: 3.6',
     ],
     keywords='',
-    packages=find_packages(exclude=['test*', 'bin/*']),
+    packages=find_packages(exclude=['test', 'test.*', '*.test.*', '*.test']),
+    cmdclass={'test': PyTest},
     test_suite='test',
     entry_points={
         'console_scripts': [
             'tor-apprentice = tor_ocr.main:main',
         ],
     },
-    tests_require=[
-        'pytest',
-    ],
-    cmdclass={'test': PyTest},
+    extras_require={
+        'dev': test_deps + dev_helper_deps,
+    },
+    tests_require=test_deps,
     install_requires=[
-        'praw==5.0.1',
-        'redis<3.0.0',
-        'tor_core>=0.2.0,<0.3.0',
-        'addict',
+        'tor_core',
         'tesserocr',
-        'wget',
-        'sh',
-        'bugsnag',
         'cython',  # WORKAROUND: 'tesserocr' only sometimes installs this dependency
+        'wget',
     ],
     dependency_links=[
-        'git+https://github.com/TranscribersOfReddit/tor_core.git@master#egg=tor_core-0.2.0',
+        'git+https://github.com/TranscribersOfReddit/tor_core.git@master#egg=tor_core-0',
     ],
 )

From a66bcbd12de5112249a9b85cf483ee843ab2edde Mon Sep 17 00:00:00 2001
From: David Alexander
Date: Fri, 20 Oct 2017 16:41:30 -0400
Subject: [PATCH 4/6] bin/run -> Makefile for dev tasks

Docs make no mention of `make` for a reason. In this project it is
intended to be a helper, not a development requirement. Docs should
cover the implementation and `make` should make it easy.
---
 Makefile         | 12 ++++++++++
 bin/run          | 62 ------------------------------------------------
 test/__init__.py |  0
 3 files changed, 12 insertions(+), 62 deletions(-)
 create mode 100644 Makefile
 delete mode 100755 bin/run
 create mode 100644 test/__init__.py

diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..529f94c
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,12 @@
+.PHONY: clean all test
+
+all: test clean
+	@true
+
+clean:
+	@# echo "Removing \`*.pyc', \`*.pyo', and \`__pycache__/'"
+	@find . -regex '.+/[^/]+\.py[co]$$' -delete
+	@find . -regex '.+/__pycache__$$' -exec rm -rf {} \; -prune
+
+test: clean
+	@python setup.py test
diff --git a/bin/run b/bin/run
deleted file mode 100755
index 7f4fe1a..0000000
--- a/bin/run
+++ /dev/null
@@ -1,62 +0,0 @@
-#!/usr/bin/env bash
-
-if [ $# -gt 0 ]; then
-  action="$1"; shift
-else
-  action="test"
-fi
-
-
-install_native_libs() {
-  local required_libs
-  required_libs=</dev/null; then
-    install_apt_libs
-  elif command -v 'brew' &>/dev/null; then
-    install_brew_libs
-  else
-    printf 'Unknown platform. Please setup required libraries manually:\n%s' "$required_libs"
-  fi
-}
-
-sudo_required_to_proceed() {
-  printf 'Root privileges are required in order to install native libs.\n'
-}
-
-install_apt_libs() {
-  local choice
-  while sudo_required_to_proceed && read -rp 'Continue (y/n)? ' choice; do
-    case "$choice" in
-      y|Y )
-        sudo apt-get install libtesseract-dev libleptonica-dev build-essential
-        break
-        ;;
-      n|N )
-        printf 'Cancelling...\n'
-        break
-        ;;
-      * )
-        printf "I didn't quite understand that.\n" >&2
-        ;;
-    esac
-  done
-}
-
-install_brew_libs() {
-  brew install tesseract
-}
-
-case "$action" in
-  setup)
-    install_native_libs
-    ;;
-
-  *)
-    python setup.py "$action"
-    ;;
-esac
diff --git a/test/__init__.py b/test/__init__.py
new file mode 100644
index 0000000..e69de29

From 8c6a669466897e22d866a5e53b611c1a3ac9b350 Mon Sep 17 00:00:00 2001
From: David Alexander
Date: Fri, 20 Oct 2017 16:43:14 -0400
Subject: [PATCH 5/6] License should reference a legal entity

---
 LICENSE.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/LICENSE.txt b/LICENSE.txt
index 48767bf..aa1b0fe 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2017 TranscribersOfReddit
+Copyright (c) 2017 Grafeas Group, Ltd.
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

From 01f37f807d1233a66b7028599ea8f9c66e2b8fd2 Mon Sep 17 00:00:00 2001
From: David Alexander
Date: Fri, 20 Oct 2017 16:51:33 -0400
Subject: [PATCH 6/6] Setup.py in consistent format and pruned dependencies

---
 setup.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.py b/setup.py
index d339385..3167a07 100644
--- a/setup.py
+++ b/setup.py
@@ -66,6 +66,7 @@ def long_description():
     ],
     keywords='',
     packages=find_packages(exclude=['test', 'test.*', '*.test.*', '*.test']),
+    zip_safe=True,
     cmdclass={'test': PyTest},
     test_suite='test',
     entry_points={