This repository was archived by the owner on Sep 21, 2023. It is now read-only.
Merged
9 changes: 9 additions & 0 deletions CODEOWNERS
@@ -0,0 +1,9 @@
# Each line is a file pattern followed by one or more owners.

# These owners will be the default owners for everything in the repo.
# Unless a later match takes precedence, @global-owner1 and @global-owner2
# will be requested for review when someone opens a pull request.
* @itsthejoker

*.sh @thelonelyghost
test/* @thelonelyghost
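
The "later match takes precedence" rule in the comments above can be sketched in a few lines of Python. This is only an illustration: real CODEOWNERS patterns follow gitignore-style matching, which `fnmatch` only approximates (for instance, it lets `*` cross directory separators), and the `owners_for` helper is a hypothetical name, not part of this repository.

```python
from fnmatch import fnmatch

# Rules copied from the CODEOWNERS file above, in file order.
RULES = [
    ("*", ["@itsthejoker"]),
    ("*.sh", ["@thelonelyghost"]),
    ("test/*", ["@thelonelyghost"]),
]


def owners_for(path, rules=RULES):
    """Return the owners of the last rule whose pattern matches `path`.

    Simplification: fnmatch stands in for gitignore-style matching.
    """
    matched = []
    for pattern, owners in rules:
        if fnmatch(path, pattern):
            matched = owners  # a later match overrides an earlier one
    return matched
```

Under these assumptions `README.md` falls through to the default owner, while `install.sh` and `test/__init__.py` are picked up by the later, more specific rules.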
52 changes: 34 additions & 18 deletions CONTRIBUTING.md
@@ -1,4 +1,4 @@
[![Stories in Ready](https://badge.waffle.io/TranscribersOfReddit/ToR_OCR.png?label=ready&title=Ready)](http://waffle.io/TranscribersOfReddit/ToR_OCR)
[![Waffle.io - Columns and their card count](https://badge.waffle.io/TranscribersOfReddit/TranscribersOfReddit.svg?columns=all)](http://waffle.io/TranscribersOfReddit/TranscribersOfReddit)

# Contributing

@@ -14,43 +14,59 @@ Do your best to check as many of these boxes as you can and everything will be f
## Issues

Any bugs you find, features you want to request, or questions you have should go in the
repository's [issues section](https://github.com/TranscribersOfReddit/ToR_Archivist/issues).
repository's [issues section](https://github.com/TranscribersOfReddit/ToR_OCR/issues).
Please, be kind and search through both open and closed issues to make sure your question
or bug report hasn't already been posted and resolved.

## Development

After checking out the repo, run `bin/run setup` to install native dependencies.
Initial setup:

To install this package locally, setup a virtualenv environment and run `pip install -e .`
from the project root. To make sure you have everything setup correctly, run `bin/run test`
and it _should_ pass entirely.
```bash
# Clone the repository
$ git clone git@github.com:TranscribersOfReddit/ToR_OCR.git tor_ocr
$ cd ./tor_ocr

In case you get tired of prefixing `bin/` to the `run` script here, [Tim Pope's method](https://twitter.com/tpope/status/165631968996900865)
of safely adding a script to your PATH is recommended.
# Setup sandbox
$ virtualenv --no-site-packages --python=python3 venv
$ source ./venv/bin/activate

# Install the project in "editable" mode
$ pip install --process-dependency-links -e .[dev]
```

In case there are any tests, they would be run by calling `python setup.py test`.

## Testing

This project has (some) automated test coverage, so be sure to check that tests are passing
_before_ you begin development. Our emphasis is on stability here, so if tests aren't passing,
that's a bug.
This project is expected to have automated test coverage, so be sure to check that tests
are passing _before_ you begin development. Our emphasis is on stability here, so if tests
aren't passing, that's a bug.

### Stability

As noted before, make sure tests are passing before starting. If you have difficulty getting
to that stable, initial state, reach out by opening an issue (see [Issues](#Issues) above).
This is considered a failing by the maintainers if instructions are less than absolutely
clear. Any feedback is helpful here!
This is considered a failure by the maintainers if instructions are less than absolutely
clear. Feedback is very helpful here!

### Writing tests

Tests are written using `pytest` because it allows for simple decorators to modify when to
run certain tests and the output is much prettier than `unittest`. We invoke the full test
suite by calling `bin/run test`.
Tests are written using `pytest` for a variety of reasons. Some of which are:

At the moment, the test suite should run quickly, but that won't always be the case. Running
- easy assertions that an exception will be thrown and the message it contains
- skipping some tests for stated reasons
- marking some tests as expected to fail
- colorized output compared to `unittest`

We should be able to invoke the full test suite by calling either `python setup.py test` or
`pytest` from the terminal.

The test suite should run quickly at the moment, but that won't always be the case. Running
individual tests with `pytest path/to/test/file.py` is also acceptable while actively
developing. Please note: a pull request should always have a fully passing test suite.
developing.

> **NOTE:** a pull request should always have a fully passing test suite.
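
The `pytest` features listed above might look roughly like this in a test module. It is a minimal sketch, not code from this repository: `chunk_count` is a hypothetical helper invented for the example, and the skip and xfail reasons are placeholders.

```python
import pytest


def chunk_count(text_length, chunk_size):
    """Hypothetical helper: how many chunks a text of this length needs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-text_length // chunk_size)  # ceiling division


def test_rejects_non_positive_chunk_size():
    # Assert on both the exception type and part of its message.
    with pytest.raises(ValueError, match="must be positive"):
        chunk_count(10, 0)


@pytest.mark.skip(reason="illustrative: would need a live Redis server")
def test_needs_redis():
    raise AssertionError("not executed while the skip marker is present")


@pytest.mark.xfail(reason="illustrative: this assertion is wrong on purpose")
def test_expected_failure():
    assert chunk_count(9001, 9000) == 1
```

Saved as, say, `test/test_example.py` (a hypothetical path), this could be run on its own with `pytest test/test_example.py`; the skipped and expected-to-fail tests are reported separately from passes and failures.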

## Pull Requests

2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2017 TranscribersOfReddit
Copyright (c) 2017 Grafeas Group, Ltd.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
12 changes: 12 additions & 0 deletions Makefile
@@ -0,0 +1,12 @@
.PHONY: clean all test

all: test clean
	@true

clean:
	@# echo "Removing \`*.pyc', \`*.pyo', and \`__pycache__/'"
	@find . -regex '.+/[^/]+\.py[co]$$' -delete
	@find . -regex '.+/__pycache__$$' -exec rm -rf {} \; -prune

test: clean
	@python setup.py test
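
For readers less familiar with `find`, the `clean` target above does roughly what the following Python sketch does. This is an approximate equivalent written for illustration, not something the PR adds; the `clean` function name and its `root` parameter are assumptions.

```python
import shutil
from pathlib import Path


def clean(root="."):
    """Delete *.pyc / *.pyo files and __pycache__ directories below `root`."""
    root = Path(root)
    # Compiled files first, mirroring the order of the two find commands.
    for pattern in ("*.pyc", "*.pyo"):
        # Materialise the listing so deleting does not disturb the walk.
        for compiled in list(root.rglob(pattern)):
            if compiled.is_file():
                compiled.unlink()
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir, ignore_errors=True)
```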
31 changes: 18 additions & 13 deletions README.md
@@ -1,18 +1,20 @@
[![Stories in Ready](https://badge.waffle.io/TranscribersOfReddit/ToR_OCR.png?label=ready&title=Ready)](http://waffle.io/TranscribersOfReddit/ToR_OCR)
[![Waffle.io - Columns and their card count](https://badge.waffle.io/TranscribersOfReddit/TranscribersOfReddit.svg?columns=all)](http://waffle.io/TranscribersOfReddit/TranscribersOfReddit)
[![BugSnag](https://img.shields.io/badge/errors--hosted--by-Bugsnag-blue.svg)](https://www.bugsnag.com/open-source/)

# Apprentice Bot - Transcribers Of Reddit

This is the source code for Apprentice Bot (`/u/transcribot`). It forms one part
of the team that assists in the running or /r/TranscribersOfReddit (ToR), which
is privileged to have the incredibly important job of organizing crowd-sourced
transcriptions of images, video, and audio.
This is the source code for a helper bot, making attempts at transcribing content as
it is posted to the subreddit /r/TranscribersOfReddit, a community dedicated to
transcribing images, audio, and video. It acts under the username "/u/transcribot".

As a whole, the ToR bots are designed to be as light on local resources as they
can be, though there are some external requirements.
This bot is still in training and might not be able to recognize everything it
attempts. Some transcriptions might be complete trash, but the hope is that it will
be a start to a more legitimate, volunteer-written transcription.

- Redis (tracking completed posts and queue system)
- Tesseract (OCR solution)
## Resources

Redis (tracking completed posts and queue system)
Tesseract (OCR solution)

> **NOTE:**
>
@@ -23,14 +25,15 @@ can be, though there are some external requirements.
## Installation

```
$ git clone https://github.com/TranscribersOfReddit/ToR_OCR.git tor-ocr
$ pip install --process-dependency-links tor-ocr/
$ git clone https://github.com/TranscribersOfReddit/ToR_OCR.git tor_ocr
$ cd tor_ocr/
$ pip install --process-dependency-links .
```

OR

```
$ pip install --process-dependency-links 'git+https://github.com/TranscribersOfReddit/ToR_OCR.git@master#egg=tor_ocr'
$ pip install --process-dependency-links 'git+https://github.com/TranscribersOfReddit/ToR_OCR.git@master#egg=tor_ocr-0'
```

## High-level functionality
@@ -41,9 +44,11 @@ Monitoring daemon (via Redis queue):
- Download image
- OCR the image
- If OCR successful:
  - Post OCR-ed content to post on /r/TranscribersOfReddit in 9000 character chunks
  - Post OCR-ed content to post on /r/TranscribersOfReddit in 9000 character chunks, replying to previous comment when [over 9000][over-9000] characters
- Delete local copy of image

[over-9000]: https://tenor.com/view/dragonball-z-super-saiyan-charging-yelling-gif-4987448
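
The chunked-posting step described in the list above could be sketched as follows. This is a hedged illustration rather than the bot's actual code: `split_into_chunks`, `post_in_chunks`, and the `reply` callable are hypothetical names, and only the 9000-character limit comes from the README.

```python
def split_into_chunks(text, limit=9000):
    """Split `text` into consecutive pieces of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def post_in_chunks(text, reply, limit=9000):
    """Post each chunk as a reply to the previously posted one.

    `reply(parent, body)` is a hypothetical callable that posts `body`
    under `parent` (None meaning the original post) and returns the new
    comment, which then becomes the parent of the next chunk.
    """
    parent = None
    for chunk in split_into_chunks(text, limit):
        parent = reply(parent, chunk)
    return parent
```

Passing the posting function in as `reply` keeps the splitting logic free of any Reddit client, so it can be exercised in tests with a plain stand-in function.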

### Running Apprentice Bot

```
62 changes: 0 additions & 62 deletions bin/run

This file was deleted.

36 changes: 21 additions & 15 deletions setup.py
@@ -21,7 +21,7 @@ def initialize_options(self):

    def run_tests(self):
        import shlex
        # import here, cause outside the eggs aren't loaded
        # import here, because outside the eggs aren't loaded
        import pytest
        errno = pytest.main(shlex.split(self.pytest_args))
        sys.exit(errno)
@@ -35,10 +35,19 @@ def long_description():
        return f.read()


test_deps = [
    'pytest',
    'pytest-cov',
]
dev_helper_deps = [
    'better-exceptions',
]


setup(
    name='tor_ocr',
    version=__version__,
    description='',
    description='An AI attempting to transcribe contents of /r/TranscribersOfReddit',
    long_description=long_description(),
    url='https://github.com/TranscribersOfReddit/ToR_OCR',
    author='Joe Kaufeld',
@@ -56,29 +65,26 @@ def long_description():
        'Programming Language :: Python :: 3.6',
    ],
    keywords='',
    packages=find_packages(exclude=['test*', 'bin/*']),
    packages=find_packages(exclude=['test', 'test.*', '*.test.*', '*.test']),
    zip_safe=True,
    cmdclass={'test': PyTest},
    test_suite='test',
    entry_points={
        'console_scripts': [
            'tor-apprentice = tor_ocr.main:main',
        ],
    },
    tests_require=[
        'pytest',
    ],
    cmdclass={'test': PyTest},
    extras_require={
        'dev': test_deps + dev_helper_deps,
    },
    tests_require=test_deps,
    install_requires=[
        'praw==5.0.1',
        'redis<3.0.0',
        'tor_core>=0.2.0,<0.3.0',
        'addict',
        'tor_core',
        'tesserocr',
        'wget',
        'sh',
        'bugsnag',
        'cython',  # WORKAROUND: 'tesserocr' only sometimes installs this dependency
        'wget',
    ],
    dependency_links=[
        'git+https://github.com/TranscribersOfReddit/tor_core.git@master#egg=tor_core-0.2.0',
        'git+https://github.com/TranscribersOfReddit/tor_core.git@master#egg=tor_core-0',
    ],
)
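
The change to the `find_packages(exclude=...)` patterns is easier to see with a small experiment. The sketch below assumes that setuptools compares each dotted package name against the exclude patterns using shell-style wildcards (modelled here with `fnmatchcase`); the package names other than `tor_ocr` and `test` are hypothetical.

```python
from fnmatch import fnmatchcase

OLD_EXCLUDE = ['test*', 'bin/*']
NEW_EXCLUDE = ['test', 'test.*', '*.test.*', '*.test']


def is_excluded(package, patterns):
    """True if the dotted package name matches any exclude pattern."""
    return any(fnmatchcase(package, pattern) for pattern in patterns)


# Hypothetical package names, chosen to show where the two lists differ.
CANDIDATES = ['tor_ocr', 'test', 'test.unit', 'tor_ocr.test', 'testing_tools']

if __name__ == '__main__':
    for name in CANDIDATES:
        print(name, is_excluded(name, OLD_EXCLUDE), is_excluded(name, NEW_EXCLUDE))
```

Under that assumption, the old `test*` pattern would also drop an unrelated top-level package whose name merely starts with `test`, while missing a nested `tor_ocr.test` package; the new list targets only packages actually named `test` at any depth.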
Empty file added test/__init__.py
Empty file.