last updated: October 23, 2020
This repository contains all the files for sending Hong Kong news articles on Wisenews to CSRP colleagues. The class WisenewsScraper also contains a routine to save scraped articles to a local MongoDB collection.
This is a standalone version of the Wisenews scraper. For an integrated Django solution, see the repository openup-triage-server.
- A working Python virtual environment. Follow the Python or Conda documentation to set one up if you haven't already done so.
- Selenium >= 3.141.0
- openpyxl >= 2.6.3
- pymongo >= 3.8.0
- jupyter-core >= 4.5 and associated packages, if running the Jupyter Notebook file
- Google Chrome (https://www.google.com/chrome/), for the Selenium controller
- ChromeDriver (https://chromedriver.chromium.org/); select a version that matches Google Chrome
- MongoDB >= 4.0
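The minimum versions listed above can be checked from inside the virtual environment. The following is a minimal sketch, not part of this repository: it assumes Python 3.8 or newer (for `importlib.metadata`), and the names `MINIMUM_VERSIONS`, `parse_version`, and `check_requirements` are made up for illustration. It only covers the Python packages; Chrome, ChromeDriver, and MongoDB still need to be checked by hand.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list in this README.
MINIMUM_VERSIONS = {
    "selenium": "3.141.0",
    "openpyxl": "2.6.3",
    "pymongo": "3.8.0",
    "jupyter-core": "4.5",
}


def parse_version(text, width=4):
    """Turn '3.141.0' into (3, 141, 0, 0) so versions compare numerically."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    # Pad with zeros so '4.5' and '4.5.0' compare as equal.
    return tuple((parts + [0] * width)[:width])


def check_requirements(minimums=MINIMUM_VERSIONS, get_version=version):
    """Return a list of problems; an empty list means every package is fine."""
    problems = []
    for package, minimum in minimums.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed (need >= {minimum})")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{package} {installed} is older than the required {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Comparing parsed tuples rather than raw strings matters here: as plain text, "3.14.0" would sort after "3.141.0" in some comparisons, while numerically it is older.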
- Open wisenews.py with a text editor (recommended ones are vim, sublime, or notepad++)
  - modify the global variable WISENEWS_NEWS_SECTIONS to select a subset of news articles to download from Wisenews
  - modify tuples in the enum Keywords to tailor keywords for searching news articles
  - change the chromedriver path accordingly
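As a rough illustration of the three settings above, the sketch below shows what such a configuration could look like. Only the names `WISENEWS_NEWS_SECTIONS` and `Keywords` come from this README; the section names, enum members, keyword tuples, the `CHROMEDRIVER_PATH` variable, and the `all_search_terms` helper are hypothetical placeholders, so check wisenews.py for the real values and layout.

```python
from enum import Enum

# Hypothetical section names; the real list is defined in wisenews.py.
WISENEWS_NEWS_SECTIONS = ["Section A", "Section B"]


class Keywords(Enum):
    """Hypothetical members; each value is a tuple of search terms."""
    HOUSING = ("housing", "rent")
    TRANSPORT = ("MTR", "bus")


# Hypothetical variable name and path; point it at your ChromeDriver binary.
CHROMEDRIVER_PATH = "/usr/local/bin/chromedriver"


def all_search_terms(keywords=Keywords):
    """Flatten every keyword tuple into one list, keeping first-seen order."""
    terms = []
    for member in keywords:
        for term in member.value:
            if term not in terms:
                terms.append(term)
    return terms


if __name__ == "__main__":
    print(WISENEWS_NEWS_SECTIONS)
    print(all_search_terms())
```

Keeping each keyword group as a tuple on an enum member means a group can be added or trimmed by editing a single line.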
- If this is your first time running this scraper, create a .env file as follows:
cat env_template > .env # this will create a new environment file called .env using env_template as base
Open and edit your .env credentials accordingly:
# Wisenews Login
HKU_LOGIN='HKU_PID_HERE' # your HKU login. Must be a real one.
HKU_PASSWORD='HKU_PASSWORD_HERE' # your HKU password. Must be a real one.
SENDER='SENDER_NAME_HERE' # e.g. Byron
FROM_EMAIL='SENDER EMAIL HERE' # e.g. byron@csrp.hku.hk
TO_EMAIL='RECIPIENT EMAIL HERE' # e.g. staff@csrp.hku.hk
- Activate (source) the Python virtual environment.
- Either: a) in Jupyter Notebook, run the notebook file Wisenews.ipynb, or b) enter python wisenews.py in bash.
For full usage details, see the main function of wisenews.py and the Jupyter notebook Wisenews.ipynb.
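Before the first run, it can help to confirm that the .env file no longer contains the template placeholders. The sketch below is an assumption-laden helper, not how wisenews.py itself reads the file: `parse_env` and `missing_or_placeholder` are hypothetical names, parsing relies on the standard-library `shlex` module, and the placeholder test is a simple heuristic (values ending in "HERE", as in env_template).

```python
import shlex
from pathlib import Path

# Keys taken from the .env template shown in this README.
REQUIRED_KEYS = ("HKU_LOGIN", "HKU_PASSWORD", "SENDER", "FROM_EMAIL", "TO_EMAIL")


def parse_env(text):
    """Parse KEY='value' # comment lines into a dict."""
    settings = {}
    for line in text.splitlines():
        # shlex removes the quotes and drops everything after an unquoted '#'.
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue  # blank line or comment-only line
        key, separator, value = tokens[0].partition("=")
        if separator:
            settings[key] = value
    return settings


def missing_or_placeholder(settings):
    """Return required keys that are absent or still hold template text."""
    return [
        key
        for key in REQUIRED_KEYS
        if key not in settings or settings[key].endswith("HERE")
    ]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        problems = missing_or_placeholder(parse_env(env_file.read_text()))
        print("Still to fill in:", problems if problems else "nothing")
    else:
        print("No .env file found; create one from env_template first.")
```

One consequence of this parsing choice: a password containing `#` must stay inside the quotes, otherwise the rest of the line is treated as a comment.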