This section covers how the project is developed, including conventions, the development workflow, and other design choices made along the way.
This project uses the jupytext extension to put Jupyter Notebooks under version control. Whenever a notebook is saved, a paired copy is also saved as a .py file, a plain-text format that is better suited to version control systems.
The .ipynb file extension is listed in the .gitignore file, so notebook files are not tracked by version control at all. As a result, any Jupyter Notebook that should be part of the main repo must use jupytext.
Because every API call requires credentials, they must be present in the project directory. They are stored in credentials.py, which is not tracked by version control.
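Since credentials.py is untracked, a fresh clone will not contain it. One way to fail with a clear message in that case is sketched below. The helper name load_credentials and the attribute name in the usage comment are hypothetical and do not come from the project.

```python
import importlib


def load_credentials(module_name="credentials"):
    """Import the untracked credentials module, failing with a clear message."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise unchanged if a different module (imported inside
        # credentials.py) is the one that is missing.
        if exc.name != module_name:
            raise
        raise RuntimeError(
            f"{module_name}.py was not found. Create it locally; it is "
            "intentionally excluded from version control."
        ) from exc


# Usage (GITHUB_TOKEN is a hypothetical attribute name):
# creds = load_credentials()
# token = creds.GITHUB_TOKEN
```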
Because the data for this network came from several sources (GitHub, HydroShare, and BibTeX files), the same person often appeared under different names across data channels. This was resolved by reconciling the names with the Python FuzzyWuzzy package, which uses Levenshtein distance to measure how similar two strings are.
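To make the idea of Levenshtein-based matching concrete, here is a self-contained sketch in plain Python. It is not FuzzyWuzzy's code and its 0-100 score is not FuzzyWuzzy's exact formula. The reconcile helper, the threshold of 80, and the example names are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Count the single-character edits needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to a 0-100 score (illustrative, not FuzzyWuzzy's formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))


def reconcile(name, canonical_names, threshold=80):
    """Return the closest canonical name, or the input if none is close enough."""
    normalized = name.strip().lower()
    best, best_score = None, -1
    for candidate in canonical_names:
        score = similarity(normalized, candidate.strip().lower())
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else name


# Made-up names, used only to show the behavior.
known_people = ["John Smith", "Maria Garcia"]
matched = reconcile("Jon Smith", known_people)
unmatched = reconcile("Wei Chen", known_people)
```

Here "Jon Smith" is one edit away from "John Smith" and is mapped onto it, while "Wei Chen" is far from both known names and is left unchanged. The threshold controls the trade-off between merging true duplicates and wrongly merging distinct people.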
To create a network visualization in Tableau, one needs to supply the X and Y coordinates of each node plotted on the network. These coordinates are generated with the Fruchterman-Reingold layout from the NetworkX Python package.
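The coordinate-generation step can be sketched as follows. The edge list, the column names (node, x, y), and the seed are assumptions for illustration. The project's real graph and export format may differ.

```python
import csv
import io

import networkx as nx

# Hypothetical collaboration edges, standing in for the project's real network.
graph = nx.Graph()
graph.add_edges_from([
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("carol", "dave"),
])

# spring_layout is NetworkX's Fruchterman-Reingold implementation.
# A fixed seed makes the coordinates reproducible between runs.
positions = nx.spring_layout(graph, seed=42)

# Write one row per node with the X and Y columns Tableau needs.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["node", "x", "y"])
for node, (x, y) in positions.items():
    writer.writerow([node, float(x), float(y)])

csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch free of side effects. Replacing it with an open file handle would produce a CSV that can be loaded into Tableau as a data source.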