# ASU: Justified Referral in AI Glaucoma Screening (JustRAIGS) Challenge
- Log in to the lambda machine (SSH key authentication is preferred). To set this up:
  - Alias the lambda machine in your `~/.ssh/config` file:

    ```
    Host lambda
        HostName 152.10.212.186
        User <your-username>
        Port 22
    ```

  - Issue a key exchange with the lambda machine:

    ```shell
    ssh-copy-id <your-username>@lambda
    ```

  - You should now be able to log in to the lambda machine with the following command:

    ```shell
    ssh <your-username>@lambda
    ```
- Create a new Python virtual environment using `venv`. Ensure you use the command below to do so, as this will leverage the lambda-stack by default:

  ```shell
  python3 -m venv just-raigs --system-site-packages
  ```
- If you are not in a `bash` shell, switch to it:

  ```shell
  bash
  ```
- Activate the virtual environment:

  ```shell
  source just-raigs/bin/activate
  ```

- Note that the lambda-stack will have already installed most of the deep learning packages you require:

  ```shell
  pip list
  ```
- I have created an environment for us to use; the packages it requires are listed in the `requirements.txt` file. You can install all the required packages with the following command:

  ```shell
  pip install -r requirements.txt
  ```
- Any other packages you wish to install can be installed with `pip`:

  ```shell
  pip install some-package
  ```
- The lambda machine is powerful, but it is not a GPU cluster. There are two NVIDIA GeForce RTX 3090 GPUs and 126 GB of RAM that we must all share.
- Please monitor resource consumption and utilization with the following commands:
  - Current GPU usage:

    ```shell
    watch -d -n 0.5 nvidia-smi
    ```

  - Memory usage:

    ```shell
    htop
    ```
- If one of the GPUs is not in use, you may use it. If both are in use, please wait until one is free. Communicate with your team members on our JustRAIGS Google Chat Space to coordinate experimentation and resource utilization.
- Ensure you check WaB to see if someone is already running an experiment with the parameters you were planning to run.
- Data is stored in the `/usr/local/data/JustRAIGS` directory.
  - The raw compressed files are stored in `/usr/local/data/JustRAIGS/compressed`.
  - The extracted, uncompressed (unmodified) files are stored in `/usr/local/data/JustRAIGS/raw`.
    - Files in this directory are partitioned by the original dataset splits provided by the challenge organizers. For example, the training data stored in `/usr/local/data/JustRAIGS/raw/train/0` corresponds to the compressed file `JustRAIGS_Train_0.zip` provided directly from the challenge Zenodo website.
- I have provided a utility method :meth:`src.utils.datasets.load_datasets` which will load the training datasets from disk, perform preprocessing, rescaling, and normalization, and convert the result into TensorFlow Datasets for use downstream.
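- For reference, a minimal sketch of how that loader might be called is shown below. The exact signature lives in the docstring of :meth:`src.utils.datasets.load_datasets`; the argument names used here (`color_mode`, `target_size`, `batch_size`) and the two-dataset return value are assumptions for illustration only:

  ```python
  from src.utils.datasets import load_datasets

  # Hypothetical call: argument names and return values are illustrative only;
  # check the docstring of load_datasets for the real signature.
  train_ds, val_ds = load_datasets(
      color_mode='rgb',        # assumed preprocessing option
      target_size=(128, 128),  # assumed rescaling target
      batch_size=32,           # assumed batching option
  )

  # The returned tf.data.Dataset objects can then be fed directly to model.fit:
  # model.fit(train_ds, validation_data=val_ds, epochs=10)
  ```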
- Weights and Biases (WaB) is a tool that allows us to track and visualize our experiments. It is a great tool for collaboration, and is flexible enough to be customized however you see fit.
- This means you have the freedom to log arbitrary metrics, artifacts, models, and more.
- You can also integrate WaB with other deep learning frameworks and libraries. For instance, you could use Keras Tuner with WaB instead of the native WaB hyperparameter tuning framework.
- I have created a WaB project for us to use, which is located at: WaB: JustRAIGS.
- I have also provided comments in reStructuredText (RST) format in the codebase to help you understand how to integrate WaB with your code, and which documentation to reference when you get stuck.
- There is some high-level terminology you should know which will allow you to utilize WaB effectively:
  - `Organization`: A collection of `Projects`. It is a way to organize separate, distinct `Projects` within a particular organization/research group.
  - `Project`: The highest level of organization in WaB. It is a collection of `Experiments`.
  - An `Experiment` is a collection of `Runs`, `Metrics`, and `Artifacts`.
    - A `Run` is a single unit of computation logged by WaB. Consider a `Run` an atomic element of the whole project. A new `Run` should be initialized if you change a hyperparameter, use a different model, etc.
    - Within a `Run`, `Metrics` can be tracked across the training and validation datasets and visualized in the WaB dashboard. You could track `accuracy`, `loss`, `precision`, `recall`, and more as `Metrics`.
    - Within a `Run`, `Artifacts` can also be logged. For example, you could log the model weights, the model architecture, the training dataset, the training logs, a matplotlib image, etc.
      - For more on logging artifacts, see the WaB Documentation on Artifacts.
  - WaB uses `Reports` to organize `Runs`, embed visualizations, describe findings, and share content with collaborators.
  - WaB leverages `Sweeps` to automate hyperparameter searching and perform experiment tracking. `Sweeps` allow you to leverage popular hyperparameter search methods such as Bayesian search, grid search, random search, hyperband search, and more. `Sweeps` require a `Sweep Configuration` to specify the overall hyperparameter search space and the method of search.
    - Behind the scenes, the WaB Sweep Controller manages the execution of the `Sweep`. You interact with the WaB Sweep Controller via the `wandb.agent` API.
  - WaB `Sweeps` generate unique `Trials` based on the `Sweep Configuration` provided. Each `Trial` is a unique subset of the overall hyperparameter search space specified by the `Sweep Configuration`. In the way I leverage WaB, each `Trial` will have its own unique `Run`, and therefore its own `Metrics` and `Artifacts`. (See the sketch after this list for how these pieces fit together in code.)
  - In Keras, a `Model` is a class that performs fitting (weight optimization) on a particular dataset. The model is in charge of minimizing a particular loss function and is capable of making predictions on a particular dataset (i.e. performing inference). This class is also in charge of resource management for the training process and the logging of metrics, losses, and artifacts to WaB.
  - A `Hypermodel` is a class that is in charge of instantiating and managing `Models` for the hyperparameter tuning process. The `HyperModel` class is instantiated just once per `Sweep` and is responsible for creating a new `Model` for each `Trial` in the `Sweep`.
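- To make the terminology above concrete, here is a minimal, self-contained sketch of a WaB sweep using the `wandb` API. It is not code from this repository (the real entry point is the :mod:`src.sweepers.sweeper` module), and the hyperparameter names and logged values are placeholders:

  ```python
  import wandb

  # A Sweep Configuration: the search method plus the hyperparameter search space.
  sweep_configuration = {
      'method': 'random',                                 # could also be 'grid', 'bayes', etc.
      'metric': {'name': 'val_loss', 'goal': 'minimize'},
      'parameters': {
          'learning_rate': {'values': [1e-3, 1e-4]},      # placeholder hyperparameters
          'batch_size': {'values': [16, 32]},
      },
  }

  def run_trial():
      # Each Trial gets its own Run; wandb.config holds this Trial's hyperparameters.
      run = wandb.init()
      learning_rate = wandb.config.learning_rate
      batch_size = wandb.config.batch_size

      # ... build and fit a model with this Trial's hyperparameters here ...

      # Metrics: log arbitrary scalars and view them in the WaB dashboard.
      run.log({'loss': 0.42, 'val_loss': 0.57})           # placeholder values

      # Artifacts: log files (model weights, plots, logs, ...) associated with the Run.
      # artifact = wandb.Artifact('model-weights', type='model')
      # artifact.add_file('weights.h5')
      # run.log_artifact(artifact)
      run.finish()

  # The Sweep Controller hands Trials to agents; wandb.agent pulls Trials and runs them.
  sweep_id = wandb.sweep(sweep=sweep_configuration, project='JustRAIGS')
  wandb.agent(sweep_id, function=run_trial, count=4)
  ```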
- In the `docs` directory, you will find the source files needed to build Sphinx documentation. Sphinx is a static website generator that parses Python docstrings into HTML documentation. This is the tool leveraged by the official Python documentation, so it is worth your while to be somewhat familiar with it. Sphinx operates on docstrings written in reStructuredText (RST) form, a markup language richer (and uglier to look at) than Markdown. For that reason, I write docstrings in the Google Documentation Style and use the `sphinx.ext.napoleon` extension to parse them into RST, which Sphinx then utilizes to generate pretty-looking HTML documentation. You will most likely not need to know how this works. Just know that if you write good docstrings in the Google Style (use this example for reference), Sphinx will be able to generate readable documentation for you (almost) automagically.
  - If you use PyCharm (which you should) for Python development, then PyCharm can build the Sphinx documentation for you. Additionally, you can configure PyCharm to lint your docstrings in Google Style. Ask me if you want to know how to do this.
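- As a quick reference, a Google-style docstring that `sphinx.ext.napoleon` can convert looks like the following. This is a generic example (the function `rescale_image` is hypothetical and not part of this codebase):

  ```python
  import tensorflow as tf

  def rescale_image(image: tf.Tensor, target_size: tuple = (128, 128)) -> tf.Tensor:
      """Rescales a single fundus image to the provided target size.

      Args:
          image: The image tensor to rescale, of shape ``(height, width, channels)``.
          target_size: The desired ``(height, width)`` of the output image.

      Returns:
          The rescaled image tensor of shape ``(target_size[0], target_size[1], channels)``.

      Raises:
          ValueError: If ``target_size`` contains non-positive dimensions.
      """
      if target_size[0] <= 0 or target_size[1] <= 0:
          raise ValueError(f"target_size must be positive, got: {target_size}")
      return tf.image.resize(image, target_size)
  ```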
- In the `src` directory, you will find the following relevant subdirectories:
  - `sweepers`: This is the main entry point for the program. The :mod:`src.sweepers.sweeper` module is the file you should modify to either change the hyperparameters that are experimented with, or to change the method of the hyperparameter search itself (i.e. random search, grid search, hyperband, etc.). Note that if you do change the hyperparameters, you will also need to change the `Hypermodel` itself to handle the new hyperparameters.
  - `hypermodels`: Contains an example :class:`hypermodels.hypermodels.WaBHyperModel` class which is instantiated by the WaB agent just once for a particular `Sweep`. This class is responsible for creating a new `Model` for each `Trial` (i.e. unique subset of hyperparameters). Specifically, the :meth:`hypermodels.hypermodels.WaBHyperModel.construct_model_run_trial` method is invoked once per `Trial` and is in charge of creating a new `Model` for the `Trial` and fitting the `Model` for the `Trial`. You will need to modify this method if you modify the hyperparameters in the sweep configuration.
  - `models`: Contains an example :class:`models.models.WaBModel` class which is instantiated by the `Hypermodel` once for every `Trial`. This class is kept separate from the hypermodel, as the hypermodel could theoretically instantiate separate `Model` subclasses for each `Trial`. Additionally, the :class:`models.models.WaBModel` class provides an example of how to perform custom model serialization and deserialization in TensorFlow 2.0. If you wish to use a non-sequential model, or a model that requires custom serialization/deserialization, this class will serve as a useful reference.
  - `metrics`: This file houses custom metrics that are not available by default in the Keras API. Note that the use of custom metrics will result in a custom model, which means you will have to modify the serialization and deserialization methods in the :class:`src.models.models.WaBModel` class.
  - `utils`: This directory houses utility functions which are leveraged by the various classes above. For instance, the :meth:`utils.datasets.load_datasets` method will load the training dataset from disk, perform preprocessing, rescaling, and normalization, and convert the result into TensorFlow Datasets for use downstream.
  - `layers`: This directory houses the :mod:`src.layers.custom` module which provides an example of how to create custom layers in TensorFlow 2.0. This is not used in the current codebase, but is provided as a reference in case you wish to use custom layers in your model. Note that using a custom layer will result in a custom model, which will require you to modify the serialization and deserialization methods in the :class:`src.models.WaBModel` class. (A generic serialization sketch follows this list.)
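- For orientation only, here is a generic sketch of the TensorFlow 2.x serialization pattern referenced above (`get_config` plus `register_keras_serializable`). The actual project pattern lives in :class:`models.models.WaBModel` and :mod:`src.layers.custom`; the layer below is purely hypothetical:

  ```python
  import tensorflow as tf

  @tf.keras.utils.register_keras_serializable(package='JustRAIGS')
  class ScaledDense(tf.keras.layers.Layer):
      """Hypothetical custom layer: a Dense layer whose output is multiplied by a constant scale."""

      def __init__(self, units: int, scale: float = 1.0, **kwargs):
          super().__init__(**kwargs)
          self.units = units
          self.scale = scale
          self.dense = tf.keras.layers.Dense(units)

      def call(self, inputs):
          return self.scale * self.dense(inputs)

      def get_config(self):
          # Serialize the constructor arguments so the layer can be rebuilt when a
          # saved model is loaded (the default from_config simply calls cls(**config)).
          config = super().get_config()
          config.update({'units': self.units, 'scale': self.scale})
          return config
  ```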
- Less-relevant subdirectories:
  - `tuners`: This directory houses the :mod:`src.tuners.wab_kt_tuner` module which provides an example of how to integrate WaB with KerasTuner. This is not used in the current codebase, but is provided as a reference in case you wish to leverage KerasTuner for hyperparameter tuning directly, instead of WaB. This class uses KerasTuner as the driver, but still integrates with WaB for experiment tracking, versioning, and artifact retention.
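- If you go that route, the rough shape of a KerasTuner-driven search looks something like the sketch below. This is a generic illustration, not the contents of :mod:`src.tuners.wab_kt_tuner`; the model architecture and hyperparameter names are placeholders, and per-trial WaB logging would be handled by a custom tuner subclass as in that module:

  ```python
  import keras_tuner as kt
  import tensorflow as tf

  def build_model(hp: kt.HyperParameters) -> tf.keras.Model:
      # Placeholder model; the hyperparameter names here are illustrative only.
      model = tf.keras.Sequential([
          tf.keras.layers.Flatten(input_shape=(128, 128, 3)),
          tf.keras.layers.Dense(hp.Choice('units', [64, 128]), activation='relu'),
          tf.keras.layers.Dense(1, activation='sigmoid'),
      ])
      model.compile(
          optimizer=tf.keras.optimizers.Adam(hp.Choice('learning_rate', [1e-3, 1e-4])),
          loss='binary_crossentropy',
          metrics=['accuracy'],
      )
      return model

  # KerasTuner drives the trial generation here; a custom Tuner subclass (as in
  # src/tuners) would additionally open a WaB Run per trial and log metrics/artifacts.
  tuner = kt.RandomSearch(
      build_model,
      objective='val_accuracy',
      max_trials=4,
      directory='kt_results',
      project_name='just-raigs-kt',
  )
  # tuner.search(train_ds, validation_data=val_ds, epochs=5)
  ```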