Find more information here.
- Dockerfile ... File with instructions to build the image
- Docker Image ... Image that can be shared across platforms to run code
- Docker Container ... The "environment" which is used to actually run the code
An example project can be found here
For new projects, I recommend the following structure:
├── workspace
│   ├── code
│   │   └── all your code
│   ├── data
│   │   └── all your data
│   ├── output
│   │   └── logging + checkpoints
│   └── .env
├── Dockerfile
├── requirements.txt
├── README.md
└── .gitignore
- This way, you can easily mount/add your code and data directories separately or all together - whatever is needed.
- The requirements.txt should list all Python packages with pinned versions. It can then be referenced in the Dockerfile to install dependencies.
- Using git + wandb in combination with Docker helps you keep track of the Docker image you used for an experiment: you can access the git commit from your wandb experiments and then link it to the corresponding Dockerfile.
- You should keep an additional file to write down the results of your experiments. There you should also link the corresponding wandb projects, groups or runs. If you like, you can add version control to your Docker images and link them in this file as well. (So if you ever want to redo an experiment, you just go to the wandb run, copy the command to run it, get your Docker image, and do so.)
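As a sketch, a requirements.txt with pinned versions (the package names and version numbers below are purely illustrative) could look like:

```text
numpy==1.24.3
torch==2.0.1
wandb==0.15.4
```

Pinning exact versions keeps the image reproducible even when packages release new versions later.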
More details here.
Create a file named Dockerfile (without extension) in your project directory. This way, when you use git + wandb, you can find the commit corresponding to an experiment and the associated Dockerfile.
A simple example Dockerfile extending the official PyTorch image:
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
RUN apt-get update && apt-get upgrade -y
ADD requirements.txt /reproducibility/
ADD Dockerfile /reproducibility/
RUN pip install -r /reproducibility/requirements.txt
WORKDIR /workspace/
ENV CUBLAS_WORKSPACE_CONFIG=:16:8
FROM
Sets the Base Image for subsequent instructions.
A valid Dockerfile must start with a FROM instruction.
RUN
Executes any commands on top of the new image and commits the results. The resulting image will be used in the next steps of the Dockerfile.
ENV
The ENV instruction sets the environment variable <key> to the value <value> (syntax: ENV <key>=<value>).
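Variables set with ENV are visible to every process inside the container. As a small sketch, Python code running in the container can read them via os.environ (here the variable is set manually so the snippet also runs outside a container):

```python
import os

# Inside a container built with "ENV CUBLAS_WORKSPACE_CONFIG=:16:8",
# the variable is already present; setdefault only applies it if unset,
# so the snippet behaves the same outside a container.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")

config = os.environ["CUBLAS_WORKSPACE_CONFIG"]
print(config)  # prints :16:8 (unless the variable was already set to something else)
```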
ADD
The ADD instruction copies new files, directories or remote file URLs from <src> and adds them to the filesystem of the image at the path <dest> (syntax: ADD <src> ... <dest>). This is helpful when, for example for a final release, the code/data or similar should be included in the docker image itself. Adding the Dockerfile as well as the requirements.txt to your docker image will help you understand later how the image was built. Hence, I would recommend adding them - just in case.
COPY
The COPY instruction copies new files or directories from <src> and adds them to the filesystem of the container at the path <dest> (syntax: COPY <src> ... <dest>). Unlike ADD, it does not support remote URLs or automatic archive extraction.
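To illustrate the difference, a hypothetical Dockerfile snippet (the URL and paths are purely illustrative) might use COPY for local files and ADD for a remote source:

```dockerfile
# COPY: local files and directories only
COPY requirements.txt /reproducibility/

# ADD: additionally supports remote URLs and auto-extracts local tar archives
ADD https://example.com/pretrained_weights.tar.gz /models/
```

For plain local files, COPY is generally the recommended choice because its behavior is more predictable.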
Note that you need to define the CUDA ARCH flags if you wish to compile something for CUDA within the docker container. A list of CUDA compute capabilities can be found here: https://en.wikipedia.org/wiki/CUDA
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
RUN apt-get update && apt-get upgrade -y
ADD requirements.txt /reproducibility/
ADD Dockerfile /reproducibility/
RUN pip install -r /reproducibility/requirements.txt
WORKDIR /workspace/
ENV CUBLAS_WORKSPACE_CONFIG=:16:8
In this case, we will include requirements.txt and Dockerfile in the docker image, to later access them if we need to. Note that we do not add (ADD) code or data since we want to be able to make local changes.
Build a docker image from a file:
docker build -f <file-name> -t <image-name> .
-f file name of the Dockerfile. If not defined, Docker looks for a file named Dockerfile.
-t human readable name of the docker image being created (this is then used to run the container).
. tells Docker to look for the Dockerfile in this directory.
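For example, building an image named docker_gpu (the name is illustrative) from the current directory might look like:

```bash
docker build -t docker_gpu .
# or, with an explicitly named file:
docker build -f Dockerfile -t docker_gpu .
```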
Run a docker container in interactive mode. This means you can start code within the container that starts. The container will be removed once you exit it (Ctrl+D). However, you are not able to browse through the directories within the container while you run some code.
You should mount the code and data directory using -v. With this, changes made within the container will be persisted locally (so also after exiting the container). This is handy if you'd like to generate some output or logging which you'd like to access from outside the container.
docker run --gpus all -it --rm --ipc=host -v /local_dir/:/container_dir/ --name <container-name> <image-name>
Example:
docker run --gpus all -it --rm --ipc=host -v /mnt/e/Documents/UbuntuCode/1_DockerTest/HowToDockern/workspace/:/repository/workspace/ --name docker_gpu_container docker_gpu
--gpus usage of available CUDA GPUs (all, 0, 1, ...)
-it run in interactive mode
--rm removes the container when finished
--ipc use the host's inter-process communication namespace for shared memory. When using torch multiprocessing for multi-threaded data loaders, the default shared memory segment size might not be enough.
-v /local_dir/:/container_dir/ local_dir is the directory or file from your host system (absolute path) that you want to access from inside your container. Mount your project directory here with code and data to work on it from inside the container at the /container_dir/ path.
--name <container-name> assign a name to the container (for future reference), for example to start the container with docker start <container-name>
<image-name> the name of the image to use as a basis to create the container.
Run a docker container detached, as a daemon, in interactive mode. This means you can start code within the daemon that starts. At the same time, you can browse through the container's file system in a separate terminal.
docker run --gpus all -dit --ipc=host -v /local_dir/:/container_dir/ --name <container-name> <image-name>
Then, start a shell in the running container to work interactively:
docker exec -it <container-name> /bin/bash
--gpus usage of available CUDA GPUs (all, 0, 1, ...)
-dit detach (as a daemon) + run in interactive mode
--ipc use the host's inter-process communication namespace for shared memory. When using torch multiprocessing for multi-threaded data loaders, the default shared memory segment size might not be enough.
-v /local_dir/:/container_dir/ local_dir is the directory or file from your host system (absolute path) that you want to access from inside your container. Mount your project directory here with code and data to work on it from inside the container at the /container_dir/ path.
--name <container-name> assign a name to the container (for future reference), for example to start the container with docker start <container-name>
<image-name> the name of the image to use as a basis to create the container.
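Putting the two steps together, with the container and image names from the earlier interactive example, the detached workflow could look like:

```bash
# start the container detached (as a daemon)
docker run --gpus all -dit --ipc=host -v /mnt/e/Documents/UbuntuCode/1_DockerTest/HowToDockern/workspace/:/repository/workspace/ --name docker_gpu_container docker_gpu

# in the same or a separate terminal, attach an interactive shell
docker exec -it docker_gpu_container /bin/bash
```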
https://docs.docker.com/get-started/04_sharing_app/ First, you need to create an account on Docker Hub.
docker login -u <user-name>
docker tag <image-name> <user-name>/<image-name>:<version>
docker push <user-name>/<image-name>:<version>
You should be able to use the docker image on Docker Hub for the submit script of slurm on our cluster. It should look somewhat like:
submit ./run_script.sh --name test-docker --custom hannahkniesel/myfirstimage --gpus 3090
FINAL NOTE: To be consistent between the cluster and locally, you should always mount the full directory to the image. As this happens automatically on the cluster, we can access folder structures in a similar fashion locally and on the cluster.
Show all (locally available) docker images:
docker image ls
Show all docker containers:
docker container ls -a
Generate a new docker container from the image:
docker run ...
Start a docker container (starts the exact instance of the container with the state it was left in - for example if you installed some additional dependencies inside this container):
docker start <container-name>
Stop a docker container:
docker stop <container-name>
Execute a docker container in interactive mode (can be used to run code from here):
docker exec -it <container-name> /bin/bash
Remove a docker container:
docker rm <name>
Generate a docker image from a docker container. More info here.
docker commit <container-name> <image-name>
Make sure that you have nvidia-drivers, nvidia-cuda-runtime, nvidia-docker and nvidia-cuda-toolkit installed (for example, sudo apt-get install nvidia-container-runtime).
Make sure /etc/docker/daemon.json looks like follows:
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
When building the docker image, do:
DOCKER_BUILDKIT=0 docker build ...
NOTE: If this is not working, try to build dependencies inside a running container (started by docker run --gpus all ...) and generate a new image from this container using docker commit ...
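As a sketch, that fallback workflow (container and image names are illustrative) could look like:

```bash
# start a container from the base image with GPU access
docker run --gpus all -it --name build_container pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime

# inside the container: install dependencies / compile extensions, then exit

# snapshot the modified container into a new image
docker commit build_container my_built_image
```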
Run a docker image that has been built from a file that already includes the code. However, since the docker image would become rather big, we do not include the data, but mount that volume via the -v flag.
docker run --gpus all --ipc=host -v data:/workspace/data --name <container-name> <image-name>
An example Dockerfile for release could look somewhat like this:
FROM pytorch/pytorch:latest
RUN pip install numpy
ENV CUBLAS_WORKSPACE_CONFIG=:16:8
ADD code/ /workspace/code/
ENTRYPOINT [ "python", "/workspace/code/main.py"]
The code is added to the docker image itself. Additionally, the python script will be executed directly when the container runs. However, such a release docker image is usually not necessary in research, as one would like to reproduce multiple experiments or adapt the code to fit new needs. Furthermore, it still requires mounting some output directory for logging; otherwise all files will only be created within the docker container.
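To still capture logs and results, a release image like the one above would typically be run with the data and an output directory mounted (the local paths here are illustrative):

```bash
docker run --gpus all --ipc=host \
  -v /local/data:/workspace/data \
  -v /local/output:/workspace/output \
  --name release_run <image-name>
```

Anything the entrypoint script writes to /workspace/output then persists on the host after the container exits.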