Deploying machine learning models has always been a struggle. Most of the software industry has adopted container engines like Docker for deploying code to production, but because accessing hardware resources like GPUs from Docker was difficult and required hacky, driver-specific workarounds, the machine learning community has shied away from this option. With the recent release of NVIDIA’s nvidia-docker tool, however, accessing GPUs from within Docker is a breeze, and we’re already reaping the benefits here at indico. In this tutorial we’ll walk you through setting up nvidia-docker so you too can deploy machine learning models with ease.
Before we get into the details however, let’s talk briefly about why using Docker for your next data science project may be a good choice. There is certainly a learning curve for the tools in the Docker ecosystem, but the benefits are worth the effort.
- No inconsistencies between team environment configurations:
Software configuration is always a pain. Docker’s configure once, run anywhere model means your teammates will have to worry less about environment setup and can focus more on writing code and building machine learning models.
- Reliable deployments:
Fewer bugs crop up in production when you can be assured that your development environment is identical to your production environment.
- Git-like tool for environment configuration:
If something does go wrong in production, reverting to a previous Docker image ensures you can quickly get back to a functional state.
Why is a special solution needed for using GPUs within Docker?
Docker is designed to be hardware and platform agnostic. GPUs are specialized hardware that is not necessarily available on every host. Because of this, the Docker binary does not include GPU support out of the box, and requires a fair amount of configuration to get things working properly. When we first started using Docker in production and needed to enable access to GPU devices from within the container, we had to roll our own solution. It was educational to have to understand the mechanisms by which hardware like a GPU is exposed to an operating system (primarily through device files under /dev), but we ended up with a solution that was not portable and required that the host’s NVIDIA driver be identical to a second copy of the driver installed within the container. Whenever we updated our NVIDIA drivers to support newer CUDA versions, we had to make a breaking change to our Docker image in order to ensure the drivers matched exactly.
Thankfully, the nice folks at NVIDIA have rectified this problem by releasing nvidia-docker, a tool for configuring Docker to allow GPU access from within containers.
nvidia-docker takes the following steps to get CUDA working within your container:
- It attaches the GPU device files on your host to your container (/dev/nvidia0, /dev/nvidiactl, etc.)
- It mounts the driver files from your host within the Docker container as a Docker volume
This means that as long as you have a functional NVIDIA driver on your host, and that driver is recent enough to support the CUDA version installed within your container, you should be able to execute CUDA code from your running Docker container. Importantly, the Docker container can also be run in another environment with different driver versions, making it easy to build once and then run anywhere.
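To make the first of those steps more concrete, here is a minimal sketch of how device files could be passed to a container by hand using Docker’s standard --device flag. The helper name gpu_device_flags is made up for this illustration, and the sketch deliberately leaves out the driver mounting step, which is the part nvidia-docker is really saving you from. It only prints the command so you can inspect it first.

```shell
# Build one --device flag per NVIDIA device file found under a directory
# (defaults to /dev). Prints an empty line if no device files are present.
# gpu_device_flags is a hypothetical helper, not part of nvidia-docker.
gpu_device_flags() {
    dev_dir="${1:-/dev}"
    flags=""
    for node in "$dev_dir"/nvidia*; do
        # Skip the unexpanded glob pattern when nothing matches
        [ -e "$node" ] || continue
        flags="$flags --device=$node"
    done
    # Trim the leading space before printing
    echo "${flags# }"
}

# Print (rather than run) the resulting command so it can be inspected.
# The substitution is intentionally unquoted so each flag is its own word.
echo docker run --rm $(gpu_device_flags) nvidia/cuda nvidia-smi
```

Without the host’s driver files mounted as well, a command like this will generally not be enough to run CUDA code, which is exactly the gap nvidia-docker fills.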
How do I install nvidia-docker?
Your host will need to meet the following requirements:
- Linux kernel > 3.10
- NVIDIA GPU with Architecture > Fermi (2.1)
- NVIDIA drivers >= 340.29 with binary nvidia-modprobe
- Docker >= 1.9
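If you’re not sure whether your host qualifies, the requirements above can be checked roughly with a few lines of shell. This is a sketch under some assumptions: version_ge and check_min are helper names invented here, the comparison relies on sort -V from GNU coreutils, and every requirement is treated as a minimum (which is slightly looser than the strict “> 3.10” kernel requirement).

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a detected version meets the required minimum
check_min() {
    name="$1"; found="$2"; required="$3"
    if version_ge "$found" "$required"; then
        echo "OK: $name $found >= $required"
    else
        echo "FAIL: $name $found < $required"
    fi
}

# Kernel version, with any distribution suffix (e.g. -generic) stripped
check_min "kernel" "$(uname -r | cut -d- -f1)" "3.10"

# Uncomment on a host that already has Docker and the NVIDIA driver installed:
# check_min "docker" "$(docker version --format '{{.Server.Version}}')" "1.9"
# check_min "driver" "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" "340.29"
```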
If you already meet these requirements, installation of nvidia-docker is as easy as installing a .deb file (on Ubuntu 14.04):
    # Install nvidia-docker and nvidia-docker-plugin
    wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
    sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
Once nvidia-docker is installed on your host machine, you can try it out immediately by running the nvidia/cuda Docker image provided by NVIDIA:
    # Test nvidia-smi
    nvidia-docker run --rm nvidia/cuda nvidia-smi
Depending on your driver version, you may need to specify a different version of CUDA to run when testing your installation:
    # Test nvidia-smi
    nvidia-docker run --rm nvidia/cuda:7.5 nvidia-smi
If all is well, you should see something like:
    $ nvidia-docker run --rm nvidia/cuda:7.5 nvidia-smi
    7.5: Pulling from nvidia/cuda
    bf5d46315322: Already exists
    9f13e0ac480c: Already exists
    e8988b5b3097: Already exists
    40af181810e7: Already exists
    e6f7c7e5c03e: Already exists
    261ad237e477: Already exists
    83d2db6fdab9: Pull complete
    e8e8d0e851cd: Pull complete
    c0000b849c19: Pull complete
    180b04fcdc2d: Pull complete
    1e5b85df3d02: Pull complete
    Digest: sha256:c601c6902928d62c79f2cbf90bf07477b666e28b51b094b3a10924ec7dacde8b
    Status: Downloaded newer image for nvidia/cuda:7.5
    Fri Nov 4 16:34:00 2016
    +------------------------------------------------------+
    | NVIDIA-SMI 352.93     Driver Version: 352.93         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 760     Off  | 0000:01:00.0     N/A |                  N/A |
    | 17%   31C    P8    N/A /  N/A |    172MiB /  4095MiB |     N/A      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |    0                  Not Supported                                         |
    +-----------------------------------------------------------------------------+
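Since the right CUDA image tag depends on your driver version, it can be handy to pull that number out of the nvidia-smi banner when scripting this check. Below is a small sketch; driver_version is a helper name made up for this post, and it assumes the banner keeps the “Driver Version: X.Y” wording shown in the output above.

```shell
# Extract the driver version from nvidia-smi's banner line.
# Reads nvidia-smi output on stdin and prints only the version number.
driver_version() {
    sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example using the banner line from the sample output (prints 352.93)
echo "| NVIDIA-SMI 352.93 Driver Version: 352.93 |" | driver_version
```

On a real host you would pipe the live output in instead, for example `nvidia-smi | driver_version`.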
Now let’s use nvidia-docker for something more substantial. We’ll be setting up and running the “neural doodle” project from Alex Champandard (@alexjc). The project takes rough sketches and turns them into artistic masterpieces using techniques from the Semantic Style Transfer paper.
Alex has already done the hard work of providing us with a Docker image of his project, and has gone to the trouble of installing the necessary CUDA libraries in the Docker image as well. Normally we’d need to have a functioning installation of CUDA, Theano, and the lasagne library in order to run his code, but since he’s provided us with a Docker image we should be up and running in just a few minutes.
    git clone https://github.com/alexjc/neural-doodle.git && cd neural-doodle
    alias doodle="nvidia-docker run -v $(pwd)/samples:/nd/samples -v $(pwd)/frames:/nd/frames -it alexjc/neural-doodle:gpu"

    # paint a photo of a coastline in the style of Monet
    doodle --style samples/Monet.jpg --output samples/Coastline.png --device=gpu --iterations=40
This example takes this original Monet painting:
and this sketch of a similar coastline:
and creates a new work of art in style similar to the original Monet:
Pretty cool, huh?
Let’s walk through the neural-doodle Dockerfile and the doodle alias to remove some of the magic behind what we’ve just done.
The Dockerfile used to build the alexjc/neural-doodle:gpu image is below:
    FROM nvidia/cuda:7.5-cudnn4-devel

    # Install dependencies
    RUN apt-get -qq update && \
        apt-get -qq install --assume-yes \
            "module-init-tools" \
            "build-essential" \
            "cmake" \
            "git" \
            "wget" \
            "libopenjpeg2" \
            "libopenblas-dev" \
            "liblapack-dev" \
            "libjpeg-dev" \
            "libtiff5-dev" \
            "zlib1g-dev" \
            "libfreetype6-dev" \
            "liblcms2-dev" \
            "libwebp-dev" \
            "gfortran" \
            "pkg-config" \
            "python3" \
            "python3-dev" \
            "python3-pip" \
            "python3-numpy" \
            "python3-scipy" \
            "python3-matplotlib" \
            "python3-six" \
            "python3-networkx" \
            "python3-tk" && \
        rm -rf /var/lib/apt/lists/* && \
        python3 -m pip -q install "cython"

    # Install requirements before copying project files
    WORKDIR /nd
    COPY requirements.txt .
    RUN python3 -m pip -q install -r "requirements.txt"

    # Copy only required project files
    COPY doodle.py .

    # Get a pre-trained neural network (VGG19)
    RUN wget -q "https://github.com/alexjc/neural-doodle/releases/download/v0.0/vgg19_conv.pkl.bz2"

    # Set an entrypoint to the main doodle.py script
    ENTRYPOINT ["python3", "doodle.py", "--device=gpu"]
Hey, this isn’t so bad. The Dockerfile Alex used is based on an official NVIDIA Docker image (nvidia/cuda:7.5-cudnn4-devel) that already includes the required CUDA libraries, so it only has to describe how to install a few system dependencies for working with image formats, install a few machine learning Python packages with pip (Theano, lasagne, etc.), and download some pre-trained model weights. It’s little more than a glorified bash setup script.
The doodle alias isn’t bad either. It simply specifies the Docker image we’ll be running (alexjc/neural-doodle:gpu) and lets Docker know that the ./samples and ./frames directories should be accessible from the Docker container at /nd/samples and /nd/frames. This is done using Docker’s “volumes” feature, which the curious can read more about on the official Docker site.
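One thing to keep in mind is that an alias defined in double quotes expands the current directory once, when the alias is created, so it keeps pointing at that directory even if you cd elsewhere. If that bothers you, a shell function is one possible alternative. The sketch below is not part of the neural-doodle project, and the DOODLE_DRY_RUN variable is a switch invented here so the command can be printed without being run.

```shell
# Function equivalent of the doodle alias. Unlike the alias, $PWD is
# evaluated each time the function runs, not once at definition time.
# Set DOODLE_DRY_RUN=1 (hypothetical switch for this sketch) to print
# the command instead of running it.
doodle() {
    # Prepend the nvidia-docker invocation to the caller's arguments
    set -- nvidia-docker run \
        -v "$PWD/samples:/nd/samples" \
        -v "$PWD/frames:/nd/frames" \
        -it alexjc/neural-doodle:gpu "$@"
    if [ -n "$DOODLE_DRY_RUN" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Called from the neural-doodle checkout, `doodle --style samples/Monet.jpg --output samples/Coastline.png --iterations=40` should then behave like the aliased version.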
At indico, we now use a setup similar to the neural-doodle configuration to host the indico API on Amazon GPUs. Instead of using our own bash scripts, we let the nvidia-docker tool handle the process of ensuring device drivers within the Docker container match device drivers on the host. This means that when our customers wish to run our APIs on their local machines, deployment is as easy as providing them with access to our production Docker image and letting the nvidia-docker tool handle the rest.
Operating System Support
At the moment, nvidia-docker is only portable in the sense that it’s not reliant on a particular GPU model, NVIDIA driver version, or Linux distribution. Running nvidia-docker on OS X or Windows will likely not be supported anytime soon.
Where can I find more information on nvidia-docker?
NVIDIA has done an excellent job of keeping the wiki on their GitHub page up to date. If you have questions that aren’t answered in this blog post, chances are you can find answers in the nvidia-docker GitHub wiki.
If you’re using a version of CUDA other than the one used in this demo (CUDA 7.5), you might also want to take a peek at the full list of base images that NVIDIA provides for you to work with.
I hope you’ve enjoyed this whirlwind tour of using nvidia-docker to build and run machine learning projects, and perhaps created a bit of original algorithmic art along the way. If you run into trouble trying out this tutorial, or want to learn more about how we’re using Docker in production at indico, feel free to reach out over our site chat and say hello. Happy hacking!