
<aside> 💡 Setup instructions for the NVIDIA GPU container toolkit on a Linux host running Ubuntu 24.04, which can be used with docker and podman.

</aside>

Revision: 20250326-0 (init: 20240224)

The NVIDIA GPU Container Runtime plugin enables container platforms to securely access and manage NVIDIA GPUs in a containerized application environment. Docker is an open-source platform that automates applications' deployment, scaling, and management within lightweight, portable containers. Podman is an open-source, daemonless container engine designed for developing, managing, and running OCI Containers. It functions as a drop-in replacement for Docker.

Instructions for a Linux host running Ubuntu 24.04 to install the NVIDIA runtime for docker and podman. Note that NVIDIA’s Container Toolkit officially supports only Ubuntu LTS releases.

Preamble

The following are only required if you do not already have some of the tools installed.

Confirming the Nvidia driver is available

The rest of this guide expects an already functional nvidia-driver.

To install it:
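One common approach is to let Ubuntu’s driver tooling pick the recommended driver (a minimal sketch; the driver version selected depends on your GPU and may differ from the versions discussed below):

# list the drivers Ubuntu recommends for the detected hardware
sudo ubuntu-drivers list
# install the recommended driver, then reboot
sudo ubuntu-drivers install
sudo reboot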

To confirm it is functional, run nvidia-smi from a terminal after a reboot; if a valid status table is displayed, it will show the Driver Version and the maximum supported CUDA Version for running GPU-enabled containers.

Using a more recent driver

At the time of writing this section (late June 2024), Ubuntu 24.04 uses driver 535 as its recommended driver.

As a user of the CTPO (CUDA + TensorFlow + PyTorch + OpenCV) Docker container, whose latest version uses CUDA 12.3.2, nvidia-smi tells me that driver 535 supports up to CUDA 12.2, which requires me to use a more recent driver. You can find the list of NVIDIA Unix drivers at https://www.nvidia.com/en-us/drivers/unix/

For the rest of this section, we will install driver 550 on an Ubuntu 24.04 Desktop.

First, look at the options listed on Ubuntu’s page at https://ubuntu.com/server/docs/nvidia-drivers-installation

If sudo ubuntu-drivers list offers driver 550 in the provided list, you can run sudo ubuntu-drivers install nvidia:550 and ignore the rest of this section.

Otherwise, we will follow the method listed in the “Manual driver installation (using APT)” section.

# check which nvidia kernel-module packages are currently installed
apt list --installed | grep nvidia | grep modules

Among the packages listed, linux-modules-nvidia-535-generic-hwe-24.04 matches the expected linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR} format.

# find the 550 modules package matching the expected format
apt search nvidia | grep 550 | grep modules | grep generic-hwe
# install the kernel modules for the 550 branch
sudo apt install linux-modules-nvidia-550-generic-hwe-24.04
# confirm a modules package matching the running kernel is available
sudo apt-cache policy linux-modules-nvidia-550-$(uname -r)
# install the driver itself
sudo apt install nvidia-driver-550
After a reboot, running nvidia-smi shows:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+

confirming driver 550 is loaded and supports up to CUDA 12.4 in Docker containers.

Using an even more recent driver

In some cases, the latest officially provided driver is not recent enough to support a more recent CUDA version. In such cases, it is possible to add a Personal Package Archive (PPA) from the “Graphics Drivers” team to the list of package sources. To add it:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

At this point, we can start the “Additional Drivers” application, select the driver to install (here, “Using NVIDIA driver metapackage from nvidia-driver-560 (proprietary)”), and start the installation process. After the installation is completed, a reboot is required. At the next login, we can confirm the driver and its capabilities using nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
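If you prefer the terminal over the “Additional Drivers” GUI, the equivalent metapackage can usually be installed directly with apt (a sketch; this assumes the graphics-drivers PPA added above provides the nvidia-driver-560 package on your system):

# install the driver metapackage from the PPA, then reboot
sudo apt install nvidia-driver-560
sudo reboot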

“Continue using a manually installed driver” error

Having finally gotten my hands on an RTX 5090, I added it to my Ubuntu 24.04 system without updating the driver first, which left the desktop in a low-resolution fallback mode.

To solve this, with the graphics-drivers/ppa installed:

# make sure the system is up to date first
sudo apt update
sudo apt upgrade
# reboot to enable the latest kernel if one was installed
sudo reboot

# Get the list of recommended drivers
ubuntu-drivers devices

# Install the recommended driver as listed above
sudo apt install nvidia-driver-570

# Reboot to enable the new driver
sudo reboot

# If after reboot, nvidia-smi still does not work
nvidia-smi
# check dmesg for: NVIDIA GPU [...] installed in this system requires use of the NVIDIA open kernel modules
# install the open driver instead in this case
sudo apt install nvidia-driver-570-open
# another reboot is required
sudo reboot

# clean older package not needed anymore
sudo apt autoremove 

After that final reboot, and after confirming the screen resolution issue is resolved, nvidia-smi returns the expected result:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04             Driver Version: 570.124.04     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        Off |   00000000:01:00.0  On |                  N/A |
|  0%   31C    P8             34W /  600W |     344MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Docker setup (from docker.io)

We will follow the instructions to set it up using the apt repository option. Details can be found at https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository.

On an Ubuntu 24.04 system, from a terminal, clean up potentially conflicting packages:

for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done

Add support for GPG keys, load Docker’s key, and set up the repository:

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

Install the required packages:

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Confirm docker is functional by checking that we get the Hello from Docker! message when running its hello-world image:

sudo docker run --rm hello-world

Optionally, make docker available without sudo, which has some security implications, as detailed in https://docs.docker.com/engine/install/linux-postinstall/.

To do so, add your user to the docker group:

sudo usermod -aG docker $USER

You will need to log out entirely before the change takes effect. Once this is done, you should be able to run docker run hello-world without the need for sudo.

Install podman

On Ubuntu 24.04, apt search podman returns versions above 4.1.0, the minimum required to use the Container Device Interface (CDI) for nvidia-container-toolkit.
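To check which version apt will install before committing to it (a quick sanity check):

apt-cache policy podman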

It is, therefore, possible to install podman by simply running:

sudo apt install podman

Now we can test podman:

podman run hello-world

podman runs similarly to docker; for example:

podman run --rm -it docker.io/ubuntu:24.04 /bin/bash

This will download ubuntu:24.04, give you a bash shell prompt in an interactive session, and delete the created container when you exit the shell.

It is recommended that you always use a fully qualified image name, including the registry server (full DNS name), namespace, image name, and tag, such as docker.io/ubuntu:24.04.

To add docker.io to the list of “unqualified search registries,” edit /etc/containers/registries.conf and modify the following line: unqualified-search-registries=["docker.io"]. More details on that topic are available at https://podman.io/docs/installation#registriesconf.

Contrary to docker, podman does not create iptables configurations or use br_netfilter, which allows for the use of bridged VMs on the same host. In such cases, install only podman, and add the podman-compose package to get access to podman-compose. Use podman-compose if you want to use a tool like Dockge; we also recommend reviewing this PR. A sketch of the podman-compose route follows.
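A minimal sketch of installing podman-compose and bringing up a stack with it (this assumes a docker-compose.yml already exists in the current directory):

# install the compose wrapper for podman
sudo apt install podman-compose
# bring the stack up in the background, from the directory holding docker-compose.yml
podman-compose up -d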

NVIDIA Container Toolkit

For further details on what this supports, NVIDIA has an excellent primer document at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/overview.html, and detailed installation instructions are available at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html.

The NVIDIA Container Toolkit also supports generating a Container Device Interface (CDI) for podman.
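As a preview of the podman path, once the toolkit is installed (see the steps below), a CDI specification can be generated and consumed by podman. A sketch based on NVIDIA’s documented CDI workflow (the device names and image are examples; adjust them for your system):

# generate the CDI specification describing the host's GPUs
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# list the device names made available through CDI
nvidia-ctk cdi list
# expose all GPUs to a container via CDI and list them
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable docker.io/ubuntu:24.04 nvidia-smi -L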

Set up the package repository and the GPG key:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Install the toolkit:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

For Docker

Configure docker to recognize the toolkit:

sudo nvidia-ctk runtime configure --runtime=docker

Restart docker

sudo systemctl restart docker

Confirm docker (no sudo needed if you performed the optional step in the Docker setup section) sees any GPU running on your system by having it run nvidia-smi. Note that docker needs both --runtime=nvidia and --gpus all to use the proper runtime and have access to all the GPUs:

docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

Please be aware that the maximum CUDA version returned by the nvidia-smi command on your host (without docker) indicates the most recent cuda:<version> image that you can use.

You can inspect your /etc/docker/daemon.json file to see that the nvidia-container-runtime is added:

[...]
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }

To make this runtime the default, add "default-runtime": "nvidia", near the top of the file (right after the first {), then run sudo systemctl restart docker. You should no longer need to add --runtime=nvidia on the command line.
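For reference, a minimal /etc/docker/daemon.json with nvidia as the default runtime would look roughly like this (a sketch; your file may contain additional keys that should be preserved):

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}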

