<aside>
💡 Setup instructions for the NVIDIA GPU container toolkit on a Linux host running Ubuntu 24.04, which can be used with docker and podman.
</aside>
Revision: 20250326-0 (init: 20240224)
The NVIDIA GPU Container Runtime plugin enables container platforms to securely access and manage NVIDIA GPUs in a containerized application environment. Docker is an open-source platform that automates the deployment, scaling, and management of applications within lightweight, portable containers. Podman is an open-source, daemonless container engine designed for developing, managing, and running OCI Containers. It functions as a drop-in replacement for Docker.
Instructions for a Linux host running Ubuntu 24.04 to install the NVIDIA runtime for docker and podman.
We note that NVIDIA’s Container Toolkit officially only supports Ubuntu LTS releases.
The following are only required if you do not already have some of the tools installed.
The rest of this guide expects an already functional nvidia-driver. To install it, either use Software & Updates’s Additional Drivers tab and reboot, or run ubuntu-drivers devices and install the recommended “server” driver with sudo apt install nvidia-driver-535-server, then reboot.
If you encounter an aplay error, you can sudo apt-get install alsa-utils to address it.
To confirm it is functional, after a reboot, run nvidia-smi from a terminal; if a valid output shows up, it will include the Driver Version and the supported CUDA Version for running GPU-enabled containers later.
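If you prefer a non-interactive check (for scripting, for example), nvidia-smi can also report just the fields of interest; this is a minimal sketch, and the fields available may vary with the driver release:
# print only the GPU name and driver version
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader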
At the time of writing this section (late June 2024), Ubuntu 24.04 used driver 535 as its recommended driver.
As a user of the CTPO (CUDA + TensorFlow + PyTorch + OpenCV) Docker container, whose latest version uses CUDA 12.3.2, nvidia-smi tells me that driver 535 supports up to CUDA 12.2, which requires me to use a more recent version of the driver. You can find the list of NVIDIA drivers at https://www.nvidia.com/en-us/drivers/unix/. For the rest of this section, we will install driver 550 on an Ubuntu 24.04 Desktop.
First, look at the options listed on Ubuntu’s page at https://ubuntu.com/server/docs/nvidia-drivers-installation. If sudo ubuntu-drivers list offers driver 550 in the provided list, you can perform a sudo ubuntu-drivers install nvidia:550 and ignore the rest of this section.
Otherwise, we will follow the method listed in the “Manual driver installation (using APT)” section.
Check which NVIDIA kernel modules are already installed:
apt list --installed | grep nvidia | grep modules
Among the options presented, linux-modules-nvidia-535-generic-hwe-24.04 matches the expected linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR} format.
Search for the equivalent package for driver 550:
apt search nvidia | grep 550 | grep modules | grep generic-hwe
The match for our desktop flavour (rather than the server one, for example) is linux-modules-nvidia-550-generic-hwe-24.04, which we will install:
sudo apt install linux-modules-nvidia-550-generic-hwe-24.04
Verify the installation:
sudo apt-cache policy linux-modules-nvidia-550-$(uname -r)
The Installed and Candidate versions should match. You can also test reinstallation using sudo apt install linux-modules-nvidia-550-$(uname -r).
Then install the driver itself:
sudo apt install nvidia-driver-550
After a reboot, run nvidia-smi, which in this case now returns:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
confirming driver 550 is loaded, supporting up to CUDA 12.4 in Docker containers.
In some cases, the latest officially provided driver is not recent enough to support a more recent CUDA version. In such cases, it is possible to add a Personal Package Archive (PPA) from the “Graphics Drivers” team to the list of package sources. To add it:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
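To see which driver branches are now available before picking one, a quick optional check against the package names can help; the exact list will depend on your architecture and the PPA’s current contents:
# list the nvidia-driver packages apt now knows about
apt-cache search --names-only '^nvidia-driver-[0-9]+' | sort -V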
At this point, we can start the “Additional Drivers” application and select the driver to install (here we will select Using NVIDIA driver metapackage from nvidia-driver-560 (proprietary)) and start the installation process. After the installation is completed, a reboot is required. At the next login, we can confirm the driver and its capabilities using nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
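Independently of nvidia-smi, the version of the kernel module actually loaded can be checked directly; this is a quick sanity check, assuming the NVIDIA module is loaded:
# show the loaded NVIDIA kernel module version
cat /proc/driver/nvidia/version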
Having finally got my hands on an RTX 5090, I added it to my Ubuntu 24.04 system without updating the driver first, which got me into a low-resolution situation.
To solve this, with the graphics-drivers/ppa installed:
# make sure the system is up to date first
sudo apt update
sudo apt upgrade
# reboot to enable the latest kernel if one was installed
sudo reboot
# Get the list of recommended drivers
ubuntu-drivers devices
# Install the recommended driver as listed above
sudo apt install nvidia-driver-570
# Reboot to enable the new driver
sudo reboot
# If after reboot, nvidia-smi still does not work
nvidia-smi
# check the dmesg for: NVIDIA GPU [...] installed in this system requires use of the NVIDIA open kernel modules
# install the open driver instead in this case
sudo apt install nvidia-driver-570-open
# another reboot is required
sudo reboot
# clean up older packages that are no longer needed
sudo apt autoremove
After that final reboot, and after confirming the screen resolution issue is resolved, our nvidia-smi returns the expected result:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04 Driver Version: 570.124.04 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5090 Off | 00000000:01:00.0 On | N/A |
| 0% 31C P8 34W / 600W | 344MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
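If you are unsure whether the proprietary or the open kernel module ended up installed, modinfo can help; as a rough indicator (behavior may vary between driver packages), the open variant typically reports a Dual MIT/GPL license while the proprietary one reports NVIDIA:
# inspect the installed nvidia kernel module
modinfo nvidia | grep -i -E 'license|^version'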
We will follow the instructions to set up docker using the apt repository option.
Details can be found at https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository.
On an Ubuntu 24.04 system, from a terminal, clean up potential conflicting packages:
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
Add support for GPG keys, load Docker’s key, and set up the repository:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
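Before installing, you can optionally confirm that apt will now pull docker-ce from Docker’s repository rather than from Ubuntu’s archive; the candidate version shown will vary over time:
# verify the package source and candidate version
apt-cache policy docker-ce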
Install the required packages:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Confirm docker is functional by checking that we get the Hello from Docker! message when running its hello-world image:
sudo docker run --rm hello-world
Optionally, make docker available without sudo; this has some security implications, as detailed in https://docs.docker.com/engine/install/linux-postinstall/:
sudo usermod -aG docker $USER
You will need to log out entirely before the change takes effect. Once this is done, you should be able to run docker run hello-world without the need for sudo.
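If you do not want to log out right away, a new shell with the docker group applied can be started in the current terminal; this is only a stopgap for the current session, assuming the group was added as above:
# start a subshell with the new group membership and test
newgrp docker
docker run --rm hello-world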
On Ubuntu 24.04, apt search podman returns versions above 4.1.0, the minimum required to use the Container Device Interface (CDI) for nvidia-container-toolkit. It is, therefore, possible to install podman by simply running:
sudo apt install podman
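To confirm the installed version meets the 4.1.0 minimum mentioned above (the exact version will depend on the Ubuntu package at the time of installation):
podman --version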
Now we can test podman:
podman run hello-world
podman runs similarly to docker; for example:
podman run --rm -it docker.io/ubuntu:24.04 /bin/bash
will download ubuntu:24.04, give you a bash shell prompt in an interactive session, and delete the created container when you exit the shell.
It is recommended that you always use a fully qualified image name, including the registry server (full DNS name), namespace, image name, and tag, such as docker.io/ubuntu:24.04.
To add docker.io to the list of “unqualified search registries,” edit /etc/containers/registries.conf and modify the following line as follows: unqualified-search-registries=["docker.io"]. More details on that topic are available at https://podman.io/docs/installation#registriesconf.
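To check that the change is in place and that short names now resolve, the following is a quick sanity check (assuming docker.io was added as described):
# confirm the setting is present
grep unqualified-search-registries /etc/containers/registries.conf
# an unqualified (short) name should now resolve against docker.io
podman pull ubuntu:24.04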
Contrary to docker, podman does not create iptables configurations or use br_netfilter, which allows for the use of bridged VMs. In such cases, only install podman, and also install podman-compose to get access to compose functionality. Use podman-compose if you want to use a tool like Dockge; we also recommend seeing this PR.
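A minimal sketch of that extra step, assuming Ubuntu’s packaged podman-compose is recent enough for your compose files:
sudo apt install podman-compose
podman-compose --version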
For further details on what this supports, NVIDIA has an excellent primer document at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/overview.html, and detailed instructions are available at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html. On the latter page, you will find details on the following:
The NVIDIA Container Toolkit also supports generating a Container Device Interface (CDI) for podman.
Set up the package repository and the GPG key:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Install the toolkit:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
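A quick check that the toolkit’s CLI is installed and which version you got (the version number will differ over time):
nvidia-ctk --version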
Configure docker to recognize the toolkit:
sudo nvidia-ctk runtime configure --runtime=docker
Restart docker:
sudo systemctl restart docker
Confirm docker (no sudo needed if you performed the optional step in the last section) sees any GPU that you have running on your system by having it run nvidia-smi. Note that docker will need both --runtime=nvidia and --gpus all to use the proper runtime and have access to all the GPUs:
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Please be aware that the max CUDA version returned by the nvidia-smi command on your host (without docker) will inform you of the maximum cuda:version image that you can use.
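For podman, the CDI support mentioned earlier is used instead of a Docker runtime entry. The commands below follow NVIDIA’s CDI documentation; the output path is a commonly used default, and the test reuses the same CUDA image as above:
# generate the CDI specification for the GPUs present on the host
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# list the devices the specification exposes
nvidia-ctk cdi list
# run a GPU-enabled container with podman using CDI
podman run --rm --device nvidia.com/gpu=all docker.io/nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi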
You can inspect your /etc/docker/daemon.json file to see that the nvidia-container-runtime is added:
[...]
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
To make this runtime the default, add "default-runtime": "nvidia", to the top of the file (after the first {) and run sudo systemctl restart docker. You should not have to add --runtime=nvidia to the CLI anymore.
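To confirm the default runtime change took effect after the restart (output formatting may differ slightly between Docker versions):
docker info | grep -i 'default runtime'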