
<aside> 💡 Linux host setup instructions for installing Ollama with Open WebUI using Dockge, a self-hosted Docker Compose stack manager.
</aside>
Revision: 20240730-0 (init: 20240707)
This post details the installation of Ollama and Open WebUI, using Dockge to manage their Docker Compose stacks, to run LLMs on a Linux host with an NVIDIA GPU.
Ollama is a free and open-source tool designed to simplify running large language models (LLMs) locally on a machine (preferably with a GPU). It allows users to download, run, and manage open-source LLMs on their local systems without complex setups or cloud services. Ollama supports various open-source models, including Llama 3, Mistral, and Gemma.
Ollama runs models locally (the usable model size depends on the amount of memory available on the GPU), which ensures privacy and control over our queries. Because it exposes a REST API, many applications integrate with it; Open WebUI is one of many, as can be seen at https://github.com/ollama/ollama?tab=readme-ov-file#community-integrations
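As a quick sketch of what that REST API looks like, a completion can be requested with curl against the default port (this assumes Ollama is already running and that a model such as llama3:8b has been pulled; both steps are covered below):

```shell
# Ask a locally pulled model (here llama3:8b, as an illustration) a
# question via Ollama's REST API; "stream": false returns one JSON answer.
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```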
Open WebUI is an open-source web interface designed to work seamlessly with Ollama. It provides an intuitive graphical user interface for interacting with various AI models, with features like chat interfaces, model management, and prompt templates. This allows us to generate text, answer questions, and perform multiple language-related tasks. It helps experiment with or integrate language models into projects while maintaining control over privacy and data.
We will use Dockge and create a new ollama stack. For details on this setup, see:
We are using the following compose.yaml for this setup:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - 11434:11434
    volumes:
      - ./ollama:/root/.ollama
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    command: serve
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
The serve command asks Ollama to answer API requests, making it available for other tools to use.
After starting the ollama stack, we should have access to http://127.0.0.1:11434/, which answers with “Ollama is running”: this confirms that API access is enabled for other tools to use.
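From a terminal on the host, this can be confirmed with curl:

```shell
# Query the root endpoint of the Ollama API (assumes the stack is up):
curl http://127.0.0.1:11434/
# Answers with: Ollama is running
```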
With an HTTPS reverse proxy available, let’s configure it to map https://ollama.example.com/ to this HTTP resource; this will become useful in a later section of this writeup.
Here, the amount of memory on the GPU is going to be a limiting factor:
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
From Dockge’s >_ bash window, ask ollama to pull some models; for example, ollama pull llama3:8b, which we can follow with ollama run llama3:8b to ask it a question.
Download some compatible models, as per https://ollama.com/library, while considering the memory limitation of such models for the GPU.
Each model will be downloaded locally into the /opt/stacks/ollama/ollama/models directory. Looking into /opt/stacks/ollama/ollama/models/manifests/registry.ollama.ai/library/, we will see the list of models installed locally; investigating a given model’s directory shows the variants obtained (7b, etc.).
Since this is a container, we can also get a bash shell within the running container (obtain its ID using docker container ls) and add more models: run docker exec -it <CONTAINERID> /bin/bash, then use the ollama pull or ollama run commands. The ollama command has other sub-commands as well; in particular, be aware of list and rm should you want to clean up older downloaded models.
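The steps above can be sketched as follows (the container name ollama matches the container_name set in the compose.yaml; the mistral model is used purely as an illustration):

```shell
# List running containers to confirm the container name/ID:
docker container ls
# Open a shell inside the running ollama container:
docker exec -it ollama /bin/bash
# From within that shell, manage models:
ollama list          # show the models downloaded locally
ollama pull mistral  # download another model
ollama rm mistral    # remove a model to reclaim disk space
```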
After downloading a model or a few, let’s set up Open WebUI to access them.
This method adds open-webui to the already existing ollama stack.
By default, Docker Compose creates a network for services in the same compose.yaml to communicate with one another. When this is done, services end up on the same private subnetwork, and it is possible to use the service names to communicate (i.e. a service named ollama can be accessed using the ollama name).
This method will set OLLAMA_HOST to ask Ollama to listen on all available network interfaces. Because Docker containers operate within an abstracted network environment different from the host's network interfaces, containers are connected to a virtual network interface created by Docker. This interface is typically part of a bridge network, which isolates the container's network from the host's network while allowing communication between containers on the same bridge network (in this case, both services are being started within the same compose.yaml file).
Using OLLAMA_HOST=0.0.0.0:11434 in this setup, we ask Ollama to answer requests coming from beyond localhost, allowing the open-webui service to talk to the ollama service directly.
The final compose.yaml is as follows:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - 11434:11434
    volumes:
      - ./ollama:/root/.ollama
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    command: serve
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    volumes:
      - ./open-webui:/app/backend/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 3030:8080
    depends_on:
      - ollama
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
In addition to the open-webui service, which depends_on and communicates with the ollama service and listens on port 8080 (exposed here on host port 3030), we altered the ollama service to add an environment section: the OLLAMA_HOST variable asks the service to listen on all interfaces, not just 127.0.0.1, which is local to the ollama container.
<aside>
💡 Each service in the compose.yaml file gets its own private IP on the private bridge subnet created for the stack. In a terminal, run docker network ls to see the list of private subnets created by Docker Compose to isolate the services from the running host; except for exposed ports, communications stay internal to that subnet. Since the stack name is ollama (which is also the directory name in /opt/stacks), ollama_default is the name of the network, which can be examined using docker network inspect ollama_default. In our setup, the subnet is 172.23.0.0/16; the ollama container runs on 172.23.0.2/16 while open-webui is on 172.23.0.3/16.
</aside>
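A quick way to confirm those values on your own host (network names and addresses will vary; ollama_default and the 172.23.0.0/16 subnet are from the example above):

```shell
# List the Docker networks; one should be named after the stack:
docker network ls
# Inspect the stack's network to see its subnet and attached containers:
docker network inspect ollama_default
# Or extract just the subnet using a Go template:
docker network inspect ollama_default --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
```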
This setup will be done on the same host where Ollama is running and does not require the OLLAMA_HOST variable to be set (i.e., the compose.yaml from the “Ollama” section is sufficient), but requires the use of host.docker.internal.
host.docker.internal is a special DNS name used in Docker environments to allow containers to reach services running on the host machine's localhost (i.e., other exposed services). It resolves to the host's internal IP address within the Docker network. The host-gateway option is a reserved string used in Docker configurations to resolve the host's IP address dynamically.
To use it, use the host-exposed port (here also 11434), not the container port if those differ, and add two extra entries to the compose.yaml file:
environment:
  - OLLAMA_BASE_URL=http://host.docker.internal:11434
extra_hosts:
  - host.docker.internal:host-gateway
Integrating those into the open-webui stack’s compose.yaml:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    volumes:
      - ./open-webui:/app/backend/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 3030:8080
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - host.docker.internal:host-gateway
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
Note that we are using an alternate host port: 3030.
With this setup, there will be an open-webui directory in /opt/stacks, but the tool will only work if the ollama container has been started before the open-webui one.
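To check the connectivity, we can attempt to reach the host-exposed Ollama port through host.docker.internal from within the container (this assumes curl is available inside the open-webui image; if not, the same check can be run from any container with curl attached to that network):

```shell
# From inside the open-webui container, query the host's Ollama port:
docker exec -it open-webui curl -s http://host.docker.internal:11434/
# A working setup answers with: Ollama is running
```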
With Ollama configured to answer on an HTTPS reverse proxy, such as https://ollama.example.com/, we can bypass host.docker.internal in favor of direct access to the host. The compose.yaml looks similar to the one in the previous section, but notice the alternate OLLAMA_BASE_URL:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    volumes:
      - ./open-webui:/app/backend/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 3030:8080
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=https://ollama.example.com
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
Specifying the OLLAMA_BASE_URL allows us to run the WebUI on a host other than the one running the Ollama service; by extension, in the absence of a reverse proxy, it can also be defined as OLLAMA_BASE_URL=http://<IP>:<PORT>.
Per https://docs.openwebui.com/:
- Admin Creation: The first account created on Open WebUI gains Administrator privileges, controlling user management and system settings.
- User Registrations: Subsequent sign-ups start with Pending status, requiring Administrator approval for access.
- Privacy and Data Security: All your data, including login details, is locally stored on your device. Open WebUI ensures strict confidentiality and no external requests for enhanced privacy and security.
After setting up an account (the email address is only used locally), we can investigate some of the tool's features and ask questions about the different models we have downloaded for Ollama.

When asking WebSearch in Open WebUI about itself:
"WebSearch" in Open WebUI is a powerful feature that lets you perform searches directly through the interface using various search engines.
Here's what it provides:
- Integrated Search: No need to leave Open WebUI to use a search engine. Just type your query into the search bar and get results from multiple sources.
- Multiple Engines: Open WebUI supports various search engines like Google, DuckDuckGo, Bing, and others. You can choose your preferred engine or switch between them easily.
- Direct Results: WebSearch pulls results directly from the chosen search engine, so you get the same information you'd find on their websites.
- Relevance and Ranking: Open WebUI aims to present results in a relevant and well-ranked order, making it easier to find what you're looking for.
There are many possible backends for it; their setup is detailed at https://docs.openwebui.com/tutorial/web_search
Most of those can be enabled from the WebUI as an admin user.
We will not discuss the many available options, as those are an end-user choice, but will note that DuckDuckGo is an excellent privacy-conscious option and can be enabled easily from the UI.
SearXNG is a free and open-source metasearch engine (a search engine that searches other search engines); we will install it as its own stack for Dockge (similar to “Setup in a separate compose.yaml” above), using the host.docker.internal:host-gateway method (it is easy to change this to a reverse proxy URL when available).
First, from the Dockge UI, “+ Compose” a new stack named searxng and just “Save” it; before using it, we need to populate the directory with a folder and three files that will be obtained from Open WebUI’s SearXNG WebSearch documentation at https://docs.openwebui.com/tutorial/web_search#searxng-docker
# /opt/stacks is not readable by the default user, we need to become root (temporarily)
sudo su
cd /opt/stacks/searxng
mkdir searxng
nano searxng/settings.yml
# fill in the content of the file from the documentation
# feel free to modify the secret_key value
nano searxng/limiter.toml
# fill in the content of the file from the documentation
nano searxng/uwsgi.ini
# fill in the content of the file from the documentation
After exiting the root shell, from the Dockge WebUI, “Edit” the searxng stack and use the following for its compose.yaml:
services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - 8234:8080
    volumes:
      - ./searxng:/etc/searxng
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: always
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
After performing a “Save”, it is fine to “Start” the stack. Note that we have modified the exposed port to be 8234.
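A quick check that SearXNG answers on the exposed port (this assumes the json output format is listed among the search formats in the settings.yml obtained from the documentation):

```shell
# Request a JSON-formatted search from the local SearXNG instance:
curl -s "http://127.0.0.1:8234/search?q=test&format=json" | head -c 300
```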
“Stop” then “Edit” the open-webui stack’s compose.yaml. Make sure the extra_hosts section is present:
extra_hosts:
  - host.docker.internal:host-gateway
and add the following entries to the environment: section:
  - ENABLE_RAG_WEB_SEARCH=true
  - RAG_WEB_SEARCH_ENGINE=searxng
  - RAG_WEB_SEARCH_RESULT_COUNT=3
  - RAG_WEB_SEARCH_CONCURRENT_REQUESTS=10
  - SEARXNG_QUERY_URL=http://host.docker.internal:8234/search?q=<query>
“Save” and “Start” it again.
After going to the Open WebUI page, as an admin, go to the “WebSearch” “Setting” in the “Admin Panel” again and select “searxng” as the “engine” and use http://host.docker.internal:8234/search?q=<query> for the “Query URL”.
After a “Save”, test in a chat by enabling the “Web Search” option.