
<aside> 💡 Setup instructions for Linux hosts: installing Ollama with the Open WebUI using Dockge, a self-hosted Docker Compose stack manager.

</aside>

Revision: 20240730-0 (init: 20240707)

This post details the installation of Ollama and the Open WebUI, using Dockge-managed docker-compose stacks, to run LLMs on a Linux host with an NVIDIA GPU.

Ollama is a free and open-source tool designed to simplify running large language models (LLMs) locally on a machine (preferably with a GPU). It allows users to download, run, and manage open-source LLMs on their local systems without complex setups or cloud services. Ollama supports various open-source models, including Llama 3, Mistral, and Gemma.

Ollama runs models locally (the usable model size depends on the amount of memory available on the GPU), which ensures privacy and control over our queries. Because it exposes a REST API, many applications integrate with it; Open WebUI is one of many, as can be seen at https://github.com/ollama/ollama?tab=readme-ov-file#community-integrations
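As a quick sketch of that REST API (the model name and prompt below are illustrative; this assumes the default port 11434 and that llama3:8b has already been pulled, as covered in a later section), a one-shot generation request can be issued with curl:

```shell
# One-shot (non-streaming) generation request against the Ollama REST API.
# Assumes the default port and that llama3:8b is already pulled.
payload='{"model": "llama3:8b", "prompt": "Why is the sky blue?", "stream": false}'
curl -s http://127.0.0.1:11434/api/generate -d "$payload" \
  || echo "Ollama not reachable"
```

By default the endpoint streams the answer token by token; "stream": false asks for a single JSON object instead.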

Open WebUI is an open-source web interface designed to work seamlessly with Ollama. It provides an intuitive graphical user interface for interacting with various AI models, with features like chat interfaces, model management, and prompt templates. This allows us to generate text, answer questions, and perform multiple language-related tasks. It helps experiment with or integrate language models into projects while maintaining control over privacy and data.

Ollama

We will use Dockge and create a new ollama stack. For details on this setup, see:

Dockge (Rev: 20251129-0)

We are using the following compose.yaml for this setup:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - 11434:11434
    volumes:
      - ./ollama:/root/.ollama
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    command: serve
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"

The serve command asks Ollama to answer API requests, making it available for other tools to use.

After starting the ollama stack, we should have access to http://127.0.0.1:11434/ answering with “Ollama is running”, which is what is expected: API access is enabled for other tools to use.

With an HTTPS reverse proxy available, let’s configure it to map https://ollama.example.com/ to this HTTP resource; this will become an option in a later section of this writeup.
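As an illustration only (the server_name and certificate paths are placeholders, and any reverse proxy will do), an nginx server block for such a mapping could look like:

```nginx
server {
    listen 443 ssl;
    server_name ollama.example.com;

    # certificate paths are placeholders for your own setup
    ssl_certificate     /etc/ssl/certs/ollama.example.com.pem;
    ssl_certificate_key /etc/ssl/private/ollama.example.com.key;

    location / {
        # forward everything to the Ollama API exposed on the host
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
    }
}
```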

Obtaining some models

Here, the amount of memory of the GPU is going to be a limiting factor:

Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
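As a rough rule of thumb (the figures below are illustrative assumptions, not Ollama internals), a quantized model needs about parameter-count × bits-per-weight / 8 of memory for its weights, plus some overhead for the KV cache and runtime:

```shell
# Back-of-the-envelope VRAM estimate for the default ~4-bit quantization:
# params (in billions) * bits-per-weight / 8 for the weights, plus ~1 GB overhead.
params_b=8   # e.g. llama3:8b
bits=4
echo "approx $(( params_b * bits / 8 + 1 )) GB VRAM"
```

This prints "approx 5 GB VRAM" for an 8B model, which matches llama3:8b fitting comfortably on an 8 GB GPU.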

From Dockge’s >_ bash window, ask ollama to pull some models. For example, ollama pull llama3:8b, which we can follow with ollama run llama3:8b to ask it a question.

Download some compatible models, as per https://ollama.com/library, while considering the memory limitation of such models for the GPU.

Each model is downloaded locally into the /opt/stacks/ollama/ollama/models directory. Looking into /opt/stacks/ollama/ollama/models/manifests/registry.ollama.ai/library/ we will see the list of models installed locally, and each model’s directory lists the variants obtained (7b, etc.).

Since this is a container, we can also get a bash within the running container (obtain the list using docker container ls) and add more models: run docker exec -it <CONTAINERID> /bin/bash, then run ollama pull or ollama run commands. The ollama command has other sub-commands; in particular, be aware of list and rm should you want to clean up older downloaded models.
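Put together, a housekeeping session could look like this (since the compose.yaml sets container_name, the <CONTAINERID> is simply ollama; gemma:2b is just an example model):

```shell
# Model housekeeping inside the running container; "ollama" is the
# container_name set in the compose.yaml above.
docker exec -it ollama ollama list            # installed models and sizes
docker exec -it ollama ollama pull gemma:2b   # fetch another (small) model
docker exec -it ollama ollama rm gemma:2b     # remove it to free disk space
```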

After downloading a model or a few, let’s set up Open WebUI to access them.

Open WebUI

Setup using the Ollama stack’s compose.yaml

This method adds open-webui to the already existing ollama stack.

By default, Docker Compose creates a network for services in the same compose.yaml to communicate with one another. When this is done, services end up on the same private subnetwork, and it is possible to use the service names to communicate (i.e. a service named ollama can be accessed using the ollama name).

This method will set OLLAMA_HOST to ask Ollama to listen on all available network interfaces. Because Docker containers operate within an abstracted network environment different from the host's network interfaces, containers are connected to a virtual network interface created by Docker. This interface is typically part of a bridge network, which isolates the container's network from the host's network while allowing communication between containers on the same bridge network (in this case, both services are being started within the same compose.yaml file).

Using OLLAMA_HOST=0.0.0.0:11434 in this setup, we request that Ollama answer requests coming from beyond localhost, allowing the open-webui service to talk to ollama directly.

The final compose.yaml is as follows:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - 11434:11434
    volumes:
      - ./ollama:/root/.ollama
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    command: serve
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"

  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    volumes:
      - ./open-webui:/app/backend/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 3030:8080
    depends_on:
      - ollama
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"

In addition to the new open-webui service, which depends_on and communicates with the ollama service and listens on port 8080 (exposed here on host port 3030), we altered the ollama service to add an environment section: the OLLAMA_HOST variable requests that the service listen on all interfaces, not just 127.0.0.1, which is local to the ollama container.
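Once both services are up, the service-name resolution can be sanity-checked from inside the open-webui container (assuming curl is available in that image):

```shell
# From the host, probe ollama by its compose service name from within
# the open-webui container; the expected answer is "Ollama is running".
docker exec open-webui curl -s http://ollama:11434/ \
  || echo "ollama not reachable"
```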

<aside> 💡 Each service in the compose.yaml file gets its own private IP on the private bridge subnet created for the stack. In a terminal, run docker network ls to see the list of private subnets created by docker compose to isolate the services from the running host. Except for exposed ports, those communications stay internal to that subnet. Since the stack name is ollama (which is also the directory name in /opt/stacks), ollama_default is the name of the network, which we can inspect using docker network inspect ollama_default. In our setup, the subnet is 172.23.0.0/16 and the ollama container runs on 172.23.0.2/16 while open-webui is on 172.23.0.3/16.

</aside>

Setup in a separate compose.yaml

This setup will be done on the same host where Ollama is running, and does not require the OLLAMA_HOST variable to be set (i.e. the compose.yaml from the “Ollama” section is sufficient), but requires the use of host.docker.internal.

host.docker.internal is a special DNS name used in Docker environments to allow containers to reach services running on the host machine’s localhost (i.e., other exposed services). It resolves to the host’s internal IP address within the Docker network. The host-gateway option is a reserved string used in Docker configurations to determine the host’s IP address dynamically.

To use it, reference the port exposed on the host (here also 11434), not the container port if those differ, and add two extra entries to the compose.yaml file:

environment:
  - OLLAMA_BASE_URL=http://host.docker.internal:11434

extra_hosts:
  - host.docker.internal:host-gateway

Integrating those into the open-webui stack’s compose.yaml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    volumes:
      - ./open-webui:/app/backend/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 3030:8080
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - host.docker.internal:host-gateway
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"

Note that we are using an alternate host port: 3030.

With this setup, there will be an open-webui directory in /opt/stacks, but the tool will only work if the ollama container has been started before the open-webui one.

With Ollama behind an HTTPS reverse proxy

With Ollama configured to answer on an HTTPS reverse proxy, such as https://ollama.example.com/, we can bypass host.docker.internal in favor of direct access to the proxied host. The compose.yaml looks similar to the one in the previous section, but notice the alternate OLLAMA_BASE_URL:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    volumes:
      - ./open-webui:/app/backend/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 3030:8080
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=https://ollama.example.com
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    labels:
      - "com.centurylinklabs.watchtower.enable=true"

Specifying the OLLAMA_BASE_URL allows us to run the WebUI on a host other than the one running the Ollama service; by extension, it can also be set to OLLAMA_BASE_URL=http://<IP>:<PORT> in the absence of a reverse proxy.

Using Open WebUI

Per https://docs.openwebui.com/:

- Admin Creation: The first account created on Open WebUI gains Administrator privileges, controlling user management and system settings.
- User Registrations: Subsequent sign-ups start with Pending status, requiring Administrator approval for access.
- Privacy and Data Security: All your data, including login details, is locally stored on your device. Open WebUI ensures strict confidentiality and no external requests for enhanced privacy and security.

After setting up an account (the email address is only used locally), we can investigate some of the tool's features and ask questions about the different models we have downloaded for Ollama.

Enabling WebSearch


When asking Open WebUI’s WebSearch about itself:

"WebSearch" in Open WebUI is a powerful feature that lets you perform searches directly through the interface using various search engines.

There are many possible backends for it; their setup is detailed at https://docs.openwebui.com/tutorial/web_search

Most of those can be enabled from the WebUI as an admin user.

We will not discuss the many available options, as those are an end-user choice, but will note that DuckDuckGo is an excellent privacy-conscious option and can be enabled easily from the UI.

Adding a SearXNG stack

SearXNG is a free and open-source metasearch engine (a search engine that searches other search engines); we will install it as its own Dockge stack (similar to “Setup in a separate compose.yaml” above), using the host.docker.internal:host-gateway method (it is easy to switch to a reverse proxy URL when one is available).

First, from the Dockge UI, “+ Compose” a new stack named searxng and just “Save” it. Before using it, we need to populate the directory with a folder and three files, obtained from Open WebUI’s SearXNG WebSearch documentation at https://docs.openwebui.com/tutorial/web_search#searxng-docker

# /opt/stacks is not readable by the default user, we need to become root (temporarily)
sudo su

cd /opt/stacks/searxng
mkdir searxng

nano searxng/settings.yml
# fill in the content of the file from the documentation
# feel free to modify the secret_key value

nano searxng/limiter.toml
# fill in the content of the file from the documentation

nano searxng/uwsgi.ini
# fill in the content of the file from the documentation

After exiting the root shell, from the Dockge WebUI, “Edit” the searxng stack and use the following for its compose.yaml:

services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - 8234:8080
    volumes:
      - ./searxng:/etc/searxng
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: always
    labels:
      - "com.centurylinklabs.watchtower.enable=true"

After performing a “Save”, it is fine to “Start” the stack. Note that we have modified the exposed port to be 8234.
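Before wiring it into Open WebUI, the stack can be smoke-tested from the host (the json format must be listed under formats in searxng/settings.yml, as it is in the documented file; the query string is arbitrary):

```shell
# Query SearXNG the same way Open WebUI will; a JSON document with
# search results should come back on the exposed port 8234.
curl -s "http://127.0.0.1:8234/search?q=open+webui&format=json" \
  || echo "searxng not reachable"
```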

“Stop” then “Edit” the open-webui stack’s compose.yaml to add the WebSearch environment variables:

    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - ENABLE_RAG_WEB_SEARCH=true
      - RAG_WEB_SEARCH_ENGINE=searxng
      - RAG_WEB_SEARCH_RESULT_COUNT=3
      - RAG_WEB_SEARCH_CONCURRENT_REQUESTS=10
      - SEARXNG_QUERY_URL=http://host.docker.internal:8234/search?q=<query>
    extra_hosts:
      - host.docker.internal:host-gateway

“Save” and “Start” it again.

After going to the Open WebUI page, as an admin, go to the “WebSearch” “Setting” in the “Admin Panel” again and select “searxng” as the “engine” and use http://host.docker.internal:8234/search?q=<query> for the “Query URL”.

After a “Save”, test in a chat by enabling the “Web Search” option.
