A Practical Local LLM Environment for Developers Using Ollama, Open Web UI, and Cody

Over my last few posts I have talked about an approach to running large language models locally, entirely in Docker. I chose this approach for a few reasons:

  • It is simple to get up and running. Given the docker-compose file, you really don’t need much knowledge to start everything and play around.
  • Everything is pretty much encapsulated in that single docker-compose file and things are fairly isolated, so the chances of success are greatly improved.
  • It’s simple to clean up/remove everything, either because you want to start over fresh, or you are just done and want to clean up your system.

All that makes it a great playground and a way to dip your toes in the water, so to speak. That said, once I tried to use the system day to day, I ran into some rough spots that made it impractical for real daily work. The biggest issue was that the model took nearly 2 minutes to load initially. If that had happened only once per day, it would have been annoying but bearable. In practice, however, if I didn’t use the LLM for more than about 5 minutes, it would unload, and I would have to wait another 2 minutes the next time I used it. Since I often use it once or twice and then not again for several minutes, I ended up waiting nearly 2 minutes on most uses.

To solve this issue I have adapted the setup to run Ollama “on the metal”, installing it directly on Windows, while still running Open Web UI as a Docker container. I’m not sure where the bottleneck is when Ollama runs in Docker, but this change appears to resolve the issue completely. Now that Ollama for Windows is available and working properly, it is not substantially harder to get everything running.

The New Set Up

I am running this on Windows, but there are installers for Ollama for Windows, Mac and Linux, so it should in theory work for any desktop platform you are using. You “should” simply need to use the appropriate installer/installation steps for your platform.

Ollama Set Up

To install Ollama on Windows, simply download the installer from ollama.com/download and run it. Once installed, you should have a llama icon in your system tray, and if you navigate to http://localhost:11434/ you should be greeted with the message “Ollama is running”.
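
If you would rather check this from a script than a browser, here is a minimal sketch. It assumes Python 3 is installed (nothing else in this post requires it) and that Ollama is listening on the default address shown above. The function name ollama_is_running and its fetch parameter are made up for this illustration; they are not part of Ollama.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", fetch=None, timeout=5):
    """Return True if the Ollama root endpoint answers with its status message.

    `fetch` is a hypothetical hook (url -> response text) so the check can be
    exercised without a live server; by default it does a real HTTP GET.
    """
    def default_fetch(url):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")

    fetch = fetch or default_fetch
    try:
        body = fetch(base_url.rstrip("/") + "/")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as "not running".
        return False
    # The root endpoint replies with the plain-text message quoted above.
    return "Ollama is running" in body
```

Calling ollama_is_running() with no arguments performs the real request against localhost and returns True or False.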

At this point you can use Ollama from the command line. Try the following commands to pull a model and run an interactive session in your terminal.

ollama pull llama3.1

ollama run llama3.1

Note: If your terminal was open before installing Ollama you may need to start a new session to ensure Ollama is on the path in your session before executing these commands.
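
A quick way to tell whether the current session can see the ollama executable is to look it up on the PATH. The sketch below assumes Python 3 and uses only the standard library; find_cli and describe_cli are hypothetical helper names. Because a Python process inherits the PATH of the terminal that started it, the result reflects that terminal session.

```python
import shutil


def find_cli(name="ollama"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


def describe_cli(name="ollama"):
    """Return a short human-readable message about where the CLI was found."""
    path = find_cli(name)
    if path is None:
        return f"{name} is not on PATH; try opening a new terminal session"
    return f"{name} found at {path}"
```

Printing describe_cli() in a terminal that was opened before the install should tell you whether a new session is needed.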

Open Web UI Set Up

I am setting up/starting Open Web UI using a docker compose file that is really just a stripped down version of the docker compose file from my previous posts.

services:
  openWebUI:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      OLLAMA_BASE_URL: http://host.docker.internal:11434
    volumes:
      - c:\docker\data\owu:/app/backend/data

Note: The last line of the compose file is a volume mapping. It maps the local path C:\docker\data\owu into the container at /app/backend/data, which allows your configuration data to persist between runs of the Open Web UI container (for example, after a system reboot). You can point this at any directory you like on your system.

Simply copy this file into a directory on your system and from a terminal, in that directory, run the following command to start the container.

docker compose up -d

It will take a minute or so for the container to come up and initialize, but once it does the web UI should be available at the URL: http://localhost:3000.
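
Since the container needs a little time to initialize, a small polling loop can tell you when the UI is ready. This is only a sketch, assuming Python 3 and the port mapping from the compose file above; http_ok and wait_until_ready are hypothetical names, and the retry count and delay are arbitrary defaults.

```python
import time
import urllib.request


def http_ok(url, timeout=3):
    """Return True if a GET to `url` succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


def wait_until_ready(url="http://localhost:3000", probe=http_ok,
                     attempts=30, delay=2.0, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `probe` and `sleep` are injectable for testing.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

With the defaults, wait_until_ready() gives the container about a minute before giving up.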

The first time you visit, you will need to create an account (this is a local account; nothing is shared externally), which will become the administrator of your instance. Once logged in, you should already be pointed at the correct Ollama URL, and if you downloaded a model earlier while testing Ollama in the terminal, that model should already be available and ready to go.

If you did not download a model earlier, or want to add additional models you can do that in the admin settings by clicking your username in the sidebar (on the bottom left) and selecting Admin Panel.

Then select “Settings” and then “Connections” in the main UI.

Then click the wrench icon next to your Ollama API connection.

That should bring up the “Manage Ollama” dialog.

From here, simply type the name of a new model into the “Pull a model from Ollama.com” field and click the download button. The model will begin downloading, and once it has completed and been validated (you should see a green alert message in the UI), you can go back to your chat and select the new model from the drop down near the top left corner of the UI. You can find a list of models that are available for download at https://ollama.com/search
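
To confirm which models are installed without opening the UI, you can ask Ollama directly. The sketch below assumes Python 3 and Ollama's default address; it reads the /api/tags endpoint of Ollama's REST API, which returns a JSON object with a "models" list. The helper names parse_model_names and list_local_models are made up for this example.

```python
import json
import urllib.request


def parse_model_names(tags_json):
    """Extract the sorted model names from an /api/tags JSON response body."""
    data = json.loads(tags_json)
    return sorted(model["name"] for model in data.get("models", []))


def list_local_models(base_url="http://localhost:11434"):
    """Fetch the locally installed models from a running Ollama instance."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_model_names(resp.read().decode("utf-8"))
```

After pulling llama3.1 as described earlier, list_local_models() should include an entry for it, with the tag appended to the name.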

At this point you should have a functioning LLM running locally, with access via the web UI. Next we’ll bring AI support directly into Visual Studio Code.

Cody Visual Studio Code Plugin Set Up

To install and configure Cody in Visual Studio Code, open the Extensions Marketplace, search for “Cody”, and find the extension “Cody: AI Coding Assistant with Autocomplete & Chat”.

Click “Install” and wait for the Getting Started With Cody screen to appear once completed. Once you are done reviewing this screen you can close it and go back to your code (or open some code to test things out).

You can use Cody to work with any supported type of code; in my case I’ll take a look at some old C# code. To get started you will need to log in to Cody/Sourcegraph by clicking the Cody icon on the left sidebar and choosing a method to authenticate. I’ll be using the free version of Cody and authenticating with GitHub.

At this point you should see a new side panel that allows you to interact with Cody to perform a number of AI-assisted actions, including documenting existing code, explaining code (great for new code bases), finding code smells, generating unit tests, and even generating new code. However, this currently uses external services to provide the AI assistance. To point it at your local LLM, follow these steps:

  • Click on the gear icon in the bottom left corner of VS Code.
  • Select “Settings”.
  • In the search box near the top of the page, type “Cody autocomplete”.
  • Under Cody > Autocomplete > Advanced: Provider, select the option experimental-ollama.

That should set up autocompletion in the editor to use your local LLM. Additionally, and more importantly for how I have tended to use Cody so far, the Cody Chat dialog lets you choose which model answers your prompts.

Under the prompt input field you will see a drop down with all the available LLMs. That list should include a section labelled Ollama (Local models). Select one of these models to run your prompts against your local LLM.

My experience so far with Cody has been mostly positive, but somewhat limited (I haven’t used it much yet). I’ve had really good luck using it to explain and document existing code. It generated some useful unit tests, and did a good job implementing a few entirely new classes to implement existing interfaces in my code.

Local Chat AI With Ollama – Update

Having recently wiped and reset my workstation, I found myself following my own guide, “Setting Up a Local Chat AI with Ollama and Open Web UI”, and ran into a little hiccup because the UI for Open Web UI appears to have been updated. That is probably a good thing overall, but some of the settings I used to configure things in my post have moved, and I had a devil of a time finding their new home, so I thought I would give a quick update.

Specifically, the location in settings for adding new models has changed. To add a new model in the latest version of Open Web UI, open the Admin Panel by clicking your username in the sidebar (on the bottom left) and selecting Admin Panel.

Then select “Settings” and then “Connections” in the main UI.

Then click the wrench icon next to your Ollama API connection.

That should bring up the “Manage Ollama” dialog.

From here you proceed the same as before and enter the name of a model you wish to install and click the download button to get the process running.

Setting Up a Local Chat AI with Ollama and Open Web UI

As a software developer and architect, I’m always excited to explore new technologies that can revolutionize the way we interact with computers. AI is taking the technology world by storm, and for good reason: it can be a very powerful tool. Sometimes, however, using a public service like ChatGPT or Microsoft’s Copilot isn’t an option for a number of reasons (usually privacy related).

In this article, I’ll guide you through setting up a chat AI using Ollama and Open Web UI that you can quickly and easily run on your local, Windows-based machine. We’ll use Docker Compose to spin up the environment, and I’ll walk you through the initial launch of the web UI, configuring models in settings, and generally getting things up and running.

Prerequisites

Before we dive into the setup process, make sure you have:

  • Docker installed on your machine (you can download it from the official Docker website)
  • A basic understanding of Docker Compose and its syntax. (Not necessarily required, but helpful if you run into issues or want to tweak things)
  • A compatible graphics card (GPU) to run Ollama efficiently. While this is not strictly required, your experience will not be very good without one. My example is configured to use an Nvidia graphics card.

Step 1: Create a Docker Compose File

Create a new file named docker-compose.yml in a directory of your choice. Copy the following content into this file:

services:
  openWebUI:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      OLLAMA_BASE_URL: http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - c:\tmp\owu:/app/backend/data

  ollama:
    image: ollama/ollama:latest    
    environment:
      NVIDIA_VISIBLE_DEVICES: all
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
              driver: nvidia
              count: all    
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - c:\tmp\ollama:/root/.ollama

This Docker Compose file defines two services:

  • openWebUI: runs the Open Web UI container with the necessary environment variables and port mappings.
  • ollama: runs the Ollama container with NVIDIA GPU support, exposes a port for communication between the containers, and mounts a volume to store model data.

A couple of things to note here:

Lines 12 and 29 of the compose file map local directories into the containers as volumes. That allows the data from your sessions to persist, even if the containers are restarted or your machine is rebooted. C:\tmp\ollama and C:\tmp\owu can be changed to any empty directories you choose, but if you change them, remember to substitute your own paths in the following steps.

Lines 16-24 configure the ollama container to take advantage of your Nvidia GPU. If you don't have one, or don't want to use it, you can remove these lines and everything should still work, albeit much more slowly. If you have another GPU, this is where you will want to make changes to use it. In particular, lines 17 & 23 will likely need to change.

Lines 4 & 25 configure the containers to automatically restart unless they were manually stopped. That means they should restart if you reboot your machine, they crash, or you update docker and it restarts.

Step 2: Create Data Directories

As mentioned above, the directories C:\tmp\ollama and C:\tmp\owu will be mapped into the running containers and used for data storage. You will want to create these directories ahead of launching the containers to avoid any potential issues.
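
Creating the directories by hand in Explorer or a terminal works fine. If you prefer to script it, here is a minimal sketch, assuming Python 3 is available; ensure_dirs is a hypothetical helper name, not something from Docker or Ollama.

```python
from pathlib import Path


def ensure_dirs(paths):
    """Create each directory (and any missing parents) if it does not exist.

    Returns the list of directories that were newly created, as Path objects.
    Existing directories are left untouched.
    """
    created = []
    for path in map(Path, paths):
        if not path.exists():
            created.append(path)
        path.mkdir(parents=True, exist_ok=True)
    return created
```

For the compose file above, the call would be ensure_dirs([r"C:\tmp\ollama", r"C:\tmp\owu"]); the raw-string prefix keeps Python from treating the backslashes as escape characters.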

Step 3: Launch the Environment

Open your terminal or command prompt and navigate to the directory where you saved the docker-compose.yml file. Run the following command:

docker-compose up -d

This will start both containers in detached mode (i.e., they’ll run in the background).

Step 4: Launch the Web UI

Once the environment is set up, navigate to http://localhost:3000 in your web browser. This should open the Open Web UI interface.

You should be presented with a screen to log in, but you won’t have an account yet. Just click the “Don’t have an account? Sign up” link under the “Sign in” button. Since this will be the first account, it will automatically become the administrator. Simply enter your name, email address and a password and create your account.

Once your account is created you will be logged in and should now see the Chat Interface, which should look pretty familiar if you have been using ChatGPT.

Step 5: Setting Up/Adding Models

Note: This has changed in the latest version of the Open Web UI interface. There is a new post, “Local Chat AI With Ollama – Update”, which outlines how to achieve this in the new version of the user interface.

Before you can start chatting it up with your new application, you’ll need to install some models to use. To get started I would install the llama3.1 model. To do this click on your name in the lower left corner of the UI, select “Settings” and then in the dialog, select “Admin Settings” on the left.

Now select “Models” in the Admin Panel and enter llama3.1 in the “Pull a model from Ollama.com” field and click the download button on the right of the field. (You can see a list of available models on Ollama.com: https://ollama.com/library)

You should see a small progress bar appear. Wait for the progress bar to get to 100%, then it will verify the hash and eventually you should see a green pop-up notifying you that it has successfully been added. Now you can click “New Chat” on the far top-left and select llama3.1 from the “Select a model” dropdown.
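
The web UI is not the only way to talk to the model. Because the compose file publishes Ollama's port 11434 on the host, you can also send a prompt straight to Ollama's REST API. The sketch below assumes Python 3 and uses the /api/generate endpoint with streaming turned off; build_generate_request and ask are hypothetical helper names, and the 300 second timeout is an arbitrary allowance for slow model loads.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama3.1",
                           base_url="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # stream=False asks for one JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt to a running Ollama instance and return its reply text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Once llama3.1 has been pulled, ask("Why is the sky blue?") returns the model's answer as a string.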

Next Steps

At this point you should now have a functioning Chat AI interface!

Going forward, I’ll be playing with this configuration and attempting to add more functionality, and potentially converting the Docker Compose file above into Kubernetes manifests so that I can run my service on a local kind cluster.

Resources

Source Code

https://github.com/DotNet-Ninja/DotNetNinja.Docker.Ollama