Over my last few posts I have talked about an approach to running large language models locally, 100% in Docker. I chose this approach for a couple of reasons:
- It is simple to get started. Given the docker-compose file, you really don’t need much knowledge to get things up and running and play around.
- Everything is pretty much encapsulated in that single docker-compose file and things are fairly isolated, so the chances of success are greatly improved.
- It’s simple to clean up/remove everything, either because you want to start over fresh, or you are just done and want to clean up your system.
All that means it makes for a great playground and a way to dip your toes in the water, so to speak. That said, as I began to try to use the system day to day, I ran into some rough edges that made it impractical for real daily work. The biggest issue was that it took nearly 2 minutes to load the model initially. If that only happened once per day it would have been non-ideal, but bearable. However, what I found in practice was that if I didn’t use the LLM for more than about 5 minutes, it would unload, and I would again have to wait 2 minutes the next time I went to use it. Since I often use it once or twice and then not again for several minutes or more, in real terms I ended up waiting nearly 2 minutes almost every time I tried to use the system.
To solve this issue I have adapted things a bit to run Ollama “on-the-metal”, installing it directly on Windows and running Open Web UI as a Docker container. I’m not sure exactly where the bottleneck is when running Ollama in Docker, but this change appears to completely resolve the issue, and now that Ollama for Windows is available and working properly, it is not substantially harder to get everything running.
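As an aside, that roughly five-minute idle unload lines up with Ollama’s default keep-alive, which controls how long a model stays loaded after its last request, and it applies whether Ollama runs in Docker or on the metal. If you find models being evicted too aggressively, you can raise the default with the OLLAMA_KEEP_ALIVE environment variable, or per request with the API’s keep_alive field. A minimal sketch (quoting shown for a Unix-style shell such as Git Bash; adjust for PowerShell or cmd):

```
# Preload llama3.1 and pin it in memory; keep_alive accepts a duration ("24h")
# or a negative value to keep the model loaded until Ollama restarts.
curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "keep_alive": -1}'
```

This doesn’t fix the slow initial load I saw in Docker, but it does stop the repeated unload/reload cycle described above.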
The New Setup
I am running this on Windows, but Ollama provides installers for Windows, macOS, and Linux, so it should in theory work on any desktop platform you are using. You “should” simply need to follow the appropriate installation steps for your platform.
Ollama Setup
To install Ollama on Windows, simply download the installer from ollama.com/download and run it. Once installed, you should have a llama icon in your system tray, and if you navigate to http://localhost:11434/ you should be greeted with the message “Ollama is running”.
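If you would rather check from the terminal, you can hit the same endpoint with curl (on Windows, use curl.exe so PowerShell does not substitute its Invoke-WebRequest alias):

```
# should print "Ollama is running"
curl.exe http://localhost:11434/
```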
At this point you can use Ollama from the command line. Try the following commands to pull a model and run an interactive session in your terminal.
ollama pull llama3.1
ollama run llama3.1
Note: If your terminal was open before installing Ollama, you may need to open a new session so that Ollama is on your PATH before executing these commands.
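Once the pull completes, you can confirm the model is available, either with the CLI or through Ollama’s HTTP API (the /api/tags endpoint lists the models on disk):

```
# list locally installed models
ollama list

# the same information via the API
curl.exe http://localhost:11434/api/tags
```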
Open Web UI Setup
I am setting up and starting Open Web UI using a Docker Compose file that is really just a stripped-down version of the one from my previous posts.
services:
  openWebUI:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      OLLAMA_BASE_URL: http://host.docker.internal:11434
    volumes:
      - c:\docker\data\owu:/app/backend/data
Note: The last line of the compose file contains a volume mapping. This maps the local path C:\docker\data\owu into the container at /app/backend/data, which allows your configuration data to persist between runs of the Open Web UI container (for example, after a system reboot). You can point this at any directory you like on your system.
Save this file as docker-compose.yml in a directory on your system, then from a terminal in that directory run the following command to start the container.
docker compose up -d
It will take a minute or so for the container to come up and initialize, but once it does the web UI should be available at http://localhost:3000.
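If you want to watch it come up (or troubleshoot a failed start), the standard Compose commands work here; for example:

```
# follow the Open Web UI logs while the container initializes
docker compose logs -f openWebUI

# later, stop and remove the container (the mapped data directory is left in place)
docker compose down
```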
The first time you visit, you will need to create an account (this is a local account; nothing is shared externally), which becomes an administrator of your instance. Once logged in, the UI should already be pointed at the correct Ollama URL, and assuming you downloaded a model earlier when testing Ollama in the terminal, that model should already be selected and ready to go.
If you did not download a model earlier, or you want to add additional models, you can do that in the admin settings by clicking your username in the sidebar (bottom left) and selecting Admin Panel.
Then select “Settings” and then “Connections” in the main UI.
Then click the wrench icon next to your Ollama API connection.
That should bring up the “Manage Ollama” dialog.
From here, simply type the name of a new model into the “Pull a model from Ollama.com” field and click the download button. The model will begin downloading, and once it has completed and been validated (you should see a green alert message in the UI), you can go back to your chat and select the new model from the drop-down near the top left corner of the UI. You can find a list of models available for download at https://ollama.com/search.
At this point you should have a functioning LLM running locally with access via the web UI. Next we’ll bring AI support directly into Visual Studio Code.
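You can also skip the admin UI and pull models with the Ollama CLI from a terminal; since Open Web UI just talks to your local Ollama instance, anything pulled this way shows up in the same model drop-down. The model names below are only examples taken from ollama.com/search:

```
# pull a couple of additional models, then confirm what is installed
ollama pull mistral
ollama pull codellama
ollama list
```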
Cody Visual Studio Code Plugin Setup
To install and configure Cody in Visual Studio Code, open the Extensions Marketplace, search for “Cody”, and find the extension “Cody: AI Coding Assistant with Autocomplete & Chat”.
Click “Install” and wait for the installation to complete and the Getting Started With Cody screen to appear. Once you are done reviewing this screen you can close it and go back to your code (or open some code to test things out).
You can use Cody with any supported type of code; in my case I’ll take a look at some old C# code. To get started you will need to log in to Cody/Sourcegraph by clicking the Cody icon on the left sidebar and choosing a method to authenticate. I’ll be using the free version of Cody and authenticating with GitHub.
At this point you should see a new side panel that lets you interact with Cody to perform a number of AI-assisted actions, including documenting existing code, explaining code (great for new code bases), finding code smells, generating unit tests, and even generating new code. This is, however, currently using external services to provide the AI assistance. To get it pointed at your local LLM, follow these steps:
- Click on the gear icon in the bottom left corner of VS Code.
- Select “Settings”.
- In the search box near the top of the page, type “Cody autocomplete”.
- Under “Cody > Autocomplete > Advanced: Provider”, select the option “experimental-ollama”.
That should set up auto-completion in the editor to use your local LLM. Additionally, and more importantly for how I tend to use Cody so far, the chat side of Cody can also run against local models. Under the prompt input field in the Cody chat dialog you will see a drop-down with all the available LLMs. That list should include a section labelled Ollama (Local models). Select one of these models to run your prompts against your local LLM.
My experience with Cody so far has been mostly positive, but somewhat limited (I haven’t used it much yet). I’ve had really good luck using it to explain and document existing code. It generated some useful unit tests, and it did a good job writing a few entirely new classes that implement existing interfaces in my code.