Ollama is one of the simplest ways to turn a Linux machine into a local LLM server without building a full AI stack from scratch. If you want to run open models on your own hardware, test them from the terminal, and later connect them to other self-hosted tools, Ollama gives you a clean starting point.
In this guide, you will install Ollama on Ubuntu, confirm the background service is healthy, download a first model, run a quick prompt, and verify the local API before you branch out into browser interfaces or automations.
Why self-host Ollama?
Most people do not self-host Ollama because they desperately need another service running on a VPS at 2 a.m. They do it because it gives them a practical way to run AI models under their own control.
That usually means:
keeping prompts and responses on infrastructure you manage
experimenting without handing every request to a third-party SaaS tool
building a reusable local model endpoint for other apps
pairing a model runtime with tools such as Open WebUI or Hermes Agent later
learning what your hardware can actually handle before you overcomplicate things
If you want a browser chat interface after the base install, this guide pairs nicely with How to Install Open WebUI with Docker Compose.
If you want to use local models from an agent workflow later, keep Install Hermes Agent on Linux and Complete Your First Setup in your back pocket too.
Is Ubuntu a good fit for Ollama?
Yes, as long as your expectations match your hardware.
Ubuntu is a good fit because:
the official Ollama installer supports Linux cleanly
systemd makes it easy to run Ollama as a background service
most self-hosting readers already use Ubuntu on a VPS, mini PC, or homelab box
later integrations usually assume a Linux-friendly environment anyway
The real limit is not Ubuntu. It is RAM, CPU, disk space, and whether you have a useful GPU.
A small server can install Ollama just fine, but that does not mean it will run every model comfortably. Smaller models can work on modest hardware, while larger models quickly turn a cheerful experiment into a waiting simulator.
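To put a rough number on the RAM side of that question, here is a minimal sketch. The function name mem_total_gib is a hypothetical helper, and it assumes the usual Linux /proc/meminfo layout with a MemTotal line reported in kB.

```shell
# Hypothetical helper: print total RAM in whole GiB from a meminfo-style file.
# Assumes a "MemTotal: <number> kB" line, as found in /proc/meminfo on Linux.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Example usage on a Linux host:
#   echo "RAM: $(mem_total_gib) GiB, CPU cores: $(nproc)"
```

The result is only a starting point for picking a model size, not a guarantee that a given model will run comfortably.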
What you need before you start
Have these ready before you begin:
an Ubuntu server or desktop
a user account with sudo access
internet access
enough free disk space for at least one model download
a little patience while the first model downloads
You do not need Docker for this guide.
If you plan to run container-based apps later, How to Install Docker on Ubuntu and Run Your First Container covers that setup, but Ollama itself will be installed directly on Ubuntu here.
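If you want to check the disk space item from the list above before downloading anything, a sketch like this can help. The helper name has_free_gb and the 10 GB threshold in the usage line are illustrative assumptions, not official requirements.

```shell
# Hypothetical helper: succeed if the filesystem holding PATH has at least
# MIN_GB gigabytes free. Relies on POSIX "df -Pk", where the 4th column of
# the second line is the available space in 1 KiB blocks.
has_free_gb() {
    path=$1
    min_gb=$2
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example usage (the 10 GB figure is an arbitrary placeholder):
#   has_free_gb / 10 && echo "enough space" || echo "free up some disk first"
```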
What this tutorial will do
By the end of this guide, you will have:
installed the official Ollama Linux package
confirmed the ollama command works
checked the systemd service status
pulled a starter model
sent a test prompt from the terminal
verified the local HTTP API is responding
Step 1: Connect to your Ubuntu server
If you are working on a remote VPS, connect over SSH from your local machine.
ssh your-user@your-server-ip
Replace your-user with your Ubuntu username and your-server-ip with the server's IP address.
If you are installing Ollama on a local Ubuntu desktop or mini PC, you can skip the SSH step and open a terminal directly on that machine.
Step 2: Check your CPU architecture
Ollama's Linux installer supports amd64 and arm64, so it helps to confirm what kind of machine you are on before you install anything.
uname -m
Common results include:
x86_64 for standard 64-bit Intel or AMD systems
aarch64 or arm64 for ARM-based systems
If you are on a normal cloud VPS or home server, x86_64 is the usual answer.
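If you are scripting the check, the mapping above can be sketched as a small function. The name ollama_arch is a hypothetical helper, and it only covers the values listed in this step.

```shell
# Hypothetical helper: map "uname -m" output to the installer's
# architecture names (amd64 or arm64) mentioned above.
ollama_arch() {
    case "$1" in
        x86_64)        echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)             echo unsupported; return 1 ;;
    esac
}

# Example usage:
#   ollama_arch "$(uname -m)"
```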
Step 3: Run the official Ollama installer
The official install script is the simplest supported path on Ubuntu.
curl -fsSL https://ollama.com/install.sh | sh
On current Linux installs, the script downloads the correct package for your architecture, installs the ollama binary, creates the ollama service user if needed, and configures a systemd service when systemd is available.
At the end of a healthy install, the script reports that the Ollama API is available at 127.0.0.1:11434.
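If you would rather read the script before running it, one cautious variation is to download it to a file first. This is a sketch, not the official procedure: the function name review_then_install and the default file name are assumptions made for illustration.

```shell
# Sketch: download the installer to a file, page through it, then run it
# only after an explicit confirmation. Names here are illustrative.
review_then_install() {
    script=${1:-ollama-install.sh}
    curl -fsSL https://ollama.com/install.sh -o "$script" || return 1
    ${PAGER:-less} "$script"            # read what the script will do
    printf 'Run %s now? [y/N] ' "$script"
    read -r answer
    [ "$answer" = "y" ] && sh "$script"
}

# Example usage:
#   review_then_install
```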
Note: If the installer complains about missing zstd, install it and rerun the command.
sudo apt update
sudo apt install zstd
Step 4: Confirm the ollama command is available
Now verify that the CLI is installed.
ollama --version
You should get a version string back instead of command not found.
If the command is missing right after installation, open a fresh shell session and run it again.
Step 5: Check the Ollama service status
If your Ubuntu machine uses systemd, the installer should have created and enabled the service for you.
sudo systemctl status ollama
A healthy result should show the service as active (running).
If it is not running yet, start it manually.
sudo systemctl start ollama
You can also confirm that it is enabled to start at boot.
sudo systemctl is-enabled ollama
This is a good checkpoint because it confirms the background runtime is alive before you download a model.
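For a scripted version of this checkpoint, systemctl is-active prints a single state word that is easy to branch on. The helper below is a hypothetical sketch that turns that word into a short verdict using the commands from this guide.

```shell
# Hypothetical helper: turn "systemctl is-active" output into a short verdict.
service_verdict() {
    case "$1" in
        active)     echo "ok: service is running" ;;
        activating) echo "wait: service is still starting" ;;
        inactive)   echo "stopped: try 'sudo systemctl start ollama'" ;;
        failed)     echo "failed: check 'journalctl -e -u ollama'" ;;
        *)          echo "unknown state: $1" ;;
    esac
}

# Example usage on a systemd host:
#   service_verdict "$(systemctl is-active ollama)"
```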
Step 6: Look at the local API response
Ollama exposes a local API on port 11434. Before you pull a model, make sure the service is actually responding.
curl http://127.0.0.1:11434/api/tags
On a fresh install, the response should usually be an empty model list, such as {"models":[]}, rather than an error.
That is good news. An empty list means the service is reachable and simply has no downloaded models yet.
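If you run this check right after starting the service, the API may need a moment to come up. A small retry loop handles that. The function name wait_for_api and its default values are assumptions for illustration.

```shell
# Hypothetical helper: poll URL until it answers, up to TRIES attempts,
# sleeping DELAY seconds between attempts. Returns 0 on the first success.
wait_for_api() {
    url=${1:-http://127.0.0.1:11434/api/tags}
    tries=${2:-10}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
        if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example usage:
#   wait_for_api && echo "Ollama API is up" || echo "API did not respond"
```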
Step 7: Pull and run your first model
The official Ollama examples use gemma4, so we will use that as the first test here.
ollama run gemma4
The first run does two jobs:
it downloads the model if you do not already have it
it opens an interactive chat session in your terminal
Depending on your connection speed and hardware, this may take a while on the first pass.
Once the model is ready, type a simple prompt such as:
Give me three one-sentence ideas for a homelab dashboard.
If you get a sensible response back, Ollama is working.
Tip: If gemma4 feels too large or too slow for your machine, choose a smaller model from the Ollama library later. The install is still fine even if your first model choice turns out to be too ambitious.
Step 8: List your downloaded models
After the first successful run, check which models are available locally.
ollama ls
This should show the model you just pulled.
It is an easy way to confirm that the download completed and that Ollama knows about it.
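If you want to use that list in a script, you can pull out just the model names. This sketch assumes the output starts with one header line and that the model name is the first column. The helper name model_names is hypothetical.

```shell
# Hypothetical helper: read "ollama ls" output on stdin, print model names.
# Assumes the first line is a header and the first column is the model name.
model_names() {
    awk 'NR > 1 && NF > 0 { print $1 }'
}

# Example usage:
#   ollama ls | model_names | grep -q '^gemma4' && echo "gemma4 is present"
```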
Step 9: Exit the interactive chat cleanly
When you are done testing the terminal chat session, exit it.
Type /bye and press Enter, or press Ctrl+D at an empty prompt, to leave the interactive session.
If you only wanted to test whether the model would answer at all, that is enough for now.
Step 10: Test the API again now that a model exists
Run the tags endpoint one more time.
curl http://127.0.0.1:11434/api/tags
This time, you should see JSON that includes the model you downloaded.
That confirms the service and the local model inventory are both working.
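To turn that check into a quick count, you can tally the models in the JSON. This is a rough sketch that assumes each model entry carries exactly one "name" key. The helper name count_models is hypothetical, and a real JSON parser such as jq is the safer choice for anything beyond a sanity check.

```shell
# Hypothetical helper: count models in an /api/tags response read from stdin.
# Assumes each model object has exactly one "name" key (a text-matching
# shortcut, not real JSON parsing).
count_models() {
    grep -o '"name"[[:space:]]*:' | wc -l | tr -d ' '
}

# Example usage:
#   curl -s http://127.0.0.1:11434/api/tags | count_models
```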
Optional: View the service logs
If something looked weird during install or model startup, check the service logs.
journalctl -e -u ollama
This is usually the fastest place to look for:
failed service starts
permission issues
repeated crashes
hardware-related startup problems
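When the log is long, filtering for likely trouble can save some scrolling. The keyword list below is a guess at useful patterns, not an official set of Ollama log messages, and log_problems is a hypothetical helper name.

```shell
# Hypothetical helper: keep only log lines that look like problems.
# The keywords are illustrative guesses; adjust them to what you see.
log_problems() {
    grep -iE 'error|fail|denied|panic|out of memory'
}

# Example usage:
#   journalctl -u ollama --since "15 minutes ago" --no-pager | log_problems
```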
Optional: Make Ollama useful with a web interface
Ollama works fine on its own, but many people quickly decide they want a browser UI instead of living in the terminal forever.
A common next step is pairing it with Open WebUI:
Ollama handles the model runtime
Open WebUI provides the browser-based chat interface
If that is your plan, continue with How to Install Open WebUI with Docker Compose.
Common problems and quick fixes
ollama: command not found
The installer may have finished before your current shell picked up the new binary path.
Open a new terminal session and try again:
ollama --version
The service is not running
Try starting it manually first.
sudo systemctl start ollama
Then check status again.
sudo systemctl status ollama
If it still fails, inspect the logs.
journalctl -e -u ollama
The model download takes forever
That is often just a bandwidth or model-size problem, not a broken install.
Try again later, or switch to a smaller model once you confirm the service itself is healthy.
Responses are extremely slow
Ollama can run on CPU-only systems, but speed depends heavily on your hardware and the model you picked.
If the install succeeded but inference is painfully slow, the most common fixes are:
use a smaller model
use a machine with more RAM
use a GPU-friendly setup if that fits your environment
You want to reach Ollama from another app
For apps on the same server, the local API at 127.0.0.1:11434 is often enough.
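As a sketch of what a same-server app would send, the helpers below build a request body for the /api/generate endpoint. The function names are hypothetical, and the escaping only handles backslashes and double quotes, so it assumes single-line prompts.

```shell
# Hypothetical helper: escape backslashes and double quotes for JSON.
# Does not handle newlines or control characters (single-line input only).
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a non-streaming /api/generate request body.
generate_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# Example usage (reads the JSON body from stdin with -d @-):
#   generate_payload gemma4 "Why is the sky blue?" |
#       curl -s http://127.0.0.1:11434/api/generate -d @-
```

Setting stream to false asks for one complete JSON response instead of a stream of partial ones, which is easier to handle in simple scripts.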
For remote access, do not casually expose the raw port to the internet and call it a day. Put it behind a properly planned access method, reverse proxy, or application-specific integration instead.
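One low-effort option for personal use is an SSH tunnel, which forwards a port on your own machine to the API on the server while Ollama stays bound to 127.0.0.1. The helper below only prints the command so you can review it first. The name tunnel_command is hypothetical, and this is one possible approach rather than a recommendation for every setup.

```shell
# Hypothetical helper: print an ssh command that forwards a local port to
# the Ollama API on the server. -N means "no remote command", -L sets up
# the local port forward.
tunnel_command() {
    user=$1
    host=$2
    port=${3:-11434}
    printf 'ssh -N -L %s:127.0.0.1:11434 %s@%s\n' "$port" "$user" "$host"
}

# Example usage:
#   tunnel_command your-user your-server-ip
```

While the tunnel is open, requests to http://127.0.0.1:11434 on your local machine reach the server's API over SSH.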
How to update Ollama
The official Linux docs say you can update Ollama by running the installer again.
curl -fsSL https://ollama.com/install.sh | sh
That is the simplest routine update path for this setup.
How to remove Ollama
If you decide you do not want Ollama on this machine anymore, stop and disable the service first.
sudo systemctl stop ollama
sudo systemctl disable ollama
Remove the service file.
sudo rm /etc/systemd/system/ollama.service
Remove the binary.
sudo rm $(which ollama)
Remove the service user and its data directory.
sudo userdel ollama
sudo groupdel ollama
sudo rm -r /usr/share/ollama
If you are removing Ollama from a production machine, double-check what model data or integrations depend on it before you start deleting files like a victorious goblin.
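A quick dry run can show what is actually present before you delete anything. The helper below is a hypothetical sketch that prints whichever of the given paths exist and removes nothing.

```shell
# Hypothetical helper: print which of the given paths currently exist.
# Read-only: it never deletes or modifies anything.
existing_paths() {
    for p in "$@"; do
        [ -e "$p" ] && printf '%s\n' "$p"
    done
    return 0
}

# Example usage, with the paths named in the removal steps above:
#   existing_paths /etc/systemd/system/ollama.service \
#       "$(command -v ollama)" /usr/share/ollama
```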
You are done
You now have Ollama installed on Ubuntu, running as a background service, answering terminal prompts, and exposing a local API you can build on.
From here, the most useful next steps are usually:
connect Ollama to Open WebUI for a browser chat interface
test a different model that better matches your hardware
plug it into a local workflow or agent tool
keep the API local and use it as a building block for other self-hosted apps
