How to Install Ollama on Ubuntu and Run Your First Local LLM

Ollama is one of the simplest ways to turn a Linux machine into a local LLM server without building a full AI stack from scratch. If you want to run open models on your own hardware, test them from the terminal, and later connect them to other self-hosted tools, Ollama gives you a clean starting point.

In this guide, you will install Ollama on Ubuntu, confirm the background service is healthy, download a first model, run a quick prompt, and verify the local API before you branch out into browser interfaces or automations.

Why self-host Ollama?

Most people do not self-host Ollama because they desperately need another service running on a VPS at 2 a.m. They do it because it gives them a practical way to run AI models under their own control.

That usually means:

  • keeping prompts and responses on infrastructure you manage

  • experimenting without handing every request to a third-party SaaS tool

  • building a reusable local model endpoint for other apps

  • pairing a model runtime with tools such as Open WebUI or Hermes Agent later

  • learning what your hardware can actually handle before you overcomplicate things

If you want a browser chat interface after the base install, this guide pairs nicely with How to Install Open WebUI with Docker Compose.

If you want to use local models from an agent workflow later, keep Install Hermes Agent on Linux and Complete Your First Setup in your back pocket too.

Is Ubuntu a good fit for Ollama?

Yes, as long as your expectations match your hardware.

Ubuntu is a good fit because:

  • the official Ollama installer supports Linux cleanly

  • systemd makes it easy to run Ollama as a background service

  • most self-hosting readers already use Ubuntu on a VPS, mini PC, or homelab box

  • later integrations usually assume a Linux-friendly environment anyway

The real limit is not Ubuntu. It is RAM, CPU, disk space, and whether you have a useful GPU.

A small server can install Ollama just fine, but that does not mean it will run every model comfortably. Smaller models can work on modest hardware, while larger models quickly turn a cheerful experiment into a waiting simulator.

What you need before you start

Have these ready before you begin:

  • an Ubuntu server or desktop

  • a user account with sudo access

  • internet access

  • enough free disk space for at least one model download

  • a little patience while the first model downloads

You do not need Docker for this guide.

Ollama itself is installed directly on Ubuntu here. If you later add container-based apps such as Open WebUI, How to Install Docker on Ubuntu and Run Your First Container covers the Docker side.

What this tutorial will do

By the end of this guide, you will have:

  • installed the official Ollama Linux package

  • confirmed the ollama command works

  • checked the systemd service status

  • pulled a starter model

  • sent a test prompt from the terminal

  • verified the local HTTP API is responding

Step 1: Connect to your Ubuntu server

If you are working on a remote VPS, connect over SSH from your local machine.

ssh your-user@your-server-ip

Replace your-user with your Ubuntu username and your-server-ip with the server's IP address.

If you are installing Ollama on a local Ubuntu desktop or mini PC, you can skip the SSH step and open a terminal directly on that machine.

Step 2: Check your CPU architecture

Ollama's Linux installer supports amd64 and arm64, so it helps to confirm what kind of machine you are on before you install anything.

uname -m

Common results include:

  • x86_64 for standard 64-bit Intel or AMD systems

  • aarch64 or arm64 for ARM-based systems

If you are on a normal cloud VPS or home server, x86_64 is the usual answer.
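If you plan to script this across several machines, a small helper can translate the uname -m result into the amd64/arm64 names the installer supports. This is a sketch, not part of Ollama; the function name ollama_arch is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -m` output to the architecture names
# Ollama's Linux installer supports (amd64 and arm64).
ollama_arch() {
  case "$1" in
    x86_64) echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported"; return 1 ;;
  esac
}

# Example: check the current machine without aborting on unknown hardware.
if arch="$(ollama_arch "$(uname -m)")"; then
  echo "Supported architecture: $arch"
else
  echo "Unsupported architecture: $(uname -m)" >&2
fi
```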

Step 3: Run the official Ollama installer

The official install script is the simplest supported path on Ubuntu.

curl -fsSL https://ollama.com/install.sh | sh

On current Linux installs, the script downloads the correct package for your architecture, installs the ollama binary, creates the ollama service user if needed, and configures a systemd service when systemd is available.

At the end of a healthy install, the script reports that the Ollama API is available at 127.0.0.1:11434.

Note: If the installer complains about missing zstd, install it and rerun the command.

sudo apt update
sudo apt install zstd
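To catch the zstd problem before it happens, you can check for the tools up front. A minimal sketch; missing_commands is a hypothetical helper, and it only reports what is missing rather than installing anything.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper: print each named command that is not on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# curl runs the install command; zstd is the tool the installer may ask for.
mapfile -t missing < <(missing_commands curl zstd)
if ((${#missing[@]} > 0)); then
  echo "Install first: sudo apt update && sudo apt install ${missing[*]}" >&2
else
  echo "curl and zstd are present; ready to run the installer."
fi
```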

Step 4: Confirm the ollama command is available

Now verify that the CLI is installed.

ollama --version

You should get a version string back instead of command not found.

If the command is missing right after installation, open a fresh shell session and run it again.

Step 5: Check the Ollama service status

If your Ubuntu machine uses systemd, the installer should have created and enabled the service for you.

sudo systemctl status ollama

A healthy result shows active (running) on the Active: line near the top of the output.

If it is not running yet, start it manually.

sudo systemctl start ollama

You can also confirm that it is enabled to start at boot.

sudo systemctl is-enabled ollama

This is a good checkpoint because it confirms the background runtime is alive before you download a model.
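If you repeat these checks often, they fit in one function. This is a minimal sketch assuming the unit is named ollama, as the installer creates it; ensure_ollama_service is a hypothetical name, and it uses only the standard systemctl subcommands shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start the unit if it is not active, then report
# whether it is enabled at boot.
ensure_ollama_service() {
  local unit="${1:-ollama}"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is active"
  else
    echo "$unit is not active; starting it" >&2
    sudo systemctl start "$unit" || return 1
  fi
  # is-enabled prints a state such as "enabled" or "disabled".
  if [ "$(systemctl is-enabled "$unit" 2>/dev/null)" = "enabled" ]; then
    echo "$unit is enabled at boot"
  else
    echo "$unit is not enabled at boot; run: sudo systemctl enable $unit" >&2
  fi
}
```

After pasting it into your shell, run ensure_ollama_service.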

Step 6: Look at the local API response

Ollama exposes a local API on port 11434. Before you pull a model, make sure the service is actually responding.

curl http://127.0.0.1:11434/api/tags

On a fresh install, the response should usually be an empty model list rather than an error.

That is good news. An empty list means the service is reachable and simply has no downloaded models yet.
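Right after an install or a reboot, the service can take a moment before it answers. A short polling loop makes this check script-friendly. This is a sketch; OLLAMA_URL and wait_for_ollama are hypothetical names chosen for this example, and the default URL is the 127.0.0.1:11434 address the installer reports.

```shell
#!/usr/bin/env bash
# Hypothetical variable and helper: poll the local API until it answers.
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

wait_for_ollama() {
  local attempts="${1:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f fails on HTTP errors, -sS stays quiet except for real errors.
    if curl -fsS "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
      echo "Ollama API is responding at $OLLAMA_URL"
      return 0
    fi
    sleep 1
  done
  echo "No response from $OLLAMA_URL after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_ollama 30 waits up to roughly 30 seconds.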

Step 7: Pull and run your first model

The official Ollama examples use gemma4, so we will use that as the first test here.

ollama run gemma4

The first run does two jobs:

  • it downloads the model if you do not already have it

  • it opens an interactive chat session in your terminal

Depending on your connection speed and hardware, this may take a while on the first pass.

Once the model is ready, type a simple prompt such as:

Give me three one-sentence ideas for a homelab dashboard.

If you get a sensible response back, Ollama is working.

Tip: If gemma4 feels too large or too slow for your machine, choose a smaller model from the Ollama library later. The install is still fine even if your first model choice turns out to be too ambitious.
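For scripts, you can also pull the model as a separate step and pass the prompt on the command line, which prints one reply and exits instead of opening the chat. A minimal sketch; ask_once is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure a model is downloaded, then ask it one question.
ask_once() {
  local model="$1" prompt="$2"
  if [ -z "$model" ] || [ -z "$prompt" ]; then
    echo "usage: ask_once MODEL PROMPT" >&2
    return 2
  fi
  # Fetch the model; this is quick when the local copy is already current.
  ollama pull "$model" || return 1
  # Passing the prompt as an argument prints one reply and exits.
  ollama run "$model" "$prompt"
}
```

Usage: ask_once gemma4 "Give me three one-sentence ideas for a homelab dashboard."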

Step 8: List your downloaded models

After the first successful run, check which models are available locally.

ollama ls

This should show the model you just pulled.

It is an easy way to confirm that the download completed and that Ollama knows about it.
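In a script, you can turn that listing into a yes/no check. This sketch assumes the first column of ollama ls is the model name and that bare names such as gemma4 are listed with an implicit :latest tag; has_model is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named model appears in `ollama ls`.
has_model() {
  local name="$1"
  # A bare name refers to the ":latest" tag.
  [[ "$name" == *:* ]] || name="$name:latest"
  # Skip the header row, compare the NAME column, exit 0 only on a match.
  ollama ls | awk -v want="$name" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

Usage: has_model gemma4 && echo "gemma4 is ready"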

Step 9: Exit the interactive chat cleanly

When you are done testing the terminal chat session, exit it by typing the following at the prompt, or by pressing Ctrl+D.

/bye

If you only wanted to test whether the model would answer at all, that is enough for now.

Step 10: Test the API again now that a model exists

Run the tags endpoint one more time.

curl http://127.0.0.1:11434/api/tags

This time, you should see JSON that includes the model you downloaded.

That confirms the service and the local model inventory are both working.
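If jq is not installed, a rough text filter can pull the model names out of that JSON. Treat this as a quick sketch rather than a real JSON parser: it only looks for "name" fields, which is enough for the /api/tags response. list_model_names is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read /api/tags JSON on stdin, print one model name per line.
# A text match on "name": "..." pairs, not a full JSON parser.
list_model_names() {
  grep -oE '"name"[[:space:]]*:[[:space:]]*"[^"]*"' | cut -d'"' -f4
}
```

Usage: curl -fsS http://127.0.0.1:11434/api/tags | list_model_names should print one line per downloaded model.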

Optional: View the service logs

If something looked weird during install or model startup, check the service logs.

journalctl -e -u ollama

This is usually the fastest place to look for:

  • failed service starts

  • permission issues

  • repeated crashes

  • hardware-related startup problems
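When the full journal is noisy, narrowing it to a recent time window helps. A minimal sketch; ollama_recent_logs is a hypothetical helper, and the 15-minute default is just an example value for journalctl's --since option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show recent Ollama service logs without the pager.
# Pass any journalctl --since value to widen or narrow the window.
ollama_recent_logs() {
  journalctl -u ollama --since "${1:-15min ago}" --no-pager
}
```

ollama_recent_logs "1h ago" widens the window, and journalctl -u ollama -f follows new lines live while you reproduce a problem.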

Optional: Make Ollama useful with a web interface

Ollama works fine on its own, but many people quickly decide they want a browser UI instead of living in the terminal forever.

A common next step is pairing it with Open WebUI:

  • Ollama handles the model runtime

  • Open WebUI provides the browser-based chat interface

If that is your plan, continue with How to Install Open WebUI with Docker Compose.

Common problems and quick fixes

ollama: command not found

The installer may have finished before your current shell picked up the new binary path.

Open a new terminal session and try again:

ollama --version

The service is not running

Try starting it manually first.

sudo systemctl start ollama

Then check status again.

sudo systemctl status ollama

If it still fails, inspect the logs.

journalctl -e -u ollama

The model download takes forever

That is often just a bandwidth or model-size problem, not a broken install.

Try again later, or switch to a smaller model once you confirm the service itself is healthy.

Responses are extremely slow

Ollama can run on CPU-only systems, but speed depends heavily on your hardware and the model you picked.

If the install succeeded but inference is painfully slow, the most common fixes are:

  • use a smaller model

  • use a machine with more RAM

  • use a GPU-friendly setup if that fits your environment

You want to reach Ollama from another app

For apps on the same server, the local API at 127.0.0.1:11434 is often enough.
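To confirm that an app on the same server could use it, send a single request to the /api/generate endpoint with streaming turned off. This is a minimal sketch: generate_once and OLLAMA_URL are hypothetical names, and the prompt is pasted into the JSON without escaping, so keep it free of double quotes and backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical variable and helper: one non-streaming request to /api/generate.
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

generate_once() {
  local model="$1" prompt="$2"
  # "stream": false asks for a single JSON object instead of a stream of chunks.
  # The prompt is inserted verbatim: no double quotes or backslashes allowed.
  curl -fsS "$OLLAMA_URL/api/generate" \
    -d "{\"model\": \"$model\", \"prompt\": \"$prompt\", \"stream\": false}"
}
```

Usage: generate_once gemma4 "Say hello in five words." The reply text is in the response field of the returned JSON.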

For remote access, do not casually expose the raw port to the internet and call it a day. Put it behind a properly planned access method, reverse proxy, or application-specific integration instead.
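One low-effort option when only your own computer needs access is an SSH tunnel, which keeps the API bound to 127.0.0.1 on the server. A sketch to run on your local machine; ollama_tunnel is a hypothetical name, and your-user@your-server-ip is the same placeholder as in Step 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: forward a local port to the server's loopback-only API.
ollama_tunnel() {
  local target="$1" local_port="${2:-11434}"
  if [ -z "$target" ]; then
    echo "usage: ollama_tunnel USER@HOST [LOCAL_PORT]" >&2
    return 2
  fi
  # -N: run no remote command; -L: local_port -> 127.0.0.1:11434 on the server.
  ssh -N -L "${local_port}:127.0.0.1:11434" "$target"
}
```

While ollama_tunnel your-user@your-server-ip is running, curl http://127.0.0.1:11434/api/tags on your local machine reaches the server's Ollama. If you already run Ollama locally, pass a different local port, for example ollama_tunnel your-user@your-server-ip 11435.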

How to update Ollama

The official Linux docs say you can update Ollama by running the installer again.

curl -fsSL https://ollama.com/install.sh | sh

That is the simplest routine update path for this setup.
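To confirm an update actually changed something, compare the version the running server reports before and after. This sketch queries the /api/version endpoint with a rough text match rather than a JSON parser; both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the "version" value from JSON on stdin.
version_from_json() {
  grep -oE '"version"[[:space:]]*:[[:space:]]*"[^"]*"' | cut -d'"' -f4
}

# Hypothetical helper: ask the running server which version it is.
ollama_server_version() {
  curl -fsS "${OLLAMA_URL:-http://127.0.0.1:11434}/api/version" | version_from_json
}
```

Run ollama_server_version before and after the installer; if the service is still restarting, give it a moment and try again.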

How to remove Ollama

If you decide you do not want Ollama on this machine anymore, stop and disable the service first.

sudo systemctl stop ollama
sudo systemctl disable ollama

Remove the service file, then tell systemd to reload its unit files.

sudo rm /etc/systemd/system/ollama.service
sudo systemctl daemon-reload

Remove the binary.

sudo rm $(which ollama)

Remove the service user and its data directory.

sudo userdel ollama
sudo groupdel ollama
sudo rm -r /usr/share/ollama

If you are removing Ollama from a production machine, double-check what model data or integrations depend on it before you start deleting files like a victorious goblin.
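Before deleting /usr/share/ollama, it can help to see how much data is about to go. A minimal sketch; ollama_data_size is a hypothetical helper, and if you moved models elsewhere with the OLLAMA_MODELS environment variable, pass that path instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the disk space used by Ollama's data directory.
# Defaults to /usr/share/ollama, the directory the removal steps delete.
ollama_data_size() {
  local dir="${1:-/usr/share/ollama}"
  sudo du -sh "$dir"
}
```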

You are done

You now have Ollama installed on Ubuntu, running as a background service, answering terminal prompts, and exposing a local API you can build on.

From here, the most useful next steps are usually:

  • connect Ollama to Open WebUI for a browser chat interface

  • test a different model that better matches your hardware

  • plug it into a local workflow or agent tool

  • keep the API local and use it as a building block for other self-hosted apps

Frequently Asked Questions

Does Ollama need a GPU?
No. Ollama can run on CPU-only systems, but model size and response speed depend heavily on your hardware. A GPU helps a lot, especially once you move beyond small starter models.

Can I use Ollama without a web interface?
Yes. Ollama works directly from the terminal and exposes a local HTTP API, so you can use it on its own or connect it later to tools like Open WebUI, Hermes Agent, or other apps.

Where does Ollama store downloaded models?
With the standard Linux install, models live under the ollama service user's home, in /usr/share/ollama/.ollama/models, rather than in your own home directory. You can point Ollama at a different location with the OLLAMA_MODELS environment variable. Either way, backups and disk-space checks matter once you start downloading larger models.
