This guide explains how to set up Ollama with GPU acceleration on a Framework Desktop with the AMD Ryzen AI Max+ 395 APU (RDNA 3.5 / gfx1151) running Ubuntu Server 24.04.
- Framework Desktop with AMD Ryzen AI Max+ 395
- Integrated GPU: AMD Radeon 8060S (gfx1151, RDNA 3.5)
- 128GB unified memory (shared between CPU and GPU)
Install Ubuntu Server 24.04.x LTS.
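Once the base system is up, a quick sanity check that the OS sees the hardware can save debugging later (device strings vary with firmware and kernel version, so treat the grep patterns as a starting point):

```bash
# The iGPU should enumerate on the PCI bus
lspci | grep -iE "vga|display"

# Total memory; the GPU carves its allocation out of this shared pool
free -h
```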
The Hardware Enablement (HWE) kernel provides better support for newer AMD hardware:

```bash
sudo apt update
sudo apt install linux-generic-hwe-24.04
sudo reboot
```

After reboot, verify the kernel version:

```bash
uname -r
# Should show 6.17.x or newer
```

The amdgpu kernel driver is already included in the HWE kernel.
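You can confirm the in-kernel driver actually bound to the GPU (exact log lines vary by kernel version):

```bash
# The amdgpu module should be loaded
lsmod | grep amdgpu

# Initialization messages should be free of errors
sudo dmesg | grep -i amdgpu | tail -n 20
```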
You only need the ROCm userspace libraries (version 7.2+ is required for gfx1151 support). Follow the ROCm Quick start installation guide to install them.

Important: install only the user-space tools, not the AMDGPU driver. The HWE kernel already ships the driver you need, and installing it again here will conflict.
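As an illustration of what "user-space only" looks like with AMD's `amdgpu-install` helper, the `--no-dkms` flag is what skips the kernel driver. This is a sketch, not the authoritative procedure; get the current installer package and repository URLs from the quick start guide:

```bash
# Hypothetical invocation; fetch the installer package per the quick start guide first
sudo amdgpu-install --usecase=rocm --no-dkms
```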
Verify that ROCm can see the GPU:

```bash
rocminfo | grep -E "Name:|Marketing Name:"
```

You should see:

```
Name:                    gfx1151
Marketing Name:          AMD Radeon Graphics
```
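If you want a second opinion, `rocm-smi` should report the same GPU along with the VRAM pool it can address:

```bash
rocm-smi --showmeminfo vram
```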
Install Ollama with the official install script:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

This creates:

- `/usr/local/bin/ollama` - the ollama binary
- `/usr/local/lib/ollama/` - bundled libraries (including Vulkan support)
- an `ollama` system user and group
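Verify the binary works before wiring up the service:

```bash
ollama --version
```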
Give the ollama user access to the GPU device nodes:

```bash
sudo usermod -aG video ollama
sudo usermod -aG render ollama
```
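Confirm the group membership took effect:

```bash
id ollama
# Expect "video" and "render" in the groups list
```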
Create or edit /etc/systemd/system/ollama.service:

```bash
sudo tee /etc/systemd/system/ollama.service << 'EOF'
[Unit]
Description=Ollama Service - GPU Enabled (Vulkan)
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_FLASH_ATTENTION=true"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_CONTEXT_LENGTH=64000"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
Environment="OLLAMA_VULKAN=1"
[Install]
WantedBy=default.target
EOF
```

Key settings:

- `OLLAMA_VULKAN=1`: uses the Vulkan backend for GPU acceleration. This is required because Ollama's bundled ROCm libraries (v6.3) don't support gfx1151 yet.
- `OLLAMA_FLASH_ATTENTION=true`: enables flash attention for better memory efficiency.
- `OLLAMA_KV_CACHE_TYPE=q8_0`: uses a quantized KV cache to reduce memory usage.
- `OLLAMA_HOST=0.0.0.0`: allows remote connections (remove for localhost only).
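Note that `OLLAMA_HOST=0.0.0.0` binds the API (port 11434 by default) on every interface. If the machine isn't on a trusted network, consider restricting access; for example with ufw, assuming your LAN is 192.168.1.0/24:

```bash
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
```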
Reload systemd and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
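Confirm the service is up and picked up the environment overrides:

```bash
systemctl is-active ollama
systemctl show ollama --property=Environment
```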
Test inference:

```bash
ollama pull qwen2.5-coder:7b-instruct-q4_K_M
ollama run qwen2.5-coder:7b-instruct-q4_K_M "Hello"
```

Then check that the model layers were offloaded to the GPU:

```bash
sudo journalctl -u ollama --no-pager -n 30 | grep -i "layer"
```

You should see something like:
```
❯ ollama pull qwen2.5-coder:7b-instruct-q4_K_M
❯ ollama run qwen2.5-coder:7b-instruct-q4_K_M "Hello"
Hello! How can I assist you today?
❯ sudo journalctl -u ollama --no-pager -n 30 | grep -i "layer"
Feb 05 15:09:08 grimoire ollama[3069908]: load_tensors: offloading 28 repeating layers to GPU
Feb 05 15:09:08 grimoire ollama[3069908]: load_tensors: offloading output layer to GPU
Feb 05 15:09:08 grimoire ollama[3069908]: load_tensors: offloaded 29/29 layers to GPU
Feb 05 15:09:11 grimoire ollama[3069908]: llama_kv_cache: size = 3808.00 MiB ( 32768 cells, 28 layers, 4/4 seqs), K (q8_0): 1904.00 MiB, V (q8_0): 1904.00 MiB
```
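Since the service listens on 0.0.0.0, other machines on the network can call the HTTP API directly. A minimal example against the standard `/api/generate` endpoint (replace `grimoire` with your server's hostname or IP):

```bash
curl http://grimoire:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b-instruct-q4_K_M",
  "prompt": "Write a one-line hello world in Python",
  "stream": false
}'
```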