This guide explains how to set up Ollama with GPU acceleration on a Framework Desktop with the AMD Ryzen AI Max+ 395 APU (RDNA 3.5 / gfx1151) running Ubuntu Server 24.04.
- Framework Desktop with AMD Ryzen AI Max+ 395
- Integrated GPU: AMD Radeon 8060S (gfx1151, RDNA 3.5)
- 128GB unified memory (shared between CPU and GPU)
Install Ubuntu Server 24.04.x LTS.
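Once the base system is up, a quick sanity check that the OS sees the hardware can save debugging later (device strings vary with firmware and kernel version, so treat the grep patterns as a starting point):

```bash
# The iGPU should enumerate on the PCI bus
lspci | grep -iE "vga|display"

# Total memory; the GPU carves its allocation out of this shared pool
free -h
```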
The Hardware Enablement (HWE) kernel provides better support for newer AMD hardware:

```bash
sudo apt update
sudo apt install linux-generic-hwe-24.04
sudo reboot
```

After reboot, verify the kernel version:

```bash
uname -r
# Should show 6.17.x or newer
```

The amdgpu kernel driver is already included in the HWE kernel.
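You can confirm the in-kernel driver actually bound to the GPU (exact log lines vary by kernel version):

```bash
# The amdgpu module should be loaded
lsmod | grep amdgpu

# Initialization messages should be free of errors
sudo dmesg | grep -i amdgpu | tail -n 20
```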
You only need the ROCm userspace libraries (version 7.2+ is required for gfx1151 support). Follow the ROCm Quick start installation guide to install them.

Important: install only the user-space tools, not the AMDGPU driver. The HWE kernel already ships the driver you need, and installing it again here will conflict.
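As an illustration of what "user-space only" looks like with AMD's `amdgpu-install` helper, the `--no-dkms` flag is what skips the kernel driver. This is a sketch, not the authoritative procedure; get the current installer package and repository URLs from the quick start guide:

```bash
# Hypothetical invocation; fetch the installer package per the quick start guide first
sudo amdgpu-install --usecase=rocm --no-dkms
```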
Verify that ROCm can see the GPU:

```bash
rocminfo | grep -E "Name:|Marketing Name:"
```

You should see:

```
Name:                    gfx1151
Marketing Name:          AMD Radeon Graphics
```
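If you want a second opinion, `rocm-smi` should report the same GPU along with the VRAM pool it can address:

```bash
rocm-smi --showmeminfo vram
```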
Install Ollama with the official install script:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

This creates:

- `/usr/local/bin/ollama` - the ollama binary
- `/usr/local/lib/ollama/` - bundled libraries (including Vulkan support)
- an `ollama` system user and group
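Verify the binary works before wiring up the service:

```bash
ollama --version
```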
Give the ollama user access to the GPU device nodes:

```bash
sudo usermod -aG video ollama
sudo usermod -aG render ollama
```
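Confirm the group membership took effect:

```bash
id ollama
# Expect "video" and "render" in the groups list
```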
Create or edit /etc/systemd/system/ollama.service:

```bash
sudo tee /etc/systemd/system/ollama.service << 'EOF'
[Unit]
Description=Ollama Service - GPU Enabled (Vulkan)
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_FLASH_ATTENTION=true"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_CONTEXT_LENGTH=64000"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
Environment="OLLAMA_VULKAN=1"
[Install]
WantedBy=default.target
EOF
```

Key settings:

- `OLLAMA_VULKAN=1`: uses the Vulkan backend for GPU acceleration. This is required because Ollama's bundled ROCm libraries (v6.3) don't support gfx1151 yet.
- `OLLAMA_FLASH_ATTENTION=true`: enables flash attention for better memory efficiency.
- `OLLAMA_KV_CACHE_TYPE=q8_0`: uses a quantized KV cache to reduce memory usage.
- `OLLAMA_HOST=0.0.0.0`: allows remote connections (remove for localhost only).
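Note that `OLLAMA_HOST=0.0.0.0` binds the API (port 11434 by default) on every interface. If the machine isn't on a trusted network, consider restricting access; for example with ufw, assuming your LAN is 192.168.1.0/24:

```bash
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
```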
Reload systemd and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
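Confirm the service is up and picked up the environment overrides:

```bash
systemctl is-active ollama
systemctl show ollama --property=Environment
```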
Test inference:

```bash
ollama pull qwen2.5-coder:7b-instruct-q4_K_M
ollama run qwen2.5-coder:7b-instruct-q4_K_M "Hello"
```

Then check that the model layers were offloaded to the GPU:

```bash
sudo journalctl -u ollama --no-pager -n 30 | grep -i "layer"
```

You should see something like:
```
❯ ollama pull qwen2.5-coder:7b-instruct-q4_K_M
❯ ollama run qwen2.5-coder:7b-instruct-q4_K_M "Hello"
Hello! How can I assist you today?
❯ sudo journalctl -u ollama --no-pager -n 30 | grep -i "layer"
Feb 05 15:09:08 grimoire ollama[3069908]: load_tensors: offloading 28 repeating layers to GPU
Feb 05 15:09:08 grimoire ollama[3069908]: load_tensors: offloading output layer to GPU
Feb 05 15:09:08 grimoire ollama[3069908]: load_tensors: offloaded 29/29 layers to GPU
Feb 05 15:09:11 grimoire ollama[3069908]: llama_kv_cache: size = 3808.00 MiB ( 32768 cells, 28 layers, 4/4 seqs), K (q8_0): 1904.00 MiB, V (q8_0): 1904.00 MiB
```
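Since the service listens on 0.0.0.0, other machines on the network can call the HTTP API directly. A minimal example against the standard `/api/generate` endpoint (replace `grimoire` with your server's hostname or IP):

```bash
curl http://grimoire:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b-instruct-q4_K_M",
  "prompt": "Write a one-line hello world in Python",
  "stream": false
}'
```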