I used Bazzite Linux because it seems to have the best AMD Ryzen AI Max 395+ support right now. It also installs and uses Podman by default. But the following instructions should work on any Linux if:
- You have (very) recent AMD kernel drivers installed
- You have Podman or Docker installed (I think the instructions below should also work with Docker if you just change the tool name)
- You have gone into the BIOS and bumped up the amount of RAM given over to the GPU by default (I used 64 GB, but you do you); a quick sanity check for these prerequisites follows this list
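A minimal sanity check, assuming the GPU is exposed through the usual amdgpu device nodes (which is what the Ollama ROCm image expects):
# a recent kernel is needed for these APUs
uname -r
# the ROCm compute node and the render nodes should both exist
ls -l /dev/kfd /dev/dri
# confirm the container tool is there
podman --version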
The setup below will start Ollama serving on port 11434, and for example purposes I also have it fetch a model (llama3) and run it.
Based on this issue comment, I got GPU access working inside the Podman containers:
# let containers use devices (like the AMD GPU)
sudo setsebool -P container_use_devices=true
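You can verify the boolean stuck (this step only matters on SELinux systems such as Bazzite/Fedora):
# should print: container_use_devices --> on
getsebool container_use_devices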
Then download and run the container as a daemon:
# download and run the container, it will then answer on port 11434
podman run -it --name ollama -d --network=host -p 127.0.0.1:11434:11434 -v ollama:/root/.ollama --device /dev/kfd --device /dev/dri docker.io/ollama/ollama:rocm
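To check that the container came up and that the API answers, something like the following should do (the /api/version endpoint just reports the Ollama server version):
# the container should be listed as running
podman ps --filter name=ollama
# the API answers on port 11434
curl http://localhost:11434/api/version
# if models later fall back to CPU, the startup log usually says why
podman logs ollama | grep -iE 'rocm|gpu|amd'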
# To pull a new model
podman exec -it ollama ollama pull llama3
# Or to pull a model and run it in an interactive shell
podman exec -it ollama ollama run llama3
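Once you have pulled something, you can list what is stored in the ollama volume:
# show downloaded models and their sizes
podman exec -it ollama ollama list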
# Test the web api
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt":" Why is the color of the sea blue ?"
}'
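By default /api/generate streams the answer back as a series of JSON objects. If you would rather get a single JSON response, the API accepts a stream flag; here it is piped through python3 -m json.tool purely for readability (any JSON pretty-printer works):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the color of the sea blue?",
  "stream": false
}' | python3 -m json.tool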
Here's a quick demo of basic shell access to the model.
~/development/ai$ podman exec -it ollama ollama run llama3
pulling manifest
pulling 6a0746a1ec1a... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 110 B
pulling 3f8eb4da87fa... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
success
>>> make a joke about cats
Why did the cat join a band?
Because it wanted to be the purr-cussionist!
You probably want to use the (very nice) Open WebUI web interface. In that case, run the following container:
podman run -d --network=host -p 8080:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Then you can access the web interface at http://localhost:8080/
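The container can take a moment on its first start; the same kind of check as before works here too:
# the web UI container should show up as running
podman ps --filter name=open-webui
# follow the startup log if the page does not load yet
podman logs -f open-webui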
Hi!
Thanks for the guide!
I am using a Flow Z13 (2025), but with the Ryzen AI Max 390 chip instead of the 395+. I had no issues with GPT4All, even with larger models, but running Ollama I get the following error:
podman exec -it ollama ollama run llama3
Error: 500 Internal Server Error: llama runner process has terminated: exit status 2
I changed the RAM allocation in the UMA config in the BIOS, but no luck regardless of the value... Do you have any clue how I can work around this issue?
Thanks!
System details below:
Operating System: Fedora Linux 43
KDE Plasma Version: 6.5.5
KDE Frameworks Version: 6.22.0
Qt Version: 6.10.1
Kernel Version: 6.18.7-200.fc43.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 24 × AMD RYZEN AI MAX 390 w/ Radeon 8050S
Memory: 32 GiB of RAM (27.0 GiB usable)
Graphics Processor: Radeon 8050S Graphics
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: ROG Flow Z13 GZ302EA_GZ302EA
System Version: 1.0