Note
Using RamaLama might be an easier option. It seems to be a convenience wrapper that does pretty much what I've described here. In fact, I've even used its container image.
Here's a conference talk introducing it: https://www.youtube.com/watch?v=53NZFC-ReWs
Install Podman with the libkrun backend for GPU acceleration: