Verified: February 2026
Hardware: Radxa Dragon Q6A (Qualcomm QCS6490)
OS: Ubuntu 24.04 (Noble)
Status: ✅ Production Ready
Building a truly responsive voice assistant on embedded hardware requires balancing workloads. We use the NPU (Hexagon DSP) for the heavy lifting (LLM) and the CPU (Kryo) for real-time sensory tasks.
- The Brain (NPU): Llama 3.2 1B running on the Hexagon DSP via Qualcomm's `genie` runtime.
- Why NPU? It delivers ~15-20 tokens/sec at extremely low power, leaving the CPU free.
- The Ears (CPU): OpenAI Whisper (Tiny) running on CPU.
- Why CPU? While running Whisper on the NPU is possible, the CPU build handles the autoregressive decoder loop (turning audio into text) more robustly and is much simpler to integrate. The 'Tiny' model is fast enough (~200ms latency).
- The Voice (CPU): Piper TTS (Neural Text-to-Speech).
- Why Piper? It's highly optimized for ARM64 and generates human-like speech offline.
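The "Ears" stage can be sketched in a few lines of Python. This is a minimal illustration, assuming the public `openai/whisper-tiny` checkpoint on Hugging Face; the actual integration lives in `jarvis_ui.py` and may differ.

```python
def make_transcriber(model_id: str = "openai/whisper-tiny"):
    # Lazy import so the assistant can start up while the model loads.
    from transformers import pipeline
    # device=-1 pins inference to the CPU, keeping the NPU free for the LLM.
    return pipeline("automatic-speech-recognition", model=model_id, device=-1)

def transcribe(asr, wav_path: str) -> str:
    # Whisper pipelines return {"text": "..."}; normalize whitespace.
    return asr(wav_path)["text"].strip()
```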
Install audio and NPU libraries.
```bash
sudo apt update
sudo apt install -y fastrpc fastrpc-dev libcdsprpc1 radxa-firmware-qcs6490 \
    python3-pip python3-venv libportaudio2 ffmpeg git alsa-utils
```
Allow users to access the Hexagon DSP without sudo.
```bash
sudo tee /etc/udev/rules.d/99-fastrpc.rules << 'EOF'
KERNEL=="fastrpc-*", MODE="0666"
SUBSYSTEM=="dma_heap", KERNEL=="system", MODE="0666"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger
```
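A quick sanity check that the udev rule took effect: the fastrpc device nodes should now be readable and writable without sudo. The node names here are assumptions based on the fastrpc driver's naming (`/dev/fastrpc-adsp`, `/dev/fastrpc-cdsp` on QCS6490).

```python
import glob
import os

def accessible_fastrpc_nodes(pattern: str = "/dev/fastrpc-*"):
    # Return the DSP device nodes the current (non-root) user can open
    # for read and write; an empty list means the udev rule didn't apply.
    return [p for p in glob.glob(pattern) if os.access(p, os.R_OK | os.W_OK)]

if __name__ == "__main__":
    print(accessible_fastrpc_nodes())
```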
Create a clean environment for the AI stack.
```bash
python3 -m venv ~/jarvis-venv
source ~/jarvis-venv/bin/activate

# Install Core & UI Libraries
pip install --upgrade pip
pip install rich sounddevice soundfile numpy modelscope

# Install Transformers (Pinned for stability)
pip install "transformers==4.48.1" torch
```
We use modelscope to fetch the pre-compiled Llama model.
```bash
mkdir -p ~/llama-npu && cd ~/llama-npu
modelscope download --model radxa/Llama3.2-1B-4096-qairt-v68 --local_dir .
chmod +x genie-t2t-run
```
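The key detail when wrapping `genie-t2t-run` from Python is the working directory: the binary resolves `tokenizer.json` relative to where you launch it, not where it sits. The sketch below shows that fix; the flag names (`-c`, `-p`) are illustrative, so check the README shipped with the model bundle for the exact invocation.

```python
import os
import subprocess

# Where the model bundle and genie-t2t-run were downloaded (assumption).
MODEL_DIR = os.path.expanduser("~/llama-npu")

def build_genie_invocation(prompt: str, config: str = "genie_config.json"):
    cmd = ["./genie-t2t-run", "-c", config, "-p", prompt]
    # cwd=MODEL_DIR is the fix for the "Current Working Directory" bug:
    # the binary looks for tokenizer.json in the directory it runs from.
    kwargs = {"cwd": MODEL_DIR, "capture_output": True, "text": True}
    return cmd, kwargs

def ask_llama(prompt: str) -> str:
    cmd, kwargs = build_genie_invocation(prompt)
    return subprocess.run(cmd, **kwargs).stdout
```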
We fetch Piper and a high-quality voice model.
```bash
mkdir -p ~/piper_tts && cd ~/piper_tts

# Download Piper binary
wget -O piper.tar.gz https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_aarch64.tar.gz
tar -xvf piper.tar.gz

# Download Voice (Ryan - Medium Quality)
wget -O en_US-ryan-medium.onnx https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/medium/en_US-ryan-medium.onnx
wget -O en_US-ryan-medium.onnx.json https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json
```
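The speak step can be sketched as: have Piper write a proper WAV file via `--output_file`, then hand that file to `aplay`. Writing a real WAV instead of piping raw samples is what cures the static noise described in the Lessons Learned section. The paths below are assumptions matching the download steps above.

```python
import os
import subprocess

# Assumed layout from the steps above (tarball extracts into piper/).
PIPER_DIR = os.path.expanduser("~/piper_tts")

def build_speak_commands(text: str, wav_path: str = "/tmp/temp_speech.wav"):
    synth = [
        os.path.join(PIPER_DIR, "piper", "piper"),
        "--model", os.path.join(PIPER_DIR, "en_US-ryan-medium.onnx"),
        "--output_file", wav_path,
    ]
    play = ["aplay", "-q", wav_path]
    return synth, play

def speak(text: str):
    synth, play = build_speak_commands(text)
    # Piper reads the text to synthesize from stdin.
    subprocess.run(synth, input=text, text=True, check=True)
    subprocess.run(play, check=True)
```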
Place llama_engine.py and jarvis_ui.py (files below) in your home directory.
```bash
source ~/jarvis-venv/bin/activate
python3 ~/jarvis_ui.py
```
- The "Current Working Directory" Bug: The Qualcomm `genie` binary looks for `tokenizer.json` in the folder where you run the command, not where the binary sits. We fixed this in `llama_engine.py` by forcing `cwd=self.model_dir` in the subprocess call.
- Audio Gibberish: Piping raw audio from Piper to `aplay` caused static noise due to header mismatches. The fix was to generate a temporary WAV file (`temp_speech.wav`), which ensures perfect playback.
- Log Pollution: The NPU binary loves to print debug info (`Using libGenie.so...`). We implemented a robust regex cleaner in Python to strip these out so the user only sees the AI's answer.
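The log cleaner boils down to a line filter: drop anything matching a known noise pattern and keep the rest. The patterns below are illustrative examples of the runtime's chatter (the `Using libGenie.so` banner is from the source; the bracketed log levels are an assumption); extend the list as new messages appear.

```python
import re

# Debug chatter to strip from genie-t2t-run output before display.
NOISE_PATTERNS = [
    re.compile(r"^Using libGenie\.so"),   # runtime banner
    re.compile(r"^\[(INFO|WARN)\]"),      # assumed bracketed log levels
]

def clean_output(raw: str) -> str:
    kept = []
    for line in raw.splitlines():
        if any(p.search(line) for p in NOISE_PATTERNS):
            continue  # drop runtime noise
        kept.append(line)
    return "\n".join(kept).strip()
```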