This guide enables hardware-accelerated AI on the Radxa Dragon Q6A. We will run Llama 3.2 (an LLM) on the NPU and Whisper (speech recognition) on the CPU to create a fully voice-interactive system.
Hardware: Radxa Dragon Q6A (QCS6490)
OS: Ubuntu 24.04 Noble (T7 Image or newer)
Status: ✅ Verified Working (Jan 2026)
Run these commands once to install drivers and set permissions.
sudo apt update
sudo apt install -y fastrpc fastrpc-dev libcdsprpc1 radxa-firmware-qcs6490 \
python3-pip python3.12-venv libportaudio2 ffmpeg git
The udev rules below ensure you don't get "Permission Denied" errors on the NPU device nodes after rebooting.
sudo tee /etc/udev/rules.d/99-fastrpc.rules << 'EOF'
KERNEL=="fastrpc-*", MODE="0666"
SUBSYSTEM=="dma_heap", KERNEL=="system", MODE="0666"
EOF
# Apply immediately
sudo udevadm control --reload-rules
sudo udevadm trigger
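To confirm the rules took effect, check the device nodes directly. The exact node names can vary slightly between kernel builds, so treat this as a quick sanity check:
# Both should report mode 0666 (crw-rw-rw-)
ls -l /dev/fastrpc-*
ls -l /dev/dma_heap/system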
We use a virtual environment to prevent "Dependency Hell" with system packages.
# Create and activate
python3 -m venv ~/qai-venv
source ~/qai-venv/bin/activate
# Install AI tools (Whisper, Audio libraries)
pip install --upgrade pip
pip install "qai-hub-models[whisper-small]" librosa sounddevice
We use the 4096-context model for better conversation memory. (Note: the download requires ~2 GB of disk space.)
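Before downloading, confirm you have enough free space:
# Check available space on the home partition
df -h ~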
# Ensure you are NOT in the venv for this part (using system tools for binary download)
deactivate 2>/dev/null
# Install downloader
pip3 install modelscope --break-system-packages
# Download
mkdir -p ~/llama-4k && cd ~/llama-4k
modelscope download --model radxa/Llama3.2-1B-4096-qairt-v68 --local_dir .
# Make the runner executable
chmod +x genie-t2t-run
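It is worth sanity-checking the download before continuing. The exact file list depends on the model package, but you should at least see the genie-t2t-run binary, its JSON config, and the model weights:
# List the downloaded files with sizes
ls -lh ~/llama-4k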
Create a simple script to run the NPU model.
cd ~/llama-4k
cat << 'EOF' > chat
#!/bin/bash
cd ~/llama-4k
export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"
# Llama 3 Prompt Format
PROMPT="<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n$1<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json -p "$PROMPT"
EOF
chmod +x chat
Test it: ~/llama-4k/chat "What is the capital of France?"
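The chat script handles one prompt per invocation. If you want an interactive session, a minimal loop like the sketch below works; note that each turn is independent, since this wrapper does not feed previous answers back into the prompt:
cat << 'EOF' > ~/llama-4k/chat-loop
#!/bin/bash
# Repeatedly read a line and pass it to the single-shot chat script
while true; do
    read -r -p "You: " LINE || break   # Ctrl+D exits
    [ -z "$LINE" ] && continue         # skip empty input
    ~/llama-4k/chat "$LINE"
done
EOF
chmod +x ~/llama-4k/chat-loop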
We run Whisper Small on the CPU. It is lightweight enough to run quickly without needing a complex NPU compilation step.
cat << 'EOF' > ~/transcribe.sh
#!/bin/bash
# Wrapper to run Whisper in the virtual environment
source ~/qai-venv/bin/activate
python3 -m qai_hub_models.models.whisper_small.demo --audio-file "$1" 2>/dev/null | grep "Transcription:" | sed 's/Transcription: //'
EOF
chmod +x ~/transcribe.sh
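The transcriber expects a WAV file. If your audio is in another format or sample rate, ffmpeg (installed in Step 1) can convert it to the 16 kHz mono WAV that Whisper expects; the filenames here are placeholders:
# Convert any audio file to 16 kHz mono WAV
ffmpeg -i input.mp3 -ar 16000 -ac 1 input-16k.wav
~/transcribe.sh input-16k.wav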
Download a sample file to test the system.
wget https://github.com/ggerganov/whisper.cpp/raw/master/samples/jfk.wav -O jfk.wav
~/transcribe.sh jfk.wav
Expected Output: "And so my fellow Americans..."
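To get a rough latency number for your own board, wrap the call in time; results vary with CPU clocks and background load:
# Rough latency check
time ~/transcribe.sh jfk.wav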
Combine both tools! This script records your voice, converts it to text, sends it to Llama, and prints the answer.
cat << 'EOF' > ~/voice-chat.sh
#!/bin/bash
echo "π΄ Recording... (Press Ctrl+C to stop, or wait 5 seconds)"
arecord -d 5 -f cd -r 16000 -c 1 -t wav my_voice.wav 2>/dev/null
echo "β
Processing..."
# 1. Speech to Text (Whisper)
USER_TEXT=$(~/transcribe.sh my_voice.wav)
echo "π£οΈ You said: $USER_TEXT"
if [ -z "$USER_TEXT" ]; then
echo "β No speech detected."
exit 1
fi
# 2. Text to Intelligence (Llama NPU)
echo "π€ AI Thinking..."
~/llama-4k/chat "$USER_TEXT"
EOF
chmod +x ~/voice-chat.sh
Plug in a USB microphone and run:
~/voice-chat.sh
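If recording fails or the transcript comes back empty, check that ALSA actually sees your microphone. Card and device numbers vary by board and USB port, so adjust the hw: address to match your own arecord -l output:
# List capture devices
arecord -l
# Record 3 seconds from a specific card (e.g. card 1, device 0), then play it back
arecord -D hw:1,0 -d 3 -f S16_LE -r 16000 -c 1 test.wav
aplay test.wav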
| Component | Model | Processor | Performance |
|---|---|---|---|
| Brain | Llama 3.2 1B (4096) | NPU (Hexagon) | ~15 tokens/sec (Real-time) |
| Ears | Whisper Small | CPU (Kryo) | ~2 sec for 5 sec audio |
| Memory | System RAM | Shared | ~2.5 GB Total Used |
| Issue | Solution |
|---|---|
| Permission denied (/dev/fastrpc) | Run the Step 1 udev commands and reboot. |
| genie-t2t-run: not found | Ensure you are in ~/llama-4k and run chmod +x genie-t2t-run. |
| ModuleNotFoundError (Whisper) | Run source ~/qai-venv/bin/activate before using Python. |
| EOFError (Whisper) | The audio file is corrupt or empty. Re-download or re-record. |
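To check all of these at once, a small diagnostic sketch like the following covers the common failure points from the table. The script name is arbitrary, and the paths are just the conventions used in this guide:
cat << 'EOF' > ~/ai-doctor.sh
#!/bin/bash
# Quick health check for the common issues above
echo "--- FastRPC device nodes ---"
ls -l /dev/fastrpc-* 2>/dev/null || echo "MISSING: apply the Step 1 udev rules and reboot"
echo "--- Llama runner ---"
[ -x ~/llama-4k/genie-t2t-run ] && echo "OK" || echo "MISSING: check ~/llama-4k and chmod +x genie-t2t-run"
echo "--- Whisper environment ---"
~/qai-venv/bin/python3 -c "import qai_hub_models" 2>/dev/null && echo "OK" || echo "BROKEN: reinstall packages inside ~/qai-venv"
EOF
chmod +x ~/ai-doctor.sh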