Radxa Dragon Q6A - NPU Quick Start Guide

Run Llama 3.2 1B on the 12 TOPS Hexagon NPU.

Hardware: Radxa Dragon Q6A (QCS6490)
OS: Ubuntu 24.04 Noble
Last tested: January 2026 (T7 image)

Step 1: Flash the OS (from Linux laptop)

Download the latest image from GitHub Releases.

# Insert SD card and identify device
lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT

# Unmount partitions (replace <username> with yours)
sudo umount /media/<username>/config 2>/dev/null
sudo umount /media/<username>/rootfs 2>/dev/null

# Flash image (replace /dev/sdb with your SD card device)
xzcat radxa-dragon-q6a_noble_gnome_t7.output_512.img.xz | sudo dd of=/dev/sdb bs=4M status=progress conv=fsync
sudo sync

⚠️ Warning: Double-check the device path against the lsblk output. dd writes to whichever device you name, without asking for confirmation, and destroys its contents.
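One way to lower the risk is to refuse to flash unless the kernel marks the target as removable. The sketch below assumes Linux sysfs exposes `/sys/block/<name>/removable`. The function name `is_removable` and its optional second argument (an alternate sysfs root, handy for dry runs) are illustrative and not part of any Radxa tooling. Some card readers report 0 here, so treat a negative result as a prompt to look again rather than a definitive answer.

```shell
# Hypothetical helper: succeed only if sysfs marks the block device as removable.
# $1 = device name without /dev/ (e.g. sdb); $2 = sysfs root (defaults to /sys).
is_removable() {
    local dev="$1" sysroot="${2:-/sys}"
    local flag="$sysroot/block/$dev/removable"
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Example guard before flashing (replace sdb with your SD card device):
# is_removable sdb || { echo "sdb is not marked removable; aborting" >&2; exit 1; }
```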

Step 2: First Boot Setup

Insert SD card into Dragon Q6A
Connect power, monitor, keyboard (or use SSH after boot)
Login: radxa / radxa

# Install NPU packages
sudo apt update
sudo apt install -y fastrpc fastrpc-dev libcdsprpc1 radxa-firmware-qcs6490

# Set device permissions (these reset at reboot, so Step 3 repeats them)
sudo chmod 666 /dev/fastrpc-*
sudo chmod 666 /dev/dma_heap/system

# Reboot to apply changes
sudo reboot

Step 3: Run Llama 3.2 1B on NPU

After reboot, SSH back in or use the console:

# Set permissions again (resets after reboot)
sudo chmod 666 /dev/fastrpc-*
sudo chmod 666 /dev/dma_heap/system

# Install modelscope (pip puts the CLI in ~/.local/bin, so add that to PATH)
export PATH="$HOME/.local/bin:$PATH"
pip3 install modelscope --break-system-packages

# Download model (~1.7GB)
mkdir -p ~/llama-test && cd ~/llama-test
modelscope download --model radxa/Llama3.2-1B-1024-qairt-v68 --local_dir .

# Run inference
chmod +x genie-t2t-run
export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"
./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json \
  -p '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'

Expected output:

[INFO]  "Using create From Binary List Async"
[INFO]  "Allocated total size = 33333760 across 1 buffers"
[PROMPT]: ...
[BEGIN]: Hello! How can I assist you today?[END]
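When scripting around the runner, it can help to keep only the model's reply and drop the log lines. The filter below is a minimal sketch that assumes the reply sits on a single line between `[BEGIN]: ` and `[END]`, as in the sample output above. A reply that spans several lines would need a different approach. The name `extract_reply` is illustrative.

```shell
# Hypothetical helper: print only the text between "[BEGIN]: " and "[END]".
# Reads the runner's output on stdin; assumes a single-line reply.
extract_reply() {
    sed -n 's/^\[BEGIN]: \(.*\)\[END]$/\1/p'
}

# Usage on the board:
# ./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json -p "$PROMPT" | extract_reply
```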

Persistent Permissions (Optional)

To avoid running chmod after every reboot:

# Create udev rule
sudo tee /etc/udev/rules.d/99-fastrpc.rules << 'EOF'
KERNEL=="fastrpc-*", MODE="0666"
SUBSYSTEM=="dma_heap", KERNEL=="system", MODE="0666"
EOF

sudo udevadm control --reload-rules
sudo udevadm trigger
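To confirm the rule took effect, the node modes can be inspected after the trigger. The helper below is a small sketch using `stat -c` from GNU coreutils (as shipped with Ubuntu). The name `show_modes` is illustrative. With the rule above applied, each listed node would be expected to show mode 666.

```shell
# Hypothetical helper: print "<octal mode> <path>" for each path that exists.
show_modes() {
    local node
    for node in "$@"; do
        [ -e "$node" ] && stat -c '%a %n' "$node"
    done
    return 0
}

# On the board:
# show_modes /dev/fastrpc-* /dev/dma_heap/system
```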

Quick Test Script

Save as ~/test-npu.sh:

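#!/bin/bash
# Run a single-turn prompt on the NPU. Usage: test-npu.sh "your prompt"
prompt="${1:?usage: test-npu.sh PROMPT}"
# Stop here if the model directory is missing instead of running from the wrong place
cd ~/llama-test || exit 1
export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"
./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json \
  -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n${prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"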

Usage:

chmod +x ~/test-npu.sh
~/test-npu.sh "Write a haiku about Linux"
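The prompt string follows the Llama 3 chat layout: a system turn, a user turn, then an open assistant header. If the system message needs to vary as well, the template can be factored into a function. This is a sketch only. The name `build_prompt` and its argument order are illustrative, and it keeps the `\n` sequences as literal backslash-n characters, exactly as the script above passes them.

```shell
# Hypothetical helper: assemble a single-turn Llama 3 style prompt.
# $1 = user message; $2 = system message (optional).
# Inside double quotes the shell leaves \n as a literal backslash followed by n.
build_prompt() {
    local user="$1" system="${2:-You are a helpful assistant}"
    printf '%s' "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n${system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n${user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
}

# Usage on the board:
# ./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json -p "$(build_prompt "Write a haiku about Linux")"
```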

Troubleshooting

| Issue | Solution |
| --- | --- |
| `cannot open shared object file` | From the model directory (`~/llama-test`), run `export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"` |
| `Error 14001 Device Creation Failure` | Install `libcdsprpc1` and set the device permissions as in Step 3 |
| `Permission denied` on `/dev/fastrpc-*` | Run `sudo chmod 666 /dev/fastrpc-*` |
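Before digging into the errors above, it can be worth checking that the download actually left the expected files in the model directory. The sketch below lists any that are missing. The name `missing_files` is illustrative, and the two file names in the usage line are simply the ones this guide already references. The download may contain other files the runner needs.

```shell
# Hypothetical helper: print each expected file that is absent from a directory.
# $1 = directory; remaining arguments = file names to look for.
missing_files() {
    local dir="$1" name
    shift
    for name in "$@"; do
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# On the board:
# missing_files ~/llama-test genie-t2t-run htp-model-config-llama32-1b-gqa.json
```

No output means every listed file was found.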

Resources

Performance

| Metric | Value |
| --- | --- |
| Model | Llama 3.2 1B (INT8 quantized) |
| Context length | 1024 tokens |
| Inference speed | ~10-12 tokens/sec |
| First response | ~1.8s (includes model load) |
| Longer generation | ~6s for ~50 tokens |

Tested on Radxa Dragon Q6A 8GB with T7 image (January 2026)
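As a sanity check on the figures above, ~50 tokens in ~6s works out to roughly 8.3 tokens/sec end to end, a little below the ~10-12 tokens/sec inference speed. One plausible reading, stated here as an assumption rather than a measurement, is that the ~6s wall-clock time also covers model load and prompt processing. The helper below is a sketch for repeating the arithmetic with your own timings. The name `tokens_per_sec` is illustrative.

```shell
# Hypothetical helper: tokens per second, rounded to one decimal place.
# $1 = number of tokens generated; $2 = elapsed wall-clock seconds.
tokens_per_sec() {
    awk -v t="$1" -v s="$2" 'BEGIN { printf "%.1f\n", t / s }'
}

tokens_per_sec 50 6   # prints 8.3
```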

Foadsf/RADXA_DRAGON_Q6A_NPU_QUICKSTART.md
