Gist by @Foadsf, created January 29, 2026 11:43
Radxa Dragon Q6A - NPU Quick Start Guide

Run Llama 3.2 1B on the 12 TOPS Hexagon NPU.

Hardware: Radxa Dragon Q6A (QCS6490)
OS: Ubuntu 24.04 Noble
Last tested: January 2026 (T7 image)


Step 1: Flash the OS (from Linux laptop)

Download the latest image from GitHub Releases.

# Insert SD card and identify device
lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT

# Unmount partitions (replace <username> with yours)
sudo umount /media/<username>/config 2>/dev/null
sudo umount /media/<username>/rootfs 2>/dev/null

# Flash image (replace /dev/sdb with your SD card device)
xzcat radxa-dragon-q6a_noble_gnome_t7.output_512.img.xz | sudo dd of=/dev/sdb bs=4M status=progress conv=fsync
sudo sync

⚠️ Warning: Double-check the device path. Using the wrong device will destroy data.
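To reduce the risk of flashing the wrong disk, you can sanity-check the target first. The sketch below is illustrative, not part of the Radxa tooling: the helper names and the `/dev/sdb` path are assumptions; `lsblk -ndo RM,TYPE` prints the removable flag and device type for a single device.

```shell
#!/bin/sh
# Sketch: refuse to flash unless the target looks like a removable whole disk.
# is_removable_disk / check_target are illustrative helpers; /dev/sdb is an
# assumption -- substitute your actual SD card device.

# Decide from an "RM TYPE" pair (as printed by `lsblk -ndo RM,TYPE`)
# whether a device is a removable whole disk (RM=1, TYPE=disk).
is_removable_disk() {
  [ "$1" = "1" ] && [ "$2" = "disk" ]
}

check_target() {
  dev="$1"
  # -n: no header, -d: whole device only, -o: selected columns.
  # Word splitting puts RM in $1 and TYPE in $2.
  set -- $(lsblk -ndo RM,TYPE "$dev" 2>/dev/null)
  if is_removable_disk "$1" "$2"; then
    echo "ok to flash: $dev"
  else
    echo "refusing: $dev does not look like a removable whole disk" >&2
    return 1
  fi
}

# Usage (uncomment once you have verified the device path):
# check_target /dev/sdb && \
#   xzcat radxa-dragon-q6a_noble_gnome_t7.output_512.img.xz | \
#   sudo dd of=/dev/sdb bs=4M status=progress conv=fsync
```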


Step 2: First Boot Setup

  1. Insert SD card into Dragon Q6A
  2. Connect power, monitor, keyboard (or use SSH after boot)
  3. Login: radxa / radxa

Then install the NPU runtime packages:

# Install NPU packages
sudo apt update
sudo apt install -y fastrpc fastrpc-dev libcdsprpc1 radxa-firmware-qcs6490

# Set device permissions
sudo chmod 666 /dev/fastrpc-*
sudo chmod 666 /dev/dma_heap/system

# Reboot to apply changes
sudo reboot
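After the reboot you can verify that the device nodes exist and carry the expected mode before running inference. This is a sketch; `mode_ok` is an illustrative helper, and the node paths are the ones used in the chmod commands above.

```shell
#!/bin/sh
# Sketch: report which NPU device nodes still need chmod.
# mode_ok is an illustrative helper; paths come from the setup steps above.

# An octal mode of 666 means read/write for user, group, and others.
mode_ok() {
  [ "$1" = "666" ]
}

for dev in /dev/fastrpc-* /dev/dma_heap/system; do
  if [ ! -e "$dev" ]; then
    echo "missing: $dev (are the fastrpc packages installed?)"
  elif mode_ok "$(stat -c %a "$dev")"; then
    echo "ok: $dev"
  else
    echo "needs: sudo chmod 666 $dev"
  fi
done
```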

Step 3: Run Llama 3.2 1B on NPU

After reboot, SSH back in or use the console:

# Set permissions again (resets after reboot)
sudo chmod 666 /dev/fastrpc-*
sudo chmod 666 /dev/dma_heap/system

# Install modelscope
export PATH="$HOME/.local/bin:$PATH"
pip3 install modelscope --break-system-packages

# Download model (~1.7GB)
mkdir -p ~/llama-test && cd ~/llama-test
modelscope download --model radxa/Llama3.2-1B-1024-qairt-v68 --local_dir .

# Run inference
chmod +x genie-t2t-run
export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"
./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json \
  -p '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'

Expected output:

[INFO]  "Using create From Binary List Async"
[INFO]  "Allocated total size = 33333760 across 1 buffers"
[PROMPT]: ...
[BEGIN]: Hello! How can I assist you today?[END]
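The long -p argument above is just the Llama 3.2 chat template assembled around a system and a user message. A small helper makes that structure easier to see; `build_prompt` is an illustrative name, not part of the model package.

```shell
#!/bin/sh
# Sketch: assemble a Llama 3.2 chat prompt from system and user messages.
# build_prompt is an illustrative helper, not part of the Radxa package.

build_prompt() {
  # genie-t2t-run is given the literal two-character sequence "\n",
  # so the backslashes are escaped here rather than expanded by the shell.
  printf '%s' "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\n$1<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\n$2<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n"
}

# Usage:
# ./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json \
#   -p "$(build_prompt 'You are a helpful assistant' 'Hello')"
```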

Persistent Permissions (Optional)

To avoid running chmod after every reboot:

# Create udev rule
sudo tee /etc/udev/rules.d/99-fastrpc.rules << 'EOF'
KERNEL=="fastrpc-*", MODE="0666"
SUBSYSTEM=="dma_heap", KERNEL=="system", MODE="0666"
EOF

sudo udevadm control --reload-rules
sudo udevadm trigger

Quick Test Script

Save as ~/test-npu.sh:

#!/bin/bash
set -e
# Require a prompt argument
[ $# -ge 1 ] || { echo "usage: $0 \"<prompt>\"" >&2; exit 1; }
cd ~/llama-test
export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"
./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json \
  -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n$1<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

Usage:

chmod +x ~/test-npu.sh
~/test-npu.sh "Write a haiku about Linux"

Troubleshooting

| Issue | Solution |
| --- | --- |
| cannot open shared object file | Run export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH" |
| Error 14001 Device Creation Failure | Install libcdsprpc1 and set permissions |
| Permission denied on /dev/fastrpc-* | Run sudo chmod 666 /dev/fastrpc-* |
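The "cannot open shared object file" case usually means the model directory is missing from LD_LIBRARY_PATH. A quick check can be scripted; `path_has_dir` is an illustrative helper that tests membership in a colon-separated list.

```shell
#!/bin/sh
# Sketch: check whether LD_LIBRARY_PATH already contains the current
# directory (the model dir). path_has_dir is an illustrative helper.

path_has_dir() {
  # Framing the list as ":list:" makes every entry match as ":dir:".
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if path_has_dir "$LD_LIBRARY_PATH" "$(pwd)"; then
  echo "LD_LIBRARY_PATH already includes $(pwd)"
else
  echo 'run: export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"'
fi
```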


Performance

| Metric | Value |
| --- | --- |
| Model | Llama 3.2 1B (INT8 quantized) |
| Context length | 1024 tokens |
| Inference speed | ~10-12 tokens/sec |
| First response | ~1.8 s (includes model load) |
| Longer generation | ~6 s for ~50 tokens |

Tested on Radxa Dragon Q6A 8GB with T7 image (January 2026)
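Throughput numbers like those above come from dividing generated tokens by elapsed seconds of a timed run. A minimal sketch of that arithmetic (the helper name is illustrative):

```shell
#!/bin/sh
# Sketch: derive tokens/sec from a timed run.
# tok_per_sec is an illustrative helper, not part of the Radxa tooling.

tok_per_sec() {
  # tokens divided by elapsed seconds, one decimal place;
  # LC_ALL=C pins the decimal separator to "."
  LC_ALL=C awk -v t="$1" -v s="$2" 'BEGIN { printf "%.1f", t / s }'
}

# Example: time a run (e.g. `time ~/test-npu.sh "Hello"`), count the
# generated tokens in the output, then:
# tok_per_sec 50 4.5
```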
