
πŸŽ™οΈ Radxa Dragon Q6A: Audio Transcriber CLI

Hardware: Radxa Dragon Q6A (Qualcomm QCS6490)
Engine: OpenAI Whisper (CPU via HuggingFace Transformers)
Interface: Python CLI with Rich UI

Overview

This is a polished, "production-ready" Command Line Interface (CLI) for transcribing audio files directly on the Radxa Dragon. It takes an audio file as input, writes the transcript to a .txt file with a matching name in the same folder, and shows a live status display while processing.

While the NPU is great for LLMs, we run Whisper on the CPU here to ensure maximum compatibility with various audio formats and to support the complex decoding logic required for long-form transcription.

✨ Features

  • Auto-Naming: Automatically saves meeting.wav as meeting.txt in the same folder.
  • Visual Feedback: Uses Rich for professional spinners and status updates.
  • Flexible Models: Defaults to whisper-tiny (fast) but supports base, small, etc.
  • Format Agnostic: Accepts WAV, MP3, FLAC, M4A (via FFmpeg).
  • Smart Logging: Suppresses the noisy TensorFlow/PyTorch warnings for a clean experience.

πŸ› οΈ Prerequisites

1. System Dependencies

FFmpeg is required to decode audio files.

sudo apt update
sudo apt install -y ffmpeg
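
If you want to confirm FFmpeg is visible from Python before running the transcriber, a minimal check (not part of transcribe.py, just a quick sanity test) looks like this:

import shutil

ffmpeg_path = shutil.which("ffmpeg")  # returns None if ffmpeg is not on PATH
if ffmpeg_path is None:
    raise SystemExit("ffmpeg not found on PATH; run: sudo apt install -y ffmpeg")
print("Using ffmpeg at:", ffmpeg_path)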

2. Python Environment

It is recommended to run this inside a virtual environment.

# Create and activate (if you haven't already)
python3 -m venv ~/jarvis-venv
source ~/jarvis-venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install torch transformers rich
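
To verify the environment is ready, a small optional sanity check prints each dependency's installed version:

from importlib.metadata import version

# Raises PackageNotFoundError if any dependency is missing from the venv.
for pkg in ("torch", "transformers", "rich"):
    print(pkg, version(pkg))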

πŸš€ Usage

Download the transcribe.py script (the full listing is at the end of this page) and run it:

Basic Transcription

Transcribe a file using the default model (Tiny) and auto-detected language.

python3 transcribe.py ~/recordings/meeting.wav

Result: Creates ~/recordings/meeting.txt
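
The output path is derived with pathlib's with_suffix, exactly as the script below does:

from pathlib import Path

audio = Path("~/recordings/meeting.wav").expanduser()
print(audio.with_suffix(".txt"))  # same folder, same stem, .txt extension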

High-Accuracy Mode

Use a larger model for better accuracy (slower).

python3 transcribe.py interview.mp3 --model openai/whisper-base

Force Language

If the audio has a strong accent or specific language, force it to improve accuracy.

python3 transcribe.py french_lesson.mp3 --lang fr
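
Under the hood, --lang is forwarded to Whisper's generation arguments. This sketch mirrors what the transcribe_audio function in the script below does when --lang fr is given:

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny", device="cpu", chunk_length_s=30)
result = pipe("french_lesson.mp3", generate_kwargs={"language": "fr"}, return_timestamps=True)
print(result["text"].strip())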

🧩 Benchmark (Radxa Dragon Q6A)

| Audio Length | Model         | Time (CPU) | Realtime Factor              |
|--------------|---------------|------------|------------------------------|
| 30 s         | whisper-tiny  | ~4 s       | ~7.5x faster than realtime   |
| 30 s         | whisper-base  | ~12 s      | ~2.5x faster than realtime   |
| 30 s         | whisper-small | ~45 s      | ~0.6x (slower than realtime) |

Note: For purely offline, low-latency voice assistants, stick to tiny or base.
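
The realtime factor is simply audio duration divided by wall-clock processing time; for example, the whisper-tiny row works out as:

audio_seconds = 30
processing_seconds = 4  # whisper-tiny row from the table above
print(audio_seconds / processing_seconds)  # 7.5, i.e. ~7.5x faster than realtime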

πŸ› Troubleshooting

  • FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'
    Fix: Run sudo apt install -y ffmpeg.

  • Killed
    Fix: The process ran out of RAM. When using whisper-small or larger, make sure you aren't running heavy NPU models (like Llama 7B) at the same time.
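
If you are unsure whether a larger model will fit, you can check available memory first. This is a Linux-only sketch, and the 2 GB threshold is a rough assumption rather than a measured requirement of whisper-small:

def available_ram_gb():
    # Parse MemAvailable from /proc/meminfo (reported in kB).
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 * 1024)
    return None

ram = available_ram_gb()
if ram is not None:
    print(f"Available RAM: {ram:.1f} GB")
    if ram < 2:  # assumed safety margin, not a hard limit
        print("Consider whisper-tiny or whisper-base to avoid the OOM killer.")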

📄 transcribe.py

#!/usr/bin/env python3
import argparse
import logging
import os
import sys
import warnings
from pathlib import Path

# --- CONFIGURATION & SILENCING ---
# These must run BEFORE transformers (and any TensorFlow backend) is imported,
# otherwise the noisy startup logs are printed anyway.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore")
logging.getLogger("transformers").setLevel(logging.ERROR)

from transformers import pipeline  # noqa: E402
from rich.console import Console  # noqa: E402
from rich.panel import Panel  # noqa: E402
from rich.text import Text  # noqa: E402

# Initialize Rich Console
console = Console()


def setup_args():
    """Defines the CLI arguments."""
    parser = argparse.ArgumentParser(
        description="Transcribe audio files to text using OpenAI Whisper (CPU).",
        epilog="Example: python3 transcribe.py my_meeting.wav",
    )
    parser.add_argument(
        "input_file",
        help="Path to the audio file (wav, mp3, flac, etc.)",
        type=Path,
    )
    parser.add_argument(
        "--model",
        default="openai/whisper-tiny",
        help="Whisper model to use (default: openai/whisper-tiny). "
             "Try 'openai/whisper-base' for better accuracy.",
        type=str,
    )
    parser.add_argument(
        "--lang",
        help="Force source language (e.g., 'en', 'fr'). If omitted, language is auto-detected.",
        default=None,
        type=str,
    )
    return parser.parse_args()


def transcribe_audio(audio_path, model_name, language=None):
    """
    Runs the transcription pipeline.
    Returns the transcribed text as a single string.
    """
    device = "cpu"  # Force CPU; the NPU is left free for LLM workloads

    # Initialize the pipeline.
    # chunk_length_s=30 enables the sliding-window strategy for long audio files.
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_name,
        device=device,
        chunk_length_s=30,
    )

    # Configure generation arguments
    gen_kwargs = {}
    if language:
        gen_kwargs["language"] = language

    # Run inference.
    # return_timestamps=True is required for the long-form chunking logic to work reliably.
    result = pipe(str(audio_path), batch_size=4, generate_kwargs=gen_kwargs, return_timestamps=True)
    return result["text"].strip()


def main():
    args = setup_args()

    # --- 1. Validation ---
    if not args.input_file.exists():
        console.print(f"[bold red]Error:[/bold red] File '{args.input_file}' not found.")
        sys.exit(1)

    # Define output path (same directory, same name, .txt extension)
    output_file = args.input_file.with_suffix(".txt")

    # --- 2. Visual Header ---
    console.print(Panel(
        Text("Radxa Dragon • Audio Transcriber", justify="center", style="bold cyan"),
        border_style="cyan",
        expand=False,
    ))

    # --- 3. Processing ---
    try:
        with console.status(f"[bold green]Loading model '{args.model}'...[/bold green]", spinner="dots") as status:
            # A. Load model & transcribe
            status.update(
                f"[bold yellow]Transcribing '{args.input_file.name}'...[/bold yellow]\n"
                f"[dim](This may take a moment for long files)[/dim]"
            )
            transcript = transcribe_audio(args.input_file, args.model, args.lang)

            # B. Save to file
            status.update("[bold cyan]Saving output...[/bold cyan]")
            with open(output_file, "w", encoding="utf-8") as f:
                f.write(transcript)

        # --- 4. Success Output ---
        console.print("\n[bold green]✓ Success![/bold green]")
        console.print(f"  [dim]Input:[/dim]  {args.input_file.name}")
        console.print(f"  [dim]Output:[/dim] {output_file.name}")
        console.print(f"  [dim]Path:[/dim]   {output_file.parent}")

        # Print a snippet preview
        preview = (transcript[:150] + "...") if len(transcript) > 150 else transcript
        console.print(Panel(preview, title="Transcript Preview", border_style="green"))

    except Exception as e:
        console.print(f"\n[bold red]Fatal Error:[/bold red] {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()