Hardware: Radxa Dragon Q6A (Qualcomm QCS6490)
Engine: OpenAI Whisper (CPU via HuggingFace Transformers)
Interface: Python CLI with Rich UI
This is a Command Line Interface (CLI) for transcribing audio files directly on the Radxa Dragon Q6A. It takes an audio file as input, writes a text transcript with a matching filename, and shows live status while processing.
While the NPU is great for LLMs, we run Whisper on the CPU here to ensure maximum compatibility with various audio formats and to support the complex decoding logic required for long-form transcription.
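As a rough illustration of what CPU-only, long-form transcription can look like with HuggingFace Transformers, here is a minimal sketch. It is not the actual contents of transcribe.py: the helper names (`build_pipeline_kwargs`, `build_generate_kwargs`, `transcribe`) are hypothetical, and the 30-second chunk length is an assumed value, not one taken from the script. Short language codes such as `fr` are assumed to be accepted by the installed Transformers version.

```python
from typing import Optional


def build_pipeline_kwargs(model_id: str = "openai/whisper-tiny") -> dict:
    """Arguments for a CPU-only speech recognition pipeline (assumed settings)."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        "device": "cpu",        # keep inference on the CPU, not the NPU
        "chunk_length_s": 30,   # assumed: split long audio into 30-second windows
    }


def build_generate_kwargs(lang: Optional[str] = None) -> dict:
    """Decoding options; leave the language out to let Whisper auto-detect it."""
    kwargs = {"task": "transcribe"}
    if lang:
        kwargs["language"] = lang
    return kwargs


def transcribe(audio_path: str, model_id: str = "openai/whisper-tiny",
               lang: Optional[str] = None) -> str:
    """Transcribe one audio file and return the text (hypothetical helper)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    asr = pipeline(**build_pipeline_kwargs(model_id))
    # Passing a file path relies on FFmpeg being installed to decode the audio.
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(lang))
    return result["text"].strip()
```

Calling `transcribe("meeting.wav")` would download the model on first use, so expect the first run to take noticeably longer than later ones.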
- Auto-Naming: Automatically saves the transcript of `meeting.wav` as `meeting.txt` in the same folder.
- Visual Feedback: Uses Rich for spinners and status updates.
- Flexible Models: Defaults to `whisper-tiny` (fast) but supports `base`, `small`, etc.
- Format Agnostic: Accepts WAV, MP3, FLAC, M4A (via FFmpeg).
- Smart Logging: Suppresses the noisy TensorFlow/PyTorch warnings for a clean experience.
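The auto-naming behaviour can be sketched with `pathlib`. This is one plausible way to derive the output path, not necessarily how transcribe.py does it; `transcript_path` and `save_transcript` are hypothetical names.

```python
from pathlib import Path


def transcript_path(audio_file: str) -> Path:
    """Return the .txt path that sits next to the given audio file."""
    # expanduser() resolves a leading "~"; with_suffix() swaps the extension.
    return Path(audio_file).expanduser().with_suffix(".txt")


def save_transcript(audio_file: str, text: str) -> Path:
    """Write the transcript beside the audio file and return the path used."""
    out = transcript_path(audio_file)
    out.write_text(text.strip() + "\n", encoding="utf-8")
    return out
```

Note that `with_suffix` replaces only the last extension, so `talk.final.mp3` would become `talk.final.txt`.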
FFmpeg is required to decode audio files.
sudo apt update
sudo apt install -y ffmpeg
It is recommended to run this inside a virtual environment.
# Create and activate (if you haven't already)
python3 -m venv ~/jarvis-venv
source ~/jarvis-venv/bin/activate
# Install dependencies
pip install --upgrade pip
pip install torch transformers rich
Download the transcribe.py script and run it:
Transcribe a file using the default model (Tiny) and auto-detected language.
python3 transcribe.py ~/recordings/meeting.wav
*Result: Creates ~/recordings/meeting.txt*
Use a larger model for better accuracy (slower).
python3 transcribe.py interview.mp3 --model openai/whisper-base
If the audio has a strong accent or is in a language you already know, force the language to improve accuracy.
python3 transcribe.py french_lesson.mp3 --lang fr
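The three invocations above imply one positional argument plus `--model` and `--lang` options. A minimal `argparse` sketch of such an interface follows; the defaults and help strings are assumptions inferred from the examples, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the usage examples (assumed layout)."""
    parser = argparse.ArgumentParser(
        prog="transcribe.py",
        description="Transcribe an audio file with Whisper on the CPU.",
    )
    parser.add_argument("audio", help="path to a WAV, MP3, FLAC, or M4A file")
    parser.add_argument("--model", default="openai/whisper-tiny",
                        help="Hugging Face model ID")
    parser.add_argument("--lang", default=None,
                        help="language code such as 'fr'; omit to auto-detect")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["french_lesson.mp3", "--lang", "fr"])
print(args.audio, args.model, args.lang)
```

Leaving `--lang` as `None` by default keeps auto-detection as the fallback, which matches the first usage example.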
| Audio Length | Model | Time (CPU) | Realtime Factor |
|---|---|---|---|
| 30s | whisper-tiny | ~4s | ~7.5x faster |
| 30s | whisper-base | ~12s | ~2.5x faster |
| 30s | whisper-small | ~45s | ~0.67x (slower than realtime) |
Note: For purely offline, low-latency voice assistants, stick to tiny or base.
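The realtime factor in the table is simply audio length divided by processing time, so values above 1 mean faster than realtime. A small sketch using the figures from the table (`realtime_factor` is a hypothetical helper name):

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Audio duration divided by processing time; above 1.0 beats realtime."""
    return audio_seconds / processing_seconds


# Approximate timings from the table, all for a 30-second clip.
timings = [("whisper-tiny", 4), ("whisper-base", 12), ("whisper-small", 45)]
for model, elapsed in timings:
    print(f"{model}: {realtime_factor(30, elapsed):.2f}x")
# prints:
# whisper-tiny: 7.50x
# whisper-base: 2.50x
# whisper-small: 0.67x
```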
- `FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'`
  - Fix: Run `sudo apt install ffmpeg`.
- `Killed`
  - Fix: You ran out of RAM. Ensure you aren't running heavy NPU models (like Llama 7B) simultaneously if using `whisper-small` or larger.
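Both failures can be caught before loading a model. The sketch below is an optional pre-flight check, not part of transcribe.py as described: the function names are hypothetical, it assumes a Linux system exposing `/proc/meminfo`, and the memory threshold is left to the caller because this guide does not state how much RAM each model needs.

```python
import shutil
from typing import List, Optional


def ffmpeg_available() -> bool:
    """True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def mem_available_mib(meminfo_text: str) -> Optional[int]:
    """Extract MemAvailable from /proc/meminfo text, converted to MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # the kernel reports kB
    return None


def preflight(min_free_mib: int) -> List[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    if not ffmpeg_available():
        problems.append("ffmpeg not found: run 'sudo apt install ffmpeg'")
    with open("/proc/meminfo", encoding="utf-8") as f:
        free = mem_available_mib(f.read())
    if free is not None and free < min_free_mib:
        problems.append(f"only {free} MiB RAM available; stop other models first")
    return problems
```

A caller might run `preflight(min_free_mib=...)` with a threshold chosen from its own measurements and print any returned messages before starting the transcription.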