Three models are installed locally at ~/.local/share/qwen3-tts-models/:
| Key | Model | Notes |
|---|---|---|
| 1 | Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit | Default. Best quality. Supports instruct. |
| 4 | Qwen3-TTS-12Hz-0.6B-CustomVoice-8bit | Faster, less RAM. No instruction control. |
| 6 | Qwen3-TTS-12Hz-0.6B-Base-8bit | Voice cloning from reference audio. |

Use aliases: `pro-custom` (1), `lite-custom` (4), `lite-clone` (6).
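For example, the numeric key and its alias are interchangeable (the commands below use only `--speak-text` and `--speak-model` from the flag reference further down):

```bash
# Default model (key 1 / pro-custom)
python mcp_server.py --speak-text "Quick smoke test."

# Smaller, faster CustomVoice model, selected by key or by alias
python mcp_server.py --speak-text "Quick smoke test." --speak-model 4
python mcp_server.py --speak-text "Quick smoke test." --speak-model lite-custom
```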
For best quality, use the speaker whose native language matches the text.
| Speaker | Native Language | Description |
|---|---|---|
| Vivian | Chinese | Bright, slightly edgy young female |
| Serena | Chinese | Warm, gentle young female |
| Ryan | English | Dynamic male with strong rhythmic drive |
| Aiden | English | Sunny American male with clear midrange |
| Ethan | English | — |
| Chelsie | English | — |
The full 1.7B model supports 9 speakers (including `Ono_Anna` for Japanese and `Sohee` for Korean), but the installed mlx-community quantized version ships with 6.
Always pass `--lang-code` for best results. Auto-detection works but can misidentify the language.

Supported values: `auto`, `chinese`, `english`, `japanese`, `korean`, `german`, `french`, `russian`, `portuguese`, `spanish`, `italian`.
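As a combined illustration of the two recommendations above (a speaker whose native language matches the text, plus an explicit language hint), the sample sentences are arbitrary and the flags are the ones documented below:

```bash
# Chinese text with a native-Chinese speaker and an explicit language hint
python mcp_server.py --speak-text "今天的天气真不错。" --voice Serena --lang-code chinese

# English text with a native-English speaker
python mcp_server.py --speak-text "The weather is lovely today." --voice Aiden --lang-code english
```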
The 1.7B CustomVoice model supports natural language style control via `--instruct`:

```bash
# Excited tone
python mcp_server.py --speak-text "Hello!" --voice Ryan --instruct "speak in an excited and energetic tone"

# Angry tone
python mcp_server.py --speak-text "I told you so." --voice Ryan --instruct "speak in an angry tone"

# Chinese emotional style
python mcp_server.py --speak-text "你好!" --voice Vivian --lang-code chinese --instruct "用特别愉快的语气说"
```

The default instruction is "normal tone" when `--instruct` is omitted.
Point `.mcp.json` directly at the venv Python; this avoids PyInstaller startup overhead:

```json
{
  "mcpServers": {
    "qwen3-tts-mcp": {
      "type": "stdio",
      "command": "/path/to/tts/.venv/bin/python",
      "args": [
        "/path/to/tts/mcp_server.py",
        "--models-dir", "/Users/<you>/.local/share/qwen3-tts-models"
      ]
    }
  }
}
```

The binary is slower on cold start (~15 s) due to PyInstaller bootstrap overhead. Once warm it is fine. Build with:
```bash
.venv/bin/pyinstaller qwen3-tts-mcp.spec --noconfirm
rm -f ~/.local/bin/qwen3-tts-mcp
cp dist/qwen3-tts-mcp ~/.local/bin/qwen3-tts-mcp
```

Important: the binary bundles all Python dependencies. Patches to `.venv` packages (e.g. `mlx_audio`) are NOT reflected in the binary until you rebuild.
`mlx_audio`'s `AudioPlayer` defaults to `min_buffer_seconds = 1.5`. Combined with a `streaming_interval` of 2.0 seconds, audio chunks arrive too slowly and gaps appear between chunks during playback. Two adjustments fix this:

```python
from mlx_audio.tts.audio_player import AudioPlayer

AudioPlayer.min_buffer_seconds = 4.0  # wait for a larger buffer before starting playback
```

And in `generate_audio()`:

```python
streaming_interval=5.0  # generate larger chunks before yielding
```

Monkey-patching `AudioPlayer` directly in `mcp_server.py` (rather than editing the library file) survives pip upgrades.
With `min_buffer_seconds = 4.0`, short phrases (< 4 seconds) never accumulate enough audio to trigger playback, so `wait_for_drain()` blocks forever and the process hangs.

Fix: patch `wait_for_drain` to force-start playback if audio is buffered but not yet playing:
```python
_original_wait_for_drain = AudioPlayer.wait_for_drain

def _patched_wait_for_drain(self):
    # Short phrases may finish generating before the buffer threshold is
    # reached; force the stream to start so the drain can complete.
    if not self.playing and self.buffered_samples() > 0:
        self.start_stream()
    return _original_wait_for_drain(self)

AudioPlayer.wait_for_drain = _patched_wait_for_drain
```

A full example invocation:

```bash
python mcp_server.py \
  --speak-text "Hello world" \
  --voice Ryan \
  --speak-model 1 \
  --lang-code english \
  --instruct "speak in an excited tone" \
  --speak-keep-file \
  --speak-output-dir ./outputs
```

| Flag | Default | Description |
|---|---|---|
| `--speak-text` | — | Text to synthesize |
| `--voice` | `Vivian` | Speaker name |
| `--speak-model` | `1` | Model key (1-6 or alias) |
| `--lang-code` | `auto` | Language hint |
| `--instruct` | None | Style instruction |
| `--speak-speed` | `1.0` | Speed (note: not yet implemented in mlx_audio) |
| `--speak-keep-file` | off | Save WAV to disk |
| `--speak-output-dir` | `outputs/` | Output directory |
| `--speak-no-play` | off | Disable audio playback |
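For example, to render a WAV to disk without playing it through the speakers (using only the flags listed above):

```bash
python mcp_server.py \
  --speak-text "Render this to disk." \
  --voice Aiden \
  --speak-model lite-custom \
  --speak-keep-file \
  --speak-output-dir ./outputs \
  --speak-no-play
```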
- `speed` parameter has no effect: `mlx_audio` accepts it but notes "not directly supported yet".
- Tokenizer warning: `transformers` 5.0.0rc3 warns about an incorrect regex pattern in the Qwen3 tokenizer. `fix_mistral_regex=True` is passed in `qwen3.py` but doesn't fully propagate through `AutoTokenizer.from_pretrained`. The warning is cosmetic and audio quality is acceptable.
- Binary vs Python path: the warning always exists in both; it is just hidden in MCP tool output (stderr is not forwarded).