
Qwen3-tts

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

Downloads: 2.3k
Stars: 7
Version: 1.0.0
Category: Design & Media
Security check: Passed

Skill Description


name: qwen-tts
description: Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

Qwen TTS

Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model, which is downloaded from Hugging Face.

Quick Start

Generate speech from text:

scripts/tts.py "Ciao, come va?" -l Italian -o output.wav

With voice instruction (emotion/style):

scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav

Installation

One-time setup:

cd skills/public/qwen-tts
bash scripts/setup.sh

This creates a local virtual environment and installs the qwen-tts package (~500MB).

Note: The first synthesis automatically downloads the ~1.7GB model from Hugging Face.

Usage

scripts/tts.py [options] "Text to speak"

Options

  • -o, --output PATH - Output file path (default: qwen_output.wav)
  • -s, --speaker NAME - Speaker voice (default: Vivian)
  • -l, --language LANG - Language (default: Auto)
  • -i, --instruct TEXT - Voice instruction (emotion, style, tone)
  • --list-speakers - Show available speakers
  • --model NAME - Model name (default: CustomVoice 1.7B)
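When the script is driven from other Python code, the options above can be assembled into an argument list before the command runs. The following is a minimal sketch, not part of the skill itself: the helper name `build_tts_command` is hypothetical, and it uses only the flags listed above.

```python
from typing import List, Optional


def build_tts_command(
    text: str,
    output: Optional[str] = None,
    speaker: Optional[str] = None,
    language: Optional[str] = None,
    instruct: Optional[str] = None,
    script: str = "scripts/tts.py",
) -> List[str]:
    """Build an argv list for tts.py (hypothetical helper, documented flags only)."""
    cmd = [script]
    # Flags that are left as None are omitted, so the script's own defaults apply.
    if output is not None:
        cmd += ["-o", output]
    if speaker is not None:
        cmd += ["-s", speaker]
    if language is not None:
        cmd += ["-l", language]
    if instruct is not None:
        cmd += ["-i", instruct]
    # The text is positional and goes last, matching the usage line.
    cmd.append(text)
    return cmd


cmd = build_tts_command(
    "Hello world", output="hello.wav", speaker="Ryan", language="English"
)
print(cmd)
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems with text that contains quotes or punctuation.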

Examples

Basic Italian speech:

scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav

With emotion/instruction:

scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav

List available speakers:

scripts/tts.py --list-speakers

Available Speakers

The CustomVoice model includes 9 premium voices:

| Speaker  | Language          | Description                      |
|----------|-------------------|----------------------------------|
| Vivian   | Chinese           | Bright, slightly edgy young female |
| Serena   | Chinese           | Warm, gentle young female        |
| Uncle_Fu | Chinese           | Seasoned male, low mellow timbre |
| Dylan    | Chinese (Beijing) | Youthful Beijing male, clear     |
| Eric     | Chinese (Sichuan) | Lively Chengdu male, husky       |
| Ryan     | English           | Dynamic male, rhythmic           |
| Aiden    | English           | Sunny American male              |
| Ono_Anna | Japanese          | Playful female, light nimble     |
| Sohee    | Korean            | Warm female, rich emotion        |

Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
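The recommendation can be applied automatically by looking up a speaker whose native language matches the text. This is a rough sketch under stated assumptions: the mapping is copied from the speaker list, while the helper names `speakers_for` and `pick_speaker` are hypothetical, as is the choice to fall back to the script's default speaker (Vivian) when no native match exists.

```python
# Native language per speaker, taken from the speaker list.
NATIVE_LANGUAGE = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",   # Beijing dialect
    "Eric": "Chinese",    # Sichuan dialect
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}

DEFAULT_SPEAKER = "Vivian"  # the script's documented default


def speakers_for(language: str) -> list[str]:
    """Return all speakers whose native language matches (case-insensitive)."""
    wanted = language.lower()
    return [name for name, lang in NATIVE_LANGUAGE.items() if lang.lower() == wanted]


def pick_speaker(language: str) -> str:
    """Pick the first native speaker, or the default when none is native."""
    matches = speakers_for(language)
    return matches[0] if matches else DEFAULT_SPEAKER


print(pick_speaker("English"))
print(pick_speaker("Italian"))
```

For languages without a native speaker, such as Italian, any of the nine voices can be used, so the fallback here is only one reasonable choice.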

Voice Instructions

Use -i, --instruct to control emotion, tone, and style:

Italian examples:

  • "Parla con entusiasmo"
  • "Tono serio e professionale"
  • "Voce calma e rilassante"
  • "Leggi come un narratore"

English examples:

  • "Speak with excitement"
  • "Very happy and energetic"
  • "Calm and soothing voice"
  • "Read like a narrator"

Integration with OpenClaw

The script prints the audio file path as the last line of stdout, making it compatible with OpenClaw's TTS workflow:

# OpenClaw captures the output path
cd skills/public/qwen-tts
OUTPUT=$(scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav 2>/dev/null | tail -n 1)
# OUTPUT = /tmp/audio.wav
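The same capture can be done from Python. The sketch below assumes only what is stated above, namely that the path is the last line of stdout. The function names and the sample log line are hypothetical, and this is not OpenClaw's own integration code.

```python
import subprocess


def last_stdout_line(stdout: str) -> str:
    """Return the last non-empty line of the script's stdout."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("tts.py produced no output on stdout")
    return lines[-1]


def synthesize(text: str, output: str, speaker: str = "Vivian", language: str = "Auto") -> str:
    """Run scripts/tts.py and return the audio path it reports (hypothetical wrapper)."""
    result = subprocess.run(
        ["scripts/tts.py", text, "-s", speaker, "-l", language, "-o", output],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # mirrors 2>/dev/null in the shell example
        text=True,
        check=True,  # raise CalledProcessError if synthesis fails
    )
    return last_stdout_line(result.stdout)


# Illustrative stdout; the first line is an invented example of extra output.
sample = "Loading model...\n/tmp/audio.wav\n"
print(last_stdout_line(sample))
```

Taking only the last non-empty line keeps the result correct even if the script writes progress messages to stdout before the path.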

Performance

  • GPU (CUDA): ~1-3 seconds for short phrases
  • CPU: ~10-30 seconds for short phrases
  • Model size: ~1.7GB (auto-downloads on first run)
  • Venv size: ~500MB (installed dependencies)
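Taken together, the model and the virtual environment need roughly 1.7GB + 0.5GB = 2.2GB of disk space. A quick pre-flight check can be sketched as follows. The function names are hypothetical, and the check looks at a single path even though the model cache and the venv may sit on different volumes.

```python
import shutil

MODEL_GB = 1.7  # approximate model download size listed above
VENV_GB = 0.5   # approximate venv size listed above


def required_gb() -> float:
    """Approximate total disk space needed for the model plus the venv."""
    return MODEL_GB + VENV_GB


def has_enough_space(path: str = ".", margin_gb: float = 0.0) -> bool:
    """Check whether the volume holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb() + margin_gb


print(f"Need about {required_gb():.1f} GB; enough space here: {has_enough_space('.')}")
```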

Troubleshooting

Setup fails:

# Ensure Python 3.10-3.12 is available
python3.12 --version

# Re-run setup
cd skills/public/qwen-tts
rm -rf venv
bash scripts/setup.sh
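The supported range can also be checked from within Python before setup is re-run. This is a minimal sketch assuming only the 3.10-3.12 range stated above. The name `is_supported` is hypothetical, and setup.sh may perform its own checks.

```python
import sys

MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported(version=None) -> bool:
    """Return True if the (major, minor) version lies within 3.10-3.12."""
    info = sys.version_info if version is None else version
    major_minor = tuple(info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if not is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside 3.10-3.12")
```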

Model download slow/fails:

# Use mirror (China mainland)
export HF_ENDPOINT=https://hf-mirror.com
scripts/tts.py "Test" -o test.wav
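When the script is launched from Python, the mirror can be set for that one process without changing the parent environment. This is a small sketch: `env_with_mirror` is a hypothetical helper, and the endpoint is the one shown above.

```python
import os


def env_with_mirror(base_env=None, endpoint: str = "https://hf-mirror.com") -> dict:
    """Return a copy of the environment with HF_ENDPOINT set to the mirror."""
    env = dict(os.environ if base_env is None else base_env)
    env["HF_ENDPOINT"] = endpoint
    return env


env = env_with_mirror({"PATH": "/usr/bin"})
print(env["HF_ENDPOINT"])
# Pass the result as the env argument of subprocess.run when calling scripts/tts.py.
```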

Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.

Audio quality issues:

  • Try a different speaker (see --list-speakers)
  • Add an instruction: -i "Speak clearly and slowly"
  • Check that the language matches the text: -l Italian for Italian text

Model Details
