Generate speech audio using a configured text-to-speech provider.
term-llm audio "hello from term-llm"
By default, audio clips are:
- Saved to ~/Music/term-llm/ with timestamped filenames
- Generated with Venice tts-kokoro
- Rendered as MP3 using voice af_sky
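The exact naming rule for saved clips is not documented here. The sketch below only reproduces the pattern visible in the sample path under JSON Output (a timestamp followed by the text with spaces turned into underscores). The function name `default_clip_path` and the slug rule are assumptions for illustration, not term-llm's actual implementation.

```python
from datetime import datetime
from pathlib import Path


def default_clip_path(
    text: str,
    when: datetime,
    ext: str = "mp3",
    output_dir: str = "~/Music/term-llm",
) -> Path:
    """Build a timestamped clip path (hypothetical helper, assumed naming rule)."""
    # Timestamp layout inferred from the documented sample: YYYYMMDD-HHMMSS.
    stamp = when.strftime("%Y%m%d-%H%M%S")
    # Assumption: whitespace runs become single underscores; the real tool
    # may also truncate or strip punctuation.
    slug = "_".join(text.split())
    return Path(output_dir).expanduser() / f"{stamp}-{slug}.{ext}"
```

Calling it with the text "hello from term-llm" and a timestamp of 2026-05-02 12:00:00 yields a file name matching the one in the JSON Output sample.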
You can also use Gemini or ElevenLabs TTS:
term-llm audio "Say cheerfully: hello from Gemini" --provider gemini --voice Kore
term-llm audio "Hello from ElevenLabs" --provider elevenlabs --voice Rachel --model eleven_flash_v2_5
Audio Flags
| Flag | Short | Description |
|---|---|---|
| --provider | -p | Audio provider override: venice, gemini, elevenlabs |
| --output | -o | Custom output path, or - for stdout |
| --model | | TTS model override |
| --voice | | Model-specific voice; ElevenLabs accepts a voice ID or account voice name; Venice accepts cloned voice handles like vv_<id> |
| --voice1 | | Gemini multi-speaker voice for --speaker1 |
| --voice2 | | Gemini multi-speaker voice for --speaker2 |
| --speaker1 | | Gemini multi-speaker label for the first speaker |
| --speaker2 | | Gemini multi-speaker label for the second speaker |
| --language | | Optional provider language hint (English, en, etc.; provider/model-specific) |
| --prompt | | Style/emotion prompt for models that support it |
| --format | | Venice: mp3, opus, aac, flac, wav, pcm; Gemini: wav, pcm; ElevenLabs: mp3_44100_128, pcm_24000, wav_44100, etc. |
| --speed | | Venice speech speed 0.25 to 4.0; ElevenLabs voice speed 0.7 to 1.2 |
| --streaming | | Ask supported providers to stream; term-llm still collects before saving |
| --temperature | | Sampling temperature for supported models, 0 to 2; omitted by default |
| --top-p | | Nucleus sampling for supported models, 0 to 1; omitted by default |
| --stability | | ElevenLabs voice stability, 0 to 1; omitted by default |
| --similarity-boost | | ElevenLabs similarity boost, 0 to 1; omitted by default |
| --style | | ElevenLabs style exaggeration, 0 to 1; omitted by default |
| --speaker-boost | | ElevenLabs speaker boost voice setting |
| --seed | | ElevenLabs deterministic seed |
| --previous-text / --next-text | | ElevenLabs continuity context |
| --previous-request-ids / --next-request-ids | | ElevenLabs comma-separated request IDs for continuity |
| --pronunciation-dictionaries | | ElevenLabs comma-separated pronunciation dictionary IDs, optionally id:version |
| --use-pvc-as-ivc | | ElevenLabs lower-latency PVC workaround |
| --apply-text-normalization | | ElevenLabs text normalization: auto, on, off |
| --apply-language-text-normalization | | ElevenLabs language text normalization |
| --optimize-streaming-latency | | ElevenLabs latency optimization level, 0 to 4 |
| --enable-logging | | ElevenLabs request logging/history; set false for zero-retention-capable accounts |
| --json | | Emit machine-readable JSON to stdout |
| --debug | -d | Show debug information |
--model, --voice, --voice1, --voice2, --format, --provider, and --apply-text-normalization include shell completion candidates.
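Because the valid --speed range depends on the provider, a wrapper script may want to check the value before invoking the CLI. The following is a minimal sketch, assuming the documented bounds are inclusive; `SPEED_RANGES` and `speed_error` are hypothetical names, and term-llm itself may validate differently or defer to the provider.

```python
from typing import Optional

# Bounds copied from the Audio Flags table; treating them as inclusive
# is an assumption.
SPEED_RANGES = {
    "venice": (0.25, 4.0),
    "elevenlabs": (0.7, 1.2),
}


def speed_error(provider: str, speed: float) -> Optional[str]:
    """Return a message if speed looks invalid for provider, else None."""
    bounds = SPEED_RANGES.get(provider)
    if bounds is None:
        # The table lists no --speed range for this provider (e.g. gemini).
        return f"no --speed range is documented for provider {provider!r}"
    low, high = bounds
    if not (low <= speed <= high):
        return f"--speed {speed} is outside {low} to {high} for {provider}"
    return None
```

For example, 1.25 passes for venice but is reported as out of range for elevenlabs.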
Examples
term-llm audio "hello from term-llm"
term-llm audio "quick smoke test" --output smoke.mp3
term-llm audio "faster please" --speed 1.25 --format wav
term-llm audio "sad robot noises" \
  --model tts-qwen3-0-6b \
  --voice Vivian \
  --prompt "Sad and slow."
term-llm audio "Say cheerfully: have a wonderful day" \
  --provider gemini \
  --model gemini-3.1-flash-tts-preview \
  --voice Kore \
  --format wav
term-llm audio "TTS the following conversation between Joe and Jane: Joe: Hi Jane. Jane: Hi Joe." \
  --provider gemini \
  --speaker1 Joe --voice1 Kore \
  --speaker2 Jane --voice2 Puck
term-llm audio "A one second ElevenLabs smoke test." \
  --provider elevenlabs \
  --model eleven_flash_v2_5 \
  --voice Rachel \
  --format mp3_44100_128 \
  --stability 0.5 \
  --similarity-boost 0.75
echo "pipe me" | term-llm audio --voice af_bella -o - > out.mp3
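When driving these commands from a script, it can help to assemble the argument list programmatically instead of concatenating shell strings. The sketch below is one possible approach; `audio_argv` is a hypothetical helper, not part of term-llm. It maps keyword names to the long flags from the Audio Flags table by replacing underscores with hyphens. How term-llm expects explicit false values to be written is not covered here, so the helper rejects them instead of guessing.

```python
from typing import List


def audio_argv(text: str, **flags: object) -> List[str]:
    """Build an argv list for `term-llm audio` (hypothetical helper)."""
    argv = ["term-llm", "audio", text]
    for name, value in flags.items():
        if value is None:
            continue  # unset options are simply left out
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # switch-style flag such as --json
        elif value is False:
            # Assumption: the syntax for explicit false is not documented here.
            raise ValueError(f"explicit false for {flag} is not handled in this sketch")
        else:
            argv.extend([flag, str(value)])
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems in the text argument.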
Venice TTS Models
term-llm includes the Venice text-to-speech model catalog:
| Model | Notes |
|---|---|
| tts-kokoro | Default, cheap general TTS |
| tts-qwen3-0-6b | Qwen 3 TTS, supports style prompt / sampling options |
| tts-qwen3-1-7b | Larger Qwen 3 TTS |
| tts-xai-v1 | xAI TTS v1 |
| tts-inworld-1-5-max | Inworld TTS-1.5 Max |
| tts-chatterbox-hd | Chatterbox HD; supports cloned voices |
| tts-orpheus | Orpheus TTS |
| tts-elevenlabs-turbo-v2-5 | ElevenLabs Turbo v2.5 |
| tts-minimax-speech-02-hd | MiniMax Speech-02 HD |
| tts-gemini-3-1-flash | Gemini 3.1 Flash TTS |
Voices are model-specific. If a model rejects a voice, Venice returns the API error directly.
Gemini TTS Models
| Model | Single speaker | Multi-speaker |
|---|---|---|
| gemini-3.1-flash-tts-preview | Yes | Yes |
| gemini-2.5-flash-preview-tts | Yes | Yes |
| gemini-2.5-pro-preview-tts | Yes | Yes |
Gemini TTS returns 24 kHz mono PCM. term-llm can save it directly as pcm, or wrap it as a wav file. Gemini TTS does not support streaming; language is auto-detected.
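To make the pcm versus wav distinction concrete, here is a minimal sketch of wrapping raw PCM in a WAV container using Python's standard `wave` module. The 24 kHz mono parameters come from the text above; the 16-bit sample width is an assumption (the text does not state it), and `pcm_to_wav` is a hypothetical helper, not term-llm's own code.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int = 24000,  # 24 kHz, as stated for Gemini TTS
    channels: int = 1,         # mono, as stated for Gemini TTS
    sample_width: int = 2,     # assumption: 16-bit samples (2 bytes each)
) -> bytes:
    """Return the PCM bytes prefixed with a WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # the header sizes are patched when the writer closes
    return buf.getvalue()
```

The audio samples are unchanged; the WAV file only adds a header describing rate, channel count, and sample width so that players can interpret the data.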
Gemini prebuilt voices:
Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat.
ElevenLabs TTS Models
term-llm includes the documented ElevenLabs text-to-speech models:
| Model | Notes |
|---|---|
| eleven_v3 | Latest expressive speech model |
| eleven_multilingual_v2 | Default high-quality multilingual speech |
| eleven_flash_v2_5 | Low-latency multilingual speech |
| eleven_flash_v2 | Low-latency English speech |
| eleven_turbo_v2_5 | Deprecated predecessor of Flash v2.5 |
| eleven_turbo_v2 | Deprecated predecessor of Flash v2 |
| eleven_monolingual_v1 | Deprecated English-only model |
| eleven_multilingual_v1 | Deprecated multilingual model |
ElevenLabs voices are account-specific. --voice accepts either a raw voice_id or an exact voice name from the account; name lookup uses the ElevenLabs voices API before generating speech.
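The name-or-ID behavior can be pictured as a lookup over the account's voice list. The sketch below assumes the list has already been fetched and that each entry carries `voice_id` and `name` fields; it tries an exact name match first and otherwise treats the input as a raw ID. Both the helper name `resolve_voice_id` and that ordering are assumptions for illustration, since the text does not say how term-llm tells the two cases apart, and the sample IDs are made up.

```python
from typing import Dict, List


def resolve_voice_id(voice: str, voices: List[Dict[str, str]]) -> str:
    """Map an exact voice name to its ID, else pass the value through."""
    for entry in voices:
        if entry["name"] == voice:  # exact, case-sensitive match
            return entry["voice_id"]
    # Assumption: anything that is not a known name is already a voice_id.
    return voice


# Hypothetical account data for illustration only.
account_voices = [
    {"voice_id": "example-id-001", "name": "Rachel"},
    {"voice_id": "example-id-002", "name": "Narrator"},
]
```

With this data, "Rachel" resolves to "example-id-001", while an unrecognized value such as a raw ID is returned unchanged.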
ElevenLabs output formats:
alaw_8000, mp3_22050_32, mp3_24000_48, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192, opus_48000_32, opus_48000_64, opus_48000_96, opus_48000_128, opus_48000_192, pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_32000, pcm_44100, pcm_48000, ulaw_8000, wav_8000, wav_16000, wav_22050, wav_24000, wav_32000, wav_44100, wav_48000.
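These names follow a codec, sample rate, optional bitrate pattern, so `mp3_44100_128` reads as MP3 at 44.1 kHz and 128 kbps, while `pcm_24000` has no bitrate part. A small parser makes that structure explicit; `OutputFormat` and `parse_output_format` are hypothetical names used only for this sketch.

```python
from typing import NamedTuple, Optional


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None when the name has no bitrate part


def parse_output_format(name: str) -> OutputFormat:
    """Split a name like 'mp3_44100_128' or 'pcm_24000' into its parts."""
    parts = name.split("_")
    if len(parts) == 2:
        codec, rate = parts
        return OutputFormat(codec, int(rate), None)
    if len(parts) == 3:
        codec, rate, bitrate = parts
        return OutputFormat(codec, int(rate), int(bitrate))
    raise ValueError(f"unrecognized output format name: {name!r}")
```

This can be handy for picking a file extension or checking that a requested sample rate matches downstream expectations.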
JSON Output
--json prints a single structured object to stdout after saving the file.
{
  "provider": "venice",
  "text": "hello from term-llm",
  "model": "tts-kokoro",
  "voice": "af_sky",
  "format": "mp3",
  "output": {
    "path": "/home/me/Music/term-llm/20260502-120000-hello_from_term-llm.mp3",
    "mime_type": "audio/mpeg",
    "bytes": 12345
  }
}
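A script consuming this output might load it into a small typed record. The sketch below relies only on the fields shown in the sample object; `AudioResult` and `parse_audio_result` are hypothetical names, and any fields term-llm may emit beyond the sample are ignored.

```python
import json
from dataclasses import dataclass


@dataclass
class AudioResult:
    provider: str
    model: str
    voice: str
    format: str
    path: str
    mime_type: str
    size_bytes: int


def parse_audio_result(stdout: str) -> AudioResult:
    """Parse the JSON object printed by `term-llm audio --json`."""
    data = json.loads(stdout)
    output = data["output"]  # nested object describing the saved file
    return AudioResult(
        provider=data["provider"],
        model=data["model"],
        voice=data["voice"],
        format=data["format"],
        path=output["path"],
        mime_type=output["mime_type"],
        size_bytes=output["bytes"],
    )
```

In practice the input string would be the captured stdout of the command, for example from `subprocess.run(..., capture_output=True, text=True).stdout`.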
Credentials and Config
term-llm audio reads Venice credentials from VENICE_API_KEY, audio.venice.api_key, or the existing image.venice.api_key fallback.
Gemini credentials are read from GEMINI_API_KEY, audio.gemini.api_key, image.gemini.api_key, or the configured providers.gemini API key.
ElevenLabs credentials are read from ELEVENLABS_API_KEY, XI_API_KEY, or audio.elevenlabs.api_key.
audio:
  provider: venice
  output_dir: ~/Music/term-llm
  venice:
    api_key: $VENICE_API_KEY
    model: tts-kokoro
    voice: af_sky
    format: mp3
  gemini:
    api_key: $GEMINI_API_KEY
    model: gemini-3.1-flash-tts-preview
    voice: Kore
    format: wav
  elevenlabs:
    api_key: $ELEVENLABS_API_KEY
    model: eleven_multilingual_v2
    voice: JBFqnCBsd6RMkjVDRZzb
    format: mp3_44100_128
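The credential sources listed earlier in this section can be read as a fallback chain. The sketch below models that chain as a first-non-empty lookup, assuming the sources are tried in the order they are listed (environment variables first, then config keys); the text does not state the precedence, so treat the ordering as an assumption. `CREDENTIAL_SOURCES` and `resolve_api_key` are hypothetical names, config is represented as a flat dict of dotted keys, and the spelling `providers.gemini.api_key` is a guess at the key path.

```python
from typing import Mapping, Optional

# (environment variable names, dotted config keys) per provider, in the
# order the documentation lists them. Ordering as precedence is assumed.
CREDENTIAL_SOURCES = {
    "venice": (
        ("VENICE_API_KEY",),
        ("audio.venice.api_key", "image.venice.api_key"),
    ),
    "gemini": (
        ("GEMINI_API_KEY",),
        ("audio.gemini.api_key", "image.gemini.api_key", "providers.gemini.api_key"),
    ),
    "elevenlabs": (
        ("ELEVENLABS_API_KEY", "XI_API_KEY"),
        ("audio.elevenlabs.api_key",),
    ),
}


def resolve_api_key(
    provider: str,
    env: Mapping[str, str],
    config: Mapping[str, str],
) -> Optional[str]:
    """Return the first non-empty credential for provider, or None."""
    env_names, config_keys = CREDENTIAL_SOURCES[provider]
    for name in env_names:
        if env.get(name):
            return env[name]
    for key in config_keys:
        if config.get(key):
            return config[key]
    return None
```

Passing `os.environ` as `env` checks the real environment; plain dicts make the function easy to exercise in isolation.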