Audio generation · term-llm docs

On this page

Generate speech audio using a configured text-to-speech provider.

term-llm audio "hello from term-llm"

By default, audio clips are:

Saved to ~/Music/term-llm/ with timestamped filenames
Generated with Venice tts-kokoro
Rendered as MP3 using voice af_sky

You can also use Gemini or ElevenLabs TTS:

term-llm audio "Say cheerfully: hello from Gemini" --provider gemini --voice Kore
term-llm audio "Hello from ElevenLabs" --provider elevenlabs --voice Rachel --model eleven_flash_v2_5

Audio Flags

Flag	Short	Description
`--provider`	`-p`	Audio provider override: `venice`, `gemini`, `elevenlabs`
`--output`	`-o`	Custom output path, or `-` for stdout
`--model`		TTS model override
`--voice`		Model-specific voice; ElevenLabs accepts a voice ID or account voice name; Venice accepts cloned voice handles like `vv_<id>`
`--voice1`		Gemini multi-speaker voice for `--speaker1`
`--voice2`		Gemini multi-speaker voice for `--speaker2`
`--speaker1`		Gemini multi-speaker label for the first speaker
`--speaker2`		Gemini multi-speaker label for the second speaker
`--language`		Optional provider language hint (`English`, `en`, etc.; provider/model-specific)
`--prompt`		Style/emotion prompt for models that support it
`--format`		Venice: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`; Gemini: `wav`, `pcm`; ElevenLabs: `mp3_44100_128`, `pcm_24000`, `wav_44100`, etc.
`--speed`		Venice speech speed `0.25` to `4.0`; ElevenLabs voice speed `0.7` to `1.2`
`--streaming`		Ask supported providers to stream; term-llm still collects before saving
`--temperature`		Sampling temperature for supported models, `0` to `2`; omitted by default
`--top-p`		Nucleus sampling for supported models, `0` to `1`; omitted by default
`--stability`		ElevenLabs voice stability, `0` to `1`; omitted by default
`--similarity-boost`		ElevenLabs similarity boost, `0` to `1`; omitted by default
`--style`		ElevenLabs style exaggeration, `0` to `1`; omitted by default
`--speaker-boost`		ElevenLabs speaker boost voice setting
`--seed`		ElevenLabs deterministic seed
`--previous-text` / `--next-text`		ElevenLabs continuity context
`--previous-request-ids` / `--next-request-ids`		ElevenLabs comma-separated request IDs for continuity
`--pronunciation-dictionaries`		ElevenLabs comma-separated pronunciation dictionary IDs, optionally `id:version`
`--use-pvc-as-ivc`		ElevenLabs lower-latency PVC workaround
`--apply-text-normalization`		ElevenLabs text normalization: `auto`, `on`, `off`
`--apply-language-text-normalization`		ElevenLabs language text normalization
`--optimize-streaming-latency`		ElevenLabs latency optimization level, `0` to `4`
`--enable-logging`		ElevenLabs request logging/history; set false for zero-retention-capable accounts
`--json`		Emit machine-readable JSON to stdout
`--debug`	`-d`	Show debug information

--model, --voice, --voice1, --voice2, --format, --provider, and --apply-text-normalization include shell completion candidates.

Examples

term-llm audio "hello from term-llm"
term-llm audio "quick smoke test" --output smoke.mp3
term-llm audio "faster please" --speed 1.25 --format wav
term-llm audio "sad robot noises" \
  --model tts-qwen3-0-6b \
  --voice Vivian \
  --prompt "Sad and slow."

term-llm audio "Say cheerfully: have a wonderful day" \
  --provider gemini \
  --model gemini-3.1-flash-tts-preview \
  --voice Kore \
  --format wav

term-llm audio "TTS the following conversation between Joe and Jane: Joe: Hi Jane. Jane: Hi Joe." \
  --provider gemini \
  --speaker1 Joe --voice1 Kore \
  --speaker2 Jane --voice2 Puck

term-llm audio "A one second ElevenLabs smoke test." \
  --provider elevenlabs \
  --model eleven_flash_v2_5 \
  --voice Rachel \
  --format mp3_44100_128 \
  --stability 0.5 \
  --similarity-boost 0.75

echo "pipe me" | term-llm audio --voice af_bella -o - > out.mp3

Venice TTS Models

term-llm includes the Venice text-to-speech model catalog:

Model	Notes
`tts-kokoro`	Default, cheap general TTS
`tts-qwen3-0-6b`	Qwen 3 TTS, supports style prompt / sampling options
`tts-qwen3-1-7b`	Larger Qwen 3 TTS
`tts-xai-v1`	xAI TTS v1
`tts-inworld-1-5-max`	Inworld TTS-1.5 Max
`tts-chatterbox-hd`	Chatterbox HD; supports cloned voices
`tts-orpheus`	Orpheus TTS
`tts-elevenlabs-turbo-v2-5`	ElevenLabs Turbo v2.5
`tts-minimax-speech-02-hd`	MiniMax Speech-02 HD
`tts-gemini-3-1-flash`	Gemini 3.1 Flash TTS

Voices are model-specific. If a model rejects a voice, Venice returns the API error directly.

Gemini TTS Models

Model	Single speaker	Multi-speaker
`gemini-3.1-flash-tts-preview`	Yes	Yes
`gemini-2.5-flash-preview-tts`	Yes	Yes
`gemini-2.5-pro-preview-tts`	Yes	Yes

Gemini TTS returns 24 kHz mono PCM. term-llm can save it directly as pcm, or wrap it as a wav file. Gemini TTS does not support streaming; language is auto-detected.

Gemini prebuilt voices:

Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat.

ElevenLabs TTS Models

term-llm includes the documented ElevenLabs text-to-speech models:

Model	Notes
`eleven_v3`	Latest expressive speech model
`eleven_multilingual_v2`	Default high-quality multilingual speech
`eleven_flash_v2_5`	Low-latency multilingual speech
`eleven_flash_v2`	Low-latency English speech
`eleven_turbo_v2_5`	Deprecated predecessor of Flash v2.5
`eleven_turbo_v2`	Deprecated predecessor of Flash v2
`eleven_monolingual_v1`	Deprecated English-only model
`eleven_multilingual_v1`	Deprecated multilingual model

ElevenLabs voices are account-specific. --voice accepts either a raw voice_id or an exact voice name from the account; name lookup uses the ElevenLabs voices API before generating speech.

ElevenLabs output formats:

alaw_8000, mp3_22050_32, mp3_24000_48, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192, opus_48000_32, opus_48000_64, opus_48000_96, opus_48000_128, opus_48000_192, pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_32000, pcm_44100, pcm_48000, ulaw_8000, wav_8000, wav_16000, wav_22050, wav_24000, wav_32000, wav_44100, wav_48000.

JSON Output

--json prints a single structured object to stdout after saving the file.

{
  "provider": "venice",
  "text": "hello from term-llm",
  "model": "tts-kokoro",
  "voice": "af_sky",
  "format": "mp3",
  "output": {
    "path": "/home/me/Music/term-llm/20260502-120000-hello_from_term-llm.mp3",
    "mime_type": "audio/mpeg",
    "bytes": 12345
  }
}

Credentials and Config

term-llm audio reads Venice credentials from VENICE_API_KEY, audio.venice.api_key, or the existing image.venice.api_key fallback.

Gemini credentials are read from GEMINI_API_KEY, audio.gemini.api_key, image.gemini.api_key, or the configured providers.gemini API key.

ElevenLabs credentials are read from ELEVENLABS_API_KEY, XI_API_KEY, or audio.elevenlabs.api_key.

audio:
  provider: venice
  output_dir: ~/Music/term-llm
  venice:
    api_key: $VENICE_API_KEY
    model: tts-kokoro
    voice: af_sky
    format: mp3
  gemini:
    api_key: $GEMINI_API_KEY
    model: gemini-3.1-flash-tts-preview
    voice: Kore
    format: wav
  elevenlabs:
    api_key: $ELEVENLABS_API_KEY
    model: eleven_multilingual_v2
    voice: JBFqnCBsd6RMkjVDRZzb
    format: mp3_44100_128