Providers and models

On this page

Discover providers and models

term-llm providers
term-llm providers --configured
term-llm providers anthropic

term-llm models --provider anthropic
term-llm models --provider openrouter
term-llm models --provider ollama
term-llm models --json

Use providers when you want to know what is available and how it is configured. Use models when you want the concrete model names a provider currently exposes.

Provider categories

term-llm supports a mix of provider types:

hosted API providers such as Anthropic, AWS Bedrock, OpenAI, xAI, Gemini, and OpenRouter
subscription-backed OAuth providers such as ChatGPT, Copilot, and Gemini CLI
local or self-hosted OpenAI-compatible providers such as Ollama, LM Studio, vLLM, or custom endpoints

Credentials

Most providers use API keys via environment variables. Some use OAuth credentials from companion CLIs or locally stored auth files.

Provider	Credentials source	Notes
`anthropic`	`ANTHROPIC_API_KEY`	API key
`bedrock`	AWS credential chain or explicit `access_key_id` / `secret_access_key`	Anthropic Claude via AWS Bedrock
`openai`	`OPENAI_API_KEY`	Standard OpenAI API key
`chatgpt`	`~/.config/term-llm/chatgpt_creds.json`	ChatGPT Plus/Pro OAuth
`copilot`	`~/.config/term-llm/copilot_creds.json`	GitHub Copilot OAuth
`gemini`	`GEMINI_API_KEY`	Google AI Studio key
`gemini-cli`	`~/.gemini/oauth_creds.json`	gemini-cli OAuth
`xai`	`XAI_API_KEY`	xAI API key
`openrouter`	`OPENROUTER_API_KEY`	OpenRouter API key
`zen`	`ZEN_API_KEY` optional	empty is valid for free tier

Examples:

term-llm ask --provider anthropic "question"
term-llm ask --provider chatgpt "question"
term-llm ask --provider copilot "question"
term-llm ask --provider gemini-cli "question"

WebSocket defaults

The built-in openai and chatgpt text providers use the Responses WebSocket transport by default. This improves latency in agentic/tool-heavy runs by reusing one connection and continuing compatible turns with previous_response_id plus only new input. If setup fails before streaming starts, term-llm falls back to HTTP/SSE; if a WebSocket continuation rejects the previous response ID, it retries once with full input.

To force HTTP/SSE for either built-in provider:

providers:
  openai:
    use_websocket: false
  chatgpt:
    use_websocket: false

OpenAI-compatible providers remain HTTP/SSE by default. WebSocket defaults are not applied to type: openai_compatible entries.

OpenAI-compatible providers

For local or custom backends, use type: openai_compatible.

providers:
  ollama:
    type: openai_compatible
    base_url: http://localhost:11434/v1
    model: llama3.2:latest

  lmstudio:
    type: openai_compatible
    base_url: http://localhost:1234/v1
    model: deepseek-coder-v2

  cerebras:
    type: openai_compatible
    base_url: https://api.cerebras.ai/v1
    model: llama-4-scout-17b
    api_key: ${CEREBRAS_API_KEY}

Use base_url when the standard /chat/completions path should be appended automatically. Use url when you need to specify the full chat completions endpoint directly.

Configuration reference

Field	Type	Description
`type`	string	Must be `openai_compatible` for custom providers. Inferred automatically for known names like `ollama`, `cerebras`, `groq`.
`base_url`	string	Base URL (e.g., `http://localhost:11434/v1`). `/chat/completions` is appended automatically.
`url`	string	Full chat completions URL, used as-is. Use this when your endpoint path differs from the standard. Supports `srv://` for DNS SRV discovery and `$()` for command-based resolution.
`api_key`	string	API key. Supports `${ENV_VAR}`, `op://`, `file://`, and `$()` resolution. If omitted, term-llm tries `<PROVIDER_NAME>_API_KEY` from the environment.
`model`	string	Default model name sent to the server.
`models`	list	Optional list of model names for shell tab completion with `--provider name:<TAB>`.
`fast_model`	string	Lightweight model used for control-plane tasks (e.g., title generation).
`fast_provider`	string	Provider key to use for `fast_model` if it lives on a different provider.
`context_window`	int	Override context window size in tokens. Use this for self-hosted models not in the built-in token limit tables.
`max_output_tokens`	int	Override maximum output tokens. Same use case as `context_window`.
`no_stream_options`	bool	When `true`, don’t send `stream_options` in the request. Use this for servers that reject the field. Default `false`; most OpenAI-compatible servers (vLLM, Ollama, LM Studio) support it and need it to report token usage.
`use_websocket`	bool	Reserved for providers with native Responses WebSocket support. Defaults to `true` only for built-in `openai` and `chatgpt`; OpenAI-compatible providers default to HTTP/SSE.

Full example

providers:
  my-vllm:
    type: openai_compatible
    base_url: http://gpu-server:8000/v1
    model: Qwen/Qwen3-30B-A3B
    api_key: ${VLLM_API_KEY}
    context_window: 32768
    max_output_tokens: 8192
    models:
      - Qwen/Qwen3-30B-A3B
      - Qwen/Qwen3-8B

  legacy-server:
    type: openai_compatible
    url: http://old-server:5000/api/chat
    model: custom-finetune
    no_stream_options: true  # this server rejects stream_options

Reasoning and model suffixes

OpenAI reasoning effort

For OpenAI models, append -low, -medium, -high, or -xhigh to control reasoning effort.

term-llm ask --provider openai:gpt-5.2-xhigh "complex question"
term-llm exec --provider openai:gpt-5.2-low "quick task"

providers:
  openai:
    model: gpt-5.2-high

Effort	Meaning
`low`	faster, cheaper, less thorough
`medium`	balanced default
`high`	more thorough reasoning
`xhigh`	maximum reasoning on supported models

Anthropic extended thinking

For Anthropic models, append -thinking:

term-llm ask --provider anthropic:claude-sonnet-4-6-thinking "complex question"

providers:
  anthropic:
    model: claude-sonnet-4-6-thinking

AWS Bedrock

The bedrock provider routes Anthropic Claude models through AWS Bedrock. It supports the same model suffixes (-thinking, -1m) and has full feature parity with the direct anthropic provider.

Authentication uses the standard AWS credential chain (AWS_ACCESS_KEY_ID env var, ~/.aws/credentials, instance profiles), or explicit credentials in config:

providers:
  bedrock:
    region: us-west-2
    access_key_id: $(op-cache read "op://Private/AWS Bedrock/AWS_ACCESS_KEY_ID")
    secret_access_key: $(op-cache read "op://Private/AWS Bedrock/AWS_SECRET_ACCESS_KEY")
    model: claude-sonnet-4-6-thinking

Model resolution uses a 3-tier system. Friendly model names like claude-sonnet-4-6 are automatically translated to Bedrock cross-region IDs. Use model_map to override with application inference profile ARNs or specific Bedrock IDs:

providers:
  bedrock:
    region: us-west-2
    model: claude-sonnet-4-6-thinking
    model_map:
      claude-sonnet-4-6: arn:aws:bedrock:us-west-2:123456789:application-inference-profile/abc123
      claude-opus-4-6: us.anthropic.claude-opus-4-6-v1

Suffixes are stripped before lookup, so claude-sonnet-4-6-1m-thinking strips to claude-sonnet-4-6, resolves through model_map, then re-applies thinking and 1M context.

The geographic prefix (us., eu., ap.) is derived from the configured region automatically. For example, eu-west-1 produces eu.anthropic.* IDs, ap-southeast-1 produces ap.anthropic.*, etc. This ensures data residency matches your region without manual override.

Raw Bedrock model IDs (us.anthropic.claude-sonnet-4-6, anthropic.claude-sonnet-4-6) and full ARNs are passed through without translation.

Config field	Description
`region`	AWS region. Falls back to `AWS_REGION` env var, then `us-east-1`.
`profile`	AWS profile name from `~/.aws/credentials`.
`access_key_id`	Explicit AWS access key. Supports `$()`, `op://`, `${ENV}`.
`secret_access_key`	Explicit AWS secret key. Same resolution support.
`session_token`	Optional session token for temporary credentials.
`model_map`	Map of friendly names to Bedrock model IDs or ARNs.

Native search support

Some providers support native web search. Others rely on external search tooling.

Native support is most relevant for:

Anthropic
Bedrock
OpenAI
xAI
Gemini

You can override behavior with:

term-llm ask "latest news" -s --native-search
term-llm ask "latest news" -s --no-native-search

Or in config:

search:
  force_external: true

providers:
  gemini:
    use_native_search: false

See Search for the full routing model.

Recommendations by use case

fast free experimentation: zen
OpenAI ecosystem / Codex editing: openai
Claude models: anthropic
Claude models via AWS billing: bedrock
broad model access: openrouter
local inference: ollama or another OpenAI-compatible endpoint
subscription-backed consumer access: chatgpt, copilot, or gemini-cli