Text embeddings · term-llm docs

On this page

Use term-llm embed when you need vectors for retrieval, ranking, clustering, semantic similarity, or local pipeline glue.

term-llm embed "What is the meaning of life?"

Embeddings are numerical representations of text (arrays of floats) that capture semantic meaning. The embed command takes text input, calls an embedding API, and outputs vectors.

Embed Input Methods

# Positional arguments (each embedded separately)
term-llm embed "first text" "second text" "third text"

# From stdin
echo "Hello world" | term-llm embed

# From files
term-llm embed -f document.txt
term-llm embed -f doc1.txt -f doc2.txt

# Mixed
term-llm embed "query text" -f corpus.txt

Embed Flags

Flag	Short	Description
`--provider`	`-p`	Override provider (gemini, openai, jina, voyage, ollama) with optional model
`--file`	`-f`	Input file(s) to embed (repeatable)
`--format`		Output format: `json` (default), `array`, `plain`
`--output`	`-o`	Write output to file
`--dimensions`		Custom output dimensions (Matryoshka truncation)
`--task-type`		Task type hint (e.g., RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY)
`--similarity`		Compare texts by cosine similarity instead of outputting vectors

Embed Output Formats

# JSON with metadata (default)
term-llm embed "hello"
# → {"model": "gemini-embedding-001", "dimensions": 3072, "embeddings": [...]}

# Bare JSON array(s) — one per input, for piping
term-llm embed "hello" --format array
# → [0.0023, -0.0094, 0.0156, ...]

# One number per line (single input only)
term-llm embed "hello" --format plain
# → 0.0023
#   -0.0094
#   ...

# Save to file
term-llm embed "hello" -o embeddings.json

Similarity Mode

Compare texts by cosine similarity without manually handling vectors:

# Pairwise comparison
term-llm embed --similarity "king" "queen"
# → 0.834521

# Rank multiple texts against a query (first argument)
term-llm embed --similarity "What is AI?" "Machine learning is a subset of AI" "The weather is nice" "Neural networks process data"
# → 1. 0.891234  Machine learning is a subset of AI
#   2. 0.812456  Neural networks process data
#   3. 0.234567  The weather is nice

Embed Examples

# Provider/model selection
term-llm embed "hello" -p openai                          # use OpenAI
term-llm embed "hello" -p openai:text-embedding-3-large   # specific model
term-llm embed "hello" -p gemini                           # use Gemini
term-llm embed "hello" -p jina                             # use Jina (free tier)
term-llm embed "hello" -p voyage                           # use Voyage AI
term-llm embed "hello" -p ollama:nomic-embed-text          # local Ollama

# Custom dimensions (Matryoshka)
term-llm embed "hello" --dimensions 256

# Gemini task type hints
term-llm embed "search query" --task-type RETRIEVAL_QUERY -p gemini
term-llm embed -f doc.txt --task-type RETRIEVAL_DOCUMENT -p gemini

Embedding Providers

Provider	Default Model	Dimensions	Environment Variable	Free Tier
Gemini (default)	`gemini-embedding-001`	3072 (128–3072)	`GEMINI_API_KEY`	Yes
OpenAI	`text-embedding-3-small`	1536 (customizable)	`OPENAI_API_KEY`	No
Jina AI	`jina-embeddings-v3`	1024 (customizable)	`JINA_API_KEY`	Yes (10M tokens)
Voyage AI	`voyage-3.5`	1024 (256–2048)	`VOYAGE_API_KEY`	No
Ollama	`nomic-embed-text`	768	—	Local

Embedding providers use their own credentials, separate from text and image providers. The default provider is auto-detected from your LLM provider (Gemini users → Gemini, OpenAI users → OpenAI, Anthropic users → Voyage if configured, otherwise Gemini).

Jina AI is a great choice for getting started — sign up at jina.ai/embeddings for a free API key with 10M tokens, no credit card required.

Voyage AI is Anthropic’s recommended embedding partner (acquired by MongoDB, Feb 2025). The API remains fully available.

When to use it

Typical uses include:

embedding a query and a corpus for retrieval
comparing candidate text by cosine similarity
generating vectors for a local RAG index
piping embeddings into your own scripts or downstream tools