The problem
When term-llm runs on a home machine behind NAT and the browser connects through a public relay (VPS, reverse proxy, Cloudflare Tunnel), every API request pays the relay’s round-trip penalty. For a relay in a different region that can be 200-500 ms per request, which adds up during streaming conversations.
How it works
WebRTC direct routing adds a peer-to-peer data channel between the browser and the home machine. After a brief ICE negotiation through a signaling server, all subsequent /v1/ API traffic flows directly over UDP, bypassing the relay entirely.
Browser --(HTTPS signaling)--> signaling server (public)
| ^
| | poll
+----(WebRTC data channel, P2P)------> term-llm (home, behind NAT)
The normal HTTPS path is unchanged. WebRTC is purely additive: if ICE negotiation fails (corporate firewall, symmetric NAT, etc.), the browser silently falls back to HTTPS within 8 seconds. No user action required.
Prerequisites
You need an external signaling server accessible from both the browser and the home machine. The signaling server handles the one-time SDP exchange that bootstraps the WebRTC connection. It does not relay any API traffic.
term-llm does not include a signaling server. You need to run one yourself or use an existing service.
Enable it
term-llm serve web \
--webrtc \
--webrtc-signaling-url https://signal.example.com/webrtc \
--webrtc-token "$SIGNAL_TOKEN"
The browser UI shows a lightning bolt indicator when the data channel is active.
Flags
| Flag | Default | Description |
|---|---|---|
--webrtc |
false |
Enable WebRTC direct routing |
--webrtc-signaling-url |
(none) | Signaling server base URL (must be HTTPS) |
--webrtc-token |
(none) | Bearer token for authenticating with the signaling server |
--webrtc-stun |
stun:stun.l.google.com:19302 |
STUN server URL(s), repeatable |
--webrtc-max-conns |
10 |
Maximum concurrent WebRTC connections |
Configuration file
You can set the flags in config.yaml so you don’t have to pass them every time:
serve:
webrtc:
enabled: true
signaling_url: https://signal.example.com/webrtc
token: your-signaling-token
stun_urls:
- stun:stun.l.google.com:19302
max_conns: 10
Then just run:
term-llm serve web --webrtc
Security
WebRTC direct routing does not weaken the existing auth model:
- Auth tokens travel inside the data channel. Every request frame includes the same
Authorization: Bearerheader that HTTPS requests carry. The existing auth middleware validates it. - DTLS encrypts the data channel. The ICE connection is upgraded through DTLS with certificate fingerprint verification against the SDP offer, preventing man-in-the-middle attacks even if the signaling server is compromised.
- Path validation. Only paths under
{basePath}/v1/are dispatched. Requests for static assets, admin endpoints, or path-traversal attempts are rejected before reaching the HTTP handler. - Body size limit. Request bodies larger than 10 MB are rejected.
- Connection cap. The
--webrtc-max-connsflag limits concurrent WebRTC connections to prevent resource exhaustion. - HTTPS-only signaling. term-llm refuses to start if the signaling URL is not HTTPS.
How the signaling protocol works
The signaling server needs to implement two endpoints:
POST /session: creates a session and returns{ "session_id": "..." }along with optional STUN/TURN credentials.POST /signalandGET /signal: exchange SDP offers and answers keyed by session ID.
The browser creates a WebRTC offer, posts it via the signaling server. term-llm polls for offers, generates an answer, and posts it back. Once both sides have each other’s SDP (which includes ICE candidates), they establish a direct UDP connection. After that, the signaling server is no longer involved.
Debugging
Diagnostics mode
Add ?webrtc_diag=1 to the URL to enable detailed console logging:
https://your-host/chat?webrtc_diag=1
This logs the full connection lifecycle with timestamps:
[webrtc] +0ms init signaling=https://signal.example.com/webrtc
[webrtc] +156ms session created id=abc123 turn=turn:relay.example.com:3478 (155ms)
[webrtc] +173ms ICE candidate: srflx udp 203.0.113.5:8171 raddr=0.0.0.0:0
[webrtc] +536ms ICE candidate: relay udp 198.51.100.1:49920 raddr=203.0.113.5:8171
[webrtc] +4166ms ICE gathering complete
[webrtc] +4333ms offer sent (166ms)
[webrtc] +6217ms answer received
[webrtc] +6218ms ICE state=checking
[webrtc] +6224ms ICE state=connected
[webrtc] +6243ms data channel open; fetch patched
[webrtc] +9904ms → GET /chat/v1/sessions/abc123/state (0b)
[webrtc] +9922ms ← 200 GET /chat/v1/sessions/abc123/state (512b, 17ms)
Key things to look for:
- ICE candidates:
hostmeans a local interface,srflxis a STUN-discovered public address (direct),relayis a TURN relay. Direct connections use host or srflx candidates. - Signaling timing: the
(Nms)after session creation and offer sent shows how long the signaling server took. High values suggest network latency to the signaling server. - Per-request latency: each
←line shows status, response size, and round-trip time. Compare against the HTTPS path to confirm WebRTC is faster.
You can also set window.__WEBRTC_DIAGNOSTICS__ = true in the browser console before page load for the same effect.
Force TURN relay
Add ?webrtc_turn=1 to force all traffic through the TURN relay, ignoring direct candidates:
https://your-host/chat?webrtc_diag=1&webrtc_turn=1
This sets iceTransportPolicy=relay on the RTCPeerConnection. Only relay (TURN) candidates are used. Useful for:
- Measuring TURN latency vs direct connectivity
- Verifying your TURN server is working
- Testing the fallback path without leaving your LAN
With diagnostics enabled, you’ll see only relay candidates gathered, with no host or srflx.
Server-side logging
term-llm logs WebRTC events to stderr with the webrtc: prefix:
webrtc: ICE connection established for session abc123
webrtc: DTLS handshake complete for session abc123
webrtc: SCTP association established for session abc123
webrtc: data channel open for session abc123
If connections fail silently, check these logs for DTLS handshake errors or SCTP association failures.
Troubleshooting
No lightning bolt in the UI: WebRTC negotiation failed silently. Check:
- The signaling URL is reachable from both the browser and the home machine
- The signaling token is correct
- STUN servers are reachable (corporate firewalls sometimes block UDP)
- Both sides can reach each other via UDP (symmetric NAT on both sides will prevent direct connections without a TURN relay)
- Enable
?webrtc_diag=1to see where the process stalls
Works locally but not remotely: you likely need a TURN relay for NAT traversal. Add a TURN server URL via --webrtc-stun:
term-llm serve web --webrtc \
--webrtc-signaling-url https://signal.example.com/webrtc \
--webrtc-stun "turn:turn.example.com:3478?transport=udp"
High latency despite direct connection: enable ?webrtc_diag=1 and check:
- The ICE candidate pair should use
hostorsrflx, notrelay - If you see only
relaycandidates, your STUN server may be unreachable - Use
?webrtc_turn=1to confirm TURN adds measurable latency, then compare without it
ICE gathering takes 40+ seconds: likely IPv6 STUN timeouts or TCP TURN candidates. Ensure your TURN URL includes ?transport=udp to prevent TCP fallback attempts. The browser caps gathering at 4 seconds to mitigate this, but broken STUN servers can still delay the process.
Data channel opens then immediately closes: check server logs for panics. Common causes include missing SRTP protection profiles in the DTLS config (Chrome always negotiates use_srtp even for data channels) or nil pointer dereferences in pion library initialization.