@ai_kit/core includes full-duplex realtime transcription: push audio chunks as they arrive (microphone, live stream) over a WebSocket and receive transcription deltas as they come. It is compatible with Mistral’s realtime API (Voxtral model).
Not to be confused with createTranscriptionStreamingModel (see Audio Transcription), which streams the output of a complete uploaded file. Here the input is pushed continuously, which is ideal for a microphone.
Why a native WebSocket client?
The Vercel AI SDK (ai) has no realtime transcription primitive: experimental_transcribe / transcribe and the TranscriptionModelV3 interface are batch only. So @ai_kit/core ships a small direct WebSocket client — with no extra runtime dependency (Node ≥ 22’s global WebSocket sends the Authorization: Bearer header via undici).
Two public primitives
| Export | Role |
|---|---|
| createRealtimeTranscription(config) | Generic, config-driven factory (Mistral-compatible by default, reusable for any compatible WebSocket endpoint) |
| mistralRealtimeTranscription(opts?) | Mistral-first shortcut: applies the model, base URL, and MISTRAL_API_KEY fallback |
Audio format
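Mistral expects raw PCM: pcm_s16le, 16000 Hz, mono. No conversion is bundled. To convert a file with ffmpeg (file names are placeholders): `ffmpeg -i input.wav -f s16le -acodec pcm_s16le -ar 16000 -ac 1 output.pcm`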
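For audio that is already in memory as floating-point samples (the form many capture APIs deliver), the conversion to pcm_s16le can also be done in code. The following is a minimal sketch, not part of @ai_kit/core: the function name floatToPcmS16le is hypothetical, and it assumes the samples are already mono at 16000 Hz, since it does not resample.

```typescript
// Hypothetical helper, not an @ai_kit/core export.
// Converts float samples in [-1, 1] to little-endian signed 16-bit PCM.
// Assumes mono input at 16000 Hz; no resampling or channel mixing is done.
function floatToPcmS16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2); // 2 bytes per sample
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range samples saturate instead of wrapping around.
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Asymmetric scaling: -1 maps to -32768 and +1 maps to 32767.
    const value =
      clamped < 0 ? Math.round(clamped * 32768) : Math.round(clamped * 32767);
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return out;
}
```

The resulting Uint8Array has the byte layout described above and can be passed on as an audio chunk.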
Quickstart — transcribeStream (high-level)
Best for transcribing a file or stream you can iterate. Pass an AsyncIterable<Uint8Array> of PCM and receive events until done.
transcribeStream opens the connection, pumps the audio in the background (then sends flush + end), and stops automatically after the done or error event.
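To feed transcribeStream from a buffer that is already loaded, the PCM has to be exposed as an AsyncIterable<Uint8Array>. The sketch below shows one way to do that by slicing the buffer into fixed-duration frames. The helper names (bytesForDuration, framePcm, pcmFrames) and the 100 ms default are assumptions made for illustration, and the call shown in the trailing comment is a guess at the call shape, not the documented signature.

```typescript
const SAMPLE_RATE = 16000;  // Hz, per the audio format described above
const BYTES_PER_SAMPLE = 2; // pcm_s16le, mono

// Number of PCM bytes that cover `ms` milliseconds of audio.
function bytesForDuration(ms: number): number {
  return Math.floor((SAMPLE_RATE * ms) / 1000) * BYTES_PER_SAMPLE;
}

// Split a PCM buffer into frames of `frameMs` each; the last frame may be shorter.
function framePcm(pcm: Uint8Array, frameMs = 100): Uint8Array[] {
  const size = bytesForDuration(frameMs);
  if (size <= 0) throw new RangeError("frameMs is too small");
  const frames: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += size) {
    // subarray creates a view, so no audio data is copied.
    frames.push(pcm.subarray(offset, offset + size));
  }
  return frames;
}

// Expose the frames as an AsyncIterable<Uint8Array>.
async function* pcmFrames(
  pcm: Uint8Array,
  frameMs = 100,
): AsyncGenerator<Uint8Array> {
  for (const frame of framePcm(pcm, frameMs)) yield frame;
}

// Hypothetical usage (call shape assumed, check the actual API):
// for await (const event of model.transcribeStream(pcmFrames(pcm))) { ... }
```

At 16000 Hz mono s16le, 100 ms of audio is 3200 bytes, so each full frame has that size.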
Microphone / pushed source — connect (low-level)
When audio arrives via callbacks (microphone, incoming WebSocket), open a session and push chunks yourself.
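The push pattern can be sketched as wiring a callback-based source to a session. The interfaces below are stand-ins written for this example: RealtimeSessionLike mirrors only three of the session methods documented on this page (with return types assumed to be void), and PushSource is a hypothetical microphone wrapper, not a real API.

```typescript
// Assumed subset of the session API; the real return types may differ.
interface RealtimeSessionLike {
  sendAudio(chunk: Uint8Array): void;
  flush(): void;
  end(): void;
}

// Hypothetical callback-based audio source (e.g. a microphone wrapper).
interface PushSource {
  onChunk(listener: (chunk: Uint8Array) => void): void;
  onStop(listener: () => void): void;
}

// Forward every chunk as it arrives; when the source stops,
// flush pending transcription and signal the end of the audio stream.
function wireSource(source: PushSource, session: RealtimeSessionLike): void {
  source.onChunk((chunk) => session.sendAudio(chunk));
  source.onStop(() => {
    session.flush();
    session.end();
  });
}
```

Events would be consumed separately by iterating the session, so sending and receiving proceed independently.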
Session methods
| Method | Role |
|---|---|
| sendAudio(chunk) | Base64-encodes and sends PCM (auto-splits chunks > 262144 bytes) |
| flush() | Asks the provider to flush its buffer and emit pending transcription |
| end() | Signals the end of the audio stream |
| close(code?, reason?) | Closes the WebSocket and ends the event stream |
| events() | Async iterator over normalized events (same as for await ... of session) |
Normalized events
Unrecognized provider messages are surfaced as { type: "unknown", raw } (never thrown) for forward compatibility.
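One way to consume normalized events is a small reducer over a discriminated union. In the sketch below, only the "done", "error" (with its error field) and "unknown" (with its raw field) shapes come from this page; the "delta" variant, its text field, and the names RealtimeEventSketch, TranscriptState and applyEvent are assumptions made for illustration.

```typescript
// Sketch of the event union; the "delta" variant is an assumed shape.
type RealtimeEventSketch =
  | { type: "delta"; text: string }
  | { type: "done" }
  | { type: "error"; error: unknown }
  | { type: "unknown"; raw: unknown };

interface TranscriptState {
  text: string;
  finished: boolean;
  error?: unknown;
}

// Pure reducer: returns the next state for one incoming event.
function applyEvent(
  state: TranscriptState,
  event: RealtimeEventSketch,
): TranscriptState {
  switch (event.type) {
    case "delta":
      return { ...state, text: state.text + event.text };
    case "done":
      return { ...state, finished: true };
    case "error":
      // Treated as terminal here, mirroring transcribeStream's behavior;
      // in low-level mode the caller could choose to continue.
      return { ...state, finished: true, error: event.error };
    case "unknown":
      // Forward compatibility: ignore events this code does not know.
      return state;
  }
}
```

Keeping the "unknown" branch as a no-op means new provider event types do not break existing consumers.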
Configuration
Connection options
| Option | Role |
|---|---|
| audioFormat | { encoding, sampleRate } sent via session.update before audio |
| targetStreamingDelayMs | Latency/accuracy tuning (e.g. 240 for responsiveness, 2400 for accuracy) |
| timeoutMs | Handshake timeout (default 30000) |
| signal | AbortSignal to interrupt the connection |
| headers | Additional headers on the upgrade request |
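As an illustration of how these options combine, the sketch below builds an options object for two latency profiles. The ConnectOptionsSketch type is a local stand-in inferred from the table, not the library's exported type, and optionsFor is a hypothetical helper. The signal option is left out to keep the example free of runtime globals.

```typescript
// Local stand-in for the connection options, inferred from the table above.
interface ConnectOptionsSketch {
  audioFormat: { encoding: string; sampleRate: number };
  targetStreamingDelayMs: number;
  timeoutMs: number;
  headers?: Record<string, string>;
}

// Hypothetical helper: pick a delay from the two example values in the table.
function optionsFor(profile: "responsive" | "accurate"): ConnectOptionsSketch {
  return {
    audioFormat: { encoding: "pcm_s16le", sampleRate: 16000 },
    // Lower delay favors responsiveness, higher delay favors accuracy.
    targetStreamingDelayMs: profile === "responsive" ? 240 : 2400,
    timeoutMs: 30000, // same as the documented default handshake timeout
  };
}
```

A live-captioning UI would likely prefer the "responsive" profile, while offline-style processing of a stream can afford the longer delay.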
Error handling
- Connection failure, handshake timeout, or abort → throws a RealtimeTranscriptionError.
- A server error event is surfaced as { type: "error", error }; transcribeStream stops after emitting it (in low-level mode, the caller decides).