createPCMStreamPlayer does one thing: play streamed PCM16
gap-tolerantly, and expose exactly what is playing as a MediaStream you
can feed to lip sync. It is the bridge for realtime AI providers that hand you
raw audio chunks and do not play them (Gemini Live, OpenAI Realtime over
WebSocket) and for server TTS that returns audio only.
It knows nothing about any provider — provider transport parsing is your
glue and stays in your app.
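As an illustration of what that glue can look like, here is a minimal sketch that routes a Gemini Live-style message into the player. The `LiveServerMessage` shape and the `handleLiveMessage` helper are assumptions made for this example, not part of the SDK; check the field names against your provider's actual payloads. `PCMSink` is a structural subset of the player members documented under API.

```typescript
// Structural subset of the player used by this glue.
interface PCMSink {
  pushBase64PCM16(b64: string): void;
  stop(): void;
}

// ASSUMPTION: a Gemini Live-style server message. Field names are
// illustrative; verify them against your provider's real payloads.
interface LiveServerMessage {
  serverContent?: {
    interrupted?: boolean;
    modelTurn?: {
      parts?: Array<{ inlineData?: { mimeType?: string; data?: string } }>;
    };
  };
}

// Hypothetical helper: forwards audio parts to the player and drops
// queued audio when the provider reports an interruption (barge-in).
function handleLiveMessage(msg: LiveServerMessage, player: PCMSink): void {
  const content = msg.serverContent;
  if (!content) return;
  if (content.interrupted) {
    player.stop();
    return;
  }
  for (const part of content.modelTurn?.parts ?? []) {
    const data = part.inlineData?.data;
    if (data) player.pushBase64PCM16(data);
  }
}
```

The player never sees the provider message itself, only the base64 payload, which keeps the transport parsing in your app.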
API
options: PCMStreamPlayerOptions
| Option | Type | Notes |
|---|---|---|
| sampleRate | number | Required. The PCM sample rate (e.g. 24000 for Gemini Live and OpenAI Realtime). |
| initialBufferMs | number? | Pre-roll before playback starts (jitter cushion). |
| scheduleAheadMs | number? | How far ahead chunks are scheduled. |
| onIdle | (() => void)? | Called once each time the player drains naturally (queue empty + every scheduled buffer finished). Not called by stop()/close(); may re-fire after a later push. See Knowing when playback finished. |
PCMStreamPlayer:
| Member | Type | Purpose |
|---|---|---|
| pushBase64PCM16(b64) | (string) => void | Enqueue a base64 PCM16 chunk (Gemini inlineData.data, server TTS). |
| pushPCM16(bytes) | (Uint8Array) => void | Enqueue a PCM16 byte chunk (OpenAI Realtime WS ArrayBuffer). |
| outputStream | MediaStream (readonly) | The tap: feed this to useLipsyncStream. Runs parallel to the speakers. |
| isPlaying | boolean (readonly) | true while audio is queued or scheduled; flips false on natural drain (when onIdle fires) or stop(). |
| stop() | () => void | Drop queued audio immediately (barge-in / interruption). |
| resume() | () => void | Resume the underlying AudioContext (call inside a user gesture). |
| close() | () => Promise<void> | Tear down the player and its audio graph. |
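The two push methods differ only in the payload they accept. A minimal sketch of picking between them, assuming your transport delivers either a base64 string or raw bytes; `pushChunk` and `PCMInput` are hypothetical names for this example, not SDK exports.

```typescript
// Structural subset of the player: the two enqueue methods from the table.
interface PCMInput {
  pushBase64PCM16(b64: string): void;
  pushPCM16(bytes: Uint8Array): void;
}

// Hypothetical helper: picks the push method that matches the payload type.
function pushChunk(
  player: PCMInput,
  chunk: string | ArrayBuffer | Uint8Array,
): void {
  if (typeof chunk === "string") {
    // Base64 text, e.g. a JSON field carrying audio.
    player.pushBase64PCM16(chunk);
    return;
  }
  // Binary frame: wrap an ArrayBuffer in a Uint8Array view (no copy).
  const bytes = chunk instanceof Uint8Array ? chunk : new Uint8Array(chunk);
  player.pushPCM16(bytes);
}
```

For a binary WebSocket, setting `binaryType = "arraybuffer"` on the socket makes `event.data` arrive as an `ArrayBuffer` that can be passed straight to a helper like this.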
Pattern
Create the player inside the user gesture, before any await. An
AudioContext created in a post-fetch microtask starts suspended and cannot
resume without another gesture.
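A minimal sketch of that ordering, assuming the player factory is synchronous. `makeStartHandler`, `createPlayer`, and `connect` are hypothetical names: `createPlayer` stands in for a call such as `createPCMStreamPlayer({ sampleRate: 24000 })`, and `connect` for your own token fetch or socket setup.

```typescript
// Structural subset of the player; resume() is from the API table.
interface GesturePlayer {
  resume(): void;
}

// Hypothetical helper: builds a click handler that creates the player
// synchronously and only then starts asynchronous setup.
function makeStartHandler<P extends GesturePlayer>(
  createPlayer: () => P,
  connect: (player: P) => Promise<void>,
): () => Promise<P> {
  return async () => {
    // Everything before the first await still runs inside the gesture.
    const player = createPlayer();
    // Defensive: asks the AudioContext to run while the gesture is active.
    player.resume();
    // Network work comes after the player already exists.
    await connect(player);
    return player;
  };
}
```

Attach the returned function directly as the click listener, for example `button.addEventListener("click", handler)`. Wrapping it in another async step that awaits first would defeat the ordering.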
Knowing when playback finished
There is no audio element to listen to, so the player surfaces a natural end through the onIdle option (with an isPlaying getter for polling).
onIdle fires the moment the queue empties and every scheduled buffer
has finished — i.e. all pushed audio has actually been heard.
- It does not fire on stop() or close(): those are explicit interruptions, not a natural end.
- It can fire more than once per player: a push after a drain restarts playback and a later drain fires it again.
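Because onIdle is a callback that may fire repeatedly, it can be convenient to wrap it in something awaitable. The sketch below is one way to do that; `createIdleSignal` is a hypothetical helper for this example, not an SDK export.

```typescript
// Hypothetical helper: turns the onIdle callback into awaitable promises.
function createIdleSignal(): {
  onIdle: () => void;
  nextIdle: () => Promise<void>;
} {
  let waiters: Array<() => void> = [];
  return {
    // Pass this as the onIdle option when creating the player.
    onIdle: () => {
      const pending = waiters;
      waiters = []; // each drain resolves only the waiters registered before it
      for (const resolve of pending) resolve();
    },
    // Resolves on the next natural drain after this call.
    nextIdle: () =>
      new Promise<void>((resolve) => {
        waiters.push(() => resolve());
      }),
  };
}
```

Since stop() and close() do not fire onIdle, a promise from `nextIdle()` stays pending after an interruption. Code that awaits it should also handle the interrupted path, for example by racing it against its own cancellation signal.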
When NOT to use it
Decision rule: does the provider play the audio for you? Yes → tap its playback, no player. No (it hands you raw PCM) → createPCMStreamPlayer.
Server TTS
The same primitive powers “server returns audio only, the SDK does the lip sync”: your route synthesizes speech and returns base64 PCM16; the client plays it through the player and the tap drives the mouth. No server-side visemes, no SSE protocol. See Realtime providers.
Next
Realtime providers
The per-provider recipe.
Streaming & mic
Feeding the tap to lip sync.
Core client
The vanilla engine.