createPCMStreamPlayer does one thing: it plays streamed PCM16 audio while tolerating gaps between chunks, and it exposes exactly what is playing as a MediaStream you can feed to lip sync. It is the bridge for realtime AI providers that hand you raw audio chunks without playing them (Gemini Live, OpenAI Realtime over WebSocket) and for server TTS that returns audio only. It knows nothing about any provider: parsing each provider's transport is glue code that stays in your app.
import { createPCMStreamPlayer } from "@mascotbot/core";

const player = createPCMStreamPlayer({ sampleRate: 24000 });

API

const player = createPCMStreamPlayer(options);
options: PCMStreamPlayerOptions
| Option | Type | Notes |
| --- | --- | --- |
| sampleRate | number | Required. The PCM sample rate (e.g. 24000 for Gemini Live and OpenAI Realtime). |
| initialBufferMs | number? | Pre-roll before playback starts (jitter cushion). |
| scheduleAheadMs | number? | How far ahead chunks are scheduled. |
| onIdle | (() => void)? | Called once each time the player drains naturally (queue empty + every scheduled buffer finished). Not called by stop()/close(); may re-fire after a later push. See Knowing when playback finished. |

Returned PCMStreamPlayer:
| Member | Type | Purpose |
| --- | --- | --- |
| pushBase64PCM16(b64) | (string) => void | Enqueue a base64 PCM16 chunk (Gemini inlineData.data, server TTS). |
| pushPCM16(bytes) | (Uint8Array) => void | Enqueue a PCM16 byte chunk (OpenAI Realtime WS ArrayBuffer). |
| outputStream | MediaStream (readonly) | The tap: feed this to useLipsyncStream. Runs parallel to the speakers. |
| isPlaying | boolean (readonly) | true while audio is queued or scheduled; flips false on natural drain (when onIdle fires) or stop(). |
| stop() | () => void | Drop queued audio immediately (barge-in / interruption). |
| resume() | () => void | Resume the underlying AudioContext (call inside a user gesture). |
| close() | () => Promise<void> | Tear down the player and its audio graph. |
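
Both push methods take PCM16, that is, signed 16-bit samples at two bytes each. The sketch below works through that arithmetic, which can help when picking a value for initialBufferMs (it is not a way to detect the end of playback; onIdle exists for that). It assumes mono, little-endian audio, which this page does not state, and the helper names `pcm16DurationMs` and `pcm16ToFloat32` are illustrative rather than part of @mascotbot/core.

```typescript
// Illustrative helpers, NOT part of @mascotbot/core.
// Assumption: mono, little-endian PCM16 (2 bytes per sample).

const BYTES_PER_SAMPLE = 2;

/** Duration of a PCM16 chunk in milliseconds at the given sample rate. */
function pcm16DurationMs(byteLength: number, sampleRate: number): number {
  return (byteLength * 1000) / (BYTES_PER_SAMPLE * sampleRate);
}

/** Convert PCM16 bytes to Float32 samples in [-1, 1), the range Web Audio buffers use. */
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Respect byteOffset so views into a larger ArrayBuffer decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / BYTES_PER_SAMPLE);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` selects little-endian byte order.
    out[i] = view.getInt16(i * BYTES_PER_SAMPLE, true) / 32768;
  }
  return out;
}
```

At 24000 Hz, a 4800-byte chunk holds 2400 samples, which is 100 ms of audio.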

Pattern

Create the player inside the user gesture, before any await: an AudioContext created after an await (for example, once a fetch resolves) starts suspended and cannot resume without another gesture.
import { createPCMStreamPlayer } from "@mascotbot/core";
import { useLipsyncStream } from "@mascotbot/react/rive";

const player = createPCMStreamPlayer({ sampleRate: 24000 });

// The tap drives the avatar; the player drives the speakers.
// `client`, `playback`, and `session` are defined elsewhere in your app (not shown).
useLipsyncStream({
  client,
  playback,
  source: { kind: "mediaStream", stream: player.outputStream },
});

// Gemini Live (@google/genai): assistant audio is base64 PCM16
session.onmessage = (m) => {
  const b64 = m?.serverContent?.modelTurn?.parts?.[0]?.inlineData?.data;
  if (typeof b64 === "string") player.pushBase64PCM16(b64);
  if (m?.serverContent?.interrupted) player.stop();
};

// OpenAI Realtime (WebSocket): assistant audio is a PCM16 ArrayBuffer
session.on("audio", (e) => player.pushPCM16(new Uint8Array(e.data)));
session.on("audio_interrupted", () => player.stop());
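
The ordering rule at the top of this section (create the player first, await second) can be sketched without a browser by injecting both dependencies. Everything named here is hypothetical: `startCall` is illustrative glue, `createPlayer` would wrap createPCMStreamPlayer in a real app, and `connect` would open your provider session.

```typescript
// Hypothetical glue: shows only the ordering, with dependencies injected.
interface PlayerLike {
  pushBase64PCM16(b64: string): void;
}

async function startCall<S>(
  createPlayer: () => PlayerLike,
  connect: () => Promise<S>,
): Promise<{ player: PlayerLike; session: S }> {
  // 1. Runs synchronously, so it is still inside the click/tap handler.
  const player = createPlayer();
  // 2. Only now wait on the network. Code after this await may no longer
  //    count as running inside the user gesture.
  const session = await connect();
  return { player, session };
}

// Intended wiring (not executed here; `button` and `connectToProvider`
// are placeholders for your own code):
//   button.addEventListener("click", () => {
//     void startCall(
//       () => createPCMStreamPlayer({ sampleRate: 24000 }),
//       connectToProvider,
//     );
//   });
```

Passing the player factory in, rather than calling it after the connection resolves, keeps the creation step on the synchronous part of the handler.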

Knowing when playback finished

There is no audio element to listen to, so the player signals a natural end through the onIdle option (an isPlaying getter is also available for polling). onIdle fires the moment the queue empties and every scheduled buffer has finished, i.e. once all pushed audio has actually been heard.
  • It does not fire on stop() or close() — those are explicit interruptions, not a natural end.
  • It can fire more than once per player: a push after a drain restarts playback and a later drain fires it again.
This is the signal to drive a queue / sequential consumer (streamed-TTS playlists, multi-utterance agents): advance to the next item, reset the avatar, or release resources — without estimating audio duration from byte counts.
const player = createPCMStreamPlayer({
  sampleRate: 24000,
  onIdle: () => {
    // All queued audio has played out. If more is still being fetched,
    // ignore — the next push restarts the player and onIdle fires
    // again when that drains.
    if (queue.length > 0 || fetching) return;
    playback.reset();    // return the avatar to neutral
    void player.close(); // release the AudioContext (or keep for reuse)
  },
});
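
One way to structure the sequential consumer described above is a small queue whose `handleIdle` method is passed as the onIdle option. This is a sketch under assumptions, not the SDK's own queueing: `UtteranceQueue`, `Pcm16Sink`, and the model of one base64 string per utterance are all hypothetical, and the sink stands in for the real player.

```typescript
// Hypothetical sketch; `Pcm16Sink` stands in for the real player.
interface Pcm16Sink {
  pushBase64PCM16(b64: string): void;
}

class UtteranceQueue {
  private pending: string[] = [];
  private speaking = false;
  private sink: Pcm16Sink;
  private onAllDone: () => void;

  constructor(sink: Pcm16Sink, onAllDone: () => void) {
    this.sink = sink;
    this.onAllDone = onAllDone;
  }

  /** Queue one utterance (base64 PCM16) and start it if nothing is playing. */
  enqueue(b64: string): void {
    this.pending.push(b64);
    if (!this.speaking) this.playNext();
  }

  /** Wire this to the player's onIdle option. */
  handleIdle = (): void => {
    this.speaking = false;
    if (this.pending.length > 0) this.playNext();
    else this.onAllDone(); // e.g. reset the avatar or close the player
  };

  private playNext(): void {
    const next = this.pending.shift();
    if (next === undefined) return;
    this.speaking = true;
    this.sink.pushBase64PCM16(next);
  }
}

// Intended wiring (not executed here). The player needs onIdle at creation
// and the queue needs the player, so the callback closes over `queue`:
//   let queue: UtteranceQueue;
//   const player = createPCMStreamPlayer({
//     sampleRate: 24000,
//     onIdle: () => queue.handleIdle(),
//   });
//   queue = new UtteranceQueue(player, () => playback.reset());
```

Because the queue only advances from `handleIdle`, each utterance starts after the previous one has fully played out, with no duration estimates involved.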

When NOT to use it

Never route a self-playing provider through createPCMStreamPlayer — you would hear the voice twice (double audio). ElevenLabs Conversational AI and OpenAI Realtime over WebRTC play the audio themselves. For those, do not use the player: tap their existing playback with the SDK’s cross-browser createElementTap() and feed that to useLipsyncStream({ source: { kind: "mediaStream", stream } }).
Decision rule: does the provider play the audio for you? Yes → tap its playback, no player. No (it hands you raw PCM) → createPCMStreamPlayer.
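
The decision rule can be written down as a lookup. The table below covers only the four providers named on this page; the string keys, the `LipSyncWiring` type, and `chooseWiring` are illustrative names, not SDK identifiers.

```typescript
// Illustrative only: encodes the decision rule for the providers named on this page.
type LipSyncWiring = "tap-existing-playback" | "pcm-stream-player";

// true  = the provider plays the audio itself (tap it, no player).
// false = the provider hands you raw PCM (use createPCMStreamPlayer).
const PROVIDER_PLAYS_AUDIO = {
  "gemini-live": false,
  "openai-realtime-websocket": false,
  "openai-realtime-webrtc": true,
  "elevenlabs-conversational-ai": true,
} as const;

type Provider = keyof typeof PROVIDER_PLAYS_AUDIO;

function chooseWiring(provider: Provider): LipSyncWiring {
  return PROVIDER_PLAYS_AUDIO[provider]
    ? "tap-existing-playback"
    : "pcm-stream-player";
}
```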

Server TTS

The same primitive powers “server returns audio only, the SDK does the lip sync”: your route synthesizes speech and returns base64 PCM16; the client plays it through the player and the tap drives the mouth. No server-side visemes, no SSE protocol. See Realtime providers.
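
On the client, that flow reduces to "fetch, then push". A minimal sketch, assuming a route that returns JSON with a single base64 field: the `TtsResponse` shape, the `audioBase64` field name, and the `speak` helper are hypothetical, and `synthesize` is injected so the example does not depend on any particular endpoint.

```typescript
// Hypothetical client glue for a server TTS route.
interface Pcm16Sink {
  pushBase64PCM16(b64: string): void;
}

// Assumed response shape; your route may name this field differently.
interface TtsResponse {
  audioBase64: string;
}

async function speak(
  text: string,
  synthesize: (text: string) => Promise<TtsResponse>,
  player: Pcm16Sink,
): Promise<void> {
  const { audioBase64 } = await synthesize(text);
  // Skip empty payloads so nothing is enqueued for silent responses.
  if (audioBase64.length > 0) player.pushBase64PCM16(audioBase64);
}
```

The player passed in should already exist when `speak` is called, in line with the rule of creating it inside the user gesture before any await.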

Next

  • Realtime providers: the per-provider recipe.
  • Streaming & mic: feeding the tap to lip sync.
  • Core client: the vanilla engine.