createStreamingSession() is the low-level streaming primitive on LipsyncClient. You feed it fixed-size audio windows and it returns one viseme result per window. Most React apps should use useLipsyncStream instead — it owns the audio graph for you. Reach for the raw session when you control the audio source yourself (a custom worklet, a Node pipeline, a non-React app).

The window contract

const session = client.createStreamingSession();

// audioWindow: 16 kHz mono Float32 in [-1, 1], 400 samples (25 ms)
const frame = await session.pushWindow(audioWindow);

session.close();
| Rule | Detail |
| --- | --- |
| Sample rate | 16 kHz mono. Use client.resample(buf, fromRate, 16000) upstream. |
| Window size | 400 samples, exactly 25 ms. |
| Cadence | One window at a time, in order. pushWindow is async; await each call. |
| Cleanup | Call session.close() when the utterance/stream ends. |
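
If your audio arrives as one long 16 kHz buffer, the window-size rule above implies a chunking step before pushWindow. A minimal sketch of that step follows. The helper name toWindows is hypothetical (it is not part of the SDK), and zero-padding the final partial window is an assumption; dropping the tail may suit your stream better.

```typescript
const WINDOW_SIZE = 400; // samples per window: 25 ms at 16 kHz

// Hypothetical helper (not an SDK function): split 16 kHz mono samples
// into fixed-size windows suitable for pushWindow.
// Assumption: the final partial window is zero-padded rather than dropped.
function toWindows(samples: Float32Array, windowSize: number = WINDOW_SIZE): Float32Array[] {
  const windows: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += windowSize) {
    const chunk = new Float32Array(windowSize); // zero-filled on creation
    chunk.set(samples.subarray(start, start + windowSize)); // subarray clamps at the end
    windows.push(chunk);
  }
  return windows;
}
```

Each returned chunk can then be awaited through session.pushWindow in order, as the cadence rule requires.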
pushWindow resolves to a LipsyncStreamingFrameResult:
interface LipsyncStreamingFrameResult {
  visemeId: number;        // viseme id for this window
  silenceDetected: boolean; // input was below the silence floor
  frameIndex: number;       // monotonically increasing window counter
}
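
Because frameIndex is monotonically increasing, it can double as a cheap sanity check that windows were pushed and awaited in order. The sketch below is illustrative only: FrameResult is a local copy of the interface above so the block is self-contained, and findOutOfOrder is a hypothetical helper, not an SDK function. It checks only that indices strictly increase; whether the counter advances by exactly one per window is not stated here, so it does not assume that.

```typescript
// Local copy of the result shape shown above, so the sketch is self-contained.
interface FrameResult {
  visemeId: number;
  silenceDetected: boolean;
  frameIndex: number;
}

// Hypothetical helper: returns every frame whose frameIndex is not greater
// than the highest index seen before it, i.e. results that arrived out of order.
function findOutOfOrder(frames: FrameResult[]): FrameResult[] {
  const outOfOrder: FrameResult[] = [];
  let highest = -Infinity;
  for (const frame of frames) {
    if (frame.frameIndex <= highest) {
      outOfOrder.push(frame);
    } else {
      highest = frame.frameIndex;
    }
  }
  return outOfOrder;
}
```

A non-empty result usually points at a pushWindow call that was not awaited before the next one started.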

Drive the avatar

Append each result to a streaming MascotPlayback. offset is the cue's position in the audio, in ms; the playback's animation-frame clock fires the viseme when that time arrives.
import { MascotPlayback, getRiveInputs } from "@mascotbot/core/rive";

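// `rive`, `session`, and `windows` are assumed to already exist in scope.
const playback = new MascotPlayback({
  riveInputs: getRiveInputs(rive),
  stream: true,
  enableNaturalLipSync: true,
});
playback.play();

let ms = 0; // audio position of the current window, in ms
// `windows` yields your 25 ms Float32 chunks; the loop variable avoids shadowing the global `window`.
for await (const audioWindow of windows) {
  const frame = await session.pushWindow(audioWindow);
  // Silent windows add no cue, but the clock below still advances.
  if (!frame.silenceDetected) {
    playback.pushVisemes([{ offset: ms, visemeId: frame.visemeId }]);
  }
  ms += 25;
}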
session.close();
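
The 25 ms step in the loop above follows from the window contract: 400 samples at 16 kHz is 400 × 1000 / 16000 = 25 ms, so window n starts at 25n ms. A small sketch of that arithmetic, separated from the SDK calls, is shown below. toCues and the Cue type are hypothetical names used for illustration; the point is that offsets keep advancing through silent windows even though those windows emit no cue.

```typescript
interface Cue {
  offset: number;   // audio position in ms
  visemeId: number;
}

const SAMPLE_RATE_HZ = 16000;
const WINDOW_SAMPLES = 400;
const WINDOW_MS = (WINDOW_SAMPLES * 1000) / SAMPLE_RATE_HZ; // 25

// Hypothetical helper: turn per-window results into timed cues.
// The array position stands in for the window number, so silent
// windows still consume 25 ms of timeline.
function toCues(frames: { visemeId: number; silenceDetected: boolean }[]): Cue[] {
  const cues: Cue[] = [];
  frames.forEach((frame, i) => {
    if (!frame.silenceDetected) {
      cues.push({ offset: i * WINDOW_MS, visemeId: frame.visemeId });
    }
  });
  return cues;
}
```

Deriving the step from the sample rate and window size, rather than hard-coding 25, keeps the timeline correct if you ever reuse the helper with different constants.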

Silence handling

silenceDetected reflects the SDK’s internal −50 dBFS input-amplitude silence gate. It suppresses the phantom mouth shapes a naive pipeline would emit during the inference tail after speech stops — you do not implement your own gate. Treat silenceDetected: true windows as “mouth at rest”.
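
For intuition about the −50 dBFS figure: with full scale at 1.0, it corresponds to a linear amplitude of 10^(−50/20), roughly 0.0032. The sketch below is a diagnostic aid only, not a replacement for the SDK's gate. It can help you confirm that your input is not so quiet that every window is reported as silent. Measuring the peak is an assumption; how the SDK measures level (peak, RMS, or otherwise) is not stated here, and both function names are hypothetical.

```typescript
// Convert a dBFS level to linear amplitude, where full scale is 1.0.
function dbfsToAmplitude(dbfs: number): number {
  return Math.pow(10, dbfs / 20);
}

// Peak level of a window in dBFS. Returns -Infinity for an all-zero window.
// Assumption: peak measurement; the SDK's own measure may differ.
function peakDbfs(samples: Float32Array): number {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  return 20 * Math.log10(peak);
}
```

If peakDbfs on typical speech windows sits near or below −50, check gain and resampling upstream before suspecting the session.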

Barge-in and interruption

To cut a response short (the user interrupts), stop feeding windows, reset playback, and close the interrupted session:
playback.reset(); // clear queued cues; mouth returns to rest
session.close();  // the interrupted stream has ended, so release it
// open a fresh session for the next utterance if needed
const next = client.createStreamingSession();
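
One way to wire the interruption steps above into the window loop is to check an abort flag before and after each pushWindow call. The following is a sketch under stated assumptions, not the SDK's own mechanism: SessionLike and PlaybackLike are minimal local types that mirror only the methods used on this page, feedUntilInterrupted is a hypothetical helper, and the signal parameter accepts anything with an aborted flag (a standard AbortSignal fits).

```typescript
// Minimal structural types for the sketch; they mirror only methods shown on this page.
interface Frame { visemeId: number; silenceDetected: boolean; frameIndex: number }
interface SessionLike {
  pushWindow(audioWindow: Float32Array): Promise<Frame>;
  close(): void;
}
interface PlaybackLike {
  pushVisemes(cues: { offset: number; visemeId: number }[]): void;
  reset(): void;
}

// Hypothetical helper: feeds windows until the source ends or the signal aborts.
// Resolves to true if the stream finished, false if it was interrupted.
async function feedUntilInterrupted(
  session: SessionLike,
  playback: PlaybackLike,
  windows: Iterable<Float32Array>,
  signal: { readonly aborted: boolean },
): Promise<boolean> {
  let ms = 0;
  try {
    for (const audioWindow of windows) {
      if (signal.aborted) { playback.reset(); return false; }
      const frame = await session.pushWindow(audioWindow);
      // The interruption may have landed while inference was in flight.
      if (signal.aborted) { playback.reset(); return false; }
      if (!frame.silenceDetected) {
        playback.pushVisemes([{ offset: ms, visemeId: frame.visemeId }]);
      }
      ms += 25;
    }
    return true;
  } finally {
    session.close(); // runs on completion, interruption, or error
  }
}
```

Calling abort() on an AbortController whose signal was passed in ends the loop at the next check; the caller can then open a fresh session for the next utterance.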
If your audio arrives as raw PCM from a network source, pair createPCMStreamPlayer (which plays it gap-tolerantly and exposes a MediaStream) with useLipsyncStream rather than hand-windowing — that handles buffering and the tap for you.

React equivalent

useLipsyncStream({ source: { kind: "manual" } }) wraps a streaming session and exposes pushAudio / pushBase64PCM16 / reset, while keeping the audio graph stable across renders. Prefer it in React; use the raw session only when you need full control of the window loop.

Next

Core client: init, processAudio, events.

PCM stream player: play and tap raw PCM.

Streaming & mic (React): the React wrapper.