Streaming Lip Sync Sessions — createStreamingSession

createStreamingSession() is the low-level streaming primitive on LipsyncClient. You feed it fixed-size audio windows and it returns one viseme result per window. Most React apps should use useLipsyncStream instead — it owns the audio graph for you. Reach for the raw session when you control the audio source yourself (a custom worklet, a Node pipeline, a non-React app).

The window contract

const session = client.createStreamingSession();

// audioWindow: 16 kHz mono Float32 in [-1, 1], 400 samples (25 ms)
const frame = await session.pushWindow(audioWindow);

session.close();

Rule	Detail
Sample rate	16 kHz mono. Use `client.resample(buf, fromRate, 16000)` upstream.
Window size	400 samples — exactly 25 ms.
Cadence	One window at a time, in order. `pushWindow` is async; `await` each call before pushing the next window.
Cleanup	`session.close()` when the utterance/stream ends.

pushWindow resolves to a LipsyncStreamingFrameResult:

interface LipsyncStreamingFrameResult {
  visemeId: number;        // viseme id for this window
  silenceDetected: boolean; // input was below the silence floor
  frameIndex: number;       // monotonically increasing window counter
}
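The contract asks for exact 400-sample windows, but most audio sources deliver chunks of other sizes (an AudioWorklet, for example, processes 128-frame blocks). Below is a minimal sketch of re-slicing such chunks into contract-sized windows. `WindowAccumulator` is a hypothetical helper, not part of the SDK, and it assumes the input has already been converted to 16 kHz mono Float32 samples.

```typescript
// Hypothetical helper (not part of the SDK): re-slices arbitrary-size
// 16 kHz mono Float32 chunks into exact 400-sample (25 ms) windows.
const WINDOW_SAMPLES = 400; // 16000 samples/s * 0.025 s

class WindowAccumulator {
  private pending = new Float32Array(WINDOW_SAMPLES);
  private filled = 0; // number of valid samples currently in `pending`

  // Appends a chunk and returns every complete window it produced, in order.
  push(chunk: Float32Array): Float32Array[] {
    const out: Float32Array[] = [];
    let offset = 0;
    while (offset < chunk.length) {
      const take = Math.min(WINDOW_SAMPLES - this.filled, chunk.length - offset);
      this.pending.set(chunk.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === WINDOW_SAMPLES) {
        out.push(this.pending); // hand off the full window...
        this.pending = new Float32Array(WINDOW_SAMPLES); // ...and start a fresh buffer
        this.filled = 0;
      }
    }
    return out;
  }

  // Samples waiting for the next window to fill up.
  get buffered(): number {
    return this.filled;
  }
}
```

Pass each returned window to `await session.pushWindow(w)` in order. Leftover samples stay buffered until the next chunk arrives, so 128-sample blocks yield a window on roughly every third or fourth block.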

Drive the avatar

Append each result to a streaming MascotPlayback. offset is the window's audio position in milliseconds; the playback’s animation-frame clock fires each viseme when its time arrives.

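import { MascotPlayback, getRiveInputs } from "@mascotbot/core/rive";

// `rive` is your loaded Rive instance; `client` is your LipsyncClient.
const playback = new MascotPlayback({
  riveInputs: getRiveInputs(rive),
  stream: true,
  enableNaturalLipSync: true,
});
playback.play();

const session = client.createStreamingSession();
let ms = 0; // audio position of the current window, in ms
try {
  for await (const chunk of windows /* your 25 ms Float32 chunks */) {
    const frame = await session.pushWindow(chunk);
    // Silent windows are skipped (treated as mouth at rest).
    if (!frame.silenceDetected) {
      playback.pushVisemes([{ offset: ms, visemeId: frame.visemeId }]);
    }
    ms += 25; // every window is exactly 25 ms of audio
  }
} finally {
  session.close(); // release the session even if the loop throws
}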
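The loop above maps the n-th pushed window (counting from zero) to a cue at offset n × 25 ms and drops silent windows. The same mapping can be written as a pure function, which makes it easy to unit-test a pipeline without a Rive instance. This is a sketch: `framesToCues` and `VisemeCue` are hypothetical names, and the cue shape `{ offset, visemeId }` is taken from the `pushVisemes` call above.

```typescript
// Local copy of the LipsyncStreamingFrameResult interface documented above.
interface LipsyncStreamingFrameResult {
  visemeId: number;
  silenceDetected: boolean;
  frameIndex: number;
}

// Hypothetical cue type matching the objects passed to playback.pushVisemes().
interface VisemeCue {
  offset: number; // audio position in ms
  visemeId: number;
}

const WINDOW_MS = 25; // 400 samples at 16 kHz

// Converts frames (in push order) to cues, skipping silent windows.
// The offset comes from the window's position in the sequence, so skipped
// silent windows still advance the audio clock.
function framesToCues(frames: LipsyncStreamingFrameResult[]): VisemeCue[] {
  const cues: VisemeCue[] = [];
  frames.forEach((frame, i) => {
    if (!frame.silenceDetected) {
      cues.push({ offset: i * WINDOW_MS, visemeId: frame.visemeId });
    }
  });
  return cues;
}
```

The sketch uses the array position rather than `frameIndex` because this page only says `frameIndex` is monotonically increasing, not which value it starts from.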

Silence handling

silenceDetected reflects the SDK’s internal −50 dBFS input-amplitude silence gate. The gate suppresses the phantom mouth shapes that a naive pipeline would emit during the inference tail after speech stops, so you do not need to implement your own. Treat windows with silenceDetected: true as “mouth at rest”.
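For diagnostics only (for example, to see why quiet speech is reported as silent), it can help to know how loud a window is relative to the −50 dBFS floor. On a full-scale [-1, 1] signal, −50 dBFS corresponds to a linear amplitude of 10^(−50/20) ≈ 0.0032. The sketch below measures a window's RMS level in dBFS. It only approximates the SDK's gate: this page does not say whether the gate compares peak or RMS amplitude, and `rmsDbfs` / `isBelowFloor` are hypothetical helpers. You do not need them to use the session.

```typescript
// Diagnostic sketch only: the SDK already applies its own silence gate.
// Assumption: the gate behaves like an RMS threshold; the docs only say
// "-50 dBFS input-amplitude", so the SDK's exact measure may differ.
const SILENCE_FLOOR_DBFS = -50;

// RMS level of a window in dBFS (0 dBFS = RMS of 1.0, e.g. a full-scale square wave).
function rmsDbfs(samples: Float32Array): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

function isBelowFloor(samples: Float32Array): boolean {
  return rmsDbfs(samples) < SILENCE_FLOOR_DBFS;
}
```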

Barge-in and interruption

To cut a response short (for example, when the user interrupts), stop feeding windows, close the current session, and reset playback:

session.close();  // release the interrupted utterance's session
playback.reset(); // clear queued cues; mouth returns to rest
// open a fresh session for the next utterance if needed
const next = client.createStreamingSession();
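One way to wire barge-in is to run the window loop under an `AbortSignal` and abort it from your interruption handler. The sketch below relies only on the methods named on this page (`pushWindow`, `close`, `pushVisemes`, `reset`), declared as minimal structural types so it compiles without the SDK; those type shapes, `runUtterance`, and the interface names are assumptions for illustration.

```typescript
interface LipsyncStreamingFrameResult {
  visemeId: number;
  silenceDetected: boolean;
  frameIndex: number;
}

// Minimal structural types for the documented methods (assumed shapes).
interface StreamingSession {
  pushWindow(window: Float32Array): Promise<LipsyncStreamingFrameResult>;
  close(): void;
}
interface Playback {
  pushVisemes(cues: { offset: number; visemeId: number }[]): void;
  reset(): void;
}

// Hypothetical helper: feeds windows until the source ends or `signal` is
// aborted. Returns true if the utterance finished, false if interrupted.
async function runUtterance(
  session: StreamingSession,
  playback: Playback,
  windows: AsyncIterable<Float32Array>,
  signal: AbortSignal,
): Promise<boolean> {
  let ms = 0;
  try {
    for await (const chunk of windows) {
      if (signal.aborted) break; // stop feeding windows
      const frame = await session.pushWindow(chunk);
      if (signal.aborted) break; // interrupted while this window was in flight
      if (!frame.silenceDetected) {
        playback.pushVisemes([{ offset: ms, visemeId: frame.visemeId }]);
      }
      ms += 25;
    }
  } finally {
    session.close(); // always release the session
  }
  if (signal.aborted) {
    playback.reset(); // clear queued cues; mouth returns to rest
    return false;
  }
  return true;
}
```

Call `controller.abort()` on the `AbortController` whose signal you passed in, then open a fresh session with `client.createStreamingSession()` for the next utterance.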

If your audio arrives as raw PCM from a network source, pair createPCMStreamPlayer (which plays it gap-tolerantly and exposes a MediaStream) with useLipsyncStream instead of windowing the audio by hand; that combination handles buffering and the audio tap for you.
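If you do hand-window network PCM with the raw session, the samples typically arrive as signed 16-bit little-endian integers and must be converted to Float32 in [-1, 1] before windowing. A minimal sketch, assuming PCM16 LE mono input (check your provider's actual format); `pcm16ToFloat32` is a hypothetical helper, and its output still needs resampling to 16 kHz if the source rate differs.

```typescript
// Hypothetical helper: decodes signed 16-bit little-endian mono PCM bytes
// into Float32 samples in [-1, 1).
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // A trailing odd byte is ignored here; in a real stream, carry it over
  // and prepend it to the next network chunk.
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` = little-endian; dividing by 32768 maps -32768 to exactly -1.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```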

React equivalent

useLipsyncStream({ source: { kind: "manual" } }) wraps a streaming session and exposes pushAudio / pushBase64PCM16 / reset, while keeping the audio graph stable across renders. Prefer it in React; use the raw session only when you need full control of the window loop.

Next

Core client

init, processAudio, events.

PCM stream player

Play + tap raw PCM.

Streaming & mic (React)

The React wrapper.
