useLipsyncStream is the single hook for live lip sync. It owns the audio
graph, runs inference per window, and feeds visemes into a MascotPlayback.
One hook, three input sources.
stream: true for live sources:
The three sources
source is a discriminated union.
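As a rough illustration of what a discriminated union on `kind` buys you, here is a minimal sketch. The `LipsyncSource` type and `describeSource` function are hypothetical names declared locally for this example, not imports from the SDK. The generic parameters stand in for `MediaStream` and `MediaTrackConstraints` so the snippet compiles without DOM typings.

```typescript
// Hypothetical local type mirroring the three source shapes described on this page.
// S stands in for MediaStream, C for MediaTrackConstraints (both assumptions).
type LipsyncSource<S = unknown, C = Record<string, unknown>> =
  | { kind: "mic"; constraints?: C }
  | { kind: "mediaStream"; stream: S | null }
  | { kind: "manual" };

// Narrowing on `kind` exposes only the fields that belong to that variant.
function describeSource(source: LipsyncSource): string {
  switch (source.kind) {
    case "mic":
      return source.constraints ? "mic (constrained)" : "mic (default)";
    case "mediaStream":
      // A null stream corresponds to the "detach" case.
      return source.stream === null
        ? "mediaStream (detached)"
        : "mediaStream (attached)";
    case "manual":
      return "manual";
    default: {
      // Exhaustiveness check: adding a fourth variant makes this a compile error.
      const unreachable: never = source;
      return unreachable;
    }
  }
}

// Example values; echoCancellation and noiseSuppression are standard
// MediaTrackConstraints fields in browsers.
const micSource: LipsyncSource = {
  kind: "mic",
  constraints: { echoCancellation: true, noiseSuppression: true },
};
const detached: LipsyncSource = { kind: "mediaStream", stream: null };
```

Because each variant carries only its own fields, the compiler rejects combinations such as a `manual` source with a `stream` attached.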
{ kind: "mic" }
The user’s microphone. Optional constraints?: MediaTrackConstraints.
{ kind: "mediaStream", stream }
Tap any MediaStream — a played <audio>/<video> element, or a realtime AI
provider’s voice. The capture point is the playback point, so the mouth
cannot drift ahead of the speech. Pass null to detach.
createElementTap() is an SDK export (@mascotbot/react,
re-exported from lipsync-core). It is the cross-browser tap detailed in
Realtime providers → Tap a playing element,
and it replaces captureStream(), which Safari does not implement. The same
helper drives self-playing realtime providers (ElevenLabs, OpenAI Realtime
over WebRTC); see Realtime providers.
{ kind: "manual" }
You push audio yourself with the returned functions — for providers that hand
you raw PCM, or any custom source:
createPCMStreamPlayer with a
{ kind: "mediaStream" } source is usually cleaner than manual pushing,
because the player also handles gap-tolerant playback.
Return value
| Field | Type | Notes |
|---|---|---|
| error | Error \| null | Stream/permission/inference error |
| attached | boolean | The audio graph is live and tapping |
| pushAudio | (Float32Array, number) => Promise<void> | Manual source only |
| pushBase64PCM16 | (string, number) => Promise<void> | Manual source only |
| reset | () => void | Clears buffered stream state |
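To make the two manual-source payload formats concrete, the sketch below converts base64-encoded 16-bit little-endian PCM into the float samples a `Float32Array` carries. It is a standalone illustration of the format relationship, not the SDK's implementation. `base64Pcm16ToFloat32` is a hypothetical helper name, and little-endian byte order is an assumption (it is the usual convention for provider PCM).

```typescript
// Decode base64-encoded PCM16 (assumed little-endian) into floats in [-1, 1).
// Hypothetical helper for illustration; not part of the SDK.
function base64Pcm16ToFloat32(b64: string): Float32Array {
  const binary = atob(b64); // one character per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const view = new DataView(bytes.buffer);
  // Two bytes per sample; a trailing odd byte is ignored.
  const out = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < out.length; i++) {
    // Dividing by 32768 maps the int16 range onto [-1, 1).
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Four samples: 0, 16384, -32768, 32767 encoded as little-endian bytes.
const decoded = base64Pcm16ToFloat32("AAAAQACA/38=");
```

The table lists a second numeric argument on both push functions. This page does not describe it, so the sketch leaves it out.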
Telemetry with onFrame
onFrame fires once per inference window with per-frame detail — wire it to a
debug HUD or metrics, not to your render path:
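One way to keep telemetry off the render path is to accumulate into a plain object and let a HUD read from it on its own schedule. The sketch below counts callback invocations in a sliding window. `createWindowRateMeter` is a hypothetical helper, and it deliberately ignores the callback payload because the shape of the onFrame argument is not shown on this page.

```typescript
// Counts invocations inside a sliding time window.
// Hypothetical helper; it never touches React state, so calling it once per
// inference window does not trigger re-renders.
function createWindowRateMeter(windowMs = 1000) {
  const stamps: number[] = [];
  return {
    // Call from the telemetry callback. `nowMs` is injectable for testing.
    tick(nowMs: number = Date.now()): void {
      stamps.push(nowMs);
      // Drop timestamps that have aged out of the window.
      while (stamps.length > 0 && nowMs - stamps[0] >= windowMs) {
        stamps.shift();
      }
    },
    // Invocations per second, measured as of the most recent tick.
    ratePerSecond(): number {
      return (stamps.length * 1000) / windowMs;
    },
  };
}

// Simulated usage: ten windows 100 ms apart, then one after a pause.
const meter = createWindowRateMeter(1000);
for (let t = 0; t < 1000; t += 100) meter.tick(t);
const steadyRate = meter.ratePerSecond();
meter.tick(1500);
const rateAfterPause = meter.ratePerSecond();
```

In a component, the assumption is that you would call `meter.tick()` inside onFrame and read `ratePerSecond()` from a timer-driven debug overlay.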
Lifecycle and stability
useLipsyncStream keeps the audio graph alive across re-renders by routing
client/playback through refs. Rive input handles are referentially
stable and playback is carried across any internal recreate, so an
unrelated parent re-render does not freeze the mouth. Only enabled: false (or a
null mediaStream) tears the pipeline down, so you can flip it from a
button without unmounting the component.
End-of-utterance silence is handled for you
When speech stops, naive pipelines emit phantom mouth shapes during the inference tail. The SDK suppresses this with an internal −50 dBFS input-amplitude silence gate, so you do not implement your own gate. The mouth simply settles to rest when the audio goes quiet, on every source.
The audio worklet
The worklet is embedded in the SDK and served from a Blob URL by default, so there is no file to copy or host. Pass workletUrl only if your Content
Security Policy forbids worker-src blob: or you want CDN caching.
Next
Realtime providers
Tap OpenAI / Gemini / ElevenLabs voices.
PCM stream player
Play + tap raw provider PCM.
Hooks reference
Every hook at a glance.