useLipsyncStream is the single hook for live lip sync. It owns the audio graph, runs inference per window, and feeds visemes into a MascotPlayback. One hook, three input sources.
import { useLipsyncStream } from "@mascotbot/react/rive";

const { error, attached, pushAudio, pushBase64PCM16, reset } = useLipsyncStream({
  client,                       // from useMascot()
  playback,                     // from useMascotPlayback({ stream: true })
  source: { kind: "mic" },      // see sources below
  enabled: isLive,              // optional gate
  onFrame: (f) => {},           // optional per-window telemetry
  debug: false,                 // optional verbose logging
});
Always create the playback with stream: true for live sources:
const playback = useMascotPlayback({ stream: true, enableNaturalLipSync: true });

The three sources

source is a discriminated union.
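
As a rough illustration of what a discriminated union buys you here, the sketch below models the three kinds and switches on `kind` exhaustively. The `LipsyncSource` name, the `describeSource` helper, and the two stand-in types are assumptions made so the snippet compiles without DOM typings; the SDK's real type may be named and shaped differently.

```typescript
// Hypothetical stand-ins so the sketch compiles without DOM typings.
type ConstraintsLike = Record<string, unknown>; // stands in for MediaTrackConstraints
type StreamLike = object;                       // stands in for MediaStream

// Assumed shape of the `source` option, inferred from the three kinds on this page.
type LipsyncSource =
  | { kind: "mic"; constraints?: ConstraintsLike }
  | { kind: "mediaStream"; stream: StreamLike | null }
  | { kind: "manual" };

// Narrowing on `kind` gives each branch access to only its own fields.
function describeSource(source: LipsyncSource): string {
  switch (source.kind) {
    case "mic":
      return source.constraints ? "mic (custom constraints)" : "mic (default constraints)";
    case "mediaStream":
      return source.stream ? "mediaStream (attached)" : "mediaStream (detached)";
    case "manual":
      return "manual (caller pushes audio)";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a kind is added.
      const unreachable: never = source;
      throw new Error(`Unhandled source: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Because the compiler narrows on `kind`, reading `stream` in the `"mic"` branch would be a type error rather than a runtime surprise.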

{ kind: "mic" }

Captures the user’s microphone. The source also accepts an optional constraints field (constraints?: MediaTrackConstraints).
"use client";
import { useEffect, useState } from "react";
import { useMascot } from "@mascotbot/react";
import { useMascotPlayback, useLipsyncStream } from "@mascotbot/react/rive";

function MicAvatar() {
  const { client, status } = useMascot();
  const playback = useMascotPlayback({ stream: true, enableNaturalLipSync: true });
  const [active, setActive] = useState(false);
  const isLive = active && status === "ready" && !!client;

  const { error } = useLipsyncStream({
    client, playback,
    source: { kind: "mic" },
    enabled: isLive, // gates getUserMedia + the audio graph without unmounting
  });

  useEffect(() => { if (status !== "ready") setActive(false); }, [status]);

  return (
    <>
      <button onClick={() => setActive((v) => !v)} disabled={status !== "ready"}>
        {active ? "Stop mic" : "Start mic"}
      </button>
      {error && <p>{error.message}</p>}
    </>
  );
}
The worklet outputs at zero gain, so the user’s speakers do not echo the mic.

{ kind: "mediaStream", stream }

Tap any MediaStream: a played <audio>/<video> element, or a realtime AI provider’s voice. Because the capture point is the playback point, the mouth cannot drift ahead of the speech. Pass stream: null to detach.
import { useRef, useState } from "react";
import { createElementTap, type ElementTap } from "@mascotbot/react";

const audioRef = useRef<HTMLAudioElement>(null);
const tapRef = useRef<ElementTap | null>(null);
const [stream, setStream] = useState<MediaStream | null>(null);

function play() {
  const el = audioRef.current;
  if (!el) return;
  if (!tapRef.current) {            // create inside the click gesture
    tapRef.current = createElementTap();
    setStream(tapRef.current.stream);
  }
  tapRef.current.attach(el);        // idempotent
  el.play();
}

useLipsyncStream({ client, playback, source: { kind: "mediaStream", stream } });
createElementTap() is exported from @mascotbot/react (re-exported from lipsync-core). It is the cross-browser tap described in Realtime providers → Tap a playing element, and it replaces captureStream(), which Safari does not implement. The same helper drives self-playing realtime providers such as ElevenLabs and OpenAI Realtime over WebRTC; see Realtime providers.

{ kind: "manual" }

You push audio yourself with the returned functions — for providers that hand you raw PCM, or any custom source:
const { pushAudio, pushBase64PCM16, reset } = useLipsyncStream({
  client, playback, source: { kind: "manual" },
});

await pushAudio(float32Samples, 24000);     // Float32 [-1,1] + its sample rate
await pushBase64PCM16(base64Chunk, 24000);  // base64 PCM16 + its sample rate
reset();                                     // drop buffered state (barge-in)
For raw-PCM realtime providers, pairing createPCMStreamPlayer with a { kind: "mediaStream" } source is usually cleaner than manual pushing, because the player also handles gap-tolerant playback.
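
If you do push manually, pushAudio expects Float32 samples in [-1, 1], while many providers deliver 16-bit PCM bytes. The sketch below shows one way to convert them. The helper name `pcm16BytesToFloat32` is hypothetical and not part of the SDK, and it assumes little-endian, mono PCM16; check your provider's format before relying on it.

```typescript
// Hypothetical helper (not an SDK export): convert raw PCM16 bytes to Float32 in [-1, 1].
// Assumes little-endian, mono samples; a trailing odd byte is ignored.
function pcm16BytesToFloat32(bytes: Uint8Array): Float32Array {
  // DataView reads at any byte offset, so the input need not be 2-byte aligned.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Int16 spans [-32768, 32767]; dividing by 32768 maps it into [-1, 1).
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```

The result can then be passed to pushAudio together with the provider's sample rate. For base64 payloads, pushBase64PCM16 already covers this case, so a helper like this is only relevant when you receive binary frames.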

Return value

Field            Type                                       Notes
error            Error | null                               Stream/permission/inference error
attached         boolean                                    The audio graph is live and tapping
pushAudio        (Float32Array, number) => Promise<void>    Manual source only
pushBase64PCM16  (string, number) => Promise<void>          Manual source only
reset            () => void                                 Clears buffered stream state

Telemetry with onFrame

onFrame fires once per inference window with per-frame detail — wire it to a debug HUD or metrics, not to your render path:
useLipsyncStream({
  client, playback, source: { kind: "mic" },
  onFrame: (f) => {
    // f.frameIndex, f.visemeId, f.silenceDetected, f.inferenceMs,
    // f.emittedVisemeId, f.overruns, f.audioContextRate
  },
});
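
One way to keep onFrame off the render path is to fold frames into a plain accumulator and read it on your own schedule. The sketch below is a minimal example of that idea. `FrameStats` and the `FrameInfo` interface are hypothetical: the field names come from the comment above, but their types are assumptions.

```typescript
// Assumed subset of the frame payload; field names are from this page, types are guesses.
interface FrameInfo {
  frameIndex: number;
  inferenceMs: number;
  silenceDetected: boolean;
}

// Hypothetical accumulator: cheap to update per window, read only when a HUD polls it.
class FrameStats {
  private frames = 0;
  private silentFrames = 0;
  private totalInferenceMs = 0;
  private maxInferenceMs = 0;

  record(f: FrameInfo): void {
    this.frames += 1;
    if (f.silenceDetected) this.silentFrames += 1;
    this.totalInferenceMs += f.inferenceMs;
    this.maxInferenceMs = Math.max(this.maxInferenceMs, f.inferenceMs);
  }

  snapshot() {
    const n = this.frames;
    return {
      frames: n,
      meanInferenceMs: n === 0 ? 0 : this.totalInferenceMs / n,
      maxInferenceMs: this.maxInferenceMs,
      silentRatio: n === 0 ? 0 : this.silentFrames / n,
    };
  }
}
```

In a component you might hold one instance in a ref, pass `onFrame: (f) => stats.record(f)`, and call `snapshot()` from a timer, so telemetry never triggers a React state update per window.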

Lifecycle and stability

useLipsyncStream keeps the audio graph alive across re-renders by routing client/playback through refs. Rive input handles are referentially stable and playback is carried across any internal recreate, so an unrelated parent re-render does not freeze the mouth. Only enabled: false (or a null stream on a mediaStream source) tears the pipeline down, so you can start and stop the stream from a button without unmounting the component.

End-of-utterance silence is handled for you

When speech stops, naive pipelines emit phantom mouth shapes during the inference tail. The SDK suppresses this with an internal −50 dBFS input-amplitude silence gate — you do not implement your own gate. The mouth simply settles to rest when the audio goes quiet, on every source.
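
To give a feel for what a −50 dBFS threshold means in sample terms, the sketch below measures the peak level of a window and compares it with that figure. It is purely illustrative: the SDK's gate is internal, and whether it measures peak or RMS, or over what window, is not stated here. `peakDbfs` and `isBelowGate` are hypothetical names.

```typescript
// Threshold quoted on this page; in linear terms 10^(-50/20) is roughly 0.00316 of full scale.
const SILENCE_GATE_DBFS = -50;

// Peak level of a window of Float32 samples in dBFS (0 dBFS = full scale at |1.0|).
// Using the peak rather than RMS is an assumption made for this illustration.
function peakDbfs(samples: Float32Array): number {
  let peak = 0;
  samples.forEach((s) => {
    peak = Math.max(peak, Math.abs(s));
  });
  return peak === 0 ? -Infinity : 20 * Math.log10(peak);
}

// True when the whole window sits under the gate, i.e. it would count as silence.
function isBelowGate(samples: Float32Array): boolean {
  return peakDbfs(samples) < SILENCE_GATE_DBFS;
}
```

Under this reading, a window whose loudest sample is 0.001 (about −60 dBFS) counts as silence, while one peaking at 0.01 (about −40 dBFS) does not.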

The audio worklet

The worklet is embedded in the SDK and served from a Blob URL by default — there is no file to copy or host. Pass workletUrl only if your Content Security Policy forbids worker-src blob: or you want CDN caching.

Next

Realtime providers: Tap OpenAI / Gemini / ElevenLabs voices.

PCM stream player: Play + tap raw provider PCM.

Hooks reference: Every hook at a glance.