The Mascotbot avatar SDK turns speech into a real-time talking avatar. It is a small, composable, low-level surface (audio in → a serializable viseme timeline → a thin Rive playback layer) backed by the licensed model and asset delivery. It does not ship a call UI, a TTS engine, or provider glue; those are recipes you compose, not a framework you adopt.
How it works
Authorize
MascotProvider (or LipsyncClient.init) exchanges your API key with the edge worker, which returns a short-lived license and the WASM runtime. Sessions auto-refresh in the background.
Process speech
You hand the SDK 16 kHz mono audio: a recorded buffer, microphone windows, or a tapped MediaStream. The SDK produces one viseme id per 10 ms frame.
Packages
The SDK is two packages, each with a root and a /rive subpath. Import the
narrowest one for your use case.
| Import | What it is |
|---|---|
| @mascotbot/core | Engine + offline VisemeTimeline + createPCMStreamPlayer. Framework-agnostic, no Rive, no React. |
| @mascotbot/core/rive | Framework-agnostic Rive playback (MascotPlayback, getRiveInputs, hasRiveInput). |
| @mascotbot/react | React provider + useMascot / useProcessAudio. |
| @mascotbot/react/rive | React Rive layer: <Mascot>, useMascotRive, useMascotInputs, useMascotPlayback, useLipsyncStream. |
A React app that renders an avatar imports from @mascotbot/react + @mascotbot/react/rive.
The /rive subpaths take @rive-app/webgl2 (and @rive-app/react-webgl2 for
React) as an optional peer dependency — install it only if you render an
avatar.
Three integration paths
Offline
Run inference once, persist the timeline as JSON, replay forever with
zero reprocessing.
Microphone & streaming
Drive the avatar live from the user’s mic, a tapped MediaStream, or manually pushed audio.
Realtime AI
Connect OpenAI Realtime, Gemini Live, or ElevenLabs by tapping the
assistant’s voice in real time.
All three paths use a MascotPlayback instance driven by either
a VisemeTimeline (offline) or a live audio source. There is no separate API
to learn per path.
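To make the offline path concrete, here is a minimal sketch of persisting a timeline as JSON and looking up the active viseme during replay. It assumes a hypothetical `SketchTimeline` shape with one viseme id per 10 ms frame; the SDK's real VisemeTimeline format is not shown on this page and may differ. The helper names (`frameCount`, `serialize`, `deserialize`, `visemeAt`) are illustrative, not SDK API.

```typescript
// Hypothetical timeline shape for illustration only; not the SDK's VisemeTimeline.
interface SketchTimeline {
  frameMs: number;    // duration of one frame in milliseconds
  visemes: number[];  // one viseme id per frame (documented range: 100..118)
}

const SAMPLE_RATE_HZ = 16000; // the SDK expects 16 kHz mono audio
const FRAME_MS = 10;          // one viseme id per 10 ms frame

// 16 000 samples/s * 0.010 s = 160 samples per frame.
const SAMPLES_PER_FRAME = (SAMPLE_RATE_HZ * FRAME_MS) / 1000;

// Number of whole 10 ms frames contained in a mono sample buffer.
function frameCount(sampleCount: number): number {
  return Math.floor(sampleCount / SAMPLES_PER_FRAME);
}

// Persist once...
function serialize(timeline: SketchTimeline): string {
  return JSON.stringify(timeline);
}

// ...and restore later without re-running inference.
function deserialize(json: string): SketchTimeline {
  const parsed = JSON.parse(json) as SketchTimeline;
  if (typeof parsed.frameMs !== "number" || !Array.isArray(parsed.visemes)) {
    throw new Error("not a SketchTimeline");
  }
  return parsed;
}

// Viseme id active at a playback position, or null outside the timeline.
function visemeAt(timeline: SketchTimeline, positionMs: number): number | null {
  if (positionMs < 0) return null;
  const index = Math.floor(positionMs / timeline.frameMs);
  return timeline.visemes[index] ?? null;
}

// Round-trip a tiny four-frame (40 ms) timeline through JSON.
const timeline: SketchTimeline = { frameMs: FRAME_MS, visemes: [100, 104, 104, 112] };
const restored = deserialize(serialize(timeline));
```

Because the lookup is a pure function of playback position, the same stored JSON can drive any number of replays; a live source would instead produce frames as audio arrives, 160 samples at a time.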
What the SDK does and does not do
The SDK writes exactly three Rive input families: mouth visemes (100..118), is_speaking, and stress. Every other state-machine input,
data-binding ViewModel, event, and listener on the Rive instance is yours,
accessed directly on the raw rive object. The SDK never wraps, gates, or
proxies it. This contract is detailed in
Rive co-existence.
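One way to respect this contract in application code is a small guard that keeps your own writes away from SDK-owned inputs. The sketch below assumes viseme inputs are addressed by their numeric ids 100..118 rendered as decimal strings; how they are actually named in a given .riv file is not specified on this page, so `isSdkOwnedInput` and `appOwnedInputs` are hypothetical helpers, not SDK API.

```typescript
// Documented viseme id range written by the SDK.
const VISEME_MIN = 100;
const VISEME_MAX = 118;

// The two named inputs the SDK also writes.
const SDK_NAMED_INPUTS = ["is_speaking", "stress"];

// True for integer ids inside the documented 100..118 range (19 ids in total).
function isVisemeId(id: number): boolean {
  return Math.floor(id) === id && id >= VISEME_MIN && id <= VISEME_MAX;
}

// True when an input name belongs to one of the three SDK-owned families.
function isSdkOwnedInput(name: string): boolean {
  if (SDK_NAMED_INPUTS.indexOf(name) !== -1) return true;
  // Assumption: viseme inputs are named by their decimal id, e.g. "104".
  return /^\d+$/.test(name) && isVisemeId(Number(name));
}

// Filter a list of input names down to those the app is free to drive.
function appOwnedInputs(names: string[]): string[] {
  return names.filter((name) => !isSdkOwnedInput(name));
}
```

For example, `appOwnedInputs(["blink", "stress", "100", "mood"])` keeps only `blink` and `mood`, which your code can then set directly on the raw rive object.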
The SDK is intentionally minimal — audio in, animation out. Upgrading an
existing integration? The migration guide maps every
change.
Browser support
- Chrome / Edge — full.
- Safari (desktop 17+, iOS 17+) — full; WebAssembly + WebGL2 required for the Rive avatar.
- Firefox — audio pipeline supported; the Rive renderer requires WebGL2.
crypto.subtle are unavailable.
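The requirements listed here can be probed at startup before mounting an avatar. The following is a minimal sketch, not part of the SDK: `detectCapabilities` and `supportLevel` are hypothetical names, and treating both WebAssembly and crypto.subtle as hard requirements is an assumption based on the WASM runtime and the crypto.subtle note mentioned on this page. The environment is passed in as a parameter so the logic can be exercised with a fake object.

```typescript
// Minimal structural view of the globals we probe, so the sketch type-checks
// without DOM typings and can be tested with a fake environment.
interface ProbeTarget {
  WebAssembly?: unknown;
  crypto?: { subtle?: unknown };
  document?: {
    createElement(tag: string): { getContext?(kind: string): unknown };
  };
}

interface Capabilities {
  wasm: boolean;
  subtleCrypto: boolean;
  webgl2: boolean;
}

function detectCapabilities(env: ProbeTarget): Capabilities {
  let webgl2 = false;
  try {
    // getContext("webgl2") returns null when WebGL2 is not available.
    const canvas = env.document?.createElement("canvas");
    webgl2 = Boolean(canvas?.getContext?.("webgl2"));
  } catch {
    webgl2 = false;
  }
  return {
    wasm: typeof env.WebAssembly === "object" && env.WebAssembly !== null,
    subtleCrypto: Boolean(env.crypto?.subtle),
    webgl2,
  };
}

type Support = "avatar" | "audio-only" | "unsupported";

// Assumption: WebAssembly and crypto.subtle are both required to run at all;
// WebGL2 is only needed for the Rive renderer.
function supportLevel(c: Capabilities): Support {
  if (!c.wasm || !c.subtleCrypto) return "unsupported";
  return c.webgl2 ? "avatar" : "audio-only";
}
```

In a browser you would call `supportLevel(detectCapabilities(globalThis as unknown as ProbeTarget))` and fall back to a static image or an audio-only experience when the result is not `"avatar"`.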
Next
Installation
Private registry, keys, peer deps.
Quickstart
A working avatar in a few lines.
Visemes & the timeline
The core data model.