Offline Lip Sync — Generate, Persist & Replay

The offline path is the SDK’s most powerful pattern: run inference once, persist the result as JSON, and replay it as many times as you need without touching the model, the network, or a license refresh. It is the right tool for prefetching, queued playback, deterministic video export, and any case where the same audio is animated more than once. The artifact is the VisemeTimeline, which is plain, versioned JSON.

Generate → persist → replay

Generate once

Run processAudio (vanilla) or useProcessAudio (React). result.timeline is the artifact.

Persist anywhere

The timeline is plain JSON, so it can live in localStorage, your database, a CDN object, or a file.

Replay with zero reprocessing

parseTimeline(JSON.parse(stored)) → playback.setTimeline(...) → playback.play(). No inference, no streaming session, no network.
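
The three steps above can be folded into one cache-first helper. The following is a minimal sketch, not SDK API: `TimelineStore`, `getOrGenerateTimeline`, and `createMemoryStore` are hypothetical names. In a real app the injected `generate` callback would presumably wrap `processAudio`, and `parse` would be `parseTimeline`.

```typescript
// Hypothetical storage interface: localStorage, a database row, or a CDN fetch
// can all be adapted to these two methods.
interface TimelineStore {
  get(key: string): string | null;
  set(key: string, json: string): void;
}

// Cache-first lookup: replay from the store when possible, otherwise generate
// once and persist. `generate` stands in for the inference call and `parse`
// for the validating parser; both are injected (assumptions, not SDK calls).
async function getOrGenerateTimeline<T>(
  key: string,
  store: TimelineStore,
  generate: () => Promise<T>,
  parse: (raw: unknown) => T,
): Promise<T> {
  const stored = store.get(key);
  if (stored !== null) {
    return parse(JSON.parse(stored)); // replay path: no inference
  }
  const timeline = await generate(); // generate path: runs once per key
  store.set(key, JSON.stringify(timeline));
  return timeline;
}

// In-memory store, useful for tests or environments without localStorage.
function createMemoryStore(): TimelineStore {
  const entries = new Map<string, string>();
  return {
    get: (key) => entries.get(key) ?? null,
    set: (key, json) => { entries.set(key, json); },
  };
}
```

Keeping storage behind a two-method interface means the same helper works whether the JSON lives in the browser, on a server, or in a build artifact.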

React

"use client";
import { useMascot, useProcessAudio, parseTimeline } from "@mascotbot/react";
import { useMascotPlayback } from "@mascotbot/react/rive";

const KEY = "greeting.vtl";

function Greeting() {
  const { status } = useMascot();
  const cached = typeof window !== "undefined" && localStorage.getItem(KEY);
  // Only run inference when there is no cached timeline.
  const { result } = useProcessAudio(cached ? null : "/audio/greeting.wav");
  const playback = useMascotPlayback({ enableNaturalLipSync: true });

  function play() {
    if (status !== "ready") return;
    let timeline;
    if (cached) {
      timeline = parseTimeline(JSON.parse(cached)); // zero reprocessing
    } else if (result) {
      timeline = result.timeline;
      localStorage.setItem(KEY, JSON.stringify(timeline)); // persist for next time
    } else return;

    // Load the timeline first, then start audio and mouth animation together
    // so they stay in sync.
    playback.setTimeline(timeline);
    new Audio("/audio/greeting.wav").play().catch(() => {});
    playback.play();
  }

  return <button onClick={play} disabled={status !== "ready"}>Play</button>;
}

Vanilla

import { LipsyncClient, parseTimeline } from "@mascotbot/core";

const client = await LipsyncClient.init({ apiKey: "mascot_pub_…" });

// 1. Generate once (16 kHz mono Float32 in [-1, 1])
const { timeline } = await client.processAudio(audio16kMono);

// 2. Persist
localStorage.setItem("greeting.vtl", JSON.stringify(timeline));

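// 3. Later: replay with zero reprocessing
const stored = localStorage.getItem("greeting.vtl");
if (stored !== null) {
  // getItem returns null when nothing has been persisted yet, so check first.
  const restored = parseTimeline(JSON.parse(stored));
  // `playback` is your renderer's playback controller, created elsewhere.
  playback.setTimeline(restored);
  playback.play();
}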
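
The generate step expects 16 kHz mono Float32 samples in [-1, 1], and decoded audio rarely arrives in that shape. Below is a minimal conversion sketch. `toMono16k` is a hypothetical helper, not part of the SDK. It uses plain linear interpolation with no anti-alias filter, so a proper resampler (for example `OfflineAudioContext` in the browser) is likely a better choice for production audio.

```typescript
const TARGET_RATE = 16000;

// Downmix equal-length channels to mono, resample to 16 kHz by linear
// interpolation, and clamp every sample to [-1, 1].
function toMono16k(channels: Float32Array[], sampleRate: number): Float32Array {
  if (channels.length === 0 || sampleRate <= 0) return new Float32Array(0);
  const inputLength = channels[0].length;

  // 1. Average the channels into a single mono buffer.
  const mono = new Float32Array(inputLength);
  for (const channel of channels) {
    for (let i = 0; i < inputLength; i++) mono[i] += channel[i] / channels.length;
  }

  // 2. Resample by reading the input at fractional positions.
  const outputLength = Math.floor((inputLength * TARGET_RATE) / sampleRate);
  const out = new Float32Array(outputLength);
  const step = sampleRate / TARGET_RATE;
  for (let i = 0; i < outputLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, inputLength - 1);
    const frac = pos - left;
    const sample = mono[left] * (1 - frac) + mono[right] * frac;
    out[i] = Math.max(-1, Math.min(1, sample)); // 3. Clamp to [-1, 1].
  }
  return out;
}
```

In a browser, the `channels` argument could come from `AudioBuffer.getChannelData(i)` and `sampleRate` from `AudioBuffer.sampleRate`.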

Assembling a timeline yourself

If you already have per-frame viseme ids (e.g. from your own batch job), build a timeline with the pure converters instead of running inference:

import { framesToTimeline, timelineToCues } from "@mascotbot/core";

const timeline = framesToTimeline(visemeIdsPer10ms, { speechMs });
playback.setTimeline(timeline);
// timelineToCues(timeline) → the { offset, visemeId }[] the engine consumes
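
To make the frame-to-cue relationship concrete, here is a sketch of how per-frame ids could collapse into `{ offset, visemeId }` cues by emitting a cue only when the viseme changes. This is an illustration, not the SDK's implementation: `framesToCuesSketch` and `VisemeCue` are hypothetical names, and the 10 ms frame length and millisecond offsets are assumptions inferred from the `visemeIdsPer10ms` variable name. In real code, prefer `framesToTimeline` and `timelineToCues`.

```typescript
interface VisemeCue {
  offset: number; // milliseconds from the start of the audio (assumed unit)
  visemeId: number;
}

// Run-length style conversion: consecutive identical frames produce one cue
// positioned at the first frame of the run.
function framesToCuesSketch(frames: number[], frameMs = 10): VisemeCue[] {
  const cues: VisemeCue[] = [];
  for (let i = 0; i < frames.length; i++) {
    if (i === 0 || frames[i] !== frames[i - 1]) {
      cues.push({ offset: i * frameMs, visemeId: frames[i] });
    }
  }
  return cues;
}
```

For example, the frames `[0, 0, 3, 3, 3, 1]` yield three cues at offsets 0, 20, and 50 ms.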

Prefetching & queues

Because a timeline is detached from the model, you can compute many ahead of time and play them instantly later:

Prefetch on idle — generate timelines for likely-next utterances during idle time; play from cache the moment they are needed (no inference latency at play time).
Queue playback — store a list of { audioUrl, timeline } pairs; for each, start the audio and playback.setTimeline(timeline) in lockstep.
Server-side precompute — generate timelines in a build step or backend job, ship the JSON with your assets, and the client never runs inference for that content at all.

speechMs is stored inside the timeline itself, so a cached replay never re-meters and never re-runs inference.
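
Queue playback as described above can be sketched as a sequential loop over `{ audioUrl, timeline }` pairs. The `QueueDeps` adapter and `playQueue` are hypothetical names: `startAudio` is assumed to wrap your audio element and resolve when the clip ends, while `setTimeline` and `play` are assumed to forward to the playback controller.

```typescript
interface QueueItem<T> {
  audioUrl: string;
  timeline: T;
}

// Hypothetical adapter around an audio element and the playback controller.
interface QueueDeps<T> {
  startAudio(url: string): Promise<void>; // resolves when the clip has finished
  setTimeline(timeline: T): void;
  play(): void;
}

async function playQueue<T>(items: QueueItem<T>[], deps: QueueDeps<T>): Promise<void> {
  for (const item of items) {
    deps.setTimeline(item.timeline); // load the precomputed timeline
    const finished = deps.startAudio(item.audioUrl); // start the audio
    deps.play(); // start the animation in the same tick
    await finished; // wait for the clip before moving to the next pair
  }
}
```

Because every timeline is already computed, the only latency between items is the time the audio itself takes to start.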

Deterministic video export

For frame-accurate rendering (recording the avatar to video), a timeline gives you a fixed, inspectable script: the same JSON produces the same mouth frames every run. Drive playback.seek(ms) from a render clock instead of wall-clock playback, capture the canvas once per frame, and mux the frames against the original audio. Because there is no live inference in the loop, export is reproducible and as fast as your renderer.
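
A render clock can be as simple as a list of frame timestamps derived from the duration and the target frame rate. The sketch below assumes hypothetical hooks: `seek` would wrap `playback.seek(ms)` and `capture` would grab the canvas (for example with `canvas.toDataURL()`). `frameTimes`, `renderFrames`, and `ExportHooks` are illustrative names, not SDK API.

```typescript
// Timestamp of each video frame in milliseconds. Computing i * 1000 / fps
// directly (rather than accumulating a per-frame step) avoids drift.
function frameTimes(durationMs: number, fps: number): number[] {
  const count = Math.ceil((durationMs * fps) / 1000);
  const times: number[] = [];
  for (let i = 0; i < count; i++) times.push((i * 1000) / fps);
  return times;
}

// Hypothetical renderer hooks supplied by the caller.
interface ExportHooks<F> {
  seek(ms: number): void;
  capture(): F;
}

// Seek to each timestamp, then capture. No wall clock is involved, so two
// runs over the same timeline visit exactly the same timestamps.
function renderFrames<F>(durationMs: number, fps: number, hooks: ExportHooks<F>): F[] {
  return frameTimes(durationMs, fps).map((ms) => {
    hooks.seek(ms);
    return hooks.capture();
  });
}
```

The captured frames can then be handed to whatever encoder you use, alongside the original audio track.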

Versioning & the trust boundary

parseTimeline validates untrusted/persisted JSON and throws a LipsyncError with .code === "bad_timeline" on a version or shape mismatch — so a stale stored timeline fails loudly instead of animating garbage:

import { parseTimeline, LipsyncError } from "@mascotbot/core";

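try {
  playback.setTimeline(parseTimeline(JSON.parse(stored)));
} catch (err) {
  if (err instanceof LipsyncError && err.code === "bad_timeline") {
    // Stale or malformed artifact: regenerate via client.processAudio().
    // Do not treat this as a license or network failure.
  } else {
    throw err; // e.g. a SyntaxError from JSON.parse on corrupted storage
  }
}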

Always load persisted timelines through parseTimeline, never JSON.parse alone. VISEME_TIMELINE_VERSION bumps on breaking changes; old artifacts are rejected deterministically.
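
One way to act on a rejected artifact is to regenerate and overwrite it in the same call. This is a sketch under stated assumptions: `loadOrRegenerate` and `isBadTimeline` are hypothetical helpers, and the duck-typed `code` check stands in for `err instanceof LipsyncError` only so the example runs without the SDK installed. `parse` would be `parseTimeline` and `regenerate` would presumably wrap `client.processAudio(...)`.

```typescript
// Stand-in for `err instanceof LipsyncError && err.code === "bad_timeline"`.
function isBadTimeline(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { code?: unknown }).code === "bad_timeline";
}

async function loadOrRegenerate<T>(
  stored: string | null,
  parse: (raw: unknown) => T,
  regenerate: () => Promise<T>,
  persist: (json: string) => void,
): Promise<T> {
  if (stored !== null) {
    try {
      return parse(JSON.parse(stored)); // valid artifact: no inference
    } catch (err) {
      // Corrupted JSON (SyntaxError) and stale artifacts are recoverable;
      // anything else is rethrown untouched.
      if (!(err instanceof SyntaxError) && !isBadTimeline(err)) throw err;
    }
  }
  const fresh = await regenerate();
  persist(JSON.stringify(fresh)); // overwrite the stale entry
  return fresh;
}
```

Rethrowing unrelated errors keeps license and network failures visible instead of masking them as cache misses.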

Next

Visemes & the timeline: the timeline format in detail.

Core client: processAudio and helpers.

Error codes: bad_timeline and the rest.
