Natural Lip Sync

Natural lip sync post-processes the viseme stream to produce more natural speech animation. Instead of snapping to every phoneme, it merges similar adjacent shapes and preserves the distinctive ones — the way a real mouth blends sounds during fast speech.

Why

Hitting every phoneme precisely looks robotic. In natural speech the mouth does not have time to fully form each shape; shapes blend, sometimes to the point of not changing at all. The processor mimics that blending while protecting the visually critical shapes (w/u, o, r, l, f/v, and bilabials p/b/m) so articulation still reads.
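To make the idea concrete, here is a toy sketch of the "thin out, but protect critical shapes" behaviour. It is not the library's algorithm: it thins by time only and ignores shape similarity. The `Viseme` shape (`id`, `timeMs`) and the `thinVisemes` helper are assumptions made up for this illustration.

```typescript
// Illustrative only: a toy version of the blending idea, not the library's algorithm.
interface Viseme {
  id: number;     // viseme id (hypothetical field name)
  timeMs: number; // start time in milliseconds (hypothetical field name)
}

// Critical ids as documented on this page: w/u, o, r, l, f/v, bilabials.
const CRITICAL_IDS: ReadonlySet<number> = new Set([7, 8, 13, 14, 18, 21]);

// Drop any non-critical viseme that starts less than `minIntervalMs` after the
// last kept one. Critical visemes are always kept.
function thinVisemes(visemes: readonly Viseme[], minIntervalMs: number): Viseme[] {
  const kept: Viseme[] = [];
  for (const v of visemes) {
    const last = kept.length > 0 ? kept[kept.length - 1] : undefined;
    const tooClose = last !== undefined && v.timeMs - last.timeMs < minIntervalMs;
    if (!tooClose || CRITICAL_IDS.has(v.id)) {
      kept.push(v);
    }
  }
  return kept;
}

const input: Viseme[] = [
  { id: 1, timeMs: 0 },
  { id: 4, timeMs: 30 },  // too close to the previous one: dropped
  { id: 21, timeMs: 50 }, // bilabial: kept even though it is close
  { id: 6, timeMs: 130 },
];
const output = thinVisemes(input, 60);
console.log(output.map((v) => v.id)); // ids 1, 21, 6
```

The real processor also weighs similarity and a look-ahead window, as the configuration fields describe.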

Enable it

Natural lip sync is a single option on useMascotPlayback, and it works for every playback path (offline, mic, and realtime):
import { useMascotPlayback } from "@mascotbot/react/rive";

const playback = useMascotPlayback({ enableNaturalLipSync: true });
That uses DEFAULT_NATURAL_LIPSYNC_CONFIG. To tune it, pass a naturalLipSyncConfig:
const playback = useMascotPlayback({
  enableNaturalLipSync: true,
  naturalLipSyncConfig: CONVERSATION, // a STABLE reference — see warning below
});
Warning: naturalLipSyncConfig must be a stable reference, either a module constant or a value memoized with useState / useMemo. Passing a new object literal on every render reinitializes playback, and lip sync breaks after the first audio chunk. This is the single most common integration bug; see Troubleshooting.
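The underlying issue is reference identity. The sketch below uses no React at all. It assumes the hook compares the config by reference, the way React compares hook dependencies with Object.is. `LipSyncConfig` and the two `render...` functions are hypothetical stand-ins for a component body.

```typescript
// Hypothetical stand-in for the config type; only object identity matters here.
type LipSyncConfig = { readonly minVisemeInterval: number };

// Module-level constant: the same object every time it is read.
const CONVERSATION_CONFIG: LipSyncConfig = { minVisemeInterval: 60 };

// Mimics a component body that builds the config inline on each render.
function renderWithInlineConfig(): LipSyncConfig {
  return { minVisemeInterval: 60 }; // a new object on every call
}

// Mimics a component body that reads the module constant.
function renderWithStableConfig(): LipSyncConfig {
  return CONVERSATION_CONFIG;
}

// React compares hook dependencies with Object.is, i.e. by identity for objects.
const inlineChanged = !Object.is(renderWithInlineConfig(), renderWithInlineConfig());
const stableChanged = !Object.is(renderWithStableConfig(), renderWithStableConfig());

console.log(inlineChanged); // true: looks like a "new" config on each render
console.log(stableChanged); // false: same reference, nothing to reinitialize
```

Two object literals with identical fields are still different references, which is why the inline form looks like a config change on every render.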

Configuration

NaturalLipSyncConfig (all fields optional — unspecified fields fall back to DEFAULT_NATURAL_LIPSYNC_CONFIG):
| Field | Type | Default | Meaning |
| --- | --- | --- | --- |
| minVisemeInterval | number (ms) | 60 | Minimum time between visemes; closer ones merge (about 16.7 visemes/s max at the default). |
| mergeWindow | number (ms) | 80 | Look-ahead window for finding similar visemes to merge. |
| keyVisemePreference | number 0–1 | 0.7 | Strength of preference for distinctive shapes. Higher keeps more. |
| preserveSilence | boolean | true | Keep all silence visemes (recommended). |
| similarityThreshold | number 0–1 | 0.6 | How similar two visemes must be to merge. Higher merges less. |
| preserveCriticalVisemes | boolean | true | Never skip critical shapes (w/u, o, r, l, f/v, p/b/m). |
| criticalVisemeMinDuration | number (ms) | 0 | Hold critical visemes at least this long (opt-in; 0 disables). |
| criticalVisemeAbsorbThreshold | number (ms) | 30 | If holding a critical viseme shrinks the next one below this, drop the successor instead of flashing it. Only active when criticalVisemeMinDuration > 0. |
| criticalVisemeIds | readonly number[] | DEFAULT_CRITICAL_VISEME_IDS | Which viseme ids are “critical”. |
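The interaction between criticalVisemeMinDuration and criticalVisemeAbsorbThreshold is easier to see with numbers. The sketch below is one plausible reading of the two descriptions above, not the library's implementation. The `Segment` shape and the `holdCritical` helper are hypothetical.

```typescript
// Hypothetical timeline segment: a viseme shown from startMs to endMs.
interface Segment {
  id: number;
  startMs: number;
  endMs: number;
}

// Sketch: extend short critical segments to minDurationMs. If that leaves the
// successor shorter than absorbThresholdMs, drop the successor instead.
function holdCritical(
  segments: readonly Segment[],
  criticalIds: ReadonlySet<number>,
  minDurationMs: number,
  absorbThresholdMs: number,
): Segment[] {
  const out: Segment[] = [];
  let carriedStart: number | undefined; // start pushed later by a held predecessor
  for (let i = 0; i < segments.length; i++) {
    const src = segments[i];
    const cur: Segment = { id: src.id, startMs: carriedStart ?? src.startMs, endMs: src.endMs };
    carriedStart = undefined;
    const isLast = i === segments.length - 1;
    const tooShort = cur.endMs - cur.startMs < minDurationMs;
    if (minDurationMs > 0 && criticalIds.has(cur.id) && tooShort && !isLast) {
      const next = segments[i + 1];
      const heldEnd = cur.startMs + minDurationMs;
      if (next.endMs - heldEnd < absorbThresholdMs) {
        cur.endMs = next.endMs; // successor would only flash: absorb it
        i++;                    // skip the absorbed successor
      } else {
        cur.endMs = heldEnd;    // hold, and let the successor start later
        carriedStart = heldEnd;
      }
    }
    out.push(cur);
  }
  return out;
}

const critical: ReadonlySet<number> = new Set([7, 8, 13, 14, 18, 21]);
const held = holdCritical(
  [
    { id: 21, startMs: 0, endMs: 40 },    // short bilabial: held to 70 ms
    { id: 1, startMs: 40, endMs: 100 },   // keeps 30 ms, so it survives
    { id: 8, startMs: 100, endMs: 130 },  // short "o": held to 170 ms
    { id: 2, startMs: 130, endMs: 180 },  // would keep only 10 ms: absorbed
  ],
  critical,
  70,
  30,
);
console.log(held.map((s) => `${s.id}:${s.startMs}-${s.endMs}`).join(" ")); // 21:0-70 1:70-100 8:100-180
```

With criticalVisemeMinDuration left at 0, the sketch returns the timeline unchanged, matching the "opt-in" note in the table.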
DEFAULT_CRITICAL_VISEME_IDS is [7, 8, 13, 14, 18, 21] — w/u, o, r, l, f/v, and bilabials. Both DEFAULT_NATURAL_LIPSYNC_CONFIG and DEFAULT_CRITICAL_VISEME_IDS are exported so you can derive from them:
import { DEFAULT_NATURAL_LIPSYNC_CONFIG, DEFAULT_CRITICAL_VISEME_IDS } from "@mascotbot/react/rive";

// Treat s/z (viseme id 15) as critical too; every other field keeps its default
export const WITH_SIBILANTS = {
  ...DEFAULT_NATURAL_LIPSYNC_CONFIG,
  criticalVisemeIds: [...DEFAULT_CRITICAL_VISEME_IDS, 15],
};
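
The fallback rule ("unspecified fields fall back to the defaults") behaves like an object spread. The sketch below mirrors the documented fields and default values locally so it runs on its own. The `NaturalLipSyncConfigSketch` type, `DEFAULTS_SKETCH`, and `resolveConfig` are hypothetical names for illustration. In an app, import the real constants instead, and note that the library's internal merge may differ in detail.

```typescript
// Local mirror of the documented fields and defaults, for illustration only.
interface NaturalLipSyncConfigSketch {
  minVisemeInterval: number;
  mergeWindow: number;
  keyVisemePreference: number;
  preserveSilence: boolean;
  similarityThreshold: number;
  preserveCriticalVisemes: boolean;
  criticalVisemeMinDuration: number;
  criticalVisemeAbsorbThreshold: number;
  criticalVisemeIds: readonly number[];
}

const DEFAULTS_SKETCH: NaturalLipSyncConfigSketch = {
  minVisemeInterval: 60,
  mergeWindow: 80,
  keyVisemePreference: 0.7,
  preserveSilence: true,
  similarityThreshold: 0.6,
  preserveCriticalVisemes: true,
  criticalVisemeMinDuration: 0,
  criticalVisemeAbsorbThreshold: 30,
  criticalVisemeIds: [7, 8, 13, 14, 18, 21],
};

// Hypothetical helper: fields present in `partial` win, the rest use defaults.
function resolveConfig(
  partial: Partial<NaturalLipSyncConfigSketch>,
): NaturalLipSyncConfigSketch {
  return { ...DEFAULTS_SKETCH, ...partial };
}

const resolved = resolveConfig({ minVisemeInterval: 90, criticalVisemeMinDuration: 70 });
console.log(resolved.minVisemeInterval); // 90 (overridden)
console.log(resolved.mergeWindow);       // 80 (default)
```

Build such a derived config once at module level, so it stays a stable reference.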

Presets

Define presets as module-level constants so the reference is stable:
// Natural conversation — a good default for voice AI
export const CONVERSATION = {
  minVisemeInterval: 60,
  mergeWindow: 80,
  keyVisemePreference: 0.7,
  preserveSilence: true,
  similarityThreshold: 0.6,
  preserveCriticalVisemes: true,
} as const;

// Fast / excited speech — coarser merging
export const FAST_SPEECH = {
  minVisemeInterval: 90,
  mergeWindow: 120,
  keyVisemePreference: 0.6,
  preserveSilence: true,
  similarityThreshold: 0.4,
  preserveCriticalVisemes: true,
  criticalVisemeMinDuration: 70,
} as const;

// Clear articulation — education / language learning
export const EDUCATIONAL = {
  minVisemeInterval: 40,
  mergeWindow: 50,
  keyVisemePreference: 0.9,
  preserveSilence: true,
  similarityThreshold: 0.8,
  preserveCriticalVisemes: true,
} as const;
const playback = useMascotPlayback({ enableNaturalLipSync: true, naturalLipSyncConfig: CONVERSATION });
Start from CONVERSATION. Raise minVisemeInterval / mergeWindow for smoother (lazier) motion; lower them for crisper articulation.
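As a rough guide when tuning, minVisemeInterval caps how many mouth shapes can appear per second. The arithmetic below uses the three preset values from above. `maxVisemesPerSecond` is a hypothetical helper, not part of the library.

```typescript
// Upper bound on visemes per second implied by a minimum spacing in milliseconds.
function maxVisemesPerSecond(minVisemeIntervalMs: number): number {
  if (minVisemeIntervalMs <= 0) {
    throw new RangeError("minVisemeInterval must be positive");
  }
  return 1000 / minVisemeIntervalMs;
}

// minVisemeInterval values copied from the presets above.
const presetIntervals = { CONVERSATION: 60, FAST_SPEECH: 90, EDUCATIONAL: 40 } as const;

for (const [name, ms] of Object.entries(presetIntervals)) {
  console.log(`${name}: ${maxVisemesPerSecond(ms).toFixed(1)} visemes/s max`);
}
// CONVERSATION: 16.7 visemes/s max
// FAST_SPEECH: 11.1 visemes/s max
// EDUCATIONAL: 25.0 visemes/s max
```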

Without React

The processor is exported from @mascotbot/core/rive as a class and a function:
import { processNaturalLipSync, NaturalLipSyncProcessor } from "@mascotbot/core/rive";
When you set enableNaturalLipSync on MascotPlayback / useMascotPlayback, the processor runs internally. Call it directly only if you post-process visemes outside the playback engine.

Next

React hooks: useMascotPlayback options.

Visemes & the timeline: What is being processed.

Troubleshooting: The stable-reference bug.