Natural Lip Sync
Natural lip sync post-processes the viseme stream to produce more natural speech animation. Instead of snapping to every phoneme, it merges similar adjacent shapes and preserves the distinctive ones, the way a real mouth blends sounds during fast speech.
Why
Hitting every phoneme precisely looks robotic. In natural speech the mouth does not have time to fully form each shape; shapes blend, sometimes to the point of not changing at all. The processor mimics that blending while protecting the visually critical shapes (w/u, o, r, l, f/v, and the bilabials p/b/m) so articulation still reads clearly.
Enable it
It is a single option, enableNaturalLipSync, on useMascotPlayback. It works on every path: offline, mic, and realtime.
Enabled on its own, it uses DEFAULT_NATURAL_LIPSYNC_CONFIG. To tune it, also pass a naturalLipSyncConfig.
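A minimal sketch of these two options follows. It uses local mirror types so that it runs on its own. The field names and defaults come from the Configuration table. The following names are hypothetical stand-ins and are not the package's exports:

- `LipSyncPlaybackOptions`
- `DEFAULTS`
- `resolveNaturalLipSyncConfig`

The sketch also does not cover how useMascotPlayback takes its other options. It shows the documented rule that unspecified fields fall back to the defaults.

```typescript
// Local mirror of the documented NaturalLipSyncConfig fields (all optional).
interface NaturalLipSyncConfig {
  minVisemeInterval?: number; // ms
  mergeWindow?: number; // ms
  keyVisemePreference?: number; // 0 to 1
  preserveSilence?: boolean;
  similarityThreshold?: number; // 0 to 1
  preserveCriticalVisemes?: boolean;
  criticalVisemeMinDuration?: number; // ms, 0 disables
  criticalVisemeAbsorbThreshold?: number; // ms
  criticalVisemeIds?: readonly number[];
}

type ResolvedLipSyncConfig = Required<NaturalLipSyncConfig>;

// Hypothetical: only the two lip-sync options named on this page.
interface LipSyncPlaybackOptions {
  enableNaturalLipSync?: boolean;
  naturalLipSyncConfig?: NaturalLipSyncConfig;
}

// Values copied from the Configuration table; an app would import the real
// DEFAULT_NATURAL_LIPSYNC_CONFIG rather than redeclare it.
const DEFAULTS: ResolvedLipSyncConfig = {
  minVisemeInterval: 60,
  mergeWindow: 80,
  keyVisemePreference: 0.7,
  preserveSilence: true,
  similarityThreshold: 0.6,
  preserveCriticalVisemes: true,
  criticalVisemeMinDuration: 0,
  criticalVisemeAbsorbThreshold: 30,
  criticalVisemeIds: [7, 8, 13, 14, 18, 21],
};

// Hypothetical helper: every field left unspecified falls back to DEFAULTS.
// Treating an explicit `undefined` as unspecified is an assumption.
function resolveNaturalLipSyncConfig(
  partial: NaturalLipSyncConfig = {},
): ResolvedLipSyncConfig {
  const resolved: ResolvedLipSyncConfig = { ...DEFAULTS };
  for (const [key, value] of Object.entries(partial)) {
    if (value !== undefined) {
      Object.assign(resolved, { [key]: value });
    }
  }
  return resolved;
}

// Declared once at module level: enable the processor and tune two fields.
const playbackOptions: LipSyncPlaybackOptions = {
  enableNaturalLipSync: true,
  naturalLipSyncConfig: { minVisemeInterval: 90, mergeWindow: 120 },
};
```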
Configuration
NaturalLipSyncConfig (all fields optional — unspecified fields fall back to
DEFAULT_NATURAL_LIPSYNC_CONFIG):
| Field | Type | Default | Meaning |
|---|---|---|---|
| minVisemeInterval | number (ms) | 60 | Minimum time between visemes; closer ones merge (about 16 visemes/s at most with the default). |
| mergeWindow | number (ms) | 80 | Look-ahead window for finding similar visemes to merge. |
| keyVisemePreference | number 0–1 | 0.7 | Strength of preference for distinctive shapes. Higher keeps more. |
| preserveSilence | boolean | true | Keep all silence visemes (recommended). |
| similarityThreshold | number 0–1 | 0.6 | How similar two visemes must be to merge. Higher merges less. |
| preserveCriticalVisemes | boolean | true | Never skip critical shapes (criticalVisemeIds; by default w/u, o, r, l, f/v, p/b/m). |
| criticalVisemeMinDuration | number (ms) | 0 | Hold critical visemes at least this long (opt-in; 0 disables). |
| criticalVisemeAbsorbThreshold | number (ms) | 30 | If holding a critical viseme shrinks the next one below this, drop the successor instead of flashing it. Only active when criticalVisemeMinDuration > 0. |
| criticalVisemeIds | readonly number[] | DEFAULT_CRITICAL_VISEME_IDS | Which viseme ids count as "critical". |
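A tiny timeline shows how criticalVisemeMinDuration and criticalVisemeAbsorbThreshold interact. The sketch below makes these assumptions:

- `VisemeEvent` is a hypothetical `{ id, timeMs }` event shape.
- Each viseme lasts until the next one starts.
- `holdCriticalVisemes` is a hypothetical function.

It models only these two fields and is not the library's implementation.

```typescript
// Hypothetical event shape: a viseme id and its start time; each viseme
// lasts until the next one starts.
interface VisemeEvent {
  id: number;
  timeMs: number;
}

// Sketch of criticalVisemeMinDuration + criticalVisemeAbsorbThreshold only.
function holdCriticalVisemes(
  events: readonly VisemeEvent[],
  criticalIds: readonly number[],
  minDurationMs: number,
  absorbThresholdMs: number,
): VisemeEvent[] {
  // Copy so the caller's timeline is not mutated.
  const evs = events.map((e) => ({ ...e }));
  // A minimum duration of 0 disables holding (the documented default).
  if (minDurationMs <= 0) return evs;

  const out: VisemeEvent[] = [];
  for (let i = 0; i < evs.length; i++) {
    const cur = evs[i];
    out.push(cur);
    if (!criticalIds.includes(cur.id)) continue;

    const holdUntil = cur.timeMs + minDurationMs;
    // Successors that start inside the hold are either delayed or dropped.
    while (i + 1 < evs.length && evs[i + 1].timeMs < holdUntil) {
      const next = evs[i + 1];
      // The successor ends where the viseme after it starts (open-ended if last).
      const nextEnd = i + 2 < evs.length ? evs[i + 2].timeMs : Infinity;
      if (nextEnd - holdUntil < absorbThresholdMs) {
        i++; // too short to read: absorb it into the held critical viseme
      } else {
        next.timeMs = holdUntil; // delay it until the hold ends
        break;
      }
    }
  }
  return out;
}
```

Here is an example with an 80 ms hold and a 30 ms threshold. A viseme at 40 ms that is followed by another at 100 ms would keep only 20 ms after the hold. It is therefore dropped rather than flashed.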
DEFAULT_CRITICAL_VISEME_IDS is [7, 8, 13, 14, 18, 21]: w/u, o, r, l, f/v,
and the bilabials p/b/m. Both DEFAULT_NATURAL_LIPSYNC_CONFIG and
DEFAULT_CRITICAL_VISEME_IDS are exported, so you can derive your own settings from them.
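A sketch of deriving settings from the exported defaults follows. Both constants are redeclared locally with their documented values so the example runs on its own. Only a subset of the default config is mirrored. The following names are hypothetical:

- `CRITICAL_VISEME_LABELS`, which assumes the ids map to the shapes in the order listed above
- `CRITICAL_WITHOUT_R`
- `CUSTOM_LIPSYNC_CONFIG`

```typescript
// Mirrors of the exported constants, with the documented values. In an app,
// import DEFAULT_CRITICAL_VISEME_IDS / DEFAULT_NATURAL_LIPSYNC_CONFIG instead.
const DEFAULT_CRITICAL_VISEME_IDS: readonly number[] = [7, 8, 13, 14, 18, 21];
// Only a subset of the default config is mirrored here.
const DEFAULT_NATURAL_LIPSYNC_CONFIG = {
  minVisemeInterval: 60,
  mergeWindow: 80,
  criticalVisemeIds: DEFAULT_CRITICAL_VISEME_IDS,
} as const;

// Hypothetical labels, assuming ids map to shapes in the order listed above.
const CRITICAL_VISEME_LABELS: Readonly<Record<number, string>> = {
  7: "w/u", 8: "o", 13: "r", 14: "l", 18: "f/v", 21: "p/b/m",
};

// Derived list: everything critical except r (id 13). filter() returns a
// new array, so the shared default is left untouched.
const CRITICAL_WITHOUT_R: readonly number[] =
  DEFAULT_CRITICAL_VISEME_IDS.filter((id) => id !== 13);

// Derived config: start from the defaults, then override two fields.
const CUSTOM_LIPSYNC_CONFIG = {
  ...DEFAULT_NATURAL_LIPSYNC_CONFIG,
  minVisemeInterval: 75,
  criticalVisemeIds: CRITICAL_WITHOUT_R,
};
```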
Presets
Define presets as module-level constants so the config object keeps the same reference across renders. Raise minVisemeInterval / mergeWindow for
smoother (lazier) motion; lower them for crisper articulation.
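A sketch of the preset pattern follows. It assumes the hook treats a config object with a new reference as a new config. (React itself compares hook dependencies with Object.is.) These parts are hypothetical:

- `SMOOTH_LIPSYNC` and `CRISP_LIPSYNC`
- the values they hold
- the two render stand-ins

Only the direction of each adjustment relative to the defaults (60 ms and 80 ms) comes from the text above.

```typescript
// Hypothetical preset shape: just the two timing fields the presets touch.
interface TimingPreset {
  minVisemeInterval: number; // ms
  mergeWindow: number; // ms
}

// Module-level presets are created once, so every render sees the same object.
// Raised values: smoother, lazier motion (illustrative numbers).
const SMOOTH_LIPSYNC: Readonly<TimingPreset> = Object.freeze({
  minVisemeInterval: 90,
  mergeWindow: 120,
});
// Lowered values: crisper articulation (illustrative numbers).
const CRISP_LIPSYNC: Readonly<TimingPreset> = Object.freeze({
  minVisemeInterval: 45,
  mergeWindow: 60,
});

// Stand-ins for a component body that runs on every render.
function renderWithPreset(): Readonly<TimingPreset> {
  return SMOOTH_LIPSYNC; // same reference every time
}
function renderWithInlineLiteral(): TimingPreset {
  return { minVisemeInterval: 90, mergeWindow: 120 }; // new object every call
}
```

Both functions return equal values, but only the preset returns the identical object. That identity is the property a reference comparison can see.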
Without React
The processor is exported from @mascotbot/core/rive as both a class and a function.
When you set enableNaturalLipSync on MascotPlayback /
useMascotPlayback, it runs internally; you only need to call it directly if you
post-process visemes outside the playback engine.
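This page does not name the class and function exported from @mascotbot/core/rive, so the following is not their API. It is a deliberately simplified, hypothetical `thinVisemes` function that illustrates the core idea of this kind of post-processing. It drops any viseme that arrives less than minVisemeInterval after the last kept one. It never drops critical ids, and it never drops silence while preserveSilence is on. It ignores mergeWindow, similarityThreshold, and keyVisemePreference. It also assumes that id 0 means silence.

```typescript
// Hypothetical event shape: a viseme id and its start time in ms.
interface VisemeEvent {
  id: number;
  timeMs: number;
}

// Hypothetical subset of the documented options used by this sketch.
interface ThinningOptions {
  minVisemeInterval: number; // ms
  preserveSilence: boolean;
  criticalVisemeIds: readonly number[];
}

// Assumption: id 0 is the silence viseme.
const SILENCE_ID = 0;

function thinVisemes(
  events: readonly VisemeEvent[],
  opts: ThinningOptions,
): VisemeEvent[] {
  const kept: VisemeEvent[] = [];
  for (const ev of events) {
    const prev = kept[kept.length - 1];
    // Protected events are kept no matter how close they are.
    const isProtected =
      (opts.preserveSilence && ev.id === SILENCE_ID) ||
      opts.criticalVisemeIds.includes(ev.id);
    const farEnough =
      prev === undefined || ev.timeMs - prev.timeMs >= opts.minVisemeInterval;
    if (isProtected || farEnough) {
      kept.push(ev);
    }
    // Otherwise the event is dropped and the previously kept shape holds longer.
  }
  return kept;
}
```

Dropping an event instead of shortening the timeline means the previous shape simply holds longer. This matches the behavior described under Why, where the mouth does not fully form every shape.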
Next
React hooks: useMascotPlayback options.
Visemes & the timeline: what is being processed.
Troubleshooting: the stable-reference bug.