Natural Lip Sync
The Natural Lip Sync feature intelligently processes visemes (mouth shapes) to create more natural-looking speech animations. Instead of showing every single mouth shape, it merges similar shapes and preserves key distinctive movements, following professional animation principles.

Why Natural Lip Sync?
Traditional lip sync can look robotic because it tries to hit every phoneme precisely. In natural speech, the mouth doesn’t have time to form each shape completely; shapes blend together. Our algorithm mimics this natural blending for more realistic results.

Basic Usage
With useMascotSpeech
The easiest way to use natural lip sync is with the useMascotSpeech hook:
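A sketch of what this might look like. The parameter names come from the Configuration Parameters section of this page; the `naturalLipSync` option field and the hook-call shape are assumptions:

```typescript
// Sketch only: the `naturalLipSync` field name and the hook-call shape are
// assumptions; the parameter names are from this page's configuration reference.
const speechOptions = {
  naturalLipSync: {
    minVisemeInterval: 60, // ms between visemes
    mergeWindow: 80,       // ms look-ahead for merging similar shapes
  },
};

// Inside a component (hypothetical call shape):
// const { speak } = useMascotSpeech(speechOptions);
// speak("Hello there!");
```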
With useMascotElevenlabs
For ElevenLabs integration, configure it in the useMascotElevenlabs hook:
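As a sketch (the `naturalLipSync` field name and call shape are assumptions; the values shown are this page's documented defaults):

```typescript
// Hypothetical option shape for the ElevenLabs hook.
const lipSyncOptions = {
  minVisemeInterval: 60,         // ms, documented default
  mergeWindow: 80,               // ms, documented default
  preserveCriticalVisemes: true, // keep "u", "o", "l", "v" readable
};

// Inside a component (hypothetical call shape):
// const mascot = useMascotElevenlabs({ naturalLipSync: lipSyncOptions });
```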
With useMascotPlayback (Advanced)
For direct control over playback, use the useMascotPlayback hook:
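With direct playback you typically work with a timed viseme list. A sketch of the data involved (the `Viseme` shape and the hook-call shape are assumptions):

```typescript
// Hypothetical viseme timeline: each entry names a mouth shape and when it
// starts. The exact shape expected by useMascotPlayback is an assumption.
interface Viseme {
  shape: string; // e.g. "aa", "p", "sil"
  time: number;  // ms from the start of the utterance
}

const visemes: Viseme[] = [
  { shape: "h", time: 0 },
  { shape: "eh", time: 70 },
  { shape: "l", time: 140 },
  { shape: "ow", time: 210 },
  { shape: "sil", time: 320 },
];

// Hypothetical call shape: hand the timeline to playback with natural
// lip sync enabled so similar, closely spaced shapes get merged.
// const playback = useMascotPlayback({ naturalLipSync: true });
// playback.play(audioBuffer, visemes);
```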
Configuration Parameters
minVisemeInterval
Default: 60ms
Range: 20-120ms
Purpose: Sets the minimum time between visemes. Lower values allow more mouth movement, higher values create smoother transitions.
mergeWindow
Default: 80ms
Range: 40-160ms
Purpose: Time window to look ahead for similar visemes to merge. Larger windows create smoother transitions.
keyVisemePreference
Default: 0.7
Range: 0.0-1.0
Purpose: How strongly to preserve distinctive mouth shapes (like ‘p’, ‘b’, ‘m’, ‘f’, ‘v’). Higher values keep more key shapes.
similarityThreshold
Default: 0.6
Range: 0.0-1.0
Purpose: How similar visemes need to be to merge them. Lower values merge more aggressively.
preserveSilence
Default: true
Purpose: Whether to always keep silence visemes. Recommended to leave as true for natural pauses.
preserveCriticalVisemes
Default: true
Purpose: Whether to preserve critical visemes that are essential for speech clarity. When enabled, visemes for sounds like “u”, “o”, “l”, “v” are never skipped, ensuring they remain visually readable even when they pass quickly in speech.
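Collected in one place, the documented defaults look like this (the object and interface names are illustrative; the values are from the parameter reference above):

```typescript
// The six documented parameters and their defaults, as a typed object.
interface NaturalLipSyncConfig {
  minVisemeInterval: number;       // ms, range 20-120
  mergeWindow: number;             // ms, range 40-160
  keyVisemePreference: number;     // 0.0-1.0
  similarityThreshold: number;     // 0.0-1.0
  preserveSilence: boolean;
  preserveCriticalVisemes: boolean;
}

const defaults: NaturalLipSyncConfig = {
  minVisemeInterval: 60,
  mergeWindow: 80,
  keyVisemePreference: 0.7,
  similarityThreshold: 0.6,
  preserveSilence: true,
  preserveCriticalVisemes: true,
};
```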
Preset Configurations
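The presets described below can be expressed as partial overrides of the defaults. The specific values here are illustrative assumptions consistent with this page's tuning guidance, not the library's exact preset numbers:

```typescript
// Illustrative preset values (assumptions, not the library's exact numbers),
// expressed as partial overrides of the documented defaults.
const presets = {
  naturalConversation: { minVisemeInterval: 60, mergeWindow: 80 },
  fastSpeech:          { minVisemeInterval: 40, mergeWindow: 60, similarityThreshold: 0.7 },
  clearArticulation:   { minVisemeInterval: 80, mergeWindow: 110, keyVisemePreference: 0.9 },
  minimalMovement:     { minVisemeInterval: 100, mergeWindow: 140, similarityThreshold: 0.4 },
};
```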
Natural Conversation
For everyday conversational speech.

Fast Speech

For rapid or excited speech.

Clear Articulation

For slower, clearer speech (e.g., educational content).

Minimal Movement

For subtle lip movement (e.g., background characters).

Advanced Usage
Processing Visemes Directly
You can also use the natural lip sync processor directly.

Convenience Function

A convenience function is available for one-time processing.

Performance Considerations
Natural lip sync processing is lightweight and runs in real time:
- Processing typically takes less than 1ms per utterance
- No additional memory overhead beyond the processed viseme array
- Compatible with streaming scenarios (processes chunks independently)
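To make this behavior concrete, here is a simplified single-pass re-implementation of the merging described in this document: minimum spacing, similarity-based merging within the merge window, and preservation of key, critical, and silence visemes. The data shapes, the similarity measure, and all names are illustrative assumptions, not the library's actual processor:

```typescript
interface Viseme {
  shape: string; // e.g. "aa", "p", "sil"
  time: number;  // ms from the start of the utterance
}

interface Options {
  minVisemeInterval: number;
  mergeWindow: number;
  keyVisemePreference: number;
  similarityThreshold: number;
  preserveSilence: boolean;
  preserveCriticalVisemes: boolean;
}

// Distinctive closures that read strongly on screen.
const KEY_SHAPES = new Set(["p", "b", "m", "f", "v"]);
// Shapes that are never skipped when preserveCriticalVisemes is on.
const CRITICAL_SHAPES = new Set(["u", "o", "l", "v"]);

// Toy similarity measure: identical shapes are fully similar, everything
// else only weakly so. A real processor would compare articulatory features.
function similarity(a: string, b: string): number {
  return a === b ? 1.0 : 0.3;
}

function processVisemes(input: Viseme[], opts: Options): Viseme[] {
  const out: Viseme[] = [];
  for (const v of input) {
    if (v.shape === "sil" && opts.preserveSilence) { out.push(v); continue; }
    if (opts.preserveCriticalVisemes && CRITICAL_SHAPES.has(v.shape)) { out.push(v); continue; }
    const last = out[out.length - 1];
    if (last) {
      const gap = v.time - last.time;
      const isKey = KEY_SHAPES.has(v.shape) && opts.keyVisemePreference >= 0.5;
      // Enforce minimum spacing between mouth shapes; key shapes survive.
      if (gap < opts.minVisemeInterval && !isKey) continue;
      // Merge away shapes similar to the previous one inside the window.
      if (gap < opts.mergeWindow && !isKey &&
          similarity(v.shape, last.shape) >= opts.similarityThreshold) continue;
    }
    out.push(v);
  }
  return out;
}
```

With default-like settings, a run of closely spaced identical shapes collapses to a single viseme, while a quick "p" still survives thanks to the key-viseme preference. The pass is a single O(n) sweep over the viseme array, consistent with the sub-millisecond processing noted above.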
Troubleshooting
Too Much Movement
If the mouth is moving too rapidly:
- Increase minVisemeInterval (try 80-100ms)
- Increase mergeWindow (try 100-120ms)
- Decrease similarityThreshold (try 0.3-0.4)
Not Enough Movement
If the mouth looks static:
- Decrease minVisemeInterval (try 40-50ms)
- Decrease mergeWindow (try 50-60ms)
- Increase similarityThreshold (try 0.7-0.8)
- Increase keyVisemePreference (try 0.8-0.9)
Missing Key Sounds
If important sounds (like ‘p’ or ‘f’) are missing:
- Increase keyVisemePreference (try 0.8-1.0)
- Increase similarityThreshold (try 0.7-0.8)
- Ensure preserveSilence is true
Critical Visemes Being Skipped
If characteristic sounds like “u”, “o”, “l”, “v” are not visible:
- Ensure preserveCriticalVisemes is true (the default); this prevents these important mouth shapes from being merged away