Audio Prefetching & Advanced Control
Unlock the full potential of Mascot Bot SDK with advanced audio prefetching and manual state control. Essential for video exports, multi-scene rendering, and high-performance applications requiring precise timing control.
Why Use Prefetching?
Traditional streaming approaches introduce latency between speech segments. Prefetching eliminates these gaps by loading audio and viseme data ahead of time, enabling:
Seamless Video Exports: Pre-load all audio before rendering
Smooth Transitions: Zero delay between sequential speech
Offline Playback: Cache audio for disconnected scenarios
Performance Testing: Compare multiple voices without waiting
Core Concepts
Prefetching Architecture
The SDK’s prefetching system separates data fetching from playback:
1. Fetch Phase: Download audio + viseme data without playing
2. Store Phase: Cache data in memory with timing information
3. Play Phase: Use cached data for instant playback
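As a rough sketch, the three phases map directly onto the SDK calls covered in this guide (the cache Map and the "greeting" key are illustrative app code, not part of the SDK):

const speech = useMascotSpeech({
  apiEndpoint: "/api/visemes-audio",
  disableAutomaticSpeakingState: true
});
const playback = useMascotPlayback({ manualSpeakingStateControl: true });

// 1. Fetch: download audio + visemes without playing anything
const data = await speech.prefetchAudio("Hello world");

// 2. Store: keep the result in memory, keyed however your app needs
const cache = new Map<string, typeof data>();
cache.set("greeting", data);

// 3. Play: feed the cached visemes to the playback hook for instant start
const cached = cache.get("greeting");
if (cached) {
  playback.loadPrefetchedData(cached.audioData.visemesBySequence);
  playback.setSpeakingStateManually(true);
  playback.play();
}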
Manual State Control
When prefetching, you often need direct control over the speaking state, bypassing automatic detection:
// Disable automatic state management
const speech = useMascotSpeech({
  disableAutomaticSpeakingState: true
});

// Get playback object from hook
const playback = useMascotPlayback({
  manualSpeakingStateControl: true
});

// Control speaking state manually
playback.setSpeakingStateManually(true);
Quick Start
Basic Prefetching
The playAudioFromPrefetchedData function shown below is not included in the SDK; you need to implement it yourself. See the complete implementation in the Examples section.
About the API endpoint: The apiEndpoint should point to your proxy endpoint that calls the Mascot Bot API. Never call https://api.mascot.bot/v1/visemes-audio directly from the client, as this would expose your API keys. Create a proxy endpoint (e.g., /api/visemes-audio) that:
Receives requests from your client
Adds your Mascot Bot API key to the Authorization header
Forwards the request to https://api.mascot.bot/v1/visemes-audio
Streams the response back to the client
See the API documentation for endpoint details.
Example: Next.js Proxy Endpoint
// app/api/visemes-audio/route.ts
import { NextRequest } from 'next/server';

export async function POST(request: NextRequest) {
  try {
    const body = await request.json();

    // Forward request to Mascot Bot API with authentication
    const response = await fetch('https://api.mascot.bot/v1/visemes-audio', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MASCOT_BOT_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });

    if (!response.ok) {
      throw new Error(`API error: ${response.statusText}`);
    }

    // Stream the SSE response back to client
    return new Response(response.body, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
      },
    });
  } catch (error) {
    console.error('Proxy error:', error);
    return new Response('Internal Server Error', { status: 500 });
  }
}
import { useMascotSpeech, useMascotPlayback, MascotClient, MascotRive } from '@mascotbot-sdk/react';

function PrefetchExample() {
  const speech = useMascotSpeech({
    apiEndpoint: "/api/visemes-audio", // Your proxy endpoint (NOT the direct Mascot Bot API)
    disableAutomaticSpeakingState: true // Critical for prefetching
  });

  const playback = useMascotPlayback({
    manualSpeakingStateControl: true
  });

  const handlePrefetchAndPlay = async () => {
    // 1. Prefetch audio data
    const prefetchedData = await speech.prefetchAudio("Hello world", {
      ttsParams: {
        voice: "am_fenrir", // or use MascotVoices.AmericanMaleFenrir
        speed: 1.0
      }
    });

    // 2. Load viseme data for lip sync
    playback.loadPrefetchedData(prefetchedData.audioData.visemesBySequence);

    // 3. Manually control speaking state
    playback.setSpeakingStateManually(true);
    playback.play();

    // 4. Play audio from prefetched data
    await playAudioFromPrefetchedData(prefetchedData);

    // 5. Reset state when done
    playback.setSpeakingStateManually(false);
    playback.reset();
  };

  return (
    <div>
      {/* The mascot visual component */}
      <MascotClient
        src="/mascot.riv" // Your Rive file
        artboard="Character"
        inputs={["is_speaking"]} // Required for lip sync
      >
        <MascotRive />
      </MascotClient>
      <button onClick={handlePrefetchAndPlay}>
        Prefetch & Play
      </button>
    </div>
  );
}
API Reference
useMascotSpeech Options
interface MascotSpeechOptions {
  apiEndpoint: string;                      // Required: Your proxy endpoint (e.g., "/api/visemes-audio")
  apiKey?: string;                          // API key - never use client-side, use proxy instead
  disableAutomaticSpeakingState?: boolean;  // Disable auto speaking state - required for prefetching
  defaultVoice?: string;                    // Default: "am_fenrir"
  bufferSize?: number;                      // Streaming buffer size. Default: 1
  enableTimingEvents?: boolean;             // Performance monitoring. Default: true
  debug?: boolean;                          // Debug logging. Default: false
}
prefetchAudio Method
The prefetchAudio method is only available through the useMascotSpeech hook. You cannot use it standalone:
// First, get the speech object from the hook
const speech = useMascotSpeech({
  apiEndpoint: "/api/visemes-audio",
  disableAutomaticSpeakingState: true
});

// Then use the prefetchAudio method
const prefetchedData = await speech.prefetchAudio(
  text: string,
  options?: {
    ttsParams?: {
      tts_engine?: string;   // 'mascotbot' (default), 'elevenlabs', 'cartesia'
      voice?: string;        // Voice ID (e.g., "am_fenrir", "af_bella") - see MascotVoices constant in SDK
      speed?: number;        // Playback speed (e.g., 1.0 for normal)
      tts_api_key?: string;  // API key for external TTS providers
    }
  }
): Promise<{
  audioData: {
    audioEvents: Map<number, AudioEvent>;          // Base64-encoded PCM audio chunks
    visemesBySequence: Map<number, VisemeData[]>;  // Viseme timing data
  };
  duration: number;  // Total duration in milliseconds
}>;

// AudioEvent structure
interface AudioEvent {
  data: string;         // Base64-encoded PCM audio data
  sample_rate: number;  // Sample rate (e.g., 24000)
}
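Because audio playback is user code (see the Examples section), the returned duration can serve as a safety net for the visual state. A minimal sketch, assuming the hooks are initialized as shown above; the 250 ms buffer is an arbitrary illustrative value:

const prefetched = await speech.prefetchAudio("Hello world");
playback.loadPrefetchedData(prefetched.audioData.visemesBySequence);
playback.setSpeakingStateManually(true);
playback.play();

// Stop the animation even if the audio 'ended' event never fires
setTimeout(() => {
  playback.setSpeakingStateManually(false);
  playback.reset();
}, prefetched.duration + 250);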
useMascotPlayback Methods
interface MascotPlaybackMethods {
  // Load prefetched viseme data
  loadPrefetchedData(visemeData: Map<number, VisemeData[]>): void;

  // Manual speaking state control
  setSpeakingStateManually(isSpeaking: boolean): void;

  // Standard playback controls
  play(): void;
  pause(): void;
  reset(): void;
}
Use Cases
Video Export System
The most common use case for prefetching is video export, where all audio must be loaded before rendering begins:
import { useMascotSpeech, useMascotPlayback } from '@mascotbot-sdk/react';

// Type definition for data returned by speech.prefetchAudio()
interface PrefetchedData {
  audioData: {
    audioEvents: Map<number, { data: string; sample_rate: number }>;
    visemesBySequence: Map<number, any[]>;
  };
  duration: number;
}

// Implementation for playing audio from prefetched data
async function playAudioFromPrefetchedData(
  prefetchedData: PrefetchedData,
  audioContext?: AudioContext
): Promise<void> {
  const ctx = audioContext || new AudioContext();
  if (ctx.state === 'suspended') {
    await ctx.resume();
  }

  const audioBuffers: AudioBuffer[] = [];
  for (const [, audioEvent] of prefetchedData.audioData.audioEvents) {
    if (!audioEvent.data) continue;

    const binaryString = atob(audioEvent.data);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }

    const int16Array = new Int16Array(bytes.length / 2);
    for (let i = 0; i < int16Array.length; i++) {
      const low = bytes[i * 2];
      const high = bytes[i * 2 + 1];
      int16Array[i] = (high << 8) | low;
    }

    const sampleRate = audioEvent.sample_rate || 44100;
    const audioBuffer = ctx.createBuffer(1, int16Array.length, sampleRate);
    const channelData = audioBuffer.getChannelData(0);
    for (let i = 0; i < int16Array.length; i++) {
      channelData[i] = int16Array[i] / 32768.0;
    }
    audioBuffers.push(audioBuffer);
  }

  if (audioBuffers.length === 0) {
    throw new Error('No audio data to play');
  }

  const totalLength = audioBuffers.reduce((acc, buf) => acc + buf.length, 0);
  const sampleRate = audioBuffers[0].sampleRate;
  const combinedBuffer = ctx.createBuffer(1, totalLength, sampleRate);
  const combinedData = combinedBuffer.getChannelData(0);
  let offset = 0;
  for (const buffer of audioBuffers) {
    combinedData.set(buffer.getChannelData(0), offset);
    offset += buffer.length;
  }

  const source = ctx.createBufferSource();
  source.buffer = combinedBuffer;
  source.connect(ctx.destination);
  return new Promise<void>((resolve) => {
    source.onended = () => resolve();
    source.start(0);
  });
}

function VideoExporter() {
  // Initialize hooks with proper configuration
  const speech = useMascotSpeech({
    apiEndpoint: "/api/visemes-audio", // Your proxy endpoint (NOT the direct Mascot Bot API)
    disableAutomaticSpeakingState: true // Required for prefetching
  });

  const playback = useMascotPlayback({
    manualSpeakingStateControl: true // Required for manual control
  });

  async function exportVideo(items: Array<{ text: string, voice: string }>) {
    // 1. Create shared AudioContext for all exports
    const sharedAudioContext = new AudioContext();
    const audioCache = new Map<number, PrefetchedData>();

    // 2. Prefetch all audio in parallel
    const prefetchPromises = items.map(async (item, index) => {
      const data = await speech.prefetchAudio(item.text, {
        ttsParams: {
          voice: item.voice,
          speed: 1.0
        }
      });
      audioCache.set(index, data);
    });
    await Promise.all(prefetchPromises);

    // 3. Render video with cached audio
    for (let i = 0; i < items.length; i++) {
      const prefetchedData = audioCache.get(i);
      if (prefetchedData) {
        // Load visemes for lip sync animation
        playback.loadPrefetchedData(prefetchedData.audioData.visemesBySequence);
        playback.setSpeakingStateManually(true);
        playback.play(); // This starts the visual animation

        // Play audio separately
        await playAudioFromPrefetchedData(prefetchedData, sharedAudioContext);

        // Reset after playback
        playback.setSpeakingStateManually(false);
        playback.reset();
      }
    }
  }

  return (
    <button onClick={() => exportVideo([
      { text: "Hello world", voice: "am_fenrir" },
      { text: "Welcome to our app", voice: "af_bella" }
    ])}>
      Export Video
    </button>
  );
}
Sequential Speech Queue
Prefetching enables smooth transitions between multiple speech segments:
import { useState, useEffect } from 'react';
import { useMascotSpeech, useMascotPlayback } from '@mascotbot-sdk/react';

// Type definition for data returned by speech.prefetchAudio()
interface PrefetchedData {
  audioData: {
    audioEvents: Map<number, { data: string; sample_rate: number }>;
    visemesBySequence: Map<number, any[]>;
  };
  duration: number;
}

// Simplified implementation for playing audio
async function playAudioFromPrefetchedData(
  prefetchedData: PrefetchedData,
  audioContext?: AudioContext
): Promise<void> {
  const ctx = audioContext || new AudioContext();
  if (ctx.state === 'suspended') await ctx.resume();

  const audioBuffers: AudioBuffer[] = [];
  for (const [, audioEvent] of prefetchedData.audioData.audioEvents) {
    if (!audioEvent.data) continue;
    const binaryString = atob(audioEvent.data);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    const int16Array = new Int16Array(bytes.length / 2);
    for (let i = 0; i < int16Array.length; i++) {
      const low = bytes[i * 2];
      const high = bytes[i * 2 + 1];
      int16Array[i] = (high << 8) | low;
    }
    const audioBuffer = ctx.createBuffer(1, int16Array.length, audioEvent.sample_rate || 44100);
    const channelData = audioBuffer.getChannelData(0);
    for (let i = 0; i < int16Array.length; i++) {
      channelData[i] = int16Array[i] / 32768.0;
    }
    audioBuffers.push(audioBuffer);
  }

  if (audioBuffers.length === 0) throw new Error('No audio data');

  const totalLength = audioBuffers.reduce((acc, buf) => acc + buf.length, 0);
  const combinedBuffer = ctx.createBuffer(1, totalLength, audioBuffers[0].sampleRate);
  const combinedData = combinedBuffer.getChannelData(0);
  let offset = 0;
  for (const buffer of audioBuffers) {
    combinedData.set(buffer.getChannelData(0), offset);
    offset += buffer.length;
  }

  const source = ctx.createBufferSource();
  source.buffer = combinedBuffer;
  source.connect(ctx.destination);
  return new Promise<void>((resolve) => {
    source.onended = () => resolve();
    source.start(0);
  });
}

function SpeechQueue({ items }: { items: string[] }) {
  const [queue, setQueue] = useState<PrefetchedData[]>([]);

  // Initialize hooks
  const speech = useMascotSpeech({
    apiEndpoint: "/api/visemes-audio", // Your proxy endpoint (NOT the direct Mascot Bot API)
    disableAutomaticSpeakingState: true
  });
  const playback = useMascotPlayback({
    manualSpeakingStateControl: true
  });

  // Prefetch all items on mount
  useEffect(() => {
    const prefetchAll = async () => {
      // Prefetch in parallel for better performance
      const data = await Promise.all(
        items.map(text => speech.prefetchAudio(text))
      );
      setQueue(data);
    };
    prefetchAll();
  }, [items, speech]);

  // Play queue sequentially
  const playQueue = async () => {
    const audioContext = new AudioContext();
    for (const data of queue) {
      // Start visual animation
      playback.loadPrefetchedData(data.audioData.visemesBySequence);
      playback.setSpeakingStateManually(true);
      playback.play();

      // Play audio
      await playAudioFromPrefetchedData(data, audioContext);

      // Stop animation
      playback.setSpeakingStateManually(false);
      playback.reset();

      // Small gap between items
      await new Promise(resolve => setTimeout(resolve, 100));
    }
  };

  return <button onClick={playQueue}>Play All</button>;
}
Voice Comparison
Prefetch multiple voice options for instant comparison:
import { useState, useEffect, useRef } from 'react';
import { useMascotSpeech, useMascotPlayback } from '@mascotbot-sdk/react';

// Type definition for data returned by speech.prefetchAudio()
interface PrefetchedData {
  audioData: {
    audioEvents: Map<number, { data: string; sample_rate: number }>;
    visemesBySequence: Map<number, any[]>;
  };
  duration: number;
}

// Audio playback implementation
async function playAudioFromPrefetchedData(
  prefetchedData: PrefetchedData,
  audioContext: AudioContext
): Promise<void> {
  if (audioContext.state === 'suspended') {
    await audioContext.resume();
  }

  const audioBuffers: AudioBuffer[] = [];
  for (const [, audioEvent] of prefetchedData.audioData.audioEvents) {
    if (!audioEvent.data) continue;

    // Decode base64
    const binaryString = atob(audioEvent.data);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }

    // Convert to Int16
    const int16Array = new Int16Array(bytes.length / 2);
    for (let i = 0; i < int16Array.length; i++) {
      const low = bytes[i * 2];
      const high = bytes[i * 2 + 1];
      int16Array[i] = (high << 8) | low;
    }

    // Create AudioBuffer
    const audioBuffer = audioContext.createBuffer(
      1,
      int16Array.length,
      audioEvent.sample_rate || 44100
    );
    const channelData = audioBuffer.getChannelData(0);
    for (let i = 0; i < int16Array.length; i++) {
      channelData[i] = int16Array[i] / 32768.0;
    }
    audioBuffers.push(audioBuffer);
  }

  if (audioBuffers.length === 0) {
    throw new Error('No audio data to play');
  }

  // Combine buffers
  const totalLength = audioBuffers.reduce((acc, buf) => acc + buf.length, 0);
  const combinedBuffer = audioContext.createBuffer(
    1,
    totalLength,
    audioBuffers[0].sampleRate
  );
  const combinedData = combinedBuffer.getChannelData(0);
  let offset = 0;
  for (const buffer of audioBuffers) {
    combinedData.set(buffer.getChannelData(0), offset);
    offset += buffer.length;
  }

  // Play audio
  const source = audioContext.createBufferSource();
  source.buffer = combinedBuffer;
  source.connect(audioContext.destination);
  return new Promise<void>((resolve) => {
    source.onended = () => resolve();
    source.start(0);
  });
}

// Define voices outside component to avoid recreating on each render
const AVAILABLE_VOICES = ['am_fenrir', 'af_bella', 'bm_george']; // American male, American female, British male

function VoiceComparison({ text }: { text: string }) {
  const [voiceData, setVoiceData] = useState<Map<string, PrefetchedData>>();
  const audioContextRef = useRef<AudioContext>();

  // Initialize hooks
  const speech = useMascotSpeech({
    apiEndpoint: "/api/visemes-audio", // Your proxy endpoint (NOT the direct Mascot Bot API)
    disableAutomaticSpeakingState: true
  });
  const playback = useMascotPlayback({
    manualSpeakingStateControl: true
  });

  useEffect(() => {
    // Create AudioContext once
    audioContextRef.current = new AudioContext();

    const prefetchVoices = async () => {
      const data = new Map<string, PrefetchedData>();
      await Promise.all(
        AVAILABLE_VOICES.map(async (voice) => {
          const prefetched = await speech.prefetchAudio(text, {
            ttsParams: { voice }
          });
          data.set(voice, prefetched);
        })
      );
      setVoiceData(data);
    };
    prefetchVoices();

    // Close the context when the effect re-runs or the component unmounts
    return () => {
      audioContextRef.current?.close();
    };
  }, [text, speech]); // AVAILABLE_VOICES is constant, no need in deps

  const playVoice = async (voice: string) => {
    const data = voiceData?.get(voice);
    if (data && audioContextRef.current) {
      // Start visual animation
      playback.loadPrefetchedData(data.audioData.visemesBySequence);
      playback.setSpeakingStateManually(true);
      playback.play();

      // Play audio
      await playAudioFromPrefetchedData(data, audioContextRef.current);

      // Stop animation
      playback.setSpeakingStateManually(false);
      playback.reset();
    }
  };

  return (
    <div>
      {AVAILABLE_VOICES.map(voice => (
        <button key={voice} onClick={() => playVoice(voice)}>
          Play {voice}
        </button>
      ))}
    </div>
  );
}
Examples
Playing Prefetched Audio
The playAudioFromPrefetchedData function is NOT included in the SDK. You must implement it yourself using the Web Audio API.
Why isn’t this in the SDK? The SDK focuses on real-time streaming use cases. Prefetching is an advanced pattern where you may want custom control over audio playback timing, audio context management, and integration with your app’s audio system. By implementing this yourself, you have full control over these aspects.
Important clarification about useMascotPlayback: This hook controls the mascot's visual animation (lip sync) using the viseme data. It does NOT play audio. You need to implement audio playback separately using the Web Audio API.
Audio Playback Implementation
Since the SDK doesn’t include audio playback for prefetched data, you need to implement it yourself. Here’s a complete implementation with proper error handling:
// Type definition for prefetched data
interface PrefetchedData {
  audioData: {
    audioEvents: Map<number, { data: string; sample_rate: number }>;
    visemesBySequence: Map<number, any[]>;
  };
  duration: number;
}

// Production-ready implementation with full error handling
async function playAudioFromPrefetchedData(
  prefetchedData: PrefetchedData,
  audioContext?: AudioContext
): Promise<void> {
  const ctx = audioContext || new AudioContext();
  if (ctx.state === 'suspended') {
    await ctx.resume();
  }

  // Create audio buffer from all audio events
  const audioBuffers: AudioBuffer[] = [];
  for (const [, audioEvent] of prefetchedData.audioData.audioEvents) {
    if (!audioEvent.data) continue;

    // Decode base64 to binary
    const binaryString = atob(audioEvent.data);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }

    // Convert to Int16 PCM (little-endian)
    const int16Array = new Int16Array(bytes.length / 2);
    for (let i = 0; i < int16Array.length; i++) {
      const low = bytes[i * 2];
      const high = bytes[i * 2 + 1];
      int16Array[i] = (high << 8) | low;
    }

    // Create AudioBuffer
    const sampleRate = audioEvent.sample_rate || 44100;
    const audioBuffer = ctx.createBuffer(1, int16Array.length, sampleRate);
    const channelData = audioBuffer.getChannelData(0);

    // Convert Int16 to Float32
    for (let i = 0; i < int16Array.length; i++) {
      channelData[i] = int16Array[i] / 32768.0;
    }
    audioBuffers.push(audioBuffer);
  }

  if (audioBuffers.length === 0) {
    throw new Error('No audio data to play');
  }

  // Combine all buffers
  const totalLength = audioBuffers.reduce((acc, buf) => acc + buf.length, 0);
  const sampleRate = audioBuffers[0].sampleRate;
  const combinedBuffer = ctx.createBuffer(1, totalLength, sampleRate);
  const combinedData = combinedBuffer.getChannelData(0);
  let offset = 0;
  for (const buffer of audioBuffers) {
    combinedData.set(buffer.getChannelData(0), offset);
    offset += buffer.length;
  }

  // Play the audio
  const source = ctx.createBufferSource();
  source.buffer = combinedBuffer;
  source.connect(ctx.destination);
  return new Promise<void>((resolve) => {
    source.onended = () => resolve();
    source.start(0);
  });
}
Complete Working Example
Here’s a full component that demonstrates prefetching with both audio playback and visual animation:
import { useState } from 'react';
import {
  useMascotSpeech,
  useMascotPlayback,
  MascotClient,
  MascotRive
} from '@mascotbot-sdk/react';

// Type definition for prefetched data
interface PrefetchedData {
  audioData: {
    audioEvents: Map<number, { data: string; sample_rate: number }>;
    visemesBySequence: Map<number, any[]>;
  };
  duration: number;
}

// Audio playback implementation (from above)
async function playAudioFromPrefetchedData(
  prefetchedData: PrefetchedData,
  audioContext?: AudioContext
) {
  const ctx = audioContext || new AudioContext();
  if (ctx.state === 'suspended') {
    await ctx.resume();
  }

  const audioBuffers: AudioBuffer[] = [];
  for (const [, audioEvent] of prefetchedData.audioData.audioEvents) {
    if (!audioEvent.data) continue;
    const binaryString = atob(audioEvent.data);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    const int16Array = new Int16Array(bytes.length / 2);
    for (let i = 0; i < int16Array.length; i++) {
      const low = bytes[i * 2];
      const high = bytes[i * 2 + 1];
      int16Array[i] = (high << 8) | low;
    }
    const sampleRate = audioEvent.sample_rate || 44100;
    const audioBuffer = ctx.createBuffer(1, int16Array.length, sampleRate);
    const channelData = audioBuffer.getChannelData(0);
    for (let i = 0; i < int16Array.length; i++) {
      channelData[i] = int16Array[i] / 32768.0;
    }
    audioBuffers.push(audioBuffer);
  }

  if (audioBuffers.length === 0) {
    throw new Error('No audio data to play');
  }

  const totalLength = audioBuffers.reduce((acc, buf) => acc + buf.length, 0);
  const sampleRate = audioBuffers[0].sampleRate;
  const combinedBuffer = ctx.createBuffer(1, totalLength, sampleRate);
  const combinedData = combinedBuffer.getChannelData(0);
  let offset = 0;
  for (const buffer of audioBuffers) {
    combinedData.set(buffer.getChannelData(0), offset);
    offset += buffer.length;
  }

  const source = ctx.createBufferSource();
  source.buffer = combinedBuffer;
  source.connect(ctx.destination);
  return new Promise<void>((resolve) => {
    source.onended = () => resolve();
    source.start(0);
  });
}

// Complete prefetching component
export function PrefetchingMascot() {
  const [isPlaying, setIsPlaying] = useState(false);
  const [status, setStatus] = useState('Ready');

  const speech = useMascotSpeech({
    apiEndpoint: "/api/visemes-audio", // Your proxy endpoint (NOT the direct Mascot Bot API)
    disableAutomaticSpeakingState: true
  });
  const playback = useMascotPlayback({
    manualSpeakingStateControl: true
  });

  const playPrefetchedSpeech = async (text: string) => {
    try {
      setIsPlaying(true);
      setStatus('Prefetching...');

      // 1. Prefetch audio and viseme data
      const prefetched = await speech.prefetchAudio(text);
      setStatus('Playing...');

      // 2. Start visual animation (lip sync)
      playback.loadPrefetchedData(prefetched.audioData.visemesBySequence);
      playback.setSpeakingStateManually(true);
      playback.play();

      // 3. Play audio
      await playAudioFromPrefetchedData(prefetched);

      // 4. Stop animation
      playback.setSpeakingStateManually(false);
      playback.reset();
      setStatus('Ready');
    } catch (error) {
      console.error('Playback error:', error);
      setStatus('Error');
    } finally {
      setIsPlaying(false);
    }
  };

  return (
    <div style={{ textAlign: 'center', padding: '20px' }}>
      <div style={{ width: 400, height: 400, margin: '0 auto' }}>
        <MascotClient
          src="/mascot.riv"
          artboard="Character"
          inputs={["is_speaking"]}
        >
          <MascotRive />
        </MascotClient>
      </div>
      <div style={{ marginTop: '20px' }}>
        <p>Status: {status}</p>
        <button
          onClick={() => playPrefetchedSpeech("Hello! I am speaking with prefetched audio.")}
          disabled={isPlaying}
          style={{ margin: '5px' }}
        >
          Play Short Message
        </button>
        <button
          onClick={() => playPrefetchedSpeech("This is a longer message to demonstrate how prefetching works with extended speech. The audio and viseme data are loaded before playback begins.")}
          disabled={isPlaying}
          style={{ margin: '5px' }}
        >
          Play Long Message
        </button>
      </div>
    </div>
  );
}
Best Practices
1. Secure Your API Keys
Never expose your API keys in client-side code! Always use a proxy endpoint to call the Mascot Bot API.

❌ Wrong: Calling the API directly from the client

// NEVER DO THIS - Exposes your API key!
const speech = useMascotSpeech({
  apiEndpoint: "https://api.mascot.bot/v1/visemes-audio",
  apiKey: "your-api-key" // This would be visible to anyone!
});

✅ Correct: Using a proxy endpoint

// Safe - API key is stored securely on your server
const speech = useMascotSpeech({
  apiEndpoint: "/api/visemes-audio" // Your proxy endpoint
});
2. Always Disable Automatic State Management
When using prefetching, always disable automatic speaking state detection:
// ✅ Correct
const speech = useMascotSpeech({
  disableAutomaticSpeakingState: true
});

// ❌ Incorrect - will cause conflicts
const speech = useMascotSpeech({});
3. Reuse AudioContext
Create a single AudioContext and reuse it across all prefetched audio playback:
// ✅ Correct - single context
const audioContext = new AudioContext();
for (const data of prefetchedItems) {
  await playAudioFromPrefetchedData(data, audioContext);
}

// ❌ Incorrect - multiple contexts
for (const data of prefetchedItems) {
  const ctx = new AudioContext(); // Creates new context each time
  await playAudioFromPrefetchedData(data, ctx);
}
4. Handle Errors Gracefully
Always implement error handling for prefetch operations:
const prefetchWithRetry = async (text: string, maxRetries = 3) => {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await speech.prefetchAudio(text);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      // Back off a little longer after each failed attempt
      await new Promise(resolve => setTimeout(resolve, 1000 * (attempt + 1)));
    }
  }
};
Error Resilience: When prefetching multiple items, the SDK continues processing other items even if one fails. This ensures partial success rather than complete failure. Always check individual results when batch processing.
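A minimal sketch of checking individual results with Promise.allSettled, which resolves even when some prefetches reject (the prefetchBatch helper and its return shape are illustrative app code, not part of the SDK):

async function prefetchBatch(texts: string[]) {
  // allSettled never rejects, so one failed item
  // does not discard the items that succeeded
  const results = await Promise.allSettled(
    texts.map(text => speech.prefetchAudio(text))
  );

  const succeeded = results
    .filter((r): r is PromiseFulfilledResult<PrefetchedData> => r.status === 'fulfilled')
    .map(r => r.value);
  const failed = texts.filter((_, i) => results[i].status === 'rejected');

  return { succeeded, failed };
}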
5. Clean Up Resources
Always reset playback state after use:
try {
  // Ensure audio context is ready
  if (audioContext.state === 'suspended') {
    await audioContext.resume();
  }

  playback.loadPrefetchedData(data.audioData.visemesBySequence);
  playback.setSpeakingStateManually(true);
  playback.play();

  await playAudioFromPrefetchedData(data, audioContext);
} finally {
  // Always clean up
  playback.setSpeakingStateManually(false);
  playback.reset();
  speech.stopAndClear();
}
Memory Management
Prefetching stores audio data in memory. For large projects:
// Clear cached data when no longer needed
const audioCache = new Map();

// After use
audioCache.clear();
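In a React component, one way to guarantee this cleanup (a sketch, assuming you create both the cache and the AudioContext inside the component) is an effect cleanup that runs on unmount:

useEffect(() => {
  const audioCache = new Map<number, PrefetchedData>();
  const audioContext = new AudioContext();

  // ... prefetch into audioCache and play through audioContext ...

  return () => {
    // Release cached PCM data and the audio hardware handle
    audioCache.clear();
    audioContext.close();
  };
}, []);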
Parallel vs Sequential Prefetching
// ✅ Parallel - faster for multiple items
const allData = await Promise.all(
  items.map(item => speech.prefetchAudio(item))
);

// Sequential - slower, but keeps less data in flight while fetching
const allData = [];
for (const item of items) {
  allData.push(await speech.prefetchAudio(item));
}
Browser Limits
Be aware of browser AudioContext limits (typically 6 simultaneous contexts):
// Monitor active contexts
let activeContexts = 0;
const MAX_CONTEXTS = 6;

if (activeContexts < MAX_CONTEXTS) {
  const ctx = new AudioContext();
  activeContexts++;
  ctx.addEventListener('statechange', () => {
    if (ctx.state === 'closed') {
      activeContexts--;
    }
  });
}
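Rather than counting contexts, a simpler pattern consistent with the "Reuse AudioContext" practice above is a single lazily created, module-level context (a sketch; getAudioContext is an illustrative helper, not an SDK export):

let sharedContext: AudioContext | null = null;

export function getAudioContext(): AudioContext {
  // Recreate only if the previous context was explicitly closed
  if (!sharedContext || sharedContext.state === 'closed') {
    sharedContext = new AudioContext();
  }
  return sharedContext;
}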
Progress Tracking
For large prefetch operations, implement progress tracking:
const prefetchWithProgress = async (
  items: Array<{ text: string; voice: string }>,
  onProgress?: (current: number, total: number) => void
) => {
  let completed = 0;

  // Prefetch in parallel for better performance
  const promises = items.map(async (item) => {
    const data = await speech.prefetchAudio(item.text, {
      ttsParams: { voice: item.voice }
    });
    // Count completions rather than using the item index,
    // since parallel requests can finish out of order
    completed++;
    onProgress?.(completed, items.length);
    return data;
  });

  return Promise.all(promises);
};
Troubleshooting
Common Issues
Audio plays but no lip sync
Ensure you’re loading viseme data before playing:

// Load viseme data first
playback.loadPrefetchedData(data.audioData.visemesBySequence);
// Then control speaking state
playback.setSpeakingStateManually(true);

Also make sure automatic state management is disabled:

const speech = useMascotSpeech({
  disableAutomaticSpeakingState: true
});
Memory leaks with large exports
Clear prefetched data after use:

// Clear individual items
audioCache.delete(sceneId);
// Clear entire cache
audioCache.clear();