Audio Prefetching & Advanced Control

Unlock the full potential of Mascot Bot SDK with advanced audio prefetching and manual state control. Essential for video exports, multi-scene rendering, and high-performance applications requiring precise timing control.

Why Use Prefetching?

Traditional streaming approaches introduce latency between speech segments. Prefetching eliminates these gaps by loading audio and viseme data ahead of time, enabling:
  • Seamless Video Exports: Pre-load all audio before rendering
  • Smooth Transitions: Zero delay between sequential speech
  • Offline Playback: Cache audio for disconnected scenarios
  • Performance Testing: Compare multiple voices without waiting

Core Concepts

Prefetching Architecture

The SDK’s prefetching system separates data fetching from playback:
1. Fetch Phase: Download audio + viseme data without playing
2. Store Phase: Cache data in memory with timing information
3. Play Phase: Use cached data for instant playback
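The three phases above can be sketched as a small in-memory cache. This is a minimal illustration, not SDK code: `fetchSegment` and the `Segment` shape are hypothetical stand-ins for the SDK's network call and payload.

```typescript
// Hypothetical payload shape: raw audio plus timing information
type Segment = { audio: Uint8Array; durationMs: number };

class PrefetchCache {
  private store = new Map<string, Segment>();

  // Fetch + store phases: download without playing, keep in memory
  async fetch(
    text: string,
    fetchSegment: (t: string) => Promise<Segment>
  ): Promise<void> {
    if (!this.store.has(text)) {
      this.store.set(text, await fetchSegment(text));
    }
  }

  // Play phase: cached data is available instantly, no network round trip
  get(text: string): Segment | undefined {
    return this.store.get(text);
  }
}
```

Because playback reads only from the cache, the fetch cost is paid once, up front, instead of at each segment boundary.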

Manual State Control

When prefetching, you often need direct control over the speaking state, bypassing automatic detection:
// Disable automatic state management
const speech = useMascotSpeech({
  disableAutomaticSpeakingState: true
});

// Control speaking state manually
playback.setSpeakingStateManually(true);

Quick Start

Basic Prefetching

import { useMascotSpeech, useMascotPlayback } from '@mascotbot-sdk/react';

function PrefetchExample() {
  const speech = useMascotSpeech({
    apiEndpoint: "/api/visemes-audio",
    disableAutomaticSpeakingState: true // Critical for prefetching
  });

  const playback = useMascotPlayback({
    manualSpeakingStateControl: true
  });

  const handlePrefetchAndPlay = async () => {
    // 1. Prefetch audio data
    const prefetchedData = await speech.prefetchAudio("Hello world", {
      ttsParams: {
        voice: "en-US-1",
        speed: 1.0
      }
    });

    // 2. Load viseme data for lip sync
    playback.loadPrefetchedData(prefetchedData.audioData.visemesBySequence);

    // 3. Manually control speaking state
    playback.setSpeakingStateManually(true);
    playback.play();

    // 4. Play audio from prefetched data
    await playAudioFromPrefetchedData(prefetchedData);

    // 5. Reset state when done
    playback.setSpeakingStateManually(false);
    playback.reset();
  };

  return (
    <button onClick={handlePrefetchAndPlay}>
      Prefetch & Play
    </button>
  );
}

API Reference

useMascotSpeech Options

interface MascotSpeechOptions {
  // ... standard options

  // Disable automatic speaking state management
  // Essential for prefetching workflows
  disableAutomaticSpeakingState?: boolean;
}

prefetchAudio Method

const prefetchedData = await speech.prefetchAudio(
  text: string,
  options?: {
    ttsParams?: {
      tts_engine?: string;
      voice?: string;
      speed?: number;
      tts_api_key?: string;
    }
  }
): Promise<{
  audioData: {
    audioEvents: Map<number, AudioEvent>;
    visemesBySequence: Map<number, VisemeData[]>;
  };
  duration: number;
}>;

useMascotPlayback Methods

interface MascotPlaybackMethods {
  // Load prefetched viseme data
  loadPrefetchedData(visemeData: Map<number, VisemeData[]>): void;

  // Manual speaking state control
  setSpeakingStateManually(isSpeaking: boolean): void;

  // Standard playback controls
  play(): void;
  pause(): void;
  reset(): void;
}

Use Cases

Video Export System

The most common use case for prefetching is video export, where all audio must be loaded before rendering begins:
async function exportVideo(scenes: Scene[]) {
  // 1. Initialize export state
  const audioCache = new Map<string, PrefetchedData>();
  
  // 2. Prefetch all scene audio in parallel
  const prefetchPromises = scenes.map(async (scene) => {
    if (scene.voiceover?.text) {
      const data = await speech.prefetchAudio(scene.voiceover.text, {
        ttsParams: {
          voice: scene.voiceover.voice,
          speed: scene.voiceover.speed
        }
      });
      audioCache.set(scene.id, data);
    }
  });

  await Promise.all(prefetchPromises);

  // 3. Render video with cached audio
  for (const scene of scenes) {
    const prefetchedData = audioCache.get(scene.id);
    if (prefetchedData) {
      await renderSceneWithAudio(scene, prefetchedData);
    }
  }
}

Sequential Speech Queue

Prefetching enables smooth transitions between multiple speech segments:
function SpeechQueue({ items }: { items: string[] }) {
  const [queue, setQueue] = useState<PrefetchedData[]>([]);
  
  // Prefetch all items on mount
  useEffect(() => {
    const prefetchAll = async () => {
      const data = await Promise.all(
        items.map(text => speech.prefetchAudio(text))
      );
      setQueue(data);
    };
    prefetchAll();
  }, [items]);

  // Play queue sequentially
  const playQueue = async () => {
    for (const data of queue) {
      playback.loadPrefetchedData(data.audioData.visemesBySequence);
      playback.setSpeakingStateManually(true);
      playback.play();
      
      await playAudioFromPrefetchedData(data);
      
      playback.setSpeakingStateManually(false);
      playback.reset();
      
      // Small gap between items
      await new Promise(resolve => setTimeout(resolve, 100));
    }
  };

  return <button onClick={playQueue}>Play All</button>;
}

Voice Comparison Tool

Prefetch multiple voice options for instant comparison:
function VoiceComparison({ text }: { text: string }) {
  const [voiceData, setVoiceData] = useState<Map<string, PrefetchedData>>();
  
  const voices = ['en-US-1', 'en-US-2', 'en-UK-1'];
  
  useEffect(() => {
    const prefetchVoices = async () => {
      const data = new Map();
      
      await Promise.all(
        voices.map(async (voice) => {
          const prefetched = await speech.prefetchAudio(text, {
            ttsParams: { voice }
          });
          data.set(voice, prefetched);
        })
      );
      
      setVoiceData(data);
    };
    
    prefetchVoices();
  }, [text]);

  const playVoice = async (voice: string) => {
    const data = voiceData?.get(voice);
    if (data) {
      playback.loadPrefetchedData(data.audioData.visemesBySequence);
      playback.setSpeakingStateManually(true);
      playback.play();
      
      await playAudioFromPrefetchedData(data);
      
      playback.setSpeakingStateManually(false);
      playback.reset();
    }
  };

  return (
    <div>
      {voices.map(voice => (
        <button key={voice} onClick={() => playVoice(voice)}>
          Play {voice}
        </button>
      ))}
    </div>
  );
}

Examples

Playing Prefetched Audio

Here’s a complete implementation of playing audio from prefetched data:
async function playAudioFromPrefetchedData(
  prefetchedData: PrefetchedData,
  audioContext?: AudioContext
) {
  // Create or reuse AudioContext
  const ctx = audioContext || new AudioContext();
  
  // Ensure context is running
  if (ctx.state === 'suspended') {
    await ctx.resume();
  }
  
  // Convert base64 audio data to AudioBuffer
  const audioBuffer = await createAudioBufferFromData(
    prefetchedData.audioData,
    ctx
  );
  
  if (!audioBuffer) {
    throw new Error('Failed to create audio buffer');
  }
  
  // Create and play audio source
  const source = ctx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(ctx.destination);
  
  // Return promise that resolves when audio ends
  return new Promise<void>((resolve) => {
    source.onended = () => resolve();
    source.start(0);
  });
}

async function createAudioBufferFromData(
  audioData: { audioEvents: Map<number, AudioEvent> },
  audioContext: AudioContext
): Promise<AudioBuffer | null> {
  const audioBuffers: AudioBuffer[] = [];
  
  // Process audio events in sequence order (Map iteration follows
  // insertion order, which is not guaranteed to match sequence numbers)
  const orderedEvents = [...audioData.audioEvents.entries()].sort(
    ([a], [b]) => a - b
  );
  for (const [, audioEvent] of orderedEvents) {
    // Decode base64 to binary
    const binaryString = atob(audioEvent.data);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    
    // Convert to Int16 PCM
    const int16Array = new Int16Array(bytes.length / 2);
    for (let i = 0; i < int16Array.length; i++) {
      const low = bytes[i * 2];
      const high = bytes[i * 2 + 1];
      int16Array[i] = (high << 8) | low;
    }
    
    // Create AudioBuffer
    const buffer = audioContext.createBuffer(
      1, // mono
      int16Array.length,
      audioEvent.sample_rate
    );
    
    // Convert to Float32
    const channelData = buffer.getChannelData(0);
    for (let i = 0; i < int16Array.length; i++) {
      channelData[i] = int16Array[i] / 32768.0;
    }
    
    audioBuffers.push(buffer);
  }
  
  // Combine all buffers
  const totalLength = audioBuffers.reduce((acc, buf) => acc + buf.length, 0);
  const sampleRate = audioBuffers[0]?.sampleRate || 44100;
  const combinedBuffer = audioContext.createBuffer(1, totalLength, sampleRate);
  const combinedData = combinedBuffer.getChannelData(0);
  
  let offset = 0;
  for (const buffer of audioBuffers) {
    combinedData.set(buffer.getChannelData(0), offset);
    offset += buffer.length;
  }
  
  return combinedBuffer;
}

Advanced Export with Progress Tracking

function ExportDialog({ scenes }: { scenes: Scene[] }) {
  const [progress, setProgress] = useState(0);
  const [status, setStatus] = useState<'idle' | 'prefetching' | 'rendering'>('idle');

  const speech = useMascotSpeech({
    disableAutomaticSpeakingState: true
  });

  const handleExport = async () => {
    setStatus('prefetching');
    
    // Track prefetch progress
    let completed = 0;
    const total = scenes.filter(s => s.voiceover?.text).length;
    
    const prefetchPromises = scenes.map(async (scene, index) => {
      if (scene.voiceover?.text) {
        const data = await speech.prefetchAudio(scene.voiceover.text);
        completed++;
        setProgress((completed / total) * 50); // First 50% for prefetching
        return { scene, data };
      }
      return null;
    });

    const prefetchedScenes = await Promise.all(prefetchPromises);
    
    setStatus('rendering');
    
    // Render with progress tracking
    for (let i = 0; i < prefetchedScenes.length; i++) {
      const item = prefetchedScenes[i];
      if (item) {
        await renderScene(item.scene, item.data);
        setProgress(50 + ((i + 1) / prefetchedScenes.length) * 50);
      }
    }
    
    setStatus('idle');
  };

  return (
    <div>
      <button onClick={handleExport} disabled={status !== 'idle'}>
        Export Video
      </button>
      {status !== 'idle' && (
        <div>
          <p>Status: {status}</p>
          <progress value={progress} max={100} />
        </div>
      )}
    </div>
  );
}

Best Practices

1. Always Disable Automatic State Management

When using prefetching, always disable automatic speaking state detection:
// ✅ Correct
const speech = useMascotSpeech({
  disableAutomaticSpeakingState: true
});

// ❌ Incorrect - will cause conflicts
const speech = useMascotSpeech({});

2. Reuse AudioContext

Create a single AudioContext and reuse it across all prefetched audio playback:
// ✅ Correct - single context
const audioContext = new AudioContext();

for (const data of prefetchedItems) {
  await playAudioFromPrefetchedData(data, audioContext);
}

// ❌ Incorrect - multiple contexts
for (const data of prefetchedItems) {
  const ctx = new AudioContext(); // Creates new context each time
  await playAudioFromPrefetchedData(data, ctx);
}

3. Handle Errors Gracefully

Always implement error handling for prefetch operations:
const prefetchWithRetry = async (text: string, maxRetries = 3) => {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await speech.prefetchAudio(text);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * (attempt + 1)));
    }
  }
};

4. Clean Up Resources

Always reset playback state after use:
try {
  playback.loadPrefetchedData(data.visemesBySequence);
  playback.setSpeakingStateManually(true);
  playback.play();
  
  await playAudioFromPrefetchedData(data);
} finally {
  // Always clean up
  playback.setSpeakingStateManually(false);
  playback.reset();
  speech.stopAndClear();
}

Performance Considerations

Memory Management

Prefetching stores audio data in memory. For large projects:
// Clear cached data when no longer needed
const audioCache = new Map();

// After use
audioCache.clear();
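To decide when clearing is worthwhile, you can estimate the footprint per segment: 16-bit mono PCM takes 2 bytes per sample, so bytes ≈ duration × sample rate × 2. The 44.1 kHz default below is an assumption; use the sample rate your TTS endpoint actually returns.

```typescript
// Rough memory estimate for one cached segment of 16-bit mono PCM.
// 2 bytes per sample; sampleRate defaults to 44.1 kHz (assumption).
function estimatePcmBytes(durationSeconds: number, sampleRate = 44100): number {
  return Math.ceil(durationSeconds * sampleRate * 2);
}

// One minute of speech at 44.1 kHz is roughly 5.3 MB
const oneMinute = estimatePcmBytes(60); // 5_292_000 bytes
```

A few dozen minute-long segments can therefore reach hundreds of megabytes, which is why long exports should clear cached scenes as soon as they are rendered.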

Parallel vs Sequential Prefetching

// Parallel - faster for multiple items, but all audio is held in memory at once
const allData = await Promise.all(
  items.map(item => speech.prefetchAudio(item))
);

// Sequential - slower, but lower peak memory; better for large batches
const allData = [];
for (const item of items) {
  allData.push(await speech.prefetchAudio(item));
}

Browser Limits

Be aware of browser AudioContext limits (typically 6 simultaneous contexts):
// Monitor active contexts
let activeContexts = 0;
const MAX_CONTEXTS = 6;

if (activeContexts < MAX_CONTEXTS) {
  const ctx = new AudioContext();
  activeContexts++;
  
  ctx.addEventListener('statechange', () => {
    if (ctx.state === 'closed') {
      activeContexts--;
    }
  });
  
  // ... use the context, then release its slot:
  // await ctx.close();
}

Note that the counter only decreases when a context is explicitly closed with ctx.close() — contexts that are merely suspended still count against the limit.

Troubleshooting

Common Issues