ElevenLabs Avatar Integration - Real-time Visual Avatars for Your Voice AI
Transform your ElevenLabs voice agents into engaging visual experiences with Mascot Bot SDK. Get perfect lip sync, seamless WebSocket integration, and production-ready React components that work alongside your existing ElevenLabs implementation.
Why Add Avatars to Your ElevenLabs Conversational AI?
Voice-only AI can feel impersonal. Adding a conversational AI avatar with real-time lip sync creates more engaging, human-like interactions. Our voice-to-avatar SDK works seamlessly with your existing ElevenLabs setup; no modifications are needed.
You’ll receive the SDK .tgz file after subscribing to one of our plans. The SDK works alongside the official ElevenLabs React SDK without any modifications. Both packages are required for the integration.
Want to see a complete working example? Check out our open-source demo repository with full implementation, or deploy it directly to Vercel with one click.
ElevenLabs requires signed URLs for WebSocket connections. Use the Mascot Bot proxy endpoint to get a signed URL that includes automatic viseme injection:
The Mascot Bot proxy endpoint handles the ElevenLabs authentication and adds real-time viseme data to the WebSocket stream. This is required for the avatar lip-sync to work properly.
```typescript
// app/api/get-signed-url/route.ts
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  try {
    const body = await request.json();
    const { dynamicVariables } = body;

    // Use Mascot Bot proxy endpoint for automatic viseme injection
    const response = await fetch('https://api.mascot.bot/v1/get-signed-url', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MASCOT_BOT_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        config: {
          provider: 'elevenlabs',
          provider_config: {
            agent_id: process.env.ELEVENLABS_AGENT_ID,
            api_key: process.env.ELEVENLABS_API_KEY,
            // Optional: pass dynamic variables if provided
            ...(dynamicVariables && { dynamic_variables: dynamicVariables }),
          },
        },
      }),
      // Ensure fresh URL for WebSocket avatar connection
      cache: 'no-store',
    });

    if (!response.ok) {
      throw new Error('Failed to get signed URL');
    }

    const data = await response.json();
    return NextResponse.json({ signedUrl: data.signed_url });
  } catch (error) {
    console.error('Error fetching signed URL:', error);
    return NextResponse.json(
      { error: 'Failed to generate signed URL' },
      { status: 500 }
    );
  }
}

// Force dynamic to prevent caching issues
export const dynamic = 'force-dynamic';
```
Required environment variables:
MASCOT_BOT_API_KEY: Your Mascot Bot API key (get from app.mascot.bot)
ELEVENLABS_API_KEY: Your ElevenLabs API key
ELEVENLABS_AGENT_ID: Your ElevenLabs conversational AI agent ID
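On the client, the route above can be called with a plain `fetch` before each session. A minimal sketch, assuming the route's `{ signedUrl }` / `{ error }` response shape shown earlier; the `extractSignedUrl` helper is illustrative and not part of either SDK:

```typescript
// Illustrative helper: pull the signed URL out of the route's JSON
// response, or surface the route's error message if it failed.
function extractSignedUrl(payload: { signedUrl?: string; error?: string }): string {
  if (!payload.signedUrl) {
    throw new Error(payload.error ?? 'No signed URL in response');
  }
  return payload.signedUrl;
}

// Fetch a fresh signed URL from the Next.js route before each session.
async function getSignedUrl(
  dynamicVariables?: Record<string, string>
): Promise<string> {
  const res = await fetch('/api/get-signed-url', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(dynamicVariables ? { dynamicVariables } : {}),
  });
  return extractSignedUrl(await res.json());
}
```

Fetching a fresh URL per session matters because signed URLs are short-lived; caching one across sessions is a common cause of failed connections.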
Create more realistic mouth movements by adjusting natural lip sync parameters:
Start with the “conversation” preset for most use cases, then adjust to your needs: a higher minVisemeInterval gives smoother movements, a lower one gives crisper articulation.
```typescript
import { useState } from 'react';

// Different presets for various use cases
const lipSyncPresets = {
  // Natural conversation - best for most voice AI
  conversation: {
    minVisemeInterval: 40,
    mergeWindow: 60,
    keyVisemePreference: 0.6,
    preserveSilence: true,
    similarityThreshold: 0.4,
    preserveCriticalVisemes: true,
    criticalVisemeMinDuration: 80,
  },
  // Fast speech - for excited or rapid voice
  fastSpeech: {
    minVisemeInterval: 80,
    mergeWindow: 100,
    keyVisemePreference: 0.5,
    preserveSilence: true,
    similarityThreshold: 0.3,
    preserveCriticalVisemes: true,
  },
  // Clear articulation - for educational AI tutor avatar
  educational: {
    minVisemeInterval: 40,
    mergeWindow: 50,
    keyVisemePreference: 0.9,
    preserveSilence: true,
    similarityThreshold: 0.8,
    preserveCriticalVisemes: true,
  },
};

// Inside your component - use state for stable references
const [lipSyncConfig] = useState(lipSyncPresets.conversation);

// Apply preset to your conversational AI avatar
useMascotElevenlabs({
  conversation,
  naturalLipSync: true,
  naturalLipSyncConfig: lipSyncConfig,
});
```
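To build intuition for what minVisemeInterval controls, here is a deliberately simplified sketch of interval-based viseme thinning. This is not the SDK's actual algorithm (which also merges similar shapes and protects critical visemes); it only models the interval parameter:

```typescript
interface VisemeEvent {
  viseme: string; // mouth-shape id, e.g. 'aa', 'PP', 'sil'
  timeMs: number; // timestamp within the audio stream
}

// Simplified sketch: drop viseme events arriving sooner than
// minVisemeInterval after the previously kept event. A larger interval
// keeps fewer events (smoother mouth movement); a smaller one keeps
// more (sharper articulation).
function thinVisemes(events: VisemeEvent[], minVisemeInterval: number): VisemeEvent[] {
  const kept: VisemeEvent[] = [];
  for (const ev of events) {
    const last = kept[kept.length - 1];
    if (!last || ev.timeMs - last.timeMs >= minVisemeInterval) {
      kept.push(ev);
    }
  }
  return kept;
}
```

Running this over a dense event stream with the "conversation" interval (40 ms) versus a larger one makes the smoothing trade-off concrete: more thinning means fewer, longer-held mouth shapes.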
ElevenLabs supports dynamic variables that allow you to personalize conversations with runtime values. Mascot Bot SDK fully supports this feature through the proxy endpoint.
Dynamic variables work independently from the Mascot Bot SDK - they are passed directly to ElevenLabs through our proxy. This means you can use all ElevenLabs dynamic variable features without any conflicts.
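The pass-through behavior can be seen in the shape of the proxy request body. The sketch below mirrors the payload built in the route handler earlier; `buildProxyBody` is an illustrative helper, not an SDK function:

```typescript
interface ProxyRequestBody {
  config: {
    provider: 'elevenlabs';
    provider_config: {
      agent_id: string;
      api_key: string;
      dynamic_variables?: Record<string, string | number | boolean>;
    };
  };
}

// Illustrative helper mirroring the route handler: dynamic variables are
// attached only when provided, and travel to ElevenLabs unchanged.
function buildProxyBody(
  agentId: string,
  apiKey: string,
  dynamicVariables?: Record<string, string | number | boolean>
): ProxyRequestBody {
  return {
    config: {
      provider: 'elevenlabs',
      provider_config: {
        agent_id: agentId,
        api_key: apiKey,
        ...(dynamicVariables && { dynamic_variables: dynamicVariables }),
      },
    },
  };
}
```

Because `dynamic_variables` sits inside `provider_config`, the Mascot Bot proxy never interprets it; ElevenLabs receives exactly what you send.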
The Mascot Bot proxy endpoint is essential for avatar integration:
Authentication: Your app calls /api/get-signed-url with your credentials
Proxy Setup: Mascot Bot creates a WebSocket proxy to ElevenLabs
Viseme Injection: The proxy analyzes audio streams and injects viseme data
Client Connection: Your app connects using the signed URL from step 1
Real-time Sync: Avatar receives both audio and viseme data seamlessly
Do NOT connect directly to ElevenLabs WebSocket URLs. The avatar lip-sync requires viseme data that only the Mascot Bot proxy provides. Direct connections will result in no mouth movement.
ElevenLabs doesn’t provide viseme (mouth shape) data in their WebSocket stream. Our proxy analyzes the audio in real-time and injects synchronized viseme events, enabling perfect lip-sync.
While you can connect directly for audio-only features, the avatar lip-sync will NOT work without our proxy. The mouth won’t move because there’s no viseme data.
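One way to catch a misconfigured direct connection early is a small guard before opening the session. The check below assumes ElevenLabs signed URLs resolve to an `elevenlabs.io` host; `assertProxySignedUrl` is an illustrative helper, not part of the SDK:

```typescript
// Illustrative guard: fail fast if a signed URL points straight at
// ElevenLabs, which would give audio but no viseme data (no lip sync).
function assertProxySignedUrl(signedUrl: string): string {
  const host = new URL(signedUrl).hostname;
  if (host === 'elevenlabs.io' || host.endsWith('.elevenlabs.io')) {
    throw new Error(
      'Signed URL points directly at ElevenLabs; request it from the ' +
      'Mascot Bot proxy endpoint so viseme data is injected for lip sync.'
    );
  }
  return signedUrl;
}
```

A guard like this turns a silent "mouth never moves" bug into an immediate, explainable error during development.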
Unlike pre-rendered solutions, our real-time approach provides dynamic, responsive avatars that truly connect with users. Integrate in minutes and see the difference an animated character for voice AI can make.
Get the latest SDK from app.mascot.bot and install: npm install ./mascotbot-sdk-react-[version].tgz
Add the useMascotElevenlabs hook to your app
Choose or customize your avatar
Deploy your enhanced conversational AI avatar
Transform your ElevenLabs implementation today with the most developer-friendly avatar SDK for voice AI. Your users will love the engaging visual experience!