ElevenLabs Avatar Integration - Real-time Visual Avatars for Your Voice AI
Transform your ElevenLabs voice agents into engaging visual experiences with Mascot Bot SDK. Get perfect lip sync, seamless integration, and production-ready React components that work alongside the official ElevenLabs SDK, unchanged. The SDK works alongside your ElevenLabs setup and animates a real-time avatar from the audio.
Why Add Avatars to Your ElevenLabs Conversational AI?
Voice-only AI can feel impersonal. By adding a conversational AI avatar with real-time lip sync, you create more engaging, human-like interactions. The Mascot Bot voice to avatar SDK works alongside your existing ElevenLabs setup: you keep using @elevenlabs/client exactly as you do today, and the SDK lip-syncs whatever the agent says in real time.
Features
Real-time Lip Sync
The SDK turns ElevenLabs’ audio output into a real-time talking avatar with no perceptible lag. There is no server round-trip for visemes, only ElevenLabs’ own audio stream.
120fps Animation Performance
Smooth, natural voice-driven facial animation powered by WebGL2 and the Rive runtime.
Native ElevenLabs Support
Works alongside @elevenlabs/client with zero conflicts and zero modifications to your ElevenLabs code. The SDK never proxies or intercepts the ElevenLabs connection; it only taps the audio it already plays.
Customizable Avatars
Choose from ready-made mascots or bring your own Rive file. The SDK only writes the mouth, is_speaking, and stress inputs; every other input, outfit, gesture, and ViewModel stays yours (see Rive co-existence).
Streaming Avatar Audio
ElevenLabs plays the assistant’s audio itself; the SDK captures that exact playback as a MediaStream and lip-syncs it. The capture point is the playback point, so the mouth never drifts ahead of speech.
Natural Lip Sync Processing
An optional post-processor merges rapid visemes and preserves the distinctive shapes for natural, non-robotic motion (see Natural lip sync).
Quick Start
Installation
The SDK installs from the private registry npm.mascot.bot. Add an .npmrc, then install alongside the official ElevenLabs client.
Get your Mascot Bot key at app.mascot.bot/api-keys (mascot_dev_… for localhost, mascot_pub_… for production). The SDK works alongside the official ElevenLabs SDK without any modifications. Full setup: Installation.
Want a complete working example? See the open-source demo repository, or deploy it to Vercel with one click.
Basic Integration
Three pieces: a server route that mints an ElevenLabs signed URL, the ElevenLabs Conversation, and the SDK tapping its audio.
Complete Implementation Guide
Step 1: Mint an ElevenLabs Signed URL (Server-Side)
ElevenLabs needs a signed URL for the WebSocket. Mint it on the server so the standing xi-api-key never reaches the browser. This is the standard ElevenLabs signed-URL endpoint.
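A minimal sketch of the server-side minting step, written framework-free so it can sit inside any route handler. The helper names (buildSignedUrlRequest, mintSignedUrl), the injected fetchFn parameter, and the https://api.elevenlabs.io base URL are assumptions for illustration. The signed_url response field is what ElevenLabs is understood to return, so verify it against the current ElevenLabs API reference.

```typescript
// Hypothetical helpers: only the endpoint path and the xi-api-key header
// come from the text above. The base URL is an assumption.
const ELEVENLABS_BASE = "https://api.elevenlabs.io";

interface SignedUrlRequest {
  url: string;
  headers: Record<string, string>;
}

/** Builds the GET request for the signed-URL endpoint (pure, easy to test). */
function buildSignedUrlRequest(agentId: string, apiKey: string): SignedUrlRequest {
  if (!agentId || !apiKey) {
    throw new Error("ELEVENLABS_AGENT_ID and ELEVENLABS_API_KEY are required");
  }
  return {
    url:
      ELEVENLABS_BASE +
      "/v1/convai/conversation/get-signed-url?agent_id=" +
      encodeURIComponent(agentId),
    headers: { "xi-api-key": apiKey },
  };
}

// Minimal structural type so the sketch needs no DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/** Calls the endpoint from the server and returns only the signed URL. */
async function mintSignedUrl(
  fetchFn: FetchLike,
  agentId: string,
  apiKey: string,
): Promise<string> {
  const { url, headers } = buildSignedUrlRequest(agentId, apiKey);
  const res = await fetchFn(url, { method: "GET", headers });
  if (!res.ok) {
    throw new Error("ElevenLabs returned HTTP " + res.status);
  }
  const body = (await res.json()) as { signed_url?: unknown };
  if (typeof body.signed_url !== "string") {
    throw new Error("Response did not contain a signed_url string");
  }
  return body.signed_url;
}
```

In a route handler you would pass the platform's fetch and the two server-side environment variables, then return only the resulting URL to the browser, so the API key itself stays on the server.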
Required environment variables (server-side only):
- ELEVENLABS_API_KEY: your ElevenLabs API key
- ELEVENLABS_AGENT_ID: your ElevenLabs Conversational AI agent ID

The Mascot Bot key (mascot_pub_…) is a separate, browser-safe publishable key passed to <MascotProvider>.
Step 2: Create Your Avatar Component
ElevenLabs plays the assistant audio internally through an <audio> element. Capture that element, expose it as a MediaStream, and feed it to useLipsyncStream. Leave playback with ElevenLabs.
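The tap itself can be sketched with standard Web Audio calls (createMediaElementSource and createMediaStreamDestination). The function name tapAudioElement and the structural interfaces below are hypothetical; they exist only so the sketch type-checks without browser typings. How you locate the ElevenLabs <audio> element is not covered here and is left as an assumption.

```typescript
// Structural stand-ins for the Web Audio types (hypothetical names).
// In a browser, a real AudioContext is expected to satisfy TapAudioContext.
interface ConnectableNode {
  connect(destination: unknown): unknown;
}

interface TapAudioContext<TElement, TStream> {
  readonly destination: unknown;
  createMediaElementSource(element: TElement): ConnectableNode;
  createMediaStreamDestination(): { readonly stream: TStream };
}

/**
 * Routes an <audio> element through the audio graph twice:
 * once to the speakers (so the user still hears it) and once to a
 * MediaStream destination whose stream can be handed to the lip sync.
 */
function tapAudioElement<TElement, TStream>(
  ctx: TapAudioContext<TElement, TStream>,
  element: TElement,
): TStream {
  // Note: a media element can only be attached to one source node,
  // so call this once per element.
  const source = ctx.createMediaElementSource(element);
  const tap = ctx.createMediaStreamDestination();

  // Once an element feeds a source node, its output goes through the graph,
  // so it must be connected to the context destination to stay audible.
  source.connect(ctx.destination);

  // Second branch: the same signal, exposed as a MediaStream.
  source.connect(tap);

  return tap.stream;
}
```

The returned stream is what the text above feeds to useLipsyncStream as source: { kind: "mediaStream", stream }. Because both branches come from the same source node, the avatar sees the same signal the user hears.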
Step 3: Advanced Features
Natural Lip Sync Configuration
Tune the post-processor by passing a stable naturalLipSyncConfig to useMascotPlayback.
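"Stable" here means the same object reference on every render. The sketch below illustrates the difference with a placeholder config: the option name examplePlaceholderOption is invented for illustration and is not a real SDK option, and configChanged only assumes that the hook compares configs by reference.

```typescript
// Hypothetical config shape: the real option names are not listed here.
type NaturalLipSyncConfig = Readonly<Record<string, number | string | boolean>>;

// Stable: created once at module load, so every render sees one reference.
const NATURAL_LIP_SYNC_CONFIG: NaturalLipSyncConfig = Object.freeze({
  examplePlaceholderOption: 1, // placeholder, not a real SDK option
});

/** What a component should pass: always the same object. */
function stableConfig(): NaturalLipSyncConfig {
  return NATURAL_LIP_SYNC_CONFIG;
}

/** What an inline object literal in a component body does: a new object per call. */
function unstableConfig(): NaturalLipSyncConfig {
  return { examplePlaceholderOption: 1 };
}

/** Reference comparison, the assumed way a hook detects a changed config. */
function configChanged(prev: NaturalLipSyncConfig, next: NaturalLipSyncConfig): boolean {
  return prev !== next;
}
```

With stableConfig, two consecutive renders compare equal and playback keeps running. With unstableConfig, every render looks like a config change even though the values are identical. Inside a component, useMemo or useState gives the same stability as a module constant.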
Embedded Avatar Widget
Mount the avatar small and fixed for an embeddable AI agent with face. The SDK only animates the mouth, so your own widget chrome, click handlers, and Rive inputs are untouched. To use one of your own Rive inputs, check that it exists with useMascotInputs().has(name), then drive it yourself (see Rive co-existence).
Gestures on Every Agent Turn
The legacy SDK auto-fired a gesture trigger at the start of every agent
utterance (the old gesture: true flag on useMascotElevenlabs). Version 0.2.x
removed the auto-fire, so consumers wire it themselves. ElevenLabs makes
this a one-liner: Conversation.startSession exposes an onModeChange
callback that flips to "speaking" the moment the first audio chunk of
a new turn lands.
Declare gesture on the parent <Mascot inputs={["gesture", ...]}>
so the SDK exposes a real trigger handle. On .riv files without a
gesture input the SDK returns a no-op shim, so the optional-chain
fire?.() stays safe.
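The wiring described above can be sketched as a small handler factory. The name createSpeakingGestureHandler is hypothetical, and how you obtain the trigger's fire function from the SDK is left as an assumption. The handler fires once per transition into "speaking", so repeated "speaking" notifications within a turn do not retrigger the gesture.

```typescript
// Mode values as reported by the ElevenLabs onModeChange callback.
type ConversationMode = "speaking" | "listening";

/**
 * Returns an onModeChange handler that fires the gesture once each time
 * the agent starts a new turn. `fire` may be undefined (for example when
 * the trigger handle is not ready yet); the optional call keeps that safe.
 */
function createSpeakingGestureHandler(
  fire: (() => void) | undefined,
): (event: { mode: ConversationMode }) => void {
  let lastMode: ConversationMode | undefined;

  return ({ mode }) => {
    // Only react to the edge: something else -> "speaking".
    if (mode === "speaking" && lastMode !== "speaking") {
      fire?.();
    }
    lastMode = mode;
  };
}
```

You would pass the returned function as onModeChange in the options given to Conversation.startSession.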
For a provider-agnostic approach driven by the speech envelope (works
identically for OpenAI / Gemini), see Stress emphasis and gestures.
Step 4: Dynamic Variables
ElevenLabs dynamic variables personalize conversations at runtime. They are an ElevenLabs feature and are completely independent of the SDK. Pass them straight to Conversation.startSession.
The agent prompt must contain the matching {{name}} / {{role}} placeholders. The SDK does not see or touch these; it only lip-syncs the resulting audio.
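A small sketch of preparing the session options and checking them against the prompt before connecting. The helper names (findPlaceholders, missingVariables, buildSessionOptions) are hypothetical. The signedUrl and dynamicVariables option names are assumed to match what Conversation.startSession accepts, so confirm them against the ElevenLabs client documentation.

```typescript
type DynamicVariables = Record<string, string | number | boolean>;

/** Extracts unique {{placeholder}} names from an agent prompt. */
function findPlaceholders(prompt: string): string[] {
  const pattern = /\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    const name = match[1];
    if (name !== undefined && names.indexOf(name) === -1) {
      names.push(name);
    }
  }
  return names;
}

/** Lists placeholders in the prompt that have no value supplied. */
function missingVariables(prompt: string, vars: DynamicVariables): string[] {
  return findPlaceholders(prompt).filter(
    (name) => !Object.prototype.hasOwnProperty.call(vars, name),
  );
}

/** Builds the options object to hand to Conversation.startSession (assumed shape). */
function buildSessionOptions(signedUrl: string, vars: DynamicVariables) {
  return { signedUrl, dynamicVariables: vars };
}
```

For a prompt such as "Hello {{name}}, you are a {{role}}." and variables { name: "Ada" }, missingVariables reports role, which is the usual cause when a variable appears not to apply.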
API Reference
This integration uses the standard SDK surface plus the official ElevenLabs client.

| Surface | Role |
|---|---|
| <MascotProvider apiKey> | Initializes the licensed avatar client (see Config). |
| <MascotProvider> / <Mascot src> / <MascotRive> | Load and render the Rive avatar. |
| useMascotPlayback({ stream: true, enableNaturalLipSync, naturalLipSyncConfig }) | The mouth playback engine. |
| useLipsyncStream({ client, playback, source: { kind: "mediaStream", stream } }) | Lip-syncs the tapped ElevenLabs audio (see Reference). |
| @elevenlabs/client Conversation | The official ElevenLabs SDK, unchanged. |

The signed URL comes from ElevenLabs' standard GET /v1/convai/conversation/get-signed-url endpoint, called with your xi-api-key. No Mascot Bot endpoint sits in the path.
Use Cases
AI Customer Service Avatar
A visible virtual assistant with face for support: visual feedback during voice conversations, on-brand appearance, expressions driven by your own Rive inputs.
Educational AI Tutor Avatar
Clear articulation for learning; pair with the educational natural-lip-sync preset for crisper mouth shapes.
Voice AI Virtual Receptionist
A welcoming visual presence with natural conversation flow and a brand-customizable mascot.
Technical Details
Voice-to-Animation Pipeline
- ElevenLabs streams and plays the assistant’s audio (its own WebSocket, untouched).
- The SDK captures that playback as a MediaStream.
- The SDK infers a viseme per 10 ms frame from the audio.
- The Rive runtime renders the mouth at up to 120fps.
- Optional natural lip sync smooths the motion.
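The timing in the steps above can be made concrete with a little arithmetic. This is an illustrative sketch only, not the SDK's internal code: the function names are hypothetical, and the only figures taken from the text are the 10 ms viseme frame and the 120fps render ceiling.

```typescript
const VISEME_FRAME_MS = 10; // one viseme per 10 ms of audio, per the text

/** Number of audio samples covered by one viseme frame at a given sample rate. */
function samplesPerVisemeFrame(sampleRateHz: number): number {
  return Math.round((sampleRateHz * VISEME_FRAME_MS) / 1000);
}

/** Index of the viseme frame that covers a given playback position. */
function visemeFrameIndexAt(playbackTimeMs: number): number {
  return Math.max(0, Math.floor(playbackTimeMs / VISEME_FRAME_MS));
}

/** Time between rendered frames at a given display rate. */
function renderIntervalMs(fps: number): number {
  return 1000 / fps;
}
```

At 48 kHz a viseme frame spans 480 samples. At 120fps the renderer draws roughly every 8.3 ms, which is shorter than a viseme frame, so consecutive renders sometimes show the same viseme. That is one reason smoothing between shapes helps.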
Performance
- Low audio-to-visual delay (the capture point is the playback point).
- WebGL2-accelerated rendering.
- End-of-utterance phantom mouth shapes are suppressed by the SDK’s internal silence gate — you do not implement one.
Troubleshooting
Avatar Not Moving?
Only the First Second of Speech Animates?
A new naturalLipSyncConfig object on every render reinitializes playback. Use a stable module constant or useState/useMemo (see the example above and Troubleshooting).
Hearing the Voice Twice?
You routed ElevenLabs through createPCMStreamPlayer. ElevenLabs self-plays, so tap its audio instead (Step 2) and never use the PCM player.
Dynamic Variables Not Applied?
They are an ElevenLabs concern. Ensure the agent prompt has the {{placeholders}} and that you pass dynamicVariables to Conversation.startSession. The SDK is not involved.
FAQ
Can You Add an Avatar to ElevenLabs?
Yes. The SDK works alongside the official @elevenlabs/client with no modifications. You connect ElevenLabs as usual; the SDK lip-syncs its audio in real time.
Does It Work With My Existing ElevenLabs Setup?
Yes. Keep your @elevenlabs/client code exactly as it is. The SDK lip-syncs the audio ElevenLabs plays, in real time.
Do I Modify My ElevenLabs Code?
No. Keep your Conversation setup. You only add a MediaStream tap of the audio it plays.
How Is the Lip Sync Synchronized?
The audio is tapped at its playback point with a Web Audio MediaStreamDestination, so visemes are derived from exactly what the user hears and the mouth cannot run ahead of the voice.
Is Audio Sent to Mascot Bot?
Your users’ speech is processed by the SDK in their browser and isn’t sent to or stored on Mascot Bot servers.
What Is the Voice Avatar SDK?
A React/JavaScript library that adds a real-time, lip-synced avatar to any voice AI, including ElevenLabs Conversational AI.
Start with ElevenLabs Avatar Today
Ready to transform your voice AI? The open-source avatar for ElevenLabs makes it simple:
- Try Voice Avatar Demo: experience it yourself
- Demo Repository: complete working example
Next Steps
- Get a key at app.mascot.bot/api-keys and install from the private registry.
- Add <MascotProvider> + <MascotProvider> / <Mascot> and the avatar component above.
- Choose a ready-made mascot or your own Rive file.
- Tune motion with natural lip sync; review the realtime overview for the general pattern.