Gemini Live API Avatar — Build an Interactive AI Avatar with Real-time Lip Sync
Add a lip-synced animated avatar to your Gemini Live API application in minutes. Mascot Bot SDK works alongside the official Google AI SDK (@google/genai) — your existing Gemini code stays untouched. The SDK plays Gemini’s audio output and animates a real-time avatar from it.
Why Add an Avatar to Your Gemini Live API App?
Voice-only Gemini feels disembodied. A lip-synced avatar makes the assistant feel present. The Mascot Bot SDK adds that without changing how you use Gemini Live: you keep @google/genai, and the SDK lip-syncs Gemini’s audio output in real time.
How It Works: Real-Time Lip Sync
Gemini Live streams the assistant’s voice as raw base64 PCM16 (it does not play the audio for you). The pattern:
- Your server mints a short-lived ephemeral token so the standing Gemini key never reaches the browser.
- The browser connects to Gemini Live with @google/genai using that token.
- createPCMStreamPlayer plays Gemini’s PCM gap-tolerantly and exposes it as a MediaStream.
- useLipsyncStream taps that stream; the SDK infers visemes and drives the avatar.
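To make the payload concrete, here is a minimal sketch of what "raw base64 PCM16" means. It is not the SDK's implementation (createPCMStreamPlayer does this work for you); it assumes little-endian, mono samples at the 24000 Hz rate listed in the API Reference, and the function names are hypothetical.

```typescript
// Illustrative only: decode one base64 chunk of little-endian PCM16
// into Float32 samples in the range [-1, 1).
function decodeBase64Pcm16(base64: string): Float32Array {
  const binary = atob(base64); // available in browsers and recent Node versions
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    // Each sample is a signed 16-bit integer; `true` selects little-endian.
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Duration of a chunk, assuming mono audio at the given sample rate.
function chunkDurationMs(sampleCount: number, sampleRate = 24000): number {
  return (sampleCount / sampleRate) * 1000;
}
```

Chunks arrive with variable size and timing, which is why the player has to be gap-tolerant rather than scheduling each chunk as if it were a fixed-length frame.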
Features
Real-time Lip Sync for Gemini Live API
Real-time viseme inference from Gemini’s audio output: no server round-trip for visemes, no perceptible lag.
120fps Avatar Animation
WebGL2 + Rive runtime for smooth, natural facial motion.
Native Google AI SDK Compatibility
Use @google/genai exactly as documented by Google. The SDK never proxies or wraps the Gemini connection; it only plays and taps the audio.
Ephemeral Token Security
Mint single-use ephemeral tokens server-side with ai.authTokens.create(...). The standing Gemini API key never reaches the client.
Streaming Avatar Audio
createPCMStreamPlayer plays Gemini’s PCM gap-tolerantly and exposes a parallel MediaStream tap so the avatar stays locked to what is heard.
Natural Lip Sync Processing
Optional viseme post-processing for natural, non-robotic motion; see Natural lip sync.
Webcam Video Streaming
Gemini Live can accept webcam frames (session.sendRealtimeInput({ video })). That is a Gemini capability you use directly through @google/genai; it is independent of lip sync and the SDK does not gate it.
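As a rough illustration of the frame payload, the sketch below turns a canvas data URL into the { data, mimeType } shape that sendRealtimeInput takes for video. The helper name and the InlineBlob type are hypothetical; check the current @google/genai reference for the exact parameter type.

```typescript
// Hypothetical local type mirroring the { data, mimeType } blob shape.
interface InlineBlob {
  data: string; // base64 payload without the "data:...;base64," prefix
  mimeType: string;
}

// Split a base64 data URL (for example from canvas.toDataURL) into its parts.
function dataUrlToBlobPart(dataUrl: string): InlineBlob {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64 data URL");
  }
  return { mimeType: match[1], data: match[2] };
}

// In the browser (not executed here), assuming `session` and a `canvas`
// holding the latest webcam frame:
//   session.sendRealtimeInput({
//     video: dataUrlToBlobPart(canvas.toDataURL("image/jpeg", 0.7)),
//   });
```

Sending frames at a low rate (about one per second is a common choice) keeps bandwidth modest; the avatar pipeline is unaffected either way.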
Session Management
Gemini Live sessions are time-limited. Re-mint a token and reconnect when a session ends; the SDK’s player.stop() handles barge-in/interruption.
Quick Start
Installation
.npmrc
Get your Mascot Bot key at app.mascot.bot/api-keys. Full registry/key setup: Installation.
@google/genai is Google’s official SDK, used unchanged.
Basic Integration
GeminiAvatar is built in Step 2.
Complete Implementation Guide
Step 1: Set Up Ephemeral Token Generation (Server-Side)
Mint a single-use ephemeral token so the standing GEMINI_API_KEY stays on the server.
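A minimal sketch of the token settings follows. The field names (uses, expireTime, newSessionExpireTime, httpOptions) follow Google's ephemeral-token documentation as best understood here; the 30-minute and 60-second windows and the helper name are illustrative assumptions, so verify both against the current @google/genai reference.

```typescript
// Local type for the sketch; not imported from @google/genai.
interface EphemeralTokenConfig {
  uses: number;
  expireTime: string; // when the token (and any session on it) stops working
  newSessionExpireTime: string; // deadline for opening a new session
  httpOptions: { apiVersion: string };
}

// Build the config for one single-use token. Durations are assumptions.
function buildEphemeralTokenConfig(
  now: Date,
  sessionMinutes = 30,
  connectWindowSeconds = 60,
): EphemeralTokenConfig {
  const nowMs = now.getTime();
  return {
    uses: 1,
    expireTime: new Date(nowMs + sessionMinutes * 60_000).toISOString(),
    newSessionExpireTime: new Date(nowMs + connectWindowSeconds * 1_000).toISOString(),
    httpOptions: { apiVersion: "v1alpha" },
  };
}

// In the server route (not executed here):
//   const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
//   const token = await ai.authTokens.create({
//     config: buildEphemeralTokenConfig(new Date()),
//   });
//   // Return token.name to the browser; it is used there in place of the API key.
```

Keeping the config in a pure function makes the expiry windows easy to unit-test without calling Google.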
Step 2: Create Your Avatar Component
Gemini Live does not play audio: createPCMStreamPlayer plays it and exposes the tap. The microphone is sent to Gemini via session.sendRealtimeInput.
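The message handling inside the component can be sketched as a small routing function. The types below are local stand-ins that cover only the fields this sketch reads; pushBase64PCM16 and stop are the player methods named on this page, but their exact signatures are assumed here.

```typescript
// Only the fields of a Gemini Live server message that this sketch reads.
interface ServerMessageLike {
  serverContent?: {
    interrupted?: boolean;
    modelTurn?: {
      parts?: Array<{ inlineData?: { data?: string; mimeType?: string } }>;
    };
  };
}

// Assumed shape of the player returned by createPCMStreamPlayer.
interface PcmPlayerLike {
  pushBase64PCM16(base64: string): void;
  stop(): void;
}

// Forward audio parts to the player; on interruption, drop queued audio.
// Returns the number of audio chunks pushed.
function routeServerMessage(message: ServerMessageLike, player: PcmPlayerLike): number {
  const content = message.serverContent;
  if (!content) return 0;
  if (content.interrupted) {
    player.stop(); // barge-in: the user started talking over the model
    return 0;
  }
  let pushed = 0;
  for (const part of content.modelTurn?.parts ?? []) {
    const data = part.inlineData?.data;
    const mimeType = part.inlineData?.mimeType ?? "audio/pcm";
    if (data && mimeType.startsWith("audio/")) {
      player.pushBase64PCM16(data);
      pushed++;
    }
  }
  return pushed;
}
```

In the component, a function like this would be called from the onmessage callback passed to ai.live.connect.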
Step 3: Advanced Features
- Natural lip sync: pass a stable naturalLipSyncConfig; full reference and presets in Natural lip sync.
- Barge-in: player.stop() on serverContent.interrupted (shown above) drops queued audio instantly.
- Webcam video: send frames to Gemini via session.sendRealtimeInput({ video: … }). This is a Gemini Live feature, used directly through @google/genai; it does not involve the lip sync SDK.
- Custom Rive inputs: the SDK only writes the mouth. Drive gestures/outfits yourself; detect them with useMascotInputs().has(name) (Rive co-existence).
API Reference
The integration uses the standard SDK surface plus Google’s official SDK:

| Surface | Role |
|---|---|
| <MascotProvider apiKey> | Licensed avatar client. Config. |
| <MascotProvider> / <Mascot src> / <MascotRive> | Load and render the avatar. |
| useMascotPlayback({ stream: true, enableNaturalLipSync }) | Mouth playback engine. |
| createPCMStreamPlayer({ sampleRate: 24000 }) | Plays Gemini PCM + exposes the tap. Reference. |
| useLipsyncStream({ source: { kind: "mediaStream", stream } }) | Lip-syncs the tapped audio. Reference. |
| @google/genai ai.live.connect / ai.authTokens.create | Google’s official SDK, unchanged. Model models/gemini-3.1-flash-live-preview. |
Gemini Live API Pricing & Free Tier
Gemini Live API usage is billed by Google per their pricing; the ephemeral-token model adds no Mascot Bot cost. Mascot Bot meters by your plan’s speech/MAU allowance; replaying a persisted timeline does not re-meter. Check current Gemini pricing in the Google AI documentation.
Use Cases
AI Customer Service Avatar
A visible assistant for support: visual presence during Gemini voice conversations.
Educational AI Tutor
Pair with the educational natural-lip-sync preset for crisp articulation in language/learning apps.
Voice AI Virtual Receptionist
A branded, welcoming front desk powered by Gemini Live.
AI Mascot for Streaming & Content
A reactive on-screen character; drive non-mouth animation yourself via raw Rive inputs.
Troubleshooting
Avatar Not Moving?
Confirm status === "ready", that player.outputStream is set as the mediaStream source, and that the Rive file uses artboard Character + state machine mascotStateMachine with inputs 100–118.
Only First Second of Speech Animated?
A non-stable naturalLipSyncConfig (a new object created on every render) reinitializes playback. Use a module constant; see Troubleshooting.
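The difference comes down to object identity, which this sketch illustrates. The preset field is a hypothetical placeholder (the real options are in the Natural lip sync reference), and the identity comparison is an assumption about how the hook detects a changed config.

```typescript
// Defined once at module scope: every render sees the same object.
// `preset` is a hypothetical field name used only for illustration.
const NATURAL_LIP_SYNC_CONFIG = Object.freeze({ preset: "educational" });

// Stable: always returns the same reference.
function stableConfig(): object {
  return NATURAL_LIP_SYNC_CONFIG;
}

// Unstable: an inline literal produces a new object on every call,
// just as an object literal inside a component body does on every render.
function unstableConfig(): object {
  return { preset: "educational" };
}

// Assumed change check: a hook comparing by identity treats any new
// object as a changed config, even when the contents are equal.
function wouldReinitialize(previous: object, next: object): boolean {
  return !Object.is(previous, next);
}
```

If the config has to depend on props or state, memoizing it (for example with React's useMemo) should give the same stability as a module constant.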
Connection Fails on Second Call?
Ephemeral tokens are single-use (uses: 1). Mint a fresh token per session/reconnect.
No Audio Playing?
createPCMStreamPlayer must be created inside the user-gesture click before any await, or its AudioContext starts suspended. Also confirm you are calling player.pushBase64PCM16 on modelTurn audio parts.
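The ordering constraint can be sketched with injected dependencies so it runs outside a browser. All three parameters are hypothetical stand-ins: createPlayer for a call such as createPCMStreamPlayer({ sampleRate: 24000 }), fetchToken for your token route, and connect for your Gemini Live connection code.

```typescript
// Assumed minimal player shape for this sketch.
interface PlayerLike {
  pushBase64PCM16(base64: string): void;
  stop(): void;
}

// Intended to be called directly from a click handler.
async function startConversation(
  createPlayer: () => PlayerLike,
  fetchToken: () => Promise<string>,
  connect: (token: string, player: PlayerLike) => Promise<void>,
): Promise<PlayerLike> {
  // Runs synchronously, while the browser still counts this as a user gesture,
  // so the underlying AudioContext is allowed to start.
  const player = createPlayer();

  // Everything after the first await runs later, outside the gesture.
  const token = await fetchToken();
  await connect(token, player);
  return player;
}
```

Swapping the first two steps (awaiting the token before creating the player) is the usual cause of silent playback, because the player is then created after the gesture has ended.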
Session Disconnects After ~10 Minutes?
Gemini Live sessions are time-limited. Detect onclose, mint a new token, and reconnect.
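One way to keep the reconnect path honest is to model the session lifecycle as a small state machine in which a close always leads back to minting, never straight to connecting. The phase and event names below are hypothetical and not part of either SDK.

```typescript
type Phase = "idle" | "minting" | "connecting" | "live";
type SessionEvent = "start" | "tokenReady" | "opened" | "closed";

// Pure transition function: given the current phase and an event,
// return the next phase. Unexpected events leave the phase unchanged.
function nextPhase(phase: Phase, event: SessionEvent): Phase {
  switch (event) {
    case "start":
      return phase === "idle" ? "minting" : phase;
    case "tokenReady":
      return phase === "minting" ? "connecting" : phase;
    case "opened":
      return phase === "connecting" ? "live" : phase;
    case "closed":
      // Tokens are single-use, so a dropped session goes back to minting
      // a fresh token instead of reusing the old one.
      return phase === "idle" ? "idle" : "minting";
  }
}
```

In a component, "closed" would be dispatched from the onclose callback and "tokenReady" once the token route responds; a retry limit or delay can be layered on top if reconnects fail repeatedly.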
FAQ
How Does Mascot Bot Work with the Google AI SDK?
It runs alongside it. You use @google/genai as documented; the SDK plays Gemini’s PCM and lip-syncs it in real time.
Does It Work With My Existing Gemini Code?
Yes. Use @google/genai exactly as documented; the SDK turns Gemini’s audio output into a real-time avatar.
Do I Modify My Gemini Code?
No. Add a PCM player and a useLipsyncStream tap; the Gemini connection is unchanged.
Can I Use My Own Ephemeral Token Setup?
Yes. Any server route returning a valid ai.authTokens.create token name works.
What Gemini Models Support the Live API?
Use a Live-API model such as models/gemini-3.1-flash-live-preview with apiVersion: "v1alpha".
Is Audio Sent to Mascot Bot?
Your users’ speech is processed by the SDK in their browser and isn’t sent to or stored on Mascot Bot servers.
Is This an Open-Source Alternative to Pre-rendered Interactive Avatars?
Yes: a real-time alternative to server-rendered talking-head services.
Start Building with Gemini Live API Avatar
Next Steps
- Get a key at app.mascot.bot/api-keys and install from the private registry.
- Add the server token route and the avatar component above.
- Tune motion with natural lip sync.
- Review the realtime overview and PCM stream player for the underlying pattern.