OpenAI Realtime API Avatar — Build an Interactive Lip-Synced AI Avatar
Add a lip-synced animated avatar to your OpenAI Realtime API application in minutes. The Mascot Bot SDK works alongside the official OpenAI Agents Realtime SDK (@openai/agents-realtime): your existing OpenAI code stays untouched, and the SDK animates a real-time avatar from the assistant’s audio.
- Quick Start: add avatars in 5 minutes
- Live Demo: see the OpenAI Realtime avatar in action
- GitHub Repo: complete example code
- Features: real-time lip sync and more
- API Reference: complete hook documentation
- Deploy: one-click deployment
Why Add an Avatar to Your OpenAI Realtime API App?
A voice-only ChatGPT-style assistant is invisible. A lip-synced avatar gives it a face and makes interactions feel human. The Mascot Bot SDK adds that without changing how you use OpenAI Realtime: keep @openai/agents-realtime, and the SDK lip-syncs the assistant’s audio in real time.
How It Works: Tap the Assistant Audio
OpenAI Realtime has two transports, and the integration differs only in how you obtain the assistant’s audio as a MediaStream:
- WebRTC (recommended, cleanest). The session plays the assistant audio into an <audio> element you supply; you tap it with the SDK’s cross-browser createElementTap(). No extra SDK audio component is needed.
- WebSocket. The session hands you raw PCM16 chunks and does not play them; createPCMStreamPlayer plays them and exposes the tap.
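On the WebSocket transport something has to turn raw PCM16 bytes into samples Web Audio can play; createPCMStreamPlayer does this for you. For intuition only, here is a minimal sketch of that conversion, assuming signed 16-bit little-endian mono at 24 kHz (the sampleRate used elsewhere in this guide). The helper names are hypothetical and are not part of the SDK.

```typescript
// Hypothetical helpers illustrating what a PCM16 player does internally.
// Assumption: signed 16-bit little-endian mono samples at 24 kHz.
const SAMPLE_RATE = 24_000;

// Convert a chunk of PCM16 bytes into Float32 samples in [-1, 1),
// the format a Web Audio AudioBuffer expects.
function pcm16ToFloat32(bytes: ArrayBuffer): Float32Array {
  const view = new DataView(bytes);
  const count = Math.floor(bytes.byteLength / 2); // 2 bytes per sample
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    // `true` reads the sample as little-endian.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Playback length of a chunk in seconds, used to schedule chunks back to back.
function chunkDurationSeconds(bytes: ArrayBuffer, sampleRate = SAMPLE_RATE): number {
  return bytes.byteLength / 2 / sampleRate;
}
```

Dividing by 32768 maps the full Int16 range onto [-1, 1). The SDK’s actual scaling and scheduling may differ; the point is only that PCM arrives unplayed and must be decoded and queued before it can be heard or tapped.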
Features
Real-time Lip Sync for OpenAI Realtime API
Real-time viseme inference from the assistant’s audio, with no server round-trip for visemes.
120fps Avatar Animation
WebGL2 + Rive runtime for smooth facial motion.
Native OpenAI Agents Realtime SDK Compatibility
Use @openai/agents-realtime exactly as documented. The SDK never proxies the OpenAI connection.
Ephemeral Token Security
Mint a short-lived client_secret server-side via POST /v1/realtime/client_secrets. The standing OPENAI_API_KEY never reaches the browser.
WebRTC or WebSocket Streaming
WebRTC self-plays (tap via the SDK’s createElementTap()); WebSocket hands you PCM (play and tap via createPCMStreamPlayer). Both end at one useLipsyncStream call.
Natural Lip Sync Processing
Optional viseme post-processing for natural, non-robotic motion (see Natural lip sync).
Voice Activity Detection (VAD)
Configure server-side VAD (turn_detection: { type: "server_vad" }) in the token route. This is an OpenAI feature, unchanged by the SDK.
Session Management
Re-mint a client secret for every session. player.stop() and session.on("audio_interrupted") handle barge-in.
Quick Start
Installation
.npmrc
Get your Mascot Bot key at app.mascot.bot/api-keys. Full setup: Installation.
@openai/agents-realtime is OpenAI’s official SDK, used unchanged.
Basic Integration
OpenAIAvatar is built in Step 2.
Complete Implementation Guide
Step 1: Set Up Ephemeral Token Generation (Server-Side)
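Step 1 mints a short-lived client secret on your server so the standing OPENAI_API_KEY never reaches the browser. Below is a framework-agnostic sketch; wire mintClientSecret into whatever route handler you use. It assumes OpenAI’s GA request shape for POST /v1/realtime/client_secrets (a session object with type "realtime", the model, and server VAD under audio.input.turn_detection) and a response carrying value and expires_at. Verify those field names against OpenAI’s Realtime API reference. The helper names and the injected FetchLike type are hypothetical.

```typescript
// Minimal shape of the fetch function we need; injected so it can be stubbed in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ClientSecret {
  value: string;     // ephemeral key handed to the browser
  expiresAt: number; // Unix seconds
}

const CLIENT_SECRETS_URL = "https://api.openai.com/v1/realtime/client_secrets";

// Build the request. Assumption: GA session schema, with server VAD under audio.input.
function buildClientSecretRequest(apiKey: string, model = "gpt-realtime") {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      session: {
        type: "realtime",
        model,
        audio: { input: { turn_detection: { type: "server_vad" } } },
      },
    }),
  };
}

// Server-side only: apiKey should come from process.env.OPENAI_API_KEY.
async function mintClientSecret(apiKey: string, fetchImpl: FetchLike): Promise<ClientSecret> {
  const res = await fetchImpl(CLIENT_SECRETS_URL, buildClientSecretRequest(apiKey));
  if (!res.ok) throw new Error(`client_secrets request failed: ${res.status}`);
  const data = (await res.json()) as { value?: unknown; expires_at?: unknown };
  if (typeof data.value !== "string" || typeof data.expires_at !== "number") {
    throw new Error("unexpected client_secrets response shape");
  }
  return { value: data.value, expiresAt: data.expires_at };
}

// Expiry check only. Secrets are also single-use, so mint a new one per session anyway.
function isUsable(secret: ClientSecret, nowMs = Date.now(), skewSeconds = 10): boolean {
  return secret.expiresAt - skewSeconds > nowMs / 1000;
}
```

In the browser, return only secret.value from your route and pass it to session.connect({ apiKey }) on a fresh RealtimeSession. In production, pass the global fetch as fetchImpl.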
Step 2: Create Your Avatar Component
Recommended: WebRTC (the session self-plays; tap its <audio> element):
Step 3: Advanced Features
- Natural lip sync: pass a stable naturalLipSyncConfig; presets are listed in Natural lip sync.
- Barge-in: session.on("audio_interrupted", () => player.stop()) (WebSocket), or stop playback on the WebRTC element.
- VAD: tune turn_detection in the token route (an OpenAI feature).
- Custom Rive inputs: the SDK only writes the mouth; drive gestures and outfits yourself, and detect available inputs via useMascotInputs().has(name) (see Rive co-existence).
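Barge-in works because a PCM player keeps a queue of scheduled audio that can be dropped all at once. The sketch below models only that scheduling logic, with no Web Audio calls, so it runs anywhere. PcmScheduleQueue is a hypothetical name and is not how createPCMStreamPlayer is implemented; in a real player each entry would correspond to an AudioBufferSourceNode started at startAt, and stop() would also stop those nodes.

```typescript
// Hypothetical model of back-to-back PCM16 chunk scheduling with barge-in.
interface ScheduledChunk {
  startAt: number;  // seconds on the audio clock
  duration: number; // seconds
}

class PcmScheduleQueue {
  private nextStart = 0;
  private chunks: ScheduledChunk[] = [];
  private readonly sampleRate: number;

  constructor(sampleRate = 24_000) {
    this.sampleRate = sampleRate;
  }

  // Schedule a chunk right after the previous one, or at `now` if the queue has drained.
  enqueue(byteLength: number, now: number): ScheduledChunk {
    // Forget chunks that have already finished playing.
    this.chunks = this.chunks.filter((c) => c.startAt + c.duration > now);
    const duration = byteLength / 2 / this.sampleRate; // 2 bytes per PCM16 sample
    const startAt = Math.max(now, this.nextStart);
    const chunk: ScheduledChunk = { startAt, duration };
    this.chunks.push(chunk);
    this.nextStart = startAt + duration;
    return chunk;
  }

  // Seconds of audio still queued after `now`.
  pendingSeconds(now: number): number {
    return Math.max(0, this.nextStart - now);
  }

  // Barge-in: drop everything queued so the next chunk starts immediately.
  stop(): void {
    this.chunks = [];
    this.nextStart = 0;
  }

  get size(): number {
    return this.chunks.length;
  }
}
```

Clearing the queue (rather than letting it drain) is what makes the avatar stop talking the moment the user interrupts; with the real SDK that is the session.on("audio_interrupted", () => player.stop()) wiring above.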
API Reference
The standard SDK surface plus OpenAI’s official SDK:
| Surface | Role |
|---|---|
| <MascotProvider apiKey> | Licensed avatar client (see Config). |
| <MascotProvider> / <Mascot src> / <MascotRive> | Load and render the avatar. |
| useMascotPlayback({ stream: true, enableNaturalLipSync }) | Mouth playback engine. |
| useLipsyncStream({ source: { kind: "mediaStream", stream } }) | Lip-syncs the tapped audio (see Reference). |
| createPCMStreamPlayer({ sampleRate: 24000 }) | WebSocket transport only: plays PCM and exposes the tap (see Reference). |
| RealtimeSession from @openai/agents-realtime | OpenAI’s official SDK, used unchanged, with the gpt-realtime model. |
OpenAI Realtime API Pricing for Voice Avatars
All-in Cost Per Hour (OpenAI + Mascot Bot)
OpenAI Realtime usage is billed by OpenAI according to its pricing. Mascot Bot meters usage against your plan’s speech-seconds or MAU allowance and adds no per-minute audio cost. Replaying a persisted timeline does not re-meter. Check current OpenAI Realtime pricing in the OpenAI documentation.
Mascot Bot vs HeyGen vs D-ID vs Synthesia for Interactive Avatars
Pre-rendered talking-head services (HeyGen, D-ID, Synthesia) generate video server-side and stream it back, which means higher latency, per-minute video cost, and no real-time control. Mascot Bot is a real-time alternative: a lightweight vector avatar lip-synced at up to 120fps, with no video pipeline and full control of every non-mouth animation through raw Rive.
Use Cases
AI Customer Service Avatar
A visible support assistant with an on-brand appearance.
ChatGPT Avatar for Your Product
Give your GPT-powered assistant a face that talks in real time.
Educational AI Tutor
Crisp articulation with the educational natural-lip-sync preset.
Voice AI Virtual Receptionist
A welcoming, branded front desk.
AI Mascot for Streaming & Content
A reactive on-screen character; non-mouth animation is yours to drive via raw Rive inputs.
Troubleshooting
Avatar Not Moving?
Confirm that status === "ready", that the tap stream is set (WebRTC: createElementTap(); WebSocket: player.outputStream), and that the Rive file uses the artboard Character and the state machine mascotStateMachine with inputs 100–118.
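The Rive requirements in this checklist can be turned into a preflight check. This is a sketch under two loud assumptions: RiveFileSummary is a hypothetical shape you would fill from your own tooling (it is not the Rive runtime API), and the mouth inputs are assumed to be named by their number, "100" through "118". At runtime, useMascotInputs().has(name) is the SDK’s way to test for an input.

```typescript
// Hypothetical summary of a Rive file; populate it from your own tooling.
interface RiveFileSummary {
  artboards: string[];
  stateMachines: Record<string, string[]>; // artboard name -> state machine names
  inputs: string[];                        // input names on mascotStateMachine
}

const REQUIRED_ARTBOARD = "Character";
const REQUIRED_STATE_MACHINE = "mascotStateMachine";
// Assumption: lip-sync inputs are named "100" through "118" (19 inputs, inclusive).
const REQUIRED_INPUTS = Array.from({ length: 19 }, (_, i) => String(100 + i));

// Returns a human-readable list of problems; empty means the file matches the checklist.
function riveProblems(file: RiveFileSummary): string[] {
  const problems: string[] = [];
  if (!file.artboards.includes(REQUIRED_ARTBOARD)) {
    problems.push(`missing artboard "${REQUIRED_ARTBOARD}"`);
  }
  const machines = file.stateMachines[REQUIRED_ARTBOARD] ?? [];
  if (!machines.includes(REQUIRED_STATE_MACHINE)) {
    problems.push(`missing state machine "${REQUIRED_STATE_MACHINE}" on "${REQUIRED_ARTBOARD}"`);
  }
  const missing = REQUIRED_INPUTS.filter((name) => !file.inputs.includes(name));
  if (missing.length > 0) problems.push(`missing inputs: ${missing.join(", ")}`);
  return problems;
}
```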
Only First Second of Speech Animated?
A non-stable naturalLipSyncConfig (a new object on every render) reinitializes playback; define it as a module-level constant (see Troubleshooting).
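Why an inline object causes this: React-style hooks compare their dependencies by reference (Object.is), not by value, so a literal written inside a component looks "changed" on every render. The sketch below shows the difference. The config field is a placeholder, not a real SDK option name; see Natural lip sync for the actual presets. The assumption that useMascotPlayback tracks its options this way is inferred from the symptom described above.

```typescript
// Placeholder config type; the real naturalLipSyncConfig fields are documented elsewhere.
type LipSyncConfig = Readonly<Record<string, number | string>>;

// Stable: created once at module load, so every render passes the same reference.
const NATURAL_LIPSYNC_CONFIG: LipSyncConfig = Object.freeze({ exampleOption: 0.5 });

// Mirrors how React hooks compare dependency arrays: element by element with Object.is.
function depsChanged(prev: readonly unknown[], next: readonly unknown[]): boolean {
  return prev.length !== next.length || prev.some((v, i) => !Object.is(v, next[i]));
}

// Unstable: an inline literal is a brand-new object on every call (i.e. every render).
function inlineConfig(): LipSyncConfig {
  return { exampleOption: 0.5 };
}
```

Two inline configs with identical contents still count as a change, which is what restarts playback after the first chunk; the module constant never does.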
Connection Fails on Second Call?
Client secrets are short-lived and single-use. Mint a fresh one per session.
No Audio Playing?
On WebSocket, create the player with createPCMStreamPlayer inside the user-gesture click handler, before any await. On WebRTC, make sure the supplied <audio> element is allowed to play (the click satisfies the browser autoplay policy).
“Invalid audio — empty bytes” Errors in Console?
On the WebSocket mic path, pass the Int16Array view itself to session.sendAudio (d.mono as unknown as ArrayBuffer), not d.mono.buffer: the backing buffer is pooled and longer than the view, so it serializes as empty bytes.
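The root cause is that a typed-array view does not own its backing buffer: view.buffer is the entire (possibly pooled) ArrayBuffer, not just the samples the view covers. The sketch shows the mismatch and a hypothetical exactBytes helper that copies only the view’s bytes. The guide’s fix is to pass the view itself; whether session.sendAudio also accepts such an exact-length copy is an assumption worth checking in your setup.

```typescript
// Copy exactly the bytes a view covers, ignoring the rest of its backing buffer.
function exactBytes(view: Int16Array): ArrayBuffer {
  return view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength) as ArrayBuffer;
}

// A pooled buffer: the recorder reuses one large ArrayBuffer and hands out views into it.
const pool = new ArrayBuffer(4096);
const mono = new Int16Array(pool, 1024, 480); // 480 samples = 20 ms at 24 kHz
mono[0] = 123;

// mono.buffer.byteLength is 4096 (the whole pool); exactBytes(mono).byteLength is 960.
```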
sendAudio Not Working?
Confirm the recorder sample rate matches the session (24 kHz in the example) and that you started recording after session.connect.
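If your capture path delivers Float32 audio at a different rate (48 kHz is common in browsers, but that is an assumption about your setup), you need to resample to the session’s 24 kHz and convert to Int16 before session.sendAudio. A minimal sketch with hypothetical helper names; linear interpolation is the simplest resampler and does no anti-alias filtering, so a production path may want a low-pass filter before downsampling.

```typescript
// Linear-interpolation resampler (no anti-alias filter).
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}

// Clamp to [-1, 1] and scale to signed 16-bit PCM.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

Scaling negatives by 0x8000 and positives by 0x7fff uses the full asymmetric Int16 range without overflowing at +1.0.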
FAQ
How Does Mascot Bot Work with the OpenAI Agents Realtime SDK?
Alongside it. You use @openai/agents-realtime as documented; the SDK lip-syncs the assistant’s audio in real time.
Does It Work With My Existing OpenAI Code?
Yes. Use @openai/agents-realtime as documented; the SDK turns the assistant’s audio into a real-time avatar.
Do I Modify My OpenAI Realtime Code?
No. Add an audio tap (WebRTC) or a PCM player (WebSocket) plus one useLipsyncStream call.
What OpenAI Models Support the Realtime API?
A Realtime model such as gpt-realtime, configured in the server token route.
How Does the SDK Connect to OpenAI?
You connect directly to OpenAI with an ephemeral client secret minted by your server; the SDK only taps the resulting audio.
Is Audio Sent to Mascot Bot?
Your users’ speech is processed by the SDK in their browser and isn’t sent to or stored on Mascot Bot servers.
Does Mascot Bot Support Both OpenAI and Gemini?
Yes. See the Gemini Live guide and the Realtime overview.
Is This an Open-Source Alternative to HeyGen Interactive Avatar?
Yes: it is a real-time alternative to server-rendered talking-head avatars.
Start Building with OpenAI Realtime API Avatar
- Live Demo: see it in action
- Demo Repository: complete working example
Next Steps
- Get a key at app.mascot.bot/api-keys and install from the private registry.
- Add the server token route and the avatar component above (WebRTC recommended).
- Tune motion with natural lip sync.
- Review the realtime overview and PCM stream player for the underlying pattern.