OpenAI Realtime API Avatar — Build an Interactive Lip-Synced AI Avatar

Add a lip-synced animated avatar to your OpenAI Realtime API application in minutes. Mascot Bot SDK works alongside the official OpenAI Agents Realtime SDK (@openai/agents-realtime), so your existing OpenAI code stays untouched: the SDK animates a real-time avatar from the assistant’s audio.

  • Quick Start: Add avatars in 5 minutes
  • Live Demo: See OpenAI Realtime avatar in action
  • GitHub Repo: Complete example code
  • Features: Real-time lip sync & more
  • API Reference: Complete hook documentation
  • Deploy: One-click deployment

Why Add an Avatar to Your OpenAI Realtime API App?

A voice-only ChatGPT-style assistant is invisible. A lip-synced avatar gives it a face and makes interactions feel human. The Mascot Bot SDK adds that without changing how you use OpenAI Realtime: keep @openai/agents-realtime, and the SDK lip-syncs the assistant’s audio in real time.

How It Works: Tap the Assistant Audio

OpenAI Realtime has two transports, and the integration differs only in how you obtain the assistant’s audio as a MediaStream:
  • WebRTC (recommended, cleanest). The session plays the assistant audio into an <audio> element you supply; you tap it with the SDK’s cross-browser createElementTap(). No extra SDK audio piece.
  • WebSocket. The session hands you raw PCM16 chunks and does not play them; createPCMStreamPlayer plays them and exposes the tap.
Either way: the browser connects to OpenAI with an ephemeral client secret minted server-side, and the SDK infers visemes from the assistant audio in real time. No Mascot Bot endpoint sits in the path. See Realtime overview.

Features

  Real-time Lip Sync for OpenAI Realtime API

Real-time viseme inference from the assistant’s audio — no server round-trip for visemes.

  120fps Avatar Animation

WebGL2 + Rive runtime for smooth facial motion.

  Native OpenAI Agents Realtime SDK Compatibility

Use @openai/agents-realtime exactly as documented. The SDK never proxies the OpenAI connection.

  Ephemeral Token Security

Mint a short-lived client_secret server-side via POST /v1/realtime/client_secrets. The standing OPENAI_API_KEY never reaches the browser.

  WebRTC or WebSocket Streaming

WebRTC self-plays (tap via the SDK’s createElementTap()); WebSocket hands you PCM (play + tap via createPCMStreamPlayer). Both end at one useLipsyncStream call.

  Natural Lip Sync Processing

Optional viseme post-processing for natural, non-robotic motion — Natural lip sync.

  Voice Activity Detection (VAD)

Configure server-side VAD (turn_detection: { type: "server_vad" }) in the token route — an OpenAI feature, unchanged by the SDK.

  Session Management

Re-mint a client secret per session. player.stop() / session.on("audio_interrupted") handle barge-in.

Quick Start

Installation

.npmrc
@mascotbot:registry=https://npm.mascot.bot/
//npm.mascot.bot/:_authToken=mascot_xxx

pnpm add @mascotbot/react @rive-app/react-webgl2 @rive-app/webgl2 @openai/agents-realtime

Get your Mascot Bot key at app.mascot.bot/api-keys. Full setup: Installation. @openai/agents-realtime is OpenAI’s official SDK, used unchanged.

Basic Integration

"use client";
import { MascotProvider } from "@mascotbot/react";
import { Mascot, MascotRive } from "@mascotbot/react/rive";

export default function App() {
  return (
    <MascotProvider apiKey="mascot_pub_…">
      <MascotProvider>
        <Mascot src="/mascot.riv">
          <MascotRive />
          <OpenAIAvatar />
        </Mascot>
      </MascotProvider>
    </MascotProvider>
  );
}
OpenAIAvatar is built in Step 2.

Complete Implementation Guide

Step 1: Set Up Ephemeral Token Generation (Server-Side)

// app/api/openai/token/route.ts
export const runtime = "nodejs";

export async function POST() {
  const key = process.env.OPENAI_API_KEY;
  // A missing key is a server misconfiguration, so report 500 rather than a client error.
  if (!key) return Response.json({ error: "OPENAI_API_KEY not set" }, { status: 500 });

  const model = "gpt-realtime";
  const res = await fetch("https://api.openai.com/v1/realtime/client_secrets", {
    method: "POST",
    headers: { Authorization: `Bearer ${key}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      session: {
        type: "realtime",
        model,
        output_modalities: ["audio"],
        audio: {
          input: { turn_detection: { type: "server_vad", threshold: 0.5, silence_duration_ms: 500 } },
          output: { voice: "marin" },
        },
      },
    }),
  });
  if (!res.ok) return Response.json({ error: `OpenAI ${res.status}` }, { status: 502 });
  const json = (await res.json()) as { value: string };
  return Response.json({ clientSecret: json.value, model });
}

Step 2: Create Your Avatar Component

Recommended — WebRTC (the session self-plays; tap its <audio>):
"use client";
import { useEffect, useRef, useState } from "react";
import { useMascot, createElementTap } from "@mascotbot/react";
import { useMascotPlayback, useLipsyncStream } from "@mascotbot/react/rive";

const LIP_SYNC = { minVisemeInterval: 60, mergeWindow: 80 } as const;

export function OpenAIAvatar() {
  const { client, status } = useMascot();
  const playback = useMascotPlayback({ stream: true, enableNaturalLipSync: true, naturalLipSyncConfig: LIP_SYNC });
  const [stream, setStream] = useState<MediaStream | null>(null);
  const teardownRef = useRef<null | (() => void)>(null);

  const { error, attached } = useLipsyncStream({ client, playback, source: { kind: "mediaStream", stream } });

  useEffect(() => () => teardownRef.current?.(), []);

  const connect = async () => {
    if (status !== "ready") return;
    const { clientSecret } = await (await fetch("/api/openai/token", { method: "POST" })).json();
    const { RealtimeAgent, RealtimeSession } = await import("@openai/agents-realtime");

    const audioEl = new Audio(); // supply our own so we can tap it
    const agent = new RealtimeAgent({ name: "Assistant", instructions: "Keep replies short." });
    const session = new RealtimeSession(agent, { transport: "webrtc" });
    await session.connect({ apiKey: clientSecret, audioElement: audioEl });

    // createElementTap from "@mascotbot/react" — cross-browser
    // tap (Safari has no captureStream). /realtime/overview#tap-a-playing-element
    const tap = createElementTap();
    setStream(tap.stream);
    tap.attach(audioEl);
    teardownRef.current = () => { tap.close(); void session.close(); };
  };

  return (
    <div>
      <button onClick={connect} disabled={status !== "ready"}>Connect</button>
      <span>{stream ? (attached ? "lip-sync attached" : "attaching…") : "idle"}</span>
      {error ? <p>{error.message}</p> : null}
    </div>
  );
}
Alternative — WebSocket (the session hands you raw PCM; play + tap it):
import { createPCMStreamPlayer } from "@mascotbot/react";
import { WavRecorder } from "wavtools";

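// inside connect(), reusing RealtimeSession and agent from the WebRTC example.
// `model` comes from the token route: destructure { clientSecret, model } from its JSON response.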
const player = createPCMStreamPlayer({ sampleRate: 24000 }); // create in the click, before await
setStream(player.outputStream);

const session = new RealtimeSession(agent, { transport: "websocket", model });
session.on("audio", (e: { data: ArrayBuffer }) => player.pushPCM16(new Uint8Array(e.data)));
session.on("audio_interrupted", () => player.stop());
await session.connect({ apiKey: clientSecret });

const recorder = new WavRecorder({ sampleRate: 24000 });
await recorder.begin();
// Pass the typed-array view itself (cast) — `.buffer` is over-long and yields "empty bytes".
await recorder.record((d: { mono: Int16Array }) => session.sendAudio(d.mono as unknown as ArrayBuffer));
On WebSocket, never feed a self-playing element through the player and never use both paths at once — that double-plays the voice. The PCM player is for the WebSocket transport only; WebRTC self-plays and is tapped directly.
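
The event wiring in the WebSocket snippet can be factored into one helper, which also makes it easy to test without a live session. This is a minimal sketch: `SessionEvents`, `AudioSessionLike`, `PcmPlayerLike`, and `wirePcmPlayback` are hypothetical names introduced here, and the two interfaces describe only the members the snippet above already uses.

```typescript
// Hypothetical structural types: only the members used in the WebSocket snippet.
type SessionEvents = {
  audio: (e: { data: ArrayBuffer }) => void;
  audio_interrupted: () => void;
};

interface AudioSessionLike {
  on<K extends keyof SessionEvents>(event: K, handler: SessionEvents[K]): void;
}

interface PcmPlayerLike {
  pushPCM16(chunk: Uint8Array): void;
  stop(): void;
}

// Forward assistant PCM chunks to the player and stop playback on barge-in.
function wirePcmPlayback(session: AudioSessionLike, player: PcmPlayerLike): void {
  session.on("audio", (e) => player.pushPCM16(new Uint8Array(e.data)));
  session.on("audio_interrupted", () => player.stop());
}
```

Call it once, before `session.connect`, so no early audio chunk is missed.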

Step 3: Advanced Features

  • Natural lip sync: keep naturalLipSyncConfig stable (a module constant); presets in Natural lip sync.
  • Barge-in: session.on("audio_interrupted", () => player.stop()) (WebSocket) or stop playback on the WebRTC element.
  • VAD: tune turn_detection in the token route (OpenAI feature).
  • Custom Rive inputs: the SDK only writes the mouth; drive gestures/outfits yourself, detect via useMascotInputs().has(name) (Rive co-existence).
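
To make the VAD tuning concrete, the request body from the Step 1 token route can be built by a small function with the two VAD values exposed. This is a sketch, not part of either SDK: `buildSessionBody` and `VadOptions` are hypothetical names, the field names mirror the Step 1 route, and the defaults are the values used there.

```typescript
// Hypothetical options type: the two server_vad values the Step 1 route sets.
interface VadOptions {
  threshold?: number;         // activation threshold, 0.5 in Step 1
  silenceDurationMs?: number; // pause length that ends a turn, 500 in Step 1
}

// Builds the JSON body for POST /v1/realtime/client_secrets, as in Step 1.
function buildSessionBody(model: string, voice: string, vad: VadOptions = {}) {
  return {
    session: {
      type: "realtime",
      model,
      output_modalities: ["audio"],
      audio: {
        input: {
          turn_detection: {
            type: "server_vad",
            threshold: vad.threshold ?? 0.5,
            silence_duration_ms: vad.silenceDurationMs ?? 500,
          },
        },
        output: { voice },
      },
    },
  };
}

// Example: tolerate longer pauses before the assistant takes its turn.
const body = JSON.stringify(
  buildSessionBody("gpt-realtime", "marin", { silenceDurationMs: 800 }),
);
```

In the token route, `body` would replace the inline `JSON.stringify({ ... })` call; everything else stays the same.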

API Reference

The standard SDK surface plus OpenAI’s official SDK:
| Surface | Role |
| --- | --- |
| <MascotProvider apiKey> | Licensed avatar client. Config. |
| <MascotProvider> / <Mascot src> / <MascotRive> | Load and render the avatar. |
| useMascotPlayback({ stream: true, enableNaturalLipSync }) | Mouth playback engine. |
| useLipsyncStream({ source: { kind: "mediaStream", stream } }) | Lip-syncs the tapped audio. Reference. |
| createPCMStreamPlayer({ sampleRate: 24000 }) | WebSocket transport only: plays PCM and exposes the tap. Reference. |
| @openai/agents-realtime RealtimeSession | OpenAI’s official SDK, unchanged. Model gpt-realtime. |

OpenAI Realtime API Pricing for Voice Avatars

All-in Cost Per Hour (OpenAI + Mascot Bot)

OpenAI Realtime usage is billed by OpenAI per their pricing. Mascot Bot meters by your plan’s speech-seconds or MAU allowance and adds no per-minute audio cost. Replaying a persisted timeline does not re-meter. Check current OpenAI Realtime pricing in the OpenAI documentation.

MascotBot vs HeyGen vs D-ID vs Synthesia for Interactive Avatars

Pre-rendered talking-head services (HeyGen, D-ID, Synthesia) generate video server-side and stream it back — higher latency, per-minute video cost, and no real-time control. Mascot Bot is a real-time alternative: a lightweight vector avatar lip-synced at up to 120fps, no video pipeline, and full control of every non-mouth animation through raw Rive.

Use Cases

AI Customer Service Avatar

A visible support assistant with on-brand appearance.

ChatGPT Avatar for Your Product

Give your GPT-powered assistant a face that talks in real time.

Educational AI Tutor

Crisp articulation with the educational natural-lip-sync preset.

Voice AI Virtual Receptionist

A welcoming branded front desk.

AI Mascot for Streaming & Content

A reactive on-screen character; non-mouth animation is yours via raw Rive inputs.

Troubleshooting

Avatar Not Moving?

Confirm status === "ready", the tap stream is set (WebRTC: createElementTap(); WebSocket: player.outputStream), and the Rive file uses artboard Character + state machine mascotStateMachine with inputs 100–118.

Only First Second of Speech Animated?

Non-stable naturalLipSyncConfig reinitializes playback — use a module constant (Troubleshooting).
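
The difference between a module constant and an inline object comes down to reference identity. The sketch below simulates a few renders in plain TypeScript. It assumes the consumer compares the config by reference, which is what the reinitialization symptom suggests but is not confirmed here; `inlineConfig` and `countConfigChanges` are hypothetical names for illustration.

```typescript
type LipSyncConfig = { minVisemeInterval: number; mergeWindow: number };

// Stable: created once at module load, so every render sees the same reference.
const LIP_SYNC: LipSyncConfig = { minVisemeInterval: 60, mergeWindow: 80 };

// Unstable: returns a new object on every call, like an inline literal in a component body.
function inlineConfig(): LipSyncConfig {
  return { minVisemeInterval: 60, mergeWindow: 80 };
}

// Counts how often a reference-comparing consumer would see a "new" config
// across a number of simulated renders.
function countConfigChanges(getConfig: () => LipSyncConfig, renders: number): number {
  let previous: LipSyncConfig | undefined;
  let changes = 0;
  for (let i = 0; i < renders; i++) {
    const next = getConfig();
    if (previous !== undefined && !Object.is(previous, next)) changes++;
    previous = next;
  }
  return changes;
}

const stableChanges = countConfigChanges(() => LIP_SYNC, 5); // 0
const inlineChanges = countConfigChanges(inlineConfig, 5);   // 4
```

The values are identical in both cases; only the object identity differs, which is why the inline form looks like a config change on every render after the first.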

Connection Fails on Second Call?

Client secrets are short-lived/single-use. Mint a fresh one per session.
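
One way to make "a fresh secret per session" hard to get wrong is to fetch it through a helper that is called at the start of every connect and fails loudly when the route does not return one. This is a minimal sketch: `mintClientSecret` and `FetchLike` are hypothetical names, and the route path and the `clientSecret` field are the ones used in Step 1.

```typescript
// Minimal fetch shape so the sketch type-checks without DOM typings.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Requests a fresh client secret from the Step 1 token route.
// Call it once per session; never cache or reuse the result.
async function mintClientSecret(
  fetchImpl: FetchLike,
  route = "/api/openai/token",
): Promise<string> {
  const res = await fetchImpl(route, { method: "POST" });
  if (!res.ok) throw new Error(`Token route failed with HTTP ${res.status}`);
  const body = (await res.json()) as { clientSecret?: unknown };
  if (typeof body.clientSecret !== "string" || body.clientSecret.length === 0) {
    throw new Error("Token route returned no clientSecret");
  }
  return body.clientSecret;
}
```

Inside `connect()`, `await mintClientSecret(fetch)` would take the place of the inline fetch call, so a failed mint surfaces as an error instead of an undefined secret being passed to `session.connect`.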

No Audio Playing?

On WebSocket, createPCMStreamPlayer must be created inside the user-gesture click before any await. On WebRTC, ensure the supplied <audio> element is allowed to play (the click satisfies autoplay).

"Invalid audio — empty bytes" Errors in Console?

On the WebSocket mic path, pass the Int16Array view itself to session.sendAudio (d.mono as unknown as ArrayBuffer), not d.mono.buffer — the backing buffer is pooled/over-long and serializes as empty bytes.
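
Why a view and its backing buffer can differ is standard typed-array behavior and can be shown without any audio library. The sketch below builds a small `Int16Array` inside a larger buffer, standing in for a pooled recorder buffer. `exactBytes` is a hypothetical helper that copies only the bytes the view covers; whether `session.sendAudio` treats such a copy the same as the view itself is an assumption to verify, and the cast shown above remains the documented fix.

```typescript
// A 4-sample view placed inside a larger 64-byte buffer (a stand-in for a pooled buffer).
const pool = new ArrayBuffer(64);
const mono = new Int16Array(pool, 16, 4); // starts at byte 16, covers 4 samples = 8 bytes
mono.set([100, -100, 200, -200]);

const viewBytes = mono.byteLength;           // 8: the audio actually captured
const backingBytes = mono.buffer.byteLength; // 64: the whole backing buffer

// Copies exactly the bytes the view covers into a new, tightly sized ArrayBuffer.
function exactBytes(view: Int16Array): ArrayBuffer {
  return view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength) as ArrayBuffer;
}

const copy = exactBytes(mono);
const copiedSamples = Array.from(new Int16Array(copy)); // [100, -100, 200, -200]
```

Passing `mono.buffer` here would hand over all 64 bytes, of which only 8 are audio, which is the mismatch behind the error.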

sendAudio Not Working?

Confirm the recorder sample rate matches the session (24 kHz in the example) and that you started recording after session.connect.
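
A quick way to sanity-check sample rates is to compute how much time a chunk represents. The sketch below assumes mono PCM16 (2 bytes per sample), as in the WebSocket example; `pcm16DurationMs` is a hypothetical helper, not an SDK function.

```typescript
// Duration in milliseconds of a mono PCM16 chunk (2 bytes per sample).
function pcm16DurationMs(byteLength: number, sampleRate: number): number {
  const samples = byteLength / 2;
  return (samples * 1000) / sampleRate;
}

const at24k = pcm16DurationMs(4800, 24000); // 100
const at48k = pcm16DurationMs(4800, 48000); // 50
```

The same 4800 bytes cover 100 ms at 24 kHz but only 50 ms at 48 kHz, so a recorder and session that disagree on the rate will disagree on timing and pitch even though the bytes arrive intact.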

FAQ

How Does Mascot Bot Work with the OpenAI Agents Realtime SDK?

Alongside it. You use @openai/agents-realtime as documented; the SDK lip-syncs the assistant’s audio in real time.

Does It Work With My Existing OpenAI Code?

Yes. Use @openai/agents-realtime as documented; the SDK turns the assistant’s audio into a real-time avatar.

Do I Modify My OpenAI Realtime Code?

No. Add an audio tap (WebRTC) or a PCM player (WebSocket) plus one useLipsyncStream call.

What OpenAI Models Support the Realtime API?

A Realtime model such as gpt-realtime, configured in the server token route.

How Does the SDK Connect to OpenAI?

You connect directly to OpenAI with an ephemeral client secret minted by your server; the SDK only taps the resulting audio.

Is Audio Sent to Mascot Bot?

No. Your users’ speech is processed by the SDK in their browser and isn’t sent to or stored on Mascot Bot servers.

Does Mascot Bot Support Both OpenAI and Gemini?

Yes — see the Gemini Live guide and the Realtime overview.

Is This an Open-Source Alternative to HeyGen Interactive Avatar?

Yes — a real-time alternative to server-rendered talking-head avatars.

Start Building with OpenAI Realtime API Avatar

Live Demo

See it in action

Demo Repository

Complete working example

Next Steps

  1. Get a key at app.mascot.bot/api-keys and install from the private registry.
  2. Add the server token route and the avatar component above (WebRTC recommended).
  3. Tune motion with natural lip sync.
  4. Review the realtime overview and PCM stream player for the underlying pattern.