> ## Documentation Index
> Fetch the complete documentation index at: https://docs.mascot.bot/llms.txt
> Use this file to discover all available pages before exploring further.

# API Reference

> Viseme Prediction API for facial animation and speech synthesis

Welcome to the Mascotbot Viseme Prediction API documentation. This API provides three main capabilities for creating synchronized facial animation:

## Available Endpoints

### `/v1/visemes` - Process Audio for Visemes

Process existing audio files to generate viseme predictions for facial animation. Ideal when you already have audio and need synchronized mouth movements.

### `/v1/visemes-audio` - Generate Speech and Visemes

Convert text to speech while simultaneously generating viseme predictions. Supports multiple TTS engines including ElevenLabs and Cartesia for high-quality voice synthesis.

### `/v1/get-signed-url` - Generate a signed URL for conversational AI with visemes

Get a temporary signed URL (valid for 10 minutes) for connecting to a conversational AI proxy. The proxy supports ElevenLabs Conversational AI, Gemini Live API, and OpenAI Realtime API, and enriches the audio stream with synchronized viseme data for avatar lip sync.
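Because a signed URL stops working 10 minutes after it is issued, a client that caches it should check its age before opening a connection. The sketch below is a hypothetical client-side helper, not part of the Mascotbot API: it assumes the client records the local time when it receives the URL (the response's own expiry fields, if any, are not described here), and it keeps a 30-second safety margin, which is an arbitrary choice.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Lifetime stated in the docs: signed URLs expire 10 minutes after issuance.
SIGNED_URL_TTL_SECONDS = 10 * 60


@dataclass
class SignedUrl:
    """Hypothetical wrapper around a signed URL returned by /v1/get-signed-url."""

    url: str
    # Local receive time in seconds; an assumption standing in for any
    # server-provided expiry timestamp.
    issued_at: float = field(default_factory=time.time)

    def is_usable(self, now: Optional[float] = None, margin_seconds: float = 30.0) -> bool:
        """Return True if the URL should still be valid, minus a safety margin."""
        current = time.time() if now is None else now
        return current < self.issued_at + SIGNED_URL_TTL_SECONDS - margin_seconds
```

When `is_usable()` returns `False`, request a fresh URL from `/v1/get-signed-url` instead of attempting a connection that is likely to be rejected.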

## Language-Specific Viseme Models

All endpoints support an optional `model` parameter to select a language-optimized viseme prediction model. This improves lip sync accuracy for non-English content.

| Model      | Language         | Parameter Value       |
| ---------- | ---------------- | --------------------- |
| Default    | English          | `"default"` (or omit) |
| Indonesian | Bahasa Indonesia | `"indonesian"`        |

Pass `"model": "indonesian"` in the request body (for REST endpoints) or `viseme_model=indonesian` as a query parameter (for WebSocket endpoints). If omitted, the default English model is used.
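The two ways of selecting a model can be sketched as small helpers: one adds `model` to a REST request body, the other appends `viseme_model` to a WebSocket URL while preserving any query parameters already present (for example, those in a signed URL). The function names, the `text` body field, and the `wss://example.invalid` URLs are hypothetical placeholders; only the `model` and `viseme_model` parameter names come from this page.

```python
from typing import Any, Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_viseme_model(body: Dict[str, Any], model: Optional[str]) -> Dict[str, Any]:
    """Return a copy of a REST request body with the optional `model` field set.

    Passing None leaves the field out, which selects the default English model.
    """
    out = dict(body)
    if model is not None:
        out["model"] = model
    return out


def add_viseme_model_query(ws_url: str, model: Optional[str]) -> str:
    """Append `viseme_model=<model>` to a WebSocket URL, keeping existing params."""
    if model is None:
        return ws_url
    parts = urlsplit(ws_url)
    # Drop any previous viseme_model value so the parameter appears only once.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "viseme_model"]
    query.append(("viseme_model", model))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `add_viseme_model_query("wss://example.invalid/ws?token=abc", "indonesian")` yields `wss://example.invalid/ws?token=abc&viseme_model=indonesian`.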

## Real-time Streaming

The `/v1/visemes` and `/v1/visemes-audio` endpoints both stream their responses using Server-Sent Events (SSE), enabling low-latency playback and immediate visual feedback.
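On the client side, an SSE response is a sequence of text lines grouped into events by blank lines. The sketch below follows the generic SSE wire format (`data:` lines are joined with newlines, `event:` sets the event type, lines starting with `:` are comments, and an unterminated event at end of stream is discarded). It assumes, without confirmation from this page, that each event's `data` is a JSON document; consult the endpoint pages for the actual payload fields.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group raw SSE lines into {"event", "data"} dicts per the SSE wire format."""
    event_type = "message"
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the pending event, if it carried data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional space after the colon
        if name == "data":
            data_lines.append(value)
        elif name == "event":
            event_type = value
    # Any event not terminated by a blank line is discarded, as the format specifies.


def iter_json_payloads(lines: Iterable[str]) -> Iterator[Any]:
    """Decode each event's data as JSON (assumes JSON payloads)."""
    for event in iter_sse_events(lines):
        yield json.loads(event["data"])
```

With the `requests` library, for example, a streamed response (`stream=True`) can be fed in via `response.iter_lines(decode_unicode=True)`, which yields decoded lines without their line endings.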

<CardGroup cols={2}>
  <Card title="Process Audio for Visemes" icon="waveform" href="/api-reference/endpoint/post">
    Convert existing audio files to viseme predictions for facial animation
  </Card>

  <Card title="Generate Speech and Visemes" icon="microphone" href="/api-reference/endpoint/visemes-audio">
    Convert text to speech with synchronized viseme generation using multiple TTS engines
  </Card>

  <Card title="ElevenLabs Conversational AI" icon="comments" href="/api-reference/endpoint/get-signed-url">
    Use your existing ElevenLabs Conversational AI Agent with a viseme stream added on top.
  </Card>

  <Card title="Gemini Live API Avatar" icon="google" href="/libraries/gemini-live-api-avatar">
    Build interactive AI avatars with Gemini Live API and real-time lip sync.
  </Card>
</CardGroup>
