Converts text to speech using various TTS engines and generates synchronized viseme data for facial animation. Returns real-time streaming data via Server-Sent Events (SSE) for low-latency playback.
Supports multiple TTS engines: mascotbot-tts, elevenlabs, and cartesia.
The response streams both audio chunks (base64-encoded PCM) and viseme timing data.
Use the model parameter to select a viseme model optimized for your content's language:
- "default" — English (used when model is omitted)
- "indonesian" — Bahasa Indonesia

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Text to synthesize into speech
5000"Hello world! How are you today?"
Voice ID to use for synthesis
"N2lVS1w4EtoT3dr4eOWO"
TTS engine to use
Available options: mascotbot-tts, elevenlabs, cartesia. Example: "elevenlabs"
API key for external TTS engines (required for elevenlabs/cartesia)
"sk_your_elevenlabs_api_key_here"
Speech speed multiplier (e.g., 1.0 = normal, 1.2 = 20% faster)
Required range: 0.5 <= x <= 2. Example: 1.1
Viseme model to use for prediction. Different models are optimized for different languages.
Available models: default (English), indonesian (Bahasa Indonesia).
Available options: default, indonesian. Example: "default"
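Putting the parameters above together, a request body might look like the following sketch. The JSON field names (text, voice_id, engine, api_key, speed, model) are assumptions inferred from the parameter descriptions, not confirmed wire names.

```python
import json

# Hypothetical request body assembled from the documented parameters.
# Field names are assumed; check the actual endpoint schema.
payload = {
    "text": "Hello world! How are you today?",    # max length 5000
    "voice_id": "N2lVS1w4EtoT3dr4eOWO",           # voice ID for synthesis
    "engine": "elevenlabs",                       # mascotbot-tts | elevenlabs | cartesia
    "api_key": "sk_your_elevenlabs_api_key_here", # required for elevenlabs/cartesia
    "speed": 1.1,                                 # 0.5 <= x <= 2, 1.0 = normal
    "model": "default",                           # default | indonesian
}
body = json.dumps(payload)
```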
Successful response streams synchronized audio and viseme data using Server-Sent Events.
The stream contains alternating audio and visemes events:
Events are delivered in real-time with typical first chunk latency of 200-500ms.
Server-Sent Events stream where each event is a "data: " prefix followed by JSON.
Event types: audio (a base64-encoded PCM chunk) and visemes (viseme timing data).
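A minimal client-side sketch of consuming this stream: split off the "data: " prefix, parse the JSON, and base64-decode audio chunks. The sample lines and field names ("type", "data", "visemes") imitate the stream and are assumptions; the real payload keys may differ.

```python
import base64
import json

# Simulated SSE lines in the documented "data: <JSON>" shape.
# Field names inside the JSON are assumed for illustration.
sample_stream = [
    'data: {"type": "audio", "data": "' + base64.b64encode(b"\x00\x01").decode() + '"}',
    'data: {"type": "visemes", "visemes": [{"viseme": "A", "time": 0.12}]}',
]

def parse_sse_line(line: str):
    """Return the decoded JSON payload of a 'data: ' line, else None."""
    if not line.startswith("data: "):
        return None
    return json.loads(line[len("data: "):])

audio_chunks, viseme_events = [], []
for line in sample_stream:
    event = parse_sse_line(line)
    if event is None:
        continue
    if event["type"] == "audio":
        audio_chunks.append(base64.b64decode(event["data"]))  # raw PCM bytes
    elif event["type"] == "visemes":
        viseme_events.append(event["visemes"])
```

In a real client the same loop would run over the HTTP response as lines arrive, feeding PCM bytes to the audio player and viseme timings to the animation layer.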