Takes base64-encoded audio and streams viseme (mouth shape) predictions using Server-Sent Events (SSE). Requires a valid API key.
Use the model parameter to select a viseme model optimized for your audio’s language:
"default": English (used when model is omitted)
"indonesian": Bahasa Indonesia
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Base64-encoded audio data
"UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA="
Audio sample rate in Hz
16000
Viseme model to use for prediction. Different models are optimized for different languages.
Available models: default (English), indonesian (Bahasa Indonesia).
Allowed values: default, indonesian
"default"
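The request parameters above can be assembled roughly as follows. This is a minimal sketch, not the API's official client: the body field names `audio` and `sample_rate` and the helper `build_request` are assumptions made for illustration, since this page only confirms the `model` parameter name, its two values, and the `Bearer <token>` header form. How the body is serialized and which URL it is sent to are not covered here.

```python
import base64
from typing import Dict, Tuple

# The two model names come from the parameter description above.
ALLOWED_MODELS = ("default", "indonesian")


def build_request(
    audio_bytes: bytes,
    token: str,
    sample_rate: int = 16000,
    model: str = "default",
) -> Tuple[Dict[str, str], Dict[str, object]]:
    """Return (headers, body) for a viseme prediction request.

    Hypothetical helper: "audio" and "sample_rate" are assumed field
    names; check the API schema for the real ones.
    """
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}, got {model!r}")

    # Bearer authentication header of the form "Bearer <token>".
    headers = {"Authorization": f"Bearer {token}"}

    body = {
        # Raw audio bytes are base64-encoded and sent as ASCII text.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),  # assumed name
        "sample_rate": sample_rate,  # assumed name; sample rate in Hz
        "model": model,
    }
    return headers, body
```

The returned pair can be handed to whichever HTTP client you use. Validating `model` locally catches a misspelled model name before any audio is uploaded.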
A successful response streams viseme predictions using Server-Sent Events. Each event contains viseme data for one processed audio chunk.
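On the client side, a streamed response like this is read event by event. The sketch below follows the generic Server-Sent Events framing (`data:` lines, a blank line ending each event, `:` comment lines) and makes no assumption about what each event's payload contains; the function name `iter_sse_data` is hypothetical, and the payload is returned as an opaque string for the caller to decode.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE text stream.

    `lines` is any iterable of decoded text lines, for example the
    line iterator of a streaming HTTP response.
    """
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive; ignore it.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # A single leading space after the colon is not part of the value.
            value = value[1:]
        if field == "data":
            # Multiple data lines in one event are joined with newlines.
            data_parts.append(value)
    # An event not yet terminated by a blank line is discarded.
```

Because the function is a generator, each payload is available as soon as its event is complete, so visemes can be applied while later audio chunks are still being processed.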
Server-Sent Events stream with the following format: