API Documentation
Viseme Prediction API for facial animation and speech synthesis
Welcome to the Mascotbot Viseme Prediction API documentation. This API provides two main capabilities for creating synchronized facial animations:
Available Endpoints
/v1/visemes - Process Audio for Visemes
Process existing audio files to generate viseme predictions for facial animation. Ideal when you already have audio and need synchronized mouth movements.
/v1/visemes-audio - Generate Speech and Visemes
Convert text to speech while simultaneously generating viseme predictions. Supports multiple TTS engines, including ElevenLabs and Cartesia, for high-quality voice synthesis.
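Both endpoints return viseme predictions that must stay in step with audio playback. On the client, this usually means looking up which viseme is active at the current playback position on every render frame. The following is a minimal sketch of that lookup using binary search. It assumes, hypothetically, that each prediction can be reduced to a start offset in milliseconds and an integer viseme ID. The class name VisemeTimeline and its methods are illustrative only, and the actual fields and units returned by the API are not specified here.

```python
from bisect import bisect_right
from typing import List, Optional


class VisemeTimeline:
    """Hypothetical client-side store of (start_ms, viseme_id) cues.

    The cue shape is an assumption for illustration; adapt it to the
    fields the API actually returns.
    """

    def __init__(self) -> None:
        self._starts: List[float] = []   # cue start times, kept sorted
        self._visemes: List[int] = []    # viseme IDs, parallel to _starts

    def add(self, start_ms: float, viseme_id: int) -> None:
        # Streamed cues usually arrive in time order, but inserting at the
        # bisect position keeps the lists sorted even if they do not.
        i = bisect_right(self._starts, start_ms)
        self._starts.insert(i, start_ms)
        self._visemes.insert(i, viseme_id)

    def at(self, playback_ms: float) -> Optional[int]:
        # The active viseme is the one with the latest start <= playback_ms.
        i = bisect_right(self._starts, playback_ms)
        return self._visemes[i - 1] if i else None  # None before the first cue
```

Because cues can be added one at a time, the same structure works whether predictions arrive all at once or incrementally while audio is already playing.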
Real-time Streaming
Both endpoints use Server-Sent Events (SSE) to provide real-time streaming responses, enabling low-latency playback and immediate visual feedback.
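Because results are delivered as Server-Sent Events, a client must split the streamed response body into individual events before it can use them. The following is a minimal SSE line parser in Python that follows the event-stream rules of the WHATWG HTML specification: blank lines dispatch an event, lines starting with a colon are comments, and multiple data lines are joined with newlines. It makes no assumption about what each event's data payload contains. The event name and JSON payload in the sample stream are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class SSEEvent:
    event: str = "message"   # SSE default type when no "event:" field is sent
    data: str = ""
    id: Optional[str] = None


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from an iterable of decoded text lines of an SSE stream."""
    event_type = ""
    data_lines: List[str] = []
    last_id: Optional[str] = None  # per the spec, the last event ID persists

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event if it has any data.
            if data_lines:
                yield SSEEvent(event=event_type or "message",
                               data="\n".join(data_lines),
                               id=last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often sent as a keep-alive
        name, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # strip one leading space after the colon
        if name == "event":
            event_type = value
        elif name == "data":
            data_lines.append(value)
        elif name == "id":
            last_id = value
        # "retry" and unknown fields are ignored in this sketch.
    # An event not terminated by a blank line is discarded, as the spec requires.


if __name__ == "__main__":
    # Hypothetical stream content; real event names and payloads may differ.
    sample = [": keep-alive", "event: visemes", 'data: {"viseme": 3}', ""]
    for ev in parse_sse(sample):
        print(ev.event, json.loads(ev.data))
```

In a real client, the lines would come from the streaming HTTP response body, decoded as UTF-8 and read incrementally so each event can be handled as soon as it arrives.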