Welcome to the Mascotbot Viseme Prediction API documentation. This API provides two main capabilities for creating synchronized facial animations:

Available Endpoints

/v1/visemes - Process Audio for Visemes

Process existing audio files to generate viseme predictions for facial animation. Ideal when you already have audio and need synchronized mouth movements.

/v1/visemes-audio - Generate Speech and Visemes

Convert text to speech while simultaneously generating viseme predictions. Supports multiple TTS engines including ElevenLabs and Cartesia for high-quality voice synthesis.

Real-time Streaming

Both endpoints use Server-Sent Events (SSE) to provide real-time streaming responses, enabling low-latency playback and immediate visual feedback.