Process existing audio files to generate viseme predictions for facial animation. Ideal when you already have audio and need synchronized mouth movements.
Convert text to speech while simultaneously generating viseme predictions. Supports multiple TTS engines, including ElevenLabs and Cartesia, for voice synthesis.
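
In either workflow, the viseme predictions have to be lined up with audio playback to drive the mouth. The sketch below shows one way that lookup might work, assuming the predictions arrive as a time-sorted list of (start time, viseme label) pairs. The `VisemeEvent` type, the `active_viseme` helper, and the labels `"sil"`, `"PP"`, and `"aa"` are hypothetical names chosen for illustration, not this product's API or output format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class VisemeEvent:
    """Hypothetical container for one viseme prediction."""
    start_s: float  # time (seconds) at which this viseme becomes active
    viseme: str     # illustrative label, e.g. "sil", "PP", "aa"


def active_viseme(events, t, default="sil"):
    """Return the label of the viseme active at playback time t (seconds).

    Assumes `events` is sorted by start_s. Each viseme is treated as
    active until the next event starts.
    """
    starts = [e.start_s for e in events]
    i = bisect_right(starts, t) - 1  # last event starting at or before t
    return events[i].viseme if i >= 0 else default


# Made-up timeline for illustration only.
timeline = [
    VisemeEvent(0.00, "sil"),
    VisemeEvent(0.12, "PP"),
    VisemeEvent(0.20, "aa"),
    VisemeEvent(0.45, "sil"),
]
```

An animation loop could call `active_viseme(timeline, playback_time)` once per frame and map the returned label to a mouth shape. Blending between neighboring visemes is left out to keep the sketch short.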