New: Simplified Integration Available
For most use cases, we recommend using our /v1/visemes-audio
endpoint which handles both text-to-speech and viseme generation in a single streaming call. This guide shows the traditional approach for custom TTS integration.
In this guide, we’ll explore how developers can use our lipsync endpoint to connect any Text-to-Speech (TTS) service and animate a mascot using our lipsync/visemes API. This involves setting up a server route to orchestrate the TTS and lipsync API and then playing it on the front end.
Elevenlabs Voice Over Integration
This demo shows how to use the lipsync API with Elevenlabs’ TTS service to animate a mascot. The integration synchronizes audio and viseme data, bringing the mascot to life with any text input. Below is an interactive example of this integration so you could try it in action.
This example server route waits for the full completion of both audio and lipsync, as it is a basic example. For faster text-to-playback, you can implement playback on a client as soon as the API returns the first viseme/lipsync and audio data chunks
How this demo works
Here’s a high-level overview of the system and data flow:
- Client-Side: The user inputs text, which is sent to the server.
- Server-Side: The server processes the text using a TTS service to generate audio and sends it to the lipsync API to get visemes.
- Client-Side: The audio and visemes are sent back to the client, where they are used to animate the mascot.
Step-by-Step Implementation
1. Setting Up the Server Route
Create a server route to handle the TTS and lipsync API requests. Here’s a simplified example using Node.js:
import { NextRequest, NextResponse } from "next/server";
import { ElevenLabsClient } from "elevenlabs";
import { createWavBufferFromPCM } from "./utils";
const ELEVENLABS_API_KEY = "your-elevenlabs-api-key";
const VOICE_ID = "your-voice-id";
const MASCOTBOT_API_KEY = "your-mascotbot-api-key";
const elevenlabs = new ElevenLabsClient({
apiKey: ELEVENLABS_API_KEY,
});
export async function POST(req: NextRequest) {
try {
// Parse the incoming request to get the text
const { text } = await req.json();
// Convert text to speech using ElevenLabs TTS service
const rawPcmBase64 = (await elevenlabs.textToSpeech.convertWithTimestamps(VOICE_ID, {
output_format: "pcm_16000",
text,
})) as { audio_base64: string };
// Convert the base64 PCM audio to a WAV buffer
const pcmBuffer = Buffer.from(rawPcmBase64.audio_base64, "base64");
const wavBuffer = createWavBufferFromPCM(pcmBuffer, 16000, 16, 1);
// Send the WAV audio to the MascotBot API to get visemes
const response = await fetch("https://api.mascot.bot/v1/visemes", {
method: "POST",
body: JSON.stringify({
audio: wavBuffer.toString("base64"),
sample_rate: 16000,
}),
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${MASCOTBOT_API_KEY}`,
},
});
// Parse the response to get visemes data
const visemes = await response.json();
// Return the audio and visemes to the client
return NextResponse.json({
audio: wavBuffer.toString("base64"),
visemes,
});
} catch (error) {
// Handle errors and return a 500 status
return NextResponse.json({ message: "Internal Server Error" }, { status: 500 });
}
}
2. Client-Side Integration
On the client side, use the server route to fetch audio and visemes, then play them using a mascotbot SDK animation library.
import { useRef, useState } from "react";
import { useMascotPlayback } from "@mascotbot-sdk/react";
import { Button, Textarea } from "@mascotbot/ui";
const API_DOMAIN = "https://your-api-domain.com";
export default function VoiceOverSidebar() {
const mascotPlayback = useMascotPlayback();
const [text, setText] = useState<string>("");
const audioRef = useRef<HTMLAudioElement>(null);
return (
<div>
{/* Audio element to play the generated audio */}
<audio ref={audioRef} playsInline />
{/* Textarea for user input */}
<Textarea value={text} onChange={({ target }) => setText(target.value)} />
{/* Button to trigger the TTS and lipsync process */}
<Button
onClick={async () => {
// Send the text to the server to get audio and visemes
const response = await fetch(`${API_DOMAIN}/api/voice-over`, {
method: "POST",
body: JSON.stringify({ text }),
});
// Parse the response to get audio and visemes
const { audio, visemes } = await response.json();
// Convert the base64 audio to a Blob and create a URL
const audioBlob = new Blob([audio], { type: "audio/wav" });
const audioUrl = URL.createObjectURL(audioBlob);
// Add visemes to the mascot playback
mascotPlayback.add(visemes);
// Play the audio and animate the mascot
if (audioRef.current) {
audioRef.current.src = audioUrl;
audioRef.current.oncanplay = () => {
mascotPlayback.play();
audioRef.current.play();
};
audioRef.current.onended = () => mascotPlayback.reset();
}
}}
>
Play
</Button>
</div>
);
}
Conclusion
By following this guide, you can integrate any TTS service with our lipsync API to animate mascots on your platform. This setup allows for dynamic and engaging user experiences by synchronizing audio with visual animations.
You’ll get a working example once subscribed to one of our paid plans.