In this guide, we’ll show how to connect any Text-to-Speech (TTS) service to our lipsync (visemes) API to animate a mascot. You set up a server route that orchestrates the TTS and lipsync calls, then play the resulting audio and visemes on the front end.

ElevenLabs Voice Over Integration

This demo shows how to use the lipsync API with ElevenLabs’ TTS service to animate a mascot. The integration synchronizes audio and viseme data, bringing the mascot to life with any text input. Below is an interactive example of this integration so you can try it in action.

Because this is a basic example, the server route waits until both the audio and the lipsync data are complete before responding. For faster text-to-playback, you can start playback on the client as soon as the API returns the first audio and viseme chunks.

How this demo works

Here’s a high-level overview of the system and data flow:

  1. Client-Side: The user inputs text, which is sent to the server.
  2. Server-Side: The server processes the text using a TTS service to generate audio and sends it to the lipsync API to get visemes.
  3. Client-Side: The audio and visemes are sent back to the client, where they are used to animate the mascot.
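
The payloads exchanged in this flow can be sketched as TypeScript types. This is only an illustration: the names VoiceOverRequest, VoiceOverResponse, and isVoiceOverResponse are hypothetical, and the viseme payload is typed as unknown because its exact shape is defined by the lipsync API, not by this guide.

```typescript
// Hypothetical payload shapes for the voice-over route described above.
// Only `text`, `audio`, and `visemes` come from the guide; everything else is assumed.
interface VoiceOverRequest {
  text: string; // the text the user typed
}

interface VoiceOverResponse {
  audio: string; // base64-encoded WAV audio
  visemes: unknown; // passed through as returned by the lipsync API
}

// Narrowing check the client could run before trying to play a response,
// for example to tell a success payload apart from an error payload.
function isVoiceOverResponse(value: unknown): value is VoiceOverResponse {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.audio === "string" && "visemes" in candidate;
}

// Example request body for step 1.
const exampleRequest: VoiceOverRequest = { text: "Hello there!" };
```

A check like this lets the client distinguish a successful response from the { message: "Internal Server Error" } body that the example route returns on failure.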

Step-by-Step Implementation

1. Setting Up the Server Route

Create a server route to handle the TTS and lipsync API requests. Here’s a simplified example using a Next.js route handler running on Node.js:

import { NextRequest, NextResponse } from "next/server";
import { ElevenLabsClient } from "elevenlabs";
import { createWavBufferFromPCM } from "./utils";

const ELEVENLABS_API_KEY = "your-elevenlabs-api-key";
const VOICE_ID = "your-voice-id";
const MASCOTBOT_API_KEY = "your-mascotbot-api-key";

const elevenlabs = new ElevenLabsClient({
  apiKey: ELEVENLABS_API_KEY,
});

export async function POST(req: NextRequest) {
  try {
    // Parse the incoming request to get the text
    const { text } = await req.json();

    // Convert text to speech using ElevenLabs TTS service.
    // The response carries the raw PCM audio as a base64 string.
    const ttsResponse = (await elevenlabs.textToSpeech.convertWithTimestamps(VOICE_ID, {
      output_format: "pcm_16000",
      text,
    })) as { audio_base64: string };

    // Decode the base64 PCM audio and wrap it in a WAV container
    // (16 kHz sample rate, 16 bits per sample, 1 channel)
    const pcmBuffer = Buffer.from(ttsResponse.audio_base64, "base64");
    const wavBuffer = createWavBufferFromPCM(pcmBuffer, 16000, 16, 1);

    // Send the WAV audio to the MascotBot API to get visemes
    const response = await fetch("https://api.mascot.bot/v1/visemes", {
      method: "POST",
      body: JSON.stringify({
        audio: wavBuffer.toString("base64"),
        sample_rate: 16000,
      }),
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${MASCOTBOT_API_KEY}`,
      },
    });

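    // Fail early if the lipsync API rejects the request
    if (!response.ok) {
      throw new Error(`Visemes request failed with status ${response.status}`);
    }

    // Parse the response to get visemes data
    const visemes = await response.json();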

    // Return the audio and visemes to the client
    return NextResponse.json({
      audio: wavBuffer.toString("base64"),
      visemes,
    });
  } catch (error) {
    // Log the cause for debugging, then return a 500 status
    console.error("Voice-over route failed:", error);
    return NextResponse.json({ message: "Internal Server Error" }, { status: 500 });
  }
}
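
The route imports createWavBufferFromPCM from ./utils, which this guide does not show. A minimal sketch of such a helper is given here, assuming the argument order (pcm, sampleRate, bitsPerSample, channels) implied by the call above and a standard 44-byte PCM WAV header. Your actual helper may differ.

```typescript
import { Buffer } from "node:buffer";

// Assumed implementation of the ./utils helper: prepends a canonical
// 44-byte RIFF/WAVE header to raw little-endian PCM samples.
export function createWavBufferFromPCM(
  pcm: Buffer,
  sampleRate: number,
  bitsPerSample: number,
  channels: number,
): Buffer {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second of audio
  const header = Buffer.alloc(44);

  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40); // size of the PCM payload

  return Buffer.concat([header, pcm]);
}
```

With the values used in the route (16000, 16, 1), one second of audio is 32,000 bytes of PCM plus the 44-byte header.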

2. Client-Side Integration

On the client side, call the server route to fetch the audio and visemes, then play them back with the MascotBot SDK playback hook.

import { useRef, useState } from "react";
import { useMascotPlayback } from "@mascotbot-sdk/react";
import { Button, Textarea } from "@mascotbot/ui";

const API_DOMAIN = "https://your-api-domain.com";

export default function VoiceOverSidebar() {
  const mascotPlayback = useMascotPlayback();
  const [text, setText] = useState<string>("");
  const audioRef = useRef<HTMLAudioElement>(null);

  return (
    <div>
      {/* Audio element to play the generated audio */}
      <audio ref={audioRef} playsInline />

      {/* Textarea for user input */}
      <Textarea value={text} onChange={({ target }) => setText(target.value)} />

      {/* Button to trigger the TTS and lipsync process */}
      <Button
        onClick={async () => {
          // Send the text to the server to get audio and visemes
          const response = await fetch(`${API_DOMAIN}/api/voice-over`, {
            method: "POST",
            body: JSON.stringify({ text }),
          });

          // Parse the response to get audio and visemes
          const { audio, visemes } = await response.json();

          // Decode the base64 audio into bytes, then wrap it in a Blob URL
          const audioBytes = Uint8Array.from(atob(audio), (char) => char.charCodeAt(0));
          const audioBlob = new Blob([audioBytes], { type: "audio/wav" });
          const audioUrl = URL.createObjectURL(audioBlob);

          // Add visemes to the mascot playback
          mascotPlayback.add(visemes);

          // Play the audio and animate the mascot
          const audioElement = audioRef.current;
          if (audioElement) {
            audioElement.src = audioUrl;
            audioElement.oncanplay = () => {
              mascotPlayback.play();
              audioElement.play();
            };
            audioElement.onended = () => {
              mascotPlayback.reset();
              // Release the Blob URL once playback has finished
              URL.revokeObjectURL(audioUrl);
            };
          }
        }}
      >
        Play
      </Button>
    </div>
  );
}

Conclusion

By following this guide, you can integrate any TTS service with our lipsync API to animate mascots on your platform. This setup allows for dynamic and engaging user experiences by synchronizing audio with visual animations.

A complete working example is available once you subscribe to one of our paid plans.