Troubleshooting

Common integration issues with @mascotbot/react 0.2.x and their fixes. If your symptom is a license refusal, match the error.code against the error-code reference first.

Install fails with 401 / 403 from npm.mascot.bot

The private registry needs a valid token in .npmrc:
@mascotbot:registry=https://npm.mascot.bot/
//npm.mascot.bot/:_authToken=mascot_xxx
A 403 wrong_key_scope means the token is the wrong kind for the registry — mint one at app.mascot.bot/api-keys. Make sure the .npmrc is at the project root and the token has no trailing newline.
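If the token comes from a secrets file or CI variable, a stray trailing newline is easy to pick up. The sketch below generates the .npmrc body from a raw token and strips that whitespace first. `renderNpmrc` is a hypothetical helper, not part of the SDK or npm; the two config lines are the ones shown above.

```typescript
// Hypothetical helper (not part of @mascotbot/react or npm).
// Builds the .npmrc body from a raw token, e.g. one read from a secrets file.
function renderNpmrc(rawToken: string): string {
  // Drop leading/trailing whitespace, including a trailing "\n".
  const token = rawToken.trim();
  if (token.length === 0 || /\s/.test(token)) {
    throw new Error("registry token is empty or contains whitespace");
  }
  return [
    "@mascotbot:registry=https://npm.mascot.bot/",
    `//npm.mascot.bot/:_authToken=${token}`,
    "", // end the file with a single newline
  ].join("\n");
}
```

Write the returned string to .npmrc at the project root; an empty or whitespace-only token fails early instead of surfacing later as a 401.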

Status never reaches ready

Read status and error from useMascot():
  • status === "refused" → an authorization problem. Branch on error.code (dev_key_on_public_domain, prod_key_on_localhost, origin_not_allowed, key_disabled, …) and show the matching fix. See Licensing & keys.
  • status === "error" with a NetworkError → the device cannot reach license.mascot.bot. Check connectivity, ad blockers, and corporate proxies.
  • Stuck on initializing with no error → WebAssembly or crypto.subtle is unavailable (old browser, insecure context). The SDK requires a secure context (HTTPS or localhost).
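The three branches above can be folded into one triage function. This is a minimal sketch under stated assumptions: the `status` values and `error.code` come from the list above, while the `MascotErrorLike` and `RuntimeEnv` shapes, the `error.name === "NetworkError"` check, and the returned messages are illustrative and not the SDK's API.

```typescript
// Assumed shapes: illustrative only, not exported by @mascotbot/react.
interface MascotErrorLike { code?: string; name?: string }
interface RuntimeEnv {
  isSecureContext: boolean;
  hasWebAssembly: boolean;
  hasSubtleCrypto: boolean;
}
// In a browser you might build RuntimeEnv as:
//   { isSecureContext: window.isSecureContext,
//     hasWebAssembly: typeof WebAssembly === "object",
//     hasSubtleCrypto: typeof crypto !== "undefined" && !!crypto.subtle }

function diagnose(status: string, error: MascotErrorLike | null, env: RuntimeEnv): string {
  if (status === "ready") return "ok";
  if (status === "refused") {
    // Branch further on the code to show the matching fix.
    return `authorization refused: ${error?.code ?? "unknown code"}`;
  }
  if (status === "error" && error?.name === "NetworkError") {
    return "cannot reach license.mascot.bot";
  }
  if (status === "initializing" && error === null) {
    if (!env.isSecureContext) return "insecure context: serve over HTTPS or localhost";
    if (!env.hasWebAssembly) return "WebAssembly unavailable";
    if (!env.hasSubtleCrypto) return "crypto.subtle unavailable";
    return "still initializing";
  }
  return "unrecognized state";
}
```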

Blank canvas — the avatar never renders

Almost always the Rive state machine name. Pass only mascotStateMachine (or STATE_MACHINE_NAMES[0]) to Rive. Rive 2.37+ throws on any unknown state-machine name in the array; the throw fires LoadError, suppresses Load, and leaves the canvas blank. Also verify the artboard is named Character and the file exposes mouth inputs 100–118.

Mouth frozen during an active call

The SDK does not freeze on parent re-renders — Rive input handles are referentially stable and playback is carried across any internal recreate. A frozen mouth during a call is almost always one of:
  • Unstable naturalLipSyncConfig — a new object literal every render reinitializes the natural-lipsync processor. Pass a module constant or a useState/useMemo reference.
  • Audio is not reaching the tap — pass onFrame to useLipsyncStream and log silenceDetected / emittedVisemeId. Rising emitted IDs with a dead mouth means audio is reaching the engine but the Rive handle isn’t being written; a flat zero / silenceDetected: true means the tap is on a silent corpse (typical of a self-playing realtime provider torn down and not re-attached — see ElevenLabs 2nd-call diagnostic).
  • Wrong source shape: { kind: "mediaStream", stream } where stream is null or has only ended tracks.
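The last two checks can be automated while debugging. The sketch below assumes that onFrame delivers objects carrying `silenceDetected` and `emittedVisemeId` (only those two field names come from the text above) and that viseme ID 0 is the rest shape. `classifyFrames` and `isUsableStream` are hypothetical helpers, and `StreamLike` is a structural subset of the browser's MediaStream.

```typescript
// Assumed frame shape: only the two field names are taken from the docs.
interface LipsyncFrame { silenceDetected: boolean; emittedVisemeId: number }

type Verdict = "no-frames" | "silent-tap" | "audio-ok-check-rive-writes";

// Feed this a batch of frames collected from onFrame during a call.
function classifyFrames(frames: LipsyncFrame[]): Verdict {
  if (frames.length === 0) return "no-frames";
  const anyVoiced = frames.some((f) => !f.silenceDetected && f.emittedVisemeId !== 0);
  // Voiced frames with a dead mouth point at the Rive write path;
  // all-silent frames point at the tap or the stream feeding it.
  return anyVoiced ? "audio-ok-check-rive-writes" : "silent-tap";
}

// Structural subset of MediaStream / MediaStreamTrack.
interface TrackLike { readyState: "live" | "ended" }
interface StreamLike { getAudioTracks(): TrackLike[] }

// True only when the stream exists and has at least one live audio track.
function isUsableStream(stream: StreamLike | null | undefined): boolean {
  return !!stream && stream.getAudioTracks().some((t) => t.readyState === "live");
}
```

Calling `isUsableStream(stream)` before passing `{ kind: "mediaStream", stream }` catches both the null case and the ended-tracks case.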

Call disconnects immediately after onConnect (reason: 'user')

Symptom: Conversation.startSession({ ... }) resolves, your onConnect fires, the agent’s first_message may even reach onMessage, then onStatusChange flips to disconnecting → disconnected and onDisconnect runs with details === { reason: 'user' }. There’s no network error, no onError, and no server-side disconnect: your own code called endSession(). The usual cause is an unmount-cleanup effect whose dep array contains a teardown callback whose identity flips on every render:
// BUG — re-runs the cleanup on EVERY render, not just unmount.
const teardown = useCallback(() => {
  void convoRef.current?.endSession().catch(() => {});
  // …other resource releases
}, [setSomeMascotInput]);                  // ← unstable dep

useEffect(() => () => teardown(), [teardown]);
setSomeMascotInput typically traces back to handles returned by useMascotInputs(), which intentionally returns a fresh { custom, has } object per render (see Rive co-existence). Each fresh handle → new useCallback chain → new teardown identity → the cleanup runs, which calls endSession(), which surfaces as a 'user' disconnect right after onConnect.

Fix: stabilise the unmount cleanup with a ref so it runs once on unmount only, while always invoking the latest teardown closure:
const teardownRef = useRef(teardown);
teardownRef.current = teardown;          // refresh every render, no deps
useEffect(() => () => teardownRef.current?.(), []);   // [] — true unmount
This pattern is correct regardless of how often teardown’s identity changes; the ref always points at the latest closure when the component finally unmounts. Apply the same shape to any other long-lived cleanup that depends on hook handles which are fresh-per-render (useMascotInputs, useMascotRive; see also the ElevenLabs onModeChange recipe, which captures custom in a ref for the same reason).

How to diagnose: temporarily log the disconnect detail and any frames so you can distinguish a self-end from an agent/server end:
await Conversation.startSession({
  signedUrl,
  onMessage:      (m) => console.log("[debug] msg:", m),
  onStatusChange: (s) => console.log("[debug] status:", s.status),
  onDisconnect:   (d) => console.log("[debug] disconnect reason:", d),
  // …
});
reason: 'user' is your code; reason: 'agent' is the agent stopping the call; any onError first means a server-side problem.
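The rule above is small enough to encode directly in a debug logger. A minimal sketch: `classifyDisconnect` and its return labels are hypothetical, and `sawErrorFirst` is a flag you would set yourself inside onError.

```typescript
type DisconnectOrigin = "own-code" | "agent" | "server-side" | "unknown";

// reason: the `reason` field from the onDisconnect details object.
// sawErrorFirst: whether onError fired before onDisconnect in this session.
function classifyDisconnect(reason: string | undefined, sawErrorFirst: boolean): DisconnectOrigin {
  if (sawErrorFirst) return "server-side";
  if (reason === "user") return "own-code"; // something in your app called endSession()
  if (reason === "agent") return "agent";
  return "unknown";
}
```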

Mouth flickers when speech stops

This is handled by the SDK’s internal −50 dBFS silence gate — do not add your own gate. If you still see phantom shapes at end of utterance, you are likely feeding a self-playing realtime provider through createPCMStreamPlayer (double audio / doubled inference). Tap the provider’s own output instead — see Realtime providers.

Both the SDK and the provider play audio (double voice)

createPCMStreamPlayer is only for providers that hand you raw PCM and do not play it (Gemini Live, OpenAI Realtime over WebSocket). For self-playing providers (ElevenLabs, OpenAI Realtime over WebRTC), do not use the player — tap their existing playback with the SDK’s cross-browser createElementTap() and feed that to useLipsyncStream({ source: { kind: "mediaStream", stream } }).

No audio in a realtime/TTS demo

The AudioContext (and createPCMStreamPlayer) must be created inside the user-gesture handler, before any await. A context created in a post-fetch microtask starts suspended and cannot resume without another gesture. Create or resume() the player synchronously at the top of the click handler.
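The ordering constraint is easiest to see in code. The sketch below is an illustration, not SDK code: `makeStartHandler`, `fetchSession`, and `startCall` are hypothetical names, and `AudioContextLike` is a structural subset of the browser's AudioContext (its `state` and `resume()` members) so the handler can be exercised outside a browser.

```typescript
// Structural subset of the Web Audio AudioContext used here.
interface AudioContextLike {
  state: string;
  resume(): Promise<void>;
}

function makeStartHandler(
  createContext: () => AudioContextLike, // in a browser: () => new AudioContext()
  fetchSession: () => Promise<string>,   // hypothetical: your token / signed-URL fetch
  startCall: (ctx: AudioContextLike, session: string) => void,
): () => Promise<void> {
  return async () => {
    // Synchronous part: runs while the user gesture is still active.
    const ctx = createContext();
    if (ctx.state === "suspended") void ctx.resume();

    // Only now yield to the network; the context already exists.
    const session = await fetchSession();
    startCall(ctx, session);
  };
}
```

In a browser, pass `() => new AudioContext()` as `createContext` and attach the returned function as the button's click handler. Moving `createContext()` below the `await` reproduces the suspended-context symptom described above.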

ElevenLabs 2nd call has no lip sync (1st call worked)

Symptom: an ElevenLabs widget animates the mouth on the very first call, you end it cleanly, then start a new call: voice plays, console is clean, but the mouth is frozen for the entire second call. You’re using the window.Audio patch + <audio> poll pattern from the ElevenLabs avatar guide (the cross-browser tap approach). The failure sequence:
  • The patch stashes a reference to the <audio> element ElevenLabs constructs (e.g. w.__el = el) so a 100 ms poll can tap.attach() it once it’s wired up.
  • On call-end, endSession() stops the conversation but the stashed reference and the srcObject MediaStream both remain on window. The MediaStream’s audio tracks transition to readyState: 'ended', but el.srcObject instanceof MediaStream is still true.
  • On call #2, the poll runs almost immediately — typically before ElevenLabs has called new Audio() again. The naive check el && el.srcObject instanceof MediaStream accepts the stale reference, tap.attach() lands on a silent corpse, and zero audio reaches the new tap.
Two corrections, both required (one defends against the other failing):
// 1. Reject any candidate whose audio tracks are no longer 'live'.
const isLive = (el: HTMLMediaElement | null | undefined) =>
  !!el && el.srcObject instanceof MediaStream &&
  el.srcObject.getAudioTracks().some((t) => t.readyState === "live");

let tries = 0;            // poll budget: 100 tries × 100 ms ≈ 10 s
const iv = window.setInterval(() => {
  const el = w.__el;
  if (isLive(el)) {
    tap.attach(el as HTMLMediaElement);
    window.clearInterval(iv);
  } else if (++tries > 100) {
    window.clearInterval(iv);
  }
}, 100);

// 2. In teardown, null the stash AND close the tap. Otherwise the
//    next call's poll latches onto the stale ref before the next
//    `new Audio()` lands, and every restart leaks a worklet graph.
teardownRef.current = () => {
  window.clearInterval(iv);
  w.Audio = OrigAudio;
  w.__el = null;          // ← without this, next call's first poll
                          //   sees the still-set MediaStream and
                          //   attaches to the dead element
  tap.close();            // ← releases the tap's AudioContext + stream
  void convo.endSession();
};
The isLive check is the real defense — even if you forget the null in teardown, no element with readyState !== 'live' will ever be attached. The null-out is belt-and-suspenders that also avoids one wasteful poll iteration.

One widget’s lip sync is fast/garbled after another widget ran

Symptom: widget A (e.g. a Gemini call) works; you end it and start widget B (e.g. an ElevenLabs widget) on the same page, and B’s mouth runs at ~2× or flickers. B alone, or B-then-A, is fine.

Cause: the whole page shares one <MascotProvider> and therefore one LipsyncClient. useLipsyncStream’s mediaStream pipeline is keyed on the stream’s identity and tears down (closes its AudioContext + worklet + streaming session) only when that stream reference changes. If, on call-end, you only player.stop() but keep the same player.outputStream in state, the pipeline never tears down and lingers on the shared client. Widget B then opens a second inference pipeline on the same client and the two corrupt each other’s pacing.

Fix: fully release the pipeline on every call-end path (stop, onclose, error), symmetric with however you created it:
player.stop();
player.close();          // releases the AudioContext, not just the queue
playerRef.current = null;
setVoiceStream(null);     // ← the key line: changes the stream identity
                          //   so useLipsyncStream runs its teardown
createPCMStreamPlayer().stop() only drops queued audio (barge-in); .close() releases the context. Self-playing taps must likewise stop polling and setStream(null). (Switching the avatar by unmounting the <Mascot> subtree tears down implicitly — this bug only surfaces when a call ends without unmounting.)
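Because the release has to run on every call-end path, it helps to wrap it in one idempotent function and call that from the stop button, onclose, and the error handler alike. A minimal sketch: `makeRelease` is a hypothetical helper, and `PlayerLike` models only the `stop()` and `close()` methods named above.

```typescript
// Structural subset of the player returned by createPCMStreamPlayer().
interface PlayerLike { stop(): void; close(): void }

function makeRelease(
  playerRef: { current: PlayerLike | null }, // e.g. a React ref
  setVoiceStream: (s: null) => void,         // e.g. a React state setter
): () => void {
  return () => {
    const player = playerRef.current;
    playerRef.current = null;
    if (player !== null) {
      player.stop();  // drop queued audio
      player.close(); // release the AudioContext
    }
    // Always clear the stream so its identity changes and the
    // useLipsyncStream pipeline tears down, even on a repeated call.
    setVoiceStream(null);
  };
}
```

Calling the returned function twice is safe: the second call finds no player and only re-sets the stream to null.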

Avatar customizations (gender, colors, outline) don’t apply

Symptom: you set useMascotInputs().custom.gender.value = … once (e.g. in a mount effect) and the avatar still shows defaults. Custom inputs are no-op shims until Rive has bound the real state-machine handles, which happens asynchronously after load. A single early write lands on a shim and is lost; the state machine then settles into its default pose and never re-evaluates. Consume raw useMascotInputs() (its custom/has are a fresh object every render — do not freeze them in a memo for this), gate the write on has(...), and re-assert every render. The re-application is idempotent and load-bearing — a one-shot write is the bug:
const { custom, has } = useMascotInputs();
useEffect(() => {
  if (!has("gender")) return;            // real input bound yet?
  custom.gender.value = female ? 2 : 1;
  custom.colourful.value = true;
}); // no dep array → re-asserts until (and after) Rive binds

Avatar is hidden behind a section background

<MascotRive> renders a position: relative element with no z-index. A positioned background sibling (z-5, an absolute image, etc.) will paint over it. Wrap only <MascotRive> in a low positive z — never a wrapper that also contains your call controls, or that wrapper becomes a stacking context and traps the controls under a sibling gradient:
<div className="relative h-full w-full z-[6]">
  <MascotRive />
</div>

Next.js Pages Router: “Named export not found”

Pages Router has stricter module resolution. Transpile the package:
/** @type {import("next").NextConfig} */
module.exports = { transpilePackages: ["@mascotbot/react"] };
Then clear the cache: rm -rf .next.

CSP blocks the audio worklet

The worklet is served from a Blob URL by default. If your Content Security Policy forbids worker-src blob:, either allow it or host the worklet yourself and pass its URL via workletUrl on useLipsyncStream.

Still stuck?

Compare against the reference integration in apps/lipsync-demo (single file, no design-system deps), or email support@mascot.bot. See also the migration guide.