Multi-modal emotion console
Capture voice, text, and facial movement in real time to infer emotions using a Plutchik-based taxonomy. Start a session to stream local audio/video features to the FastAPI backend and watch the fused prediction update live.
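A minimal sketch of what the backend side of that loop could look like, assuming a single POST endpoint and a per-window payload of voice features, facial metrics, and the transcript. The endpoint path, field names, and the stub fusion rule are illustrative assumptions, not the actual API.

```python
# Sketch only: endpoint path, model fields, and fusion rule are assumptions.
from typing import Dict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

PLUTCHIK_PRIMARIES = [
    "joy", "trust", "fear", "surprise",
    "sadness", "disgust", "anger", "anticipation",
]

class InferenceWindow(BaseModel):
    # Per-window voice features computed on the client (hypothetical names).
    voice: Dict[str, float]   # e.g. {"rms": ..., "pitch_hz": ..., "tempo_bpm": ..., "jitter": ...}
    # Per-window facial expression metrics (hypothetical names).
    face: Dict[str, float]
    # Full running transcript, resent with every window to enrich the prediction.
    transcript: str

class Prediction(BaseModel):
    scores: Dict[str, float]  # probability per Plutchik primary emotion
    label: str                # argmax of the fused scores

@app.post("/infer", response_model=Prediction)
def infer(window: InferenceWindow) -> Prediction:
    """Fuse the modalities into one Plutchik-based distribution.

    The real backend presumably runs learned models per modality; this stub
    returns a uniform distribution so the wiring can be exercised end to end.
    """
    scores = {label: 1.0 / len(PLUTCHIK_PRIMARIES) for label in PLUTCHIK_PRIMARIES}
    label = max(scores, key=scores.get)
    return Prediction(scores=scores, label=label)
```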
Idle – start a capture session to analyze emotion signals.
Live camera feed
Enable the session to capture facial expression metrics.

Text transcript
The full transcript is sent with each inference window to enrich the prediction.
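As a rough illustration of that contract, the client can resend the whole transcript alongside the latest per-window features. The URL and payload fields below are assumptions that mirror the endpoint sketch above.

```python
# Sketch only: backend URL and payload fields are assumptions.
import requests

BACKEND_URL = "http://localhost:8000/infer"  # hypothetical FastAPI endpoint

def send_window(voice: dict, face: dict, transcript: str) -> dict:
    """Post one inference window; the full transcript rides along every time."""
    payload = {"voice": voice, "face": face, "transcript": transcript}
    response = requests.post(BACKEND_URL, json=payload, timeout=5)
    response.raise_for_status()
    return response.json()  # fused Plutchik-based prediction

# Example call: the transcript keeps growing, but each window carries all of it.
prediction = send_window(
    voice={"rms": 0.12, "pitch_hz": 180.0, "tempo_bpm": 110.0, "jitter": 0.015},
    face={"smile": 0.7, "brow_raise": 0.1},
    transcript="hi there, I was just saying that",
)
```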
Voice metrics
Start capturing to compute RMS energy, pitch, tempo, and jitter.

Aggregated predictions will appear here once the backend returns the first inference window.
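One plausible way to derive those per-window voice features, assuming librosa is available on the capture side. The pitch bounds, window handling, and the frame-level jitter approximation are assumptions, not the console's actual extraction code.

```python
# Sketch only: librosa-based feature extraction with assumed parameters.
import numpy as np
import librosa

def voice_metrics(y: np.ndarray, sr: int) -> dict:
    """Compute RMS energy, pitch, tempo, and jitter for one audio window."""
    # RMS energy averaged over the window.
    rms = float(librosa.feature.rms(y=y).mean())

    # Frame-wise fundamental frequency via YIN, restricted to a speech range.
    f0 = librosa.yin(y, fmin=65.0, fmax=400.0, sr=sr)
    voiced = f0[np.isfinite(f0) & (f0 > 0)]
    pitch_hz = float(voiced.mean()) if voiced.size else 0.0

    # Tempo (beats per minute) from the onset envelope.
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    tempo_bpm = float(np.atleast_1d(tempo)[0])

    # Crude relative jitter: mean cycle-to-cycle period change over mean period.
    jitter = 0.0
    if voiced.size > 1:
        periods = 1.0 / voiced
        jitter = float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))

    return {"rms": rms, "pitch_hz": pitch_hz, "tempo_bpm": tempo_bpm, "jitter": jitter}
```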