How to detect when an agent has finished speaking

Learn how to accurately detect when a voice agent has finished speaking using SpeechHandle, and how to notify web clients.

When working with Text-to-Speech (TTS) audio in the agent system, you may need to detect when an agent has completely finished speaking. This guide covers the recommended approach and how to make this information available to web clients.

Why detect playback completion?

Common use cases include:

  • UI feedback: Update a visual indicator (like an animated avatar or speaking icon) when the agent stops talking.
  • Turn-taking logic: Trigger actions only after the agent finishes, such as enabling a "your turn" prompt.
  • Analytics: Measure actual speaking duration for conversation analysis.
  • Accessibility: Provide screen reader cues that the agent has finished speaking.

Using SpeechHandle.wait_for_playout() (Recommended)

The most accurate way to detect when an agent has finished speaking is to use the SpeechHandle returned by session.say() or session.generate_reply():

# Using session.say()
handle = session.say("Hello, how can I help you today?")
await handle.wait_for_playout()
logger.info("Agent finished speaking")

# Using session.generate_reply()
handle = await session.generate_reply(instructions="Greet the user")
await handle.wait_for_playout()
logger.info("Agent finished speaking")

The wait_for_playout() method returns once the audio has been fully played to the participant or the speech has been interrupted. To tell the two cases apart, check handle.interrupted afterwards:

handle = session.say("Let me explain...")
await handle.wait_for_playout()
if handle.interrupted:
    logger.info("Speech was interrupted by user")
else:
    logger.info("Speech completed naturally")
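For the analytics use case, the same two members can be wrapped in a small helper that also times the wait. This is a minimal sketch, not part of the SDK: wait_and_measure, PlayoutResult, and the PlayoutHandle protocol are hypothetical names, and the measured value is the wall-clock time spent waiting, which only approximates the audio duration.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Protocol


class PlayoutHandle(Protocol):
    """The two SpeechHandle members this sketch relies on."""

    interrupted: bool

    async def wait_for_playout(self) -> None: ...


@dataclass
class PlayoutResult:
    interrupted: bool
    seconds: float  # wall-clock time spent waiting, not exact audio length


async def wait_and_measure(handle: PlayoutHandle) -> PlayoutResult:
    """Await playout and report how it ended (hypothetical helper)."""
    started = time.monotonic()
    await handle.wait_for_playout()
    return PlayoutResult(
        interrupted=handle.interrupted,
        seconds=time.monotonic() - started,
    )
```

Inside an agent this could be called as result = await wait_and_measure(session.say("Hello!")), with result.seconds logged alongside result.interrupted.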

Using callbacks for non-blocking detection

If you don't want to await the playout, you can use a callback:

def on_playout_complete(handle):
    logger.info(f"Playout complete, interrupted: {handle.interrupted}")

handle = session.say("Processing your request...")
handle.add_done_callback(on_playout_complete)
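When several utterances are in flight, it can be convenient to funnel their completion callbacks into a single queue that one consumer task drains. The sketch below assumes the callback is invoked on the event loop thread and receives the handle, as in the example above; watch_playout and the event dictionary shape are hypothetical.

```python
import asyncio
from typing import Any


def watch_playout(handle: Any, events: "asyncio.Queue[dict]") -> None:
    """Push a completion event onto `events` when `handle` finishes.

    Only handle.add_done_callback() and handle.interrupted are used.
    """

    def _on_done(done_handle: Any) -> None:
        # put_nowait() is synchronous, so it is safe inside a plain callback.
        events.put_nowait({"interrupted": done_handle.interrupted})

    handle.add_done_callback(_on_done)
```

A consumer would then loop on await events.get() and react to each event, which keeps the code that starts speech free of any awaiting.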

Notifying web clients

To notify web clients when speech completes, you can use participant attributes or data messages. Here's an example using attributes:

Agent-side implementation

async def notify_speech_complete(room, interrupted: bool):
    # Update participant attributes to signal completion
    await room.local_participant.set_attributes({
        "speech_state": "idle",
        "last_speech_interrupted": str(interrupted).lower(),
    })

# Usage (inside the agent entrypoint, where session and ctx are defined)
handle = session.say("Hello!")
await handle.wait_for_playout()
await notify_speech_complete(ctx.room, handle.interrupted)
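One caveat with this pattern: attribute-change events generally report only keys whose values actually changed, so writing "idle" twice in a row may not notify the client the second time. A possible workaround is to flip the attribute to a different value before speaking. The sketch below assumes that behavior; say_with_state and the "speaking" value are hypothetical, while session.say(), wait_for_playout(), interrupted, and set_attributes() are used as in the examples above.

```python
import asyncio
from typing import Any


async def say_with_state(session: Any, room: Any, text: str) -> bool:
    """Speak `text`, publishing speech_state before and after playout.

    Returns True if the speech was interrupted.
    """
    # "speaking" is an assumed value; any value other than "idle" works.
    await room.local_participant.set_attributes({"speech_state": "speaking"})
    handle = session.say(text)
    try:
        await handle.wait_for_playout()
    finally:
        # Always return to "idle", even if waiting raised an exception.
        await room.local_participant.set_attributes({
            "speech_state": "idle",
            "last_speech_interrupted": str(handle.interrupted).lower(),
        })
    return handle.interrupted
```

With this in place, every utterance produces a "speaking" to "idle" transition that the client can observe.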

Client-side implementation

On the web client, listen for attribute changes:

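import { RoomEvent } from 'livekit-client';

room.on(RoomEvent.ParticipantAttributesChanged, (changedAttributes, participant) => {
  if (participant.isAgent && changedAttributes.speech_state === 'idle') {
    console.log('Agent finished speaking');
    // changedAttributes holds only the keys that changed, so read the
    // interrupted flag from the participant's full attribute set.
    const wasInterrupted = participant.attributes.last_speech_interrupted === 'true';
    // Handle the completion event
  }
});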
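Data messages, the other option mentioned at the start of this section, suit one-off events better than persistent state. The agent-side sketch below assumes the Python SDK's local_participant.publish_data() accepts a bytes payload with reliable and topic keyword arguments; the topic name, the payload shape, and both function names are hypothetical.

```python
import asyncio
import json
from typing import Any

SPEECH_TOPIC = "agent-speech"  # hypothetical topic name


def build_speech_payload(interrupted: bool) -> bytes:
    """Encode a completion event as UTF-8 JSON (assumed message shape)."""
    event = {"type": "speech_complete", "interrupted": interrupted}
    return json.dumps(event).encode("utf-8")


async def publish_speech_complete(room: Any, interrupted: bool) -> None:
    """Send the completion event to clients as a reliable data message."""
    await room.local_participant.publish_data(
        build_speech_payload(interrupted),
        reliable=True,
        topic=SPEECH_TOPIC,
    )
```

A client would then decode the JSON in its own data-received handler and filter on the topic. Unlike attributes, a data message is not replayed to participants who join later.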

Important consideration

The final flag passed to a transcription listener does not indicate that the audio has finished playing. It only indicates that the transcription of that segment is complete, and the transcription may finish before or after the actual audio playback completes.

For accurate playback completion detection, always use SpeechHandle.wait_for_playout() rather than relying on transcription events.

Summary

| Approach | Use Case | Accuracy |
| --- | --- | --- |
| SpeechHandle.wait_for_playout() | Detecting actual audio playback completion | ✅ Accurate |
| SpeechHandle.add_done_callback() | Non-blocking playback detection | ✅ Accurate |
| Transcription final flag | Detecting when transcription is complete | ❌ Not for playback timing |

Additional resources

For more examples and advanced implementations, refer to our voice agents examples repository.