How to detect when an agent has finished speaking
Learn how to accurately detect when a voice agent has finished speaking using SpeechHandle, and how to notify web clients.
When working with Text-to-Speech (TTS) audio in the agent system, you may need to detect when an agent has completely finished speaking. This guide covers the recommended approach and how to make this information available to web clients.
Why detect playback completion?
Common use cases include:
- UI feedback: Update a visual indicator (like an animated avatar or speaking icon) when the agent stops talking.
- Turn-taking logic: Trigger actions only after the agent finishes, such as enabling a "your turn" prompt.
- Analytics: Measure actual speaking duration for conversation analysis.
- Accessibility: Provide screen reader cues that the agent has finished speaking.
Using SpeechHandle.wait_for_playout() (Recommended)
The most accurate way to detect when an agent has finished speaking is to use the SpeechHandle returned by session.say() or session.generate_reply():
# Using session.say()
handle = session.say("Hello, how can I help you today?")
await handle.wait_for_playout()
logger.info("Agent finished speaking")

# Using session.generate_reply()
handle = await session.generate_reply(instructions="Greet the user")
await handle.wait_for_playout()
logger.info("Agent finished speaking")
The wait_for_playout() method returns once playout has ended, either because the audio was fully played to the participant or because the speech was interrupted. Check handle.interrupted afterwards to tell the two cases apart:
handle = session.say("Let me explain...")
await handle.wait_for_playout()

if handle.interrupted:
    logger.info("Speech was interrupted by user")
else:
    logger.info("Speech completed naturally")
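For the analytics use case, the same pattern can be wrapped in a small timing helper. This is a minimal sketch, not part of the SDK: `timed_playout` and `PlayoutResult` are hypothetical names, and the helper only assumes that the handle exposes `wait_for_playout()` and `interrupted` as shown above. The measured duration starts when the helper is called, so it includes any synthesis latency before audio begins.

```python
import time
from dataclasses import dataclass


@dataclass
class PlayoutResult:
    duration_s: float  # wall-clock seconds spent waiting for playout to end
    interrupted: bool  # value of handle.interrupted once playout ended


async def timed_playout(handle) -> PlayoutResult:
    """Wait for a speech handle to finish and report how long the wait took.

    `handle` is any object with `wait_for_playout()` and `interrupted`,
    such as the handle returned by session.say() (assumed interface).
    """
    started = time.monotonic()
    await handle.wait_for_playout()
    return PlayoutResult(time.monotonic() - started, handle.interrupted)
```

A call such as `result = await timed_playout(session.say("Hello!"))` would then give both the elapsed time and the interruption flag in one value.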
Using callbacks for non-blocking detection
If you don't want to await the playout, you can use a callback:
def on_playout_complete(handle):
    logger.info(f"Playout complete, interrupted: {handle.interrupted}")

handle = session.say("Processing your request...")
handle.add_done_callback(on_playout_complete)
Notifying web clients
To notify web clients when speech completes, you can use participant attributes or data messages. Here's an example using attributes:
Agent-side implementation
async def notify_speech_complete(room, interrupted: bool):
    # Update participant attributes to signal completion
    await room.local_participant.set_attributes({
        "speech_state": "idle",
        "last_speech_interrupted": str(interrupted).lower(),
    })
# Usage
handle = session.say("Hello!")
await handle.wait_for_playout()
await notify_speech_complete(ctx.room, handle.interrupted)
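One caveat with this approach: if speech_state is only ever set to "idle", later utterances may not produce a visible change, because attribute-change events on the client are generally driven by values that actually changed. A possible refinement, sketched here under that assumption, is to publish a "speaking" state before the utterance and "idle" after it. `say_with_state` and the "speaking" value are hypothetical, and the function relies only on the calls already used in this guide.

```python
async def say_with_state(session, room, text: str) -> bool:
    """Speak `text`, publishing speech_state before and after playout.

    Returns True if the speech was interrupted.
    """
    participant = room.local_participant
    await participant.set_attributes({"speech_state": "speaking"})
    interrupted = False
    try:
        handle = session.say(text)
        await handle.wait_for_playout()
        interrupted = handle.interrupted
    finally:
        # Always return to idle, even if say() or the wait raises;
        # in that case `interrupted` keeps its default of False.
        await participant.set_attributes({
            "speech_state": "idle",
            "last_speech_interrupted": str(interrupted).lower(),
        })
    return interrupted
```

Calling `await say_with_state(session, ctx.room, "Hello!")` then yields a speaking-to-idle transition for every utterance, which the client-side listener can key on.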
Client-side implementation
On the web client, listen for attribute changes:
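import { RoomEvent } from 'livekit-client';

room.on(RoomEvent.ParticipantAttributesChanged, (changedAttributes, participant) => {
  // changedAttributes holds only the keys whose values changed
  if (participant.isAgent && changedAttributes.speech_state === 'idle') {
    console.log('Agent finished speaking');
    // Read from the full attribute map so an unchanged value is still seen
    const wasInterrupted = participant.attributes.last_speech_interrupted === 'true';
    // Handle the completion event
  }
});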
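The other option mentioned at the start of this section, data messages, suits one-off events better than persistent state. The agent-side sketch below assumes the local participant's `publish_data()` accepts a string payload plus `reliable` and `topic` keyword arguments; the topic name "agent-speech", the payload shape, and the function name `publish_speech_complete` are hypothetical choices for illustration.

```python
import json


async def publish_speech_complete(room, interrupted: bool,
                                  topic: str = "agent-speech") -> None:
    """Send a one-off "speech finished" event to clients as a data message."""
    payload = json.dumps({"event": "speech_complete", "interrupted": interrupted})
    # Reliable delivery so the completion event is not silently dropped.
    await room.local_participant.publish_data(payload, reliable=True, topic=topic)
```

Unlike attributes, a data message is not stored on the participant, so a client that joins after the message was sent will not see it. Attributes are the better fit when late joiners need the current state.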
Important consideration
The final parameter in the transcription listener does not indicate that the audio has finished playing—it only indicates that the transcription is complete. The transcription may finish before or after the actual audio playback completes.
For accurate playback completion detection, always use SpeechHandle.wait_for_playout() rather than relying on transcription events.
Summary
| Approach | Use Case | Accuracy |
|---|---|---|
| SpeechHandle.wait_for_playout() | Detecting actual audio playback completion | ✅ Accurate |
| SpeechHandle.add_done_callback() | Non-blocking playback detection | ✅ Accurate |
| Transcription final flag | Detecting when transcription is complete | ❌ Not for playback timing |
Additional resources
For more examples and advanced implementations, refer to our voice agents examples repository.