VAD and turn detection configuration guide
Understand which VAD and turn detection parameters apply to your agent type, and how to configure them for optimal voice interaction.
LiveKit Agents provides extensive control over Voice Activity Detection (VAD) and turn detection. However, not all parameters apply to every agent configuration. This guide explains which parameters matter for your specific setup.
VAD Configuration Parameters
These parameters are defined in the Silero VAD implementation and control how speech is detected in audio streams:
| Parameter | Description |
|---|---|
| min_speech_duration | Minimum duration of speech required to trigger detection |
| min_silence_duration | Minimum duration of silence that marks the end of speech |
| prefix_padding_duration | Audio included before the detected start of speech |
| max_buffered_speech | Maximum speech audio to buffer before processing |
| activation_threshold | Confidence threshold for classifying audio as speech |
These parameters are used whenever VAD is present in your pipeline.
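For orientation, here is a minimal sketch of how these options are passed to silero.VAD.load; the values below are placeholders rather than recommendations:

```python
from livekit.plugins import silero

# Placeholder values; tune them for your audio environment.
vad = silero.VAD.load(
    min_speech_duration=0.05,      # speech shorter than this is ignored
    min_silence_duration=0.55,     # silence required before speech is considered ended
    prefix_padding_duration=0.5,   # audio kept from before the detected speech
    max_buffered_speech=60.0,      # cap on buffered speech audio, in seconds
    activation_threshold=0.5,      # model confidence required to count as speech
)
```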
Turn Detection and Interrupt Parameters
These parameters are defined in the AgentSession constructor and control how the agent handles conversational turns:
| Parameter | Description |
|---|---|
| allow_interruptions | Whether users can interrupt agent responses |
| min_interruption_duration | Minimum speech duration required to trigger an interruption |
| min_endpointing_delay | Minimum wait time before finalizing a user turn |
| max_endpointing_delay | Maximum wait time when turn detector confidence is low |
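As a minimal sketch of where these settings live, the snippet below uses placeholder values and assumes the pipeline components (VAD, STT, LLM, TTS) are configured elsewhere:

```python
from livekit.agents import AgentSession

# Placeholder values; the pipeline components (vad, stt, llm, tts) are assumed
# to be configured elsewhere.
session = AgentSession(
    allow_interruptions=True,        # let user speech cut off agent playback
    min_interruption_duration=0.5,   # seconds of user speech that count as an interruption
    min_endpointing_delay=0.5,       # minimum wait before finalizing the user's turn
    max_endpointing_delay=6.0,       # upper bound when turn detector confidence is low
)
```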
Scenario Reference
Pipeline Agents with Turn Detector Model
When using a turn detector model alongside VAD and STT, all parameters are active:
| Parameter | Status |
|---|---|
| All VAD parameters | Used |
| allow_interruptions | Used |
| min_interruption_duration | Used |
| min_endpointing_delay | Used |
| max_endpointing_delay | Used |
The turn detector model provides confidence scores that determine where the actual wait time falls: when the model is confident the user has finished speaking, the turn ends after min_endpointing_delay; when it predicts the user is likely to continue, the agent waits up to max_endpointing_delay.
Pipeline Agents without Turn Detector Model
Without a turn detector model, max_endpointing_delay becomes irrelevant since there are no confidence predictions to act on:
| Parameter | Status |
|---|---|
| All VAD parameters | Used |
| allow_interruptions | Used |
| min_interruption_duration | Used |
| min_endpointing_delay | Used (as the fixed delay) |
| max_endpointing_delay | Not used |
Realtime Model Agents (OpenAI, Google)
When using realtime models like OpenAI's Realtime API or Google's realtime models, VAD and turn detection are handled server-side:
| Parameter | Status | Reason |
|---|---|---|
| min_speech_duration | Not used | Server-side VAD |
| min_silence_duration | Not used | Server-side VAD |
| prefix_padding_duration | Not used | Server-side VAD |
| max_buffered_speech | Not used | Server-side VAD |
| activation_threshold | Not used | Server-side VAD |
| min_endpointing_delay | Not used | Server-side turn detection |
| max_endpointing_delay | Not used | Server-side turn detection |
| allow_interruptions | Used | Controls client-side behavior |
| min_interruption_duration | Used | Controls client-side behavior |
Universal Parameters
These parameters work across all agent types:
| Parameter | Description |
|---|---|
| allow_interruptions | Controls whether user speech interrupts agent playback |
| min_interruption_duration | Sets the minimum speech duration that counts as an interruption |
Quick Reference by Agent Type
| Agent Type | VAD Params | Endpointing Delays | Interruption Params |
|---|---|---|---|
| Pipeline + Turn Detector | All | Both | Both |
| Pipeline (no Turn Detector) | All | min only | Both |
| Realtime (OpenAI/Google) | None | None | Both |
How Turn Detection Mode is Selected
The turn detection mode is automatically selected based on available components, with this priority order:
- realtime_llm — Uses server-side VAD and turn detection
- vad — Uses client-side Silero VAD
- stt — Falls back to STT-based detection
- manual — No automatic turn detection
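If you want to pin a specific mode rather than rely on automatic selection, you can pass it explicitly. The sketch below assumes the AgentSession turn_detection argument accepts the mode names listed above as strings (or a turn detector model instance):

```python
from livekit.agents import AgentSession
from livekit.plugins import silero

# Force VAD-based turn detection instead of relying on automatic selection.
# Assumes turn_detection accepts the mode names listed above:
# "realtime_llm", "vad", "stt", or "manual".
session = AgentSession(
    vad=silero.VAD.load(),
    turn_detection="vad",
)
```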
Configuration Examples
Pipeline Agent with Turn Detector
```python
from livekit.agents import AgentSession
from livekit.plugins import silero

session = AgentSession(
    vad=silero.VAD.load(
        min_speech_duration=0.1,
        min_silence_duration=0.3,
        prefix_padding_duration=0.5,
        activation_threshold=0.5,
    ),
    turn_detection=my_turn_detector,  # a turn detector model instance
    allow_interruptions=True,
    min_interruption_duration=0.5,
    min_endpointing_delay=0.5,
    max_endpointing_delay=6.0,  # Used with turn detector confidence
)
```
Pipeline Agent without Turn Detector
```python
from livekit.agents import AgentSession
from livekit.plugins import silero

session = AgentSession(
    vad=silero.VAD.load(
        min_speech_duration=0.1,
        min_silence_duration=0.5,  # More important without a turn detector
        prefix_padding_duration=0.5,
        activation_threshold=0.5,
    ),
    allow_interruptions=True,
    min_interruption_duration=0.5,
    min_endpointing_delay=0.8,  # Acts as the fixed delay
    # max_endpointing_delay is not needed without a turn detector
)
```
Realtime Model Agent
```python
from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel(),
    # VAD params not needed: handled server-side
    # Endpointing delays not needed: handled server-side
    allow_interruptions=True,
    min_interruption_duration=0.5,
)
```
Summary
Understanding which parameters apply to your agent configuration prevents confusion and helps you tune the right settings. Remember:
- Pipeline agents give you full control over VAD and turn detection
- Realtime agents delegate most detection to the server, but you still control interruption behavior
- Turn detector models enable confidence-based timing with the endpointing delay range
For complete documentation on turn detection modes, interruption handling, and session configuration, see the Turns overview in the LiveKit docs.