VAD and turn detection configuration guide

Understand which VAD and turn detection parameters apply to your agent type, and how to configure them for optimal voice interaction.

LiveKit Agents provides extensive control over Voice Activity Detection (VAD) and turn detection. However, not all parameters apply to every agent configuration. This guide explains which parameters matter for your specific setup.

VAD Configuration Parameters

These parameters are defined in the Silero VAD implementation and control how speech is detected in audio streams:

| Parameter | Description |
| --- | --- |
| min_speech_duration | Minimum duration of speech to trigger detection |
| min_silence_duration | Minimum silence duration to mark end of speech |
| prefix_padding_duration | Audio to include before detected speech starts |
| max_buffered_speech | Maximum speech audio to buffer before processing |
| activation_threshold | Confidence threshold to classify audio as speech |

These parameters are used whenever VAD is present in your pipeline.
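To make the interaction between activation_threshold, min_speech_duration, and min_silence_duration concrete, here is a minimal sketch of a threshold-based segmenter over per-frame speech probabilities. It is a simplified model written for illustration, not the Silero plugin's actual implementation; the VadSettings class, the segment_speech function, and the 0.1 s frame length in the usage note are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class VadSettings:
    # Names mirror the parameters in the table above; values are illustrative.
    min_speech_duration: float = 0.1   # seconds of speech needed to open a segment
    min_silence_duration: float = 0.3  # seconds of silence needed to close a segment
    activation_threshold: float = 0.5  # probability at or above which a frame is speech


def segment_speech(probs, frame_s, cfg):
    """Return (start_frame, end_frame) pairs, end exclusive, for each speech segment.

    `probs` holds one speech probability per audio frame of length `frame_s` seconds.
    """
    # Work in whole frames to avoid accumulating floating-point error.
    min_speech = max(1, round(cfg.min_speech_duration / frame_s))
    min_silence = max(1, round(cfg.min_silence_duration / frame_s))
    segments = []
    in_speech = False
    speech_run = 0   # consecutive frames at or above the threshold
    silence_run = 0  # consecutive frames below the threshold
    seg_start = 0
    for i, p in enumerate(probs):
        if p >= cfg.activation_threshold:
            speech_run += 1
            silence_run = 0
            if not in_speech and speech_run >= min_speech:
                in_speech = True
                seg_start = i - speech_run + 1
        else:
            silence_run += 1
            speech_run = 0
            if in_speech and silence_run >= min_silence:
                in_speech = False
                # The segment ends where the closing silence began.
                segments.append((seg_start, i - silence_run + 1))
    if in_speech:
        # Audio ended mid-segment: close it, dropping any trailing silence.
        segments.append((seg_start, len(probs) - silence_run))
    return segments
```

With 0.1 s frames and the defaults above, a single low-probability frame inside an utterance does not split it, because one frame is shorter than min_silence_duration. Raising min_speech_duration to 0.2 makes the sketch ignore an isolated one-frame burst.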

Turn Detection and Interrupt Parameters

These parameters are defined in the AgentSession constructor and control how the agent handles conversational turns:

| Parameter | Description |
| --- | --- |
| allow_interruptions | Whether users can interrupt agent responses |
| min_interruption_duration | Minimum speech duration to trigger an interruption |
| min_endpointing_delay | Minimum wait time before finalizing a user turn |
| max_endpointing_delay | Maximum wait time when turn detector confidence is low |

Scenario Reference

Pipeline Agents with Turn Detector Model

When using a turn detector model alongside VAD and STT, all parameters are active:

| Parameter | Status |
| --- | --- |
| All VAD parameters | Used |
| allow_interruptions | Used |
| min_interruption_duration | Used |
| min_endpointing_delay | Used |
| max_endpointing_delay | Used |

The turn detector model provides confidence scores that influence timing between min_endpointing_delay and max_endpointing_delay.

Pipeline Agents without Turn Detector Model

Without a turn detector model, max_endpointing_delay becomes irrelevant since there are no confidence predictions to act on:

| Parameter | Status |
| --- | --- |
| All VAD parameters | Used |
| allow_interruptions | Used |
| min_interruption_duration | Used |
| min_endpointing_delay | Used (as the default delay) |
| max_endpointing_delay | Not used |
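The two pipeline scenarios differ only in how the wait after the user stops speaking is chosen. The sketch below models that choice under stated assumptions: the function name, the eou_probability argument (a hypothetical end-of-utterance confidence from the turn detector), and the unlikely_threshold cutoff are illustrative and not taken from the LiveKit API. The real selection logic may be more nuanced than a single cutoff.

```python
def endpointing_delay(
    eou_probability,
    min_endpointing_delay=0.5,
    max_endpointing_delay=6.0,
    unlikely_threshold=0.15,  # hypothetical cutoff, chosen for illustration only
):
    """Pick how long to wait, in seconds, before finalizing the user's turn.

    Pass eou_probability=None to model a pipeline without a turn detector model.
    """
    if eou_probability is None:
        # No confidence score to act on, so max_endpointing_delay never applies.
        return min_endpointing_delay
    if eou_probability < unlikely_threshold:
        # The detector thinks the user is probably not finished: wait longer.
        return max_endpointing_delay
    return min_endpointing_delay
```

In this model, endpointing_delay(None) always returns the minimum delay, which is why max_endpointing_delay has no effect without a turn detector model.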

Realtime Model Agents (OpenAI, Google)

When using realtime models like OpenAI's Realtime API or Google's realtime models, VAD and turn detection are handled server-side:

| Parameter | Status | Reason |
| --- | --- | --- |
| min_speech_duration | Not used | Server-side VAD |
| min_silence_duration | Not used | Server-side VAD |
| prefix_padding_duration | Not used | Server-side VAD |
| max_buffered_speech | Not used | Server-side VAD |
| activation_threshold | Not used | Server-side VAD |
| min_endpointing_delay | Not used | Server-side turn detection |
| max_endpointing_delay | Not used | Server-side turn detection |
| allow_interruptions | Used | Controls client-side behavior |
| min_interruption_duration | Used | Controls client-side behavior |

Universal Parameters

These parameters work across all agent types:

| Parameter | Description |
| --- | --- |
| allow_interruptions | Controls whether user speech interrupts agent playback |
| min_interruption_duration | Sets minimum speech duration to count as interruption |
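As a rough illustration of how these two parameters combine, the following sketch decides whether a stretch of user speech should cut off agent playback. The should_interrupt function and its arguments are hypothetical and only model the behavior described in the table; they are not part of the LiveKit API.

```python
def should_interrupt(
    user_speech_s,
    agent_speaking,
    allow_interruptions=True,
    min_interruption_duration=0.5,
):
    """Return True if the user's speech should interrupt the agent's playback.

    `user_speech_s` is how long, in seconds, the user has been speaking so far.
    """
    if not agent_speaking:
        # Nothing is playing, so there is nothing to interrupt.
        return False
    if not allow_interruptions:
        return False
    # Short noises such as coughs fall below the minimum and are ignored.
    return user_speech_s >= min_interruption_duration
```

A higher min_interruption_duration makes the agent harder to interrupt accidentally, at the cost of reacting later to a genuine interruption.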

Quick Reference by Agent Type

| Agent Type | VAD Params | Endpointing Delays | Interruption Params |
| --- | --- | --- | --- |
| Pipeline + Turn Detector | All | Both | Both |
| Pipeline (no Turn Detector) | All | min only | Both |
| Realtime (OpenAI/Google) | None | None | Both |
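The quick reference can also be encoded as a lookup that flags settings with no effect for a given agent type. This is a sketch for illustration: the agent-type keys and the ignored_parameters helper are made-up names, and the sets simply restate the table.

```python
VAD_PARAMS = frozenset({
    "min_speech_duration",
    "min_silence_duration",
    "prefix_padding_duration",
    "max_buffered_speech",
    "activation_threshold",
})
INTERRUPTION_PARAMS = frozenset({"allow_interruptions", "min_interruption_duration"})

# Parameters that take effect for each agent type, per the quick reference.
ACTIVE_PARAMS = {
    "pipeline_with_turn_detector": VAD_PARAMS
    | INTERRUPTION_PARAMS
    | {"min_endpointing_delay", "max_endpointing_delay"},
    "pipeline": VAD_PARAMS | INTERRUPTION_PARAMS | {"min_endpointing_delay"},
    "realtime": INTERRUPTION_PARAMS,
}


def ignored_parameters(agent_type, configured):
    """Return the configured parameter names that have no effect for this agent type."""
    active = ACTIVE_PARAMS[agent_type]
    return sorted(name for name in set(configured) if name not in active)
```

For example, passing the names you have set for a realtime agent returns every VAD and endpointing parameter among them, which are the ones you can safely drop from that configuration.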

How Turn Detection Mode is Selected

The turn detection mode is automatically selected based on available components, with this priority order:

  1. realtime_llm — Uses server-side VAD and turn detection
  2. vad — Uses client-side Silero VAD
  3. stt — Falls back to STT-based detection
  4. manual — No automatic turn detection
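The priority order above amounts to a simple fallback chain. The sketch below expresses it as a standalone function; select_turn_detection_mode and its boolean flags are hypothetical names for illustration, and the actual selection happens inside the session rather than through a call like this.

```python
def select_turn_detection_mode(has_realtime_llm=False, has_vad=False, has_stt=False):
    """Return the first available mode in the documented priority order."""
    if has_realtime_llm:
        return "realtime_llm"  # server-side VAD and turn detection
    if has_vad:
        return "vad"           # client-side Silero VAD
    if has_stt:
        return "stt"           # STT-based detection
    return "manual"            # no automatic turn detection
```

Note that in this model a realtime LLM wins even when a VAD is also configured, which matches the ordering in the list.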

Configuration Examples

Pipeline Agent with Turn Detector

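from livekit.agents import AgentSession
from livekit.plugins import silero

# my_turn_detector is a placeholder for your turn detector model instance,
# e.g. MultilingualModel() from livekit.plugins.turn_detector.multilingual.
# stt, llm, and tts are omitted for brevity.
session = AgentSession(
    vad=silero.VAD.load(
        min_speech_duration=0.1,
        min_silence_duration=0.3,
        prefix_padding_duration=0.5,
        activation_threshold=0.5,
    ),
    turn_detection=my_turn_detector,  # AgentSession takes this as turn_detection
    allow_interruptions=True,
    min_interruption_duration=0.5,
    min_endpointing_delay=0.5,
    max_endpointing_delay=6.0,  # Used with turn detector confidence
)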

Pipeline Agent without Turn Detector

from livekit.agents import AgentSession
from livekit.plugins import silero

session = AgentSession(
    vad=silero.VAD.load(
        min_speech_duration=0.1,
        min_silence_duration=0.5,  # More important without turn detector
        prefix_padding_duration=0.5,
        activation_threshold=0.5,
    ),
    allow_interruptions=True,
    min_interruption_duration=0.5,
    min_endpointing_delay=0.8,  # Acts as the fixed delay
    # max_endpointing_delay not needed
)

Realtime Model Agent

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel(),
    # VAD params not needed - handled server-side
    # Endpointing delays not needed - handled server-side
    allow_interruptions=True,
    min_interruption_duration=0.5,
)

Summary

Understanding which parameters apply to your agent configuration prevents confusion and helps you tune the right settings. Remember:

  • Pipeline agents give you full control over VAD and turn detection
  • Realtime agents delegate most detection to the server, but you still control interruption behavior
  • Turn detector models enable confidence-based timing with the endpointing delay range

For complete documentation on turn detection modes, interruption handling, and session configuration, see the Turns overview in the LiveKit docs.