How can I reduce latency in voice agents using STT, TTS and LLM?
Reduce latency in voice agents that use Speech-to-Text, Text-to-Speech, and Large Language Models by optimizing network proximity and monitoring key metrics like TTFT and TTFB.
When implementing voice agents that combine Speech-to-Text (STT), Text-to-Speech (TTS), and Large Language Models (LLM), latency at each stage adds up and directly affects perceived responsiveness and user experience. Understanding how to optimize these components is crucial for keeping conversations feeling natural.
Optimizing Network Proximity
The primary way to reduce latency for voice agents is to optimize the network proximity between your agent and the various services it uses.
Ensure your agent is close (in terms of network latency) to each of the following; a quick way to compare candidate regions is sketched after this list:
- LLM service — Deploy your agent in a region close to your LLM provider's servers
- Speech-to-Text service — Minimize network hops between your agent and STT endpoints
- Text-to-Speech service — Choose TTS providers with infrastructure near your deployment
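To compare candidate regions, you can measure round-trip time from your deployment to each provider endpoint. Here is a minimal sketch; the endpoint URLs are placeholders for your actual providers, and `httpx` is just one convenient HTTP client:

```python
import time

import httpx

# Placeholder hosts; substitute the actual endpoints your providers expose.
ENDPOINTS = {
    "llm": "https://api.llm-provider.example.com",
    "stt": "https://api.stt-provider.example.com",
    "tts": "https://api.tts-provider.example.com",
}

with httpx.Client(timeout=5.0) as client:
    for name, url in ENDPOINTS.items():
        client.head(url)  # warm up DNS, TCP, and TLS before timing
        start = time.perf_counter()
        client.head(url)  # time a request over the warm connection
        rtt_ms = (time.perf_counter() - start) * 1000
        # The response status doesn't matter here; only the timing does.
        print(f"{name}: {rtt_ms:.1f} ms")
```

Run this from each deployment region you are considering and pick the one with the lowest numbers for the services your agent calls most often.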
Provider-Specific Optimizations
Check the provider's documentation for specific optimization recommendations:
- LLM, STT, and TTS providers often publish latency optimization guidelines in their documentation
- Many providers offer regional endpoints — use the one closest to your agent deployment
- Some providers offer dedicated or priority endpoints for lower latency
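For example, with an OpenAI-compatible SDK, switching to a regional endpoint is typically a one-line `base_url` change. The regional hostname below is hypothetical, so check your provider's documentation for the endpoints it actually offers:

```python
from openai import OpenAI

# Hypothetical regional hostname; use the regional endpoint
# documented by your provider.
client = OpenAI(base_url="https://eu.api.llm-provider.example.com/v1")
```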
Measuring Latency with Agents Metrics
Use the Agents observability tools to understand how much latency the agent is experiencing. The metrics help you identify bottlenecks and measure the impact of optimizations.
Key Latency Metrics
The most important latency metrics to focus on initially are:
| Metric | Description |
|---|---|
| Time To First Token (TTFT) | The time from when a request is sent to an LLM until the first token of the response is received. Lower TTFT means faster perceived responsiveness. |
| Time To First Byte (TTFB) | The time from when a request is sent until the first byte of the response is received. This applies to STT, TTS, and other network requests. |
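As a concrete example, you can measure TTFT client-side by timing a streaming request until the first content token arrives. This sketch uses the OpenAI Python SDK, but the same stopwatch pattern works with any provider that streams tokens:

```python
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)

ttft = None
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the first chunk with actual text marks TTFT
        ttft = time.perf_counter() - start
        break

if ttft is not None:
    print(f"TTFT: {ttft:.3f}s")
```

Comparing this number across regions and models quickly shows where the bulk of your latency budget is going.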
Additional Optimization Strategies
Beyond network proximity, consider these strategies:
- Use streaming responses — Enable streaming for LLM, STT, and TTS so each stage processes data incrementally instead of waiting for complete outputs (see the streaming sketch after this list)
- Optimize model selection — Smaller, faster models may provide acceptable quality with lower latency
- Implement caching — Cache common responses or audio segments, such as greetings and confirmations, where appropriate
- Pre-warm connections — Establish HTTP and TLS connections before they're needed to avoid cold start delays (caching and pre-warming are both sketched after this list)
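To illustrate the streaming strategy, the sketch below forwards LLM output to TTS one sentence at a time, so synthesis can begin long before the full reply is generated. `synthesize` is a hypothetical stand-in for your TTS client:

```python
import re

def synthesize(sentence: str) -> bytes:
    """Hypothetical TTS call; replace with your provider's client."""
    raise NotImplementedError

def speak_streaming(token_stream):
    """Yield TTS audio sentence by sentence as LLM tokens arrive.

    token_stream yields text fragments (for example, deltas from a
    streaming chat completion). Buffering only up to the next sentence
    boundary lets TTS start well before the LLM finishes.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush every completed sentence to TTS immediately.
        while (match := re.search(r"[.!?]\s", buffer)):
            sentence, buffer = buffer[:match.end()], buffer[match.end():]
            yield synthesize(sentence.strip())
    if buffer.strip():  # flush whatever trails the last boundary
        yield synthesize(buffer.strip())
```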
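Caching and pre-warming can share a single long-lived HTTP client. A minimal sketch, assuming an HTTP-based TTS endpoint (the URL and request shape are illustrative, not any specific provider's API):

```python
import hashlib

import httpx

TTS_URL = "https://tts.example.com/v1/synthesize"  # placeholder endpoint

# One long-lived client so TCP/TLS connections are reused across turns.
client = httpx.Client(timeout=10.0)

# In-memory cache of audio for frequently spoken phrases
# (greetings, confirmations, error messages).
_audio_cache: dict[str, bytes] = {}

def prewarm() -> None:
    """Open the connection before the first real request."""
    # A cheap request completes the TCP and TLS handshakes so the
    # first synthesis call doesn't pay that cost.
    client.head(TTS_URL)

def tts(text: str) -> bytes:
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _audio_cache:
        resp = client.post(TTS_URL, json={"text": text})
        resp.raise_for_status()
        _audio_cache[key] = resp.content
    return _audio_cache[key]
```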
Summary
Reducing latency in voice agents requires a holistic approach:
- Deploy agents close to your AI service providers
- Monitor TTFT and TTFB metrics to identify bottlenecks
- Follow provider-specific optimization guidelines
- Use streaming and caching where possible
For more details on monitoring agent performance, see the observability documentation.