Field Guides

How can I reduce latency in voice agents using STT, TTS and LLM?

Optimize network proximity and key metrics like TTFT and TTFB to reduce latency in voice agents that use Speech-to-Text, Text-to-Speech, and Large Language Models.

When building voice agents that combine Speech-to-Text (STT), Text-to-Speech (TTS), and Large Language Models (LLMs), latency accumulates at every hop and directly shapes how responsive the conversation feels. Understanding where that latency comes from, and how to reduce it, is essential for a good user experience.

Optimizing Network Proximity

The primary way to reduce latency for voice agents is to optimize the network proximity between your agent and the various services it uses.

Ensure your agent is close (in terms of network latency) to your:

  • LLM service — Deploy your agent in a region close to your LLM provider's servers
  • Speech-to-Text service — Minimize network hops between your agent and STT endpoints
  • Text-to-Speech service — Choose TTS providers with infrastructure near your deployment
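To compare candidate deployment regions, you can measure the round trip from your agent to each service directly. A minimal sketch, assuming hypothetical endpoint URLs (substitute your providers' actual hosts):

```python
import time
import urllib.request

# Hypothetical endpoints -- substitute your providers' actual hosts.
ENDPOINTS = {
    "llm": "https://llm.example.com",
    "stt": "https://stt.example.com",
    "tts": "https://tts.example.com",
}

def measure_round_trip(url: str, timeout: float = 5.0) -> float:
    """Return wall-clock seconds for one HTTP round trip to `url`."""
    start = time.monotonic()
    try:
        urllib.request.urlopen(url, timeout=timeout)
    except Exception:
        # Even an error response (or a refused connection) still
        # gives a lower bound on the network path.
        pass
    return time.monotonic() - start

# Usage: compare services from your deployment region.
# for name, url in ENDPOINTS.items():
#     print(name, f"{measure_round_trip(url) * 1000:.0f} ms")
```

Running this from each candidate region gives a quick, provider-agnostic picture of which deployment location minimizes the sum of these round trips.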

Provider-Specific Optimizations

Check the provider's documentation for specific optimization recommendations:

  • LLM, STT, and TTS providers often publish latency-optimization guidelines for their services
  • Many providers offer regional endpoints — use the one closest to your agent deployment
  • Some providers offer dedicated or priority endpoints for lower latency
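Once you have per-region round-trip measurements, selecting the closest regional endpoint is straightforward. A small sketch; the region names and latency figures below are purely illustrative:

```python
# Illustrative round-trip latencies (ms) per regional endpoint; in practice
# these come from your own measurements against the provider's regions.
REGION_LATENCY_MS = {
    "us-east": 42.0,
    "eu-west": 118.0,
    "ap-south": 205.0,
}

def pick_region(latency_ms: dict[str, float]) -> str:
    """Pick the regional endpoint with the lowest measured latency."""
    return min(latency_ms, key=latency_ms.get)

closest = pick_region(REGION_LATENCY_MS)  # "us-east" in this example
```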

Measuring Latency with Agents Metrics

Use the Agents observability tools to understand how much latency the agent is experiencing. The metrics help you identify bottlenecks and measure the impact of optimizations.

Key Latency Metrics

The most important latency metrics to focus on initially are:

  • Time To First Token (TTFT) — The time from when a request is sent to an LLM until the first token of the response is received. Lower TTFT means faster perceived responsiveness.
  • Time To First Byte (TTFB) — The time from when a request is sent until the first byte of the response is received. This applies to STT, TTS, and other network requests.
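A sketch of how TTFT (and, analogously, TTFB) can be measured around any streaming response. The `fake_stream` generator below is a stand-in for a real streaming LLM client, which would yield tokens the same way:

```python
import time
from typing import Iterable, Iterator, Tuple

def measure_ttft(stream: Iterable[str]) -> Tuple[float, Iterator[str]]:
    """Block until the first chunk arrives; return (TTFT seconds, full stream)."""
    start = time.monotonic()
    it = iter(stream)
    first = next(it)  # blocks until the first token/byte is received
    ttft = time.monotonic() - start

    def replay() -> Iterator[str]:
        # Re-yield the first chunk so the caller still sees the whole stream.
        yield first
        yield from it

    return ttft, replay()

# Stand-in for a streaming LLM response: ~50 ms to first token.
def fake_stream() -> Iterator[str]:
    time.sleep(0.05)
    yield "Hello"
    yield ", world"

ttft, tokens = measure_ttft(fake_stream())
```

The same wrapper works for STT or TTS byte streams; only the chunk type changes.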

Additional Optimization Strategies

Beyond network proximity, consider these strategies:

  • Use streaming responses — Enable streaming for LLM, STT, and TTS to process data incrementally
  • Optimize model selection — Smaller, faster models may provide acceptable quality with lower latency
  • Implement caching — Cache common responses or audio segments where appropriate
  • Pre-warm connections — Establish connections before they're needed to avoid cold start delays
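As an illustration of the caching strategy, here is a sketch that caches synthesized audio for phrases a voice agent repeats often (greetings, confirmations). `synthesize()` is a hypothetical stand-in for a real TTS API call:

```python
import hashlib

calls = 0  # counts real TTS invocations, to show cache hits

def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a real TTS API call."""
    global calls
    calls += 1
    return f"<audio:{text}>".encode()

_audio_cache: dict[str, bytes] = {}

def cached_tts(text: str) -> bytes:
    """Serve repeated phrases from cache; synthesize only on a miss."""
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _audio_cache:
        _audio_cache[key] = synthesize(text)
    return _audio_cache[key]

greeting = cached_tts("How can I help you today?")
greeting_again = cached_tts("How can I help you today?")  # cache hit, no TTS call
```

Cache hits skip the TTS round trip entirely, so frequently spoken phrases play back with near-zero added latency.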

Summary

Reducing latency in voice agents requires a holistic approach:

  1. Deploy agents close to your AI service providers
  2. Monitor TTFT and TTFB metrics to identify bottlenecks
  3. Follow provider-specific optimization guidelines
  4. Use streaming and caching where possible

For more details on monitoring agent performance, see the observability documentation.