How to build an agent with speech-to-text input and text-only output
Configure a LiveKit Agent to accept audio input via speech-to-text while responding only with text, and learn how to receive the text responses on your frontend.
You can configure a LiveKit Agent to accept audio input (speech-to-text) while responding only with text, with no TTS audio output. This is useful for chat-style interfaces where you want voice input but text-based responses.
Configuration
To set this up, disable audio output when starting your agent session while keeping audio input enabled:
Python
from livekit.agents.voice import room_io

await session.start(
    agent=MyAgent(),
    room=ctx.room,
    room_options=room_io.RoomOptions(
        audio_output=False,  # Disable TTS audio output
        # audio_input remains True by default
    ),
)
Node.js
await session.start({
  agent: new MyAgent(),
  room: ctx.room,
  outputOptions: {
    audioEnabled: false, // Disable TTS audio output
  },
  // inputOptions.audioEnabled remains true by default
});
When audio output is disabled:
- The agent will not publish an audio track to the room
- Text responses are published to the lk.transcription text stream topic
- Responses are sent without the lk.transcribed_track_id attribute (since there's no audio track to associate with)
- Text is sent without speech synchronization
Receiving Agent Responses
To receive the agent's text responses, you need to listen to the lk.transcription text stream topic on your frontend client. The built-in playground UI uses legacy transcription events and won't display responses when audio track publishing is disabled.
JavaScript
room.registerTextStreamHandler('lk.transcription', async (reader, participantInfo) => {
  const message = await reader.readAll();

  // Check if this is a transcription (has track ID) or a text-only response
  const isTranscription = reader.info.attributes['lk.transcribed_track_id'] != null;

  if (isTranscription) {
    console.log(`Transcription from ${participantInfo.identity}: ${message}`);
  } else {
    // This is a text-only agent response (no audio)
    console.log(`Agent response from ${participantInfo.identity}: ${message}`);
  }
});
Swift
try await room.registerTextStreamHandler(for: "lk.transcription") { reader, participantIdentity in
    let message = try await reader.readAll()

    if let _ = reader.info.attributes["lk.transcribed_track_id"] {
        print("Transcription from \(participantIdentity): \(message)")
    } else {
        // Text-only agent response
        print("Agent response from \(participantIdentity): \(message)")
    }
}
React
For React applications, use the useTranscriptions hook from @livekit/components-react:
import { useTranscriptions } from '@livekit/components-react';

function ChatDisplay() {
  // Each entry describes one text stream: { text, participantInfo, streamInfo }
  const transcriptions = useTranscriptions();

  return (
    <div>
      {transcriptions.map((segment) => (
        <div key={segment.streamInfo.id}>
          <strong>{segment.participantInfo.identity}:</strong> {segment.text}
        </div>
      ))}
    </div>
  );
}
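Python (sketch)
If your client is written in Python (for example, a headless test harness), the same receive-and-classify pattern might look like the sketch below. It assumes the room object exposes register_text_stream_handler(topic, handler) with a synchronous callback, and that the reader offers read_all() and info.attributes, as in the LiveKit Python SDK. The names classify_stream, attach_transcription_handler, and on_message are hypothetical helpers introduced for illustration only.

```python
import asyncio
from typing import Any, Callable, Mapping, Optional, Set

TRANSCRIPTION_TOPIC = "lk.transcription"
TRACK_ID_ATTRIBUTE = "lk.transcribed_track_id"


def classify_stream(attributes: Optional[Mapping[str, str]]) -> str:
    """Return "transcription" when the stream is tied to an audio track,
    otherwise "agent_response" (text-only output)."""
    if attributes and attributes.get(TRACK_ID_ATTRIBUTE) is not None:
        return "transcription"
    return "agent_response"


def attach_transcription_handler(
    room: Any,
    on_message: Callable[[str, str, str], None],
) -> Set["asyncio.Task[None]"]:
    """Register a handler on `room` (assumed to be a connected LiveKit room)
    and forward each complete message to the hypothetical `on_message`
    callback as (kind, participant_identity, text)."""
    pending: Set["asyncio.Task[None]"] = set()

    async def read_stream(reader: Any, participant_identity: str) -> None:
        # Wait for the whole stream, then decide what kind of message it is.
        text = await reader.read_all()
        kind = classify_stream(reader.info.attributes)
        on_message(kind, participant_identity, text)

    def handler(reader: Any, participant_identity: str) -> None:
        # The callback is synchronous, so the read runs in a background task.
        task = asyncio.create_task(read_stream(reader, participant_identity))
        pending.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(pending.discard)

    room.register_text_stream_handler(TRANSCRIPTION_TOPIC, handler)
    return pending
```

Calling attach_transcription_handler(room, callback) once after connecting should be enough; the returned set only exists so callers can await or cancel in-flight reads on shutdown.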
Important Notes
- Console playground limitation: If you're using the console playground and don't see agent responses to audio input, this is expected behavior. You must implement a custom receiver to listen to the lk.transcription text stream topic.
- Distinguishing response types: When audio output is disabled, agent responses won't have a lk.transcribed_track_id attribute. You can use this to differentiate between transcriptions of audio tracks and text-only responses.
- Hybrid mode: If you need to dynamically toggle audio on and off, use session.output.set_audio_enabled() instead of disabling it in RoomOptions. See the text and transcriptions guide for more details.
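As a rough illustration of the hybrid mode note, the sketch below wraps the toggle in a small helper. It assumes session is an already-started agent session whose output object exposes set_audio_enabled(bool), as named above; apply_reply_mode and the "text"/"voice" mode strings are hypothetical and not part of the LiveKit API.

```python
from typing import Any


def apply_reply_mode(session: Any, mode: str) -> None:
    """Switch a running session between text-only and spoken replies.

    `mode` is a hypothetical application-level setting: "text" or "voice".
    """
    if mode not in ("text", "voice"):
        raise ValueError(f"unknown reply mode: {mode!r}")
    # set_audio_enabled() is the method named in this guide; everything
    # else in this helper is illustrative.
    session.output.set_audio_enabled(mode == "voice")
```

You could call such a helper from whatever event tells your agent that the user changed their preference, for example a UI toggle relayed to the agent.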
Example Implementations
For complete working examples, check out:
- Transcriber agent: An agent that performs STT without TTS or LLM
- Text streams documentation: Full guide on receiving and sending text streams
Additional Resources
- Text and transcriptions guide - Complete documentation on text input/output in agents
- Voice agents examples repository - More agent examples
Read related documentation
- LiveKit Agents overview - Get started with voice AI agents
- Speech-to-text plugins - Configure STT providers
- Text streams - Send and receive text data