
How to build an agent with speech-to-text input and text-only output

Configure a LiveKit Agent to accept audio input via speech-to-text while responding only with text, and learn how to receive the text responses on your frontend.


You can configure a LiveKit Agent to accept audio input (speech-to-text) while responding only with text—no TTS audio output. This is useful for chat-style interfaces where you want voice input but text-based responses.

Configuration

To set this up, disable audio output when starting your agent session while keeping audio input enabled:

Python

from livekit.agents.voice import room_io

await session.start(
    agent=MyAgent(),
    room=ctx.room,
    room_options=room_io.RoomOptions(
        audio_output=False,  # Disable TTS audio output
        # audio_input remains True by default
    ),
)

Node.js

await session.start({
  agent: new MyAgent(),
  room: ctx.room,
  outputOptions: {
    audioEnabled: false, // Disable TTS audio output
  },
  // inputOptions.audioEnabled remains true by default
});

When audio output is disabled:

  • The agent will not publish an audio track to the room
  • Text responses are published to the lk.transcription text stream topic
  • Responses are sent without the lk.transcribed_track_id attribute (since there's no audio track to associate with)
  • Text is sent without speech synchronization

Receiving Agent Responses

To receive the agent's text responses, you need to listen to the lk.transcription text stream topic on your frontend client. The built-in playground UI uses legacy transcription events and won't display responses when audio track publishing is disabled.

JavaScript

room.registerTextStreamHandler('lk.transcription', async (reader, participantInfo) => {
  const message = await reader.readAll();
  // Check if this is a transcription (has a track ID) or a text-only response
  const isTranscription = reader.info.attributes['lk.transcribed_track_id'] != null;
  if (isTranscription) {
    console.log(`Transcription from ${participantInfo.identity}: ${message}`);
  } else {
    // This is a text-only agent response (no audio)
    console.log(`Agent response from ${participantInfo.identity}: ${message}`);
  }
});
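The attribute check above is framework-independent: a message carrying `lk.transcribed_track_id` is a transcription of a published audio track, while one without it is a text-only agent response. A minimal Python sketch of that routing logic (the function name is illustrative, not part of any LiveKit SDK):

```python
def classify_stream_message(attributes: dict) -> str:
    """Classify a message received on the lk.transcription topic.

    A transcription of an audio track carries the lk.transcribed_track_id
    attribute; a text-only agent response (audio output disabled) does not.
    """
    if attributes.get("lk.transcribed_track_id") is not None:
        return "transcription"
    return "agent_response"


# A text-only response arrives with no track attribute:
print(classify_stream_message({}))                                     # agent_response
print(classify_stream_message({"lk.transcribed_track_id": "TR_123"}))  # transcription
```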

Swift

try await room.registerTextStreamHandler(for: "lk.transcription") { reader, participantIdentity in
    let message = try await reader.readAll()
    if reader.info.attributes["lk.transcribed_track_id"] != nil {
        print("Transcription from \(participantIdentity): \(message)")
    } else {
        // Text-only agent response
        print("Agent response from \(participantIdentity): \(message)")
    }
}

React

For React applications, use the useTranscriptions hook from @livekit/components-react:

import { useTranscriptions } from '@livekit/components-react';

function ChatDisplay() {
  const transcriptions = useTranscriptions();
  return (
    <div>
      {transcriptions.map((segment) => (
        <div key={segment.id}>
          <strong>{segment.participant?.identity}:</strong> {segment.text}
        </div>
      ))}
    </div>
  );
}
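Outside of React, displaying responses amounts to keeping an ordered transcript of received messages. A small framework-independent Python sketch, assuming each message carries a stable stream id (exposed as `reader.info.id` in the JS SDK) so that an updated message with a known id replaces its earlier text rather than duplicating it (the `TranscriptStore` class is illustrative):

```python
class TranscriptStore:
    """Ordered transcript keyed by stream id.

    Upserting a message with a known id replaces its text, so repeated
    deliveries of the same message do not create duplicate chat entries.
    """

    def __init__(self) -> None:
        self._order: list[str] = []          # stream ids in arrival order
        self._messages: dict[str, tuple[str, str]] = {}  # id -> (identity, text)

    def upsert(self, stream_id: str, identity: str, text: str) -> None:
        if stream_id not in self._messages:
            self._order.append(stream_id)
        self._messages[stream_id] = (identity, text)

    def render(self) -> list[str]:
        return [
            f"{identity}: {text}"
            for identity, text in (self._messages[s] for s in self._order)
        ]
```

For example, receiving `"Hel"` and then `"Hello!"` under the same stream id yields a single `"agent: Hello!"` entry.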

Important Notes

  • Console playground limitation: If you're using the console playground and don't see agent responses to audio input, this is expected behavior. You must implement a custom receiver to listen to the lk.transcription text stream topic.

  • Distinguishing response types: When audio output is disabled, agent responses won't have a lk.transcribed_track_id attribute. You can use this to differentiate between transcriptions of audio tracks and text-only responses.

  • Hybrid mode: If you need to dynamically toggle audio on and off, use session.output.set_audio_enabled() instead of disabling it in RoomOptions. See the text and transcriptions guide for more details.
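For the hybrid case, a minimal sketch of a runtime toggle (the helper name is illustrative; `session` is assumed to be a started `AgentSession` exposing the `session.output.set_audio_enabled()` method from the note above):

```python
def set_text_only(session, text_only: bool) -> None:
    """Switch the agent between voice and text-only responses at runtime."""
    # When text_only is True, the agent stops publishing TTS audio and
    # responses continue to arrive on the lk.transcription text stream.
    session.output.set_audio_enabled(not text_only)
```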

