
How to build an agent with speech-to-text input and text-only output

Configure a LiveKit Agent to accept audio input via speech-to-text while responding only with text, and learn how to receive the text responses on your frontend.


You can configure a LiveKit Agent to accept audio input (speech-to-text) while responding only with text—no TTS audio output. This is useful for chat-style interfaces where you want voice input but text-based responses.

Configuration

To set this up, disable audio output when starting your agent session while keeping audio input enabled:

Python

from livekit.agents.voice import room_io

await session.start(
    agent=MyAgent(),
    room=ctx.room,
    room_options=room_io.RoomOptions(
        audio_output=False,  # Disable TTS audio output
        # audio_input remains True by default
    ),
)

Node.js

await session.start({
  agent: new MyAgent(),
  room: ctx.room,
  outputOptions: {
    audioEnabled: false, // Disable TTS audio output
  },
  // inputOptions.audioEnabled remains true by default
});

When audio output is disabled:

  • The agent will not publish an audio track to the room
  • Text responses are published to the lk.transcription text stream topic
  • Responses are sent without the lk.transcribed_track_id attribute (since there's no audio track to associate with)
  • Text is sent without speech synchronization

Receiving Agent Responses

To receive the agent's text responses, you need to listen to the lk.transcription text stream topic on your frontend client. The built-in playground UI uses legacy transcription events and won't display responses when audio track publishing is disabled.

JavaScript

room.registerTextStreamHandler('lk.transcription', async (reader, participantInfo) => {
  const message = await reader.readAll();
  // Check if this is a transcription (has a track ID) or a text-only response
  const isTranscription = reader.info.attributes['lk.transcribed_track_id'] != null;
  if (isTranscription) {
    console.log(`Transcription from ${participantInfo.identity}: ${message}`);
  } else {
    // This is a text-only agent response (no audio)
    console.log(`Agent response from ${participantInfo.identity}: ${message}`);
  }
});
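The attribute check above is framework-independent: a message carrying `lk.transcribed_track_id` is a transcription of a published audio track, while one without it is a text-only agent response. A minimal Python sketch of that routing logic (the function name is illustrative, not part of any LiveKit SDK):

```python
def classify_stream_message(attributes: dict) -> str:
    """Classify a message received on the lk.transcription topic.

    A transcription of an audio track carries the lk.transcribed_track_id
    attribute; a text-only agent response (audio output disabled) does not.
    """
    if attributes.get("lk.transcribed_track_id") is not None:
        return "transcription"
    return "agent_response"


# A text-only response arrives with no track attribute:
print(classify_stream_message({}))                                     # agent_response
print(classify_stream_message({"lk.transcribed_track_id": "TR_123"}))  # transcription
```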

Swift

try await room.registerTextStreamHandler(for: "lk.transcription") { reader, participantIdentity in
    let message = try await reader.readAll()
    if reader.info.attributes["lk.transcribed_track_id"] != nil {
        print("Transcription from \(participantIdentity): \(message)")
    } else {
        // Text-only agent response
        print("Agent response from \(participantIdentity): \(message)")
    }
}

React

For React applications, use the useTranscriptions hook from @livekit/components-react:

import { useTranscriptions } from '@livekit/components-react';

function ChatDisplay() {
  const transcriptions = useTranscriptions();
  return (
    <div>
      {transcriptions.map((segment) => (
        <div key={segment.id}>
          <strong>{segment.participant?.identity}:</strong> {segment.text}
        </div>
      ))}
    </div>
  );
}
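Outside of React, displaying responses amounts to keeping an ordered transcript of received messages. A small framework-independent Python sketch, assuming each message carries a stable stream id (exposed as `reader.info.id` in the JS SDK) so that an updated message with a known id replaces its earlier text rather than duplicating it (the `TranscriptStore` class is illustrative):

```python
class TranscriptStore:
    """Ordered transcript keyed by stream id.

    Upserting a message with a known id replaces its text, so repeated
    deliveries of the same message do not create duplicate chat entries.
    """

    def __init__(self) -> None:
        self._order: list[str] = []          # stream ids in arrival order
        self._messages: dict[str, tuple[str, str]] = {}  # id -> (identity, text)

    def upsert(self, stream_id: str, identity: str, text: str) -> None:
        if stream_id not in self._messages:
            self._order.append(stream_id)
        self._messages[stream_id] = (identity, text)

    def render(self) -> list[str]:
        return [
            f"{identity}: {text}"
            for identity, text in (self._messages[s] for s in self._order)
        ]
```

For example, receiving `"Hel"` and then `"Hello!"` under the same stream id yields a single `"agent: Hello!"` entry.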

Important Notes

  • Console playground limitation: If you're using the console playground and don't see agent responses to audio input, this is expected behavior. You must implement a custom receiver to listen to the lk.transcription text stream topic.

  • Distinguishing response types: When audio output is disabled, agent responses won't have a lk.transcribed_track_id attribute. You can use this to differentiate between transcriptions of audio tracks and text-only responses.

  • Hybrid mode: If you need to dynamically toggle audio on and off, use session.output.set_audio_enabled() instead of disabling it in RoomOptions. See the text and transcriptions guide for more details.
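For the hybrid case, a minimal sketch of a runtime toggle (the helper name is illustrative; `session` is assumed to be a started `AgentSession` exposing the `session.output.set_audio_enabled()` method from the note above):

```python
def set_text_only(session, text_only: bool) -> None:
    """Switch the agent between voice and text-only responses at runtime."""
    # When text_only is True, the agent stops publishing TTS audio and
    # responses continue to arrive on the lk.transcription text stream.
    session.output.set_audio_enabled(not text_only)
```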

