Redacting PII from agent logs and transcripts

Handling credit cards, SSNs, or health data? Here's how to redact PII from your agent's speech and from exported transcripts.

Real-time redaction with `llm_node`

To redact PII from agent speech before it's spoken, override the `llm_node` method:

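from __future__ import annotations

import re
from collections.abc import AsyncIterable

from livekit.agents import Agent, FunctionTool, ModelSettings, llm

# Applied in insertion order: the card pattern runs before the looser SSN and phone patterns.
PII_PATTERNS = {
    r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
    r'\b\d{3}-?\d{2}-?\d{4}\b': '[SSN]',
    r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
    r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
}
SENTENCE_BREAKS = ('. ', '! ', '? ', '\n')

class PIIRedactingAgent(Agent):
    def redact_pii(self, text: str) -> str:
        for pattern, replacement in PII_PATTERNS.items():
            text = re.sub(pattern, replacement, text)
        return text

    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[FunctionTool],
        model_settings: ModelSettings,
    ) -> AsyncIterable[llm.ChatChunk | str]:
        # PII is often split across streamed chunks, so buffer text and
        # redact it one sentence at a time instead of chunk by chunk.
        buffer = ''
        async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
            content = chunk if isinstance(chunk, str) else getattr(getattr(chunk, 'delta', None), 'content', None)
            if not content:
                # Tool calls and other non-text chunks: flush pending text first to keep ordering.
                if buffer:
                    yield self.redact_pii(buffer)
                    buffer = ''
                yield chunk
                continue
            buffer += content
            cut = max(buffer.rfind(b) for b in SENTENCE_BREAKS)
            if cut != -1:
                yield self.redact_pii(buffer[:cut + 1])
                buffer = buffer[cut + 1:]
        # Redact and release whatever is left when the stream ends.
        if buffer:
            yield self.redact_pii(buffer)

This intercepts the LLM's text output and scrubs matching patterns before it reaches TTS or logs. Because text streams in small chunks, it is held until a sentence boundary so a number split across chunks can still match; this can delay the start of speech slightly. `llm_node` only sees the agent's output: PII the user says still appears on the user's side of the transcript, so redact that at export time as described below.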
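LLM output streams in small chunks, so a pattern applied to each chunk on its own can miss PII that spans a chunk boundary. The standalone simulation below (plain Python, no LiveKit needed) uses only the card pattern and a hypothetical token split; real chunk boundaries vary by model. It compares per-chunk redaction with holding text until a sentence boundary.

```python
import re

# Same 16-digit card pattern used in the agent example.
CARD = re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

def redact(text: str) -> str:
    return CARD.sub('[CARD]', text)

def redact_per_chunk(chunks: list[str]) -> str:
    # Naive: each chunk is redacted in isolation.
    return ''.join(redact(c) for c in chunks)

def redact_buffered(chunks: list[str]) -> str:
    # Hold text until a sentence boundary, then redact the whole sentence.
    out, buffer = [], ''
    for c in chunks:
        buffer += c
        cut = max(buffer.rfind(b) for b in ('. ', '! ', '? ', '\n'))
        if cut != -1:
            out.append(redact(buffer[:cut + 1]))
            buffer = buffer[cut + 1:]
    if buffer:
        out.append(redact(buffer))  # flush the tail at end of stream
    return ''.join(out)

# Hypothetical chunking of a card number as it might arrive from an LLM.
chunks = ['Your card ', '4111', ' 1111 ', '1111 1111', ' is on file. ', 'Anything else?']
print(redact_per_chunk(chunks))  # → Your card 4111 1111 1111 1111 is on file. Anything else?
print(redact_buffered(chunks))   # → Your card [CARD] is on file. Anything else?
```

The trade-off is latency versus coverage: a wider buffer catches more split patterns but holds text back longer.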

Redacting transcripts for export

To export redacted transcripts for analytics or record-keeping:

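import re

from livekit.agents import AgentServer, AgentSession, JobContext

server = AgentServer()

PII_PATTERNS = {
    r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
    r'\b\d{3}-?\d{2}-?\d{4}\b': '[SSN]',
    r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
    r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
}

def redact_pii(text: str) -> str:
    for pattern, replacement in PII_PATTERNS.items():
        text = re.sub(pattern, replacement, text)
    return text

def redact_content(content):
    # Message content may be a single string or a list of parts; only text parts are redacted.
    if isinstance(content, str):
        return redact_pii(content)
    if isinstance(content, list):
        return [redact_pii(part) if isinstance(part, str) else part for part in content]
    return content

async def on_session_end(ctx: JobContext) -> None:
    report = ctx.make_session_report()
    data = report.to_dict()

    # Redact PII from conversation history (user and agent turns)
    for item in data.get('history', []):
        if 'content' in item:
            item['content'] = redact_content(item['content'])

    # Save to your own storage; save_to_s3 is a placeholder for your own function
    save_to_s3(data)

@server.rtc_session(agent_name="my-agent", on_session_end=on_session_end)
async def entrypoint(ctx: JobContext):
    session = AgentSession()
    # ...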
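`save_to_s3` in the export example stands in for your own storage code. If you only need redacted reports on disk while testing, a hypothetical local replacement could look like the sketch below; the function name, default directory, and file naming are illustrative and not part of LiveKit.

```python
import json
import uuid
from pathlib import Path


def save_report_locally(data: dict, directory: str = "redacted_reports") -> Path:
    """Hypothetical stand-in for save_to_s3: write one redacted report as JSON."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    # A random file name keeps reports from concurrent sessions from colliding.
    path = out_dir / f"session-{uuid.uuid4().hex}.json"
    # default=str converts values json can't encode natively (e.g. datetimes) to strings.
    path.write_text(json.dumps(data, indent=2, default=str), encoding="utf-8")
    return path
```

Because `on_session_end` is async, calling it as `await asyncio.to_thread(save_report_locally, data)` keeps the file write off the event loop. Call it only after the redaction loop so the file never holds raw PII.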

Which approach to use

Scenario	Approach
Prevent PII in agent speech	`llm_node` override with regex patterns
Redacted transcripts for analytics	Custom export with `on_session_end` callback
Both real-time and export redaction	Combine both approaches

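Regex patterns are a starting point, not a compliance control. They miss edge cases such as 15-digit American Express numbers, numbers spoken or transcribed as words, and free-form data like names, addresses, or health details, and they can flag harmless values (any nine-digit order number matches the SSN pattern). For production systems with compliance requirements, consider a dedicated PII detection service.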
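The standalone check below runs the same four patterns over made-up sample strings: one that is fully caught, two that slip through (a 15-digit American Express test number and an SSN spoken as words), and a false positive where a nine-digit order number is labeled as an SSN.

```python
import re

# The four patterns from the examples above, applied in the same order.
PII_PATTERNS = {
    r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
    r'\b\d{3}-?\d{2}-?\d{4}\b': '[SSN]',
    r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
    r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
}

def redact_pii(text: str) -> str:
    for pattern, replacement in PII_PATTERNS.items():
        text = re.sub(pattern, replacement, text)
    return text

# Illustrative inputs only.
samples = {
    'caught': 'Card 4111-1111-1111-1111, email jo@example.com',
    'amex_missed': 'Amex 3782 822463 10005',
    'spoken_missed': 'My SSN is one two three, four five, six seven eight nine',
    'false_positive': 'Order 123456789 shipped',
}
for name, text in samples.items():
    print(f'{name}: {redact_pii(text)}')
# caught: Card [CARD], email [EMAIL]
# amex_missed: Amex 3782 822463 10005
# spoken_missed: My SSN is one two three, four five, six seven eight nine
# false_positive: Order [SSN] shipped
```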
