Redacting PII from agent logs and transcripts
Learn how to protect sensitive user data by redacting PII from agent logs, transcripts, and observability data in LiveKit.
Handling credit cards, SSNs, or health data? Here's how to redact PII from your agent's output.
Real-time redaction with llm_node
To redact PII from agent speech before it's spoken, override the llm_node method:
import re

from livekit.agents import Agent


class PIIRedactingAgent(Agent):
    def redact_pii(self, text: str) -> str:
        # Patterns are applied in order; each match is replaced with a placeholder.
        patterns = {
            r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
            r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
            r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
            r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
        }
        for pattern, replacement in patterns.items():
            text = re.sub(pattern, replacement, text)
        return text

    async def llm_node(self, chat_ctx, tools, model_settings=None):
        async def process_stream():
            async with self.session.llm.chat(
                chat_ctx=chat_ctx, tools=tools, tool_choice=None
            ) as stream:
                async for chunk in stream:
                    # Text chunks carry their content on chunk.delta.content.
                    content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else None
                    if content:
                        # Redact text before it is passed downstream.
                        yield self.redact_pii(content)
                    else:
                        # Pass chunks without text content through unchanged.
                        yield chunk

        return process_stream()
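This intercepts LLM output and scrubs matching patterns before they reach TTS or logs. Redaction runs on each streamed chunk independently, so a value that is split across two chunks can slip through unredacted.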
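Because the override redacts each streamed chunk on its own, a card or phone number that arrives in two pieces will not match any pattern. One way to reduce that risk is to buffer text and redact only complete sentences. The sketch below is a synchronous illustration of the idea, not LiveKit's API: `redact_buffered`, `SENTENCE_END`, and the sample chunks are hypothetical names introduced here, and the sentence-boundary rule is an assumption.

```python
import re
from typing import Iterable, Iterator

PATTERNS = {
    r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
    r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
    r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
    r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
}

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r'[.!?]\s+')


def redact_pii(text: str) -> str:
    for pattern, replacement in PATTERNS.items():
        text = re.sub(pattern, replacement, text)
    return text


def redact_buffered(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: hold text back until a sentence boundary, then redact it."""
    buffer = ''
    for chunk in chunks:
        buffer += chunk
        boundaries = [m.end() for m in SENTENCE_END.finditer(buffer)]
        if boundaries:
            cut = boundaries[-1]  # emit everything up to the last complete sentence
            yield redact_pii(buffer[:cut])
            buffer = buffer[cut:]
    if buffer:
        # Flush whatever is left when the stream ends.
        yield redact_pii(buffer)


# Illustrative chunks that split a card number and a phone number.
chunks = ['My card is 4111 11', '11 1111 1111. ', 'Call 555-123', '-4567 now.']
per_chunk = ''.join(redact_pii(c) for c in chunks)  # leaks the split values
buffered = ''.join(redact_buffered(chunks))         # redacts both
```

The trade-off is latency: nothing is emitted until a sentence is complete, so speech starts slightly later. Inside `llm_node`, the same buffering would need to run in the `async for` loop and still pass non-text chunks through.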
Redacting transcripts for export
To export redacted transcripts for analytics or record-keeping:
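import re

from livekit.agents import AgentSession, JobContext


def redact_pii(text: str) -> str:
    patterns = {
        r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
        r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
        r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
        r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
    }
    for pattern, replacement in patterns.items():
        text = re.sub(pattern, replacement, text)
    return text


async def on_session_end(ctx: JobContext) -> None:
    report = ctx.make_session_report()
    data = report.to_dict()

    # Redact PII from conversation history
    for item in data.get('history', []):
        content = item.get('content')
        if isinstance(content, str):
            item['content'] = redact_pii(content)
        elif isinstance(content, list):
            # In case content is a list of parts rather than a single string,
            # redact the text parts and leave everything else as-is.
            item['content'] = [redact_pii(p) if isinstance(p, str) else p for p in content]

    # Save to your own storage
    save_to_s3(data)  # or wherever you need it


# `server` is your agent server instance, defined elsewhere in your app.
@server.rtc_session(on_session_end=on_session_end)
async def entrypoint(ctx: JobContext):
    session = AgentSession()
    # ...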
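To check the export redaction without running a session, you can exercise the same logic on a hand-built payload. This is a minimal sketch: the `sample` dictionary is invented for illustration and only assumes a `history` list whose items may carry a `content` field. `redact_history` is a hypothetical helper, not part of LiveKit.

```python
import copy
import re

PATTERNS = {
    r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
    r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
    r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
    r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
}


def redact_pii(text: str) -> str:
    for pattern, replacement in PATTERNS.items():
        text = re.sub(pattern, replacement, text)
    return text


def redact_history(data: dict) -> dict:
    """Hypothetical helper: return a redacted copy, leaving the input untouched."""
    redacted = copy.deepcopy(data)
    for item in redacted.get('history', []):
        content = item.get('content')
        if isinstance(content, str):
            item['content'] = redact_pii(content)
        elif isinstance(content, list):
            # Redact text parts; keep non-text parts as they are.
            item['content'] = [redact_pii(p) if isinstance(p, str) else p for p in content]
    return redacted


# Invented sample payload; the real report may be shaped differently.
sample = {
    'history': [
        {'role': 'user', 'content': 'My SSN is 123-45-6789'},
        {'role': 'assistant', 'content': ['Email me at jane.doe@example.com']},
        {'type': 'function_call', 'name': 'lookup'},  # no content field
    ]
}

clean = redact_history(sample)
```

Returning a copy keeps the original report available if you need to store an unredacted version under stricter access controls.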
Which approach to use
| Scenario | Approach |
|---|---|
| Prevent PII in agent speech | llm_node override with regex patterns |
| Redacted transcripts for analytics | Custom export with on_session_end callback |
| Both real-time and export redaction | Combine both approaches |
Regex patterns can miss edge cases, such as numbers written out as words or formatted in unexpected ways. If you have compliance requirements, consider using a dedicated PII detection service for production systems.
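If you expect to add a detection service later, it can help to hide redaction behind a small interface so the regex rules and the service can be combined or swapped. The sketch below is one possible shape, not a LiveKit API. `PIIDetector`, `RegexDetector`, `CompositeDetector`, and `StubServiceDetector` are hypothetical names, and the stub only stands in for a real service client.

```python
import re
from typing import Protocol, Sequence


class PIIDetector(Protocol):
    def redact(self, text: str) -> str: ...


class RegexDetector:
    """Applies regex rules in order, replacing each match with its label."""

    def __init__(self, patterns: dict) -> None:
        self._rules = [(re.compile(p), label) for p, label in patterns.items()]

    def redact(self, text: str) -> str:
        for rx, label in self._rules:
            text = rx.sub(label, text)
        return text


class StubServiceDetector:
    """Hypothetical stand-in for a client of an external PII detection service."""

    def __init__(self, known_names: Sequence[str]) -> None:
        self._names = list(known_names)

    def redact(self, text: str) -> str:
        for name in self._names:
            text = text.replace(name, '[NAME]')
        return text


class CompositeDetector:
    """Runs several detectors in sequence so each can catch what the others miss."""

    def __init__(self, detectors: Sequence[PIIDetector]) -> None:
        self._detectors = list(detectors)

    def redact(self, text: str) -> str:
        for detector in self._detectors:
            text = detector.redact(text)
        return text


detector = CompositeDetector([
    StubServiceDetector(['Jane Doe']),
    RegexDetector({
        r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
        r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
    }),
])
result = detector.redact("Jane Doe's SSN is 123-45-6789")
```

Both the `llm_node` override and the export callback could then call the same `detector.redact`, which keeps the two paths consistent.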