Skip to main content
Field Guides

Redacting PII from agent logs and transcripts

Learn how to protect sensitive user data by redacting PII from agent logs, transcripts, and observability data in LiveKit.

Last Updated:


Handling credit cards, SSNs, or health data? Here's how to redact PII from your agent's output.

Real-time redaction with llm_node

To redact PII from agent speech before it's spoken, override the llm_node method:

import re
class PIIRedactingAgent(Agent):
def redact_pii(self, text: str) -> str:
patterns = {
r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
}
for pattern, replacement in patterns.items():
text = re.sub(pattern, replacement, text)
return text
async def llm_node(self, chat_ctx, tools, model_settings=None):
async def process_stream():
async with self.session.llm.chat(
chat_ctx=chat_ctx, tools=tools, tool_choice=None
) as stream:
async for chunk in stream:
content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else None
if content:
yield self.redact_pii(content)
else:
yield chunk
return process_stream()

This intercepts LLM output and scrubs patterns before they reach TTS or logs.

Redacting transcripts for export

To export redacted transcripts for analytics or record-keeping:

def redact_pii(text: str) -> str:
patterns = {
r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
}
for pattern, replacement in patterns.items():
text = re.sub(pattern, replacement, text)
return text
async def on_session_end(ctx: JobContext) -> None:
report = ctx.make_session_report()
data = report.to_dict()
# Redact PII from conversation history
for item in data.get('history', []):
if 'content' in item:
item['content'] = redact_pii(item['content'])
# Save to your own storage
save_to_s3(data) # or wherever you need it
@server.rtc_session(on_session_end=on_session_end)
async def entrypoint(ctx: JobContext):
session = AgentSession()
# ...

Which approach to use

ScenarioApproach
Prevent PII in agent speechllm_node override with regex patterns
Redacted transcripts for analyticsCustom export with on_session_end callback
Both real-time and export redactionCombine both approaches

For compliance requirements, regex patterns can miss edge cases. Consider using a dedicated PII detection service for production systems.