Redacting PII from agent logs and transcripts

Learn how to protect sensitive user data by redacting PII from agent logs, transcripts, and observability data in LiveKit.

Handling credit cards, SSNs, or health data? Here's how to keep it out of your logs.

The simple answer: disable recording

For most compliance use cases, disable observability uploads entirely:

Python:

    await session.start(
        agent=agent,
        room=ctx.room,
        record=False,
    )

Node.js:

    await session.start({
      agent,
      room: ctx.room,
      record: false,
    });

This prevents all session data—audio, transcripts, traces, and logs—from being uploaded to LiveKit Cloud. Done.

If you need transcripts (but redacted)

Disable Cloud recording, then export redacted transcripts yourself:

    from livekit.agents import AgentSession, JobContext

    async def on_session_end(ctx: JobContext) -> None:
        report = ctx.make_session_report()
        data = report.to_dict()
        # Redact PII from conversation history.
        # redact_pii and save_to_s3 are your own helpers, not part of the SDK.
        for item in data.get('history', []):
            if 'content' in item:
                item['content'] = redact_pii(item['content'])
        # Save to your own storage
        save_to_s3(data)  # or wherever you need it

    # `server` is the agent server instance defined elsewhere in your app.
    @server.rtc_session(on_session_end=on_session_end)
    async def entrypoint(ctx: JobContext):
        session = AgentSession(record=False)
        # ...
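
The export snippet above calls a redact_pii helper without defining it. A minimal sketch of one possible helper follows. The pattern list is illustrative only, and the assumption that a history item's content may be either a plain string or a list of parts is ours, not something the session report format is confirmed to guarantee. redact_history is a hypothetical wrapper name.

```python
import re
from typing import Any

# Illustrative patterns only; real PII takes many more forms than these.
_PII_PATTERNS = [
    (re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"), "[CARD]"),
    (re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w.-]+\.\w{2,}\b"), "[EMAIL]"),
]


def redact_pii(content: Any) -> Any:
    """Redact strings, recurse into lists, and leave other types untouched."""
    if isinstance(content, str):
        for pattern, replacement in _PII_PATTERNS:
            content = pattern.sub(replacement, content)
        return content
    if isinstance(content, list):
        # Assumption: content may be a list of text parts or other objects.
        return [redact_pii(part) for part in content]
    return content


def redact_history(data: dict) -> dict:
    """Apply redact_pii to every history item that has a 'content' field."""
    for item in data.get("history", []):
        if "content" in item:
            item["content"] = redact_pii(item["content"])
    return data
```

Handling both shapes in one helper keeps the export hook unchanged if the content format turns out to differ from what you expected.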

Real-time redaction (advanced)

Need to redact PII from agent speech before it's spoken? Override llm_node:

    import re

    from livekit.agents import Agent

    class PIIRedactingAgent(Agent):
        def redact_pii(self, text: str) -> str:
            # Illustrative patterns; they will not catch every format.
            patterns = {
                r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
                r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
                r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
                r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
            }
            for pattern, replacement in patterns.items():
                text = re.sub(pattern, replacement, text)
            return text

        async def llm_node(self, chat_ctx, tools, model_settings=None):
            async def process_stream():
                async with self.session.llm.chat(
                    chat_ctx=chat_ctx, tools=tools, tool_choice=None
                ) as stream:
                    async for chunk in stream:
                        # Redact text deltas; pass every other chunk through unchanged.
                        content = getattr(getattr(chunk, 'delta', None), 'content', None)
                        if content:
                            yield self.redact_pii(content)
                        else:
                            yield chunk
            return process_stream()

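This intercepts LLM output and scrubs matching patterns before they reach TTS or logs. Because each streamed chunk is redacted on its own, a pattern that is split across two chunks will not match. Combine with record=False so that anything the regexes miss is still kept out of Cloud uploads.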

Project-level settings

Disable observability for all sessions in a project:

  1. Go to Settings > Data and privacy in the Cloud dashboard
  2. Toggle Agent observability off

Which approach to use

Scenario | Approach
HIPAA/PCI-DSS compliance | record=False, no exceptions
Need redacted transcripts for analytics | record=False + custom export with redaction
Redact specific patterns in agent speech | llm_node override + record=False
All sessions in project are sensitive | Project-level disable

For compliance, always use record=False. Regex patterns can miss edge cases.
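
To make the edge-case warning concrete, here is a small sketch that runs the guide's card pattern against a few inputs. The sample strings and the contains_card helper are made up for illustration. The pattern catches a conventionally grouped 16-digit number, misses dot-separated, 15-digit, and spelled-out forms, and flags an unrelated 16-digit order ID.

```python
import re

# The same card pattern used in the llm_node example.
CARD = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")


def contains_card(text: str) -> bool:
    return CARD.search(text) is not None


SAMPLES = [
    "4111 1111 1111 1111",              # caught: standard 4x4 grouping
    "4111.1111.1111.1111",              # missed: dots are not allowed separators
    "3782 822463 10005",                # missed: 15 digits in 4-6-5 grouping
    "four one one one, one one one",    # missed: digits spoken as words
    "Order 1234567890123456 shipped",   # false positive: any 16-digit run matches
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(contains_card(sample), sample)
```

In a voice agent the spelled-out case is especially relevant, since transcripts may render digits as words.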