Redacting PII from agent logs and transcripts
Learn how to protect sensitive user data by redacting PII from agent logs, transcripts, and observability data in LiveKit.
Handling credit cards, SSNs, or health data? Here's how to keep it out of your logs.
The simple answer: disable recording
For most compliance use cases, disable observability uploads entirely:
Python:
await session.start(
    agent=agent,
    room=ctx.room,
    record=False,
)

Node.js:
await session.start({
  agent,
  room: ctx.room,
  record: false,
});
This prevents all session data—audio, transcripts, traces, and logs—from being uploaded to LiveKit Cloud. Done.
If you need transcripts (but redacted)
Disable Cloud recording, then export redacted transcripts yourself:
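from livekit.agents import JobContext

# redact_pii and save_to_s3 are your own helpers; they are not part of the SDK.
async def on_session_end(ctx: JobContext) -> None:
    report = ctx.make_session_report()
    data = report.to_dict()

    # Redact PII from conversation history
    for item in data.get('history', []):
        content = item.get('content')
        if isinstance(content, str):
            item['content'] = redact_pii(content)
        elif isinstance(content, list):
            # Content may be a list of parts; redact only the text parts
            item['content'] = [redact_pii(p) if isinstance(p, str) else p for p in content]

    # Save to your own storage
    save_to_s3(data)  # or wherever you need it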
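@server.rtc_session(on_session_end=on_session_end)
async def entrypoint(ctx: JobContext):
    session = AgentSession()
    # ...
    # Pass record=False to session.start(), as in the first example
    await session.start(agent=agent, room=ctx.room, record=False)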
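The export hook above calls redact_pii without defining it. A minimal sketch of what that helper could look like, reusing the regex patterns from the real-time section of this article: the function name matches the call above, but the body, the PII_PATTERNS constant, and the sample string are assumptions for illustration, not part of the LiveKit SDK.

```python
import re

# Illustrative patterns, taken from the real-time redaction example in this article.
# They are applied in order, so the 16-digit card pattern runs before the
# shorter SSN and phone patterns.
PII_PATTERNS = [
    (re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"), "[CARD]"),
    (re.compile(r"\b\d{3}[-]?\d{2}[-]?\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w.-]+\.\w{2,}\b"), "[EMAIL]"),
]


def redact_pii(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


# Hypothetical transcript line, used only to show the helper in action.
sample = "My card is 4111 1111 1111 1111 and my email is jane@example.com"
redacted = redact_pii(sample)
```

Compiling the patterns once at import time avoids re-parsing them for every history item. The storage step (save_to_s3) is left out here because it depends entirely on your own infrastructure.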
Real-time redaction (advanced)
Need to redact PII from agent speech before it's spoken? Override llm_node:
import re

from livekit.agents import Agent


class PIIRedactingAgent(Agent):
    def redact_pii(self, text: str) -> str:
        # Applied in order: the 16-digit card pattern runs before the shorter ones
        patterns = {
            r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
            r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
            r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
            r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
        }
        for pattern, replacement in patterns.items():
            text = re.sub(pattern, replacement, text)
        return text

    async def llm_node(self, chat_ctx, tools, model_settings=None):
        async def process_stream():
            async with self.session.llm.chat(
                chat_ctx=chat_ctx, tools=tools, tool_choice=None
            ) as stream:
                async for chunk in stream:
                    content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else None
                    if content:
                        # Text chunk: yield the redacted string
                        yield self.redact_pii(content)
                    else:
                        # No text (for example, a tool call): pass the chunk through unchanged
                        yield chunk

        return process_stream()
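This intercepts LLM output and scrubs matching patterns before they reach TTS or logs. Because the regexes run on each streamed chunk separately, a number that is split across two chunks will not match. Combine with record=False so that unredacted data is never uploaded to LiveKit Cloud.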
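LLM output usually arrives a few tokens at a time, so a card number can easily straddle several chunks. One way to handle this is to buffer text until a sentence boundary and redact the buffered text as a whole. The sketch below is a standalone illustration using plain async generators of strings; redact_buffered, fake_llm_stream, and the sentence-boundary heuristic are all assumptions for the example, not LiveKit APIs.

```python
import asyncio
import re
from typing import AsyncIterator

CARD = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")
# Heuristic boundary: sentence-ending punctuation followed by whitespace.
SENTENCE_END = re.compile(r"[.!?]\s")


def redact(text: str) -> str:
    return CARD.sub("[CARD]", text)


async def redact_buffered(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Accumulate chunks and emit redacted text one complete sentence run at a time."""
    buffer = ""
    async for chunk in chunks:
        buffer += chunk
        # Locate the end of the last complete sentence in the buffer, if any.
        last_end = None
        for match in SENTENCE_END.finditer(buffer):
            last_end = match.end()
        if last_end is not None:
            yield redact(buffer[:last_end])
            buffer = buffer[last_end:]
    # Flush whatever remains once the stream ends.
    if buffer:
        yield redact(buffer)


async def fake_llm_stream() -> AsyncIterator[str]:
    """Hypothetical stream that splits a card number across three chunks."""
    for piece in ["Your card 4111 ", "1111 1111", " 1111 is on file. ", "Anything else?"]:
        yield piece


async def collect() -> list[str]:
    return [text async for text in redact_buffered(fake_llm_stream())]


pieces = asyncio.run(collect())
```

The trade-off is latency: nothing is emitted until the first sentence is complete, so speech starts slightly later. A number dictated across a sentence boundary would still slip through, which is one more reason to pair this with record=False.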
Project-level settings
Disable observability for all sessions in a project:
- Go to Settings → Data and privacy in the Cloud dashboard
- Toggle Agent observability off
Which approach to use
| Scenario | Approach |
|---|---|
| HIPAA/PCI-DSS compliance | record=False — no exceptions |
| Need redacted transcripts for analytics | record=False + custom export with redaction |
| Redact specific patterns in agent speech | llm_node override + record=False |
| All sessions in project are sensitive | Project-level disable |
For compliance, always use record=False. Regex patterns can miss edge cases.
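To make the edge cases concrete, here is a small sketch that runs the card and SSN patterns from this article against a few inputs. The sample strings are made up for illustration; the point is only that simple regexes both miss real PII and flag harmless numbers.

```python
import re

# Card and SSN patterns as used in the real-time redaction example.
CARD = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")
SSN = re.compile(r"\b\d{3}[-]?\d{2}[-]?\d{4}\b")


def redact(text: str) -> str:
    text = CARD.sub("[CARD]", text)
    return SSN.sub("[SSN]", text)


# Miss: a 15-digit card number grouped 4-6-5 does not fit the 4x4 pattern.
missed_card = redact("Card 3782 822463 10005")

# Miss: digits spoken as words, which is common in voice transcripts.
missed_spoken = redact("My social is one two three, four five, six seven eight nine")

# False positive: any bare 9-digit number is treated as an SSN.
false_positive = redact("Order number 123456789")
```

The first two strings come back unchanged, while the order number is replaced with [SSN].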