Can you increase agent deployment limits?

Understand the hard cap on agent deployments and how to build a multi-tenant agent that scales without provisioning more slots.

Customer-facing agents stay warm so they can answer in milliseconds. That low-latency guarantee means each deployed agent consumes compute 24/7, so the platform enforces a hard ceiling on how many agents you can keep running simultaneously.

Why the Limit Exists

  • Always-on resources. Idle agents still hold memory, GPU/CPU reservations, and network sockets to keep latency tight; raising the cap would drive up baseline costs for everyone.
  • Predictable quality of service. A fixed number of long-lived agents lets us size infrastructure to deliver consistent response times under bursty traffic.
  • Operational safety. Hard limits prevent runaway deployments from preempting capacity needed by other workloads.

Because of these constraints, Support cannot raise the default limit on demand.

Build a Multi-Tenant Agent Instead

Rather than spinning up one agent per customer, run a single multi-tenant agent that hydrates the right prompt and context when a call begins:

  1. Identify the caller (account ID, phone number, JWT claims, etc.) as part of your session setup.
  2. Pull the tenant’s prompt, tools, and guardrails from your config store or CRM.
  3. Inject those values into the agent before you accept media, then let the session proceed normally.
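
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the platform's SDK: `TenantConfig`, `AgentSession`, `CONFIG_STORE`, and the `account_id` metadata key are all hypothetical names, and the in-memory dictionary stands in for whatever config store or CRM you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Per-tenant settings; field names are illustrative assumptions."""
    prompt: str
    tools: tuple[str, ...] = ()
    guardrails: tuple[str, ...] = ()


# Stand-in for a config store or CRM lookup (hypothetical data).
CONFIG_STORE: dict[str, TenantConfig] = {
    "acct_123": TenantConfig(
        prompt="You are the support assistant for Acme Dental.",
        tools=("book_appointment",),
        guardrails=("no_medical_advice",),
    ),
}


@dataclass
class AgentSession:
    """Hypothetical session object owned by the shared agent."""
    tenant_id: str
    config: TenantConfig
    accepting_media: bool = False


def identify_caller(metadata: dict[str, str]) -> str:
    """Step 1: resolve the tenant from session metadata."""
    tenant_id = metadata.get("account_id")
    if not tenant_id:
        raise ValueError("session metadata has no account_id")
    return tenant_id


def load_tenant_config(tenant_id: str) -> TenantConfig:
    """Step 2: fetch the tenant's prompt, tools, and guardrails."""
    try:
        return CONFIG_STORE[tenant_id]
    except KeyError:
        raise LookupError(f"unknown tenant: {tenant_id}") from None


def start_session(metadata: dict[str, str]) -> AgentSession:
    """Step 3: inject the config, then start accepting media."""
    tenant_id = identify_caller(metadata)
    session = AgentSession(tenant_id, load_tenant_config(tenant_id))
    session.accepting_media = True  # flipped only after hydration succeeds
    return session
```

Calling `start_session({"account_id": "acct_123"})` returns a session already carrying that tenant's prompt, tools, and guardrails. An unknown or missing account raises before any media is accepted, so a call is never served with the wrong tenant's context.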

This approach gives you:

  • One deployment to manage. Updates, observability, and rollbacks happen in one place.
  • Efficient load balancing. Every session hits the same agent pool, so autoscaling reacts faster and wastes less headroom.
  • Consistent behavior. You can enforce defaults (logging, escalation rules, compliance checks) once instead of per-tenant.
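
One way to enforce defaults once is to overlay each tenant's settings on a shared baseline and lock the keys that tenants must not change. The sketch below assumes hypothetical setting names (`logging`, `escalation`, `compliance_checks`) and an assumed policy that compliance checks are not overridable; adapt both to your own rules.

```python
# Platform-wide defaults applied to every tenant (hypothetical keys and values).
DEFAULTS = {
    "logging": "full",
    "escalation": "human_handoff",
    "compliance_checks": ["pii_redaction"],
}

# Keys a tenant is not allowed to override (an assumed policy).
LOCKED_KEYS = frozenset({"compliance_checks"})


def effective_settings(tenant_overrides: dict) -> dict:
    """Return the defaults overlaid with the tenant's permitted overrides."""
    allowed = {k: v for k, v in tenant_overrides.items() if k not in LOCKED_KEYS}
    # Later entries win, so permitted overrides replace the matching defaults.
    return {**DEFAULTS, **allowed}
```

With this shape, a tenant that supplies `{"logging": "minimal", "compliance_checks": []}` gets minimal logging but still runs the default compliance checks, because the locked key is dropped before the merge.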

Need More Agents Anyway?

If you still require dedicated agents (for example, to meet contractual isolation between regulated tenants), we can provision additional slots on the Enterprise plan for an added fee. Reach out through your account team or Support with the rationale, projected concurrency, and compliance requirements so we can quote the capacity expansion.

Until additional slots are provisioned, treat the default cap as fixed and invest in multi-tenant design; it is the fastest path to scale while keeping latency low and operations simple.