Rate limits in RenLayer protect three things at once: your upstream provider budget (you don’t pay for a runaway agent), your own systems (an agent in a loop can hammer an internal API), and your end users (one user shouldn’t consume the entire tenant quota).
Three scopes
Limits can be defined at any of three scopes:
- Per-agent: caps on what a single agent can do, regardless of which user triggered it. Useful for capping a back-office automation.
- Per-user: caps on the actions attributable to a single end user, identified by the X-RenLayer-User header. Useful when one agent serves many users.
- Per-tenant: caps on aggregate usage across all agents and users in a tenant. Useful as a cost backstop.
Limits stack. A request must pass all applicable limits to be allowed.
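The stacking rule can be sketched as a check over every applicable limit, where a single failure rejects the request. This is an illustrative model only, not RenLayer's implementation: the Limit class, the first_exceeded function, and the ceilings and usage figures are all hypothetical. Only the scope, dimension, and window names come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    scope: str      # "per_agent", "per_user", or "per_tenant"
    dimension: str  # "requests" or "tokens"
    window: str     # "minute", "hour", or "day"
    ceiling: int    # hypothetical configured maximum for the window


def first_exceeded(limits, usage, cost):
    """Return the first limit the request would exceed, or None if all pass.

    usage maps (scope, dimension, window) to the amount already consumed.
    cost maps a dimension to the amount this request would add.
    """
    for limit in limits:
        key = (limit.scope, limit.dimension, limit.window)
        if usage.get(key, 0) + cost.get(limit.dimension, 0) > limit.ceiling:
            return limit
    return None


# Hypothetical configuration and usage, for illustration only.
limits = [
    Limit("per_agent", "requests", "minute", 60),
    Limit("per_user", "tokens", "hour", 200_000),
    Limit("per_tenant", "tokens", "day", 5_000_000),
]
usage = {
    ("per_agent", "requests", "minute"): 12,
    ("per_user", "tokens", "hour"): 150_000,
    ("per_tenant", "tokens", "day"): 1_000_000,
}

# The agent and tenant limits have headroom, but the per-user token limit
# does not, so the request as a whole is rejected.
blocked = first_exceeded(limits, usage, {"requests": 1, "tokens": 100_000})
```

In this sketch the request is blocked by the per-user token limit even though the other two limits would have allowed it.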
Two dimensions
Each limit applies to one of two dimensions:
- Requests per minute / hour / day: protects against runaway loops.
- Tokens per minute / hour / day: protects against expensive-but-infrequent calls (e.g. one agent submitting 100k-token contexts).
What happens when a limit is hit
When a request would exceed an active limit, the proxy returns a structured 429 Too Many Requests error with:
- The scope that was hit (per_agent, per_user, per_tenant).
- The dimension (requests or tokens).
- The window (minute, hour, day).
- The retry-after time in seconds.
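A client can use these fields to decide how long to back off and which quota to report. The sketch below assumes the error arrives as a JSON body with keys named scope, dimension, window, and retry_after. Those key names and the describe_rate_limit helper are assumptions made for illustration; this page does not specify the exact wire format.

```python
import json


def describe_rate_limit(status_code, body_text):
    """Return (message, retry_after_seconds) for a 429, or None otherwise.

    Assumes a JSON body with hypothetical keys: scope, dimension,
    window, retry_after.
    """
    if status_code != 429:
        return None
    body = json.loads(body_text)
    message = "{scope} limit on {dimension} per {window} exceeded".format(**body)
    return message, int(body["retry_after"])


# A made-up response body, shaped after the fields listed above.
sample = json.dumps(
    {"scope": "per_user", "dimension": "tokens", "window": "hour", "retry_after": 42}
)
result = describe_rate_limit(429, sample)
```

A caller would typically sleep for the returned number of seconds before retrying, rather than retrying immediately and consuming more of the same quota.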
The trace is recorded with status DENIED and a reason of rate_limit_exceeded. This is distinct from a policy DENY: the dashboard separates the two so you can spot quota exhaustion vs governance rejections.
Authoring
Rate limits are managed in the console under each agent’s Limits tab (per-agent and per-user) or the tenant Settings page (per-tenant). Edits propagate within seconds; there is no proxy restart.
Burst handling
Limits use a sliding-window counter rather than a fixed bucket. This avoids the classic spike at the start of each minute. Bursts up to the configured ceiling are allowed; sustained traffic above it is rejected.
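To illustrate why a sliding window avoids the boundary spike, here is a minimal sketch of one common sliding-window counter variant: it weights the previous window's count by how much of it still overlaps the trailing window. RenLayer's actual algorithm is not described on this page, so the SlidingWindowCounter class, its parameters, and the numbers below are assumptions.

```python
class SlidingWindowCounter:
    """Approximate sliding window built from two adjacent fixed-window counts."""

    def __init__(self, ceiling, window_seconds):
        self.ceiling = ceiling
        self.window = window_seconds
        self.current_index = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now, cost=1):
        index = int(now // self.window)
        if index != self.current_index:
            # Roll forward. The old count only carries over if it belongs
            # to the immediately preceding window.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the current window that has elapsed; the previous
        # window's count is weighted by the remaining overlap.
        elapsed = (now % self.window) / self.window
        estimated = self.previous_count * (1 - elapsed) + self.current_count
        if estimated + cost > self.ceiling:
            return False
        self.current_count += cost
        return True


limiter = SlidingWindowCounter(ceiling=5, window_seconds=60)

# Six requests late in the first minute: five fit under the ceiling.
burst = [limiter.allow(t) for t in (50, 51, 52, 53, 54, 55)]

# One second into the next minute the earlier burst still counts almost
# fully, so the request is rejected. A fixed window would have reset here.
after_boundary = limiter.allow(61)

# Half a minute in, only half of the earlier burst is still counted.
later = limiter.allow(90)
```

In this sketch the burst of five is accepted, the sixth request and the one just after the minute boundary are rejected, and capacity returns gradually as the earlier burst ages out.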
Observability
The console exposes a per-agent and per-user Quota chart that shows usage against the limit over the last 24 hours. Combined with the dashboard’s DENIED count, this is the fastest way to detect a runaway agent.
Where to go next
- Policies: for content-based blocking.
- DLP: for data-content protections.
- Console: agents: where per-agent limits are configured.