Rate limits

Per-agent, per-user, and per-tenant rate limits in the RenLayer proxy, protecting upstream budgets and stopping runaway agents.

Rate limits in RenLayer protect three things at once: your upstream provider budget (you don’t pay for a runaway agent), your own systems (an agent in a loop can hammer an internal API), and your end users (one user shouldn’t consume the entire tenant quota).

Three scopes

Limits can be defined at any of three scopes:

  • Per-agent: caps on what a single agent can do, regardless of which user triggered it. Useful for capping a back-office automation.
  • Per-user: caps on the actions attributable to a single end user, identified by the X-RenLayer-User header. Useful when one agent serves many users.
  • Per-tenant: caps on aggregate usage across all agents and users in a tenant. Useful as a cost backstop.

Limits stack: a request is allowed only if it passes every limit that applies to it.
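The stacking rule can be illustrated with a minimal sketch. Everything here is hypothetical: the `Limit` class, the `first_violation` helper, and the agent, user, and tenant identifiers are illustrative names, not part of the RenLayer API, and the proxy's real evaluation order is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    scope: str    # "per_agent", "per_user", or "per_tenant"
    key: str      # hypothetical ID of the agent, user, or tenant
    ceiling: int  # maximum requests allowed in the current window


def first_violation(limits: list[Limit],
                    usage: dict[tuple[str, str], int]) -> Optional[Limit]:
    """Return the first limit the next request would exceed, or None if all pass."""
    for limit in limits:
        used = usage.get((limit.scope, limit.key), 0)
        if used + 1 > limit.ceiling:
            return limit
    return None


# One request is subject to all three scopes at once.
limits = [
    Limit("per_agent", "billing-bot", 100),
    Limit("per_user", "user-42", 20),
    Limit("per_tenant", "acme", 1000),
]
usage = {
    ("per_agent", "billing-bot"): 35,
    ("per_user", "user-42"): 20,   # this user has used their whole quota
    ("per_tenant", "acme"): 410,
}

blocked = first_violation(limits, usage)
print(blocked.scope if blocked else "allowed")  # per_user
```

In this sketch the agent and tenant both have headroom, but the request is still rejected because the per-user limit is exhausted.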

Two dimensions

Each limit applies to one of two dimensions:

  • Requests per minute / hour / day: protects against runaway loops.
  • Tokens per minute / hour / day: protects against expensive-but-infrequent calls (e.g. one agent submitting 100k-token contexts).
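A short worked example shows why the two dimensions catch different problems. The ceilings below are made-up values chosen for illustration, not RenLayer defaults, and `exceeds` is a hypothetical helper.

```python
def exceeds(used: int, cost: int, ceiling: int) -> bool:
    """True if adding `cost` to `used` would go over `ceiling`."""
    return used + cost > ceiling


# Hypothetical per-minute ceilings for one agent.
REQUESTS_PER_MINUTE = 60
TOKENS_PER_MINUTE = 200_000

# The agent has already sent two 100k-token requests this minute.
requests_used = 2
tokens_used = 2 * 100_000

# A third 100k-token request costs 1 request and 100,000 tokens.
over_requests = exceeds(requests_used, 1, REQUESTS_PER_MINUTE)
over_tokens = exceeds(tokens_used, 100_000, TOKENS_PER_MINUTE)

print(over_requests, over_tokens)  # False True
```

The request count is nowhere near its ceiling, so only the token limit stops this agent. A tight loop of small requests would trip the request limit instead.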

What happens when a limit is hit

When a request would exceed an active limit, the proxy returns a structured 429 Too Many Requests error with:

  • The scope that was hit (per_agent, per_user, per_tenant).
  • The dimension (requests or tokens).
  • The window (minute, hour, day).
  • The retry-after time in seconds.
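A client can use these fields to decide how long to wait and which quota to report. The sketch below assumes the error arrives as a JSON body with keys named `scope`, `dimension`, `window`, and `retry_after`; those key names and the `RateLimitError` class are assumptions made for illustration, so check the actual response shape before relying on them.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateLimitError:
    scope: str        # per_agent, per_user, or per_tenant
    dimension: str    # requests or tokens
    window: str       # minute, hour, or day
    retry_after: int  # seconds to wait before retrying


def parse_rate_limit_error(status: int, body: str) -> Optional[RateLimitError]:
    """Return a RateLimitError for a 429 response, or None for any other status."""
    if status != 429:
        return None
    payload = json.loads(body)  # assumed JSON body with hypothetical key names
    return RateLimitError(
        scope=payload["scope"],
        dimension=payload["dimension"],
        window=payload["window"],
        retry_after=int(payload["retry_after"]),
    )


# Hypothetical response body, for illustration only.
sample_body = (
    '{"scope": "per_user", "dimension": "tokens", '
    '"window": "minute", "retry_after": 12}'
)

err = parse_rate_limit_error(429, sample_body)
print(f"{err.scope} {err.dimension}/{err.window} limit hit; retry in {err.retry_after}s")
```

A caller would typically sleep for `retry_after` seconds before retrying, rather than retrying immediately and consuming more of the same quota.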

The trace is recorded with status DENIED and a reason of rate_limit_exceeded. This is distinct from a policy DENY: the dashboard separates the two so you can tell quota exhaustion apart from governance rejections.

Authoring

Rate limits are managed in the console under each agent’s Limits tab (per-agent and per-user) or the tenant Settings page (per-tenant). Edits propagate within seconds, with no proxy restart required.

Burst handling

Limits use a sliding-window counter rather than fixed windows that reset on the minute. This avoids the classic spike at the start of each minute, when a fixed window resets and the full quota becomes available at once. Bursts up to the configured ceiling are allowed; sustained traffic above it is rejected.
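One common way to build a sliding-window counter is to weight the previous window's count by how much of it still overlaps the sliding window. The sketch below shows that general technique only. It is not RenLayer's implementation, which is not described here, and the class name and parameters are hypothetical.

```python
class SlidingWindowCounter:
    """Approximate sliding window built from two adjacent fixed-window counts."""

    def __init__(self, ceiling: int, window_seconds: float = 60.0):
        self.ceiling = ceiling
        self.window = window_seconds
        self.current_start = 0.0  # start time of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, now: float) -> None:
        """Advance to the fixed window that contains `now`."""
        elapsed_windows = int((now - self.current_start) // self.window)
        if elapsed_windows >= 1:
            # Only the immediately preceding window still overlaps the sliding window.
            self.previous_count = self.current_count if elapsed_windows == 1 else 0
            self.current_count = 0
            self.current_start += elapsed_windows * self.window

    def allow(self, now: float, cost: int = 1) -> bool:
        """Admit the request if the weighted estimate stays within the ceiling."""
        self._roll(now)
        fraction = (now - self.current_start) / self.window
        estimated = self.previous_count * (1.0 - fraction) + self.current_count
        if estimated + cost > self.ceiling:
            return False
        self.current_count += cost
        return True


limiter = SlidingWindowCounter(ceiling=10, window_seconds=60)

# A burst of 10 requests late in the first minute is allowed.
burst = [limiter.allow(now=59.0) for _ in range(10)]
print(all(burst))               # True

# Two seconds later a fixed window would have reset; the sliding window has not.
print(limiter.allow(now=61.0))  # False

# Halfway through the next minute, half of the old burst has aged out.
print(limiter.allow(now=90.0))  # True
```

Passing `cost` as a token count instead of 1 lets the same structure serve the token dimension.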

Observability

The console exposes a per-agent and per-user Quota chart that shows usage against the limit over the last 24 hours. Combined with the dashboard’s DENIED count, this is the fastest way to detect a runaway agent.

Where to go next

  • Policies: for content-based blocking.
  • DLP: for data-content protections.
  • Console: agents: where per-agent limits are configured.
