The RenLayer proxy is a single binary (and a single container image) that you can deploy in one of three topologies. The right choice depends on how your agents are organized and how strict your network boundaries are.
Pattern 1: Sidecar
Each agent (or pod hosting an agent) runs its own proxy instance, typically on localhost:8080. The agent talks to the local proxy, and the proxy egresses to upstream providers.
Best when:
- You run agents in Kubernetes and already use a sidecar pattern.
- You want fault isolation: if one proxy hiccups, only that one agent is affected.
- You need per-agent network policy (e.g. only this pod is allowed to reach api.openai.com).
Trade-offs: more proxy instances to operate, slightly higher overall resource usage.
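As a sketch of the sidecar contract, the agent only needs to be pointed at its local proxy. The `OPENAI_BASE_URL` variable below is an assumption about how one particular agent might select its endpoint, not a RenLayer setting:

```shell
# Hypothetical sketch of the sidecar contract: the agent sends all model
# traffic to the proxy on localhost, and the proxy handles upstream egress.
# OPENAI_BASE_URL is an assumed agent-side setting, not a RenLayer variable.
export OPENAI_BASE_URL="http://localhost:8080/v1"
```

Because the agent only ever sees localhost, the pod's network policy can deny all other egress without breaking the agent.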
Pattern 2: Gateway
A small fleet of proxy instances sits behind a load balancer (or service mesh) and serves all agents in a region. Agents call the shared proxy URL.
Best when:
- You have many small agents and want to centralize the egress point.
- You want a single chokepoint for network ACLs to upstream providers.
- You want to share connection pools and TLS sessions to the upstream.
Trade-offs: a gateway-wide outage affects every agent.
Pattern 3: Standalone
A single proxy instance serving a small environment; typical for development, demos, and single-node production deployments.
Best when:
- You are evaluating RenLayer on a laptop or test cluster.
- You run a small fleet of agents and don’t yet need horizontal scale.
Configuration
The proxy is configured entirely through environment variables. The most common ones:
- RENLAYER_DATABASE_URL: Postgres connection string (shared with the Platform API).
- RENLAYER_API_URL: base URL of the Platform API.
- RENLAYER_LISTEN_ADDR: the bind address (e.g. 0.0.0.0:8080).
- RENLAYER_DEFAULT_UPSTREAM: fallback upstream URL when the agent doesn't override it.
- RENLAYER_LOG_LEVEL: one of error, warn, info, debug, trace.
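Taken together, a minimal launch might look like the sketch below. The variable names are the documented ones; the image name `renlayer/proxy:latest` and the connection strings are illustrative assumptions, not official values:

```shell
# Minimal sketch: the documented settings, passed to an assumed container
# image (renlayer/proxy:latest is illustrative, not an official name).
docker run --rm -p 8080:8080 \
  -e RENLAYER_DATABASE_URL="postgres://renlayer:secret@db:5432/renlayer" \
  -e RENLAYER_API_URL="https://platform.internal" \
  -e RENLAYER_LISTEN_ADDR="0.0.0.0:8080" \
  -e RENLAYER_DEFAULT_UPSTREAM="https://api.openai.com" \
  -e RENLAYER_LOG_LEVEL="info" \
  renlayer/proxy:latest
```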
Per-agent upstreams (e.g. one agent goes to OpenAI, another to a private vLLM) are configured on the agent record in the console; no proxy restart is required.
Health and readiness
The proxy exposes /healthz (liveness) and /readyz (readiness, which also verifies database connectivity). Use these for Kubernetes probes and load-balancer health checks.
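For a quick smoke test, you can hit both endpoints with curl; the host and port below assume the RENLAYER_LISTEN_ADDR example of 0.0.0.0:8080:

```shell
# Liveness: succeeds as soon as the process is up.
curl -fsS http://localhost:8080/healthz
# Readiness: succeeds only once the proxy can reach its database, so this is
# the endpoint to wire into load-balancer and Kubernetes readiness checks.
curl -fsS http://localhost:8080/readyz
```

The `-f` flag makes curl exit non-zero on HTTP errors, which is convenient for scripting these checks.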
Sizing
A typical proxy instance handles 2,000–4,000 requests per second on a modest 2-vCPU container with policy evaluation and pattern-based DLP enabled. Calls to large language models are bound by the upstream's latency, not by the proxy.
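The throughput figure above gives a simple capacity rule of thumb. In the sketch below, the 10,000 req/s target is a made-up example, and the math conservatively uses the low end of the quoted range:

```shell
# Back-of-envelope sizing at the conservative end of 2,000–4,000 req/s.
target_rps=10000        # assumed aggregate load across all agents
per_instance_rps=2000   # low end of the quoted per-instance figure
# Ceiling division: round up so capacity meets, not just approaches, the target.
instances=$(( (target_rps + per_instance_rps - 1) / per_instance_rps ))
echo "Provision at least $instances proxy instances"
```

Since large model calls are dominated by upstream latency, sizing by request rate matters mainly for workloads with many small, fast requests.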
Where to go next
- How it works: request flow.
- Policies: what runs inline.
- Rate limits: protecting your upstream budget.