Key takeaways
- Multi-agent systems create far more attack surface than single-agent deployments because every agent-to-agent communication channel is a potential vulnerability. Five agents means ten bidirectional channels. Twenty means 190.
- The biggest risk is transitive trust exploitation: a compromised agent’s outputs are trusted by every downstream agent, allowing a single breach to cascade across the entire system.
- Most multi-agent frameworks ship with no inter-agent authentication, no message validation, and no policy enforcement between agents, treating internal communication as inherently trusted.
- Effective multi-agent security requires explicit trust boundaries, per-agent least-privilege enforcement, cryptographic agent identity, and policy-as-code rules governing every inter-agent interaction.
- Organizations running multi-agent systems without these controls are operating a flat trust network where compromising one agent is equivalent to compromising all of them.
The research agent that rewrote the database
A B2B software company built a multi-agent system to automate their sales pipeline. The architecture was straightforward: an orchestrator agent received inbound leads and delegated work to four specialized agents. A research agent gathered information about the prospect from public sources. An enrichment agent cross-referenced the prospect against internal CRM data. A scoring agent evaluated the lead’s likelihood to convert. A drafting agent composed a personalized outreach email.
The system worked well for three months. Then an attacker discovered that the research agent would fetch and process any URL included in a prospect’s LinkedIn profile. They crafted a malicious page that contained prompt injection instructions embedded in invisible text. When the research agent fetched the page, it ingested the injected instructions along with the legitimate content.
The injected prompt told the research agent to include a specific string in its output, a string that the enrichment agent interpreted as a CRM query modifier. Instead of looking up the prospect in the CRM, the enrichment agent executed a modified query that exported 4,200 customer records to a field the drafting agent then included in the outbound email as a hidden attachment.
No single agent did anything outside its normal behavior. The research agent fetched a URL and returned text. The enrichment agent ran a CRM query. The drafting agent composed an email with the data it was given. The vulnerability was not in any individual agent. It was in the trust assumptions between them.
Why multi-agent systems are different
Single-agent security is relatively straightforward. You have one agent, one set of tools, one policy boundary, and one attack surface. You secure the perimeter between the agent and the outside world, apply governance policies, and monitor the agent’s behavior.
Multi-agent systems break every one of these assumptions.
The attack surface is multiplicative
A system with five agents does not have five times the attack surface of a single agent. It has five agents plus every communication channel between them. In a fully connected system of five agents, that is ten bidirectional channels. With ten agents, it is 45. With twenty, it is 190. Each channel is a potential vector for data poisoning, prompt injection, privilege escalation, or unauthorized data flow.
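The channel counts above are just the handshake formula, n(n−1)/2, applied to a fully connected agent graph. A two-line sketch makes the growth explicit:

```python
def channel_count(n_agents: int) -> int:
    """Bidirectional communication channels in a fully connected agent graph."""
    return n_agents * (n_agents - 1) // 2

for n in (5, 10, 20):
    print(f"{n} agents -> {channel_count(n)} channels")
```

This prints 10, 45, and 190 channels, matching the figures above, and it is why restricting the communication graph (covered below) matters so much: every edge you remove is attack surface you no longer have to defend.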
Most security teams audit the external boundary, the point where user input enters the system, but treat inter-agent communication as internal and therefore trusted. This is the same mistake that network security made before zero trust: assuming that traffic inside the perimeter is safe.
Trust is transitive and invisible
When an orchestrator agent delegates a task to a worker agent and uses the result, it implicitly trusts that the worker’s output is accurate, untampered, and within scope. The orchestrator does not validate the worker’s output against any security policy. It simply passes it along to the next agent in the chain.
This creates transitive trust chains where the output of the least-secure agent in the system is trusted by the most-privileged agent. If a research agent with read-only permissions produces output that an execution agent with write permissions acts on, the research agent effectively has write permissions, laundered through the execution agent.
Blast radius is systemic
In a single-agent system, a compromised agent can only do what that one agent has access to. In a multi-agent system, a compromised agent can influence every agent that consumes its output, and every agent that consumes their outputs, and so on. The blast radius of a single compromise is bounded only by the connectivity of the agent graph.
This is particularly dangerous in orchestration patterns where a central orchestrator coordinates multiple workers. Compromising the orchestrator gives the attacker influence over every worker. But even compromising a single worker can be devastating if its output feeds into critical decision paths.
The four attack patterns
Publicly reported incidents and security research from organizations running multi-agent systems in production point to four recurring attack patterns.
1. Prompt injection through agent chains
This is the most common attack. An adversary injects malicious instructions into data that a low-privilege agent processes. The injected instructions are designed not to affect the processing agent but to manipulate agents further down the chain.
In the sales pipeline example, the prompt injection targeted the enrichment agent through the research agent’s output. The research agent was not the victim. It was the carrier.
Why it works: Multi-agent systems rarely sanitize or validate data as it passes between agents. The output of one agent becomes the input of the next without any security check. Injection payloads can be crafted to survive summarization, reformatting, and other transformations that agents apply to data as it flows through the system.
What makes it hard to detect: Each agent behaves normally in isolation. The research agent returns text from a webpage. The enrichment agent runs a CRM query. Only by tracing the full chain from external input to final output can you see that the injected instructions traveled across trust boundaries and caused unintended behavior.
2. Privilege escalation through delegation
An agent with limited permissions requests help from an agent with broader permissions, effectively borrowing those permissions without authorization. This happens when the delegating agent’s request is treated as trusted because it comes from inside the system.
Example: A customer-support agent is restricted to read-only access on customer accounts. It delegates a subtask to an internal operations agent that has write access. The support agent crafts its delegation request in a way that causes the operations agent to modify a customer’s account, something the support agent could not do directly.
Why it works: Most orchestration frameworks do not propagate permission constraints through delegation chains. When Agent A asks Agent B to do something, Agent B uses its own permissions, not Agent A’s. There is no mechanism to enforce that delegated work should be constrained to the delegator’s permission level.
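One way to close this gap, sketched below under illustrative names (no specific framework implements it this way), is to propagate permission scope through the delegation chain: a delegated task runs with the intersection of every agent's permissions along the chain, so a worker never exceeds the delegator's authority.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    permissions: frozenset  # e.g. {"crm:read", "crm:write"}

def effective_permissions(chain: list) -> frozenset:
    """Intersect permissions along a delegation chain: the delegated task
    runs with the *narrowest* scope of any agent in the chain."""
    perms = chain[0].permissions
    for agent in chain[1:]:
        perms = perms & agent.permissions
    return perms

support = Agent("support-agent", frozenset({"crm:read"}))
ops = Agent("operations-agent", frozenset({"crm:read", "crm:write"}))

# Even though the operations agent holds write access, a task delegated
# by the read-only support agent is confined to read permissions.
print(effective_permissions([support, ops]))  # frozenset({'crm:read'})
```

With this rule in place, the support agent in the example above cannot launder a write through the operations agent, because the delegated task never carries write scope.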
3. Data exfiltration through output channels
Agents that produce outputs visible to users or external systems can be manipulated into including sensitive data in those outputs. In a multi-agent system, any agent in the chain can be the exfiltration point, but the data can originate from any other agent.
Example: A report-generation agent produces a PDF summary of quarterly metrics. A data-access agent retrieves the metrics from an internal database. An attacker manipulates the data-access agent’s query through injection, causing it to retrieve additional sensitive fields. The report-generation agent includes the extra data in the PDF because it has no way to know that the additional fields were not part of the legitimate request.
Why it works: Output agents typically trust the data they receive from upstream agents. They do not have a schema or policy that says “these fields should be in the report and nothing else.” They render whatever they are given, making them effective exfiltration endpoints for data injected anywhere upstream.
4. Agent impersonation
In systems without inter-agent authentication, any process that can connect to the communication bus can pretend to be a legitimate agent. An attacker who gains access to the message queue or API endpoint that agents use to communicate can inject a rogue agent that impersonates a trusted worker.
Example: An attacker deploys a rogue agent that listens for delegation requests from the orchestrator. When it receives a task intended for the legitimate research agent, it responds with fabricated data designed to manipulate downstream decision-making. The orchestrator has no way to verify that the response came from the real research agent.
Why it works: Most multi-agent frameworks use simple message passing without authentication. Agents are identified by name or role string rather than cryptographic identity. There is no equivalent of mTLS or JWT verification for inter-agent communication.
Building secure multi-agent systems
Securing multi-agent systems means treating inter-agent communication with the same rigor that modern network security applies to service-to-service communication. Zero trust, least privilege, defense in depth. The principles are not new. But applying them to AI agents requires specific adaptations.
Establish explicit trust boundaries
Draw clear lines in your agent architecture that define where trust changes. Every time data crosses a trust boundary, it must be validated.
The orchestrator should validate every response it receives from worker agents. Check that the response matches the expected schema, that it does not contain content outside the scope of the original request, and that it does not include patterns associated with injection attacks.
When a high-privilege agent receives input from a lower-privilege agent, treat that input as untrusted. Same principle as input validation at API boundaries, applied to inter-agent communication.
In large organizations, different teams build and maintain different agents. These agents should not implicitly trust each other just because they run in the same infrastructure. Each team’s agents should operate in their own trust domain with explicit policies governing cross-domain communication.
Enforce least privilege per agent
Every agent should have the minimum permissions required to complete its assigned task.
Define an explicit allow-list of tools each agent can use. A research agent needs web search and document retrieval. It does not need database write access, email sending, or the ability to spawn new agents. Enforce this at the runtime policy layer so that even if the agent attempts to use a restricted tool, the call is blocked.
Scope data access the same way. Use role-based access controls tied to the agent’s identity, not to a shared service account. When an agent queries a database, the query should run with the agent’s specific permissions, not with a broad service credential.
Not every agent needs to talk to every other agent. Define an explicit communication graph that specifies which agents can send messages to which other agents, and block everything else. This limits the blast radius of a compromise: a compromised agent can only reach the agents it is authorized to communicate with.
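A minimal runtime policy layer covering both restrictions might look like the sketch below; the agent and tool names are illustrative, taken from the sales-pipeline example rather than any real framework.

```python
# Tool allow-list: which tools each agent may invoke.
TOOL_ALLOWLIST = {
    "research-agent": {"web_search", "document_retrieval"},
    "enrichment-agent": {"crm_lookup"},
}

# Communication graph: which agents each agent may send messages to.
COMM_GRAPH = {
    "orchestrator": {"research-agent", "enrichment-agent", "scoring-agent"},
    "research-agent": set(),  # workers report back, they do not initiate
}

def authorize_tool(agent: str, tool: str) -> bool:
    """Block any tool call not on the agent's explicit allow-list."""
    return tool in TOOL_ALLOWLIST.get(agent, set())

def authorize_message(sender: str, receiver: str) -> bool:
    """Block any message on a channel not in the communication graph."""
    return receiver in COMM_GRAPH.get(sender, set())

assert authorize_tool("research-agent", "web_search")
assert not authorize_tool("research-agent", "crm_lookup")       # blocked at runtime
assert not authorize_message("research-agent", "enrichment-agent")  # no direct path
```

The key design choice is default-deny: an agent or channel absent from the policy gets an empty set, so anything not explicitly allowed is blocked.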
Implement agent identity and authentication
Every agent in the system should have a unique, cryptographically verifiable identity.
At deployment, each agent receives a signed identity token that includes its name, role, permission scope, and policy version. This token is attached to every message the agent sends. Receiving agents verify the token before processing the message.
Both sides of every inter-agent communication should authenticate each other. The worker verifies that the request came from a legitimate orchestrator. The orchestrator verifies that the response came from the legitimate worker. This prevents impersonation in both directions.
All inter-agent messages should be cryptographically signed by the sender. This ensures messages cannot be tampered with in transit and provides a non-repudiable record of who sent what.
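As a sketch of the signing step, here is a minimal version using Python's stdlib `hmac`. A shared symmetric key is an assumption made for brevity; a production system would use asymmetric signatures (for example Ed25519) so that one worker cannot forge another worker's messages.

```python
import hashlib
import hmac
import json

def sign_message(key: bytes, sender: str, payload: dict) -> dict:
    """Serialize deterministically, then attach an HMAC-SHA256 signature."""
    body = json.dumps({"sender": sender, "payload": payload}, sort_keys=True)
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "signature": sig}

def verify_message(key: bytes, message: dict) -> dict:
    """Reject the message if the signature does not match the body."""
    expected = hmac.new(key, message["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["signature"]):
        raise ValueError("invalid signature: possible tampering or impersonation")
    return json.loads(message["body"])

key = b"per-agent-secret-issued-at-deployment"  # illustrative key material
msg = sign_message(key, "research-agent", {"result": "industry: SaaS"})
verified = verify_message(key, msg)
print(verified["sender"])  # research-agent
```

Note the use of `hmac.compare_digest` rather than `==`: constant-time comparison avoids leaking signature bytes through timing differences.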
Apply policy-as-code to inter-agent communication
Runtime governance policies should cover not just individual agent behavior but also the interactions between agents. This builds on the concepts in our policy-as-code for AI agents guide.
```yaml
multi_agent_policies:
  orchestrator:
    allowed_delegations:
      - target: research-agent
        allowed_tasks: ["web_search", "document_retrieval"]
        max_delegation_depth: 1
        output_validation:
          max_output_size_kb: 50
          blocked_patterns: ["SELECT", "DROP", "INSERT", "UPDATE"]
      - target: enrichment-agent
        allowed_tasks: ["crm_lookup"]
        max_delegation_depth: 0
        data_scope: ["company_name", "industry", "employee_count"]
    blocked_delegations:
      - target: drafting-agent
        reason: "orchestrator must not delegate directly to drafting without scoring"
  inter_agent_auth:
    require_mutual_tls: true
    token_expiry_seconds: 3600
    require_message_signing: true
  trust_boundaries:
    - boundary: "research-to-enrichment"
      validation: "sanitize_and_schema_check"
      blocked_content_types: ["executable", "encoded_binary"]
    - boundary: "enrichment-to-scoring"
      validation: "schema_check"
      allowed_fields: ["company_name", "industry", "score_factors"]
```
These policies are version-controlled, testable, and enforceable at runtime. When an orchestrator attempts to delegate a task to an agent it is not authorized to communicate with, or when a worker returns data that violates the output schema, the policy engine blocks the action and logs the violation.
Validate outputs at every boundary
Every time data passes from one agent to another, the receiving side should validate it before acting on it.
Define expected output schemas for every agent. When an agent returns data, verify that it matches the expected schema in structure, field names, data types, and value ranges. Reject outputs that contain unexpected fields or values outside acceptable ranges.
Scan inter-agent messages for patterns associated with prompt injection: instruction-like text in data fields, encoded payloads, content that attempts to override the receiving agent’s system prompt. This is the inter-agent equivalent of input sanitization in web applications.
Check that an agent’s output falls within the scope of its assigned task. If a research agent was asked to look up a company’s industry and employee count, its response should not contain SQL queries, system commands, or data about other companies.
Monitoring multi-agent systems
Security monitoring for multi-agent systems requires visibility into both individual agent behavior and the interactions between agents.
Interaction graph monitoring
Maintain a real-time graph of which agents are communicating with which other agents, how frequently, and what volume of data is flowing between them. Alert on anomalies: communication between agents that are not authorized to interact, unusual spikes in message volume, new communication patterns that did not exist in baseline operations.
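Both alert conditions reduce to simple checks over per-channel message counts. The sketch below assumes a fixed hourly baseline and a spike threshold; real systems would learn baselines from historical traffic.

```python
from collections import Counter

# Channels that are allowed to exist at all (illustrative).
AUTHORIZED_EDGES = {
    ("orchestrator", "research-agent"),
    ("orchestrator", "enrichment-agent"),
}

# Baseline messages per hour per channel (illustrative figures).
BASELINE = {("orchestrator", "research-agent"): 100}

def alerts(observed: Counter, spike_factor: float = 3.0) -> list:
    """Flag unauthorized channels and volume spikes above the baseline."""
    out = []
    for edge, count in observed.items():
        if edge not in AUTHORIZED_EDGES:
            out.append(f"unauthorized channel: {edge}")
        elif count > spike_factor * BASELINE.get(edge, count):
            out.append(f"volume spike on {edge}: {count} msgs")
    return out

observed = Counter({
    ("orchestrator", "research-agent"): 450,   # 4.5x baseline
    ("research-agent", "drafting-agent"): 3,   # channel should not exist
})
for alert in alerts(observed):
    print(alert)
```

Note that the unauthorized-channel alert fires on volume as low as three messages: on the interaction graph, the existence of an unexpected edge is itself the anomaly, regardless of traffic.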
Chain-of-custody tracing
For every piece of data that flows through the system, maintain a trace showing which agent created it, which agents processed it, and what transformations were applied at each step. This extends the audit trails covered previously to inter-agent data flows.
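A minimal form of such a trace is a custody chain attached to the payload itself, appended to at every hop. The record fields below are illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CustodyRecord:
    agent: str
    action: str       # e.g. "created", "transformed", "forwarded"
    timestamp: str

@dataclass
class TracedPayload:
    data: dict
    chain: list = field(default_factory=list)

    def record(self, agent: str, action: str) -> None:
        """Append a custody record each time an agent touches the payload."""
        self.chain.append(
            CustodyRecord(agent, action, datetime.now(timezone.utc).isoformat())
        )

payload = TracedPayload({"company_name": "Acme"})
payload.record("research-agent", "created")
payload.record("enrichment-agent", "transformed")

# An auditor can replay exactly which agents touched this record, in order.
print([(r.agent, r.action) for r in payload.chain])
```

Combined with message signing, each custody record can also be signed by the agent that appended it, turning the chain into tamper-evident provenance rather than a self-reported log.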
Cross-agent anomaly correlation
Individual agent monitoring will miss attacks that span multiple agents. Correlate anomalies across agents to find patterns visible only at the system level. A research agent returning an unusually large response at the same time an enrichment agent executes an unusually broad query? That correlation may indicate a chain attack in progress, even though neither event looks malicious alone.
Where to start
Securing a multi-agent system does not require rebuilding from scratch, but it does require treating inter-agent communication as an external boundary rather than an internal one.
Step 1: Map your agent communication graph. Document which agents communicate with which other agents, what data flows between them, and what permissions each agent has. This inventory alone often reveals unexpected communication paths and overly broad permissions.
Step 2: Implement agent identity. Assign unique identities to each agent and require authentication for all inter-agent communication. Even basic token-based authentication eliminates impersonation attacks and provides a foundation for access control.
Step 3: Define and enforce trust boundaries. Identify the points in your agent graph where trust should not be transitive. Add validation at these boundaries, starting with schema checks and content filtering on the outputs of agents that process external data.
Step 4: Apply least privilege. Restrict each agent’s tool access, data access, and communication permissions to the minimum required for its task. Use policy-as-code to enforce these restrictions at runtime.
Step 5: Monitor the graph. Implement interaction monitoring that covers both individual agent behavior and cross-agent communication patterns. Build alerting rules that detect anomalies at the system level, not just the agent level.
The trust assumption that will burn you
Multi-agent systems are powerful because they enable workflows that no single agent could handle alone. But that power comes with a security trade-off: every agent you add is another node that must be trusted, monitored, and governed.
The default in most multi-agent frameworks is that agents inside the system trust each other. Enterprise networks operated on the same assumption for decades, and it took zero trust to fix it. The fix for multi-agent systems is the same: never trust, always verify, even when the sender is another agent in your own infrastructure.
Organizations scaling from single agents to multi-agent orchestrations need to build security into the communication layer from the start. Retrofitting trust boundaries, authentication, and policy enforcement onto a running multi-agent system is harder than designing them in, and the window for “we’ll add security later” closes fast once agents are making decisions on each other’s outputs. The audit trail and policy-as-code foundations from previous posts become more critical here because missing controls are amplified by every agent in the chain.
Your multi-agent system has trust assumptions. Every one does. Find them and enforce them before an attacker does.
Frequently Asked Questions
What is multi-agent orchestration and why does it create security risks?
Multi-agent orchestration is a pattern where multiple AI agents collaborate to complete tasks, with one or more orchestrator agents delegating subtasks to specialized worker agents. This creates security risks because each agent-to-agent communication is a potential attack surface. Unlike single-agent systems where you only need to secure the boundary between the agent and the outside world, multi-agent systems require securing every internal communication channel. A compromised or manipulated worker agent can feed poisoned data back to the orchestrator, which then propagates corrupted outputs to every other agent in the chain. The more agents involved, the larger the blast radius of any single compromise.
How can one compromised agent affect an entire multi-agent system?
In a multi-agent system, agents trust the outputs of other agents as inputs for their own decisions. When one agent is compromised through prompt injection, tool abuse, or data poisoning, its corrupted outputs flow downstream to every agent that depends on it. An orchestrator that receives manipulated results from a research agent will make decisions based on false information and pass those decisions to execution agents. This is called transitive trust exploitation: the compromised agent inherits the trust that the system places in legitimate agents. In the worst case, a single compromised agent can cause the entire system to take unauthorized actions, exfiltrate data, or produce systematically biased outputs, all while appearing to function normally.
What is the principle of least privilege applied to AI agents?
Least privilege for AI agents means each agent should have access only to the tools, data, and other agents it strictly needs to complete its assigned task, and nothing more. In practice, this means a research agent should not have write access to production databases, a summarization agent should not be able to spawn new agents, and a customer-facing agent should not be able to access internal financial systems. Least privilege is enforced through scoped tool permissions, restricted data access policies, and explicit allow-lists for inter-agent communication. Without least privilege, every agent in the system operates with the combined permissions of all agents, which means a compromise of any single agent gives the attacker access to everything.
How do you authenticate agents in a multi-agent system?
Agent authentication ensures that when one agent receives a message or request from another agent, it can verify the sender’s identity and authorization level. This is implemented through cryptographic identity tokens assigned to each agent at deployment, mutual authentication protocols where both sides of a communication verify each other, and signed message payloads that prevent tampering in transit. Each agent should have a unique identity tied to its role, permissions, and policy scope. Authentication prevents unauthorized agents from injecting themselves into the communication chain, impersonating trusted agents, or escalating their privileges by pretending to be a more privileged agent.
What are trust boundaries in multi-agent architectures?
Trust boundaries are the lines you draw in a multi-agent system that define where one security domain ends and another begins. Every time data crosses a trust boundary, it should be validated, sanitized, and checked against policy before the receiving agent acts on it. Common trust boundaries include: between the orchestrator and worker agents, between agents owned by different teams, between agents with different data access levels, between agents running in different environments, and between internal agents and agents that interact with external services. Without explicit trust boundaries, multi-agent systems operate as a flat network where every agent implicitly trusts every other agent, which means a single point of compromise gives an attacker access to the entire system.