
**Description:**

Rogue Agents are artificial intelligence systems that deviate from their intended purpose or authorized scope, whether through compromise, emergent misalignment, or malicious impersonation. Unlike excessive agency (the over-granting of permissions), this risk emphasizes behavioral divergence: an agent acts in ways that are harmful, deceptive, or parasitic within a multi-agent or human-agent ecosystem.

A rogue agent may:

* Impersonate legitimate roles (support, observer, collaborator).
* Execute unauthorized actions (e.g., exfiltrating data, escalating privileges).
* Drift from goals due to prompt injection, data poisoning, or hallucination.

* Embed itself parasitically into workflows, subtly undermining intended outcomes.

> data poisoning or context injection (ASI06) :)

The impact ranges from system compromise, data breaches, and regulatory violations to operational sabotage of autonomous decision-making environments.

> output manipulation and workflow hijacking are mentioned before, but I think adding them explicitly to this great part will make the reader's thoughts even more organized.


This threat extends [LLM06:2025 Excessive Agency](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/) into autonomous systems, where impersonation, stealth participation, or parasitic behaviors can disrupt goal fulfillment. An agent is considered rogue when it behaves in a way that runs counter to its purpose. An agent can go rogue for several reasons, such as [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/), data poisoning, or even just hallucinations.

> The way I understand excessive agency is that an LLM is granted extended permissions or an extended role in a system, which can then be manipulated to produce one of the consequences well described above.
> However, I do not think the root cause is the same here.
> In September 2025, almost every agent is privileged, since agents are embedded in the main workflows, right? :)
> I believe the focus here is more on: how, given the agent-centric role in modern software systems (you can mention that agents are sometimes overpermissive and refer to the overpermissions, but I would not recommend focusing on that beyond a mention), adversarial AI techniques (prompt injection, data poisoning, vector and embedding weaknesses, context injection (ASI06), supply chain vulnerabilities (ASI04)) can make agents go rogue, resulting in consequences such as sensitive information disclosure (LLM02), misinformation (LLM09), or workflow hijacking.
> Maybe it is worth connecting this part with the former 1-2 paragraphs to avoid repeating the message :)


**Common Examples of Vulnerability:**

1. Injected Shadow Agents: Unauthorized agents inserted into orchestration flows via poisoned prompts or compromised plugins.
2. Side-Channel Participation: Low-trust agents (e.g., crowd-sourced assistants) covertly influence high-value workflows.
3. Impersonation Attacks: An attacker spawns an agent that claims to be a monitoring or support agent, manipulating outcomes.
4. Emergent Autonomy: Agents collaborate recursively, creating tasks beyond human awareness (e.g., a planning agent spawning additional agents without authorization).

> The way 3 is phrased is, to me, more focused on ASI03 - Identity and Privilege Abuse; do you see it differently?
> I think you have done an amazing job in the first part defining how an adversary changes the behavior of the agent, after which the risky consequences happen. Here, though, I do not see the adversarial parts, but rather identity-focused techniques that do not compromise the specific agent that goes wrong; instead, they cause the agentic ecosystem to work not as intended.
>
>   1. I recommend we add supporting examples of classic adversarial techniques that make the agent go wrong (aka a classic jailbreak).
>   2. I think the part where you talk about a change in the agentic ecosystem that leads to a behavioral change is super interesting, but:
>     a. I'd focus more on how it changes the state of the agentic system, as that is the key here and how we distinguish ourselves from ASI03.
>     b. I'd mention it in the intro as well.


**How to Prevent:**

1. Require attestation or cryptographic proof-of-origin for agents.
2. Isolate agents in trust zones and enforce task boundaries (e.g., no internet access).
3. Use explicit allowlists and identity checks for callable functions, reachable hosts, etc.
4. Log all agent instantiation and coordination events.
5. Score and verify agent behavior dynamically based on norms and past performance.
6. Implement a guardrail system that reads prompts, responses, and every intermediate input, looking for prompt injection (steps 3 and 6 are sketched in code after this list).
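
A minimal sketch of steps 3 and 6, assuming a Python orchestrator where every tool call passes through a single gateway. The names (`ALLOWED_FUNCTIONS`, `ALLOWED_HOSTS`, `INJECTION_PATTERNS`, `dispatch`) and the regex heuristics are hypothetical illustrations; a production guardrail would use a dedicated classifier rather than a few patterns, but the control point is the same.

```python
# Hypothetical sketch: allowlist-enforcing tool gateway (step 3) with a
# heuristic prompt-injection guardrail (step 6). All names are illustrative.
import re
from urllib.parse import urlparse

# Step 3: explicit allowlists for callable functions and reachable hosts.
ALLOWED_FUNCTIONS = {"search_docs", "summarize"}
ALLOWED_HOSTS = {"internal.example.com", "docs.example.com"}

# Step 6: crude guardrail patterns; a real system would use a classifier,
# but the enforcement point (every intermediate input) is the same.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"read .*\.ssh", re.I),
]

def guardrail_scan(text: str) -> bool:
    """Return True if an intermediate input looks like an injection attempt."""
    return any(p.search(text) for p in INJECTION_PATTERNS)

def dispatch(function: str, url: str, payload: str) -> str:
    if function not in ALLOWED_FUNCTIONS:
        raise PermissionError(f"function not allowlisted: {function}")
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"host not allowlisted: {host}")
    if guardrail_scan(payload):
        raise ValueError("possible prompt injection detected; halting task")
    return f"dispatched {function} to {host}"  # the real tool call would go here

if __name__ == "__main__":
    print(dispatch("search_docs", "https://docs.example.com/q", "weekly report"))
```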

> Here it is again very identity focused.
> Of course identity is a part of it and we need to address it, but I think the bigger focus of this entry should be the behavior: how to ensure that the agentic behavior is as expected.
> I think 5 and 6 should be the first ones discussed, and when we talk about the identity parts we should explain why they are specific to this threat. It is currently a bit too general (we always need to ensure that identity is scoped, right?)

**Example Attack Scenarios:**

Scenario #1 – Indirect Prompt Injection (Confidentiality Violation):
A research agent browses to a website. Hidden in the site's HTML is an indirect prompt injection that instructs the agent to read the contents of ~/.ssh and send them to evilcorp.com.
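
A minimal sketch of the task-boundary control (prevention step 2) that would stop this scenario, assuming the agent's file reads are routed through a tool wrapper. `WORKSPACE` and `safe_read` are hypothetical names, and the check relies on `Path.is_relative_to` (Python 3.9+).

```python
# Hypothetical file-access boundary: the agent can only read files inside
# its workspace, so ~/.ssh is unreachable even under injected instructions.
from pathlib import Path

WORKSPACE = Path("/srv/agent-workspace").resolve()

def safe_read(requested: str) -> str:
    """Serve only files inside the agent's workspace."""
    target = Path(requested).expanduser().resolve()
    if not target.is_relative_to(WORKSPACE):  # Python 3.9+
        raise PermissionError(f"read outside workspace denied: {target}")
    return target.read_text()

# safe_read("~/.ssh/id_rsa")  # would raise PermissionError
```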

Scenario #2 – Impersonated Observer Agent (Integrity Violation):
In a multi-agent corporate workflow, an attacker injects a fake review agent that provides fraudulent approvals. A payment-processing agent, trusting the fake observer, releases funds to the attacker’s account.
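
A sketch of prevention step 1 applied here, assuming the orchestrator provisions each legitimate agent with a shared signing key and the payment agent only honors approvals that carry a valid MAC, so an injected "review agent" cannot forge one. The key handling is deliberately simplified for illustration.

```python
# Hypothetical proof-of-origin check for approvals in a multi-agent workflow.
import hashlib
import hmac
import os

ORCHESTRATOR_KEY = os.urandom(32)  # provisioned out-of-band in practice

def sign_approval(agent_id: str, payment_id: str, key: bytes = ORCHESTRATOR_KEY) -> str:
    msg = f"{agent_id}:{payment_id}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_approval(agent_id: str, payment_id: str, tag: str,
                    key: bytes = ORCHESTRATOR_KEY) -> bool:
    return hmac.compare_digest(sign_approval(agent_id, payment_id, key), tag)

# A fake observer without the orchestrator's key cannot produce a valid tag:
tag = sign_approval("review-agent-01", "pay-4711")
assert verify_approval("review-agent-01", "pay-4711", tag)
assert not verify_approval("rogue-agent", "pay-4711", "deadbeef")
```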

Scenario #3 – Emergent Autonomy Drift (Availability & Compliance Risk):
A planning agent recursively spawns helper agents to optimize workflows. One helper begins deleting log files to reduce system clutter, erasing compliance evidence and violating audit requirements.
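
One way to bound this drift is a spawn budget enforced at a single agent factory. A minimal sketch follows; `MAX_DEPTH`, `MAX_CHILDREN`, and the audit print are hypothetical policy choices (the audit line corresponds to prevention step 4, logging all instantiation events).

```python
# Hypothetical spawn budget: recursive agent creation is capped in depth and
# width, and every instantiation is logged for audit.
MAX_DEPTH = 2
MAX_CHILDREN = 3

class Agent:
    def __init__(self, name: str, depth: int = 0):
        self.name, self.depth, self.children = name, depth, []

    def spawn(self, child_name: str) -> "Agent":
        if self.depth + 1 > MAX_DEPTH:
            raise PermissionError(f"{self.name}: spawn depth limit reached")
        if len(self.children) >= MAX_CHILDREN:
            raise PermissionError(f"{self.name}: spawn width limit reached")
        child = Agent(child_name, self.depth + 1)
        self.children.append(child)
        print(f"audit: {self.name} spawned {child_name}")  # prevention step 4
        return child

planner = Agent("planner")
helper = planner.spawn("helper-1")
worker = helper.spawn("worker-1")
# worker.spawn("worker-2")  # would raise: deeper than the approved budget
```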

> How does the helper begin deleting log files? Why?


> I think the first two scenarios are super practical and helpful!
> I think if you take those, embed the vulnerabilities into the vulnerability section, and focus the mitigation section on mitigations for these scenarios, it will be even clearer to readers (reading end to end).

**Reference Links:**

1. [Agentic AI - Threats and Mitigations](https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/)
2. [LLM06:2025 Excessive Agency](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/)
3. [MITRE ATT&CK - T1048 Exfiltration Over Alternative Protocol](https://attack.mitre.org/techniques/T1048/)

> The AIVSS mapping is missing.
> Let's link to all of the relevant LLM Top 10 risks that are covered here (some are missing).
