Agent Sandbox Security Overview

Locked down by construction

Every agent Triage Factory runs lives inside a disposable gVisor sandbox with no usable credentials, no host access, and no way out.

gVisor runsc · structurally enforced

01 The boundary

What the agent can't reach.

A hard line separates the trusted host (Triage Factory server) from the sandbox (the agent process). The host holds every real secret; the sandbox holds only placeholders. The only paths across the wall are per-run gateways that translate the agent's requests into authenticated calls on the trusted side.

trusted side

Host (TF server)

decides what to delegate · holds every real secret

ANTHROPIC_API_KEY · live

GitHub App private key

Vault decryption · pgsodium

DB credentials · authenticator pool

Session signing material

LLM gateway

Git gateway

untrusted side

Sandbox (the agent)

runs arbitrary AI tool calls · presumed hostile

ANTHROPIC_BASE_URL=http://10.42.N.1:<port>

ANTHROPIC_API_KEY=sk-ant-<per-run token>

http.proxy=http://10.42.N.1:<port>

GITHUB_TOKEN (absent)

DATABASE_URL (absent)

$ANYTHING from host environ (absent)

02 Layered containment

Six layers. One primary control.

If an agent is fully compromised (malicious dependency, prompt injection, RCE), what stops it? gVisor is the primary defense, and five more layers stack underneath it. Each catches a class of failure the layers above might miss, so the failure of any one layer isn't a breach.

gVisor (runsc)

user-space kernel · the actual wall

Normal programs talk directly to the host's Linux kernel. Under runsc, the agent's syscalls are intercepted and serviced from a user-space kernel (gVisor's Sentry). A kernel-level exploit in the agent's process has no real kernel to attack — Sentry stands in the way.

Every layer below this one exists in case gVisor itself is cracked.

Controls runtime: runsc · platform: systrap · network: sandbox netns · OCI spec at internal/sandbox/spec.go

Non-root + zero capabilities

backup · escalation containment

The agent runs as uid 10000 (non-root) with every Linux capability removed from all four capability sets. The NoNewPrivileges flag is set so even a setuid program can't grant new privileges. If the agent ever found itself in a position to escalate inside the host's namespace, there's no privilege to escalate to.

Controls process.user.uid = 10000 · all capability sets empty · NoNewPrivileges = true

Read-only system files

backup · no persistence across runs

The system files inside the sandbox are read-only. The only writable areas are the agent's per-run worktree (its task workspace) and per-run scratch space (/tmp, /dev). Both are thrown away at the end of the run. A subverted agent cannot plant a backdoor for the next run that uses the same shared base image.

Controls read-only rootfs · writable mounts limited to bind-mounted /work + per-run tmpfs

Sentry syscall confinement

backup · limits the Sentry's reach to the host

The agent's syscalls are serviced by gVisor's user-space kernel (the Sentry, layer 1), so they never reach the host kernel directly. The Sentry itself needs only a small set of real host syscalls to do its job — and gVisor pins it there with its own seccomp-bpf filter. Even a fully compromised Sentry is confined to that minimal host surface.

This is gVisor's built-in filter on the Sentry — not a workload allowlist applied to the agent. Under gVisor the Sentry is the syscall boundary, so an OCI container seccomp profile on the agent would be redundant, and runsc does not apply one.

Controls gVisor-internal Sentry seccomp · agent syscalls serviced in user space — see gVisor security model

Masked /proc and /sys

backup · no kernel introspection

Sensitive kernel and hardware interfaces are hidden or made read-only inside the sandbox so a subverted agent can't snoop on the host even via innocent filesystem reads.

Controls masked: /proc/kcore · /proc/keys · /sys/firmware · read-only: /proc/sys · /proc/sysrq-trigger

Resource caps + namespace isolation

backup · no host exhaustion · no neighbour visibility

Per-run limits cap file handles and process count so a runaway process can't exhaust the host. Each run lives in its own PID, mount, UTS, IPC, and network namespaces, so it can't see or signal other processes or other concurrent runs.

Controls RLIMIT_NOFILE = 1024 · RLIMIT_NPROC = 512 · separate namespaces per run · per-run netns
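Several of the controls listed above correspond to fields in an OCI runtime config. The following is a minimal sketch of that mapping, not the contents of internal/sandbox/spec.go. The struct types are local stand-ins for the OCI runtime-spec JSON shape, the root path "rootfs" is an assumed conventional value, and setting hard and soft limits to the same number is an assumption. Capability sets, masked paths, and namespaces are left out to keep it short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins for a handful of OCI runtime-spec fields.
// Only fields named on this page are modeled.
type User struct {
	UID uint32 `json:"uid"`
}

type Rlimit struct {
	Type string `json:"type"`
	Hard uint64 `json:"hard"`
	Soft uint64 `json:"soft"`
}

type Process struct {
	User            User     `json:"user"`
	NoNewPrivileges bool     `json:"noNewPrivileges"`
	Rlimits         []Rlimit `json:"rlimits"`
}

type Root struct {
	Path     string `json:"path"`
	Readonly bool   `json:"readonly"`
}

type Spec struct {
	Process Process `json:"process"`
	Root    Root    `json:"root"`
}

// hardenedSpec returns the subset of the config described by the
// "Controls" lines: non-root uid, NoNewPrivileges, rlimits, read-only rootfs.
func hardenedSpec() Spec {
	return Spec{
		Process: Process{
			User:            User{UID: 10000},
			NoNewPrivileges: true,
			Rlimits: []Rlimit{
				// Assumption: hard and soft limits are equal.
				{Type: "RLIMIT_NOFILE", Hard: 1024, Soft: 1024},
				{Type: "RLIMIT_NPROC", Hard: 512, Soft: 512},
			},
		},
		// Assumption: "rootfs" is the bundle-relative root path.
		Root: Root{Path: "rootfs", Readonly: true},
	}
}

func main() {
	out, err := json.MarshalIndent(hardenedSpec(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping these values in one constructor makes it straightforward to assert them in a unit test, so a change that loosens a limit shows up as a test failure.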

03 Property B

The agent holds no usable secret.

What happens if the AI agent is hijacked and tries to exfiltrate credentials? The answer is simple: there's nothing to steal. The sandbox's environment is built from scratch with placeholder values, and the real keys live only on the trusted host.

Trusted host process

credentials decrypted from vault · never written to disk

ANTHROPIC_API_KEY sk-ant-…<real, signed-by-anthropic>

GitHub App PEM -----BEGIN RSA PRIVATE KEY-----…

Installation token (1h TTL) ghs_…<minted per run, revoked at end>

Sandboxed agent process

environ constructed from scratch · no parent inheritance

ANTHROPIC_API_KEY sk-ant-<32-byte random hex> — per-run capability, dies with the run

ANTHROPIC_BASE_URL http://10.42.<N>.1:<port>

GITHUB_TOKEN (absent — git gateway mints + injects at the wall)

Pull any /proc/<agent-pid>/environ from a live run; the only credential-shaped value is a per-run capability token. Sandboxes contain no real provider key, no GitHub token, no DB password, and no cloud credentials.
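One way to picture "constructed from scratch, no parent inheritance" is the sketch below. It is illustrative only: the function names are hypothetical, port 8080 is a made-up value, and it assumes "32-byte random hex" means 32 random bytes rendered as 64 hex characters. The key point is that the environment starts from an empty slice and os.Environ is never consulted.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newRunToken returns a placeholder key: "sk-ant-" followed by
// 32 random bytes encoded as hex (assumed token format).
func newRunToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "sk-ant-" + hex.EncodeToString(buf), nil
}

// sandboxEnv builds the agent's environment from nothing.
// It does not read the host environment, so no host variable can leak in.
func sandboxEnv(token, gatewayIP string, port int) []string {
	return []string{
		"ANTHROPIC_API_KEY=" + token,
		fmt.Sprintf("ANTHROPIC_BASE_URL=http://%s:%d", gatewayIP, port),
	}
}

func main() {
	token, err := newRunToken()
	if err != nil {
		panic(err)
	}
	// 10.42.7.1 matches the example gateway used on this page; 8080 is hypothetical.
	for _, kv := range sandboxEnv(token, "10.42.7.1", 8080) {
		fmt.Println(kv)
	}
}
```

Because the list is an explicit allowlist rather than a filtered copy of the host environment, adding a new secret to the host cannot expose it to the agent by accident.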

04 Multi-tenant network

No shared L2 segment.

Most multi-tenant container hosts (the Docker default) put every container on a shared Linux bridge, leaving ARP spoofing and broadcast snooping as tenant-against-tenant attacks unless policies are added on top. We don't do that.

what we don't do

Docker default · shared bridge

host kernel

docker0 bridge · shared L2

ARP spoof / broadcast snoop

run A 172.17.0.2

run B 172.17.0.3

Every container hangs off docker0. A subverted run A is on the same L2 segment as B and can ARP-spoof, MAC-flood, or broadcast-snoop B's traffic unless network policies are added. Isolation depends on trusting the bridge's MAC learning table.

what we do

Per-run veth · point-to-point

host kernel
NAT MASQUERADE

no bridge · no L2 path A → B

10.42.7.0/24

10.42.8.0/24

run A 10.42.7.2

run B 10.42.8.2

Each run is a point-to-point veth from the host to one sandbox, with its own /24 in a private 10.42.0.0/16. No bridge exists. Concurrent runs sit on different L3 subnets with no L2 path between them. Cross-tenant snooping isn't "blocked by policy" — it's structurally impossible at the link layer.
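The addressing scheme can be sketched as a small function from a run index N to its subnet, gateway, and sandbox address. This assumes the host end of the veth is 10.42.N.1 (as the gateway URLs on this page suggest) and the sandbox end is 10.42.N.2 (as in the diagram). How a run is assigned its N is not stated on the page, so the uint8 index is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"net/netip"
)

// runNet holds the per-run addressing inside 10.42.0.0/16.
type runNet struct {
	Subnet  netip.Prefix // 10.42.N.0/24
	Gateway netip.Addr   // host end of the veth, 10.42.N.1
	Sandbox netip.Addr   // sandbox end of the veth, 10.42.N.2
}

// netForRun derives the addresses for run index n.
func netForRun(n uint8) runNet {
	return runNet{
		Subnet:  netip.PrefixFrom(netip.AddrFrom4([4]byte{10, 42, n, 0}), 24),
		Gateway: netip.AddrFrom4([4]byte{10, 42, n, 1}),
		Sandbox: netip.AddrFrom4([4]byte{10, 42, n, 2}),
	}
}

func main() {
	a, b := netForRun(7), netForRun(8)
	fmt.Println("run A:", a.Subnet, a.Gateway, a.Sandbox)
	fmt.Println("run B:", b.Subnet, b.Gateway, b.Sandbox)
	// Distinct /24s never overlap, so the two runs share no subnet.
	fmt.Println("subnets overlap:", a.Subnet.Overlaps(b.Subnet))
}
```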

A leaked proxy URL is structurally useless

A natural follow-up: the agent's env exposes ANTHROPIC_BASE_URL=http://10.42.N.1:<port> and a similar http.proxy= for git. Doesn't that leak the credential-broker location? It does — and it doesn't matter. Three layers stand between a leaked URL and any usable access.

1. No route from outside the host. 10.42.0.0/16 is a host-private CIDR. No remote attacker and no process outside the host has a route to it.
2. No route from a neighbour sandbox. The per-sandbox egress allowlist drops any packet from sandbox N destined for 10.42.M.1 where M ≠ N. A malicious neighbour with the URL still cannot reach a sibling's gateway over the shared host namespace.
3. No accepted request without the matching token. Even if a packet arrived at the legitimate gateway, the gateway only accepts requests bearing that one run's per-run token (validated by constant-time compare). A replayed URL with the wrong (or no) token gets a 401, and the proxy never forwards.

05 Threat model

Common patterns we defend against.

Not exhaustive: these are the most common attack patterns a platform running delegated agents has to handle. For each, we list the primary control that stops it, plus any backup that catches what the primary might miss.

T1 · Credential theft

Read & exfiltrate your keys.

Agent dumps its env, scans memory, posts an ANTHROPIC_API_KEY (or GitHub token, or DB password) to an attacker URL.

primary Property B — the agent's env contains only placeholders + gateway URLs.

T2 · Credential misuse during a run

Abuse the brokered access.

Agent uses its gateway access to make excessive Claude calls, switch to expensive models, hammer GitHub — or, on a multi-tenant host, reach across to spend a sibling tenant's credentials.

primary Cross-tenant reach is blocked at the network layer (per-sandbox egress allowlist) and the gateway-auth layer (per-run token) — both pinned by an integration test against live runsc.

in flight Own-tenant excess — quotas, allowed-models, audit logging at the gateway.

backup Per-run wall-clock limit caps total damage window.

T3 · Code execution inside the agent

RCE in the AI tooling.

Malicious npm dep, prompt injection, or AI-SDK bug achieves arbitrary code execution in the agent process.

primary gVisor — agent's syscalls never reach the host kernel.

backup Zero capabilities, non-root, read-only system, masked /proc.

T4 · Breakout to the host

Escape the sandbox.

A gVisor vulnerability lets the agent reach the real host kernel and pivot.

primary gVisor confines its own Sentry to a minimal host-syscall set (gVisor's seccomp-bpf), so even a cracked sandbox kernel has a tiny host surface to pivot from.

backup Non-root uid 10000 + zero caps + NoNewPrivileges — an escaped process has no privileges to escalate to.

06 The evidence

What it actually does.

Each claim on this page is anchored to something you can run against the binary: a process dump, an iptables rule, an integration test. Four artifacts pin the boundary in place.

finding 01

Every delegation runs in its own gVisor sandbox

Each run gets a dedicated runsc process with its own network namespace and per-run veth. While the run is active the netns, the veth, and the iptables MASQUERADE rule are all observable from the host.

process tree

runsc --platform=systrap --network=sandbox run --bundle <dir> tf-<runid>

finding 02

The agent only ever sees a per-run capability

Dump the env of any live agent process: the only credential-shaped value is a 32-byte random per-run token, accepted only by that run's own gateway. No real provider key, no GitHub token, no DB password, no cloud credential.

/proc/<pid>/environ

ANTHROPIC_API_KEY=sk-ant-9f4a...b7e2 # per-run, dies with the run
ANTHROPIC_BASE_URL=http://10.42.7.1:<port>
(no other credential-shaped env vars present)

finding 03

The workspace boundary is enforced at the runtime

When the agent asks for a host directory outside its allowed workspace, the sandbox denies it. The boundary isn't a documentation claim — it's a runtime gate that returns an error to the agent.

agent-side error

may only list files in the allowed working directories

finding 04

Cross-tenant egress drops at the host kernel

From inside a live runsc sandbox, probe a sibling run's gateway IP. Every packet to the sibling gateway gets dropped at the host-side veth-ingress filter; the run's own gateway stays reachable. This is the specific case where ordinary in-namespace filtering can't help — gVisor's user-space netstack bypasses in-netns netfilter hooks — and the host-side filter catches the packets where they physically arrive on the veth. The regression test for this path runs against live runsc in the integration suite.

live runsc probe

sandbox HostIP=10.42.0.1 sibling=10.42.250.1
SIBLING=BLOCKED # cross-tenant reach to a sibling proxy IP is dropped
OWNGW=REACHABLE # the run's own gateway (legit proxy hop) still works
--- PASS ---
