Locked down by construction
Every agent Triage Factory runs lives inside a disposable gVisor sandbox with no usable credentials, no host access, and no way out.
What the agent can't reach.
A hard line separates the trusted host (Triage Factory server) from the sandbox (the agent process). The host holds every real secret; the sandbox holds only placeholders. The only paths across the wall are per-run gateways that translate the agent's requests into authenticated calls on the trusted side.
Host (TF server)
Sandbox (the agent)
Six layers. One primary control.
If an agent is fully compromised (malicious dependency, prompt injection, RCE), what stops it? While gVisor is our primary defense, five layers stack underneath. Each catches the class of failure the layers above might miss, so the failure of any one isn't a breach.
gVisor (runsc)
user-space kernel · the actual wall
Normal programs talk directly to the host's Linux kernel. Under runsc, the agent's syscalls are intercepted and serviced by a user-space kernel (gVisor's Sentry). A kernel-level exploit in the agent's process has no real kernel to attack: Sentry stands in the way.
Every layer below this one defends against gVisor cracking.
internal/sandbox/spec.go
Non-root + zero capabilities
backup · escalation containment
The agent runs as uid 10000 (non-root) with every Linux capability removed from all four capability sets. The NoNewPrivileges flag is set so even a setuid program can't grant new privileges. If the agent ever found itself in a position to escalate inside the host's namespace, there's no privilege to escalate to.
Read-only system files
backup · no persistence across runs
The system files inside the sandbox are read-only. The only writable areas are the agent's per-run worktree (its task workspace) and per-run scratch space (/tmp, /dev). Both are thrown away at the end of the run. A subverted agent cannot plant a backdoor for the next run that uses the same shared base image.
/work + per-run tmpfs
Seccomp: catastrophic-syscall backstop
backup · activates only if gVisor is breached
Under normal operation the agent's syscalls never reach the host kernel; gVisor handles them in user space (layer 1). A seccomp filter is installed as a defense-in-depth backstop: if a gVisor vulnerability ever let a syscall slip through to the host, the filter denies the catastrophic ones outright, including mount, kexec_load, init_module, reboot, and swapon.
The filter is a permissive baseline (Docker's default profile minus the dangerous syscalls — ~346 allowed, ~10 explicitly denied). Its job is catching catastrophic syscalls if gVisor cracks, not enforcing a tight allowlist.
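To make the "permissive baseline minus the dangerous syscalls" idea concrete, here is a toy model of the decision logic in Go. It is not a real seccomp-BPF program and not the contents of internal/sandbox/syscalls.go: the `verdict` function, the `Action` type, and the three-entry sample baseline are hypothetical, and only the five denied syscalls named above are taken from this page.

```go
package main

import "fmt"

// Action is the verdict the toy filter returns for one syscall name.
type Action string

const (
	Allow Action = "allow"
	Deny  Action = "deny"
)

// catastrophic holds the syscalls this page names as denied outright.
// The real profile denies about ten; only these five are listed in the text.
var catastrophic = map[string]bool{
	"mount":       true,
	"kexec_load":  true,
	"init_module": true,
	"reboot":      true,
	"swapon":      true,
}

// verdict models "baseline minus dangerous": a syscall passes only if it is
// in the baseline and not in the catastrophic set. The explicit deny wins
// even when the baseline would have allowed the call.
func verdict(name string, baseline map[string]bool) Action {
	if catastrophic[name] {
		return Deny
	}
	if baseline[name] {
		return Allow
	}
	return Deny
}

func main() {
	// Hypothetical baseline; the real one has roughly 346 entries.
	baseline := map[string]bool{"read": true, "write": true, "mount": true}
	for _, name := range []string{"read", "mount", "reboot"} {
		fmt.Println(name, verdict(name, baseline))
	}
}
```

The point of the ordering is that the deny set is checked first, so a baseline entry can never re-enable a catastrophic syscall.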
internal/sandbox/syscalls.go
Masked /proc and /sys
backup · no kernel introspection
Sensitive kernel and hardware interfaces are hidden or made read-only inside the sandbox so a subverted agent can't snoop on the host even via innocent filesystem reads.
/proc/kcore · /proc/keys · /sys/firmware · read-only: /proc/sys · /proc/sysrq-trigger
Resource caps + namespace isolation
backup · no host exhaustion · no neighbour visibility
Per-run limits cap file handles and process count so a runaway can't exhaust the host. Each run lives in its own process, mount, UTS, IPC, and network namespaces, so it can't see or signal other processes or other concurrent runs.
The agent holds no usable secret.
What happens if the AI agent is hijacked and tries to exfiltrate credentials? The answer is simple — there's nothing to steal. The sandbox's environment is built from scratch with placeholder values and the real keys live only on the trusted host.
Trusted host process
Sandboxed agent process
/proc/<agent-pid>/environ from a live run; the only credential-shaped value is a per-run capability token. Sandboxes contain no real provider key, no GitHub token, no DB password, and no cloud creds.
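One way to picture "built from scratch" is a function that assembles the agent's environment from an explicit list and never consults the host's own environment. The sketch below is an illustration under assumptions, not the Triage Factory implementation: `buildSandboxEnv`, the `TF_RUN_TOKEN` variable name, the `placeholder` value, and the `PATH` and `HOME` entries are all hypothetical. Only `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY` are names that appear on this page.

```go
package main

import (
	"fmt"
	"sort"
)

// buildSandboxEnv returns the complete environment for the agent process.
// It deliberately never calls os.Environ, so nothing on the host can leak
// into the sandbox by inheritance.
func buildSandboxEnv(gatewayURL, runToken string) []string {
	vars := map[string]string{
		"PATH":               "/usr/local/bin:/usr/bin:/bin", // assumed value
		"HOME":               "/work",                        // assumed value
		"ANTHROPIC_BASE_URL": gatewayURL,                     // per-run gateway on the host side
		"ANTHROPIC_API_KEY":  "placeholder",                  // never the real key
		"TF_RUN_TOKEN":       runToken,                       // hypothetical variable name
	}
	env := make([]string, 0, len(vars))
	for k, v := range vars {
		env = append(env, k+"="+v)
	}
	sort.Strings(env) // stable order makes dumps easy to compare
	return env
}

func main() {
	for _, kv := range buildSandboxEnv("http://10.42.7.1:8080", "example-token") {
		fmt.Println(kv)
	}
}
```

Because the list is closed, adding a real secret would require an explicit code change rather than an accidental export on the host.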
No shared L2 segment.
Most multi-tenant container hosts (the Docker default) put every container on a shared Linux bridge, leaving ARP spoofing and broadcast snooping as tenant-against-tenant attacks unless policies are added on top. We don't do that.
Docker default · shared bridge
Every container hangs off docker0. A subverted run A is on the same L2 segment as B and can ARP-spoof, MAC-flood, or broadcast-snoop B's traffic unless network policies are added. Isolation depends on trusting the bridge's MAC learning table.
Per-run veth · point-to-point
NAT MASQUERADE
Each run is a point-to-point veth from the host to one sandbox, with its own /24 in a private 10.42.0.0/16. No bridge exists. Concurrent runs sit on different L3 subnets with no L2 path between them. Cross-tenant snooping isn't "blocked by policy" — it's structurally impossible at the link layer.
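The addressing scheme can be sketched in a few lines of Go. This assumes run index N maps directly to the third octet, which is consistent with the 10.42.N.1 gateway address shown elsewhere on this page; the `runSubnet` function is hypothetical, and how the real allocator picks N is not described here.

```go
package main

import (
	"fmt"
	"net/netip"
)

// runSubnet returns the /24 for run index n and the host-side gateway
// address inside it. Mapping n to the third octet is an assumption.
func runSubnet(n uint8) (netip.Prefix, netip.Addr) {
	prefix := netip.PrefixFrom(netip.AddrFrom4([4]byte{10, 42, n, 0}), 24)
	gateway := netip.AddrFrom4([4]byte{10, 42, n, 1})
	return prefix, gateway
}

func main() {
	a, gwA := runSubnet(7)
	b, gwB := runSubnet(8)
	fmt.Println(a, gwA)
	fmt.Println(b, gwB)
	// Two runs never share a subnet, so neither gateway is on-link for the other.
	fmt.Println("overlap:", a.Overlaps(b))
}
```

Since each /24 holds a single point-to-point link, there is no broadcast domain for a second tenant to join.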
A leaked proxy URL is structurally useless
A natural follow-up: the agent's env exposes ANTHROPIC_BASE_URL=http://10.42.N.1:<port> and a similar http.proxy= for git. Doesn't that leak the credential-broker location? It does — and it doesn't matter. Three layers stand between a leaked URL and any usable access.
- 1. No route from outside the host. 10.42.0.0/16 is a host-private CIDR. No remote attacker and no process outside the host has a route to it.
- 2. No route from a neighbour sandbox. The per-sandbox egress allowlist drops any packet from sandbox N destined for 10.42.M.1 where M ≠ N. A malicious neighbour with the URL still cannot reach a sibling's gateway over the shared host namespace.
- 3. No accepted request without the matching token. Even if a packet arrived at the legitimate gateway, the gateway only accepts requests bearing that one run's per-run token (validated by constant-time compare). A replayed URL with the wrong (or no) token gets a 401, and the proxy never forwards.
Common patterns we defend against.
Not exhaustive — these are the most common attack patterns a delegated agent has to handle. For each: the primary control that stops it, plus a backup that catches what the primary might miss.
Read & exfiltrate your keys.
Agent dumps its env, scans memory, posts an ANTHROPIC_API_KEY (or GitHub token, or DB password) to an attacker URL.
Abuse the brokered access.
Agent uses its gateway access to make excessive Claude calls, switch to expensive models, hammer GitHub — or, on a multi-tenant host, reach across to spend a sibling tenant's credentials.
RCE in the AI tooling.
Malicious npm dep, prompt injection, or AI-SDK bug achieves arbitrary code execution in the agent process.
Escape the sandbox.
A gVisor vulnerability lets the agent reach the real host kernel and pivot.
What it actually does.
Each claim on this page is anchored to something you can run against the binary — a process dump, an iptables rule, an integration test. Four artifacts that pin the boundary in place.
Every delegation runs in its own gVisor sandbox
Each run gets a dedicated runsc process with its own network namespace and per-run veth. While the run is active the netns, the veth, and the iptables MASQUERADE rule are all observable from the host.
The agent only ever sees a per-run capability
Dump the env of any live agent process: the only credential-shaped value is a 32-byte random per-run token, accepted only by that run's own gateway. No real provider key, no GitHub token, no DB password, no cloud credential.
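To check this yourself, note that /proc/<pid>/environ is a flat byte string of NUL-separated KEY=value entries. The Go sketch below parses that format; `parseEnviron` is a hypothetical helper and the sample bytes in `main` are made up for illustration (including the `TF_RUN_TOKEN` name), not captured from a real run.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// parseEnviron splits the NUL-separated contents of /proc/<pid>/environ
// into a map. Entries without "=" and empty entries are skipped.
func parseEnviron(raw []byte) map[string]string {
	env := make(map[string]string)
	for _, entry := range bytes.Split(raw, []byte{0}) {
		if len(entry) == 0 {
			continue
		}
		k, v, ok := bytes.Cut(entry, []byte("="))
		if !ok {
			continue
		}
		env[string(k)] = string(v)
	}
	return env
}

func main() {
	// Made-up sample; on a host you would read the file with os.ReadFile.
	raw := []byte("PATH=/usr/bin\x00ANTHROPIC_API_KEY=placeholder\x00TF_RUN_TOKEN=example\x00")
	env := parseEnviron(raw)

	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```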
The workspace boundary is enforced at the runtime
When the agent asks for a host directory outside its allowed workspace, the sandbox denies it. The boundary isn't a documentation claim — it's a runtime gate that returns an error to the agent.
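A runtime gate of this kind boils down to a containment check that returns an error instead of a path. The Go sketch below shows a purely lexical version under stated assumptions: `resolveInWorkspace` and `errOutsideWorkspace` are hypothetical names, the /work root is taken from the layer description above, and symlinks are not resolved, which a real enforcement point would also have to handle.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

var errOutsideWorkspace = errors.New("path is outside the allowed workspace")

// resolveInWorkspace returns the cleaned path if requested lies inside root,
// and an error otherwise. Relative paths are interpreted relative to root.
// This is a lexical check only: it does not follow symlinks.
func resolveInWorkspace(root, requested string) (string, error) {
	root = filepath.Clean(root)
	p := requested
	if !filepath.IsAbs(p) {
		p = filepath.Join(root, p)
	}
	p = filepath.Clean(p)

	rel, err := filepath.Rel(root, p)
	if err != nil {
		return "", errOutsideWorkspace
	}
	// A relative path that climbs out of root starts with "..".
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errOutsideWorkspace
	}
	return p, nil
}

func main() {
	for _, req := range []string{"src/main.go", "../etc/passwd", "/workspace2/x"} {
		p, err := resolveInWorkspace("/work", req)
		fmt.Printf("%q -> %q, err=%v\n", req, p, err)
	}
}
```

Comparing via `filepath.Rel` rather than a string prefix avoids treating a sibling such as /workspace2 as if it were inside /work.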
Cross-tenant egress drops at the host kernel
From inside a live runsc sandbox, probe a sibling run's gateway IP. Every packet to the sibling gateway gets dropped at the host-side veth-ingress filter; the run's own gateway stays reachable. This is the specific case where ordinary in-namespace filtering can't help — gVisor's user-space netstack bypasses in-netns netfilter hooks — and the host-side filter catches the packets where they physically arrive on the veth. The regression test for this path runs against live runsc in the integration suite.