Sandbox: deep dive

How @agentskit/sandbox executes untrusted code — backends, policies, audit trail, integration with the agent runtime.

@agentskit/sandbox is the primitive layer underneath the mandatory-sandbox policy. The policy decides which tools an agent can call; the sandbox decides where the code behind those calls actually runs. This page covers the full surface.

#When you need it

| Scenario | Sandbox |
| --- | --- |
| Agent emits arbitrary JS / Python | Required |
| Agent runs shell against user-supplied commands | Required |
| Agent reads / writes files in user-controlled paths | Strongly recommended |
| Agent calls a fixed set of HTTP integrations (Slack, GitHub, etc.) | Not required; those tools are already constrained |

If your agent's tool set is "send Slack message" + "read three blog posts", you don't need a sandbox. If your agent generates code or shells out to anything user-supplied, you do.

#Backends

@agentskit/sandbox is a thin abstraction; backends do the actual isolation. Each backend implements the SandboxBackend interface:

```ts
interface SandboxBackend {
  execute(code: string, options?: ExecuteOptions): Promise<ExecuteResult>
  dispose(): Promise<void>
}

interface ExecuteResult {
  stdout: string
  stderr: string
  exitCode: number
  durationMs: number
}
```

#E2B (default — production-ready)

```ts
import { createSandbox } from '@agentskit/sandbox'

const sandbox = createSandbox({
  apiKey: process.env.E2B_API_KEY!,
  language: 'python',
  timeout: 30_000,        // ms; per call
  memoryLimit: '512mb',
})
```

E2B hosts the VM. Their free tier covers low-volume demos; paid tiers scale. Strict isolation: no host filesystem access, no network unless explicitly enabled.

#WebContainer (browser-only)

This backend runs inside the user's browser via the StackBlitz WebContainer API. It is useful for in-browser playgrounds. Skip it for any agent that processes secrets: the sandbox is only as private as the user's machine.

#Custom backend

```ts
import { createSandbox } from '@agentskit/sandbox'
import type { SandboxBackend } from '@agentskit/sandbox'

const myBackend: SandboxBackend = {
  async execute(code, options) {
    // Hand off to your VM, container, isolate, etc.
    return { stdout: '', stderr: '', exitCode: 0, durationMs: 0 }
  },
  async dispose() {},
}

const sandbox = createSandbox({ backend: myBackend })
```
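One practical use of a custom backend is a fake for unit tests, so agent logic can be exercised without a real VM. The sketch below is an illustration only: `createRecordingBackend` is a hypothetical helper, the interfaces are local copies of the ones shown earlier so the block runs on its own, and the shape of `ExecuteOptions` is an assumption because this page does not spell it out.

```ts
// Local copies of the interfaces from this page, so the sketch is self-contained.
// Assumption: ExecuteOptions is treated as an opaque bag of options.
type ExecuteOptions = Record<string, unknown>

interface ExecuteResult {
  stdout: string
  stderr: string
  exitCode: number
  durationMs: number
}

interface SandboxBackend {
  execute(code: string, options?: ExecuteOptions): Promise<ExecuteResult>
  dispose(): Promise<void>
}

// Hypothetical helper: a backend that runs nothing, records every snippet it
// receives, and returns a canned result.
function createRecordingBackend(canned: Partial<ExecuteResult> = {}) {
  const calls: string[] = []
  let disposed = false

  const backend: SandboxBackend = {
    async execute(code) {
      if (disposed) throw new Error('backend already disposed')
      calls.push(code)
      return { stdout: '', stderr: '', exitCode: 0, durationMs: 0, ...canned }
    },
    async dispose() {
      disposed = true
    },
  }

  return { backend, calls }
}
```

In a test, the returned `backend` would be passed as the `backend` option in place of `myBackend`, and `calls` inspected afterwards to assert on what the agent tried to run.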

#Policy + sandbox together

The two compose:

```ts
import { createSandbox, createMandatorySandbox } from '@agentskit/sandbox'
import { shell, filesystem } from '@agentskit/tools'

const sandbox = createSandbox({ apiKey: process.env.E2B_API_KEY! })

const policy = {
  allow: ['ls', 'cat', 'grep', 'wc', 'python'],
  deny:  ['rm', 'sudo', 'curl', 'wget', 'ssh'],
  requireSandbox: ['shell', 'fs_write'],   // tools that MUST sandbox
  validators: {
    // Return true to accept the call, or a string explaining the rejection
    shell: (args) => args.cmd.length < 256 || 'command too long',
  },
}

const tools = [
  ...createMandatorySandbox(shell(), policy, sandbox),
  ...createMandatorySandbox(filesystem({ basePath: './workspace' }), policy, sandbox),
]
```

The policy gates which invocations reach the sandbox; the sandbox gates what those invocations can actually do. Skip either layer and you've got a hole.
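To make that division of labour concrete, here is a minimal sketch of how a policy of this shape could be evaluated before a call is forwarded to the sandbox. It is not the @agentskit/sandbox implementation: `ShellPolicy`, `Decision`, and `checkShellCall` are hypothetical names, and both the deny-before-allow ordering and the "verb is the first token" rule are assumptions made for illustration.

```ts
// Hypothetical sketch of a policy gate; the library's real logic may differ.
type Validator = (args: { cmd: string }) => boolean | string

interface ShellPolicy {
  allow: string[]
  deny: string[]
  validators?: { shell?: Validator }
}

type Decision = { ok: true } | { ok: false; reason: string }

function checkShellCall(policy: ShellPolicy, cmd: string): Decision {
  // Assumption: the verb is the first whitespace-separated token.
  const verb = cmd.trim().split(/\s+/)[0] ?? ''

  // Assumption: deny wins over allow when a verb appears in both lists.
  if (policy.deny.includes(verb)) {
    return { ok: false, reason: `verb "${verb}" is denied` }
  }
  if (!policy.allow.includes(verb)) {
    return { ok: false, reason: `verb "${verb}" is not in allow` }
  }

  // A validator returns true to accept; a string (or false) rejects the call.
  const verdict = policy.validators?.shell?.({ cmd })
  if (verdict !== undefined && verdict !== true) {
    const reason = typeof verdict === 'string' ? verdict : 'validator rejected the call'
    return { ok: false, reason }
  }
  return { ok: true }
}
```

A first-token check like this one accepts `ls; rm -rf /`, because the verb it sees is `ls`. That gap is the reason the policy alone is not enough and the call still has to run inside the sandbox.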

#Audit trail

Every sandbox call surfaces as a structured event the runtime can log:

```ts
import { createRuntime } from '@agentskit/runtime'
import type { Observer } from '@agentskit/core'

const audit: Observer = {
  name: 'sandbox-audit',
  on(event) {
    // Only log tool calls that were routed through a sandbox
    if (event.type === 'tool:start' && event.metadata?.sandbox) {
      console.log(`[sandbox] ${event.name} → ${event.metadata.backend}`)
    }
  },
}

// `adapter` is your model adapter; `tools` is the sandboxed tool list built above
createRuntime({ adapter, tools, observers: [audit] })
```

For HMAC-signed audit logs (tamper-evident, replayable), pair with createSignedAuditLog from @agentskit/observability.
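For intuition about what "tamper-evident" means here, the sketch below chains HMAC signatures so that editing or reordering any entry invalidates the log. It illustrates the general technique using Node's `node:crypto` module and is not how createSignedAuditLog is implemented; `AuditEntry`, `appendEntry`, and `verifyLog` are hypothetical names.

```ts
import { createHmac } from 'node:crypto'

// Hypothetical entry shape: each entry signs its own fields plus the previous signature.
interface AuditEntry {
  seq: number
  tool: string
  backend: string
  prev: string // signature of the previous entry, '' for the first one
  sig: string
}

function signEntry(secret: string, seq: number, tool: string, backend: string, prev: string): string {
  const payload = JSON.stringify([seq, tool, backend, prev])
  return createHmac('sha256', secret).update(payload).digest('hex')
}

// Returns a new log with one more signed entry appended.
function appendEntry(log: AuditEntry[], secret: string, tool: string, backend: string): AuditEntry[] {
  const prev = log.length > 0 ? log[log.length - 1]!.sig : ''
  const seq = log.length
  const sig = signEntry(secret, seq, tool, backend, prev)
  return [...log, { seq, tool, backend, prev, sig }]
}

// Recomputes every signature in order; any edit, removal, or reorder breaks the chain.
// A production verifier should compare signatures with crypto.timingSafeEqual.
function verifyLog(log: AuditEntry[], secret: string): boolean {
  let prev = ''
  for (let i = 0; i < log.length; i++) {
    const entry = log[i]!
    if (entry.seq !== i || entry.prev !== prev) return false
    if (entry.sig !== signEntry(secret, entry.seq, entry.tool, entry.backend, entry.prev)) return false
    prev = entry.sig
  }
  return true
}
```

Because each signature covers the previous one, an attacker without the secret cannot rewrite an early entry without having to re-sign every entry after it.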

#Failure modes

| Symptom | Cause | Fix |
| --- | --- | --- |
| AK_SANDBOX_PEER_MISSING | E2B SDK not installed | npm install @e2b/code-interpreter |
| AK_SANDBOX_BACKEND_FAILED | Backend init failed (auth, network) | Check apiKey; verify the E2B account has free quota left. |
| AK_SANDBOX_DENIED | Policy denied the call | Add the verb to allow, or change the agent's plan. |
| AK_SANDBOX_INVALID_TOOL | Wrapped tool has no execute fn | The wrap was applied to a malformed ToolDefinition; fix the source tool. |
| Sandbox call timeout | Hit timeout cap | Bump timeout (default 30s) or split the work. |

Errors are typed (SandboxError), so you can match on the error's code for programmatic recovery.
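A minimal sketch of that kind of recovery logic follows. The `SandboxError` class here is a local stand-in so the block runs on its own; in real code you would use the type the library provides (its exact import path is not shown on this page). `Recovery` and `recoveryFor` are hypothetical names, and the chosen actions simply mirror the fixes in the table above.

```ts
// Local stand-in for the library's error type; assumes it exposes a string `code`.
class SandboxError extends Error {
  readonly code: string
  constructor(code: string, message: string) {
    super(message)
    this.name = 'SandboxError'
    this.code = code
  }
}

type Recovery = 'install-peer' | 'check-credentials' | 'replan' | 'fix-tool' | 'rethrow'

// Maps an error to a coarse recovery action; anything unrecognised is rethrown.
function recoveryFor(err: unknown): Recovery {
  if (!(err instanceof SandboxError)) return 'rethrow'
  switch (err.code) {
    case 'AK_SANDBOX_PEER_MISSING':
      return 'install-peer'
    case 'AK_SANDBOX_BACKEND_FAILED':
      return 'check-credentials'
    case 'AK_SANDBOX_DENIED':
      return 'replan' // let the agent pick a different, allowed command
    case 'AK_SANDBOX_INVALID_TOOL':
      return 'fix-tool'
    default:
      return 'rethrow'
  }
}
```

Of these, only AK_SANDBOX_DENIED is something an agent can reasonably recover from at run time by choosing another plan; the rest point at deployment or configuration problems.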

#Cost + latency

A sandboxed call adds roughly 50–500 ms of latency and, per E2B's pricing, costs about 1¢ per minute of VM time. Budget accordingly. The costGuard observer treats sandbox calls like any other observable cost dimension.
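As a rough budgeting aid, the arithmetic can be sketched as below. The constants are the approximate figures quoted above, read as per-call latency and per-minute VM cost, and should be checked against E2B's current pricing; `estimateSandboxBudget` is a hypothetical helper, not part of any package.

```ts
// Assumed figures, taken from the approximate numbers on this page.
const CENTS_PER_VM_MINUTE = 1
const OVERHEAD_MS = { min: 50, max: 500 }

interface SandboxBudget {
  costCents: number
  overheadMsMin: number
  overheadMsMax: number
}

// Estimates VM cost and cumulative latency for a batch of sandboxed calls.
function estimateSandboxBudget(calls: number, avgVmSeconds: number): SandboxBudget {
  const vmMinutes = (calls * avgVmSeconds) / 60
  return {
    costCents: vmMinutes * CENTS_PER_VM_MINUTE,
    overheadMsMin: calls * OVERHEAD_MS.min,
    overheadMsMax: calls * OVERHEAD_MS.max,
  }
}

// 10,000 calls a day at 3 s of VM time each is 500 VM-minutes, about $5.
const daily = estimateSandboxBudget(10_000, 3)
console.log(`~$${(daily.costCents / 100).toFixed(2)} per day in VM time`)
```

The estimate ignores idle VM time between calls and any tier-specific minimums, so treat it as a lower bound.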
