Mandatory tool sandbox

Enforce a sandbox policy across every tool the agent can call — allow-list, deny-list, require-sandbox, per-tool validators.

Powerful tools (shell, filesystem, code execution) shouldn't run unchecked. createMandatorySandbox wraps every ToolDefinition in a policy layer, so a bad agent decision can't bypass the rules.

Four knobs:

allow: explicit allow-list; every tool not on it is denied.
deny: the listed tools are blocked.
requireSandbox: the listed tools (or '*' for every tool) must run inside the shared sandbox tool, regardless of their own execute implementation.
validators: synchronous per-tool argument checks that throw to reject a call.
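To make the interaction between the four knobs concrete, here is a minimal sketch of how a tool name could resolve to a decision. `PolicySketch`, `Decision`, and `decide` are hypothetical names, not exports of @agentskit/sandbox, and the precedence (deny beats allow, a missing allow-list permits everything, `'*'` sandboxes every tool) is an assumption rather than documented behavior.

```typescript
// Hypothetical types mirroring the four knobs; not the library's own definitions.
type Validator = (args: Record<string, unknown>) => void

interface PolicySketch {
  allow?: string[]
  deny?: string[]
  requireSandbox?: string[] | '*'
  validators?: Record<string, Validator> // applied at call time, not in decide()
}

interface Decision {
  allowed: boolean
  mustSandbox: boolean
  reason?: string
}

// Assumed precedence: deny-list first, then allow-list, then requireSandbox.
function decide(toolName: string, policy: PolicySketch): Decision {
  if (policy.deny?.includes(toolName)) {
    return { allowed: false, mustSandbox: false, reason: `${toolName} is on the deny-list` }
  }
  // Assumption: with no allow-list configured, every non-denied tool is allowed.
  if (policy.allow !== undefined && !policy.allow.includes(toolName)) {
    return { allowed: false, mustSandbox: false, reason: `${toolName} is not on the allow-list` }
  }
  // Accept both `requireSandbox: '*'` and a list that contains '*'.
  const required = policy.requireSandbox ?? []
  const list = required === '*' ? ['*'] : required
  return { allowed: true, mustSandbox: list.includes('*') || list.includes(toolName) }
}

// The policy used in the Wire it up example, minus validators and the event hook.
const example: PolicySketch = {
  requireSandbox: ['shell'],
  deny: ['filesystem'],
  allow: ['shell', 'web_search', 'code_execution'],
}

console.log(decide('shell', example)) // { allowed: true, mustSandbox: true }
console.log(decide('filesystem', example).reason) // filesystem is on the deny-list
```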

Wire it up

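```typescript
import { createMandatorySandbox, sandboxTool } from '@agentskit/sandbox'
import { filesystem, shell, webSearch } from '@agentskit/tools'

// `logger`, `basePath`, `adapter`, and `createRuntime` are assumed to be
// defined or imported elsewhere in your app.
const policy = createMandatorySandbox({
  sandbox: sandboxTool(),
  policy: {
    requireSandbox: ['shell'], // shell always runs inside the shared sandbox
    deny: ['filesystem'], // filesystem is blocked outright
    allow: ['shell', 'web_search', 'code_execution'], // any other tool is denied
    validators: {
      // Reject missing or overly long queries before web_search runs.
      web_search: args => {
        if (typeof args.q !== 'string' || args.q.length > 200) {
          throw new Error('web_search requires a query ≤ 200 chars')
        }
      },
    },
    onPolicyEvent: e => logger.info('[policy]', e),
  },
})

// Wrap every tool before handing it to the runtime.
const safeTools = [shell(), webSearch(), filesystem({ basePath })].map(t => policy.wrap(t))

const runtime = createRuntime({ adapter, tools: safeTools })
```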
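Validators are plain synchronous functions, so they can be written and unit-tested on their own. As an illustration, a stricter validator for `shell` might look like the sketch below. It assumes, hypothetically, that the shell tool receives its command line as a string in `args.command`; check the tool's real parameter schema before reusing it. `ALLOWED_BINARIES` and `validateShell` are illustrative names.

```typescript
// Hypothetical: assumes the shell tool's arguments carry the command line in `args.command`.
const ALLOWED_BINARIES = new Set(['ls', 'cat', 'grep'])

function validateShell(args: Record<string, unknown>): void {
  const command = args.command
  if (typeof command !== 'string' || command.trim() === '') {
    throw new Error('shell requires a non-empty command string')
  }
  // Refuse common chaining, piping, redirection, and substitution characters.
  if (/[;&|`$<>\r\n]/.test(command)) {
    throw new Error('shell command may not chain, pipe, redirect, or substitute')
  }
  // The first whitespace-separated token must be an allow-listed binary.
  const binary = command.trim().split(/\s+/)[0]
  if (!ALLOWED_BINARIES.has(binary)) {
    throw new Error(`shell binary "${binary}" is not allowed`)
  }
}
```

It would be registered next to the existing entry, as `validators: { shell: validateShell, web_search: ... }`. Assuming validators also run for sandboxed tools, this narrows what reaches the sandbox rather than replacing it.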

How enforcement works

Denied tools and tools missing from the allow-list: the wrapper replaces execute with a thunk that throws. The runtime surfaces the error to the model instead of running anything.
Require-sandbox tools: the wrapper replaces execute with the sandbox tool's execute, so the original tool's body never runs.
Validators: run synchronously before execution; a validator throws to abort the call.
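The three rules above can be sketched as a single wrapping function. This is a hypothetical illustration of the described behavior, not the library's source: `ToolSketch`, `PolicyDecision`, and `wrapSketch` are invented names, the decision is passed in rather than computed from a policy object, and the sketch assumes the sandbox tool can be called with the original tool's arguments unchanged, which the real sandbox tool may not accept.

```typescript
type Args = Record<string, unknown>

// Hypothetical minimal tool shape; a real ToolDefinition carries more fields.
interface ToolSketch {
  name: string
  execute: (args: Args) => Promise<unknown>
}

// Same shape this page documents for check(tool).
interface PolicyDecision {
  allowed: boolean
  mustSandbox: boolean
  reason?: string
}

function wrapSketch(
  tool: ToolSketch,
  decision: PolicyDecision,
  sandbox: ToolSketch,
  validate?: (args: Args) => void,
): ToolSketch {
  if (!decision.allowed) {
    // Denied or not allow-listed: execute becomes a thunk that always throws.
    const message = decision.reason ?? `${tool.name} is blocked by policy`
    return { ...tool, execute: async () => { throw new Error(message) } }
  }
  return {
    ...tool,
    execute: async (args: Args) => {
      // Synchronous validator runs first; a throw rejects the call before any body runs.
      validate?.(args)
      // Require-sandbox: delegate to the sandbox, so the original body never runs.
      return decision.mustSandbox ? sandbox.execute(args) : tool.execute(args)
    },
  }
}
```

Replacing `execute` rather than checking inside the runtime keeps enforcement attached to the tool object itself, which matches the page's point that a bad agent decision can't route around it.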

Dry-run

check(tool) returns { allowed, mustSandbox, reason? } without wrapping the tool. That makes it useful for CI rules that fail the build when a new tool would be denied, and for admin dashboards that show what the current policy does to each tool.
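A CI rule of that kind could look like the sketch below. Only `check(tool)` and its `{ allowed, mustSandbox, reason? }` result come from this page; `CheckResult`, `deniedTools`, and `assertNoDeniedTools` are hypothetical names.

```typescript
interface CheckResult {
  allowed: boolean
  mustSandbox: boolean
  reason?: string
}

// Collect a human-readable line for every tool the policy would deny.
function deniedTools<T extends { name: string }>(
  tools: T[],
  check: (tool: T) => CheckResult,
): string[] {
  const problems: string[] = []
  for (const tool of tools) {
    const result = check(tool)
    if (!result.allowed) problems.push(`${tool.name}: ${result.reason ?? 'denied by policy'}`)
  }
  return problems
}

// In CI: throw (failing the job) if any tool you intend to ship would be denied.
function assertNoDeniedTools<T extends { name: string }>(
  tools: T[],
  check: (tool: T) => CheckResult,
): void {
  const problems = deniedTools(tools, check)
  if (problems.length > 0) {
    throw new Error(`Policy would deny ${problems.length} tool(s):\n${problems.join('\n')}`)
  }
}
```

With the real policy this would presumably be called as `assertNoDeniedTools(tools, t => policy.check(t))`, assuming `check` is exposed on the object returned by createMandatorySandbox in the same way `wrap` is.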

Pair with

HITL approvals — require a human decision on top of the sandbox for the riskiest ops.
Signed audit log — record every allow/deny/run decision for SOC 2 evidence.
Rate limiting — cap how often any given tool can be invoked per user.
