agentskit.js
Recipes

Deterministic replay

Record a real agent session once, then replay it bit-for-bit in tests and bug repros — no more flaky LLM traces.

LLMs are non-deterministic. That makes debugging a nightmare: a bug that reproduces locally vanishes the next run, and CI flakes when providers return slightly different tokens.

@agentskit/eval/replay fixes this by recording every StreamChunk an adapter produces into a cassette, then letting you replay it through a fake adapter that is bit-for-bit identical — same tokens, same order, same tool calls, zero network.

Install

npm install -D @agentskit/eval

Record once

Wrap any real adapter. Every streamed chunk is captured into a Cassette object you can persist to disk.

record-session.ts
import { createRecordingAdapter, saveCassette } from '@agentskit/eval/replay'
import { createRuntime } from '@agentskit/runtime'
import { anthropic } from '@agentskit/adapters'

const base = anthropic({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4-6' })
const { factory, cassette } = createRecordingAdapter(base, { seed: 'bug-repro-#427' })

const runtime = createRuntime({ adapter: factory })
await runtime.run('Summarize the quarterly report')

await saveCassette('./fixtures/bug-427.cassette.json', cassette)

Replay forever

In tests — or when you're iterating on a fix — swap the real adapter for the cassette. No API keys, no latency, no flake.

bug-427.test.ts
import { createReplayAdapter, loadCassette } from '@agentskit/eval/replay'
import { createRuntime } from '@agentskit/runtime'

const cassette = await loadCassette('./fixtures/bug-427.cassette.json')
const runtime = createRuntime({ adapter: createReplayAdapter(cassette) })

const result = await runtime.run('Summarize the quarterly report')
expect(result.content).toContain('Q3 revenue')

Matching modes

The replay adapter accepts a mode option that controls how incoming requests map to recorded entries.

ModeBehaviorUse when
strict (default)Request fingerprint (messages + context) must match exactlyYour test sends the same input as the recording
sequentialReturns next unused entry regardless of requestRequests include timestamps or volatile metadata
looseMatches by last user message content onlyYou care about the prompt, not the surrounding context
createReplayAdapter(cassette, { mode: 'sequential' })

What goes in a cassette

Cassettes are plain JSON. Commit them to git, diff them in PRs.

{
  "version": 1,
  "seed": "bug-repro-#427",
  "entries": [
    {
      "request": { "messages": [{ "role": "user", "content": "..." }] },
      "chunks": [
        { "type": "text", "content": "Hello" },
        { "type": "tool_call", "toolCall": { "id": "t1", "name": "search", "args": "{...}" } },
        { "type": "done" }
      ]
    }
  ]
}

See also

✎ Edit this page on GitHub·Found a problem? Open an issue →·How to contribute →

On this page