Deterministic replay

Record a real agent session once, then replay it bit-for-bit in tests and bug repros — no more flaky LLM traces.

LLMs are non-deterministic. That makes debugging a nightmare: a bug that reproduces locally vanishes the next run, and CI flakes when providers return slightly different tokens.

@agentskit/eval/replay fixes this by recording every StreamChunk an adapter produces into a cassette, then letting you replay it through a fake adapter that is bit-for-bit identical — same tokens, same order, same tool calls, zero network.

Install

npm install -D @agentskit/eval

Record once

Wrap any real adapter. Every streamed chunk is captured into a Cassette object you can persist to disk.

record-session.ts

import { createRecordingAdapter, saveCassette } from '@agentskit/eval/replay'
import { createRuntime } from '@agentskit/runtime'
import { anthropic } from '@agentskit/adapters'

const base = anthropic({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4-6' })
const { factory, cassette } = createRecordingAdapter(base, { seed: 'bug-repro-#427' })

const runtime = createRuntime({ adapter: factory })
await runtime.run('Summarize the quarterly report')

await saveCassette('./fixtures/bug-427.cassette.json', cassette)

Replay forever

In tests — or when you're iterating on a fix — swap the real adapter for the cassette. No API keys, no latency, no flake.

bug-427.test.ts

import { createReplayAdapter, loadCassette } from '@agentskit/eval/replay'
import { createRuntime } from '@agentskit/runtime'

const cassette = await loadCassette('./fixtures/bug-427.cassette.json')
const runtime = createRuntime({ adapter: createReplayAdapter(cassette) })

const result = await runtime.run('Summarize the quarterly report')
expect(result.content).toContain('Q3 revenue')

Matching modes

The replay adapter accepts a mode option that controls how incoming requests map to recorded entries.

Mode	Behavior	Use when
`strict` (default)	Request fingerprint (messages + context) must match exactly	Your test sends the same input as the recording
`sequential`	Returns next unused entry regardless of request	Requests include timestamps or volatile metadata
`loose`	Matches by last user message content only	You care about the prompt, not the surrounding context

createReplayAdapter(cassette, { mode: 'sequential' })

What goes in a cassette

Cassettes are plain JSON. Commit them to git, diff them in PRs.

{
  "version": 1,
  "seed": "bug-repro-#427",
  "entries": [
    {
      "request": { "messages": [{ "role": "user", "content": "..." }] },
      "chunks": [
        { "type": "text", "content": "Hello" },
        { "type": "tool_call", "toolCall": { "id": "t1", "name": "search", "args": "{...}" } },
        { "type": "done" }
      ]
    }
  ]
}