Deterministic replay
Record LLM responses to a cassette file, then replay them in CI without network calls for fast, deterministic tests.
Non-determinism and API latency make agent tests slow and flaky. createRecordingAdapter captures every response to a cassette file on first run; createReplayAdapter replays those responses in subsequent runs: same output, no network, sub-millisecond per call.
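To make the mechanism concrete, here is a minimal sketch of the record-then-replay pattern. It is not the @agentskit/eval implementation: the `Adapter` type, `recordingAdapter`, `replayAdapter`, and `flakyModel` are hypothetical names, the cassette is held in memory as an array of JSON lines rather than written to disk, and replay is assumed to match calls strictly in recorded order.

```typescript
// Hypothetical adapter shape for this sketch only; not the AgentsKit adapter interface.
type Adapter = (prompt: string) => Promise<string>

// One cassette entry per call, serialized as a single JSON line.
interface CassetteEntry {
  prompt: string
  response: string
}

// Wrap a live adapter and append each exchange to the cassette as a JSON line.
function recordingAdapter(inner: Adapter, cassette: string[]): Adapter {
  return async (prompt) => {
    const response = await inner(prompt)
    const entry: CassetteEntry = { prompt, response }
    cassette.push(JSON.stringify(entry))
    return response
  }
}

// Serve recorded responses in order without ever calling a live model.
function replayAdapter(cassette: readonly string[]): Adapter {
  const entries = cassette.map((line) => JSON.parse(line) as CassetteEntry)
  let cursor = 0
  return async (prompt) => {
    const entry = entries[cursor]
    if (entry === undefined) {
      // Assumption of this sketch: an unrecorded call is an error, not a fallback to the network.
      throw new Error(`cassette exhausted at call ${cursor + 1}`)
    }
    if (entry.prompt !== prompt) {
      throw new Error(`prompt mismatch at call ${cursor + 1}`)
    }
    cursor += 1
    return entry.response
  }
}

// Stand-in for a live model whose output changes from call to call.
function flakyModel(): Adapter {
  let calls = 0
  return async (prompt) => `${prompt} -> answer #${++calls}`
}
```

Recording two calls through `recordingAdapter(flakyModel(), cassette)` and then replaying the same cassette returns the same two strings in the same order, while a third, unrecorded call throws. Failing loudly on a miss is a choice made for this sketch; it keeps a CI run from silently reaching a live API.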
#Record
```ts
import { createRecordingAdapter } from '@agentskit/eval'

// `openai` and `apiKey` are assumed to come from your existing adapter setup.
const rec = createRecordingAdapter({
  inner: openai({ apiKey }),
  cassettePath: '.agentskit/cassettes/triage.jsonl',
})
```

Run your suite once with rec: every call is captured.
#Replay
```ts
import { createReplayAdapter } from '@agentskit/eval'

const replay = createReplayAdapter({
  cassettePath: '.agentskit/cassettes/triage.jsonl',
})
```

Use replay in CI: zero network, deterministic.
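One way to wire the two adapters into a test suite is to choose between them from the environment. The sketch below is an assumption rather than part of @agentskit/eval: `adapterFor`, `AdapterFactories`, and the simplified `Adapter` type are hypothetical, and it relies on the common convention that CI providers set `CI=true`.

```typescript
// Hypothetical adapter shape for this sketch only; not the AgentsKit adapter interface.
type Adapter = (prompt: string) => Promise<string>

// Factories are lazy so the recording adapter (which needs an API key)
// is never constructed when the suite runs in replay mode.
interface AdapterFactories {
  record: () => Adapter
  replay: () => Adapter
}

// Pick replay when the environment looks like CI, otherwise record.
function adapterFor(
  env: Record<string, string | undefined>,
  factories: AdapterFactories,
): Adapter {
  // Assumption: CI is signalled by CI=true (or CI=1), as many CI providers do.
  const inCi = env.CI === 'true' || env.CI === '1'
  return inCi ? factories.replay() : factories.record()
}
```

In a real suite the factories would return the rec and replay adapters built above, and `env` would be `process.env`. Keeping the decision in one helper means individual tests do not need to know which mode they run in.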
#Time travel
```ts
import { createTimeTravelSession } from '@agentskit/eval'

// `cassettePath` and `step` are assumed to be defined by the caller.
const session = createTimeTravelSession({ cassettePath })
session.rewindTo(step)
session.override(step, { output: 'alternate response' })
const forked = session.fork()
```

#Replay against different model
```ts
import { replayAgainst } from '@agentskit/eval'

const diff = await replayAgainst({
  cassettePath,
  adapter: anthropic(...),
})
```

#Related
- Evals: Run eval suites against any async agent function, replay recorded sessions in CI, and track prompt regressions with snapshots.
- Eval suites: Define cases with inputs and assertions, then run them against any async agent function to get pass rates and latency metrics.
- Prompt snapshots + diff: Assert that rendered prompts haven't changed unexpectedly, and trace exactly which edit caused a drift.