Evals
Prompt snapshots + diff
Assert that rendered prompts haven't changed unexpectedly, and trace exactly which edit caused a drift.
Prompts are code: they can regress. matchPromptSnapshot works like Jest snapshots: the first run writes the reference, subsequent runs compare against it. When something drifts, promptDiff and attributePromptChange tell you which change caused it.
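The write-then-compare behaviour described above can be sketched in a few lines of TypeScript. This is only an illustration of the general snapshot pattern, not the implementation inside @agentskit/eval: the helpers `checkSnapshot` and `changedLines`, the `.snap.txt` file suffix, and the `SnapshotResult` type are all hypothetical names chosen for this example, and the line comparison is deliberately naive.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical result type for this sketch (not an @agentskit/eval type).
type SnapshotResult =
  | { status: 'written' }
  | { status: 'match' }
  | { status: 'drift'; changedLines: string[] }

// Hypothetical helper: write the reference on the first run, compare on later runs.
function checkSnapshot(dir: string, name: string, actual: string): SnapshotResult {
  const file = join(dir, `${name}.snap.txt`) // assumed file naming, for illustration only
  if (!existsSync(file)) {
    mkdirSync(dir, { recursive: true })
    writeFileSync(file, actual, 'utf8')
    return { status: 'written' }
  }
  const expected = readFileSync(file, 'utf8')
  if (expected === actual) return { status: 'match' }
  return { status: 'drift', changedLines: changedLines(expected, actual) }
}

// Naive positional comparison: reports lines that differ at the same index.
// A real diff would align lines (for example with an LCS-based algorithm).
function changedLines(expected: string, actual: string): string[] {
  const a = expected.split('\n')
  const b = actual.split('\n')
  const out: string[] = []
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) {
      if (a[i] !== undefined) out.push(`- ${a[i]}`)
      if (b[i] !== undefined) out.push(`+ ${b[i]}`)
    }
  }
  return out
}

// Usage: a fresh temporary directory stands in for the snapshot folder.
const dir = mkdtempSync(join(tmpdir(), 'prompt-snap-'))
const v1 = 'You are a triage assistant.\nClassify the ticket.'
const v2 = 'You are a triage assistant.\nClassify the ticket by severity.'

const first = checkSnapshot(dir, 'triage-v1', v1)  // no reference yet, so it is written
const second = checkSnapshot(dir, 'triage-v1', v1) // identical prompt, so it matches
const third = checkSnapshot(dir, 'triage-v1', v2)  // edited prompt, so it drifts

console.log(first.status, second.status, third.status) // prints: written match drift
```

Returning a tagged result rather than throwing keeps the sketch easy to inspect; a test runner integration would more likely fail the test on drift and print the changed lines.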
matchPromptSnapshot
import { matchPromptSnapshot } from '@agentskit/eval'

await matchPromptSnapshot({
  name: 'triage-v1',
  actual: renderedPrompt,
  mode: 'exact', // | 'normalized' | 'similarity'
  path: '.agentskit/snapshots',
  similarityThreshold: 0.95,
})

promptDiff + attributePromptChange
import { promptDiff, attributePromptChange } from '@agentskit/eval'

const delta = promptDiff(before, after)
const attribution = attributePromptChange(delta, history)

Related
Explore nearby
- Evals
Run eval suites against any async agent function, replay recorded sessions in CI, and track prompt regressions with snapshots.
- Eval suites
Define cases with inputs and assertions, then run them against any async agent function to get pass rates and latency metrics.
- Deterministic replay
Record LLM responses to a cassette file, then replay them in CI without network calls for fast, deterministic tests.