@agentskit/eval
Eval suites + deterministic replay + snapshot testing + prompt diff + CI reporters.
When to reach for it
- You want to score agent quality with numbers, in CI.
- You want deterministic replay (record once, replay forever).
- You want Jest-style prompt snapshots with semantic tolerance.
- You want a "git blame for prompts" — diff + attribution.
Install
npm install -D @agentskit/eval
Hello world
import { runEval } from '@agentskit/eval'

// `runtime` is assumed to be an agent runtime you have already created elsewhere.
const result = await runEval({
  agent: async (input) => (await runtime.run(input)).content,
  suite: {
    name: 'qa',
    cases: [{ input: 'Capital of France?', expected: 'Paris' }],
  },
})
console.log(`${result.passed}/${result.totalCases} passed`)
Surface
- runEval({ agent, suite }).
- /replay: createRecordingAdapter · createReplayAdapter · cassettes · createTimeTravelSession · replayAgainst · summarizeReplay.
- /snapshot: matchPromptSnapshot.
- /diff: promptDiff · attributePromptChange · formatDiff.
- /ci: renderJUnit · renderMarkdown · renderGitHubAnnotations · reportToCi.
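The record-once, replay-forever idea behind the /replay entry points can be illustrated with a small self-contained sketch. It deliberately does not call createRecordingAdapter or createReplayAdapter, whose signatures are not shown on this page. The names Model, Cassette, recordInto and replayFrom are hypothetical, and the model is synchronous only to keep the example short.

```typescript
// Hypothetical sketch of record/replay; this is NOT the @agentskit/eval API.
// A "model" is a plain synchronous function here for brevity.
type Model = (prompt: string) => string

// A cassette stores the response captured for each prompt during recording.
type Cassette = Map<string, string>

// Wrap a live model so that every call is also written to the cassette.
function recordInto(cassette: Cassette, live: Model): Model {
  return (prompt) => {
    const response = live(prompt)
    cassette.set(prompt, response)
    return response
  }
}

// Serve responses from the cassette only; the live model is never called.
function replayFrom(cassette: Cassette): Model {
  return (prompt) => {
    const response = cassette.get(prompt)
    if (response === undefined) {
      throw new Error(`No recorded response for prompt: ${prompt}`)
    }
    return response
  }
}

// Record once against a stand-in "live" model that counts its calls.
let liveCalls = 0
const liveModel: Model = (prompt) => {
  liveCalls += 1
  return `echo: ${prompt}`
}
const cassette: Cassette = new Map()
const recording = recordInto(cassette, liveModel)
const recorded = recording('Capital of France?')

// Replay from the cassette; liveCalls stays at 1.
const replaying = replayFrom(cassette)
const replayed = replaying('Capital of France?')
console.log(recorded === replayed, liveCalls)
```

Throwing on an unrecorded prompt is a design choice in this sketch: a replay that silently fell back to the live model would no longer be deterministic.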
Recipes
- Eval suite
- Deterministic replay
- Time-travel debug
- Replay-different-model
- Prompt snapshots
- Prompt diff
- Evals in CI
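As a rough illustration of gating CI on eval results, the sketch below turns a pass count into an exit code. It assumes only the passed and totalCases fields printed in the Hello world example. EvalSummary, exitCodeFor and the 0.95 threshold are hypothetical and not part of the package.

```typescript
// Assumed shape: only the two fields printed in the Hello world example.
interface EvalSummary {
  passed: number
  totalCases: number
}

// Hypothetical helper, not part of @agentskit/eval: maps a summary to an exit code.
function exitCodeFor(summary: EvalSummary, minPassRate: number): number {
  // Treat an empty suite as a failure so a misconfigured run cannot pass silently.
  if (summary.totalCases === 0) return 1
  const passRate = summary.passed / summary.totalCases
  return passRate >= minPassRate ? 0 : 1
}

const code = exitCodeFor({ passed: 9, totalCases: 10 }, 0.95)
console.log(code) // 1, because 0.9 is below the 0.95 threshold
```

In a CI script the returned value would typically be passed to process.exit so that a low pass rate fails the job.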
Related
Source
npm: @agentskit/eval · repo: packages/eval