@agentskit/eval — for agents
Evaluation harness + deterministic replay + snapshot testing + diff + CI reporters.
Install
npm install @agentskit/eval
Primary exports
- runEval({ agent, suite }): run an EvalSuite against any async agent function.
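To make the shape of that contract concrete, here is a minimal sketch of what running a suite against an async agent function could look like. It does not import the package and is not its implementation: `runSuiteSketch`, the `*Sketch` types, and the "output contains the expected text" pass rule are all assumptions made for illustration. Only the `passed` and `totalCases` result fields come from this page.

```typescript
// Hypothetical shapes for illustration; the real EvalSuite type may differ.
interface EvalCaseSketch { input: string; expected: string }
interface EvalSuiteSketch { name: string; cases: EvalCaseSketch[] }
interface EvalResultSketch { passed: number; totalCases: number }

type AgentFn = (input: string) => Promise<string>

// Runs every case sequentially and counts how many pass.
async function runSuiteSketch(
  agent: AgentFn,
  suite: EvalSuiteSketch,
): Promise<EvalResultSketch> {
  let passed = 0
  for (const testCase of suite.cases) {
    try {
      const output = await agent(testCase.input)
      // Assumption: a case passes when the output contains the expected text.
      if (output.includes(testCase.expected)) passed += 1
    } catch {
      // A thrown error counts as a failed case rather than aborting the run.
    }
  }
  return { passed, totalCases: suite.cases.length }
}

// Stub agent standing in for a real model call.
const stubAgent: AgentFn = async (input) =>
  input.includes('France') ? 'The capital is Paris.' : 'I do not know.'
```

Any function of the form `(input) => Promise<string>` fits this sketch, which is why a stub can stand in for a real runtime during tests.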
Subpaths
| Subpath | Contents |
|---|---|
| @agentskit/eval/replay | createRecordingAdapter, createReplayAdapter, cassettes, createTimeTravelSession, replayAgainst, summarizeReplay. See Deterministic replay, Time travel, Replay-different-model. |
| @agentskit/eval/snapshot | matchPromptSnapshot (exact / normalized / similarity). See Snapshots. |
| @agentskit/eval/diff | promptDiff, attributePromptChange, formatDiff. See Prompt diff. |
| @agentskit/eval/ci | renderJUnit, renderMarkdown, renderGitHubAnnotations, reportToCi. See Evals in CI. |
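The three snapshot modes named in the table (exact, normalized, similarity) can be illustrated with a small standalone sketch. This is not the signature or behavior of matchPromptSnapshot: `matchSnapshotSketch`, the whitespace-and-case normalization, the token-level Jaccard metric, and the 0.8 default threshold are all assumptions chosen to show how the modes might differ in strictness.

```typescript
type SnapshotMode = 'exact' | 'normalized' | 'similarity'

// Assumed normalization: trim, collapse whitespace runs, lowercase.
function normalizeText(text: string): string {
  return text.trim().replace(/\s+/g, ' ').toLowerCase()
}

// Jaccard similarity over whitespace-separated tokens (an assumed metric).
function jaccardSimilarity(a: string, b: string): number {
  const tokensA = new Set(normalizeText(a).split(' ').filter(Boolean))
  const tokensB = new Set(normalizeText(b).split(' ').filter(Boolean))
  if (tokensA.size === 0 && tokensB.size === 0) return 1
  let shared = 0
  tokensA.forEach((token) => {
    if (tokensB.has(token)) shared += 1
  })
  // |A ∩ B| / |A ∪ B|
  return shared / (tokensA.size + tokensB.size - shared)
}

// Hypothetical matcher: returns true when the prompt matches the snapshot
// under the chosen mode. The threshold only applies to 'similarity'.
function matchSnapshotSketch(
  actual: string,
  snapshot: string,
  mode: SnapshotMode,
  threshold = 0.8,
): boolean {
  switch (mode) {
    case 'exact':
      return actual === snapshot
    case 'normalized':
      return normalizeText(actual) === normalizeText(snapshot)
    case 'similarity':
      return jaccardSimilarity(actual, snapshot) >= threshold
  }
}
```

Under these assumptions, exact mode catches any byte-level change, normalized mode ignores formatting noise, and similarity mode tolerates small wording edits up to the threshold.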
Minimal example
import { runEval } from '@agentskit/eval'
const result = await runEval({
agent: async (input) => (await runtime.run(input)).content,
suite: {
name: 'qa',
cases: [{ input: 'Capital of France?', expected: 'Paris' }],
},
})
console.log(`${result.passed}/${result.totalCases}`)
Related
- @agentskit/runtime.
- @agentskit/core/eval-format — portable format spec.