
@agentskit/eval

Eval suites, deterministic replay, snapshot testing, prompt diffing, and CI reporters.

When to reach for it

  • You want to score agent quality with numbers, in CI.
  • You want deterministic replay (record once, replay forever).
  • You want Jest-style prompt snapshots with semantic tolerance.
  • You want a "git blame for prompts": diff plus attribution.

Install

npm install -D @agentskit/eval

Hello world

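import { runEval } from '@agentskit/eval'

// `runtime` is not defined in this snippet; it stands for an agent runtime
// created elsewhere in your app.
const result = await runEval({
  // The agent takes one case input and returns the answer text.
  agent: async (input) => (await runtime.run(input)).content,
  suite: {
    name: 'qa',
    cases: [{ input: 'Capital of France?', expected: 'Paris' }],
  },
})
// Top-level await requires an ES module context.
console.log(`${result.passed}/${result.totalCases} passed`)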
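To turn that summary line into a CI gate, one option is to compare the pass rate against a threshold. The sketch below relies only on the two fields used above (`passed` and `totalCases`). The `EvalSummary` interface and the `passRate` and `meetsThreshold` helpers are hypothetical names introduced here for illustration, not part of @agentskit/eval.

```typescript
// Assumed shape: only the two fields used in the hello-world example.
// The real result object may carry more.
interface EvalSummary {
  passed: number
  totalCases: number
}

// Hypothetical helper: fraction of cases that passed (0 for an empty suite).
function passRate(result: EvalSummary): number {
  return result.totalCases === 0 ? 0 : result.passed / result.totalCases
}

// Hypothetical helper: true when the pass rate reaches the threshold.
function meetsThreshold(result: EvalSummary, threshold: number): boolean {
  return passRate(result) >= threshold
}

// Stand-in for the object returned by runEval.
const summary: EvalSummary = { passed: 9, totalCases: 10 }
console.log(meetsThreshold(summary, 0.8) ? 'gate passed' : 'gate failed')
```

In a CI script you would typically set a non-zero exit code when the gate fails, so the job is marked as failed.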

Surface

  • runEval({ agent, suite }).
  • /replay: createRecordingAdapter · createReplayAdapter · cassettes · createTimeTravelSession · replayAgainst · summarizeReplay.
  • /snapshot: matchPromptSnapshot.
  • /diff: promptDiff · attributePromptChange · formatDiff.
  • /ci: renderJUnit · renderMarkdown · renderGitHubAnnotations · reportToCi.
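The signatures of the /replay exports are not listed here, so the following is only a conceptual sketch of "record once, replay forever", not the @agentskit/eval implementation. It assumes an agent is an async function from string to string (as in the hello-world example) and uses an in-memory `Map` as the cassette. The names `withRecording` and `withReplay` are hypothetical and do not correspond to `createRecordingAdapter` or `createReplayAdapter`.

```typescript
type Agent = (input: string) => Promise<string>

// Assumed cassette shape for this sketch: input -> recorded output.
type Cassette = Map<string, string>

// Wraps a live agent and stores every answer in the cassette.
function withRecording(agent: Agent, cassette: Cassette): Agent {
  return async (input) => {
    const output = await agent(input)
    cassette.set(input, output)
    return output
  }
}

// Serves answers from the cassette only; the live agent is never called.
function withReplay(cassette: Cassette): Agent {
  return async (input) => {
    const recorded = cassette.get(input)
    if (recorded === undefined) {
      throw new Error(`No recorded response for input: ${input}`)
    }
    return recorded
  }
}
```

Because the replaying agent never touches the model or the network, repeated runs over the same cassette return identical outputs, which is what makes the eval deterministic. An input that was never recorded fails loudly instead of silently falling back to a live call.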

Recipes

Source

npm: @agentskit/eval · repo: packages/eval
