
@agentskit/eval — for agents

Evaluation harness with deterministic replay, snapshot testing, prompt diffing, and CI reporters.

Install

npm install @agentskit/eval

Primary exports

  • runEval({ agent, suite }) — run an EvalSuite against any async agent fn.
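To make "run a suite against any async agent fn" concrete, here is a minimal stand-in runner. It is a sketch, not the package's implementation: the type names, the `runSuiteSketch` function, and the pass rule (the output contains the expected string) are all assumptions made for illustration. Only the `{ input, expected }` case shape and the `passed` / `totalCases` fields come from the minimal example on this page.

```typescript
// Hypothetical types: names and fields are inferred from the minimal example,
// not copied from the package's type definitions.
type EvalCase = { input: string; expected: string }
type EvalSuite = { name: string; cases: EvalCase[] }
type EvalSummary = { passed: number; totalCases: number }
type AgentFn = (input: string) => Promise<string>

// Stand-in runner, NOT the library's runEval. Assumption: a case passes when
// the agent's output contains the expected string.
async function runSuiteSketch(agent: AgentFn, suite: EvalSuite): Promise<EvalSummary> {
  let passed = 0
  for (const testCase of suite.cases) {
    try {
      const output = await agent(testCase.input)
      if (output.includes(testCase.expected)) passed += 1
    } catch {
      // A throwing agent counts as a failed case in this sketch.
    }
  }
  return { passed, totalCases: suite.cases.length }
}

// A canned agent stands in for a real model call.
const cannedAgent: AgentFn = async (input) =>
  input === 'Capital of France?' ? 'The capital is Paris.' : 'I do not know.'
```

Because the agent is just an async function from input to output, a canned function like `cannedAgent` can replace a real runtime when exercising the suite itself.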

Subpaths

Subpath | Contents
@agentskit/eval/replay | createRecordingAdapter, createReplayAdapter, cassettes, createTimeTravelSession, replayAgainst, summarizeReplay. See Deterministic replay, Time travel, Replay-different-model.
@agentskit/eval/snapshot | matchPromptSnapshot (exact / normalized / similarity). See Snapshots.
@agentskit/eval/diff | promptDiff, attributePromptChange, formatDiff. See Prompt diff.
@agentskit/eval/ci | renderJUnit, renderMarkdown, renderGitHubAnnotations, reportToCi. See Evals in CI.

Minimal example

import { runEval } from '@agentskit/eval'

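// `runtime` is assumed to be an agent runtime created elsewhere; it is not defined in this snippet.
const result = await runEval({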
  agent: async (input) => (await runtime.run(input)).content,
  suite: {
    name: 'qa',
    cases: [{ input: 'Capital of France?', expected: 'Paris' }],
  },
})

console.log(`${result.passed}/${result.totalCases}`)
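One way to act on that result is to turn the two counts into a pass/fail gate. The helper below is a sketch under stated assumptions: `EvalCounts` and `meetsThreshold` are hypothetical names, and the code relies only on the `passed` and `totalCases` fields shown in the example, not on any other part of the real result object.

```typescript
// Only `passed` and `totalCases` appear in the example above; any other
// field on the real result object is unknown here.
type EvalCounts = { passed: number; totalCases: number }

// Hypothetical helper: true when the pass rate meets the threshold.
function meetsThreshold(result: EvalCounts, minPassRate = 1): boolean {
  if (result.totalCases === 0) return false // an empty suite proves nothing
  return result.passed / result.totalCases >= minPassRate
}

// Stand-in for the value returned by runEval.
const sample: EvalCounts = { passed: 9, totalCases: 10 }
console.log(meetsThreshold(sample))      // false
console.log(meetsThreshold(sample, 0.9)) // true
```

In a Node script, setting `process.exitCode = 1` when the helper returns false would fail the job. The reporters under `@agentskit/eval/ci` may already cover this, so check Evals in CI before hand-rolling a gate.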

Source
