agentskit.js
Evals

Measure agent quality with numbers, not vibes. This page covers eval suites, deterministic replay, prompt snapshots and diffs, and CI reporters.

Suites

  • runEval({ agent, suite }) — run any EvalSuite against any async agent fn. Recipe.
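To make the idea concrete, here is a minimal, self-contained sketch of what running a suite against an async agent function involves. The `EvalCase`, `EvalSuite`, and `EvalResult` shapes and the `runSuite` helper are assumptions written for illustration only; they are not the library's `runEval` or its `EvalSuite` type, whose actual fields are documented in the linked recipe.

```typescript
// Hypothetical shapes for illustration; the library's real types may differ.
type Agent = (input: string) => Promise<string>;

interface EvalCase {
  input: string;
  // Returns true when the agent's output is acceptable for this case.
  check: (output: string) => boolean;
}

interface EvalSuite {
  name: string;
  cases: EvalCase[];
}

interface EvalResult {
  suite: string;
  total: number;
  passed: number;
  accuracy: number;   // passed / total, or 0 for an empty suite
  failures: string[]; // inputs of the cases that failed
}

// Local stand-in for a suite runner (not the library's runEval):
// calls the agent once per case and counts a thrown error as a failure.
async function runSuite(agent: Agent, suite: EvalSuite): Promise<EvalResult> {
  const failures: string[] = [];
  for (const testCase of suite.cases) {
    let ok = false;
    try {
      ok = testCase.check(await agent(testCase.input));
    } catch {
      ok = false;
    }
    if (!ok) failures.push(testCase.input);
  }
  const total = suite.cases.length;
  const passed = total - failures.length;
  return {
    suite: suite.name,
    total,
    passed,
    accuracy: total === 0 ? 0 : passed / total,
    failures,
  };
}

// Toy agent and suite used to exercise the runner.
const toyAgent: Agent = async (input) => (input === "2+2" ? "4" : "I don't know");

const arithmetic: EvalSuite = {
  name: "arithmetic",
  cases: [
    { input: "2+2", check: (out) => out.trim() === "4" },
    { input: "3+3", check: (out) => out.trim() === "6" },
  ],
};
```

Here `runSuite(toyAgent, arithmetic)` resolves to a result with one of two cases passed, so the score is a number that can be tracked over time rather than an impression.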

Deterministic replay

  • createRecordingAdapter + createReplayAdapter — bit-for-bit replay. Recipe.
  • createTimeTravelSession — rewind + override + fork. Recipe.
  • replayAgainst — A/B a recorded cassette against a different model. Recipe.
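The record-then-replay idea behind these primitives can be sketched in a few lines. This is a conceptual illustration under stated assumptions: the `ModelAdapter` type, the `Cassette` map, and the `recording` / `replaying` helpers are hypothetical local names, not the signatures of `createRecordingAdapter` or `createReplayAdapter`, and keying the cassette by the prompt string is a simplification.

```typescript
// Hypothetical adapter shape; real adapters are richer (streaming, tools, metadata).
type ModelAdapter = (prompt: string) => Promise<string>;

// A "cassette" here is simply a map from prompt to the recorded response.
type Cassette = Map<string, string>;

// Wraps a live adapter and stores every prompt/response pair it sees.
function recording(inner: ModelAdapter, cassette: Cassette): ModelAdapter {
  return async (prompt) => {
    const response = await inner(prompt);
    cassette.set(prompt, response);
    return response;
  };
}

// Serves responses from the cassette only; it never calls a live model,
// so a run is repeatable and a missing recording fails loudly.
function replaying(cassette: Cassette): ModelAdapter {
  return async (prompt) => {
    const recorded = cassette.get(prompt);
    if (recorded === undefined) {
      throw new Error(`no recorded response for prompt: ${prompt}`);
    }
    return recorded;
  };
}
```

Throwing on an unrecorded prompt, instead of silently falling back to a live call, is the design choice that keeps a replayed run deterministic.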

Snapshots + diff

  • matchPromptSnapshot — Jest-style with exact / normalized / similarity. Recipe.
  • promptDiff + attributePromptChange — git-blame for prompts. Recipe.
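The three snapshot modes named above can be illustrated with a small comparison function. This is a sketch, not the behavior of `matchPromptSnapshot`: the `matchesSnapshot` helper is hypothetical, "normalized" is assumed to mean whitespace-insensitive, and "similarity" is assumed to be Jaccard overlap of whitespace-separated tokens with a 0.8 default threshold. The library may use different normalization rules and a different similarity metric.

```typescript
type MatchMode = "exact" | "normalized" | "similarity";

// Assumed normalization: trim and collapse all whitespace runs to one space.
function normalize(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

// Assumed similarity metric: Jaccard overlap of the two token sets, in [0, 1].
function jaccard(a: string, b: string): number {
  const tokensA = new Set(normalize(a).split(" ").filter((t) => t.length > 0));
  const tokensB = new Set(normalize(b).split(" ").filter((t) => t.length > 0));
  if (tokensA.size === 0 && tokensB.size === 0) return 1;
  let shared = 0;
  tokensA.forEach((t) => {
    if (tokensB.has(t)) shared += 1;
  });
  return shared / (tokensA.size + tokensB.size - shared);
}

// Hypothetical matcher: true when `actual` is acceptable against `snapshot`.
function matchesSnapshot(
  actual: string,
  snapshot: string,
  mode: MatchMode,
  threshold = 0.8, // only used in "similarity" mode
): boolean {
  switch (mode) {
    case "exact":
      return actual === snapshot;
    case "normalized":
      return normalize(actual) === normalize(snapshot);
    case "similarity":
      return jaccard(actual, snapshot) >= threshold;
  }
}
```

With the snapshot `"You are a helpful assistant."`, a prompt that differs only in line breaks fails in `exact` mode but passes in `normalized` mode, while inserting a single extra word passes only in `similarity` mode at the default threshold.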

CI reporters

  • reportToCi + renderJUnit + renderMarkdown + renderGitHubAnnotations. Recipe.
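For readers unfamiliar with CI report formats, the sketch below shows roughly what rendering eval results as JUnit-style XML looks like. The `CaseResult` shape and the `toJUnitXml` helper are hypothetical and written for illustration; they are not the library's `renderJUnit`, whose input type and exact output may differ.

```typescript
// Hypothetical per-case result; not the library's result type.
interface CaseResult {
  name: string;
  passed: boolean;
  message?: string; // failure detail, if any
}

// Escapes the characters that are unsafe inside XML attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders one <testsuite> element with a <testcase> per result;
// failed cases get a nested <failure> element.
function toJUnitXml(suiteName: string, results: CaseResult[]): string {
  const failures = results.filter((r) => !r.passed).length;
  const cases = results.map((r) => {
    const name = escapeXml(r.name);
    if (r.passed) return `  <testcase name="${name}"/>`;
    const message = escapeXml(r.message ?? "failed");
    return `  <testcase name="${name}">\n    <failure message="${message}"/>\n  </testcase>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuite name="${escapeXml(suiteName)}" tests="${results.length}" failures="${failures}">`,
    ...cases,
    `</testsuite>`,
  ].join("\n");
}
```

Most CI systems can ingest a file in this shape and surface each failed eval case as a failed test.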

Open format

  • @agentskit/core/eval-format — portable eval JSON spec. Specs.

Per-primitive deep dives land in step 6 of the docs IA rollout.
