@agentskit/eval — for agents
Evaluation harness + deterministic replay + snapshot testing + diff + CI reporters.
#Install
npm install @agentskit/eval
#Primary exports
`runEval({ agent, suite })`: runs an `EvalSuite` against any async agent function.
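To make the shape of such a run concrete, here is a minimal local sketch of an evaluation loop. It is not the package's implementation: `runEvalSketch`, the type names, the `failed` field, and the substring-based scoring rule are all assumptions made for illustration. Only the `agent`/`suite` inputs and the `passed`/`totalCases` fields mirror what this page shows.

```typescript
// Hypothetical local types mirroring the shapes used on this page;
// these are NOT the package's actual type definitions.
type EvalCase = { input: string; expected: string }
type EvalSuite = { name: string; cases: EvalCase[] }
type EvalSummary = { totalCases: number; passed: number; failed: number }

// Illustrative stand-in for an eval loop; NOT the real runEval.
async function runEvalSketch(opts: {
  agent: (input: string) => Promise<string>
  suite: EvalSuite
}): Promise<EvalSummary> {
  let passed = 0
  for (const testCase of opts.suite.cases) {
    try {
      const output = await opts.agent(testCase.input)
      // Assumed scoring rule: pass when the output contains the expected text.
      if (output.includes(testCase.expected)) passed += 1
    } catch {
      // In this sketch, an agent that throws simply fails the case.
    }
  }
  const totalCases = opts.suite.cases.length
  return { totalCases, passed, failed: totalCases - passed }
}

// Stub agent so the sketch runs without a model or network access.
const stubAgent = async (input: string): Promise<string> =>
  input.includes('France') ? 'The capital is Paris.' : 'I do not know.'

const demoSuite: EvalSuite = {
  name: 'qa',
  cases: [
    { input: 'Capital of France?', expected: 'Paris' },
    { input: 'Capital of Peru?', expected: 'Lima' },
  ],
}

runEvalSketch({ agent: stubAgent, suite: demoSuite }).then((summary) => {
  console.log(`${summary.passed}/${summary.totalCases}`) // prints "1/2"
})
```

Because the agent is just an async function from input to output, a stub like `stubAgent` can replace a live model when testing the suite itself.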
#Subpaths
| Subpath | Contents |
|---|---|
| `@agentskit/eval/replay` | `createRecordingAdapter`, `createReplayAdapter`, cassettes, `createTimeTravelSession`, `replayAgainst`, `summarizeReplay`. See Deterministic replay, Time travel, Replay-different-model. |
| `@agentskit/eval/snapshot` | `matchPromptSnapshot` (exact / normalized / similarity). See Snapshots. |
| `@agentskit/eval/diff` | `promptDiff`, `attributePromptChange`, `formatDiff`. See Prompt diff. |
| `@agentskit/eval/ci` | `renderJUnit`, `renderMarkdown`, `renderGitHubAnnotations`, `reportToCi`. See Evals in CI. |
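The three snapshot modes named in the table (exact / normalized / similarity) can be illustrated with a small local sketch. This page does not show the signature or matching rules of `matchPromptSnapshot`, so everything below is an assumption: `matchesSnapshotSketch`, the whitespace-and-case normalization, the Jaccard word-overlap metric, and the 0.8 default threshold are hypothetical stand-ins, not the library's behavior.

```typescript
type SnapshotMode = 'exact' | 'normalized' | 'similarity'

// Assumed normalization: trim, collapse whitespace, lowercase.
const normalize = (text: string): string =>
  text.trim().replace(/\s+/g, ' ').toLowerCase()

// Unique words of a normalized string.
const uniqueWords = (text: string): string[] =>
  Array.from(new Set(normalize(text).split(' ').filter(Boolean)))

// Jaccard overlap of word sets, in [0, 1]. A stand-in similarity metric only.
function jaccard(a: string, b: string): number {
  const wordsA = uniqueWords(a)
  const wordsB = uniqueWords(b)
  if (wordsA.length === 0 && wordsB.length === 0) return 1
  const shared = wordsA.filter((word) => wordsB.indexOf(word) !== -1).length
  return shared / (wordsA.length + wordsB.length - shared)
}

// Hypothetical helper; NOT the real matchPromptSnapshot.
function matchesSnapshotSketch(
  actual: string,
  snapshot: string,
  mode: SnapshotMode,
  threshold = 0.8, // assumed default, used only in 'similarity' mode
): boolean {
  switch (mode) {
    case 'exact':
      return actual === snapshot
    case 'normalized':
      return normalize(actual) === normalize(snapshot)
    case 'similarity':
      return jaccard(actual, snapshot) >= threshold
  }
}

const snapshot = 'You are a helpful assistant.'
const reformatted = '  you are a   helpful assistant. '
const reworded = 'You are a very helpful assistant.'

console.log(matchesSnapshotSketch(reformatted, snapshot, 'exact')) // false
console.log(matchesSnapshotSketch(reformatted, snapshot, 'normalized')) // true
console.log(matchesSnapshotSketch(reworded, snapshot, 'similarity')) // true (5/6 overlap)
```

The progression shows the trade-off: exact matching catches every change, normalized matching ignores cosmetic edits, and similarity matching tolerates small rewordings at the cost of a tunable threshold.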
#Minimal example
import { runEval } from '@agentskit/eval'

// `runtime` is assumed to be an already-configured agent runtime created elsewhere.
const result = await runEval({
  agent: async (input) => (await runtime.run(input)).content,
  suite: {
    name: 'qa',
    cases: [{ input: 'Capital of France?', expected: 'Paris' }],
  },
})
console.log(`${result.passed}/${result.totalCases}`)
#Related
- @agentskit/eval-braintrust — Braintrust reporter backend.
- @agentskit/runtime.
- @agentskit/observability — trace + cost data feeds eval comparisons.
- @agentskit/core/eval-format — portable format spec.
#Source
- npm: https://www.npmjs.com/package/@agentskit/eval
- repo: https://github.com/AgentsKit-io/agentskit/tree/main/packages/eval