
Eval suites

Run any async agent function against an EvalSuite. Each case is scored pass or fail.

import { runEval } from '@agentskit/eval'

const suite = {
  name: 'support-triage',
  cases: [
    {
      id: 'refund',
      input: 'How do I get a refund?',
      assert: (out) => out.includes('refund policy'),
    },
  ],
}

const report = await runEval({
  // `runtime` is assumed to be an agent runtime created elsewhere in your app.
  agent: async (input) => runtime.run({ input }).then((r) => r.output),
  suite,
})

console.log(report.passRate, report.failures)

Assertions

  • boolean function → true passes, false fails
  • async LLM-as-judge function → resolves to ({ pass, rationale })
  • regex → the output must match
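
The three assertion forms above can be reduced to one verdict shape. The following is a minimal sketch of that idea, not the library's implementation: the `Verdict` and `Assertion` types, the `evaluate` helper, and the `fakeJudge` stand-in are all hypothetical names introduced here for illustration.

```typescript
// Hypothetical types for illustration; not taken from @agentskit/eval.
type Verdict = { pass: boolean; rationale?: string }
type Assertion =
  | RegExp
  | ((out: string) => boolean | Verdict | Promise<boolean | Verdict>)

// Normalize a regex, a boolean function, or an async judge into a Verdict.
async function evaluate(assertion: Assertion, out: string): Promise<Verdict> {
  if (assertion instanceof RegExp) {
    // Regexes with the g or y flag keep state in lastIndex; reset before testing.
    assertion.lastIndex = 0
    return { pass: assertion.test(out) }
  }
  // Awaiting works for both sync and async assertion functions.
  const result = await assertion(out)
  return typeof result === 'boolean' ? { pass: result } : result
}

// Stand-in for an LLM-as-judge call (assumption: a real judge would call a model).
const fakeJudge = async (out: string): Promise<Verdict> => ({
  pass: out.length > 0,
  rationale: out.length > 0 ? 'non-empty answer' : 'empty answer',
})
```

Under these assumptions, `await evaluate(/refund/i, output)` and `await evaluate(fakeJudge, output)` both yield an object with a `pass` field, so a runner could treat every case uniformly.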

Metrics

Built-in: passRate, latencyP50, latencyP95, tokensTotal, usdTotal.
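
To make the metric names concrete, here is a rough sketch of how such values could be derived from per-case results. It is an illustration under stated assumptions, not how the library computes them: the `CaseResult` shape, the `percentile` and `summarize` helpers, and the choice of the nearest-rank percentile method are all hypothetical.

```typescript
// Hypothetical per-case record; field names are assumptions for this sketch.
interface CaseResult {
  id: string
  pass: boolean
  latencyMs: number
  tokens: number
  usd: number
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// Aggregate per-case results into the metric names listed above.
function summarize(results: CaseResult[]) {
  const latencies = results.map((r) => r.latencyMs)
  return {
    passRate: results.length
      ? results.filter((r) => r.pass).length / results.length
      : 0,
    latencyP50: percentile(latencies, 50),
    latencyP95: percentile(latencies, 95),
    tokensTotal: results.reduce((sum, r) => sum + r.tokens, 0),
    usdTotal: results.reduce((sum, r) => sum + r.usd, 0),
  }
}
```

Nearest-rank is used here because it always returns an observed latency. An interpolating percentile would give different values on small suites.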
