Eval suites

Define cases with inputs and assertions, then run them against any async agent function to get pass rates and latency metrics.

runEval is the entry point for all evaluations. Give it an async function that wraps your agent and an EvalSuite with test cases, and it returns a report with per-case results and aggregate metrics. An assertion can be a boolean function, a regex, or an LLM-as-judge that returns a pass verdict with a rationale.

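import { runEval } from '@agentskit/eval'

// A suite is a named list of cases; each case pairs an input with an assertion.
const suite = {
  name: 'support-triage',
  cases: [
    {
      id: 'refund',
      input: 'How do I get a refund?',
      // Boolean assertion: passes when the agent output mentions the refund policy.
      assert: (out) => out.includes('refund policy'),
    },
  ],
}

// `runtime` is not defined in this snippet; it stands for an agent runtime configured elsewhere.
const report = await runEval({
  // Wrap the agent as an async function that maps an input to its output.
  agent: async (input) => runtime.run({ input }).then((r) => r.output),
  suite,
})

console.log(report.passRate, report.failures)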
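
A report like this is often used to gate a CI job. The sketch below is a hypothetical helper, not part of @agentskit/eval. It assumes passRate is a fraction between 0 and 1 and that failures is an array; this page confirms neither, so check the report shape before relying on it.

```javascript
// Hypothetical helper, not part of @agentskit/eval.
// Assumption: report.passRate is a fraction in [0, 1] and report.failures is an array.
function gateOnPassRate(report, minPassRate = 0.9) {
  const ok = report.passRate >= minPassRate
  const percent = (report.passRate * 100).toFixed(1)
  return {
    ok,
    summary: `${percent}% passed, ${report.failures.length} failure(s)`,
  }
}

// Stand-in report so the sketch runs without an agent or the eval package.
const sampleReport = { passRate: 0.75, failures: [{ id: 'refund' }] }
const gate = gateOnPassRate(sampleReport, 0.9)
console.log(gate.ok, gate.summary)
```

In a CI script, a false `ok` could be turned into a non-zero exit code so the job fails when the pass rate drops below the threshold.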

Assertions

  • boolean fn → pass/fail
  • async LLM-as-judge → ({ pass, rationale })
  • regex → match required
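
To make the three shapes concrete, here is a minimal sketch of how they could be reduced to one common result. The checkAssertion function and the stub judge are hypothetical names for illustration, not the library's implementation, and the judge is a plain function standing in for a real LLM call.

```javascript
// Hypothetical normalizer, not the library's implementation.
// Accepts a RegExp, a boolean function, or an async judge
// that resolves to { pass, rationale }.
async function checkAssertion(assertion, output) {
  if (assertion instanceof RegExp) {
    // Regex assertion: the output must match.
    return { pass: assertion.test(output) }
  }
  // Awaiting covers both sync boolean functions and async judges.
  const result = await assertion(output)
  if (typeof result === 'boolean') return { pass: result }
  return { pass: Boolean(result.pass), rationale: result.rationale }
}

// Stub judge standing in for a real LLM call (assumption for illustration).
const stubJudge = async (out) => ({
  pass: out.length < 200,
  rationale: 'Answer is concise.',
})

const output = 'See our refund policy for details.'
const results = Promise.all([
  checkAssertion((out) => out.includes('refund policy'), output),
  checkAssertion(/refund/i, output),
  checkAssertion(stubJudge, output),
])
results.then((r) => console.log(r))
```

One caveat with regex assertions in general: a RegExp with the `g` or `y` flag keeps state in `lastIndex` between `test` calls, so a regex reused across cases is safer without those flags.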

Metrics

Built-in: passRate, latencyP50, latencyP95, tokensTotal, usdTotal.
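
As an illustration of what the pass rate and latency percentiles mean, the sketch below derives them from a list of per-case results. The per-case field names (pass, latencyMs) and the nearest-rank percentile method are assumptions made for this example; the library may record cases and compute percentiles differently.

```javascript
// Hypothetical aggregation, not the library's implementation.
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return NaN
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
  return sorted[rank - 1]
}

// Assumption: each case result carries a boolean `pass` and a `latencyMs` number.
function summarize(caseResults) {
  const latencies = caseResults.map((r) => r.latencyMs)
  const passed = caseResults.filter((r) => r.pass).length
  return {
    passRate: caseResults.length ? passed / caseResults.length : 0,
    latencyP50: percentile(latencies, 50),
    latencyP95: percentile(latencies, 95),
  }
}

const sample = [
  { pass: true, latencyMs: 120 },
  { pass: true, latencyMs: 340 },
  { pass: false, latencyMs: 95 },
  { pass: true, latencyMs: 210 },
]
const summary = summarize(sample)
console.log(summary) // { passRate: 0.75, latencyP50: 120, latencyP95: 340 }
```

With only a handful of cases, P95 collapses to the slowest case, so percentile metrics become informative only on larger suites.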
