CI reporters
Write eval results as JUnit XML, Markdown, or GitHub annotations so failures block PRs and surface in pull request checks.
`reportToCi` takes a completed eval report and writes it in the formats your CI stack expects: JUnit XML for test-result tracking, Markdown as a PR artifact, and GitHub annotations that pin failures directly to changed lines.
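This page doesn't document the exact output of each renderer. As a rough illustration of what a JUnit report carries, here is a minimal sketch that renders a hypothetical report shape as a single `<testsuite>`. `EvalReport`, `EvalCaseResult`, and `toJUnitXml` are illustrative names, not part of `@agentskit/eval`, and the real report fields may differ.

```typescript
// Hypothetical report shape, for illustration only; the real
// @agentskit/eval report type may differ.
interface EvalCaseResult {
  name: string;
  passed: boolean;
  durationMs: number;
  message?: string; // failure reason, when the case failed
}

interface EvalReport {
  suite: string;
  cases: EvalCaseResult[];
}

// Escape the five XML special characters so case names and messages
// cannot break the document.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one <testsuite> with a <testcase> per eval case. Failed cases get a
// <failure> child, which is what CI test dashboards count as failures.
function toJUnitXml(report: EvalReport): string {
  const failures = report.cases.filter((c) => !c.passed).length;
  const totalSec = report.cases.reduce((t, c) => t + c.durationMs, 0) / 1000;
  const cases = report.cases.map((c) => {
    const attrs = `name="${escapeXml(c.name)}" time="${(c.durationMs / 1000).toFixed(3)}"`;
    if (c.passed) return `  <testcase ${attrs}/>`;
    const msg = escapeXml(c.message ?? "assertion failed");
    return `  <testcase ${attrs}>\n    <failure message="${msg}"/>\n  </testcase>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<testsuite name="${escapeXml(report.suite)}" tests="${report.cases.length}" failures="${failures}" time="${totalSec.toFixed(3)}">`,
    ...cases,
    "</testsuite>",
  ].join("\n");
}
```

Times are written in seconds because JUnit consumers conventionally read the `time` attribute that way.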
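```ts
import {
  runEval, // assumed to be exported from the same package
  reportToCi,
  renderJUnit, // standalone renderers (not used below)
  renderMarkdown,
  renderGitHubAnnotations,
} from '@agentskit/eval'

// `agent` is your async agent function and `suite` your eval suite,
// both defined elsewhere.
const report = await runEval({ agent, suite })

await reportToCi({
  report,
  output: [
    { kind: 'junit', path: 'eval-report.xml' }, // JUnit XML for test dashboards
    { kind: 'markdown', path: 'eval-report.md' }, // Markdown, e.g. as a PR artifact
    { kind: 'github-annotations' }, // emitted as ::error / ::warning workflow commands
  ],
})
```

GitHub Actions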
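GitHub annotations are plain workflow commands printed to a step's stdout. The sketch below shows that format, assuming a hypothetical `EvalFailure` record; `toAnnotation` is an illustrative helper, not the library's `renderGitHubAnnotations`. The escaping follows the rules used by GitHub's `@actions/core` toolkit.

```typescript
// Hypothetical failure record; the real report fields may differ.
interface EvalFailure {
  severity: "error" | "warning";
  message: string;
  file?: string; // repo-relative path the failure should be pinned to
  line?: number;
  title?: string;
}

// Message text escapes %, CR and LF so a multi-line message stays one command.
function escapeData(s: string): string {
  return s.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values additionally escape ':' and ',', which delimit the command.
function escapeProperty(s: string): string {
  return escapeData(s).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Build one `::error file=...,line=...::message` line for GitHub Actions.
function toAnnotation(f: EvalFailure): string {
  const props: string[] = [];
  if (f.file !== undefined) props.push(`file=${escapeProperty(f.file)}`);
  if (f.line !== undefined) props.push(`line=${f.line}`);
  if (f.title !== undefined) props.push(`title=${escapeProperty(f.title)}`);
  const head = props.length > 0 ? `::${f.severity} ${props.join(",")}` : `::${f.severity}`;
  return `${head}::${escapeData(f.message)}`;
}
```

Printing the result with `console.log` inside a workflow step should be enough for GitHub to show it as an annotation, attached to that file and line when the line is part of the PR diff.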
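In a workflow, run the evals and upload the Markdown report as a build artifact. `if: always()` keeps the upload step running when `pnpm eval` fails, which is when the report matters most.

```yaml
- run: pnpm eval
- uses: actions/upload-artifact@v4
  if: always() # upload even when the eval step failed
  with:
    name: eval-report
    path: eval-report.md
```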
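A PR check fails only if `pnpm eval` exits non-zero. This page doesn't say whether `reportToCi` sets the exit code, so if you need to gate explicitly, a small helper can decide it from the results. This is a sketch: `exitCodeFor`, the `CaseOutcome` shape, and the `minPassRate` threshold are hypothetical.

```typescript
// Hypothetical per-case outcome; adapt to whatever the real report exposes.
interface CaseOutcome {
  name: string;
  passed: boolean;
}

// Return the exit code the eval script should finish with: 0 when the pass
// rate meets the threshold, 1 otherwise. A non-zero exit fails the
// `pnpm eval` step, which fails the check and blocks the PR.
function exitCodeFor(cases: CaseOutcome[], minPassRate = 1): number {
  if (cases.length === 0) return 1; // an empty suite should not pass silently
  const passed = cases.filter((c) => c.passed).length;
  return passed / cases.length >= minPassRate ? 0 : 1;
}
```

In the eval script, assign the result to `process.exitCode` after `reportToCi` has written its files, so the reports exist even when the step fails.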
Related
- Evals: run eval suites against any async agent function, replay recorded sessions in CI, and track prompt regressions with snapshots.
- Eval suites: define cases with inputs and assertions, then run them against any async agent function to get pass rates and latency metrics.
- Deterministic replay: record LLM responses to a cassette file, then replay them in CI without network calls for fast, deterministic tests.