CI reporters

Write eval results as JUnit XML, Markdown, or GitHub annotations so failures block PRs and surface in pull request checks.

reportToCi takes a completed eval report and writes it in the formats your CI stack expects: JUnit XML for test-result tracking, Markdown as a PR artifact, and GitHub annotations that pin failures directly to changed lines.

```typescript
import {
  reportToCi,
  renderJUnit,
  renderMarkdown,
  renderGitHubAnnotations,
} from '@agentskit/eval'

// runEval, agent, and suite come from your eval setup (see Eval suites).
const report = await runEval({ agent, suite })

await reportToCi({
  report,
  output: [
    { kind: 'junit', path: 'eval-report.xml' },
    { kind: 'markdown', path: 'eval-report.md' },
    { kind: 'github-annotations' }, // emits ::error / ::warning workflow commands
  ],
})
```
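To show what a JUnit consumer receives, here is a minimal sketch of rendering eval results as a single JUnit `<testsuite>`. It is not the implementation of renderJUnit: the `CaseResult` shape, `toJUnit`, and `escapeXml` are hypothetical names chosen for illustration, and the real report type in @agentskit/eval may differ.

```typescript
// Hypothetical shape of one evaluated case (assumption, not the library's type).
interface CaseResult {
  name: string;
  passed: boolean;
  durationMs: number;
  message?: string; // failure reason, if any
}

// Escape XML special characters so case names and messages cannot break the document.
// "&" must be replaced first so later entities are not double-escaped.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render all cases as one <testsuite>; JUnit reports time in seconds.
function toJUnit(suiteName: string, cases: CaseResult[]): string {
  const failures = cases.filter((c) => !c.passed).length;
  const totalSec = cases.reduce((sum, c) => sum + c.durationMs, 0) / 1000;
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<testsuite name="${escapeXml(suiteName)}" tests="${cases.length}" failures="${failures}" time="${totalSec.toFixed(3)}">`,
  ];
  for (const c of cases) {
    const open = `  <testcase name="${escapeXml(c.name)}" time="${(c.durationMs / 1000).toFixed(3)}"`;
    if (c.passed) {
      // Passing cases are self-closing elements.
      lines.push(`${open}/>`);
    } else {
      // Failing cases carry a <failure> child with the reason.
      const msg = escapeXml(c.message ?? "assertion failed");
      lines.push(`${open}>`, `    <failure message="${msg}">${msg}</failure>`, "  </testcase>");
    }
  }
  lines.push("</testsuite>");
  return lines.join("\n");
}

// Usage with made-up cases.
const xml = toJUnit("support-agent", [
  { name: "greets user", passed: true, durationMs: 420 },
  { name: "refuses PII", passed: false, durationMs: 1310, message: "expected refusal" },
]);
console.log(xml);
```

The `tests` and `failures` counts on `<testsuite>` are what most CI dashboards read to show pass/fail totals, so they are computed from the same case list as the per-case elements.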

GitHub Actions

```yaml
- run: pnpm eval
- uses: actions/upload-artifact@v4
  if: always() # upload the report even when pnpm eval fails
  with:
    name: eval-report
    path: eval-report.md
```
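A PR is blocked only when the eval step exits non-zero, and annotations appear only when GitHub workflow commands are printed to stdout during the step. The sketch below shows both mechanics under stated assumptions: the `Failure` shape, `toAnnotation`, and `gate` are hypothetical helpers, not part of @agentskit/eval, and the escaping rules mirror those used by @actions/core (`%`, CR, and LF in the message; additionally `:` and `,` in properties).

```typescript
// Hypothetical failed-case shape (assumption, not the library's report type).
interface Failure {
  caseName: string;
  message: string;
  file?: string; // path relative to the repository root
  line?: number;
}

// Escape the message part of a workflow command.
function escapeData(s: string): string {
  return s.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Properties (title, file, ...) additionally escape ":" and ",".
function escapeProperty(s: string): string {
  return escapeData(s).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Build one ::error command; GitHub turns it into an annotation when printed to stdout.
function toAnnotation(f: Failure): string {
  const props = [`title=${escapeProperty(f.caseName)}`];
  if (f.file) props.push(`file=${escapeProperty(f.file)}`);
  if (f.line !== undefined) props.push(`line=${f.line}`);
  return `::error ${props.join(",")}::${escapeData(f.message)}`;
}

// Print one annotation per failure and return the exit code the step should use.
function gate(failures: Failure[]): number {
  for (const f of failures) console.log(toAnnotation(f));
  return failures.length > 0 ? 1 : 0;
}

// Usage with a made-up failure; the file path is hypothetical.
const exitCode = gate([
  { caseName: "refuses PII", message: "expected refusal", file: "evals/support.eval.ts", line: 12 },
]);
console.log(`eval gate exit code: ${exitCode}`);
```

In a Node script you would assign the returned value to `process.exitCode`, so the `pnpm eval` step fails and a required status check keeps the PR from merging.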

Related

Recipe: evals CI
Suites
