# @agentskit/eval-braintrust — for agents
Braintrust scoring pipeline — 4 quality + 4 robustness scorers, runner, and CI regression helpers.
## Install

```shell
npm install @agentskit/eval-braintrust braintrust
```

`braintrust` is loaded lazily; runs without it still produce scored output (no upload).
## Primary exports

### Runner

- `runBraintrustEval({ cases, agent, scorers, options, bt? })` — scores cases through the agent, optionally logs to a Braintrust experiment.
- `scoreCase(scorers, args)` — scores a single case; surfaces scorer crashes as `scorer_error`.
- `summarize(cases)` — `{ name: { mean, n } }` aggregate.
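The `{ name: { mean, n } }` aggregate shape can be illustrated with a minimal re-implementation. This is a sketch, not the library's code: the per-case `scores` field name is an assumption; only the output shape comes from the export list above.

```typescript
// Sketch of a summarize-style aggregator. Assumed input shape: each scored
// case carries a `scores` record mapping scorer name → score in [0, 1].
type ScoredCase = { scores: Record<string, number> }

function summarizeSketch(
  cases: ScoredCase[],
): Record<string, { mean: number; n: number }> {
  const acc: Record<string, { sum: number; n: number }> = {}
  for (const c of cases) {
    for (const [name, score] of Object.entries(c.scores)) {
      const slot = (acc[name] ??= { sum: 0, n: 0 })
      slot.sum += score
      slot.n += 1
    }
  }
  // Collapse running sums into { mean, n } per scorer name.
  return Object.fromEntries(
    Object.entries(acc).map(([name, { sum, n }]) => [name, { mean: sum / n, n }]),
  )
}
```

Scorers that never ran for a case simply contribute nothing to that scorer's `n`, so means are computed over scored cases only.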
### Quality scorers

- `taskSuccess` — substring / regex / predicate match against `expected`.
- `factualGrounding` — fraction of `metadata.sources` referenced in the output.
- `citationCorrectness` — cite-tag presence + match against `metadata.expectedCitations`.
- `toolArgValidity` — fraction of `metadata.toolCalls` with `schemaValid !== false`.
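The three `taskSuccess` match modes can be sketched as follows. This is an illustration of the described semantics, not the library's implementation; the exact dispatch order and return values are assumptions.

```typescript
// Sketch of taskSuccess-style matching: `expected` may be a plain substring,
// a RegExp, or a predicate over the output. Returns 1 on match, 0 otherwise.
type Expected = string | RegExp | ((output: string) => boolean)

function taskSuccessSketch(output: string, expected: Expected): number {
  if (typeof expected === 'function') return expected(output) ? 1 : 0
  if (expected instanceof RegExp) return expected.test(output) ? 1 : 0
  return output.includes(expected) ? 1 : 0
}
```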
### Robustness scorers

- `schemaSurvival` — 1 unless `metadata.parseError` or `schemaValid === false`.
- `hitlGateCorrectness` — 1 when `hitlExpected === hitlTriggered`.
- `fallbackResilience` — 1 on a clean run or a recovered fallback; 0 on unrecovered errors.
- `noCrashSurvival` — 0 when `metadata.crashed` or `uncaughtException`.
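Two of these binary checks can be sketched directly from the metadata keys listed above. The field names come from this section; the defaulting of missing flags to `false` is an assumption, and the real scorers may differ.

```typescript
// Sketch of the binary robustness checks described above.
type RobustnessMeta = {
  hitlExpected?: boolean
  hitlTriggered?: boolean
  crashed?: boolean
  uncaughtException?: boolean
}

// 1 when the human-in-the-loop gate fired exactly when it was expected to.
const hitlGateSketch = (m: RobustnessMeta): number =>
  (m.hitlExpected ?? false) === (m.hitlTriggered ?? false) ? 1 : 0

// 0 when the run crashed or threw an uncaught exception, else 1.
const noCrashSketch = (m: RobustnessMeta): number =>
  m.crashed || m.uncaughtException ? 0 : 1
```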
### Families

`qualityFamily`, `robustnessFamily`, `ALL_SCORERS`.
## Subpaths
| Subpath | Contents |
|---|---|
| `@agentskit/eval-braintrust/scorers` | Individual scorer factories + families. |
| `@agentskit/eval-braintrust/ci` | `detectRegressions(baseline, current, thresholds)`, `formatAlertsMarkdown(alerts)`. |
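The thresholding behind `detectRegressions` can be sketched like this. Only the signature comes from the table above; the `Alert` shape, the per-scorer threshold override, and the strict-drop comparison are assumptions for illustration.

```typescript
// Sketch of threshold-based regression detection over two summaries.
type Summary = Record<string, { mean: number; n: number }>
type Alert = { scorer: string; baseline: number; current: number; drop: number }

function detectRegressionsSketch(
  baseline: Summary,
  current: Summary,
  // `default` applies to every scorer; a key matching a scorer name overrides it.
  thresholds: { default: number } & Record<string, number>,
): Alert[] {
  const alerts: Alert[] = []
  for (const [scorer, base] of Object.entries(baseline)) {
    const cur = current[scorer]
    if (!cur) continue // scorer absent from the current run: nothing to compare
    const drop = base.mean - cur.mean
    const limit = thresholds[scorer] ?? thresholds.default
    if (drop > limit) {
      alerts.push({ scorer, baseline: base.mean, current: cur.mean, drop })
    }
  }
  return alerts
}
```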
## Minimal example
```typescript
import {
  runBraintrustEval,
  ALL_SCORERS,
} from '@agentskit/eval-braintrust'

const result = await runBraintrustEval({
  cases: [{ input: 'Capital of France?', output: '', expected: 'Paris' }],
  agent: async input => ({ output: await agent.run(input) }),
  scorers: ALL_SCORERS,
  options: { projectName: 'agentskit-showcase' },
})

console.log(result.summary, result.url)
```

## CI regression alert
```typescript
import { detectRegressions, formatAlertsMarkdown } from '@agentskit/eval-braintrust/ci'

const alerts = detectRegressions(baseline.summary, current.summary, { default: 0.05 })
if (alerts.length) {
  process.stdout.write(formatAlertsMarkdown(alerts))
  process.exit(1)
}
```

## Related
- `@agentskit/eval` — base eval primitives.
- `@agentskit/observability-langfuse` — pair with tracing for trace → eval datasets.
## Source
- npm: https://www.npmjs.com/package/@agentskit/eval-braintrust
- repo: https://github.com/AgentsKit-io/agentskit/tree/main/packages/eval-braintrust