A/B prompts with feature flags
Ship multiple prompts, route users deterministically, measure which wins.
Picking a new prompt is a product decision. Ship the old and new
versions side-by-side, route each user deterministically, and let
your analytics decide the winner. @agentskit/core/prompt-experiments
is the 1 KB glue that wires any feature-flag provider (PostHog,
GrowthBook, Unleash, custom) to a typed A/B prompt picker with
sticky-hash fallback.
#Install
Built into @agentskit/core.
import {
createPromptExperiment,
flagResolver,
} from '@agentskit/core/prompt-experiments'
#Sticky-hash baseline (no flag provider)
Good for smoke tests, demos, or when you haven't picked a flag
service yet. Same subjectId always maps to the same variant.
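To make "sticky" concrete, here is a minimal sketch of one way a deterministic weighted pick could work: hash the experiment name plus subjectId to a point in [0, 1), then walk the cumulative weights. This is an illustration under stated assumptions, not the @agentskit/core implementation; fnv1a, stickyPick and the Variant type are hypothetical names, and the library may use a different hash.

```typescript
// Illustrative sketch only: hypothetical helpers, not the library's source.
type Variant = { id: string; weight?: number }

// FNV-1a 32-bit hash: deterministic and dependency-free.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // keep as unsigned 32-bit
  }
  return hash
}

// Map (experiment, subject) to a point in [0, total weight), then pick the
// first variant whose cumulative weight exceeds that point.
function stickyPick(experiment: string, subjectId: string, variants: Variant[]): string {
  const total = variants.reduce((sum, v) => sum + (v.weight ?? 1), 0)
  const point = (fnv1a(`${experiment}:${subjectId}`) / 0x100000000) * total
  let cumulative = 0
  for (const v of variants) {
    cumulative += v.weight ?? 1
    if (point < cumulative) return v.id
  }
  // Guard against floating-point edge cases.
  return variants[variants.length - 1].id
}
```

Because the hash input includes the experiment name, the same user can land in different buckets across experiments while staying fixed within each one. With the library, none of this is hand-written; the baseline setup is: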
import { createPromptExperiment, stickyResolver } from '@agentskit/core/prompt-experiments'
const exp = createPromptExperiment({
name: 'support-tone',
variants: [
{ id: 'v1', prompt: 'Be concise and formal.', weight: 1 },
{ id: 'v2', prompt: 'Be warm and playful.', weight: 1 },
],
resolve: stickyResolver(),
onExposure: d => analytics.track('prompt-exposure', d),
})
const { prompt, variantId } = await exp.pick({ subjectId: currentUser.id })
#Plug in your flag provider
flagResolver wraps any (name, context) => variantId function:
PostHog's getFeatureFlagPayload, GrowthBook's getFeatureValue,
Unleash, LaunchDarkly. If the provider fails or returns an unknown
variant (rollout paused, flag misconfigured, network error), the picker
falls back to the sticky resolver so users still see some prompt.
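The fallback behaviour described above can be pictured with a small sketch. It assumes a provider function that may throw or return an id the experiment does not know, and a synchronous fallback picker; resolveWithFallback and its types are hypothetical names for illustration, not the library's internals.

```typescript
// Illustrative sketch only: hypothetical names, not the @agentskit/core source.
type Context = { subjectId?: string }
type Resolver = (
  name: string,
  ctx: Context,
) => Promise<string | undefined> | string | undefined

interface Resolution {
  variantId: string
  fallback: boolean // true when the provider's answer could not be used
}

async function resolveWithFallback(
  name: string,
  ctx: Context,
  knownIds: string[],
  primary: Resolver,
  fallback: (name: string, ctx: Context) => string,
): Promise<Resolution> {
  try {
    const id = await primary(name, ctx)
    // Accept the provider's answer only if it names a declared variant.
    if (typeof id === 'string' && knownIds.includes(id)) {
      return { variantId: id, fallback: false }
    }
  } catch {
    // Provider threw (for example a network error): fall through.
  }
  return { variantId: fallback(name, ctx), fallback: true }
}
```

Validating the returned id against the declared variants is what turns a paused rollout or a typo in the flag dashboard into a fallback rather than a missing prompt. Using the library's own flagResolver with PostHog looks like this: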
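import { PostHog } from 'posthog-node'
import { createPromptExperiment, flagResolver } from '@agentskit/core/prompt-experiments'
// Assumption: the PostHog project API key is exposed as POSTHOG_API_KEY.
const posthog = new PostHog(process.env.POSTHOG_API_KEY!)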
const exp = createPromptExperiment({
name: 'support-tone',
variants: [
{ id: 'control', prompt: 'Be concise and formal.' },
{ id: 'playful', prompt: 'Be warm and playful.' },
],
resolve: flagResolver(async (name, ctx) => {
// getFeatureFlag is async in posthog-node, so await it before casting.
return (await posthog.getFeatureFlag(name, ctx.subjectId ?? 'anon')) as string
}, 'support-tone'),
onExposure: d => {
posthog.capture({
distinctId: d.subjectId ?? 'anon',
event: '$feature_flag_called',
properties: { $feature_flag: d.name, $feature_flag_response: d.variantId, fallback: d.fallback },
})
},
})
#Decision shape
{
name: 'support-tone',
variantId: 'playful',
prompt: 'Be warm and playful.',
fallback: false, // true if the custom resolver failed
}
Every call hits onExposure, so your analytics pipeline can
attribute downstream events (conversions, satisfaction, regenerations)
to the variant.
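As a rough sketch of what that attribution can look like, the in-memory class below stands in for an analytics pipeline: it remembers which variant each subject saw, then credits later conversion events to that variant. ExposureLog and its methods are hypothetical, and the Exposure shape assumes the exposure payload carries a subjectId, as the PostHog example suggests.

```typescript
// Illustrative sketch: in-memory stand-in for an analytics pipeline.
// Assumption: the exposure payload includes a subjectId.
type Exposure = { name: string; variantId: string; subjectId: string }

class ExposureLog {
  private bySubject = new Map<string, string>() // subjectId -> variantId
  private exposed = new Map<string, number>() // variantId -> unique subjects
  private converted = new Map<string, number>() // variantId -> conversion events

  // Call from onExposure. Each subject is counted once.
  recordExposure(e: Exposure): void {
    if (this.bySubject.has(e.subjectId)) return
    this.bySubject.set(e.subjectId, e.variantId)
    this.exposed.set(e.variantId, (this.exposed.get(e.variantId) ?? 0) + 1)
  }

  // Call when a downstream event (conversion, thumbs-up) arrives.
  recordConversion(subjectId: string): void {
    const variantId = this.bySubject.get(subjectId)
    if (variantId === undefined) return // never exposed: cannot attribute
    this.converted.set(variantId, (this.converted.get(variantId) ?? 0) + 1)
  }

  // Conversion events per exposed subject for one variant.
  conversionRate(variantId: string): number {
    const n = this.exposed.get(variantId) ?? 0
    return n === 0 ? 0 : (this.converted.get(variantId) ?? 0) / n
  }
}
```

In a real setup the same join usually happens inside the analytics tool, keyed on the user id and the exposure event.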
#Multiple properties per variant
prompt is typed on the variant so you can A/B whole message
structures, not just strings:
createPromptExperiment<{ system: string; temperature: number }>({
name: 'agent-config',
variants: [
{ id: 'cold', prompt: { system: 'You are precise.', temperature: 0 } },
{ id: 'warm', prompt: { system: 'You are warm.', temperature: 0.7 } },
],
resolve: flagResolver(getVariant, 'agent-config'),
})
#See also
- Eval suite: score each variant quantitatively
- Evals in CI: gate the winner