
A/B prompts with feature flags

Ship multiple prompts, route users deterministically, measure which wins.

Picking a new prompt is a product decision. Ship the old and new versions side-by-side, route each user deterministically, and let your analytics decide the winner. @agentskit/core/prompt-experiments is the 1 KB glue that wires any feature-flag provider (PostHog, GrowthBook, Unleash, custom) to a typed A/B prompt picker with sticky-hash fallback.

Install

Built into @agentskit/core.

import {
  createPromptExperiment,
  flagResolver,
} from '@agentskit/core/prompt-experiments'

Sticky-hash baseline (no flag provider)

Good for smoke tests, demos, or when you haven't picked a flag service yet. Same subjectId always maps to the same variant.

import { createPromptExperiment, stickyResolver } from '@agentskit/core/prompt-experiments'

const exp = createPromptExperiment({
  name: 'support-tone',
  variants: [
    { id: 'v1', prompt: 'Be concise and formal.', weight: 1 },
    { id: 'v2', prompt: 'Be warm and playful.', weight: 1 },
  ],
  resolve: stickyResolver(),
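  // analytics is a placeholder for your own tracking client
  onExposure: d => analytics.track('prompt-exposure', d),
})

// currentUser is a placeholder for however your app identifies the caller
const { prompt, variantId } = await exp.pick({ subjectId: currentUser.id })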
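Deterministic weighted assignment can be sketched in a few lines: hash the experiment name together with the subject id, scale the result to the total weight, and walk the cumulative weights. This is a minimal sketch of the general technique, not the stickyResolver source; the Variant type, the hash32 and stickyPick helpers, and the choice of FNV-1a plus the MurmurHash3 finalizer are assumptions for illustration.

```typescript
// Hypothetical variant shape for illustration; not the @agentskit/core type.
interface Variant {
  id: string
  prompt: string
  weight?: number // defaults to 1
}

// 32-bit FNV-1a over the key, then the MurmurHash3 finalizer so the
// high bits (which choose the bucket below) are well mixed.
function hash32(key: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < key.length; i++) {
    h = Math.imul(h ^ key.charCodeAt(i), 0x01000193)
  }
  h ^= h >>> 16
  h = Math.imul(h, 0x85ebca6b)
  h ^= h >>> 13
  h = Math.imul(h, 0xc2b2ae35)
  h ^= h >>> 16
  return h >>> 0 // unsigned 32-bit
}

// Map (experiment, subject) to a point in [0, total weight), then walk the
// cumulative weights to find the variant that owns that point.
function stickyPick(name: string, subjectId: string, variants: Variant[]): Variant {
  const total = variants.reduce((sum, v) => sum + (v.weight ?? 1), 0)
  const point = (hash32(`${name}:${subjectId}`) / 2 ** 32) * total
  let cumulative = 0
  for (const v of variants) {
    cumulative += v.weight ?? 1
    if (point < cumulative) return v
  }
  return variants[variants.length - 1] // guards against floating-point rounding
}

const variants: Variant[] = [
  { id: 'v1', prompt: 'Be concise and formal.', weight: 1 },
  { id: 'v2', prompt: 'Be warm and playful.', weight: 1 },
]
const first = stickyPick('support-tone', 'user-42', variants)
const second = stickyPick('support-tone', 'user-42', variants)
console.log(first.id === second.id) // true: same subject, same variant
```

Salting the hash with the experiment name keeps assignments independent across experiments, so a user who lands in v2 here is not automatically in the same bucket of every other test.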

Plug in your flag provider

flagResolver wraps any (name, context) => variantId function, such as PostHog's getFeatureFlag, GrowthBook's getFeatureValue, or a lookup against Unleash, LaunchDarkly, or an in-house service. If the provider returns a value that matches no variant (rollout paused, flag misconfigured, network error), the picker falls back to the sticky resolver so every user still gets a valid prompt.

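import { PostHog } from 'posthog-node'
import { createPromptExperiment, flagResolver } from '@agentskit/core/prompt-experiments'

// Server-side PostHog client; POSTHOG_API_KEY is a placeholder env var name
const posthog = new PostHog(process.env.POSTHOG_API_KEY!)

const exp = createPromptExperiment({
  name: 'support-tone',
  variants: [
    { id: 'control', prompt: 'Be concise and formal.' },
    { id: 'playful', prompt: 'Be warm and playful.' },
  ],
  resolve: flagResolver(async (name, ctx) => {
    // getFeatureFlag resolves to the variant key for multivariate flags;
    // anything other than 'control' or 'playful' triggers the sticky fallback
    return (await posthog.getFeatureFlag(name, ctx.subjectId ?? 'anon')) as string
  }, 'support-tone'),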
  onExposure: d => {
    posthog.capture({
      distinctId: d.subjectId ?? 'anon',
      event: '$feature_flag_called',
      properties: { $feature_flag: d.name, $feature_flag_response: d.variantId, fallback: d.fallback },
    })
  },
})
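The fallback rule can be sketched without any SDK: ask the provider, keep its answer only if it names a known variant, and otherwise derive the variant from a sticky hash and mark the decision as a fallback. FlagLookup, pickWithFallback, and stickyIndex are hypothetical names, not the library API. This page does not say whether flagResolver catches a lookup that throws; this sketch does, treating it as a miss, and it ignores weights for brevity.

```typescript
// Hypothetical shapes for illustration; the real @agentskit/core types may differ.
interface Variant { id: string; prompt: string }
interface Decision { name: string; variantId: string; prompt: string; fallback: boolean }
// Stand-in for a flag SDK call such as a PostHog or GrowthBook lookup.
type FlagLookup = (name: string, subjectId: string) => Promise<string | undefined>

// Unweighted sticky bucket: FNV-1a hash of the key, modulo the variant count.
function stickyIndex(key: string, n: number): number {
  let h = 0x811c9dc5
  for (let i = 0; i < key.length; i++) h = Math.imul(h ^ key.charCodeAt(i), 0x01000193)
  return (h >>> 0) % n
}

async function pickWithFallback(
  name: string,
  variants: Variant[],
  subjectId: string,
  lookup: FlagLookup,
): Promise<Decision> {
  let flagValue: string | undefined
  try {
    flagValue = await lookup(name, subjectId)
  } catch {
    flagValue = undefined // network or SDK error: handled like an unknown variant
  }
  const hit = variants.find(v => v.id === flagValue)
  if (hit) return { name, variantId: hit.id, prompt: hit.prompt, fallback: false }
  // Paused rollout, misconfigured flag, or error: fall back to the sticky hash.
  const sticky = variants[stickyIndex(`${name}:${subjectId}`, variants.length)]
  return { name, variantId: sticky.id, prompt: sticky.prompt, fallback: true }
}

const variants: Variant[] = [
  { id: 'control', prompt: 'Be concise and formal.' },
  { id: 'playful', prompt: 'Be warm and playful.' },
]
// Simulates a paused rollout: the provider answers with a key no variant matches.
const pausedFlag: FlagLookup = async () => 'off'
pickWithFallback('support-tone', variants, 'user-42', pausedFlag).then(d => console.log(d.fallback)) // true
```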

Decision shape

{
  name: 'support-tone',
  variantId: 'playful',
  prompt: 'Be warm and playful.',
  fallback: false,   // true when the resolver's answer matched no variant and the sticky fallback was used
}

Every pick() call fires onExposure, so your analytics pipeline can attribute downstream events (conversions, satisfaction scores, regenerations) to the variant that produced them.
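Once exposures and downstream events land in the same store, comparing variants reduces to a per-variant rate. A minimal sketch, assuming hypothetical Exposure and Conversion rows keyed by subjectId: each conversion is credited to the variant its subject was exposed to, and each subject counts at most once.

```typescript
// Hypothetical event shapes; in practice these rows come from your analytics store.
interface Exposure { subjectId: string; variantId: string }
interface Conversion { subjectId: string }

// Returns converted subjects / exposed subjects for each variant.
function conversionRates(exposures: Exposure[], conversions: Conversion[]): Map<string, number> {
  // With sticky assignment a subject sees one variant, so the last write wins harmlessly.
  const variantOf = new Map<string, string>()
  for (const e of exposures) variantOf.set(e.subjectId, e.variantId)

  const exposed = new Map<string, number>()
  for (const v of variantOf.values()) exposed.set(v, (exposed.get(v) ?? 0) + 1)

  const converted = new Map<string, Set<string>>()
  for (const c of conversions) {
    const v = variantOf.get(c.subjectId)
    if (v === undefined) continue // never exposed: cannot attribute
    if (!converted.has(v)) converted.set(v, new Set())
    converted.get(v)!.add(c.subjectId) // a Set counts repeat conversions once
  }

  const rates = new Map<string, number>()
  for (const [v, n] of exposed) rates.set(v, (converted.get(v)?.size ?? 0) / n)
  return rates
}

const rates = conversionRates(
  [
    { subjectId: 'a', variantId: 'control' },
    { subjectId: 'b', variantId: 'control' },
    { subjectId: 'c', variantId: 'playful' },
    { subjectId: 'd', variantId: 'playful' },
  ],
  [{ subjectId: 'a' }, { subjectId: 'c' }, { subjectId: 'd' }],
)
console.log(rates.get('control'), rates.get('playful')) // 0.5 1
```

A raw rate is only a starting point; before declaring a winner, check that the difference is statistically significant for your sample size.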

Structured (typed) variants

The prompt field is generic, so a variant can carry a whole configuration object (system message, sampling temperature) instead of a single string:

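import { createPromptExperiment, flagResolver } from '@agentskit/core/prompt-experiments'

// getVariant is your own (name, ctx) => variantId lookup, such as the PostHog function above
const exp = createPromptExperiment<{ system: string; temperature: number }>({
  name: 'agent-config',
  variants: [
    { id: 'cold', prompt: { system: 'You are precise.', temperature: 0 } },
    { id: 'warm', prompt: { system: 'You are warm.', temperature: 0.7 } },
  ],
  resolve: flagResolver(getVariant, 'agent-config'),
})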
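Because the prompt type is a generic parameter, the picked value keeps its shape all the way to the model call. The sketch below uses a hypothetical Variant<T> type, a variantById helper (neither is the library API), and a generic chat-style request object, to show that the compiler checks field access on the chosen configuration.

```typescript
// Hypothetical generic type for illustration, mirroring the variant shape above.
interface Variant<T> { id: string; prompt: T }

interface AgentConfig { system: string; temperature: number }

// T is inferred from the variants, so the caller gets a typed payload back.
function variantById<T>(variants: Variant<T>[], id: string): T | undefined {
  return variants.find(v => v.id === id)?.prompt
}

const configs: Variant<AgentConfig>[] = [
  { id: 'cold', prompt: { system: 'You are precise.', temperature: 0 } },
  { id: 'warm', prompt: { system: 'You are warm.', temperature: 0.7 } },
]

const cfg = variantById(configs, 'warm')
if (cfg) {
  // Both fields are typed: a typo such as cfg.temprature fails to compile.
  const request = {
    messages: [{ role: 'system', content: cfg.system }],
    temperature: cfg.temperature,
  }
  console.log(request.temperature) // 0.7
}
```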
