Rate limiting

LLMs can loop. Tools can fan out. Add a per-user token bucket to every request so a single conversation can't blow the budget.

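import { runtime } from '@agentskit/runtime'
import { tokenBucket } from '@agentskit/core'

// One bucket per user for incoming requests, one per user and tool for tool calls.
const userBucket = tokenBucket({ capacity: 50_000, refillPerMinute: 10_000 })
const toolBucket = tokenBucket({ capacity: 200, refillPerMinute: 100 })

export async function POST(req: Request) {
  // userId is read from the request body here for brevity; in production,
  // take it from the authenticated session so callers can't spoof it.
  const { userId, message } = await req.json()
  // Each request costs one token from the user's bucket.
  if (!userBucket.take(userId, 1)) {
    return new Response('rate limited', { status: 429 })
  }
  return runtime.run(message, {
    // Runs before every tool call; throwing aborts that call.
    onToolCall: ({ name }) => {
      if (!toolBucket.take(`${userId}:${name}`, 1)) {
        throw new Error(`tool ${name} rate limited`)
      }
    },
  })
}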
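
For readers who want to see what a `take(key, cost)` call does under the hood, here is a minimal in-memory token bucket sketch. It is not the `@agentskit/core` implementation: `createTokenBucket`, the `now` clock option, and the lazy-refill strategy are all assumptions made for illustration, and only the `capacity` / `refillPerMinute` option names are borrowed from the example above.

```typescript
// Hypothetical sketch of a keyed token bucket; not the @agentskit/core implementation.
interface BucketOptions {
  capacity: number
  refillPerMinute: number
  now?: () => number // injectable millisecond clock, useful in tests (assumption)
}

interface BucketState {
  tokens: number
  updatedAt: number
}

function createTokenBucket({ capacity, refillPerMinute, now = Date.now }: BucketOptions) {
  const buckets = new Map<string, BucketState>()
  const refillPerMs = refillPerMinute / 60_000

  return {
    // Returns true and deducts `cost` if the key has enough tokens, false otherwise.
    take(key: string, cost = 1): boolean {
      const t = now()
      // New keys start with a full bucket.
      const state = buckets.get(key) ?? { tokens: capacity, updatedAt: t }
      // Refill lazily from elapsed time instead of running a timer, capped at capacity.
      const elapsed = t - state.updatedAt
      state.tokens = Math.min(capacity, state.tokens + elapsed * refillPerMs)
      state.updatedAt = t
      const allowed = state.tokens >= cost
      if (allowed) state.tokens -= cost
      buckets.set(key, state)
      return allowed
    },
  }
}
```

Refilling lazily on each `take` keeps the sketch free of timers: a key that is never touched again costs nothing beyond its map entry, although a long-running process would still want to evict idle keys.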

⚡ Performance

The bucket lives in memory by default, so each process keeps its own counts and they reset on restart. For horizontal scale, back it with Redis using INCR + TTL; the interface is identical.
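
As a rough illustration of the INCR + TTL idea, the sketch below implements a fixed-window counter, which approximates a token bucket rather than reproducing its gradual refill. `CounterStore` and `takeFromWindow` are hypothetical names, not part of AgentsKit; the two store methods are modeled on the Redis `INCR` and `EXPIRE` commands, and wiring in a real client is left to the reader.

```typescript
// Hypothetical minimal store interface, modeled on the Redis INCR and EXPIRE commands.
interface CounterStore {
  incr(key: string): Promise<number>
  expire(key: string, seconds: number): Promise<unknown>
}

// Fixed-window limiter: allows up to `limit` calls per `windowSeconds` for each key.
async function takeFromWindow(
  store: CounterStore,
  key: string,
  limit: number,
  windowSeconds = 60,
): Promise<boolean> {
  const redisKey = `ratelimit:${key}`
  // INCR creates the key at 1 if it does not exist and returns the new count.
  const count = await store.incr(redisKey)
  if (count === 1) {
    // First hit in this window: set the TTL so the counter expires on its own.
    await store.expire(redisKey, windowSeconds)
  }
  return count <= limit
}
```

Note that the two commands are issued separately here, so a crash between them would leave a counter without a TTL. Running both inside a Lua script or a MULTI transaction closes that gap.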

Tip

Set onTokenBudgetExceeded on the runtime too. It stops the ReAct loop from burning tokens on a flawed plan.
