
vllm

vLLM — high-throughput self-hosted inference with an OpenAI-compatible API, aimed at production workloads on your own GPUs.

```typescript
import { vllm } from '@agentskit/adapters'

const adapter = vllm({
  model: 'meta-llama/Llama-3.3-70B-Instruct',
  url: 'http://localhost:8000/v1',
})
```

Options

| Option  | Type           | Default                    |
| ------- | -------------- | -------------------------- |
| `model` | `string`       | required                   |
| `url`   | `string`       | `http://localhost:8000/v1` |
| `fetch` | `typeof fetch` | global `fetch`             |

Why vLLM

  • PagedAttention + continuous batching → best-in-class throughput.
  • Serves the OpenAI-compatible API, so existing clients work unchanged, and it scales across GPUs and nodes.