Providers
# llamacpp

Run GGUF models on CPU or GPU with minimal overhead via the llama.cpp server.
```typescript
import { llamacpp } from '@agentskit/adapters'

const adapter = llamacpp({
  url: 'http://localhost:8080',
})
```

## Options
| Option | Type | Default |
|---|---|---|
| `url` | `string` | `http://localhost:8080` |
| `fetch` | `typeof fetch` | global `fetch` |
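The `fetch` option lets you substitute your own `fetch`-compatible function, for example to enforce a request deadline when the local server is busy loading a large model. A minimal sketch of such a wrapper; `withTimeout` is a hypothetical helper, not part of `@agentskit/adapters`:

```typescript
// Wrap a fetch implementation so every request aborts after `ms` milliseconds.
// Hypothetical helper: not provided by @agentskit/adapters.
const withTimeout = (baseFetch: typeof fetch, ms: number): typeof fetch =>
  (input, init) =>
    // AbortSignal.timeout (Node 17.3+/modern browsers) aborts the request
    // once the deadline passes.
    baseFetch(input, { ...init, signal: AbortSignal.timeout(ms) })
```

The wrapped function could then be passed as the adapter's `fetch` option, e.g. `llamacpp({ url: 'http://localhost:8080', fetch: withTimeout(fetch, 30_000) })`.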
## Why llamacpp

- Runs everywhere, including Raspberry Pi and other embedded devices.
- Supports GGUF quantizations from 4-bit to 16-bit.
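The adapter assumes a llama.cpp server is already listening at `url`. One typical way to start it, using llama.cpp's bundled `llama-server` binary (the model path is illustrative, not a file this document ships):

```shell
# Start llama.cpp's HTTP server on the port the adapter defaults to.
# Replace the model path with your own GGUF file.
llama-server -m ./models/model.gguf --port 8080

# With a GPU build, -ngl offloads model layers to the GPU:
#   llama-server -m ./models/model.gguf --port 8080 -ngl 99
```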