agentskit.js
Providers

webllm (browser-only / WebGPU)

100% on-device inference via WebLLM (MLC). No API key, no token cost, no inference network call.

import { webllm } from '@agentskit/adapters'

const adapter = webllm({
  model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
  onProgress: (info) => console.log(info.progress, info.text),
})

@mlc-ai/web-llm is an optional peer dependency; install it alongside this package when you opt into browser-only inference.

npm install @mlc-ai/web-llm

Options

| Option | Type | Default |
| --- | --- | --- |
| model | string | required (MLC catalog id, e.g. Llama-3.1-8B-Instruct-q4f16_1-MLC) |
| engine | WebLlmEngineLike | optional; inject a pre-loaded engine to skip the cold-start cost |
| onProgress | ({ progress, text }) => void | optional; fires during model download / WebGPU compile |

Capabilities

{ streaming: true, tools: false }: WebLLM streams tokens via OpenAI-compatible chunks; tool calls are not exposed by the engine.
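Because the chunks follow the OpenAI streaming shape ({ choices: [{ delta: { content } }] }), a consumer can be written once and reused across adapters. A minimal sketch; collectStream is a hypothetical helper, and the mock generator stands in for the engine's actual stream:

```javascript
// Accumulates streamed text from any async iterable of
// OpenAI-compatible chunks, invoking onToken per token.
async function collectStream(chunks, onToken) {
  let text = ''
  for await (const chunk of chunks) {
    const token = chunk.choices?.[0]?.delta?.content ?? ''
    if (token) {
      text += token
      onToken?.(token)
    }
  }
  return text
}

// Mock chunks standing in for the engine's streaming response.
async function* mockChunks() {
  for (const t of ['Hel', 'lo', '!']) {
    yield { choices: [{ delta: { content: t } }] }
  }
}

collectStream(mockChunks()).then((full) => {
  console.log('assembled:', full) // → assembled: Hello!
})
```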

Why

  • 100% on-device. No data leaves the browser. Inference uses the user's GPU via WebGPU.
  • No API key. No token cost. No rate limits beyond the user's hardware.
  • Offline-friendly. First model fetch is online; subsequent runs work without network.

Caveats

  • WebGPU required. Chromium 113+ / Edge 113+ / recent Safari Tech Preview. No Firefox stable yet.
  • First load is heavy. A quantized 8B model is ~4 GB compressed; expect a 30–90 s initial download. The model is cached persistently, so subsequent loads are much faster.
  • No tool calls. If the agent needs tools, run a hosted adapter alongside or wait for tool-use support upstream.
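The WebGPU requirement can be feature-detected up front, so the app can fall back to a hosted adapter instead of failing at engine load. A minimal sketch; navigator.gpu is the standard WebGPU entry point, not part of this package, and hasWebGpu is a hypothetical helper:

```javascript
// Returns true only when the environment exposes WebGPU and a GPU
// adapter can actually be acquired (some browsers expose navigator.gpu
// but return null from requestAdapter on unsupported hardware).
async function hasWebGpu() {
  if (typeof navigator === 'undefined' || !('gpu' in navigator)) return false
  try {
    return (await navigator.gpu.requestAdapter()) !== null
  } catch {
    return false
  }
}

// Pick the on-device adapter only when the environment supports it.
hasWebGpu().then((ok) => {
  console.log(ok ? 'webllm available' : 'falling back to a hosted adapter')
})
```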

Pre-loading the engine

The cold-start (download + WebGPU compile) is the slowest part. Warm it once at app boot:

import { CreateMLCEngine } from '@mlc-ai/web-llm'
import { webllm } from '@agentskit/adapters'

const engine = await CreateMLCEngine('Llama-3.1-8B-Instruct-q4f16_1-MLC', {
  initProgressCallback: (info) => updateUi(info.progress, info.text),
})

const adapter = webllm({ model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC', engine })
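When several components need the adapter, keep one shared load promise so concurrent callers never trigger duplicate downloads or compiles. A sketch under assumptions: getEngine is a hypothetical helper, and the counting loader stands in for CreateMLCEngine:

```javascript
// Memoize the engine-load promise: the first caller starts the load,
// every later caller awaits the same in-flight promise.
let enginePromise = null
function getEngine(loadEngine) {
  if (!enginePromise) enginePromise = loadEngine()
  return enginePromise
}

// Demo with a counting stand-in for CreateMLCEngine(...).
let loads = 0
const fakeLoad = async () => {
  loads += 1
  return { id: 'engine' }
}

Promise.all([getEngine(fakeLoad), getEngine(fakeLoad)]).then(([a, b]) => {
  console.log('same engine:', a === b, 'loads:', loads) // → same engine: true loads: 1
})
```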
