
# replicate

Replicate — hosted open models behind one API. Two-step prediction + SSE stream.

```ts
import { replicate } from '@agentskit/adapters'

const adapter = replicate({
  apiKey: process.env.REPLICATE_API_TOKEN!,
  model: 'meta/meta-llama-3-70b-instruct',
})
```

## Options

| Option | Type | Default |
| --- | --- | --- |
| `apiKey` | `string` | **required** |
| `model` | `string` | **required** (e.g. `meta/meta-llama-3-70b-instruct`) |
| `version` | `string` | optional — pin a specific version hash |
| `baseUrl` | `string` | `https://api.replicate.com` |
| `toInput` | `(request) => Record<string, unknown>` | `{ prompt }` (joined `[ROLE] content`) |
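The default `toInput` flattens the chat history into a single prompt string. A minimal sketch of that joining behavior, assuming a `{ role, content }` message shape (the exact request type and formatting are assumptions based on the `[ROLE] content` default noted above):

```ts
// Hypothetical message shape; agentskit's actual request type may differ.
type Message = { role: string; content: string }

// Sketch of the default toInput: render each message as "[ROLE] content"
// on its own line and return the joined result as { prompt }.
function defaultToInput(request: { messages: Message[] }): Record<string, unknown> {
  const prompt = request.messages
    .map((m) => `[${m.role.toUpperCase()}] ${m.content}`)
    .join('\n')
  return { prompt }
}
```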

## Capabilities

`{ streaming: true, tools: false }` — Replicate's prediction surface doesn't expose a uniform tool-calling shape across models, so the adapter ships text-stream-only.
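Because `tools` is `false`, callers should gate tool-calling features on the declared capabilities rather than letting a tools request fail at the provider. A small sketch, assuming a capabilities object with the `{ streaming, tools }` shape shown above:

```ts
// Capability shape mirroring the { streaming, tools } object above.
type Capabilities = { streaming: boolean; tools: boolean }

// Decide up front whether a run can use tools or must fall back to plain text.
function planRun(caps: Capabilities, wantsTools: boolean): 'tools' | 'text-only' {
  return wantsTools && caps.tools ? 'tools' : 'text-only'
}
```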

## How streaming works

Replicate uses a two-step prediction protocol:

  1. POST to `/v1/models/{owner}/{name}/predictions` (or `/v1/predictions` with `version`) — returns a prediction whose `urls.stream` field points at an SSE endpoint.
  2. GET that SSE stream; events of type `output` carry text deltas, `done` ends the run, and `error` surfaces failure.

This implies one extra round-trip before the first token. For latency-sensitive workloads, prefer `groq` or `cerebras`.
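The two-step flow can be sketched with plain `fetch` and a hand-rolled SSE parser. This is a sketch under assumptions, not the adapter's implementation: `Bearer` auth, the `stream: true` create flag, and the exact response fields should be checked against Replicate's own docs. The event names (`output`, `done`, `error`) and endpoints are the ones listed above.

```ts
// Minimal SSE event parser: turns one raw "event:/data:" block into { event, data }.
function parseSSEBlock(block: string): { event: string; data: string } {
  let event = 'message'
  const data: string[] = []
  for (const line of block.split('\n')) {
    if (line.startsWith('event:')) event = line.slice(6).trim()
    else if (line.startsWith('data:')) data.push(line.slice(5).trimStart())
  }
  return { event, data: data.join('\n') }
}

// Sketch of the two-step protocol (untested network code; error handling trimmed).
async function* streamPrediction(apiKey: string, model: string, input: object) {
  // Step 1: create the prediction; the response carries urls.stream.
  const res = await fetch(`https://api.replicate.com/v1/models/${model}/predictions`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ input, stream: true }),
  })
  const prediction = await res.json()

  // Step 2: GET the SSE endpoint and yield text deltas from "output" events.
  const sse = await fetch(prediction.urls.stream, {
    headers: { Accept: 'text/event-stream' },
  })
  const reader = sse.body!.getReader()
  const decoder = new TextDecoder()
  let buf = ''
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    buf += decoder.decode(value, { stream: true })
    let idx: number
    while ((idx = buf.indexOf('\n\n')) !== -1) {
      const { event, data } = parseSSEBlock(buf.slice(0, idx))
      buf = buf.slice(idx + 2)
      if (event === 'output') yield data
      else if (event === 'done') return
      else if (event === 'error') throw new Error(data)
    }
  }
}
```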

## Custom input shape

Some models take chat-style inputs; others expect a flat `{ prompt, max_tokens, temperature }` object. Override `toInput` to fit:

```ts
replicate({
  apiKey: process.env.REPLICATE_API_TOKEN!,
  model: 'meta/meta-llama-3-70b-instruct',
  toInput: (request) => ({
    prompt: request.messages.map(m => m.content).join('\n'),
    max_tokens: 512,
    temperature: 0.7,
  }),
})
```
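For models that accept a chat-style `messages` array instead of a flat prompt, a `toInput` can pass the history through. This is a hedged sketch: the field names (`messages`, `max_tokens`) vary per model, so check the target model's input schema on Replicate.

```ts
// Hypothetical message shape; agentskit's actual request type may differ.
type ChatMessage = { role: string; content: string }

// Chat-style toInput sketch: forward role/content pairs instead of joining
// them into one prompt string. Field names vary per model.
const chatToInput = (request: { messages: ChatMessage[] }) => ({
  messages: request.messages.map(({ role, content }) => ({ role, content })),
  max_tokens: 512,
})
```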

## Env

| Var | Purpose |
| --- | --- |
| `REPLICATE_API_TOKEN` | API token |
