# replicate
Replicate — hosted open models behind one API. Two-step prediction + SSE stream.
```ts
import { replicate } from '@agentskit/adapters'

const adapter = replicate({
  apiKey: process.env.REPLICATE_API_TOKEN!,
  model: 'meta/meta-llama-3-70b-instruct',
})
```

# Options
| Option | Type | Default |
|---|---|---|
| `apiKey` | `string` | required |
| `model` | `string` | required (e.g. `meta/meta-llama-3-70b-instruct`) |
| `version` | `string` | optional — pin a specific version hash |
| `baseUrl` | `string` | `https://api.replicate.com` |
| `toInput` | `(request) => Record<string, unknown>` | `{ prompt }` (joined `[ROLE] content`) |
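The default `toInput` builds a single prompt string from the request's messages, one `[ROLE] content` line per message. A minimal sketch, assuming a `{ role, content }` message shape (the adapter's internal implementation may differ):

```typescript
type Message = { role: string; content: string }

// Sketch of the documented default: join messages into one prompt string,
// prefixing each message with its uppercased role in brackets.
function defaultToInput(request: { messages: Message[] }): Record<string, unknown> {
  const prompt = request.messages
    .map((m) => `[${m.role.toUpperCase()}] ${m.content}`)
    .join('\n')
  return { prompt }
}
```

Override `toInput` whenever a model expects a different input shape (see below).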
# Capabilities

`{ streaming: true, tools: false }` — Replicate's prediction surface doesn't expose a uniform tool-calling shape across models, so the adapter ships text-stream-only.
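Callers that depend on tool-calling should branch on these flags rather than assume support. A sketch — the flag shape mirrors the capabilities object above, but the guard helper is hypothetical, not part of the adapter:

```typescript
// Capability flags as documented for this adapter.
const capabilities = { streaming: true, tools: false }

// Hypothetical guard: refuse a tool-augmented run on a text-only adapter.
function assertToolSupport(caps: { tools: boolean }): void {
  if (!caps.tools) {
    throw new Error('adapter has no uniform tool-calling; route tool runs to another provider')
  }
}
```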
# How streaming works
Replicate uses a two-step prediction protocol:
- POST to `/v1/models/{owner}/{name}/predictions` (or `/v1/predictions` with `version`) — returns a prediction with a `urls.stream` SSE endpoint.
- GET that SSE stream; events of type `output` carry text deltas, `done` ends the run, `error` surfaces failure.
This implies one extra round-trip before the first token. For latency-sensitive workloads, prefer groq or cerebras.
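The second step consumes a standard `text/event-stream` response. A minimal frame parser sketch — not the adapter's internals, it only illustrates the `event`/`data` wire shape described above:

```typescript
type SseEvent = { event: string; data: string }

// Parse one SSE frame (the lines between blank-line separators).
// Replicate's stream emits `event: output|done|error` with `data:` payload lines.
function parseSseFrame(frame: string): SseEvent {
  let event = 'message' // SSE default when no event: field is present
  const data: string[] = []
  for (const line of frame.split('\n')) {
    if (line.startsWith('event:')) event = line.slice('event:'.length).trim()
    else if (line.startsWith('data:')) data.push(line.slice('data:'.length).replace(/^ /, ''))
  }
  return { event, data: data.join('\n') }
}
```

Multi-line `data:` fields are rejoined with newlines, per the SSE spec; `done` and `error` frames parse the same way.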
# Custom input shape

Some models take chat-style inputs, others `{ prompt, max_tokens, temperature }`. Override `toInput` to fit:
```ts
replicate({
  apiKey: process.env.REPLICATE_API_TOKEN!,
  model: 'meta/meta-llama-3-70b-instruct',
  toInput: (request) => ({
    prompt: request.messages.map(m => m.content).join('\n'),
    max_tokens: 512,
    temperature: 0.7,
  }),
})
```

# Env
| Var | Purpose |
|---|---|
| `REPLICATE_API_TOKEN` | API token |