Build an Ask-the-docs chat
The exact grounded RAG chat in this site's corner — index your docs, ground a free model, stream cited answers. Copy it, or scaffold it with one command.
This is the recipe for the chat you're using right now: a grounded RAG assistant over your own docs that streams concise, cited answers — running at $0 on the OpenRouter free pool, built entirely on @agentskit/*.
Two ways in: scaffold it with the CLI, or wire it by hand.
#Fastest path — the CLI
npx agentskit add docs-chatdocs-chat is a UI component, not just an agent file — agentskit add now installs it via the component registry flow defined in RFC-0006.
#What the command actually does
Scan — if .agentskit/components.json is absent, the CLI scans your project: UI binding (react, svelte, vue, …), meta-framework (next-app, next-pages, sveltekit, nuxt, remix, tanstack-start, vite, and more), package manager, TypeScript vs JavaScript, monorepo root. Ambiguous signals surface as validation errors — never a silent guess. With components.json committed (written once by agentskit init), the scan is skipped entirely and every add runs non-interactively.
Validate — before anything is written the CLI checks compatibility and refuses with a concrete error plus a pointer to a supported alternative if the detected environment can't satisfy the component's runtimeRequirement or embeddingBackend (for example, onnx-node is blocked on edge or Expo targets). Peer-range conflicts across all resolved dependencies are surfaced in aggregate, not on first hit. Pass --dry-run to see the full plan — files, targets, deps, env, conflicts — without writing anything.
Place — files land in framework-correct locations. The server handler is placed per serverTargetByMeta[metaFramework]: a Next.js App Router route handler (app/api/ask/route.ts), a SvelteKit +server.ts, a Nuxt server/api/*.post.ts, a Remix resource route, and so on. The client component composes the matching @agentskit/* binding. You own every file — edit guardrails, styling tokens, and adapter choices freely.
Record — the installer appends a tamper-evident entry to .agentskit/install-log.jsonl and updates the installed marker in components.json with per-file SHA-256 and the pinned registry ref, enabling agentskit diff docs-chat and agentskit update docs-chat later.
#Safety guarantees
- Per-file SHA-256 verification — every fetched file is verified against the signed manifest before any write. A mismatch aborts the entire install.
- Path-containment guard —
path.resolve(dest)is asserted to stay inside the target directory on every file, both on the write path and ondiffreads. A../../.env-style path escape is anIntegrityError. - Transactional (all-or-nothing) — files are staged in a sibling temp directory, verified, then moved atomically. Any pre-commit failure rolls back all partial writes and reports "rolled back N files." Your tree is never left dirty.
- Append-only audit log —
.agentskit/install-log.jsonlchains entries viaprevEntryHash(SHA-256 of the prior entry); a futureagentskit auditcommand walks the chain and fails on any gap or mismatch.
#Framework support
The first shipping port is React × Next.js (app router). Other frameworks (sveltekit, nuxt, remix, tanstack-start, angular, expo, ink) are rolling out port-by-port, each gated by the binding stability requirements in RFC-0004. The CLI will refuse to install a port that has not shipped — it will not copy broken source into an unsupported framework.
#Zero-prompts via agentskit init
npx agentskit init # writes .agentskit/components.json — commit this file
npx agentskit add docs-chatAfter init, every subsequent add reads the committed config and runs non-interactively. In CI pass --yes to exit non-zero on any blocker instead of hanging.
#After install
The ready output prints a per-framework usage snippet wiring createAskHandler to your retriever and adapter, the required env vars (written to .env.example), and a "run the indexer before first use" step. The installer copies an indexer (agentskit ask index ./docs) and an empty index stub — it never ships AgentsKit's own corpus.
Point the handler at your retriever and adapter as shown in the sections below, then run the indexer. A full runnable version lives in apps/example-rag-chat — swap the sample docs for yours.
#1. Index your docs (RAG)
Chunk + embed your docs once, into any @agentskit/memory vector store. Embedding stays free + local with an ONNX model:
import { createRAG } from '@agentskit/rag'
import { fileVectorMemory } from '@agentskit/memory'
import { pipeline } from '@huggingface/transformers'
// Local ONNX embedder — $0, no API key.
let extractor: Awaited<ReturnType<typeof pipeline>> | null = null
const embed = async (text: string): Promise<number[]> => {
extractor ??= await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5')
const out = await extractor(text, { pooling: 'mean', normalize: true })
return Array.from(out.data as Float32Array)
}
const rag = createRAG({ embed, store: fileVectorMemory({ path: './docs-index' }), topK: 6 })
await rag.ingest(docs) // docs: { id, content, metadata: { path, title } }[]For a committed, read-only index (great for serverless), generate it at build time and ship the JSON — see the docs-site's scripts/gen-ask-index.mjs.
#2. Stream grounded, cited answers
createRAG returns a Retriever — drop it straight into useChat (or createRuntime). Two design choices make it reliable on free models:
- Co-locate the context with the question (free models attend to recent tokens far better than a long system prompt).
- Emit citations from what you retrieved — don't depend on a weak model to call a
citetool.
import { useChat } from '@agentskit/react'
import { openrouter, createFallbackAdapter } from '@agentskit/adapters'
const FREE = ['meta-llama/llama-3.3-70b-instruct:free', 'qwen/qwen3-next-80b-a3b-instruct:free']
const adapter = createFallbackAdapter(
FREE.map((model) => ({ id: model, adapter: openrouter({ apiKey, model }) })),
)
const chat = useChat({
adapter,
retriever: rag,
systemPrompt: 'Answer ONLY from the provided docs. Be concise. Decline + name the nearest page when uncovered. Never mention other frameworks.',
})createFallbackAdapter cascades across the free pool when one is rate-limited (429). Keep answers short and AgentsKit-specific with a tight system prompt.
#3. The widget (optional, fancy)
The floating chat is a headless, slotted component — open by default, branded logo, animated loading, and a "build this" link, all overridable:
<AskDocsWidget
logo={<YourMark />} // header logo slot
loadingState={<YourSpinner />} // loading slot
title="Ask our docs"
docsHref="/docs/cookbook/ask-the-docs"
/>Generative UI (option buttons, forms, runnable code) activates with ASK_RICH_UI=1 and a capable model — free models stay on reliable markdown text.
#Guardrails (built in)
Before any model call, a cheap triage runs — it saves your free-tier quota on trivia and blocks the obvious attacks:
- Greetings ("hi", "oi") and noise ("test", empty, gibberish) → an instant canned reply, no LLM.
- Prompt injection ("ignore previous instructions", "reveal your system prompt", "you are now…", "jailbreak") → a firm decline that never changes role or leaks the prompt — no LLM.
- Real questions (even one word like "memory") fall through.
It's additive and extensible — keep the defenses, add your own:
import { triageMessage } from './lib/ask/guard'
const triage = triageMessage(userText, {
greetings: ['salut', 'ciao'],
noise: ['blah'],
injectionPatterns: [/give me the raw config/i],
replies: { greeting: 'Hey! Ask me about our product docs.' },
})
if (triage.kind === 'canned') return stream(triage.reply) // skip the modelThis sits on top of the other layers: client system messages are stripped, the
retrieved context is fenced as untrusted data, and the grounded prompt keeps
answers on-topic — defense-in-depth.
#Going further
- Durable rate limit — front the route with Upstash (in-memory fallback for dev).
- Eval it — measure retrieval
recall@k/MRRdeterministically + an LLM judge; gate in CI. - Run code in-browser —
@agentskit/sandbox/webwebWorkerBackend(zero-vendor) powers runnable snippets.
That's the whole thing: index → ground → stream → cite. Grounding quality scales with model quality — start free, bring your own key when you want the rich UI.
Explore nearby
- PeerCookbook
Copy-paste recipes for the things every agent app needs. Each recipe stands on its own.
- PeerStreaming chat
useChat + abort + back-pressure. The minimum viable streaming chat, production-ready.
- PeerTools + memory together
The "chat with state and actions" loop — persistent memory plus tool execution.