Recipes
Document loaders
One-line fetchers for URL, GitHub, Notion, Confluence, Google Drive, and PDF into your RAG pipeline.
Every RAG pipeline starts with "turn an external document into an InputDocument". @agentskit/rag now ships seven loaders covering the common sources; each accepts a custom fetch for tests and returns InputDocument[] ready to pipe into RAG.ingest.
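The custom-fetch hook is what makes the loaders testable without network access. A minimal sketch of the pattern — `loadUrlSketch`, `FetchLike`, and the `InputDocument` shape here are illustrative assumptions, not the package's actual types:

```typescript
// Assumed shape of the documents the loaders return (illustrative only).
type InputDocument = { id: string; text: string; metadata?: Record<string, unknown> }

// Minimal subset of the Fetch API the loaders need.
type FetchLike = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>

// Hypothetical loadUrl-style loader: falls back to the global fetch,
// but lets tests inject a stub via opts.fetch.
async function loadUrlSketch(
  url: string,
  opts: { fetch?: FetchLike } = {},
): Promise<InputDocument[]> {
  const doFetch = opts.fetch ?? (globalThis.fetch as unknown as FetchLike)
  const res = await doFetch(url)
  if (!res.ok) throw new Error(`fetch failed: ${url}`)
  return [{ id: url, text: await res.text(), metadata: { source: url } }]
}

// In tests, no network: hand the loader a canned response.
const stubFetch: FetchLike = async () => ({ ok: true, text: async () => 'hello' })
```

With this shape, `await loadUrlSketch('https://example.com', { fetch: stubFetch })` yields a one-element array whose `text` is the stubbed body.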
Install
```
npm install @agentskit/rag
```
| Loader | Source |
|---|---|
| `loadUrl(url)` | Any HTTP URL (raw text / HTML) |
| `loadGitHubFile(owner, repo, path, { ref?, token? })` | Single file via raw.githubusercontent.com |
| `loadGitHubTree(owner, repo, { filter?, maxFiles? })` | Recursive repo tree, filtered |
| `loadNotionPage(pageId, { token })` | Flattens paragraphs + headings |
| `loadConfluencePage(pageId, { baseUrl, token })` | Atlassian storage body |
| `loadGoogleDriveFile(fileId, { accessToken })` | Drive export as text/plain |
| `loadPdf(url, { parsePdf })` | BYO PDF parser (pdf-parse, pdfjs, etc.) |
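For the GitHub loaders, the table notes that single files come via raw.githubusercontent.com. The URL construction is easy to verify yourself; the helper below is a hypothetical sketch of how such a URL is assembled, not the package's internal code:

```typescript
// Build the raw-content URL for a file in a GitHub repo.
// 'main' as the default ref is an assumption; loadGitHubFile's
// ref option would override it.
function rawGitHubUrl(owner: string, repo: string, path: string, ref = 'main'): string {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${path}`
}
```

For example, `rawGitHubUrl('my-org', 'my-repo', 'docs/intro.md')` produces `https://raw.githubusercontent.com/my-org/my-repo/main/docs/intro.md`.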
Example — RAG over a GitHub repo
```ts
import { createRAG, loadGitHubTree } from '@agentskit/rag'
import { fileVectorMemory } from '@agentskit/memory'
import { openaiEmbedder } from '@agentskit/adapters'

const docs = await loadGitHubTree('my-org', 'my-repo', {
  token: process.env.GITHUB_TOKEN!,
  filter: path => path.endsWith('.md') || path.endsWith('.ts'),
  maxFiles: 500,
})

const rag = createRAG({
  embed: openaiEmbedder({ apiKey: process.env.OPENAI_API_KEY! }),
  store: fileVectorMemory({ path: './kb.json' }),
})

await rag.ingest(docs)
```

Example — PDF via any parser
```ts
import { loadPdf } from '@agentskit/rag'
import pdfParse from 'pdf-parse'

const docs = await loadPdf('https://example.com/report.pdf', {
  parsePdf: async bytes => {
    const result = await pdfParse(Buffer.from(bytes))
    return { text: result.text, pages: result.numpages }
  },
})
```
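The point of the `parsePdf` option is that the loader stays parser-agnostic: you hand it any function from bytes to `{ text, pages }`. That contract can be sketched and exercised without a real PDF library — the types and the stub below are illustrative assumptions, not the package's actual definitions:

```typescript
// Assumed contract for the parsePdf callback: raw bytes in,
// extracted text plus a page count out.
type ParsedPdf = { text: string; pages: number }
type ParsePdf = (bytes: Uint8Array) => Promise<ParsedPdf>

// A stub parser for tests — no pdf-parse, no network, no real PDF.
const stubParsePdf: ParsePdf = async bytes => ({
  text: `decoded ${bytes.length} bytes`,
  pages: 1,
})
```

In a test you would pass `stubParsePdf` where the example above passes the pdf-parse wrapper, keeping the rest of the pipeline unchanged.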