
Document loaders

One-line fetchers that pull URL, GitHub, Notion, Confluence, Google Drive, and PDF documents into your RAG pipeline.

Every RAG pipeline starts with the same step: turning an external document into an InputDocument. @agentskit/rag now ships six loaders that cover the common sources; each accepts a custom fetch for tests and returns InputDocument[], ready to pipe into RAG.ingest.

Install

npm install @agentskit/rag

Loaders

Loader | Source
loadUrl(url) | Any HTTP URL (raw text / HTML)
loadGitHubFile(owner, repo, path, { ref?, token? }) | Single file via raw.githubusercontent.com
loadGitHubTree(owner, repo, { filter?, maxFiles? }) | Recursive repo tree, filtered
loadNotionPage(pageId, { token }) | Flattens paragraphs + headings
loadConfluencePage(pageId, { baseUrl, token }) | Atlassian storage body
loadGoogleDriveFile(fileId, { accessToken }) | Drive export as text/plain
loadPdf(url, { parsePdf }) | BYO PDF parser (pdf-parse, pdfjs, etc.)
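The injectable fetch is what keeps these loaders testable without network access. A minimal self-contained sketch of that pattern (the InputDocument shape and the loadUrl body here are illustrative assumptions, not the package's actual source):

```typescript
// Illustrative sketch only; the real @agentskit/rag types may differ.
interface InputDocument {
  id: string                          // assumed: a stable identifier, e.g. the source URL
  text: string                        // raw document text
  metadata?: Record<string, string>
}

// Hypothetical loadUrl: the injectable fetch defaults to the global one.
async function loadUrl(
  url: string,
  { fetch: fetchImpl = fetch }: { fetch?: typeof fetch } = {},
): Promise<InputDocument[]> {
  const res = await fetchImpl(url)
  if (!res.ok) throw new Error(`GET ${url} failed: ${res.status}`)
  return [{ id: url, text: await res.text(), metadata: { source: 'url' } }]
}

// In tests, pass a stub fetch instead of hitting the network:
const stubFetch = (async () => new Response('hello world')) as typeof fetch
loadUrl('https://example.com/doc.txt', { fetch: stubFetch }).then(docs => {
  console.log(docs[0].text) // "hello world"
})
```

Because the default is the real global fetch, production calls need no extra wiring; only tests swap it out.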

Example — RAG over a GitHub repo

import { createRAG, loadGitHubTree } from '@agentskit/rag'
import { fileVectorMemory } from '@agentskit/memory'
import { openaiEmbedder } from '@agentskit/adapters'

const docs = await loadGitHubTree('my-org', 'my-repo', {
  token: process.env.GITHUB_TOKEN!,
  filter: path => path.endsWith('.md') || path.endsWith('.ts'),
  maxFiles: 500,
})

const rag = createRAG({
  embed: openaiEmbedder({ apiKey: process.env.OPENAI_API_KEY! }),
  store: fileVectorMemory({ path: './kb.json' }),
})
await rag.ingest(docs)
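The filter and maxFiles options bound how much of the repo gets ingested. A self-contained sketch of the selection logic a tree loader like this typically applies (assumed semantics, not the package's internals):

```typescript
// Assumed semantics: keep paths the predicate matches, then cap the count.
function selectPaths(
  allPaths: string[],
  {
    filter = () => true,
    maxFiles = Infinity,
  }: { filter?: (path: string) => boolean; maxFiles?: number } = {},
): string[] {
  return allPaths.filter(filter).slice(0, maxFiles)
}

const tree = ['README.md', 'src/index.ts', 'assets/logo.png', 'docs/guide.md']
const picked = selectPaths(tree, {
  filter: p => p.endsWith('.md') || p.endsWith('.ts'),
  maxFiles: 2,
})
console.log(picked) // ['README.md', 'src/index.ts']
```

Filtering before capping means maxFiles counts only the files you actually want, so binary assets never eat into the budget.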

Example — PDF via any parser

import { loadPdf } from '@agentskit/rag'
import pdfParse from 'pdf-parse'

const docs = await loadPdf('https://example.com/report.pdf', {
  parsePdf: async bytes => {
    const result = await pdfParse(Buffer.from(bytes))
    return { text: result.text, pages: result.numpages }
  },
})
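The parsePdf callback is the whole adapter contract: the loader fetches bytes, your parser returns { text, pages }. That split makes the loader testable offline with a stub parser and a stub fetch, as this sketch shows (the loadPdfText helper is hypothetical, modeled on the loader described above):

```typescript
interface PdfResult { text: string; pages: number }

// Hypothetical core of a BYO-parser loader: fetch bytes, delegate parsing.
async function loadPdfText(
  url: string,
  parsePdf: (bytes: Uint8Array) => Promise<PdfResult>,
  fetchImpl: typeof fetch = fetch,
): Promise<PdfResult> {
  const res = await fetchImpl(url)
  return parsePdf(new Uint8Array(await res.arrayBuffer()))
}

// Offline test: stub both the fetch (four fake bytes) and the parser.
const stubFetch = (async () =>
  new Response(new Uint8Array([0x25, 0x50, 0x44, 0x46]))) as typeof fetch
const stubParse = async (bytes: Uint8Array): Promise<PdfResult> => ({
  text: `parsed ${bytes.length} bytes`,
  pages: 1,
})

loadPdfText('https://example.com/report.pdf', stubParse, stubFetch)
  .then(r => console.log(r.text)) // "parsed 4 bytes"
```

Any parser that can be wrapped to return { text, pages } works: pdf-parse, pdfjs, or a remote OCR service.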
