
Document loaders

Fetch and normalize documents from URLs, GitHub, Notion, Confluence, Google Drive, and PDFs.

All loaders return LoadedDocument[] ready for rag.ingest.
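Because every loader returns the same array type, results from several sources can be combined into one batch before ingestion. The sketch below is a minimal illustration of that idea, not part of the library: `DocSketch` is a hypothetical stand-in for `LoadedDocument` (its real fields may differ), and `mergeBatches` is an invented helper name.

```typescript
// Hypothetical stand-in for the library's LoadedDocument; the real fields may differ.
interface DocSketch {
  content: string
  source: string
}

// Flatten several loader results into one batch, dropping empty documents
// and keeping only the first document seen for each source.
function mergeBatches(batches: DocSketch[][]): DocSketch[] {
  const seen = new Set<string>()
  const merged: DocSketch[] = []
  for (const batch of batches) {
    for (const doc of batch) {
      if (doc.content.trim() === '' || seen.has(doc.source)) continue
      seen.add(doc.source)
      merged.push(doc)
    }
  }
  return merged
}
```

In practice the inner arrays would be the awaited results of the loaders described on this page, and the merged array would be what gets handed to rag.ingest.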

URL

import { loadUrl } from '@agentskit/rag'
const docs = await loadUrl('https://example.com/post')

Strips page boilerplate and keeps the main content and title.
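To make "boilerplate" concrete, here is a deliberately naive sketch of what such an extraction step could look like. It is an assumption for illustration only and says nothing about how loadUrl is actually implemented; `extractMain` is a hypothetical name, and a regex-based approach like this will miss many real-world pages.

```typescript
// Illustrative only: a naive extractor, not how loadUrl is implemented.
function extractMain(html: string): { title: string; content: string } {
  // Take the document title from the <title> element, if there is one.
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)
  const title = titleMatch ? titleMatch[1].trim() : ''

  // Prefer <main> or <article> when present; otherwise fall back to the whole page.
  const mainMatch = /<(main|article)\b[^>]*>([\s\S]*?)<\/\1>/i.exec(html)
  const body = mainMatch ? mainMatch[2] : html

  const content = body
    // Drop elements that usually carry navigation, chrome, or code rather than prose.
    .replace(/<(script|style|head|nav|header|footer|aside)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    // Remove the remaining tags, then collapse whitespace.
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()

  return { title, content }
}
```

A production extractor would typically use a real HTML parser and a readability-style heuristic instead of regular expressions.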

GitHub

import { loadGitHubFile, loadGitHubTree } from '@agentskit/rag'

const single = await loadGitHubFile({ owner, repo, path: 'README.md', ref: 'main' })
const tree = await loadGitHubTree({ owner, repo, ref: 'main', include: ['**/*.md'] })

Requires GITHUB_TOKEN for private repos.
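The `include` option in the tree example takes glob patterns. As a rough mental model, the sketch below shows how a pattern such as `**/*.md` could be matched against repository paths. It is an assumption for illustration: the glob dialect loadGitHubTree actually supports is not specified here, and `globToRegExp` and `filterPaths` are hypothetical helpers.

```typescript
// Convert a small glob subset (**, *, ?) into an anchored regular expression.
// Illustrative only; the library's own matching rules may differ.
function globToRegExp(glob: string): RegExp {
  let re = ''
  let i = 0
  while (i < glob.length) {
    if (glob.slice(i, i + 3) === '**/') {
      re += '(?:.*/)?' // zero or more leading directories
      i += 3
    } else if (glob.slice(i, i + 2) === '**') {
      re += '.*' // anything, including slashes
      i += 2
    } else if (glob[i] === '*') {
      re += '[^/]*' // anything within one path segment
      i += 1
    } else if (glob[i] === '?') {
      re += '[^/]' // exactly one character within a segment
      i += 1
    } else {
      re += glob[i].replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
      i += 1
    }
  }
  return new RegExp('^' + re + '$')
}

// Keep the paths that match at least one include pattern.
function filterPaths(paths: string[], include: string[]): string[] {
  const matchers = include.map(globToRegExp)
  return paths.filter((path) => matchers.some((matcher) => matcher.test(path)))
}
```

Under this model, `**/*.md` selects Markdown files at any depth, while `docs/*.md` selects only those directly inside `docs`.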

Notion

import { loadNotionPage } from '@agentskit/rag'
const docs = await loadNotionPage({ token: process.env.NOTION_TOKEN!, pageId })

Confluence

import { loadConfluencePage } from '@agentskit/rag'
const docs = await loadConfluencePage({ baseUrl, auth, pageId })

Google Drive

import { loadGoogleDriveFile } from '@agentskit/rag'
const docs = await loadGoogleDriveFile({ fileId, accessToken })

PDF

import { loadPdf } from '@agentskit/rag'
const docs = await loadPdf({ buffer, parse: myPdfParse })

Bring your own parser (e.g. pdf-parse, unpdf); injecting it keeps the core package free of dependencies.
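Injecting the parser means any function with the right shape can be supplied, including a stub in tests. The sketch below illustrates the pattern under stated assumptions: `ParsePdfSketch` and `ParsedPdfSketch` are hypothetical types, not the signature loadPdf actually expects, and `joinPages` is an invented helper that smooths over parsers returning either one string or one string per page.

```typescript
// Hypothetical parser result: some parsers return one string, others one string per page.
interface ParsedPdfSketch {
  text: string | string[]
}

// Hypothetical contract for the injected parser; the signature loadPdf expects may differ.
type ParsePdfSketch = (data: Uint8Array) => Promise<ParsedPdfSketch>

// Collapse either result shape into a single string, separating pages with a blank line.
function joinPages(result: ParsedPdfSketch): string {
  const pages = Array.isArray(result.text) ? result.text : [result.text]
  return pages
    .map((page) => page.trim())
    .filter((page) => page !== '')
    .join('\n\n')
}

// A stub parser that satisfies the contract without any PDF dependency, useful in tests.
const stubParse: ParsePdfSketch = async () => ({ text: ['Page one ', '', ' Page two'] })
```

A thin adapter like this keeps parser-specific details out of the calling code, so switching parsers only means changing the function that is passed in.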
