RAG
Document loaders
Fetch + normalize documents from URLs, GitHub, Notion, Confluence, Google Drive, S3, GCS, Dropbox, OneDrive, PDFs.
All loaders return LoadedDocument[] ready for rag.ingest.
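Because every loader resolves to the same array type, results from several sources can be merged before ingestion. The following is a minimal sketch, not part of the package: `loadAll` is a hypothetical helper name, and it is generic so that it makes no assumption about the fields of `LoadedDocument`.

```typescript
// A loader here is any zero-argument function resolving to an array of documents,
// e.g. () => loadUrl('https://example.com/post').
type Loader<T> = () => Promise<T[]>

// Hypothetical helper: run all loaders concurrently, collect the documents from
// those that succeed, and keep the errors from those that fail instead of
// aborting the whole batch.
async function loadAll<T>(loaders: Loader<T>[]): Promise<{ docs: T[]; errors: unknown[] }> {
  const results = await Promise.allSettled(loaders.map(load => load()))
  const docs: T[] = []
  const errors: unknown[] = []
  for (const result of results) {
    if (result.status === 'fulfilled') docs.push(...result.value)
    else errors.push(result.reason)
  }
  return { docs, errors }
}
```

The merged `docs` array can then be handed to rag.ingest, while `errors` can be logged or retried per source.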
#URL
import { loadUrl } from '@agentskit/rag'
const docs = await loadUrl('https://example.com/post')
Strips boilerplate; keeps main content + title.
#GitHub
import { loadGitHubFile, loadGitHubTree } from '@agentskit/rag'
const single = await loadGitHubFile({ owner, repo, path: 'README.md', ref: 'main' })
const tree = await loadGitHubTree({ owner, repo, ref: 'main', include: ['**/*.md'] })
Requires GITHUB_TOKEN for private repos.
#Notion
import { loadNotionPage } from '@agentskit/rag'
const docs = await loadNotionPage({ token: process.env.NOTION_TOKEN!, pageId })
#Confluence
import { loadConfluencePage } from '@agentskit/rag'
const docs = await loadConfluencePage({ baseUrl, auth, pageId })
#Google Drive
import { loadGoogleDriveFile } from '@agentskit/rag'
const docs = await loadGoogleDriveFile({ fileId, accessToken })
#S3 (and S3-compatible: R2, MinIO)
import { S3Client, ListObjectsV2Command, GetObjectCommand } from '@aws-sdk/client-s3'
import { loadS3 } from '@agentskit/rag'
const client = new S3Client({ region: 'us-east-1' })
const docs = await loadS3({
  client,
  bucket: 'my-bucket',
  prefix: 'docs/',
  filter: key => key.endsWith('.md'),
  // Optional: pass the command classes to skip the dynamic import path.
  commands: { ListObjectsV2Command, GetObjectCommand },
})
@aws-sdk/client-s3 is an optional peer dependency. For Cloudflare R2 or MinIO, set the client's endpoint to the compatible host.
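As an illustration of the endpoint setup for S3-compatible hosts, the sketch below builds plain configuration objects that could be passed to `new S3Client(...)`. The helper names `r2Config` and `minioConfig` are hypothetical, and the account ID, host, and credentials are placeholders to be supplied by the caller.

```typescript
// Subset of S3 client options relevant to S3-compatible hosts.
type S3CompatConfig = {
  region: string
  endpoint: string
  forcePathStyle?: boolean
  credentials: { accessKeyId: string; secretAccessKey: string }
}

// Cloudflare R2: the S3 API is served from a per-account host and the region is 'auto'.
function r2Config(accountId: string, accessKeyId: string, secretAccessKey: string): S3CompatConfig {
  return {
    region: 'auto',
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    credentials: { accessKeyId, secretAccessKey },
  }
}

// MinIO: typically needs path-style addressing (http://host/bucket/key)
// rather than virtual-hosted buckets.
function minioConfig(endpoint: string, accessKeyId: string, secretAccessKey: string): S3CompatConfig {
  return {
    region: 'us-east-1', // MinIO's default region unless the server is configured otherwise
    endpoint,
    forcePathStyle: true,
    credentials: { accessKeyId, secretAccessKey },
  }
}
```

With either object, `new S3Client(r2Config(...))` or `new S3Client(minioConfig(...))` yields a client that can be passed as `client` to loadS3.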
#Google Cloud Storage
import { loadGcs } from '@agentskit/rag'
const docs = await loadGcs({
  bucket: 'my-bucket',
  prefix: 'docs/',
  accessToken: process.env.GCP_ACCESS_TOKEN!, // string or () => Promise<string>
  filter: name => name.endsWith('.md'),
})
Bring your own OAuth2 access token: mint one via google-auth-library, Workload Identity, or gcloud auth print-access-token.
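OAuth2 access tokens are short-lived, so the function form of `accessToken` is useful for long ingestion runs. The sketch below shows one way to wrap a token-minting function so the token is reused until shortly before it expires. `cachedToken` and `MintedToken` are hypothetical names, and whether the loader caches tokens on its own is not stated here.

```typescript
// Hypothetical shape: a token plus its expiry time in epoch milliseconds.
type MintedToken = { token: string; expiresAt: number }

// Returns a () => Promise<string> provider that calls `mint` only when no token
// is cached or the cached one is within `skewMs` of expiring.
function cachedToken(
  mint: () => Promise<MintedToken>,
  skewMs = 60_000,
  now: () => number = Date.now,
): () => Promise<string> {
  let current: MintedToken | undefined
  let pending: Promise<MintedToken> | undefined
  return async () => {
    if (current && now() < current.expiresAt - skewMs) return current.token
    // Share one in-flight refresh between concurrent callers.
    pending ??= mint().finally(() => { pending = undefined })
    current = await pending
    return current.token
  }
}
```

The returned function can be passed directly as `accessToken`; `mint` would wrap whichever mechanism issues the token, such as google-auth-library.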
#Dropbox
import { loadDropbox } from '@agentskit/rag'
const docs = await loadDropbox({
  accessToken: process.env.DROPBOX_TOKEN!,
  path: '/team-docs',
  filter: p => p.endsWith('.md'),
})
Walks a folder recursively via files/list_folder and downloads each file via files/download.
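To make the traversal concrete, here is a rough sketch of cursor-based listing against the Dropbox HTTP API. It is not the loader's actual implementation: `listFilePaths` and the `Post` abstraction are hypothetical, and the HTTP call is injected so the logic can run without network access.

```typescript
// Minimal subset of the list_folder response used by this sketch.
type Entry = { '.tag': 'file' | 'folder' | 'deleted'; path_lower?: string }
type ListResult = { entries: Entry[]; cursor: string; has_more: boolean }

// Hypothetical transport: sends a JSON POST to the given API endpoint.
type Post = (endpoint: string, body: unknown) => Promise<ListResult>

// List every file path under `path`, following the cursor until has_more is false.
async function listFilePaths(
  post: Post,
  path: string,
  filter: (p: string) => boolean = () => true,
): Promise<string[]> {
  const paths: string[] = []
  let page = await post('files/list_folder', { path, recursive: true })
  for (;;) {
    for (const entry of page.entries) {
      // Folders and deleted entries are skipped; only files are collected.
      if (entry['.tag'] === 'file' && entry.path_lower && filter(entry.path_lower)) {
        paths.push(entry.path_lower)
      }
    }
    if (!page.has_more) break
    page = await post('files/list_folder/continue', { cursor: page.cursor })
  }
  return paths
}
```

A real `post` would send the request to `https://api.dropboxapi.com/2/<endpoint>` with an `Authorization: Bearer <token>` header; each returned path would then be fetched through files/download.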
#OneDrive (Microsoft Graph)
import { loadOneDrive } from '@agentskit/rag'
const docs = await loadOneDrive({
  accessToken: msalToken, // string or () => Promise<string>
  driveId: 'b!...', // omit for the signed-in user's drive
  folderItemId: '01ABC...', // omit for root
})
Walks the drive's children recursively and follows @microsoft.graph.downloadUrl for each file.
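The recursive walk can be pictured roughly as follows. This is a sketch of the general Microsoft Graph pattern, not the loader's code: `collectFiles`, `childrenUrl`, and the injected `getJson` transport are hypothetical names introduced for illustration.

```typescript
// Minimal subset of a Graph driveItem and a children page used by this sketch.
type DriveItem = {
  id: string
  name: string
  folder?: object // facet present on folders
  '@microsoft.graph.downloadUrl'?: string // pre-authenticated URL, present on files
}
type Page = { value: DriveItem[]; '@odata.nextLink'?: string }

// Hypothetical transport: performs an authenticated GET and parses the JSON body.
type GetJson = (url: string) => Promise<Page>

// driveBase is e.g. 'https://graph.microsoft.com/v1.0/me/drive'
// or 'https://graph.microsoft.com/v1.0/drives/<driveId>'.
function childrenUrl(driveBase: string, itemId?: string): string {
  return itemId ? `${driveBase}/items/${itemId}/children` : `${driveBase}/root/children`
}

// Depth-first walk: follow @odata.nextLink for paging, recurse into folders,
// and record the download URL of each file that exposes one.
async function collectFiles(
  getJson: GetJson,
  driveBase: string,
  itemId?: string,
): Promise<{ name: string; url: string }[]> {
  const files: { name: string; url: string }[] = []
  let next: string | undefined = childrenUrl(driveBase, itemId)
  while (next) {
    const page: Page = await getJson(next)
    for (const item of page.value) {
      const url = item['@microsoft.graph.downloadUrl']
      if (item.folder) files.push(...(await collectFiles(getJson, driveBase, item.id)))
      else if (url) files.push({ name: item.name, url })
    }
    next = page['@odata.nextLink']
  }
  return files
}
```

Omitting `itemId` starts at the drive root, which mirrors leaving out `folderItemId` in the loader call.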
#PDF
import { loadPdf } from '@agentskit/rag'
const docs = await loadPdf({ buffer, parse: myPdfParse })
Bring your own parser (e.g. pdf-parse, unpdf); this keeps the core package free of dependencies.