documentParsers
Parsers for PDF, DOCX, and XLSX. You bring your own (BYO) parser functions, which keeps the core dependency-free.
import { documentParsers } from '@agentskit/tools'
import pdfParse from 'pdf-parse'
import * as mammoth from 'mammoth'
import * as xlsx from 'xlsx'

// createRuntime and adapter are assumed to be imported and configured elsewhere.
const runtime = createRuntime({
  adapter,
  tools: [...documentParsers({
    // pdf-parse resolves to a result object; its text field holds the extracted text
    parsePdf: async (buf) => (await pdfParse(buf)).text,
    parseDocx: async (buf) => (await mammoth.extractRawText({ buffer: buf })).value,
    parseXlsx: async (buf) => {
      const wb = xlsx.read(buf)
      // One CSV block per sheet, separated by a '---' line
      return wb.SheetNames.map((n) => xlsx.utils.sheet_to_csv(wb.Sheets[n])).join('\n---\n')
    },
  })],
})

#Sub-tools
| Name | Purpose |
|---|---|
| parsePdf | Extract text from a PDF buffer |
| parseDocx | Extract text from a .docx buffer |
| parseXlsx | Extract CSV-flat sheets from .xlsx |
Bundled: documentParsers(config) returns all three.
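To make the bundling idea concrete, here is a minimal sketch of how a config of parser functions could be turned into a list of tools. Everything in it is hypothetical and for illustration only: the ParseFn, ParserConfig, and SketchTool shapes, the sketchDocumentParsers function, and the choice to skip parsers that were not supplied are assumptions, not the actual @agentskit/tools implementation.

```typescript
// Hypothetical shapes for illustration; not the real @agentskit/tools types.
type ParseFn = (buf: Uint8Array) => Promise<string>

interface ParserConfig {
  parsePdf?: ParseFn
  parseDocx?: ParseFn
  parseXlsx?: ParseFn
}

interface SketchTool {
  name: string
  description: string
  execute: (input: { data: Uint8Array }) => Promise<string>
}

// Descriptions taken from the sub-tools table.
const DESCRIPTIONS: Record<keyof ParserConfig, string> = {
  parsePdf: 'Extract text from a PDF buffer',
  parseDocx: 'Extract text from a .docx buffer',
  parseXlsx: 'Extract CSV-flat sheets from .xlsx',
}

// Build one tool per supplied parser.
// Assumption: parsers left out of the config simply produce no tool.
function sketchDocumentParsers(config: ParserConfig): SketchTool[] {
  const names = Object.keys(DESCRIPTIONS) as (keyof ParserConfig)[]
  const tools: SketchTool[] = []
  for (const name of names) {
    const parse = config[name]
    if (!parse) continue
    tools.push({
      name,
      description: DESCRIPTIONS[name],
      execute: ({ data }) => parse(data),
    })
  }
  return tools
}

// Usage: a stub parser stands in for a real library such as pdf-parse.
const tools = sketchDocumentParsers({
  parsePdf: async (buf) => `stub text (${buf.length} bytes)`,
})
```

The point of the sketch is that the bundle itself imports no parsing library: every dependency arrives through the config object.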
#Why BYO
Core stays zero-dep. You pick parser quality + size trade-offs:
- PDF: pdf-parse (small) / unpdf (WASM, browser-safe) / pdfjs-dist (Mozilla).
- DOCX: mammoth (most faithful) / docx4js.
- XLSX: xlsx (SheetJS) / exceljs.
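Because you own the parser functions, you can also wrap whichever library you pick with your own post-processing. The sketch below is one possible approach, not part of the library: ParseFn mirrors the (buf) => Promise<string> shape used in the examples above, and tidyText, withTidyOutput, and the 20000-character default are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical helper type; mirrors the parser shape used in this page's examples.
type ParseFn = (buf: Uint8Array) => Promise<string>

// Collapse runs of spaces/tabs, squeeze 3+ newlines down to a blank line,
// trim the ends, then cap the result at maxChars characters.
function tidyText(text: string, maxChars: number): string {
  const collapsed = text
    .replace(/[ \t]+/g, ' ')
    .replace(/\n{3,}/g, '\n\n')
    .trim()
  return collapsed.length > maxChars ? collapsed.slice(0, maxChars) : collapsed
}

// Wrap any BYO parser so its output is tidied before it reaches the agent.
function withTidyOutput(parse: ParseFn, maxChars = 20000): ParseFn {
  return async (buf) => tidyText(await parse(buf), maxChars)
}

// Usage: a stub stands in for a real parser such as pdf-parse or mammoth.
const noisyParser: ParseFn = async () => 'Jane   Doe\n\n\n\nEngineer  '
const tidyParser = withTidyOutput(noisyParser, 100)
```

A wrapped function has the same shape as the original, so it could be passed as parsePdf, parseDocx, or parseXlsx without other changes.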
#Example — resume intake
// s3, client, rag, parsePdf, and parseDocx are assumed to be defined elsewhere.
const runtime = createRuntime({
  adapter,
  tools: [
    ...s3({ client, bucket: 'resumes' }),
    ...documentParsers({ parsePdf, parseDocx }),
    ...rag.tools,
  ],
})

#Related
- Integrations overview · s3.
- RAG loaders: loadPdf uses the same BYO pattern.