Prompt injection
Detect instruction-hijacking patterns in user input, tool results, and RAG chunks before they reach the model.
Prompt injection is the main attack surface for agents: a user, or content the agent retrieves, attempts to override the system prompt or change the agent's behavior. createInjectionDetector catches common patterns with zero-cost heuristics and optionally escalates high-risk inputs to an LLM classifier.
import { createInjectionDetector } from '@agentskit/core/security'

const detector = createInjectionDetector({
  classifier: async (text) => {
    // Optional LLM-based classifier
    return adapter.complete({ ... })
  },
})

const verdict = await detector.check(userInput)
if (verdict.blocked) throw new Error(verdict.reason)

#Heuristic layer
The heuristic layer runs synchronously at zero cost. It catches "ignore previous instructions", role-swap attempts, system-prompt leak probes, and fenced payloads like <|system|>.
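To make the pattern categories concrete, here is a minimal sketch of what such a heuristic pass could look like. The rule names, regular expressions, and the runHeuristics function are hypothetical illustrations, not AgentsKit's actual rule set or API.

```typescript
// Hypothetical sketch: these rules are illustrative, not the library's real patterns.
interface HeuristicHit {
  rule: string
  match: string
}

const RULES: Array<{ rule: string; pattern: RegExp }> = [
  // "ignore previous instructions" and close variants
  {
    rule: 'override',
    pattern:
      /\b(ignore|disregard|forget)\b[^.\n]{0,40}\b(previous|prior|above|earlier)\b[^.\n]{0,20}\b(instructions?|prompts?|rules?)\b/i,
  },
  // Role-swap attempts such as "you are now ..."
  { rule: 'role-swap', pattern: /\byou are now\b|\bact as (an? )?(unrestricted|jailbroken)\b/i },
  // System-prompt leak probes
  { rule: 'prompt-leak', pattern: /\b(reveal|show|print|repeat)\b[^.\n]{0,40}\bsystem prompt\b/i },
  // Fenced control tokens such as <|system|>
  { rule: 'fenced-payload', pattern: /<\|\s*(system|assistant|user)\s*\|>/i },
]

// Synchronous scan: no network or model call, only regex matching.
function runHeuristics(text: string): HeuristicHit[] {
  const hits: HeuristicHit[] = []
  for (const { rule, pattern } of RULES) {
    const m = pattern.exec(text)
    if (m) hits.push({ rule, match: m[0] })
  }
  return hits
}
```

Regex rules like these are cheap but easy to evade with paraphrasing, which is the gap the optional classifier layer is meant to cover.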
#Model classifier layer
The classifier layer is pluggable: use any adapter to score high-risk inputs that pass the heuristics but still look suspicious.
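The escalation flow can be sketched as a two-stage check. Everything here is an assumption made for illustration: the Classifier type returning a numeric risk score, the heuristicScore callback, the threshold values, and the checkWithEscalation name are hypothetical. Only the blocked and reason fields of the verdict come from the example above.

```typescript
// Hypothetical contract: a classifier that returns a risk score in [0, 1].
// The library's real classifier signature may differ.
type Classifier = (text: string) => Promise<number>

interface Verdict {
  blocked: boolean
  reason?: string
}

async function checkWithEscalation(
  text: string,
  heuristicScore: (text: string) => number,
  classifier?: Classifier,
  thresholds = { block: 0.9, escalate: 0.4 }, // illustrative values
): Promise<Verdict> {
  const score = heuristicScore(text)

  // Clear heuristic match: block without spending a model call.
  if (score >= thresholds.block) return { blocked: true, reason: 'heuristic match' }

  // Clearly benign, or no classifier configured: let the input through.
  if (score < thresholds.escalate || !classifier) return { blocked: false }

  // Ambiguous middle band: escalate to the model-based classifier.
  const risk = await classifier(text)
  return risk >= thresholds.block
    ? { blocked: true, reason: 'classifier flagged input' }
    : { blocked: false }
}
```

In this sketch the model is called only for inputs in the ambiguous band, so most traffic pays nothing beyond the heuristic pass.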
#Where to run it
- User input: in a chat.send preprocess step.
- Tool results: before feeding them back into the loop (tool output can be attacker-controlled).
- RAG retrievals: classify each chunk before injecting it into the context.
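For tool results and RAG chunks, the checkpoints listed above amount to filtering untrusted text before it joins the context. The sketch below assumes only that a detector exposes check(text) resolving to a verdict with blocked and reason, as in the earlier example. The Detector interface and the filterUntrusted helper are hypothetical names introduced here.

```typescript
// Minimal shape assumed from the example above; not the library's full type.
interface Verdict {
  blocked: boolean
  reason?: string
}

interface Detector {
  check(text: string): Promise<Verdict>
}

interface FilterResult {
  safe: string[]
  dropped: Array<{ text: string; reason: string }>
}

// Check every piece of untrusted text (RAG chunks, tool outputs) in parallel
// and keep only the ones the detector does not block.
async function filterUntrusted(detector: Detector, texts: string[]): Promise<FilterResult> {
  const results = await Promise.all(
    texts.map(async (text) => ({ text, verdict: await detector.check(text) })),
  )

  const out: FilterResult = { safe: [], dropped: [] }
  for (const { text, verdict } of results) {
    if (verdict.blocked) {
      out.dropped.push({ text, reason: verdict.reason ?? 'unspecified' })
    } else {
      out.safe.push(text)
    }
  }
  return out
}
```

Dropping a flagged chunk rather than throwing keeps a single poisoned document from failing the whole request. Whether to drop, redact, or abort is a policy choice this sketch leaves open.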
#Related
- Security
  Six primitives for production agents: PII redaction, injection detection, rate limiting, audit log, sandbox enforcement, and HITL approvals.
- PII redaction
  Strip emails, phones, SSNs, and API keys from messages before they reach the model or get written to logs.
- Input validation
  Schema validation of tool inputs and user messages: zod, JSON Schema, prompt injection, length limits, and allowlists.