Prompt injection detector
Score incoming text for injection attempts — heuristics + optional model classifier (Llama Guard, Rebuff).
Prompt injection is user input that tries to rewrite the agent's
instructions. `createInjectionDetector` gives you a two-layer defense:
cheap regex heuristics for the common patterns, and a pluggable
model classifier for the subtler ones. The verdict's score is the
maximum of the two signals, and the input is blocked when that score
reaches the threshold.
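To make the "maximum of the two signals" rule concrete, here is a minimal sketch of how a heuristic score and an optional classifier score could combine into a verdict. It is not the library's source: `combineSignals` and `SketchVerdict` are hypothetical names, and `undefined` stands in for "no classifier configured".

```typescript
// Hypothetical sketch of the "max of both signals" rule; not AgentsKit's source.
interface SketchVerdict {
  score: number
  blocked: boolean
}

// classifierScore is undefined when no classifier is configured.
function combineSignals(
  heuristicScore: number,
  classifierScore: number | undefined,
  threshold: number,
): SketchVerdict {
  const score =
    classifierScore === undefined ? heuristicScore : Math.max(heuristicScore, classifierScore)
  // Inclusive comparison, matching "blocked: score >= threshold" in the verdict shape.
  return { score, blocked: score >= threshold }
}
```

With a threshold of 0.7, a heuristic score of 0.4 plus a classifier score of 0.9 is blocked; the same heuristic score on its own is not.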
#Install
```ts
import { createInjectionDetector } from '@agentskit/core/security'
```
#Heuristic-only (fast, free)
```ts
const detector = createInjectionDetector()

// Runs inside your request handler: `userMessage`, `userId`, and `audit` come from the surrounding code.
const verdict = await detector.check(userMessage)
if (verdict.blocked) {
  // Log the full verdict (score, hits, source) so blocks can be reviewed later.
  audit.append({ actor: userId, action: 'injection_blocked', payload: verdict })
  return 'Sorry, that request was blocked.'
}
```
The default heuristics catch the usual suspects: "ignore previous instructions", "you are now a...", system-prompt leakage, developer mode, policy-bypass phrasing, tool-call smuggling, and role confusion.
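The shipped patterns live in `DEFAULT_INJECTION_HEURISTICS`. The sketch below only illustrates the idea of named, weighted regexes producing a score and a hit list like the verdict's `hits`. The patterns and weights are made up, and scoring a set of matches by its highest weight is an assumption, not documented behavior.

```typescript
// Illustrative sketch only: these patterns and weights are hypothetical,
// not AgentsKit's DEFAULT_INJECTION_HEURISTICS.
interface Heuristic {
  name: string
  pattern: RegExp
  weight: number
}

const SKETCH_HEURISTICS: Heuristic[] = [
  { name: 'ignore-previous', pattern: /ignore (all )?(previous|prior|above) instructions/i, weight: 0.9 },
  { name: 'role-reassignment', pattern: /\byou are now (a|an)\b/i, weight: 0.6 },
  { name: 'system-prompt-leak', pattern: /(reveal|print|show) (your|the) system prompt/i, weight: 0.8 },
  { name: 'developer-mode', pattern: /developer mode/i, weight: 0.7 },
]

// Assumption: the heuristic score is the highest weight among matching patterns.
function scoreHeuristics(input: string, heuristics: Heuristic[] = SKETCH_HEURISTICS) {
  const hits = heuristics
    .filter(h => h.pattern.test(input))
    .map(({ name, weight }) => ({ name, weight }))
  const score = hits.reduce((max, h) => Math.max(max, h.weight), 0)
  return { score, hits }
}
```

Because the patterns carry no `g` flag, `test` keeps no `lastIndex` state between calls, so the same list can be reused safely across inputs.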
#Layer a model classifier (Llama Guard, Prompt Guard, Rebuff)
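```ts
const detector = createInjectionDetector({
  threshold: 0.7,
  // Return a number on the same scale as `threshold`; higher means more suspicious.
  classifier: async input => {
    // Placeholder endpoint: point this at your Llama Guard / Prompt Guard / Rebuff service.
    const res = await fetch('https://api.example.com/llama-guard', {
      method: 'POST',
      body: JSON.stringify({ text: input }),
      headers: { 'content-type': 'application/json', authorization: `Bearer ${process.env.LG_KEY}` },
    })
    // Treat non-2xx responses as classifier errors instead of parsing an error body.
    if (!res.ok) throw new Error(`classifier HTTP ${res.status}`)
    const { unsafe_score } = (await res.json()) as { unsafe_score: number }
    return unsafe_score
  },
})
```
Classifier errors are swallowed: if the upstream classifier fails, the detector degrades to heuristic-only scoring instead of rejecting all traffic.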
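Since a failing classifier only costs you the extra signal, a slow one is the bigger risk: every `check` waits on it. A minimal sketch, assuming the detector treats a rejected classifier promise like any other classifier error, is to wrap your classifier so it rejects after a deadline. `withTimeout` is a hypothetical helper, not part of `@agentskit/core`.

```typescript
// Hypothetical helper, not part of @agentskit/core.
type Classifier = (input: string) => Promise<number>

// Wrap a classifier so it rejects once `timeoutMs` has passed.
// Assumption: the detector swallows a rejected classifier like any other
// classifier error and falls back to the heuristic score.
function withTimeout(classifier: Classifier, timeoutMs: number): Classifier {
  return input =>
    new Promise<number>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error(`classifier timed out after ${timeoutMs}ms`)),
        timeoutMs,
      )
      classifier(input).then(
        score => {
          clearTimeout(timer)
          resolve(score)
        },
        err => {
          clearTimeout(timer)
          reject(err)
        },
      )
    })
}
```

You would pass `withTimeout(yourClassifier, 500)` as the `classifier` option; 500 ms is an arbitrary budget. The underlying request keeps running after the deadline, so actually cancelling it would mean passing an `AbortSignal` to `fetch`.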
#Verdict shape
```ts
{
  score: number,            // max(heuristic, classifier)
  blocked: boolean,         // score >= threshold
  hits: [{ name, weight }], // heuristic hits
  source: 'heuristic' | 'hybrid',
}
```
#Add your own heuristics
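A custom heuristic is a `name`, a `pattern`, and a `weight`. Before registering one, it is worth checking the regex on its own against inputs that should and should not trigger it. This plain TypeScript check (no AgentsKit imports) exercises the apostrophe-tolerant pattern used in the example that follows; the sample strings are illustrative.

```typescript
// Plain TypeScript, no AgentsKit imports: check a candidate heuristic pattern
// against inputs that should and should not trigger it before registering it.
const offTopicDivert = /let['’]s talk about something else/i

const shouldMatch = [
  "Let's talk about something else.",
  'ok, let’s talk about something else', // curly apostrophe
]
const shouldNotMatch = [
  'lets talk about something else', // no apostrophe: not covered by this pattern
  "Let's talk about the refund policy.",
]

for (const s of shouldMatch) {
  if (!offTopicDivert.test(s)) throw new Error(`expected a match: ${s}`)
}
for (const s of shouldNotMatch) {
  if (offTopicDivert.test(s)) throw new Error(`unexpected match: ${s}`)
}
```

The character class `['’]` accepts both the straight and the typographic apostrophe, which matters because many chat clients auto-convert one into the other.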
```ts
import { DEFAULT_INJECTION_HEURISTICS, createInjectionDetector } from '@agentskit/core/security'

const detector = createInjectionDetector({
  heuristics: [
    // Keep the built-in patterns and append your own.
    ...DEFAULT_INJECTION_HEURISTICS,
    { name: 'off-topic-divert', pattern: /let['’]s talk about something else/i, weight: 0.5 },
  ],
})
```
#See also
- Recipes: Copy-paste solutions grouped by theme. Every recipe is end-to-end and runs as written.
- Custom adapter: Wrap any LLM API as an AgentsKit adapter. Plug-and-play with the rest of the kit in 30 lines.
- Adapter contract tests: Verify any adapter against the ADR 0001 invariants A1–A10 with the shared test harness.