Prompt injection

Detect instruction-hijacking patterns in user input, tool results, and RAG chunks before they reach the model.

Prompt injection is the main attack surface for agents: a user, or content the agent retrieves, attempts to override the system prompt or change the agent's behavior. createInjectionDetector catches common patterns with zero-cost heuristics and optionally escalates high-risk inputs to an LLM classifier.

import { createInjectionDetector } from '@agentskit/core/security'

// `adapter` is a model adapter configured elsewhere in your app.
const detector = createInjectionDetector({
  classifier: async (text) => {
    // Optional LLM-based classifier
    return adapter.complete({ ... })
  },
})

// Check the input and stop before it reaches the model if it is flagged.
const verdict = await detector.check(userInput)
if (verdict.blocked) throw new Error(verdict.reason)

Heuristic layer

The heuristic layer runs synchronously at zero cost. It catches "ignore previous instructions", role-swap attempts, system-prompt leak probes, and fenced payloads like <|system|>.
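To make the idea concrete, here is a minimal sketch of what a synchronous pattern check of this kind might look like. The pattern list, the rule names, and the `heuristicCheck` function are hypothetical and written for illustration; they are not the rules that createInjectionDetector ships with.

```typescript
// Hypothetical pattern list, for illustration only (not the library's actual rules).
const INJECTION_PATTERNS: { name: string; pattern: RegExp }[] = [
  // "ignore previous instructions" and close variants
  { name: 'ignore-previous', pattern: /\bignore\s+(all\s+)?(the\s+)?(previous|prior|above)\s+instructions\b/i },
  // Role-swap attempts
  { name: 'role-swap', pattern: /\byou\s+are\s+now\b|\bact\s+as\s+(the\s+)?system\b/i },
  // System-prompt leak probes
  { name: 'prompt-leak', pattern: /\b(reveal|print|show|repeat)\s+(me\s+)?(your|the)\s+system\s+prompt\b/i },
  // Fenced payloads such as <|system|>
  { name: 'fenced-payload', pattern: /<\|(system|assistant|user)\|>/i },
]

// Returns the names of every pattern that matches; an empty array means no hit.
// Purely synchronous: no network or model call is involved.
function heuristicCheck(text: string): string[] {
  return INJECTION_PATTERNS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ name }) => name)
}
```

Regex rules like these are cheap but easy to evade with paraphrasing, which is why a second, model-based layer can be useful for inputs that slip through.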

Model classifier layer

Pluggable. Use any adapter to score high-risk inputs that pass the heuristics but still look suspicious.
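The escalation flow described here could be sketched roughly as follows. Everything in this block is an assumption made for illustration: the `Classifier` contract (a risk score between 0 and 1), the two stand-in heuristics, the `checkWithEscalation` name, and the 0.5 threshold are not taken from the library.

```typescript
type Verdict = { blocked: boolean; reason?: string }

// Hypothetical classifier contract: resolves to a risk score between 0 and 1.
type Classifier = (text: string) => Promise<number>

// Cheap stand-in heuristics, illustrative only.
const isKnownAttack = (text: string): boolean =>
  /\bignore\s+(previous|prior)\s+instructions\b/i.test(text)
const looksSuspicious = (text: string): boolean =>
  /\b(instructions|system\s+prompt)\b/i.test(text)

async function checkWithEscalation(
  text: string,
  classifier?: Classifier,
  threshold = 0.5,
): Promise<Verdict> {
  // Stage 1: a definite heuristic hit blocks without any model call.
  if (isKnownAttack(text)) return { blocked: true, reason: 'heuristic match' }
  // Stage 2: escalate only ambiguous inputs, and only if a classifier is configured.
  if (!classifier || !looksSuspicious(text)) return { blocked: false }
  const score = await classifier(text)
  return score >= threshold
    ? { blocked: true, reason: `classifier score ${score.toFixed(2)}` }
    : { blocked: false }
}
```

In this sketch the classifier is only consulted for inputs that are not an outright heuristic match but still trip a weaker signal, so clean traffic and obvious attacks never incur a model call.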

Where to run it

  • User input → chat.send preprocess.
  • Tool results → before feeding back into the loop (tool-output can be attacker-controlled).
  • RAG retrievals → classify each chunk before context-injection.
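For the tool-result and RAG cases, one possible shape is a small screening helper that runs a check over every piece of retrieved text and keeps only what passes. The `screenChunks` helper and its `Check` and `ScreenResult` types are hypothetical; the only thing assumed about the detector is that its check resolves to an object with `blocked` and `reason`, as the snippet at the top of this page suggests.

```typescript
type Verdict = { blocked: boolean; reason?: string }
type Check = (text: string) => Promise<Verdict>

interface ScreenResult {
  safe: string[]
  dropped: { chunk: string; reason: string }[]
}

// Screen every retrieved chunk (or tool result) and keep only unblocked ones.
async function screenChunks(chunks: string[], check: Check): Promise<ScreenResult> {
  // Run all checks concurrently; Promise.all preserves the input order.
  const results = await Promise.all(
    chunks.map(async (chunk) => ({ chunk, verdict: await check(chunk) })),
  )
  const out: ScreenResult = { safe: [], dropped: [] }
  for (const { chunk, verdict } of results) {
    if (verdict.blocked) {
      // Keep the reason so dropped content can be logged or audited.
      out.dropped.push({ chunk, reason: verdict.reason ?? 'unspecified' })
    } else {
      out.safe.push(chunk)
    }
  }
  return out
}
```

With the detector from the first example, the `check` argument could be `(text) => detector.check(text)`, and only `safe` would then be added to the model context.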
