Prerequisites

  • Target LLM API endpoint or chatbot URL
  • API key for the target model (if testing your own deployment)
  • System prompt / instructions (if available — client should provide for authorized testing)
  • Written authorization explicitly covering AI security testing
  • Garak installed: pip install garak
  1. 1

    Select AI / LLM Security from Home Screen

    Click 🧠 AI / LLM Security. Select target type: OpenAI-compatible API, Anthropic API, Hugging Face model, or custom HTTP endpoint.

  2. 2

    Configure Target Model

    Target Type : openai / anthropic / huggingface / rest API Endpoint : https://api.target.com/v1/chat/completions API Key : [encrypted on entry] Model Name : gpt-4o / claude-3 / mistral-7b / custom System Prompt : [paste if known] Context Window : 4096 / 8192 / 128000 tokens
  3. 3

    Claude Orchestrates LLM Security Test Suite

    Phase 1: Prompt injection — direct + indirect injection probes Phase 2: Jailbreak attempts — DAN variants, roleplay bypasses, encoding tricks Phase 3: System prompt leak — extract hidden instructions Phase 4: Data leakage — training data extraction (memorization) Phase 5: Model denial — excessive token consumption, repetition attacks Phase 6: Agentic security — tool call injection, confused deputy attacks Phase 7: Bias/toxicity — Garak perspective probes Phase 8: Report — OWASP LLM Top 10 mapping
  4. 4

    Review AI Security Report

    Report maps findings to OWASP LLM Top 10 (LLM01–LLM10). Each finding includes: the attack vector used, model response demonstrating the vulnerability, risk rating, and remediation (prompt hardening, output filtering, guardrails).

⏱️ Typical duration: 30–90 minutes for a standard LLM API. Agentic system testing: 2–4 hours.

Common Issues

Create a custom Garak config YAML with your endpoint details. Required fields: uri, request_template (JSON body structure), response_json_field (path to response text). See PhantomYerra's Garak config templates in config/garak_templates/. Ensure the endpoint accepts POST with JSON Content-Type.

Strong content filtering is actually a positive security finding — document it as a defense. Test the edges: encoded inputs (Base64, ROT13, leetspeak), multi-turn manipulation, non-English prompts, and semantic equivalents. Even well-filtered models often have edge cases. Test indirect injection via document uploads or tool outputs if the system has those capabilities.

Use browser DevTools to capture the API calls made by the web UI — most LLM chatbots call a REST API under the hood. Replay those requests with modified payloads using curl or Burp Suite. PhantomYerra's Web Pentest module can also intercept and modify these requests via the integrated proxy.