Local-Only Mode

PhantomYerra's biggest competitive advantage: 100% local AI inference with zero data transmitted outside your machine. Works in fully air-gapped environments with no internet connection required after initial setup.

Your Data Never Leaves Your Machine

Every competitor, including Shannon, requires an internet connection to an AI API for every scan. Your target URLs, source code snippets, vulnerability descriptions, and findings all travel to an external server for processing. PhantomYerra is the only security tool that gives you a genuine choice.

Shannon (Cloud-Only)

  • Requires Anthropic API for every scan
  • Target URLs sent to external servers
  • Source code analysis leaves your machine
  • Findings transmitted to AI for narrative writing
  • No air-gapped mode
  • No internet = no AI = degraded functionality
  • Not suitable for classified/regulated environments

PhantomYerra (Local-First)

  • Full AI inference via Ollama: zero API calls
  • Target URLs never transmitted
  • Source code stays 100% local
  • All findings processed on-machine
  • True air-gapped mode available
  • No internet = full functionality via local models
  • Approved for government, healthcare, finance

Three Operating Modes


Cloud Mode (Default)

Uses Anthropic Claude for maximum capability. PrivacyFilter anonymizes all data before transmission - target URLs, IPs, and company names are replaced with tokens. Real values never leave your machine.


Local-Only Mode

All AI inference routes to Ollama running on localhost. Zero data transmitted. Works 100% offline. Choose this for air-gapped networks, classified environments, or maximum confidentiality.

Hybrid Mode

Smart routing: target-sensitive analysis (URLs, source code, finding details) goes to Ollama locally. Non-sensitive tasks (report narrative, remediation advice) go to Claude for maximum quality.


Air-Gapped Mode

Complete network isolation. All AI is local. License checked locally. No telemetry. No external calls of any kind. Designed for defence, intelligence, and critical infrastructure environments.

How to Enable Local Mode

Step 1: Install Ollama

Windows

# Download from https://ollama.com/download/windows
# Run the installer - Ollama starts automatically as a system service
# Verify installation:
ollama --version

Linux

curl -fsSL https://ollama.com/install.sh | sh
# Ollama runs as a systemd service on port 11434
ollama --version

macOS

# Download from https://ollama.com/download/mac
# Open the .dmg and drag Ollama to Applications
# Ollama runs in the menu bar - click to start
ollama --version
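On any platform, you can confirm the Ollama server is reachable before configuring PhantomYerra. This sketch probes Ollama's standard version endpoint on its default port (11434); adjust the port if you changed it.

```shell
# Probe the Ollama server's version endpoint on the default port.
# Prints a status line either way, so it is safe to run anywhere.
if curl -s --max-time 2 http://localhost:11434/api/version > /dev/null; then
  echo "Ollama is running"
else
  echo "Ollama is not reachable - start it before enabling local mode"
fi
```

If the server is not reachable, start Ollama (system service on Windows/Linux, menu bar app on macOS) and re-run the check.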

Step 2: Pull a Model

Choose a model based on your available VRAM. If you have no GPU, Ollama will run on CPU (slower but functional). Contact support@phantomyerra.com for the current recommended model names for your hardware configuration.

# Minimum (CPU-only, works on any machine):
ollama pull [small-fast-model]     # ~4 GB - fast, general purpose

# Recommended for security work (8-16 GB VRAM):
ollama pull [code-analysis-model]  # best for source code analysis

# Best quality (24-48 GB VRAM or large RAM for CPU):
ollama pull [large-reasoning-model] # complex security reasoning
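After pulling, you can confirm which models are installed. A minimal sketch, guarded so it degrades gracefully if the `ollama` CLI is not on your PATH:

```shell
# List locally installed models, if the Ollama CLI is available.
if command -v ollama > /dev/null 2>&1; then
  ollama list
else
  echo "ollama CLI not found - install it first (see Step 1)"
fi
```

PhantomYerra reads this same installed-model list when auto-selecting a model in local mode.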

Step 3: Enable Local Mode in PhantomYerra

1. Open Settings - Click the gear icon in the bottom-left sidebar or press Ctrl+,

2. Navigate to AI Configuration - In Settings, select AI Configuration from the left menu.

3. Select "Local-Only Mode" - Under AI Provider Mode, click Local-Only (Ollama). PhantomYerra will verify Ollama is running before switching.

[Screenshot: AI Configuration - Local Mode Toggle. Settings panel with three mode cards (Cloud AI, Local-Only (Ollama), Hybrid); the Local-Only card is selected with a green border, the Ollama status indicator reads "Running - 2 models installed", and the model selector shows the recommended model.]

4. Verify the Status Indicator - The top bar shows a shield icon with "Local AI" when local mode is active. All subsequent scans use Ollama exclusively.

Recommended Models

Contact support@phantomyerra.com for current recommended Ollama model names. PhantomYerra auto-selects the best available model from what you have installed - larger models provide better security reasoning at the cost of speed and hardware requirements.

Model Tier        | Best For                                          | VRAM Required                     | Speed
Large (70B class) | Complex security reasoning, attack chain analysis | 48 GB VRAM (or 64 GB RAM for CPU) | Slow - high quality
Medium (34B class)| Source code analysis, SAST, exploit generation    | 20 GB VRAM (or 32 GB RAM for CPU) | Medium
Small (7B class)  | Fast general tasks, entry point (any machine)     | 6 GB VRAM (or 8 GB RAM for CPU)   | Very Fast

Performance Expectations

Capability                  | Local (small model) | Local (large model) | Cloud AI
Payload generation          | Good                | Excellent           | Excellent
Code vulnerability analysis | Fair                | Excellent           | Excellent
Attack chain reasoning      | Fair                | Very Good           | Excellent
Report narrative writing    | Good                | Very Good           | Excellent
Speed (tokens/sec)          | ~50 t/s (GPU)       | ~8 t/s (GPU)        | ~80 t/s (API)
Data privacy                | 100% local          | 100% local          | Anonymized via PrivacyFilter
Internet required           | No                  | No                  | Yes
Cost per scan               | $0                  | $0                  | ~$0.05–0.50

Use Cases

Government and Defence Contractors

Security assessments on classified or FOUO systems cannot transmit data to commercial AI APIs. Local-only mode satisfies ITAR, CMMC, and IL4/IL5 data handling requirements. No data leaves your secure enclave.

Healthcare and HIPAA Environments

Penetration testing of healthcare systems involves PHI and PII in HTTP traffic and scan output. Local-only mode ensures no patient data ever reaches an external server, maintaining HIPAA compliance.

Financial Institutions (SOX, PCI-DSS)

Testing banking applications, trading systems, and payment processors requires strict data sovereignty. Local-only mode keeps all cardholder data, account numbers, and financial records on-premises.

Red Team Engagements - Target Confidentiality

During red team operations, target names, internal network topology, and attack strategies are highly sensitive. Local-only mode ensures the client's identity and vulnerabilities never reach third-party servers.

Frequently Asked Questions

What if I have no GPU?
Ollama runs on CPU: it is slower but fully functional. For CPU-only machines, use a small (7B class) model (approximately 4 GB). On a modern 8-core CPU with 16 GB RAM, expect 5-15 tokens/second - sufficient for all PhantomYerra tasks. Larger models require 64+ GB RAM to run on CPU.
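To gauge which model tier your CPU-only machine can handle, check total RAM first. A Linux-only sketch (reads /proc/meminfo; on macOS use `sysctl hw.memsize`, on Windows check Task Manager):

```shell
# Report total system RAM in GB to help pick a model tier (Linux only).
awk '/MemTotal/ {printf "Total RAM: %.1f GB\n", $2/1024/1024}' /proc/meminfo
```

As a rule of thumb from the table above: 8 GB supports the small tier, 32 GB the medium tier, and 64+ GB the large tier on CPU.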
Can I mix local and cloud for different tasks?
Yes: use Hybrid Mode. PhantomYerra automatically routes target-sensitive analysis (source code, URLs, finding details) to Ollama locally, while non-sensitive tasks (executive report narrative, generic remediation advice) go to the cloud AI for maximum quality. You get the best of both worlds.
Is local AI as capable as cloud AI?
For security tasks, large local models (70B class) approach cloud AI quality for vulnerability analysis, payload generation, and code review. Cloud AI is still superior for nuanced report narrative and complex multi-hop reasoning. If absolute maximum capability is needed and your data policy allows it, use Hybrid Mode - local for sensitive analysis, cloud AI for narrative.
Does local mode work for SAST and source code analysis?
Yes - and this is one of the strongest use cases. codellama:34b is specifically trained on code and performs extremely well for vulnerability pattern detection, taint analysis, and exploit generation from source. With local mode, your entire codebase stays on your machine - no source code ever transmitted.
How do I use local mode on an air-gapped network?
Install Ollama and pull your chosen models while you have internet access. Then disconnect the machine from the internet - PhantomYerra's local mode continues to work perfectly. For model updates, re-connect briefly, run ollama pull <model>, and disconnect again. The initial model download is the only internet access required.
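If the air-gapped machine can never touch the internet, a common alternative is to pull models on a connected machine and carry them over on removable media. A hedged sketch, assuming Ollama's default model store on Linux/macOS (`~/.ollama/models`, overridable via the real `OLLAMA_MODELS` environment variable):

```shell
# On the internet-connected machine: archive the Ollama model store.
# Assumes the default store location unless OLLAMA_MODELS is set.
MODEL_DIR="${OLLAMA_MODELS:-$HOME/.ollama/models}"
if [ -d "$MODEL_DIR" ]; then
  tar -czf ollama-models.tar.gz -C "$(dirname "$MODEL_DIR")" "$(basename "$MODEL_DIR")"
  echo "Created ollama-models.tar.gz - copy it to the air-gapped machine"
else
  echo "No model store at $MODEL_DIR - pull a model first"
fi
```

On the air-gapped machine, extract with `tar -xzf ollama-models.tar.gz -C ~/.ollama` (or into your `OLLAMA_MODELS` directory) and restart Ollama; the transferred models then appear in `ollama list`.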