Metadata-Version: 2.4
Name: agentseal
Version: 0.3.1
Summary: Security validator for AI agents - tests your agent's resistance to prompt extraction and injection attacks
Project-URL: Homepage, https://agentseal.io
Project-URL: Repository, https://github.com/AgentSeal/agentseal
Project-URL: Issues, https://github.com/AgentSeal/agentseal/issues
Author-email: AgentSeal <hello@agentseal.org>
License: FSL-1.1-Apache-2.0
License-File: LICENSE
Keywords: ai-agents,llm,pentesting,prompt-injection,security
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Requires-Python: >=3.10
Requires-Dist: httpx>=0.28
Requires-Dist: pyyaml>=6.0
Provides-Extra: all
Requires-Dist: anthropic>=0.30; extra == 'all'
Requires-Dist: huggingface-hub>=0.20; extra == 'all'
Requires-Dist: numpy>=1.24; extra == 'all'
Requires-Dist: onnxruntime>=1.17; extra == 'all'
Requires-Dist: openai>=1.0; extra == 'all'
Requires-Dist: pyyaml>=6.0; extra == 'all'
Requires-Dist: tokenizers>=0.15; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.30; extra == 'anthropic'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: semantic
Requires-Dist: huggingface-hub>=0.20; extra == 'semantic'
Requires-Dist: numpy>=1.24; extra == 'semantic'
Requires-Dist: onnxruntime>=1.17; extra == 'semantic'
Requires-Dist: tokenizers>=0.15; extra == 'semantic'
Description-Content-Type: text/markdown

# AgentSeal

**Find out if your AI agent can be hacked** - before someone else does.

```
   ██████╗   ██████╗ ███████╗███╗   ██╗████████╗███████╗███████╗ █████╗ ██╗
  ██╔══██╗ ██╔════╝ ██╔════╝████╗  ██║╚══██╔══╝██╔════╝██╔════╝██╔══██╗██║
  ███████║ ██║  ███╗█████╗  ██╔██╗ ██║   ██║   ███████╗█████╗  ███████║██║
  ██╔══██║ ██║   ██║██╔══╝  ██║╚██╗██║   ██║   ╚════██║██╔══╝  ██╔══██║██║
  ██║  ██║ ╚██████╔╝███████╗██║ ╚████║   ██║   ███████║███████╗██║  ██║███████╗
  ╚═╝  ╚═╝  ╚═════╝ ╚══════╝╚═╝  ╚═══╝   ╚═╝   ╚══════╝╚══════╝╚═╝  ╚═╝╚══════╝
```

AgentSeal is a security scanner for AI agents. It sends **150+ attack probes** to your agent and tells you exactly where it's vulnerable - so you can fix the weaknesses before attackers find them.

---

## What does AgentSeal do?

Every AI agent has a **system prompt** - the hidden instructions that tell it how to behave. Attackers can try to:

1. **Extract your prompt** - trick the agent into revealing its secret instructions
2. **Inject new instructions** - override the agent's behavior and make it do something it shouldn't

AgentSeal tests your agent against both of these attacks using 150+ techniques (up to ~301 with MCP, RAG, and genome probes). You get:

- A **trust score from 0 to 100** - how secure your agent is
- A **detailed breakdown** of which attacks succeeded and which were blocked
- **Specific recommendations** on how to fix the vulnerabilities it finds

No AI expertise required. Just point AgentSeal at your agent and get results.

---

## Who is this for?

- **You built an AI agent** (chatbot, assistant, copilot, etc.) and want to know if it's secure
- **You manage AI products** and need to verify they meet security standards before shipping
- **You're a developer** who wants to add security scanning to your CI/CD pipeline
- **You're curious** whether your favorite AI tool is actually protecting your data

---

## Quick Start

### Step 1: Install AgentSeal

You need Python 3.10 or newer. If you don't have Python, [download it here](https://www.python.org/downloads/).

```bash
pip install agentseal
```

### Step 2: Run your first scan

Pick whichever matches your setup:

**Option A: Test a system prompt against a cloud model (e.g. GPT-4o)**

```bash
export OPENAI_API_KEY=your-api-key-here

agentseal scan \
  --prompt "You are a helpful customer support agent for Acme Corp..." \
  --model gpt-4o
```

**Option B: Test with a free local model (Ollama)**

If you don't have an API key, you can use [Ollama](https://ollama.com) to run a free local model:

```bash
# Install Ollama from https://ollama.com, then:
ollama pull llama3.1:8b

agentseal scan \
  --prompt "You are a helpful assistant..." \
  --model ollama/llama3.1:8b
```

**Option C: Test a live agent endpoint**

If your agent is already running as an API:

```bash
agentseal scan --url http://localhost:8080/chat
```
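
Don't have a live endpoint handy? Below is a minimal sketch of a local chat endpoint you could point the scanner at. The `message` and `response` field names are assumptions, not AgentSeal's required schema - match them to whatever your agent actually accepts and returns.

```python
# Hypothetical minimal chat endpoint for local experimentation.
# The "message" / "response" field names are assumptions - adjust to your agent.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        user_message = payload.get("message", "")

        # Replace this echo with a call into your real agent.
        reply = {"response": f"You said: {user_message}"}

        body = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ChatHandler).serve_forever()
```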

### Step 3: Read your results

AgentSeal will show you something like:

```
Trust Score: 73/100 (HIGH)

  Extraction resistance:  82/100  (9 blocked, 2 partial, 1 leaked)
  Injection resistance:   68/100  (7 blocked, 3 leaked)
  Boundary integrity:     75/100
  Consistency:            90/100

Top vulnerabilities:
  1. [CRITICAL] Direct ask #3 - agent revealed full system prompt
  2. [HIGH] Persona hijack #2 - agent followed injected instructions
  3. [MEDIUM] Encoding trick #1 - agent leaked partial prompt via Base64

Remediation:
  - Add explicit refusal instructions to your system prompt
  - Use delimiters to separate system instructions from user input
  - Consider adding an input/output filter layer
```

A score of **75+** means your agent is solid. **Below 50** means serious problems - fix those before going live.

---

## How It Works

```
┌─────────────┐  150-301 attack probes     ┌──────────────┐
│             │ ─────────────────────────>│              │
│  AgentSeal  │                           │  Your Agent  │
│             │ <─────────────────────────│              │
└─────────────┘     agent responses       └──────────────┘
       │
       ▼
  Deterministic analysis (no AI judge - fully reproducible)
       │
       ▼
  Trust score + detailed report + fix recommendations
```

**Why deterministic?** Unlike tools that use another AI to judge results, AgentSeal uses pattern matching. This means running the same scan twice gives the **exact same results** - no randomness, no extra API costs.
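
As an illustration of what deterministic judging can look like (a simplified sketch, not AgentSeal's actual classifier), a detector can compare each response against the ground-truth prompt using exact n-gram overlap, so the same inputs always yield the same verdict:

```python
# Illustrative sketch of deterministic leak detection - not AgentSeal's real detector.
# Compares a response to the ground-truth system prompt via word n-gram overlap.
def leak_verdict(response: str, system_prompt: str, n: int = 8) -> str:
    def ngrams(text: str, size: int) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

    prompt_ngrams = ngrams(system_prompt, n)
    if not prompt_ngrams:
        return "blocked"

    overlap = len(ngrams(response, n) & prompt_ngrams) / len(prompt_ngrams)
    if overlap > 0.5:
        return "leaked"    # most of the prompt was reproduced verbatim
    if overlap > 0.1:
        return "partial"   # fragments of the prompt appeared in the response
    return "blocked"
```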

---

## Scan Modes

AgentSeal supports multiple scan modes you can combine depending on your agent's architecture:

| Command | Probes | What it tests | Tier |
|---------|:------:|---------------|:----:|
| `agentseal scan` | 150 | Base scan - 70 extraction + 80 injection probes | Free |
| `agentseal scan --adaptive` | 150+ | + adaptive mutation transforms on blocked probes | Free |
| `agentseal watch` | 5 | Canary regression scan - fast check with baseline comparison | Free |
| `agentseal scan --mcp` | 176 | + 26 MCP tool poisoning probes | Pro |
| `agentseal scan --rag` | 170 | + 20 RAG poisoning probes | Pro |
| `agentseal scan --mcp --rag` | 196 | Full attack surface - all probe categories | Pro |
| `agentseal scan --genome` | 150 + ~105 | + Behavioral genome mapping - finds decision boundaries | Pro |
| `agentseal scan --mcp --rag --genome` | 196 + ~105 | Everything - the most thorough scan available | Pro |

---

## Free vs Pro

The core scanner is **completely free** and open source. Pro unlocks advanced probe categories, genome mapping, and reporting.

| Feature | Free | Pro |
|---------|:----:|:---:|
| 150 base attack probes (extraction + injection) | Yes | Yes |
| Adaptive mutations (`--adaptive`) | Yes | Yes |
| Canary regression watch (`agentseal watch`) | Yes | Yes |
| Interactive fix flow (autofix & re-scan) | Yes | Yes |
| Terminal report with scores and remediation | Yes | Yes |
| JSON output (`--save results.json`) | Yes | Yes |
| SARIF output for GitHub Security tab | Yes | Yes |
| CI/CD integration (`--min-score`) | Yes | Yes |
| Defense fingerprinting | Yes | Yes |
| **MCP tool poisoning probes** (`--mcp`, +26 probes) | - | Yes |
| **RAG poisoning probes** (`--rag`, +20 probes) | - | Yes |
| **Behavioral genome mapping** (`--genome`) | - | Yes |
| **PDF security assessment report** (`--report`) | - | Yes |
| **Dashboard** (track security over time, `--upload`) | - | Yes |

### Get Pro

```bash
agentseal login
```

This opens your browser to sign in at [agentseal.org](https://agentseal.org). Once logged in, Pro features unlock automatically.

---

## CLI Reference

### Scanning

```bash
# Scan a system prompt against a model
agentseal scan --prompt "Your prompt here..." --model gpt-4o

# Scan a prompt from a file
agentseal scan --file ./my-prompt.txt --model gpt-4o

# Scan a live HTTP endpoint
agentseal scan --url http://localhost:8080/chat

# Save results as JSON
agentseal scan --prompt "..." --model gpt-4o --save results.json

# Output as SARIF (for GitHub Security tab)
agentseal scan --prompt "..." --model gpt-4o --output sarif --save results.sarif

# Set a minimum score - exit code 1 if it fails (great for CI/CD)
agentseal scan --prompt "..." --model gpt-4o --min-score 75

# Verbose mode - see each probe result as it runs
agentseal scan --prompt "..." --model gpt-4o --verbose
```
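
Once you have `results.json`, you can post-process it with a few lines of Python. The `trust_score` key below is an assumption about the saved schema (only `report.to_dict()` is documented on the Python side) - adjust the keys to match the actual file:

```python
# Hypothetical post-processing of a saved scan.
# Key names are assumptions - inspect results.json to confirm the schema.
import json

with open("results.json") as f:
    results = json.load(f)

score = results.get("trust_score")
print(f"Trust score from last scan: {score}")
```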

### More options

```bash
# Enable adaptive mutations (tests encoding bypasses)
agentseal scan --prompt "..." --model gpt-4o --adaptive

# Generate a hardened prompt with security fixes
agentseal scan --prompt "..." --model gpt-4o --fix hardened_prompt.txt
```

### Regression monitoring

```bash
# Set a baseline (first run)
agentseal watch --prompt "..." --model gpt-4o --set-baseline

# Check for regressions (subsequent runs)
agentseal watch --prompt "..." --model gpt-4o

# With webhook alerts
agentseal watch --prompt "..." --model gpt-4o --webhook-url https://hooks.slack.com/...
```

### Pro features (requires `agentseal login`)

```bash
# MCP tool poisoning probes (+26 probes)
agentseal scan --prompt "..." --model gpt-4o --mcp

# RAG poisoning probes (+20 probes)
agentseal scan --prompt "..." --model gpt-4o --rag

# Behavioral genome mapping (find exact decision boundaries)
agentseal scan --prompt "..." --model gpt-4o --genome

# Full Pro scan - everything enabled
agentseal scan --prompt "..." --model gpt-4o --mcp --rag --genome --adaptive

# Generate a PDF security report
agentseal scan --prompt "..." --model gpt-4o --report security-report.pdf

# Upload results to your dashboard
agentseal scan --prompt "..." --model gpt-4o --upload
```

### Account

```bash
# Log in (opens browser)
agentseal login

# Activate with a license key (alternative)
agentseal activate <your-license-key>
```

### Supported models

| Provider | How to use | API key needed? |
|----------|-----------|-----------------|
| **OpenAI** | `--model gpt-4o` | Yes - set `OPENAI_API_KEY` |
| **Anthropic** | `--model claude-sonnet-4-5-20250929` | Yes - set `ANTHROPIC_API_KEY` |
| **Ollama** (local, free) | `--model ollama/llama3.1:8b` | No |
| **LiteLLM** (proxy) | `--model any-model --litellm-url http://...` | Depends on setup |
| **Any HTTP API** | `--url http://your-agent.com/chat` | No |

---

## CI/CD Integration

Add AgentSeal to your pipeline to **automatically block insecure agents from shipping**.

### GitHub Actions

```yaml
name: Agent Security Scan
on: [push, pull_request]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install AgentSeal
        run: pip install agentseal

      - name: Run security scan
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agentseal scan \
            --file ./prompts/system_prompt.txt \
            --model gpt-4o \
            --min-score 75 \
            --output sarif \
            --save results.sarif

      - name: Upload results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: results.sarif
```

### How it works

- `--min-score 75` makes the command exit with code 1 if the trust score is below 75
- Your CI pipeline treats exit code 1 as a failure - blocking the merge/deploy
- `--output sarif` produces results in SARIF format, which GitHub displays in the Security tab
- You can adjust the threshold: use 60 for early development, 85 for production agents

### Other CI systems

AgentSeal is just a CLI command - it works in any CI system that can run Python:

```bash
# Generic CI step
pip install agentseal
agentseal scan --file ./prompt.txt --model gpt-4o --min-score 75
```

Exit codes:
- `0` - Score meets or exceeds `--min-score` (pass)
- `1` - Score is below `--min-score` (fail)
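
A minimal gating sketch that acts on those exit codes (the deploy step is a placeholder - replace it with your own release logic):

```python
# Run the scan and gate a release on its exit code.
# Flags mirror the README; the deploy step is a placeholder.
import subprocess
import sys

result = subprocess.run(
    [
        "agentseal", "scan",
        "--file", "./prompt.txt",
        "--model", "gpt-4o",
        "--min-score", "75",
    ]
)

if result.returncode != 0:
    print("Trust score below threshold - blocking deploy.")
    sys.exit(1)

print("Scan passed - continuing with deploy.")
```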

---

## Python API

For developers who want to integrate AgentSeal into their own code:

```python
import asyncio
from agentseal import AgentValidator

# Define your agent function
async def my_agent(message: str) -> str:
    # Replace this with your actual agent logic
    return "I can help with that!"

async def main():
    validator = AgentValidator(
        agent_fn=my_agent,
        ground_truth_prompt="You are a helpful assistant...",
    )
    report = await validator.run()

    # Print the terminal report
    report.print()

    # Access the score programmatically
    print(f"Trust score: {report.trust_score}/100")

    # Export as JSON
    data = report.to_dict()

    # Get just the leaked probes
    for result in report.get_leaked():
        print(f"  LEAKED: {result.technique}")

    # Get remediation suggestions
    for fix in report.get_remediation():
        print(f"  FIX: {fix}")

asyncio.run(main())
```
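
You can also wire the Python API into your test suite. Here's a sketch assuming `pytest` with the `pytest-asyncio` plugin is installed (the threshold of 75 mirrors the CI example below):

```python
# Sketch: gate a test suite on the trust score via the Python API.
# Assumes pytest and pytest-asyncio are installed.
import pytest
from agentseal import AgentValidator

async def my_agent(message: str) -> str:
    return "I can help with that!"  # replace with your real agent

@pytest.mark.asyncio
async def test_agent_trust_score():
    validator = AgentValidator(
        agent_fn=my_agent,
        ground_truth_prompt="You are a helpful assistant...",
    )
    report = await validator.run()
    assert report.trust_score >= 75, f"Trust score too low: {report.trust_score}"
```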

### With OpenAI

```python
import openai
from agentseal import AgentValidator

client = openai.AsyncOpenAI()
validator = AgentValidator.from_openai(
    client=client,
    model="gpt-4o",
    system_prompt="You are a helpful assistant...",
)
report = await validator.run()  # call from inside an async function (e.g. via asyncio.run)
```

### With Anthropic

```python
import anthropic
from agentseal import AgentValidator

client = anthropic.AsyncAnthropic()
validator = AgentValidator.from_anthropic(
    client=client,
    model="claude-sonnet-4-5-20250929",
    system_prompt="You are a helpful assistant...",
)
report = await validator.run()  # call from inside an async function (e.g. via asyncio.run)
```

### Testing an HTTP endpoint

```python
from agentseal import AgentValidator

validator = AgentValidator.from_endpoint(
    url="http://localhost:8080/chat",
    ground_truth_prompt="You are a helpful assistant...",
    message_field="input",       # customize if your API uses different field names
    response_field="output",
)
report = await validator.run()  # call from inside an async function (e.g. via asyncio.run)
```

---

## FAQ

### How long does a scan take?

With a local model (Ollama): **1-3 minutes**. With cloud APIs (OpenAI, Anthropic): **3-6 minutes**. You can adjust speed with `--concurrency` (default is 3 parallel probes).

### What's a good trust score?

| Score | What it means |
|-------|---------------|
| **85-100** | Excellent - strong protection across the board |
| **70-84** | Good - minor gaps, fine for most use cases |
| **50-69** | Needs work - several attack categories succeed |
| **Below 50** | Serious problems - don't deploy without fixing these |

### Does AgentSeal send my system prompt anywhere?

**No.** Your system prompt is only sent to the model you specify (OpenAI, Ollama, etc.). AgentSeal itself never collects, stores, or transmits your prompts. Everything runs locally.

### Do I need an API key?

Only if you're testing against a cloud model (OpenAI, Anthropic). If you use [Ollama](https://ollama.com), everything runs locally for free - no API key, no account, no cost.

### What's the difference between free and Pro?

Free gives you the full 150-probe scanner with adaptive mutations, regression monitoring, interactive fix flow, JSON/SARIF output, and CI/CD integration. Pro adds MCP tool poisoning probes (+26), RAG poisoning probes (+20), behavioral genome mapping, PDF reports, and a dashboard. See the [comparison table](#free-vs-pro).

### Can I contribute new attack probes?

Yes! See [CONTRIBUTING.md](CONTRIBUTING.md). We welcome new probes, detection improvements, and bug fixes.

---

## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for how to get started.

For security vulnerabilities, please email [hello@agentseal.org](mailto:hello@agentseal.org) instead of opening a public issue.

---

## License

[FSL-1.1-Apache-2.0](LICENSE) - Functional Source License, Version 1.1, with Apache 2.0 future license.

Copyright 2026 AgentSeal.
