Metadata-Version: 2.4
Name: bdv
Version: 0.2.0
Summary: Blind Data Vault — privilege-separated secret vault for AI agents
Author: Calvin Sienatra
License-Expression: MIT
Project-URL: Homepage, https://github.com/calvinsienatra/bdv
Project-URL: Repository, https://github.com/calvinsienatra/bdv
Keywords: vault,secrets,ai,pii,encryption,age
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Provides-Extra: daemon
Requires-Dist: pyyaml>=6.0; extra == "daemon"
Requires-Dist: pdfplumber>=0.10; extra == "daemon"
Requires-Dist: pytesseract>=0.3; extra == "daemon"
Requires-Dist: pdf2image>=1.16; extra == "daemon"
Requires-Dist: Pillow>=10.0; extra == "daemon"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Dynamic: license-file

# BDV (Blind Data Vault)

A privilege-separated, encrypted secret vault that lets AI agents **verify, send, and act on sensitive data without ever seeing it**.

The daemon runs in a Docker container. The AI agent talks to it via Unix socket or TCP. Even if the agent is fully compromised, it **cannot decrypt secrets**, because the privilege boundary is enforced at the OS/container level.

```bash
pip install bdv      # client only
docker compose up -d # full daemon
```

For the paradigm rationale, see [WHITEPAPER.md](WHITEPAPER.md). For design decisions, see [ARCHITECTURE.md](ARCHITECTURE.md).

## Quick Start

### Docker

```bash
docker compose up -d
docker exec bdv bdv list
docker exec bdv bdv write ssn --value "123-45-6789" --type numeric --desc "US SSN"
docker exec bdv bdv check ssn /data/inbox/form.pdf
```

### pip

The client connects to a running BDV daemon. Start the daemon first (via Docker or manual setup), then:

```bash
pip install bdv
bdv list
bdv write ssn --value "123-45-6789" --type numeric --desc "US SSN"
```

### TCP

```bash
# Daemon: BDV_SERVER_TCP_ENABLED=true BDV_SERVER_TCP_TOKEN=mytoken
bdv --tcp localhost:9652 --token mytoken list
```

## What It Does

The agent sends commands. The daemon handles all crypto, file reading, and pattern matching internally. The agent only receives redacted results.

| Command | Parameters | What the agent gets back |
|---------|-----------|-------------------------|
| `list` | | Secret names + descriptions (never values) |
| `check_document` | secret name, file path | MATCH/NO MATCH with redacted context |
| `send` | template, subject | "Sent. 2 secrets injected." (never the content) |
| `write` | name, value, type | "Encrypted and stored." |
| `delete` | name | "Deleted." |
| `describe` | name, description | "Updated." |

### Secret Types

| Type | What it does | Examples |
|------|-------------|----------|
| `numeric` | Pattern-based sweep with anomaly report | SSN, routing numbers, credit cards |
| `exact` | Case-sensitive exact match | Email addresses, API keys |
| `text` | Case-insensitive with component decomposition | Names, addresses |

### Namespaced Secrets

Store multiple values under a single name using colon syntax:

```bash
bdv write email:personal --value "me@gmail.com" --type exact --desc "Personal"
bdv write email:work --value "me@corp.com" --type exact --desc "Corporate"

# Check all values at once
bdv check email /data/inbox/form.pdf
# RESULT: Checked 2 values under 'email'. 1 found, 1 not found.

# Or a specific label
bdv check email:work /data/inbox/form.pdf

# Delete one label or the entire group
bdv delete email:work    # one label
bdv delete email         # all values
```

Flat secrets (names without a colon) continue to work unchanged.
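
The group/label behavior above can be sketched as a small name resolver. This is an illustration only: `split_name` and `resolve_targets` are hypothetical helpers, and how the daemon actually stores and looks up namespaced secrets is not described here.

```python
from typing import Optional


def split_name(name: str) -> tuple[str, Optional[str]]:
    """Split 'group:label' into its parts; flat names have no label."""
    group, sep, label = name.partition(":")
    return (group, label if sep else None)


def resolve_targets(name: str, stored: list[str]) -> list[str]:
    """Return the stored names that a check or delete on `name` would touch."""
    group, label = split_name(name)
    if label is not None:
        # A specific label addresses exactly one value.
        return [n for n in stored if n == name]
    # A bare name addresses a flat secret or every label in the group.
    return [n for n in stored if n == group or n.startswith(group + ":")]
```

With `email:personal`, `email:work`, and `ssn` stored, resolving `email` yields both email entries, while `email:work` and `ssn` each yield a single entry.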

### Supported Document Formats

PDF (text + scanned via OCR), images (Tesseract), plain text (txt/md/csv/json/xml).

## Python Client

```python
from client import BDVClient

# Unix socket (default: /run/bdv/bdv.sock)
client = BDVClient()

# Or TCP
client = BDVClient(host="localhost", port=9652, token="...")

client.list()
client.write("ssn", "123-45-6789", description="US SSN", secret_type="numeric")
client.check_document("ssn", "/data/inbox/form.pdf")
client.send(template="Your SSN is {{VAULT:ssn}}.", subject="Details")
client.delete("ssn")
client.describe("ssn", "Updated description")
```
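
The `{{VAULT:ssn}}` placeholder in the `send` call is resolved inside the daemon, and the agent only learns how many secrets were injected. A rough sketch of that substitution step follows; `render_template`, `send_summary`, the `lookup` callable, and the set of characters allowed in a placeholder name are all assumptions made for illustration.

```python
import re

# Assumed placeholder grammar: {{VAULT:name}} or {{VAULT:group:label}}.
PLACEHOLDER = re.compile(r"\{\{VAULT:([A-Za-z0-9_:.-]+)\}\}")


def render_template(template: str, lookup) -> tuple[str, int]:
    """Replace each placeholder with lookup(name); return (body, count)."""
    count = 0

    def _sub(match: re.Match) -> str:
        nonlocal count
        count += 1
        return lookup(match.group(1))

    body = PLACEHOLDER.sub(_sub, template)
    return body, count


def send_summary(template: str, lookup) -> str:
    """Render the body for the output channel; report only the count."""
    _body, count = render_template(template, lookup)
    # The rendered body would go to the output channel here.
    # Only this summary string travels back to the agent.
    return f"Sent. {count} secrets injected."
```

The design point is the return value: the rendered body never appears in the string handed back to the caller.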

## Configuration

Three layers: **defaults** → **YAML** → **env vars** (env wins).

```bash
cp bdv.example.yml bdv.yml  # see all options
```

| Variable | Default | Description |
|----------|---------|-------------|
| `BDV_CONFIG` | | Path to YAML config file |
| `BDV_SERVER_SOCKET_PATH` | `/run/bdv/bdv.sock` | Unix socket path |
| `BDV_SERVER_TCP_ENABLED` | `false` | Enable TCP listener |
| `BDV_SERVER_TCP_HOST` | `127.0.0.1` | TCP listen address |
| `BDV_SERVER_TCP_PORT` | `9652` | TCP port |
| `BDV_SERVER_TCP_TOKEN` | | Required auth token for TCP |
| `BDV_STORAGE_VAULT_DIR` | `/data/secrets` | Encrypted secret storage |
| `BDV_STORAGE_INBOX_DIR` | `/data/inbox` | Write-only document inbox |
| `BDV_STORAGE_OUTBOX_DIR` | `/data/outbox` | Output directory (file channel) |
| `BDV_OUTPUT_TYPE` | `file` | Output channel: `file`, `smtp`, `webhook` |
| `BDV_PLUGINS_DIR` | `/plugins` | Directory for plugin `.py` files |
| `BDV_SOCKET` | | Client-side override for the socket path (alternative to `--socket`) |
| `BDV_TOKEN` | | Client-side override for the TCP auth token (alternative to `--token`) |
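
The precedence rule (defaults, then YAML, then environment variables) can be sketched as follows. This is an assumption-laden illustration: it uses flat dotted keys, takes the YAML layer as an already-parsed dict, and covers only string and integer settings. The real `BDVConfig` loader and YAML layout may differ.

```python
import os

# A few defaults taken from the table above, under hypothetical dotted keys.
DEFAULTS = {
    "server.socket_path": "/run/bdv/bdv.sock",
    "server.tcp.port": 9652,
    "output.type": "file",
}


def env_name(key: str) -> str:
    """Map 'server.tcp.port' to 'BDV_SERVER_TCP_PORT'."""
    return "BDV_" + key.replace(".", "_").upper()


def resolve(yaml_values: dict, environ=os.environ) -> dict:
    """Layer defaults, then YAML values, then environment variables."""
    resolved = dict(DEFAULTS)
    # Layer 2: values from the YAML file override defaults.
    resolved.update({k: v for k, v in yaml_values.items() if k in DEFAULTS})
    # Layer 3: environment variables override everything.
    for key, default in DEFAULTS.items():
        raw = environ.get(env_name(key))
        if raw is not None:
            # Coerce to the default's type (str or int in this sketch).
            resolved[key] = type(default)(raw)
    return resolved
```

For example, a YAML port of `9000` combined with `BDV_SERVER_TCP_PORT=9700` in the environment resolves to `9700`.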

## Extensibility

### Custom commands

New agent-invocable actions can be added as plugins. Drop a `.py` file into the plugins directory and mount it as a volume. No fork or rebuild needed.

```yaml
# docker-compose.yml
volumes:
  - ./plugins:/plugins:ro
```

Each plugin defines a `COMMANDS` dict mapping command names to handlers:

```python
# plugins/fill_form.py
from pathlib import Path

from daemon.channels.base import OutputChannel
from daemon.config import BDVConfig
from daemon.crypto import decrypt


def cmd_fill_form(params: dict, config: BDVConfig, channel: OutputChannel) -> str:
    vault_dir = Path(config.storage.vault_dir)
    key_file = vault_dir / ".vault-key"
    # The plaintext exists only inside the daemon process; never return it.
    secret = decrypt(params["name"], vault_dir, key_file).strip()
    # ... fill the form, write to outbox ...
    return "Form filled. 3 fields populated. Output written to outbox."


# Maps the command name the agent sends to its handler.
COMMANDS = {"fill_form": cmd_fill_form}
```

The daemon loads plugins on startup. Built-in commands cannot be overridden.
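
A startup loader with these two properties might look like the sketch below. It is not the daemon's actual loader: `load_plugin_commands` and `BUILTIN_COMMANDS` are hypothetical names, and silently skipping a colliding name is one possible policy (the real daemon may log or reject instead).

```python
import importlib.util
from pathlib import Path

# Built-in command names from the command table; plugins may not replace them.
BUILTIN_COMMANDS = {"list", "check_document", "send", "write", "delete", "describe"}


def load_plugin_commands(plugins_dir: str) -> dict:
    """Import every .py file in plugins_dir and collect its COMMANDS dict."""
    commands = {}
    for path in sorted(Path(plugins_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(f"bdv_plugin_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for name, handler in getattr(module, "COMMANDS", {}).items():
            if name in BUILTIN_COMMANDS:
                continue  # built-ins always win
            commands[name] = handler
    return commands
```

Sorting the file list keeps the load order deterministic, which matters if two plugins register the same command name.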

### Custom output channels

New delivery mechanisms for the `send` command work the same way. Subclass `OutputChannel` and export a `CHANNELS` dict:

```python
# plugins/s3_channel.py
from daemon.channels.base import OutputChannel

class S3Channel(OutputChannel):
    def __init__(self, config):
        self.bucket = "my-vault-output"

    def send(self, body, subject, *, original_message_id=""):
        # ... upload to S3 ...
        return f"Uploaded to s3://{self.bucket}/..."

CHANNELS = {"s3": S3Channel}
```

Then set `BDV_OUTPUT_TYPE=s3` to use it. Built-in channels (`file`, `smtp`, `webhook`) cannot be overridden.

### Available imports for plugins

Plugins run inside the daemon process, so all `daemon.*` modules are available. For local development outside the container, `pip install "bdv[daemon]"` makes these importable (the quotes keep shells such as zsh from interpreting the brackets).

| Import | What it provides |
|--------|-----------------|
| `daemon.crypto.decrypt(name, vault_dir, key_file)` | Decrypt a secret by name |
| `daemon.crypto.encrypt(name, value, vault_dir, key_file)` | Encrypt and store a secret |
| `daemon.crypto.secret_path(name, vault_dir)` | Resolve a name to its `.age` file path |
| `daemon.document.extract_text(file_path)` | Extract text from PDF, image, or text file |
| `daemon.patterns.derive_patterns(name, secret)` | Auto-derive regex from a secret value |
| `daemon.patterns.build_context_window(...)` | Build a redacted context window around a match |
| `daemon.patterns.normalize_secret(secret)` | Strip delimiters from a secret for comparison |
| `daemon.config.BDVConfig` | Config dataclass (passed as `config` param) |
| `daemon.channels.base.OutputChannel` | Base class for output channels (passed as `channel` param) |

Every handler receives `(params: dict, config: BDVConfig, channel: OutputChannel)` and returns a `str`. The return value is sent back to the agent, so it must never contain plaintext secrets.
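
One way to guard this rule while developing a plugin is to wrap handlers in a leak check during tests. The helpers below are hypothetical and not part of BDV; they only catch a secret that appears verbatim or with common delimiters stripped, so they complement careful review rather than replace it.

```python
import re


def _strip_delims(s: str) -> str:
    """Remove whitespace and common delimiters for a loose comparison."""
    return re.sub(r"[\s\-.()]", "", s)


def leaks_secret(result: str, secret: str) -> bool:
    """True if result contains secret verbatim or with delimiters stripped."""
    if secret in result:
        return True
    bare = _strip_delims(secret)
    return bool(bare) and bare in _strip_delims(result)


def checked(handler, secrets):
    """Wrap a handler so tests fail loudly when its return value leaks."""
    def wrapper(params, config, channel):
        result = handler(params, config, channel)
        for secret in secrets:
            if leaks_secret(result, secret):
                raise AssertionError("handler return value contains a secret")
        return result
    return wrapper
```

In a test suite, a handler would be wrapped with the known test secrets and then called with stub `config` and `channel` objects.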

## Security Model

| Threat | Mitigation |
|--------|-----------|
| Prompt injection exfiltrates secrets | Agent can't decrypt (wrong container/user) |
| Agent runs `cat` on vault files | Permission denied (OS-level) |
| Malicious regex (ReDoS) | Patterns derived from trusted vault data only |
| Secrets leak into logs/memory/vector DB | Values never enter agent context |
| Adjacent PII in context windows | Aggressive redaction of 3+ digit sequences |

See [WHITEPAPER.md](WHITEPAPER.md) for the paradigm rationale and FAQ.
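
The last row of the table, redaction of 3+ digit sequences, can be illustrated with a single regular expression. This is a sketch of the idea only: `redact_digits` is a hypothetical name, and the mask character and same-length masking are assumptions rather than BDV's actual output format.

```python
import re


def redact_digits(context: str, mask: str = "*") -> str:
    """Mask every run of three or more consecutive digits."""
    return re.sub(r"\d{3,}", lambda m: mask * len(m.group()), context)


print(redact_digits("Acct 4111 exp 12/27, ref 98765"))
# prints: Acct **** exp 12/27, ref *****
```

Short runs such as `12` and `27` survive, which is why the rule is applied to context windows around a match rather than relied on as the only safeguard.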

### Security Disclaimer

BDV protects secrets **from the AI agent**. It is not a general-purpose secrets management solution.

**What it does NOT cover:**
- **Physical server security.** Anyone with root, physical, or disk access to the host can extract the private key. If you run on a VPS, your hosting provider can in principle read the disk.
- **Transport security.** The daemon protects secrets from the LLM, not during delivery to the end user. Use TLS, VPN (Tailscale/WireGuard), or PGP encryption for transport.
- **Backup security.** If your backup system captures the vault directory in plaintext, the secrets are exposed. Encrypt backups independently.
- **Memory forensics.** During active decryption (a few milliseconds per operation), the plaintext exists in the daemon's process memory.

**Recommendation:** Run BDV on a dedicated machine (not a shared VPS) behind a VPN mesh like Tailscale, with full-disk encryption (LUKS) enabled.

## Requirements

**Docker (recommended):** All dependencies included in the image.

**Manual:** Python 3.10+, age, pdfplumber, pytesseract + tesseract-ocr, pdf2image + poppler-utils.

**Client only:** Python 3.10+, click (`pip install bdv`).

## License

MIT
