Metadata-Version: 2.4
Name: certisigma-census
Version: 1.14.0
Summary: Cryptographic file inventory and exfiltration detection — powered by CertiSigma
Project-URL: Homepage, https://certisigma.ch
Project-URL: Documentation, https://developers.certisigma.ch/census
Project-URL: Repository, https://github.com/massimocavallin/certisigma-census
Project-URL: Issues, https://github.com/massimocavallin/certisigma-census/issues
Author: Ten Sigma Sagl
License-Expression: MIT
License-File: LICENSE
Keywords: attestation,breach-detection,cryptography,file-integrity,forensics
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security :: Cryptography
Classifier: Topic :: System :: Filesystems
Requires-Python: >=3.10
Requires-Dist: certisigma>=1.9.0
Requires-Dist: click>=8.1
Requires-Dist: tomli>=2.0; python_version < '3.11'
Provides-Extra: dev
Requires-Dist: click-man>=0.5; extra == 'dev'
Requires-Dist: fpdf2>=2.8.0; extra == 'dev'
Requires-Dist: hypothesis>=6.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: pyyaml>=6.0; extra == 'dev'
Requires-Dist: watchdog>=4.0.0; extra == 'dev'
Provides-Extra: report
Requires-Dist: fpdf2>=2.8.0; extra == 'report'
Provides-Extra: watch
Requires-Dist: watchdog>=4.0.0; extra == 'watch'
Description-Content-Type: text/markdown

# CertiSigma Census

[![Test](https://github.com/massimocavallin/certisigma-census/actions/workflows/test.yml/badge.svg)](https://github.com/massimocavallin/certisigma-census/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/certisigma-census)](https://pypi.org/project/certisigma-census/)
[![Python](https://img.shields.io/pypi/pyversions/certisigma-census)](https://pypi.org/project/certisigma-census/)
[![Coverage](https://img.shields.io/badge/coverage-83%25-green)](https://github.com/massimocavallin/certisigma-census)

Cryptographic file inventory and exfiltration detection — powered by [CertiSigma](https://certisigma.ch).

Census scans directories, computes SHA-256 hashes, attests them via the CertiSigma API (three-layer cryptographic proof: ECDSA T0, qualified TSA T1, Bitcoin T2), and maintains a local manifest. When suspect files surface, Census compares their hashes against the registry to prove — with cryptographic certainty — whether they match inventoried assets.

## Installation

```bash
pip install certisigma-census

# With watch mode (filesystem monitoring)
pip install "certisigma-census[watch]"

# With PDF report generation
pip install "certisigma-census[report]"

# Everything
pip install "certisigma-census[watch,report]"
```

Requires Python 3.10+. TOML config support on Python 3.10 uses `tomli` (auto-installed).

## Quick Start

### 1. Inventory scan

```bash
export CERTISIGMA_API_KEY=cs_...

# Scan a directory and attest all file hashes
census scan /path/to/sensitive-files --source inventory-hr

# Dry run — hash only, no attestation
census scan /path/to/files --dry-run

# Scan only PDFs and Word docs, skip files over 100 MB
census scan /data --include "*.pdf" --include "*.docx" --max-size 100M

# Resume an interrupted scan
census scan /data --source quarterly --manifest inventory.db --resume

# Parallel hashing for large directories (4 CPU cores)
census scan /data --workers 4

# Attest the manifest itself (proves manifest existed at scan time)
census scan /data --attest-manifest
```

By default this writes `.census-manifest.db` (SQLite) into the scanned directory (override with `--manifest`). The manifest maps each hash to its file path, size, and attestation metadata.

### 2. Breach comparison

```bash
# Compare suspect files against the CertiSigma registry
census compare /path/to/suspect-files --manifest /path/to/.census-manifest.db

# Save report as JSON or CSV
census compare /suspect --output report.json
census compare /suspect --output report.csv
```

Exit code: `0` if no matches, `1` if matches found.

### 3. Manifest status and export

```bash
# Show summary
census status /path/to/.census-manifest.db

# Export manifest as CSV for compliance reporting
census export manifest.db --format csv --output inventory.csv

# Export as JSON
census export manifest.db --format json --output inventory.json

# Export as sha256sum (GNU coreutils compatible — works with sha256sum -c)
census export manifest.db --format sha256sum --output checksums.sha256
```
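
The `sha256sum` export uses the standard coreutils line format: 64 hex digits, a separator (two spaces, or a space and `*` for binary mode), then the path. To illustrate what `sha256sum -c` does with such a file, here is a minimal Python sketch that parses that format and re-hashes each listed file. The function names are hypothetical, and the sketch does not handle coreutils' backslash-escaped filenames.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so memory use stays constant."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sha256sums(checksum_file: Path, base_dir: Path) -> dict[str, bool]:
    """Return {path: matches} for every line of a sha256sum-format file.

    Accepted lines: '<64 hex>  <path>' (text mode) or '<64 hex> *<path>'
    (binary mode). Escaped filenames are not supported in this sketch.
    """
    results: dict[str, bool] = {}
    for line in checksum_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, sep, name = line[:64], line[64:66], line[66:]
        if sep not in ("  ", " *"):
            raise ValueError(f"malformed line: {line!r}")
        target = base_dir / name  # absolute paths in the file override base_dir
        results[name] = target.is_file() and sha256_file(target) == expected.lower()
    return results
```

Relative paths are resolved against `base_dir`, much as `sha256sum -c` resolves them against the current directory; a missing file simply reports `False`.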

### 4. Evidence verification

```bash
# Verify a hash against the CertiSigma registry
census verify a1b2c3d4e5f67890...

# Verify a file (hash it first, then check)
census verify /path/to/document.pdf --file

# Full-chain manifest verification (all hashes against the registry)
census verify-manifest inventory.db --strict
census verify-manifest inventory.db --detailed --json

# Hash from stdin (for pipes and CI/CD); note that echo appends a newline, which is hashed too
echo "data" | census hash --stdin

# Save OpenTimestamps proof
census verify a1b2c3... --save-ots proof.ots
```

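No API key is required for these commands, because the verification endpoints are public. The exception is `verify-manifest --detailed`, which fetches enriched data and needs an API key.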

### 5. Integrity check

```bash
# Check files against manifest baseline
census integrity manifest.db

# Strict mode: exit 1 on any discrepancy
census integrity manifest.db --strict

# Differential mode: only report NEW findings since last run
census integrity manifest.db --since auto --write-state auto
```

100% local operation — no API calls, no network needed.
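
As a mental model for what an integrity check reports, the sketch below compares hashes recorded in a baseline with freshly computed ones and sorts every difference into three buckets: missing, modified, and new. It works on plain `{path: sha256}` dictionaries instead of the real SQLite manifest (an assumption for illustration), and `classify_changes` is a hypothetical helper, not part of Census.

```python
from dataclasses import dataclass, field


@dataclass
class IntegrityReport:
    missing: list[str] = field(default_factory=list)   # in baseline, gone from disk
    modified: list[str] = field(default_factory=list)  # still present, hash differs
    new: list[str] = field(default_factory=list)       # on disk, not in baseline

    @property
    def clean(self) -> bool:
        return not (self.missing or self.modified or self.new)


def classify_changes(baseline: dict[str, str], current: dict[str, str]) -> IntegrityReport:
    """Compare {path: sha256} maps from the baseline and from a fresh walk.

    Plain dicts stand in for the SQLite manifest in this sketch.
    """
    report = IntegrityReport()
    for path, old_hash in sorted(baseline.items()):
        if path not in current:
            report.missing.append(path)
        elif current[path] != old_hash:
            report.modified.append(path)
    report.new = sorted(set(current) - set(baseline))
    return report
```

Differential mode (`--since`) would additionally drop findings already present in the previous run's saved state; that bookkeeping is left out here.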

### 5b. Update baseline (AIDE-style)

```bash
# Accept verified changes into manifest (interactive confirmation)
census update manifest.db

# Non-interactive (CI/cron)
census update manifest.db --yes

# Then attest new hashes
census scan /data --resume --manifest manifest.db
```

Completes the file integrity monitoring (FIM) workflow: detect → review → accept. New entries are unattested until the next scan.

### 6. Forensic reports

```bash
# HTML report (always available, zero dependencies)
census report manifest.db -o report.html

# PDF report (requires: pip install certisigma-census[report])
census report manifest.db -o report.pdf --evidence --integrity

# Evidence bundle: ZIP with report + OTS proofs + checksums
census report manifest.db -o bundle.zip --bundle --evidence

# Attest the report itself (three-layer cryptographic proof)
census report manifest.db -o report.pdf --attest --api-key cs_...
# → writes report.pdf + report.pdf.attestation.json

# Verify a previously attested report
census verify-report report.pdf
```

### 7. Manifest diff

```bash
# Compare two manifests
census diff baseline.db current.db

# HTML diff report
census diff baseline.db current.db -o diff.html

# Machine-readable (exit-code bitmask: 0=none, 1=added, 2=removed, 4=modified)
census diff baseline.db current.db --json
```

### 8. Standalone hashing

```bash
# Hash a file
census hash document.pdf

# Hash a directory
census hash /path/to/files

# Verify against known hash
census hash document.pdf --verify a1b2c3d4e5...
```

### 9. Attestation tracking

```bash
# Check attestation status
census track att_12345

# Wait for Bitcoin anchoring (default)
census track att_12345 --poll --timeout 7200

# Wait for TSA certification only (faster than T2)
census track att_12345 --poll --level T1
```
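
`--poll` amounts to a bounded polling loop: check the current proof level, stop once the target is reached, and give up after the timeout. A minimal sketch follows. `get_level` is a hypothetical callable standing in for the status lookup (it is not the CertiSigma SDK's API), level strings are assumed to be `T0`/`T1`/`T2`, and the defaults mirror the documented `--poll-interval 60` and `--timeout 3600`.

```python
import time
from typing import Callable

# Assumed level names, ordered by proof strength.
LEVELS = {"T0": 0, "T1": 1, "T2": 2}


def wait_for_level(
    get_level: Callable[[], str],
    target: str = "T2",
    poll_interval: float = 60,
    timeout: float = 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once get_level() reaches `target`, False on timeout."""
    deadline = clock() + timeout
    while True:
        if LEVELS[get_level()] >= LEVELS[target]:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a T1 target typically returns long before T2, since Bitcoin anchoring is the slowest layer.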

### 10. Webhooks (T1/T2 lifecycle push notifications)

```bash
# Register a webhook for T1 (TSA) and T2 (Bitcoin) events
census webhook register --url https://hooks.example.com/certisigma \
    --events t1_complete,t2_complete --label prod-monitor \
    --save-secret .census-webhook-secret

# List registered webhooks
census webhook list --json

# View delivery history
census webhook deliveries wh_abc123

# Start a webhook receiver with T1/T2 hooks
census webhook serve --secret-file .census-webhook-secret \
    --on-t1 'echo "T1 certified" | tee -a /var/log/census.log' \
    --on-t2 'curl -X POST https://slack.webhook/...'

# Verify a saved webhook payload (forensic evidence chain)
census webhook verify-payload delivery.json \
    --signature "sha256=abc..." --secret-file .census-webhook-secret

# Delete a webhook
census webhook delete wh_abc123
```

### 10b. Watch mode with full T1/T2 lifecycle

```bash
# Watch with T0 + T1/T2 end-to-end hooks
census watch /data \
    --on-change "jq . >> /var/log/census-changes.jsonl" \
    --on-attest "echo 'T0 attested'" \
    --on-t1 "echo 'T1 TSA certified'" \
    --on-t2 "curl -X POST https://slack.webhook/..." \
    --webhook-secret-file .census-webhook-secret \
    --webhook-port 9514
```

### 10c. Self-diagnostic

```bash
# Run all health checks
census doctor

# Check including a specific manifest
census doctor --manifest inventory.db

# Machine-readable output for CI
census doctor --json
```

### 11. Manifest merging

```bash
# Merge manifests from different servers
census merge server1.db server2.db -o combined.db

# Merge with glob
census merge scans/*.db -o full-inventory.db --json
```

### 12. Audit log

```bash
# View all operations
census audit-log show

# Verify hash chain integrity
census audit-log verify

# Machine-readable
census audit-log show --last 10 --json
```
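
A SHA-256 hash chain makes a log tamper-evident because each entry records the hash of the one before it, so editing, reordering, or removing any earlier line breaks every later link. The sketch below shows that general technique only; the field names (`prev`, `payload`, `hash`) and the canonical-JSON encoding are assumptions, not Census's actual on-disk format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the hash stable.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def append_entry(lines: list[str], payload: dict) -> None:
    """Append one JSONL line chained to the previous entry.

    Field names here are illustrative assumptions.
    """
    prev = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"prev": prev, "payload": payload, "hash": _entry_hash(prev, payload)}
    lines.append(json.dumps(record, sort_keys=True))


def verify_chain(lines: list[str]) -> bool:
    """Recompute every link; edits, reordering, or removal of a non-final line fail."""
    prev = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev"] != prev or record["hash"] != _entry_hash(prev, record["payload"]):
            return False
        prev = record["hash"]
    return True
```

A chain on its own cannot detect a truncated tail (dropping the last lines leaves a valid shorter chain), which is why such logs are often paired with an externally anchored copy of the latest hash.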

### 13. Named snapshots

```bash
# Create a compliance baseline
census snapshot create q1-baseline --manifest inventory.db

# List snapshots
census snapshot list

# Compare two snapshots
census snapshot diff q1-baseline q2-baseline
```

### 14. Forensic annotation

```bash
# Annotate an attestation with case metadata
census annotate att_123 --note "Evidence for case FR-2026-42" --tag "case-2026-001"

# Zero-knowledge mode: encrypt before sending
census annotate att_123 --note "Confidential" --encrypt --encryption-key <hex64>

# GDPR right-to-erasure
census annotate att_123 --delete
```

### 15. Configuration

```bash
# Create config template
census config init --project

# View effective config
census config show

# Enable shell completions
eval "$(census completion bash)"
```

### 16. Forensic share tokens

```bash
# Create a share token (chain of custody)
census share create <att_id> --expires 24h --recipient "Legal Dept" --max-uses 5

# List / inspect / revoke
census share list --json
census share info <token_id>
census share revoke <token_id>
```

### 17. Structured tagging

```bash
# Tag attestations for classification
census tag set <att_id> -t department=legal -t case=2026-001

# Encrypted tags (zero-knowledge)
census tag set <att_id> -t classification=confidential --encrypt

# Query by tags (AND logic, cursor pagination)
census tag query -f department=legal --limit 50 --json
```

### 18. Key rotation

```bash
# Rotate encryption key (NIST SP 800-57)
census key-rotate <att_id> --old-key <hex64> --new-key <hex64>
```

### 19. Derived lists (third-party breach detection)

```bash
# Create an opaque HMAC-SHA256 derived list from your manifest
census derived-list create --manifest ./manifest.db --label "Q1 2026"

# Third party matches their suspects (server never sees plaintext)
census derived-list match <list_id> --list-key <hex64> --hashes-file suspects.txt

# Audit trail
census derived-list access-log <list_id>
```
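
The idea behind an opaque derived list is keyed hashing: rather than sharing raw SHA-256 values, the owner shares HMAC-SHA256(list_key, hash), and only a party holding the list key can derive matching tokens for its own suspects. The sketch below shows that technique with the standard library. Whether Census computes the HMAC over the hex string or over the raw 32 digest bytes is not documented here, so hex input is an assumption, as are the function names.

```python
import hashlib
import hmac


def derive(list_key: bytes, sha256_hex: str) -> str:
    """Opaque token for one hash (assumption: HMAC over the lowercase hex string)."""
    return hmac.new(list_key, sha256_hex.lower().encode("ascii"), hashlib.sha256).hexdigest()


def build_derived_list(list_key: bytes, manifest_hashes: list[str]) -> set[str]:
    # The owner publishes only these tokens, never the raw hashes.
    return {derive(list_key, h) for h in manifest_hashes}


def match_suspects(list_key: bytes, derived: set[str], suspect_hashes: list[str]) -> list[str]:
    # A third party holding list_key derives tokens for its suspects
    # and intersects them with the published list.
    return [h for h in suspect_hashes if derive(list_key, h) in derived]
```

A `--list-key <hex64>` value would correspond to `bytes.fromhex(...)`, i.e. a 32-byte key. Without the key, the tokens reveal nothing useful about which hashes are inventoried.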

### 20. Metadata read

```bash
census metadata get <att_id> --json
census metadata get <att_id> --decrypt --encryption-key <hex64>
```

### 21. Watch mode (continuous monitoring)

```bash
# Watch a directory for changes and attest new/modified files
census watch /path/to/files --source "production"

# Dry run — hash only, no attestation
census watch /data --dry-run

# Network mount — use polling
census watch /mnt/share --polling --poll-interval 10

# Event hooks — run commands on change/attestation (JSON on stdin)
census watch /data --on-change "jq . >> /var/log/census-changes.jsonl" \
                   --on-attest "curl -X POST https://slack.webhook/..."
```

Requires: `pip install "certisigma-census[watch]"`

Production deployment via systemd: see `contrib/census-watch@.service`.

### 22. Manifest seal (tamper evidence)

```bash
# Create an HMAC-SHA256 seal for a manifest (store the key: verify-seal needs it)
KEY=$(census key-gen)
census seal ./manifest.db --key "$KEY"

# Verify the seal before trusting a manifest
census verify-seal ./manifest.db --key <hex64>

# JSON output
census verify-seal ./manifest.db --key <hex64> --json
```

The seal proves the manifest has not been modified since it was sealed. Follows the Tripwire/AIDE signed-database pattern.
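
A seal of this kind is an HMAC-SHA256 computed over the manifest with a secret key: anyone holding the key can recompute it, and any change to the sealed bytes changes the result. A minimal sketch, assuming the MAC covers the raw bytes of the manifest file (Census may seal a different canonical representation, and where it stores the seal is not shown here); the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def compute_seal(manifest: Path, key_hex: str) -> str:
    """HMAC-SHA256 over the raw manifest bytes (an assumption), in 1 MiB chunks."""
    mac = hmac.new(bytes.fromhex(key_hex), digestmod=hashlib.sha256)
    with manifest.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_seal(manifest: Path, key_hex: str, expected_seal: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(compute_seal(manifest, key_hex), expected_seal)
```

The protection depends on keeping the key away from the manifest: anyone who can read both can modify the file and simply re-seal it.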

### 23. Quiet mode (scripting)

```bash
# Suppress informational output — only errors and exit codes
census -q scan /data --dry-run
census -q compare /suspects

# Quiet + JSON — clean machine-readable output
census -q scan /data --json --attest-manifest
```

### 24. Bulk leak detection

```bash
# Scan a suspect drive against your org inventory (up to 50K hashes/call)
census bulk-scan /mnt/suspect-drive --json

# Cross-reference with a local manifest
census bulk-scan ./data --manifest inventory.db --workers 4

# Dry run — hash and count, no API call (save rate limit)
census bulk-scan /data --dry-run

# Label the scan for incident tracking
census bulk-scan /exports --source incident-2026-003 --json

# Save results to file
census bulk-scan /exports --output results.json

# Report-only mode — always exit 0 (for CI pipelines)
census bulk-scan /data --exit-zero --json > results.json

# Summary mode — counts only, no match details
census bulk-scan /data --summary --exit-zero
```

Exit code: `0` if no matches (or `--exit-zero`), `1` if matches found (potential exfiltration).

### 25. Organization statistics

```bash
# View org-level inventory stats
census stats

# Machine-readable
census stats --json
```

### 26. SARIF output (CI/CD integration)

```bash
# Compare with SARIF output for GitHub Security tab
census compare /suspects --format sarif > results.sarif

# Write SARIF directly to file (recommended for CI/CD)
census compare /suspects --format sarif --output results.sarif

# Report-only mode — always exit 0 (upload SARIF without pipeline failure)
census compare /suspects --format sarif --output results.sarif --exit-zero

# Summary mode — counts only, concise CI logs
census compare /suspects --summary --exit-zero

# Plain JSON output is also available
census compare /suspects --format json
```

SARIF v2.1.0 output can be uploaded to the GitHub Security tab, the VS Code SARIF Viewer, and other compatible tools.
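
For orientation, a SARIF v2.1.0 log is a JSON object with a `version`, a `$schema`, and a list of `runs`, each naming its tool and listing its `results`. The sketch below builds a minimal log of that shape from a list of matched paths. The rule ID `census/match`, the `warning` level, and the message wording are illustrative assumptions, not what `census compare --format sarif` actually emits.

```python
import json


def minimal_sarif(matched_paths: list[str], tool_version: str) -> dict:
    """Build a minimal SARIF 2.1.0 log: one run, one result per matched file."""
    rule_id = "census/match"  # illustrative rule ID, not Census's own
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": "census",
                        "version": tool_version,
                        "rules": [
                            {"id": rule_id,
                             "shortDescription": {"text": "File matches an inventoried hash"}}
                        ],
                    }
                },
                "results": [
                    {
                        "ruleId": rule_id,
                        "level": "warning",
                        "message": {"text": f"{path} matches an inventoried file"},
                        "locations": [
                            {"physicalLocation": {"artifactLocation": {"uri": path}}}
                        ],
                    }
                    for path in matched_paths
                ],
            }
        ],
    }
```

Serializing with `json.dumps(minimal_sarif(["suspects/report.pdf"], "1.14.0"), indent=2)` yields a file that SARIF viewers accept; real output would carry richer rule metadata and invocation details.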

### 27. JSONL streaming output

```bash
# Stream results to a log file (one JSON object per line)
census compare /suspects --format jsonl >> /var/log/census/matches.jsonl

# Pipe to jq for real-time filtering
census compare /suspects --format jsonl | jq 'select(.level=="T2")'

# JSONL is available on compare, bulk-scan, integrity, verify-manifest, and diff
census integrity manifest.db --format jsonl
census diff base.db current.db --format jsonl
```

### 28. On-match notification hooks

```bash
# Execute a command when matches are found (JSON on stdin)
census compare /suspects --on-match './scripts/alert.sh'

# POST to a webhook
census compare /suspects --on-match 'curl -s -X POST -d @- https://hooks.slack.com/...'

# Also available on bulk-scan
census bulk-scan /data --on-match 'python3 scripts/notify.py'
```

The `--on-match` command is only executed when matches > 0. Match data (JSON) is piped to stdin.

### 29. GitHub Actions

```yaml
# Breach detection with SARIF upload
- uses: certisigma/census-action@v1
  with:
    command: compare
    target: ./artifacts
    manifest: ./inventory.db
  env:
    CERTISIGMA_API_KEY: ${{ secrets.CERTISIGMA_API_KEY }}

# Integrity check (no API key needed)
- uses: certisigma/census-action@v1
  with:
    command: integrity
    manifest: ./inventory.db

# Inventory scan on release
- uses: certisigma/census-action@v1
  with:
    command: scan
    target: ./src
    source: release-${{ github.ref_name }}
  env:
    CERTISIGMA_API_KEY: ${{ secrets.CERTISIGMA_API_KEY }}
```

Composite action — zero Docker overhead, SARIF auto-upload to GitHub Security tab, step summary, masked secrets. Full docs: [`docs/features/github-action.md`](docs/features/github-action.md)

### 30. Compliance reports

```bash
# NIS2 compliance report (default)
census compliance-report manifest.db -o report.html

# DORA compliance report
census compliance-report manifest.db --template dora -o report.html

# ISO 27001
census compliance-report manifest.db --template iso27001 -o report.html

# With integrity check included
census compliance-report manifest.db --integrity -o report.html

# Machine-readable JSON
census compliance-report manifest.db --json
```

Maps Census data to regulatory requirements (NIS2, DORA, ISO 27001). 100% local — no API calls. Uses manifest data and optional integrity check.

### 31. Forensic archive

```bash
# Create a forensic evidence package from a manifest
census archive manifest.db -o evidence-2026-03-18.zip

# With chain of custody metadata
census archive manifest.db -o case-42.zip \
  --examiner "J. Doe" --case-id CASE-42 --organization "Acme Corp"

# Verify archive integrity
census verify-archive evidence-2026-03-18.zip
```

Creates a self-contained ZIP with: manifest database, full inventory (JSON), system metadata, chain of custody, SHA256SUMS for offline verification. Follows EnCase/FTK conventions for evidence packaging.

### 32. AI governance

```bash
# Generate a policy template
census ai-policy init

# Edit .census-ai-policy.toml to define allow/exclude rules

# Classify assets (dry run — no API calls)
census ai-policy apply manifest.db --dry-run

# Apply classifications and tag attestations
census ai-policy apply manifest.db --api-key cs_...

# Generate HTML compliance report
census ai-policy report manifest.db -o ai-report.html

# JSON output
census ai-policy report manifest.db --json
```

Classify inventoried assets for ML/AI training compliance using TOML-based policies. Rules match files by glob patterns and size filters. Safety-first: unmatched files default to `exclude`. Supports EU AI Act, ISO/IEC 42001, and C2PA frameworks. Classification is 100% local; only `apply` (without `--dry-run`) makes API calls to tag attestations.

## How It Works

1. **Scan** — Census walks the directory, computes SHA-256 for each file (streamed, constant memory), and builds a local manifest.
2. **Attest** — Hashes are sent in batches (up to 100 per call) to the CertiSigma API. Each hash receives a three-layer cryptographic proof (T0 ECDSA signature, T1 qualified TSA timestamp, T2 Bitcoin anchor).
3. **Compare** — Suspect files are hashed and verified against the registry via `POST /verify/batch`. Matches prove the file was previously inventoried, regardless of filename or directory structure changes.

The original file content **never leaves** the client. Only SHA-256 hashes are transmitted.
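
Steps 1 and 2 can be sketched with the standard library: stream each file through SHA-256 in fixed-size chunks (so memory use does not grow with file size), then group the digests into batches of at most 100 for submission. The batch size comes from the text above; the function names are hypothetical, and the actual API call is left out.

```python
import hashlib
import os
from collections.abc import Iterable, Iterator
from itertools import islice
from pathlib import Path

BATCH_SIZE = 100  # per-call limit quoted above


def sha256_stream(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so memory stays constant."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def walk_hashes(root: Path) -> Iterator[tuple[str, str]]:
    """Yield (relative_path, sha256) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.is_file():
                yield str(path.relative_to(root)), sha256_stream(path)


def batched(items: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Group hashes into lists of at most `size` items (one API call each)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

Only the hex digests produced by `walk_hashes` would ever be submitted; the file contents stay on the local machine.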

## Features

| Feature | Description | Docs |
|---------|-------------|------|
| **File filters** | `--include`, `--exclude` globs; `--min-size`, `--max-size` | [scanning.md](docs/features/scanning.md) |
| **Resume scans** | `--resume` skips unchanged files, preserves attestation state | [scanning.md](docs/features/scanning.md) |
| **CSV/JSON export** | Compare reports and manifest export in both formats | [comparison.md](docs/features/comparison.md) |
| **Retry with backoff** | Automatic retry on 429/5xx with exponential backoff | [retry-and-resilience.md](docs/features/retry-and-resilience.md) |
| **Structured logging** | `--log-format json` for SIEM/ELK integration | [logging.md](docs/features/logging.md) |
| **Progress bars** | Visual feedback for scan, attest, and compare operations | [scanning.md](docs/features/scanning.md) |
| **SQLite manifest** | WAL mode, indexed lookups, auto-migration from JSON | [manifest.md](docs/features/manifest.md) |
| **Watch mode** | Continuous filesystem monitoring with batch attestation | [watching.md](docs/features/watching.md) |
| **Evidence verification** | Full T0/T1/T2 chain, OTS proof export | [evidence.md](docs/features/evidence.md) |
| **Integrity check** | Tamper detection against manifest baseline, differential mode | [integrity.md](docs/features/integrity.md) |
| **Forensic reports** | HTML, PDF, evidence bundles (ZIP) | [reporting.md](docs/features/reporting.md) |
| **Manifest diff** | Compare snapshots, AIDE-style exit codes, HTML reports | [diff.md](docs/features/diff.md) |
| **Standalone hashing** | SHA-256 without manifests or API calls | [hash.md](docs/features/hash.md) |
| **Attestation tracking** | Monitor T0/T1/T2 progression with `--poll` or `--level T1\|T2` | [tracking.md](docs/features/tracking.md) |
| **Webhooks** | Push-based T1/T2 lifecycle notifications with HMAC verification | — |
| **Config files** | TOML config with user/project precedence | [config.md](docs/features/config.md) |
| **Shell completions** | bash, zsh, fish via `census completion` | — |
| **Self-diagnostic** | API health, config, inotify, manifest integrity | [doctor.md](docs/features/doctor.md) |
| **Manifest merging** | Combine manifests from distributed scans | [merge.md](docs/features/merge.md) |
| **JSON output** | `--json` on scan, compare, status, doctor, merge | — |
| **Audit log** | Tamper-evident JSONL with SHA-256 hash chain | [audit-log.md](docs/features/audit-log.md) |
| **Named snapshots** | Compliance baselines with diff comparison | [snapshots.md](docs/features/snapshots.md) |
| **Forensic annotation** | Metadata, tags, case IDs on attestations | [annotate.md](docs/features/annotate.md) |
| **Zero-knowledge encryption** | AES-256-GCM client-side metadata encryption | [annotate.md](docs/features/annotate.md) |
| **Forensic sharing** | Time-limited, use-limited share tokens (chain of custody) | [sharing.md](docs/features/sharing.md) |
| **Structured tagging** | Key-value classification with encrypted tags and query | [tagging.md](docs/features/tagging.md) |
| **Key rotation** | NIST SP 800-57 AES-256 key rotation for metadata + tags | [key-rotation.md](docs/features/key-rotation.md) |
| **Derived lists** | HMAC-SHA256 opaque third-party breach detection | [derived-lists.md](docs/features/derived-lists.md) |
| **Metadata read** | Read attestation metadata with optional decryption | — |
| **Manifest seal** | HMAC-SHA256 tamper-evidence seal (Tripwire/AIDE pattern) | [seal.md](docs/features/seal.md) |
| **Quiet mode** | `--quiet` / `-q` suppresses info output for scripting | — |
| **Manifest self-attestation** | `--attest-manifest` anchors manifest hash at scan time | — |
| **Bulk leak detection** | `bulk-scan` — 50K hashes/call, `--dry-run`, `--source`, `--output` | — |
| **Organization stats** | `stats` — total claims, unique hashes, monthly breakdown | — |
| **SARIF output** | `compare --format sarif` — v2.1.0 with help, tags, invocations, file write | — |
| **Baseline update** | `update` — AIDE-style accept verified changes into manifest (detect → review → accept) | — |
| **JSONL streaming** | `--format jsonl` on compare, bulk-scan, integrity, verify-manifest, diff | — |
| **On-match hooks** | `--on-match CMD` — execute command with results on stdin (compare, bulk-scan) | — |
| **CI/CD integration** | `--exit-zero` (report-only mode), `--summary` (counts only) on compare and bulk-scan | — |
| **`--no-color`** | Disable colored output; also respects `NO_COLOR` env var (no-color.org) | — |
| **Forensic JSON metadata** | `census_version` and `elapsed_seconds` in all JSON output | — |
| **GitHub Action** | `certisigma/census-action@v1` — composite action for CI/CD with SARIF upload | [github-action.md](docs/features/github-action.md) |
| **Compliance reports** | `compliance-report` — NIS2, DORA, ISO 27001 mapping from manifest data (100% local) | — |
| **Developers page** | Standalone HTML documentation at [developers.certisigma.ch/census](https://developers.certisigma.ch/census) | [census.html](docs/census.html) |
| **AI governance** | `ai-policy init/apply/report` — TOML policy engine for ML/AI training asset classification (EU AI Act, ISO 42001) | — |
| **Manifest encryption** | AES-256-GCM encryption at rest for manifest files (`--encryption-key` / `CENSUS_ENCRYPTION_KEY`) | — |
| **Man pages** | Pre-generated man pages for all commands via `click-man` in `docs/man/` | — |
| **PEP 561** | `py.typed` marker for mypy/pyright inline type annotation support | — |
| **File attribution** | Captures file owner, group, POSIX permissions during scan (manifest schema v3) | — |
| **Attested reports** | `report --attest` + `verify-report` — three-layer proof on the report itself | — |
| **Docker image** | `ghcr.io/certisigma/census` for CI/CD scanning | — |

Full documentation: [`docs/features/`](docs/features/)

## CLI Reference

### Global options

| Option | Description |
|--------|-------------|
| `-v` / `--verbose` | Enable debug logging |
| `-q` / `--quiet` | Suppress informational output (errors and `--json` always shown) |
| `--log-format text\|json` | Log output format (default: text). Also: `CENSUS_LOG_FORMAT` env var |
| `--encryption-key HEX` | AES-256 key (64 hex) for manifest encryption at rest. Also: `CENSUS_ENCRYPTION_KEY` env var |
| `--no-color` | Disable colored output (also respects `NO_COLOR` env, see [no-color.org](https://no-color.org)) |
| `--version` | Show version |

### `census scan`

| Option | Description |
|--------|-------------|
| `--source LABEL` | Source label for attestations |
| `--manifest PATH` | Manifest output path (default: `<dir>/.census-manifest.db`) |
| `--api-key KEY` | API key (or set `CERTISIGMA_API_KEY`) |
| `--base-url URL` | Override API base URL |
| `--dry-run` | Hash only, no attestation |
| `--resume` | Resume interrupted scan |
| `--include GLOB` | Include files matching pattern (repeatable) |
| `--exclude GLOB` | Exclude files matching pattern (repeatable) |
| `--min-size SIZE` | Skip files smaller than SIZE (e.g. `1K`, `10M`) |
| `--max-size SIZE` | Skip files larger than SIZE (default: `5G`) |
| `--workers N` | Parallel hashing workers (default: 1, max: 8) |
| `--attest-manifest` | Attest the manifest's own SHA-256 after scan |
| `--json` | Machine-readable JSON summary |

### `census compare`

| Option | Description |
|--------|-------------|
| `--manifest PATH` | Local manifest for cross-referencing |
| `--output PATH` | Save report to a file (`.json` or `.csv` chosen by extension; with `--format sarif`, written as SARIF) |
| `--format text\|json\|sarif\|jsonl` | Output format (default: text). `sarif` emits SARIF v2.1.0; `jsonl` streams one JSON object per match |
| `--include/--exclude/--min-size/--max-size` | Same filters as scan |
| `--detailed` | Enriched results: source label, T0/T1/T2 level (requires API key) |
| `--workers N` | Parallel hashing workers (default: 1, max: 8) |
| `--json` | Machine-readable JSON output (equivalent to `--format json`) |
| `--exit-zero` | Always exit 0 (report-only mode for CI pipelines) |
| `--summary` | Show only counts, no match details |
| `--on-match CMD` | Execute CMD with match results as JSON on stdin (only if matches > 0) |

### `census export`

| Option | Description |
|--------|-------------|
| `--format csv\|json\|sha256sum` | Output format (default: csv) |
| `--output PATH` | Output file (default: stdout) |

### `census verify`

| Option | Description |
|--------|-------------|
| `--file` | Treat argument as a file path (hash it first) |
| `--save-ots PATH` | Save OTS proof to this path |
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key (optional for verify) |
| `--base-url URL` | Override API base URL |

### `census verify-manifest`

| Option | Description |
|--------|-------------|
| `--detailed` | Fetch enriched data (source, level) per hash |
| `--strict` | Exit with code 1 if any hash is not attested |
| `--json` | Machine-readable JSON output |
| `-o`/`--output PATH` | Save report (`.csv` or `.json`) |
| `--api-key KEY` | API key (optional, needed for `--detailed`) |
| `--base-url URL` | Override API base URL |

### `census integrity`

| Option | Description |
|--------|-------------|
| `--json` | Machine-readable JSON output |
| `--format text\|json\|jsonl` | Output format (default: text) |
| `--output PATH` | Save results (`.csv` or `.json` by extension) |
| `--strict` | Exit with code 1 on any discrepancy |
| `--since PATH` | Differential: load previous state, suppress known findings (`auto` = sidecar) |
| `--write-state PATH` | Save current state for next differential run (`auto` = sidecar) |

### `census update`

| Option | Description |
|--------|-------------|
| `--yes` / `-y` | Skip confirmation prompt (non-interactive) |
| `--json` | Machine-readable JSON output |

Runs an integrity check, then applies the changes: removes missing entries, re-hashes modified files, and adds new ones. New entries are stored with `attested=False`.

### `census report`

| Option | Description |
|--------|-------------|
| `-o`/`--output PATH` | Output file (`.html`, `.pdf`, or `.zip`) **required** |
| `--evidence` | Fetch T0/T1/T2 evidence chain for attested files |
| `--integrity` | Run integrity check and include results |
| `--bundle` | Generate evidence bundle (ZIP) |
| `--attest` | Attest the report's own hash via CertiSigma (three-layer proof) |
| `--api-key KEY` | API key (needed with `--evidence` or `--attest`) |

### `census verify-report`

| Option | Description |
|--------|-------------|
| `--sidecar PATH` | Custom sidecar path (default: `<report>.attestation.json`) |
| `--json` | Machine-readable JSON output |

### `census status`

| Option | Description |
|--------|-------------|
| `--json` | Machine-readable JSON output |

### `census doctor`

| Option | Description |
|--------|-------------|
| `--manifest PATH` | Check health of a specific manifest file |
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key |
| `--base-url URL` | Override API base URL |

### `census merge`

| Option | Description |
|--------|-------------|
| `-o`/`--output PATH` | Output manifest path **required** |
| `--json` | Machine-readable JSON summary |

### `census diff`

| Option | Description |
|--------|-------------|
| `--json` | Machine-readable JSON output |
| `-o`/`--output PATH` | Save report (`.html`, `.csv`, or `.json` by extension) |
| `--summary` | Show only counts, no individual file details |

Exit codes: 0=none, 1=added, 2=removed, 4=modified (bitmask, OR'd together).
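
Because the codes are bit flags, each condition can be tested independently: exit status 5, for example, means files were both added (1) and modified (4). A small Python sketch for decoding the status (the helper name is hypothetical):

```python
ADDED, REMOVED, MODIFIED = 1, 2, 4  # bit values documented for `census diff`


def decode_diff_status(code: int) -> list[str]:
    """Translate a `census diff` exit status into the kinds of change it reports."""
    if not 0 <= code <= ADDED | REMOVED | MODIFIED:
        raise ValueError(f"exit status {code} is not a diff bitmask")
    flags = [(ADDED, "added"), (REMOVED, "removed"), (MODIFIED, "modified")]
    return [label for flag, label in flags if code & flag]
```

`decode_diff_status(5)` returns `["added", "modified"]`. Click-based CLIs exit with status 2 on usage errors, which collides with the `removed` bit, so confirm the command actually ran (for example via its `--json` output) before decoding.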

### `census hash`

| Option | Description |
|--------|-------------|
| `--stdin` | Read data from stdin instead of a file |
| `--verify HASH` | Compare computed hash against expected SHA-256 |
| `--json` | Output as JSON array |

### `census track`

| Option | Description |
|--------|-------------|
| `--poll` | Continuously check until target level reached |
| `--level T1\|T2` | Target proof level (default: T2). Use T1 for TSA-only |
| `--poll-interval SECS` | Seconds between checks (default: 60) |
| `--timeout SECS` | Max time to poll (default: 3600) |
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key |
| `--base-url URL` | Override API base URL |

### `census webhook`

| Subcommand | Description |
|------------|-------------|
| `register` | Register a webhook for T1/T2 events |
| `list` | List registered webhooks |
| `delete WEBHOOK_ID` | Delete a webhook and its delivery history |
| `deliveries WEBHOOK_ID` | Show delivery history |
| `verify-payload FILE` | Verify HMAC signature of a saved payload |
| `serve` | Start webhook receiver HTTP server |

**`census webhook register` options:**

| Option | Description |
|--------|-------------|
| `--url URL` | HTTPS callback URL (required) |
| `--events LIST` | Comma-separated: `t1_complete,t2_complete` (required) |
| `--label LABEL` | Human-readable label (max 200 chars) |
| `--save-secret FILE` | Save signing secret to file (0o600 permissions) |
| `--json` | Machine-readable JSON output |

**`census webhook serve` options:**

| Option | Description |
|--------|-------------|
| `--secret-file FILE` | Signing secret file (required) |
| `--port PORT` | Listen port (default: 9514) |
| `--bind ADDR` | Bind address (default: 127.0.0.1) |
| `--on-t1 CMD` | Shell command on T1 event (JSON on stdin) |
| `--on-t2 CMD` | Shell command on T2 event (JSON on stdin) |
| `--tls-cert FILE` | PEM certificate for built-in TLS |
| `--tls-key FILE` | PEM private key for built-in TLS |
| `--replay-window SECS` | Anti-replay window (default: 300) |
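
The receiver combines an HMAC signature check (keyed by `--secret-file`) with an anti-replay window (`--replay-window`). The exact signed string and header names are not documented in this section, so the sketch below uses a **hypothetical** scheme, hex HMAC-SHA256 over `<timestamp>.<raw body>`, purely to show how the two checks fit together. Consult the CertiSigma webhook documentation for the real format before relying on it.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(
    secret: bytes,
    body: bytes,
    timestamp: str,
    signature: str,
    replay_window: int = 300,
    now: Optional[float] = None,
) -> bool:
    """HYPOTHETICAL scheme: hex HMAC-SHA256 over b"<timestamp>." + raw body."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    # Reject deliveries outside the anti-replay window (cf. --replay-window).
    if abs(current - sent_at) > replay_window:
        return False
    expected = hmac.new(secret, timestamp.encode() + b"." + body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.strip().encode())
```

Verify against the raw request bytes, before any JSON parsing, since re-serializing the body can change it.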

### `census config`

| Action / Option | Description |
|--------|-------------|
| `show` | Display effective merged config |
| `init` | Create a template config file |
| `paths` | Show config file locations |
| `--project` | Act on the project-level `.census.toml` |

### `census audit-log`

| Action / Option | Description |
|--------|-------------|
| `show` | Display audit log entries |
| `verify` | Check hash chain integrity |
| `clear` | Delete the audit log file |
| `--log-path PATH` | Override audit log file path |
| `--last N` | Show only the last N entries (with `show`) |
| `--json` | Machine-readable JSON output |

### `census snapshot`

| Action / Option | Description |
|--------|-------------|
| `create <name>` | Save a named snapshot of a manifest |
| `list` | List all snapshots |
| `diff <name1> <name2>` | Compare two snapshots |
| `delete <name>` | Remove a snapshot |
| `--manifest PATH` | Manifest to snapshot (required for `create`) |
| `--snapshot-dir PATH` | Override snapshot directory |
| `--json` | Machine-readable JSON output |

### `census annotate`

| Option | Description |
|--------|-------------|
| `--note TEXT` | Free-text note |
| `--tag TEXT` | Tag label (e.g. case number) |
| `--case-id TEXT` | Forensic case identifier |
| `--source TEXT` | Update source label |
| `--delete` | Soft-delete metadata (GDPR) |
| `--encrypt` | Encrypt client-side (AES-256-GCM) |
| `--encryption-key HEX` | 64-char hex AES-256 key |
| `--decrypt` | Decrypt and display stored metadata |
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key |

### `census share`

| Action / Option | Description |
|--------|-------------|
| `create <att_id>...` | Create share token for attestation(s) |
| `list` | List all share tokens |
| `info <token_id>` | Inspect a specific token |
| `revoke <token_id>` | Revoke a token |
| `--expires DURATION` | Token lifetime: `30m`, `24h`, `7d` (default: `24h`) |
| `--recipient TEXT` | Recipient label |
| `--max-uses N` | Max usage count |
| `--json` | Machine-readable JSON output |
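
`--expires` takes a number followed by a unit, as in `30m`, `24h`, or `7d`. A minimal parser for exactly those three units might look like this; whether Census accepts other units (seconds, weeks) is not documented here, so the sketch rejects them. `parse_duration` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Units taken from the documented examples: minutes, hours, days.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_DURATION = re.compile(r"(\d+)([mhd])")


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '30m', '24h' or '7d' into a timedelta."""
    match = _DURATION.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"invalid duration {text!r}; expected e.g. 30m, 24h, 7d")
    amount = int(match.group(1))
    if amount == 0:
        raise ValueError("duration must be positive")
    return timedelta(**{_UNITS[match.group(2)]: amount})
```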

### `census tag`

| Action / Option | Description |
|--------|-------------|
| `set <att_id>` | Set tags (requires `-t key=value`) |
| `get <att_id>` | List tags on an attestation |
| `delete <att_id> <key>` | Delete a specific tag |
| `query` | Query attestations by tag filter |
| `-t`, `--tag key=value` | Tag pair (repeatable) |
| `-f`, `--filter key=value` | Query filter (repeatable, AND logic) |
| `--encrypt` | Encrypt tag values (AES-256-GCM) |
| `--decrypt` | Decrypt tag values (with `get`) |
| `--limit N` | Max query results (default: 100) |
| `--cursor TOKEN` | Pagination cursor |
| `--json` | Machine-readable JSON output |

### `census key-rotate`

| Option | Description |
|--------|-------------|
| `<attestation_id>` | Target attestation |
| `--old-key HEX` | Current 64-char hex AES-256 key |
| `--new-key HEX` | New 64-char hex AES-256 key |
| `--json` | Machine-readable JSON output |

### `census derived-list`

| Action / Option | Description |
|--------|-------------|
| `create` | Create HMAC-SHA256 derived list |
| `list` | List all derived lists |
| `info <list_id>` | Get list details |
| `match <list_id>` | Match suspect hashes against list |
| `access-log <list_id>` | View access audit trail |
| `signature <list_id>` | Verify the list's ECDSA signature (no authentication required) |
| `revoke <list_id>` | Revoke a list |
| `--manifest PATH` | Manifest to read hashes from |
| `--tag-filter JSON` | JSON tag filter for server-side selection |
| `--label TEXT` | Human-readable label |
| `--expires HOURS` | Expiry in hours (max 2160, i.e. 90 days) |
| `--list-key HEX` | HMAC key (64 hex chars) for match |
| `--hashes-file PATH` | File with one hash per line for match |
| `--json` | Machine-readable JSON output |
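
A derived list replaces each original hash with a keyed HMAC-SHA256, so whoever holds the list cannot recover or test hashes without the 64-hex-char list key, while a key holder can check suspect hashes by recomputing the HMAC. The sketch below shows that idea. Exactly which bytes Census feeds into the HMAC (hex text versus raw digest bytes) is an assumption here, as are the helper names `derive`, `build_list`, and `match`.

```python
import hashlib
import hmac
from typing import Iterable


def _key(list_key_hex: str) -> bytes:
    key = bytes.fromhex(list_key_hex)
    if len(key) != 32:
        raise ValueError("list key must be 64 hex characters")
    return key


def derive(list_key_hex: str, file_hash: str) -> str:
    """Assumption: HMAC-SHA256 over the lowercase hex hash encoded as ASCII."""
    message = file_hash.strip().lower().encode("ascii")
    return hmac.new(_key(list_key_hex), message, hashlib.sha256).hexdigest()


def build_list(list_key_hex: str, hashes: Iterable[str]) -> set[str]:
    """Keyed digests that can be shared without revealing the original hashes."""
    return {derive(list_key_hex, h) for h in hashes}


def match(list_key_hex: str, derived: set[str], suspects: Iterable[str]) -> list[str]:
    """Return the suspect hashes whose keyed digest appears in the derived list."""
    return [h for h in suspects if derive(list_key_hex, h) in derived]
```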

### `census metadata`

| Action / Option | Description |
|--------|-------------|
| `get <att_id>` | Read attestation metadata |
| `--decrypt` | Decrypt the encrypted `extra_data` field |
| `--encryption-key HEX` | 64-char hex AES-256 key |
| `--json` | Machine-readable JSON output |

### `census key-gen`

Generate a random AES-256 encryption key (64 hex characters, 256 bits). The key is shown only once — store it securely.

```bash
census key-gen              # outputs the key to stdout
census key-gen --json       # JSON output: {"key": "...", "algorithm": "AES-256-GCM", "bits": 256}
```
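
When a script needs to consume or validate such a key, the documented `--json` shape is easy to check. The sketch below uses only the standard library; `generate_key` shows an equivalent way to produce a key of the same shape (whether Census itself uses `secrets` is not stated), and all helper names are hypothetical.

```python
import json
import secrets


def generate_key() -> str:
    """32 random bytes from the OS CSPRNG, hex-encoded: 64 characters, 256 bits."""
    return secrets.token_hex(32)


def parse_key(hex_key: str) -> bytes:
    """Validate a 64-char hex key and return the raw 32 bytes."""
    key = bytes.fromhex(hex_key.strip())  # raises ValueError on non-hex input
    if len(key) != 32:
        raise ValueError("expected 64 hex characters (256 bits)")
    return key


def key_from_keygen_json(text: str) -> bytes:
    """Extract and validate the key from `census key-gen --json` output."""
    data = json.loads(text)
    if data.get("algorithm") != "AES-256-GCM" or data.get("bits") != 256:
        raise ValueError("unexpected key-gen output")
    return parse_key(data["key"])
```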

### `census completion`

Print a shell completion script. Takes a shell name: `bash`, `zsh`, or `fish`.

```bash
eval "$(census completion bash)"   # bash
eval "$(census completion zsh)"    # zsh
census completion fish | source    # fish
```

### `census watch`

| Option | Description |
|--------|-------------|
| `--debounce SECS` | Quiet period before processing (default: 2.0s) |
| `--batch-interval SECS` | Max time between attestation batches (default: 30s) |
| `--scan-on-start / --no-scan-on-start` | Baseline scan before watching (default: on) |
| `--on-delete ignore\|mark\|remove` | Action on file deletion (default: ignore) |
| `--polling` | Use PollingObserver for NFS/CIFS mounts |
| `--poll-interval SECS` | Polling interval (default: 5s) |
| `--source/--manifest/--api-key/--dry-run` | Same as `census scan` |
| `--include/--exclude/--min-size/--max-size` | Same filters as `census scan` |
| `--on-change CMD` | Shell command on file change (JSON on stdin) |
| `--on-attest CMD` | Shell command after attestation (JSON on stdin) |
| `--on-t1 CMD` | Shell command on T1 (TSA) webhook event (JSON on stdin) |
| `--on-t2 CMD` | Shell command on T2 (Bitcoin) webhook event (JSON on stdin) |
| `--webhook-secret-file FILE` | Signing secret for webhook receiver |
| `--webhook-port PORT` | Webhook receiver port (default: 9514) |
| `--webhook-bind ADDR` | Webhook receiver bind address (default: 127.0.0.1) |

Requires: `pip install "certisigma-census[watch]"` (quote the argument so shells such as zsh do not expand the brackets).

### `census archive`

| Option | Description |
|--------|-------------|
| `MANIFEST` | Path to the manifest database |
| `-o`/`--output PATH` | Output ZIP path (default: `evidence-YYYY-MM-DD.census.zip`) |
| `--examiner NAME` | Examiner name (chain of custody) |
| `--case-id ID` | Case identifier (chain of custody) |
| `--notes TEXT` | Free-text notes (chain of custody) |
| `--organization NAME` | Organization name (chain of custody) |
| `--no-compress` | Store files without compression |
| `--no-seal` | Exclude manifest seal even if present |
| `--json` | Machine-readable JSON output with forensic metadata |

### `census verify-archive`

| Option | Description |
|--------|-------------|
| `ARCHIVE_PATH` | Path to the Census evidence archive |
| `--json` | Machine-readable JSON output with forensic metadata |

Recomputes the SHA-256 of each archived file and checks it against the archive's `SHA256SUMS`. Exit code: 0 = valid, 1 = tampered. Archives larger than 500 MB are rejected as a guard against decompression bombs.
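
The same check can be reproduced independently with Python's `zipfile` module. The sketch assumes `SHA256SUMS` sits at the archive root in `sha256sum` format (`<hex>  <name>`) and applies the 500 MB limit to the total uncompressed size; both are assumptions about the archive layout. The extra "unlisted file" check may be stricter than what Census does.

```python
import hashlib
import zipfile

# Assumption: the 500 MB limit applies to the total uncompressed size (MiB).
MAX_TOTAL_BYTES = 500 * 1024 * 1024
SUMS_NAME = "SHA256SUMS"  # assumption: stored at the archive root


def verify_archive(path: str) -> list[str]:
    """Return a list of problems; an empty list means the archive matches SHA256SUMS."""
    problems: list[str] = []
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        # Decompression-bomb guard based on the sizes declared in the ZIP directory.
        if sum(info.file_size for info in infos) > MAX_TOTAL_BYTES:
            raise ValueError("archive too large")
        expected: dict[str, str] = {}
        for line in zf.read(SUMS_NAME).decode("utf-8").splitlines():
            if not line.strip():
                continue
            # sha256sum format: "<hex digest>  <name>", optional '*' marks binary mode.
            digest, name = line.split(maxsplit=1)
            expected[name.lstrip("*")] = digest.lower()
        present = {i.filename for i in infos if not i.is_dir() and i.filename != SUMS_NAME}
        for name, digest in expected.items():
            if name not in present:
                problems.append(f"missing: {name}")
            elif hashlib.sha256(zf.read(name)).hexdigest() != digest:
                problems.append(f"modified: {name}")
        # Also flag files that SHA256SUMS does not list.
        for name in sorted(present - set(expected)):
            problems.append(f"unlisted: {name}")
    return problems
```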

### `census seal`

| Option | Description |
|--------|-------------|
| `MANIFEST_PATH` | Path to the manifest file |
| `--key KEY` | HMAC key (64 hex chars = 256 bits) |
| `--json` | Machine-readable JSON output |

### `census verify-seal`

| Option | Description |
|--------|-------------|
| `MANIFEST_PATH` | Path to the manifest file |
| `--key KEY` | HMAC key used to create the seal |
| `--json` | Machine-readable JSON output |

Exit code 0 = valid, 1 = invalid or error.

### `census bulk-scan`

| Option | Description |
|--------|-------------|
| `SUSPECT_DIR` | Directory to scan |
| `--manifest PATH` | Local manifest for cross-referencing original paths |
| `--include/--exclude/--min-size/--max-size` | Same filters as `census scan` |
| `--workers N` | Parallel hashing workers (default: 1, max: 8) |
| `--source LABEL` | Source label for audit logging (e.g. incident ID) |
| `--dry-run` | Hash only, no API call — preview file/hash/chunk counts |
| `--output PATH` | Save results to JSON file |
| `--json` | Machine-readable JSON output |
| `--exit-zero` | Always exit 0 (report-only mode for CI pipelines) |
| `--summary` | Show only counts, no match details |
| `--api-key KEY` | API key (requires `scan` scope) |
| `--base-url URL` | Override API base URL |

Uses `POST /scan`, which accepts up to 50,000 hashes per call; larger inputs are split into chunks automatically. Exit code: 0 = no matches, 1 = matches found (always 0 with `--exit-zero`).
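
The chunking amounts to splitting the hash list into batches no larger than the per-call limit and submitting them in sequence. A sketch of the batching and the documented exit-code rule follows; the order-preserving de-duplication step is an assumption, not documented Census behavior, and the helper names are hypothetical.

```python
from typing import Iterator, Sequence

MAX_HASHES_PER_CALL = 50_000  # documented per-call limit for POST /scan


def chunked(hashes: Sequence[str], size: int = MAX_HASHES_PER_CALL) -> Iterator[list[str]]:
    """Yield successive batches of at most `size` hashes."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(hashes), size):
        yield list(hashes[start:start + size])


def plan_batches(hashes: Sequence[str], size: int = MAX_HASHES_PER_CALL) -> list[list[str]]:
    # Assumption: drop duplicate hashes (first occurrence wins) before chunking.
    unique = list(dict.fromkeys(hashes))
    return list(chunked(unique, size))


def exit_code(matches: Sequence[object], exit_zero: bool = False) -> int:
    """0 = no matches, 1 = matches found; always 0 with --exit-zero."""
    return 0 if exit_zero or not matches else 1
```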

### `census stats`

| Option | Description |
|--------|-------------|
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key (requires `batch` scope) |
| `--base-url URL` | Override API base URL |

## Exit Codes

| Code | Context | Meaning |
|------|---------|---------|
| `0` | All commands | Success (or `--exit-zero` report-only mode) |
| `1` | All commands | General error (API, I/O, config, or matches found) |
| `2` | All commands | Usage error (invalid arguments, reported by Click) |
| `1` | `integrity --strict` | Violations detected (missing, modified, or new files) |
| bitmask | `diff` | `1`=added, `2`=removed, `4`=modified (OR'd together) |
| `0` | `compare --exit-zero` | Always 0, even if matches found (for CI) |

## Manifest Encryption at Rest

Census can encrypt manifests on disk using AES-256-GCM:

```bash
# Generate a key
census key-gen

# Scan with encryption — manifest is saved as .db.enc
census --encryption-key <hex64> scan /data --dry-run

# Load an encrypted manifest
census --encryption-key <hex64> status manifest.db

# Or use the environment variable (recommended for automation)
export CENSUS_ENCRYPTION_KEY=<hex64>
census scan /data --dry-run
census status manifest.db
```

Key resolution precedence (highest first): `--encryption-key` > `CENSUS_ENCRYPTION_KEY` env > config file `encryption_key`.
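
Tools that wrap Census and want to mirror the same precedence could resolve the key as below. The function name and the treatment of empty values are assumptions; `config` stands for the already-parsed config file (for example loaded with `tomllib` on Python 3.11+).

```python
import os
from typing import Mapping, Optional


def resolve_encryption_key(
    cli_value: Optional[str],
    config: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Apply the documented order: --encryption-key > CENSUS_ENCRYPTION_KEY > config file."""
    env = os.environ if env is None else env
    if cli_value:  # assumption: an empty string counts as "not set"
        return cli_value
    if env.get("CENSUS_ENCRYPTION_KEY"):
        return env["CENSUS_ENCRYPTION_KEY"]
    from_config = config.get("encryption_key")
    return str(from_config) if from_config else None
```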

Encrypted manifests are auto-detected by their `CENSUS_ENC\x01` header. `census doctor` reports encryption status.

## Man Pages

Man pages are included in the source repository under `docs/man/` but are **not** installed automatically by `pip`. To use them:

```bash
# Option 1: read directly from the source tree
man docs/man/census.1

# Option 2: install system-wide (requires root)
sudo install -m 644 docs/man/*.1 /usr/local/share/man/man1/

# Regenerate after adding new commands
./scripts/generate-man-pages.sh
```

For quick CLI help without man pages, use `census --help` or `census <command> --help`.

## Dependencies

- [`certisigma`](https://pypi.org/project/certisigma/) — Official CertiSigma Python SDK
- [`click`](https://click.palletsprojects.com/) — CLI framework

Optional:
- [`watchdog`](https://pypi.org/project/watchdog/) — Filesystem monitoring (only for `census watch`)
- [`fpdf2`](https://pypi.org/project/fpdf2/) — PDF report generation (only for `census report` with `.pdf` output)

## Testing

```bash
pip install -e ".[dev]"

# Unit tests (820+ tests, ~18s)
pytest --tb=short -q

# With coverage report
pytest --cov --cov-report=html

# Integration tests (requires API key)
CERTISIGMA_API_KEY=cs_demo_xxx pytest -m integration -v

# Performance benchmarks
python scripts/benchmark.py --files 1000 --output results.json
```

## License

MIT — Ten Sigma Sagl
