Metadata-Version: 2.4
Name: sanicode
Version: 0.3.3
Summary: AI-assisted code sanitization scanner with OWASP ASVS, NIST 800-53, and ASD STIG compliance mapping.
Project-URL: Homepage, https://github.com/rdwj/sanicode
Project-URL: Repository, https://github.com/rdwj/sanicode
Project-URL: Issues, https://github.com/rdwj/sanicode/issues
Author: Sanicode Contributors
License: Apache-2.0
Keywords: compliance,llm,owasp,sast,security,stig
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Requires-Python: >=3.10
Requires-Dist: fastapi>=0.100
Requires-Dist: litellm>=1.0
Requires-Dist: networkx>=3.0
Requires-Dist: prometheus-client>=0.17
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: tomlkit>=0.12
Requires-Dist: tree-sitter-language-pack>=0.7
Requires-Dist: tree-sitter>=0.24
Requires-Dist: typer>=0.9.0
Requires-Dist: uvicorn[standard]>=0.20
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: httpx>=0.24; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Description-Content-Type: text/markdown

# Sanicode

Sanicode scans Python, JavaScript/TypeScript, and PHP codebases for input validation and sanitization gaps using taint analysis and a data flow knowledge graph, then maps every finding to OWASP ASVS 5.0, NIST 800-53, ASD STIG v4r11, and PCI DSS 4.0. Output formats include SARIF (for GitHub Code Scanning), JSON, Markdown, and an HTML dashboard with an interactive knowledge graph.

Unlike pattern-only tools such as Bandit or Semgrep, sanicode traces tainted data from source to sink across function boundaries, so findings carry context about *how* untrusted input reaches a dangerous call and *whether* sanitization exists along the path.

## Install

```
pip install sanicode
```

Requires Python 3.10+.

## Quick start

Scan a codebase and generate a Markdown report:

```
sanicode scan .
```

Generate SARIF output for CI integration:

```
sanicode scan . -f sarif
```

Generate an HTML dashboard with an interactive knowledge graph:

```
sanicode scan . -f html
```

Fail the build if high-severity findings exist:

```
sanicode scan . --fail-on high
```

Reports are written to `sanicode-reports/` by default.
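
The commands above can also be driven from a script. The following is a minimal sketch, not part of sanicode itself: `build_scan_command` and `run_scan` are hypothetical helper names, and it assumes that `-f` and `--fail-on` can be combined in one invocation and that the `sanicode` executable is on `PATH`.

```python
import subprocess


def build_scan_command(path=".", formats=("sarif",), fail_on=None):
    """Assemble a `sanicode scan` argument list from the documented flags."""
    cmd = ["sanicode", "scan", path]
    for fmt in formats:
        # -f may be repeated to request several output formats.
        cmd += ["-f", fmt]
    if fail_on:
        cmd += ["--fail-on", fail_on]
    return cmd


def run_scan(path=".", formats=("sarif",), fail_on="high"):
    """Run the scan and return its exit code (non-zero when the threshold is hit)."""
    return subprocess.run(build_scan_command(path, formats, fail_on)).returncode
```

A build script could call `sys.exit(run_scan())` to propagate the scanner's exit status to the CI system.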

## CI/CD integration

### GitHub Action

```yaml
- uses: rdwj/sanicode@v0
  with:
    path: .
    fail-on: high
    format: sarif
```
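
When a pipeline needs a quick summary of a SARIF report, something like the sketch below may help. It relies only on the standard SARIF 2.1.0 layout (`runs[].results[].level`); the function names are illustrative, and the report path is left to the caller because the exact file name sanicode writes is not stated here.

```python
import json
from collections import Counter


def count_by_level(sarif: dict) -> Counter:
    """Count results per SARIF level across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Fall back to "warning" when a result carries no explicit level.
            counts[result.get("level", "warning")] += 1
    return counts


def load_sarif(path: str) -> dict:
    """Read a SARIF file from disk; the path is supplied by the caller."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `count_by_level(load_sarif(report_path))["error"]` gives the number of error-level results in a report.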

### Pre-commit hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/rdwj/sanicode
    rev: v0.3.1
    hooks:
      - id: sanicode
```

See [docs/ci-cd-integration.md](docs/ci-cd-integration.md) for GitLab CI, Jenkins, Azure DevOps, and Tekton/OpenShift Pipelines.

## API server

Start the FastAPI server for remote or hybrid scan mode:

```
sanicode serve
```

The server listens on port 8080 and exposes Prometheus metrics at `/metrics`.

### Endpoints

```
POST /api/v1/scan                 Submit a scan (async)
GET  /api/v1/scan/{id}            Poll scan status
GET  /api/v1/scan/{id}/findings   Retrieve findings (JSON or ?format=sarif)
GET  /api/v1/scan/{id}/graph      Retrieve knowledge graph
POST /api/v1/analyze              Instant snippet analysis
GET  /api/v1/compliance/map       Compliance framework lookup
GET  /api/v1/health               Liveness check
GET  /metrics                     Prometheus metrics
```
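
A client for the asynchronous scan flow might look roughly like the sketch below. The endpoint paths and the `?format=sarif` query come from the list above; everything about the response body is an assumption, in particular the `status` key and the `completed`/`failed` values used by `default_is_done`. Check the server's actual responses before relying on them.

```python
import json
import time
import urllib.request


def findings_url(base, scan_id, sarif=False):
    """Build the findings URL, optionally requesting SARIF."""
    url = f"{base}/api/v1/scan/{scan_id}/findings"
    return url + "?format=sarif" if sarif else url


def get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def default_is_done(body):
    # Hypothetical response shape: the real field name and values may differ.
    return body.get("status") in {"completed", "failed"}


def wait_for_scan(base, scan_id, fetch=get_json, is_done=default_is_done,
                  interval=2.0, attempts=30):
    """Poll GET /api/v1/scan/{id} until is_done(body) is true or attempts run out."""
    url = f"{base}/api/v1/scan/{scan_id}"
    for _ in range(attempts):
        body = fetch(url)
        if is_done(body):
            return body
        time.sleep(interval)
    raise TimeoutError(f"scan {scan_id} did not finish after {attempts} polls")
```

Passing `fetch` and `is_done` as parameters keeps the polling loop independent of the response schema, so only `default_is_done` needs adjusting once the real fields are known.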

## CLI commands

```
sanicode scan .                              # Scan codebase, generate reports
sanicode scan . -f sarif                     # SARIF output
sanicode scan . -f json -f sarif             # Multiple formats
sanicode scan . -f html                      # HTML dashboard with interactive graph
sanicode scan . --fail-on high               # Exit non-zero on high+ findings
sanicode serve                               # Start API server on :8080
sanicode report scan-result.json             # Re-generate reports from saved results
sanicode report scan-result.json -s high     # Filter by severity
sanicode report scan-result.json --cwe 89    # Filter by CWE
sanicode config setup                        # Interactive provider configuration wizard
sanicode config set llm.fast.model granite-nano  # Script-friendly config
sanicode config test                         # Test configured LLM tiers
sanicode config --show                       # Show resolved configuration
sanicode config --init                       # Create starter sanicode.toml
sanicode graph . --export graph.json         # Export knowledge graph
sanicode graph . --visualize graph.html      # Standalone graph visualization
sanicode rules --list                        # List all detection rules
sanicode rules --validate custom.yaml        # Validate custom rule file
sanicode benchmark                           # Benchmark against Bandit and Semgrep
```

## Detection rules

21 built-in rules across three languages:

**Python** (10 rules, SC001–SC010): path traversal, OS command injection, XSS, SQL injection, code injection, weak cryptography, insecure random, deserialization, hardcoded credentials, SSRF.

**JavaScript/TypeScript** (6 rules, SC200–SC205): path traversal, OS command injection, XSS, weak cryptography, insecure random, hardcoded credentials.

**PHP** (5 rules, SC100–SC104): OS command injection, XSS, SQL injection, deserialization, hardcoded credentials.

Custom YAML rules extend this set. Place rule files in `rules/` in your project root or `~/.config/sanicode/rules/`, and validate with `sanicode rules --validate`.

## Custom rules

```yaml
id: CUSTOM001
cwe_id: 78
severity: high
pattern:
  targets: [python]
  ast_pattern: "call:subprocess.run"
  args:
    shell: "True"
```

Run `sanicode rules --validate custom.yaml` to check syntax before deploying.
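
To make the intent of the example rule concrete, the sketch below finds the same kind of call, `subprocess.run(..., shell=True)`, using Python's standard `ast` module. This is an illustration only: it is not how sanicode evaluates `ast_pattern`, and `find_shell_true_calls` is a hypothetical name.

```python
import ast


def find_shell_true_calls(source: str) -> list[int]:
    """Return sorted line numbers of subprocess.run(..., shell=True) calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_subprocess_run = (
            isinstance(func, ast.Attribute)
            and func.attr == "run"
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        if not is_subprocess_run:
            continue
        for kw in node.keywords:
            # Match only the literal keyword argument shell=True.
            if (kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                hits.append(node.lineno)
    return sorted(hits)


SAMPLE = (
    "import subprocess\n"
    "subprocess.run(cmd, shell=True)\n"
    "subprocess.run(['ls', '-l'])\n"
)
print(find_shell_true_calls(SAMPLE))  # [2]
```

Only the second line of `SAMPLE` is reported; the list-argument call on the third line does not pass `shell=True` and is left alone.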

## Taint analysis

Sanicode performs dataflow-aware taint tracking at two levels:

- **Intra-procedural**: reaching-definitions analysis within each function body.
- **Inter-procedural**: function summaries propagated across the call graph.

Taint paths produce high-confidence edges in the knowledge graph, giving the LLM (and human reviewers) evidence of whether untrusted data actually reaches a sink.
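
The intra-procedural idea can be illustrated with a deliberately small sketch. It is not sanicode's implementation: it uses Python's `ast` module, handles only straight-line assignments at the top level of a function body (no branches, loops, or function summaries), and the source, sink, and sanitizer sets are hypothetical examples.

```python
import ast

SOURCES = {"input"}            # assumed taint sources
SINKS = {"os.system"}          # assumed dangerous sinks
SANITIZERS = {"shlex.quote"}   # assumed sanitizers


def call_name(node: ast.Call) -> str:
    """Dotted name of a call target such as 'os.system', or '' if not simple."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def is_tainted(expr: ast.AST, tainted: set) -> bool:
    """True if the expression contains a source call or a tainted variable."""
    if isinstance(expr, ast.Call):
        name = call_name(expr)
        if name in SANITIZERS:
            return False  # a sanitizer call cleans whatever it wraps
        if name in SOURCES:
            return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))


def find_tainted_sinks(source: str) -> list:
    """Return (function name, line) for each sink call that receives tainted data."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        tainted = set()
        for stmt in func.body:  # top-level statements only
            if isinstance(stmt, ast.Assign):
                hit = is_tainted(stmt.value, tainted)
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        # A new definition replaces the variable's old taint state.
                        (tainted.add if hit else tainted.discard)(target.id)
            elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
                call = stmt.value
                if call_name(call) in SINKS and any(
                        is_tainted(arg, tainted) for arg in call.args):
                    findings.append((func.name, call.lineno))
    return findings


SAMPLE = '''
def unsafe():
    name = input()
    cmd = "echo " + name
    os.system(cmd)

def safe():
    name = input()
    cmd = "echo " + shlex.quote(name)
    os.system(cmd)
'''
print(find_tainted_sinks(SAMPLE))  # [('unsafe', 5)]
```

Only `unsafe` is reported: in `safe`, the sanitizer call breaks the taint path before the value reaches the sink, which is the distinction between "input reaches a sink" and "unsanitized input reaches a sink".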

## Compliance frameworks

Findings map to four frameworks, covering 54 CWEs:

- **OWASP ASVS 5.0** — V1: Encoding and Sanitization requirements (L1/L2/L3)
- **NIST 800-53** — SI-10 (Information Input Validation), SI-15 (Information Output Filtering), and related controls
- **ASD STIG v4r11** — APSC-DV-002510 (CAT I), APSC-DV-002520 (CAT II), APSC-DV-002530 (CAT II), and related checks
- **PCI DSS 4.0** — Requirement 6 (Develop and Maintain Secure Systems and Software)

## Configuration

Create a config file:

```
sanicode config --init
```

This writes a `sanicode.toml` in the current directory. Config is loaded from the following locations, in order:

1. `--config` flag
2. `sanicode.toml` in the current directory
3. `~/.config/sanicode/config.toml`
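
One plausible reading of this lookup order is "first match wins", sketched below. That reading is an assumption (sanicode might instead merge the files), and `resolve_config` is a hypothetical helper, not a sanicode API.

```python
from pathlib import Path


def resolve_config(cli_path=None, cwd=None, home=None):
    """Return the first config path in the documented order, or None."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    # 1. An explicit --config value takes precedence.
    if cli_path:
        return Path(cli_path)
    # 2. sanicode.toml in the current directory, then 3. the user-level file.
    candidates = (
        cwd / "sanicode.toml",
        home / ".config" / "sanicode" / "config.toml",
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no file found: the tool runs on defaults
```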

Sanicode runs without any configuration. LLM tiers are optional: without them, the tool runs in degraded mode, relying on AST pattern matching, taint analysis, knowledge graph construction, and compliance lookups. LLM integration adds context-aware reasoning on top of these.

### LLM tiers (optional)

The config supports three tiers for different task complexities. Supported providers include cloud APIs (Anthropic, OpenAI, Google, Azure) and self-hosted inference (vLLM, Ollama, OpenShift AI). Run `sanicode config setup` for an interactive wizard that walks through provider selection and endpoint configuration.

| Tier        | Purpose                                  | Recommended model        |
|-------------|------------------------------------------|--------------------------|
| `fast`      | Classification, severity scoring         | Granite Nano, Mistral 7B |
| `analysis`  | Data flow context, taint reasoning       | Granite Code 8B          |
| `reasoning` | Compliance mapping, graph exploitability | Llama 3.1 70B            |

## Current status

v0.3.0 — Multi-language scanning (Python, JavaScript/TypeScript, PHP), 21 built-in detection rules, intra- and inter-procedural taint analysis, LLM graph reasoning, 54-CWE compliance database with four framework mappings, GitHub Action and pre-commit hook, custom YAML rules, and CI/CD integration guides for six platforms.

## License

Apache-2.0
