Metadata-Version: 2.4
Name: llmdebug
Version: 2.8.0
Summary: Structured debug snapshots for LLM-assisted debugging
Project-URL: Homepage, https://github.com/NicolasSchuler/llmdebug
Project-URL: Repository, https://github.com/NicolasSchuler/llmdebug
Author-email: Nicolas Schuler <schuler.nicolas@proton.me>
License: MIT
License-File: LICENSE
Keywords: crash-reporting,debugging,llm,pytest
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Debuggers
Requires-Python: >=3.10
Requires-Dist: filelock>=3.0
Provides-Extra: cli
Requires-Dist: click>=8.0; extra == 'cli'
Requires-Dist: rich>=13.0; extra == 'cli'
Provides-Extra: dev
Requires-Dist: bandit>=1.7; extra == 'dev'
Requires-Dist: click>=8.0; extra == 'dev'
Requires-Dist: deptry>=0.20; extra == 'dev'
Requires-Dist: diff-cover>=9.0; extra == 'dev'
Requires-Dist: import-linter>=2.0; extra == 'dev'
Requires-Dist: ipython>=8.0; extra == 'dev'
Requires-Dist: mcp>=1.0; extra == 'dev'
Requires-Dist: mutmut>=3.0; extra == 'dev'
Requires-Dist: numpy>=1.20; extra == 'dev'
Requires-Dist: pyright>=1.1; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: python-semantic-release>=9.0; extra == 'dev'
Requires-Dist: radon>=6.0; extra == 'dev'
Requires-Dist: rich>=13.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Requires-Dist: toons>=0.1; extra == 'dev'
Requires-Dist: vulture>=2.11; extra == 'dev'
Requires-Dist: xenon>=0.9; extra == 'dev'
Provides-Extra: evals
Requires-Dist: datasets>=2.0; extra == 'evals'
Provides-Extra: jupyter
Requires-Dist: ipython>=8.0; extra == 'jupyter'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == 'mcp'
Provides-Extra: toon
Requires-Dist: toons>=0.1; extra == 'toon'
Description-Content-Type: text/markdown

<p align="center">
  <img src="logo/bird.png" alt="llmdebug logo" width="200">
</p>

# llmdebug

Structured debug snapshots for LLM-assisted debugging.

When your code fails, `llmdebug` captures the exception, stack frames, local variables, and environment info in a JSON format optimized for LLM consumption. This enables **evidence-based debugging** instead of the "guess → patch → rerun" loop.

Current feature status is documented in this README (source of truth for shipped capabilities). Research context and forward-looking priorities live in `docs/research-improvement-roadmap.md`.

## Why?

Without observability, LLMs debug by guessing:
```
fail → guess patch → rerun → repeat (LLM roulette)
```

With `llmdebug`, failures produce rich snapshots automatically:
```
fail → read snapshot → ranked hypotheses → minimal patch → verify
```

The key insight: **baseline instrumentation should always be on**, so the first failure already has the evidence needed to diagnose it.

## Installation

```bash
pip install llmdebug           # Core library + pytest plugin
pip install llmdebug[cli]      # CLI for viewing snapshots
pip install llmdebug[mcp]      # MCP server for IDE integration (Claude Code, etc.)
pip install llmdebug[jupyter]  # Jupyter/IPython integration
pip install llmdebug[toon]     # TOON output format for maximum token savings
```

## Quick Start

### Pytest (automatic - recommended)

Just install the package. Test failures automatically generate snapshots.

```bash
pytest  # Failures create .llmdebug/latest.json
```

### Decorator

```python
from llmdebug import debug_snapshot

@debug_snapshot()
def main():
    data = load_data()
    process(data)

if __name__ == "__main__":
    main()
```

### Context Manager

For targeted instrumentation when you need more detail:

```python
from llmdebug import snapshot_section

with snapshot_section("data_processing"):
    result = transform(data)
```

### Jupyter / IPython

Automatic snapshot capture in notebooks with rich HTML display:

```python
# In a notebook cell:
%load_ext llmdebug

# Or programmatically:
import llmdebug
llmdebug.load_jupyter()
```

After any cell error, a compact banner shows the exception, crash location, and hints. Use magic commands for deeper analysis:

```python
%llmdebug              # Show full snapshot with locals and context
%llmdebug hypothesize  # Generate ranked debugging hypotheses
%llmdebug diff         # Compare latest vs previous snapshot
%llmdebug list         # List recent snapshots
%llmdebug config       # Show active configuration
```

Requires the `jupyter` extra: `pip install llmdebug[jupyter]`

### Production Hooks

Capture unhandled exceptions automatically in production applications:

```python
import llmdebug

llmdebug.install_hooks(out_dir=".llmdebug")

# Any unhandled exception, thread crash, or unraisable exception
# will now produce a snapshot automatically.

# Optional: uninstall when done
llmdebug.uninstall_hooks()
```

Hooks install into `sys.excepthook`, `threading.excepthook`, and `sys.unraisablehook`. They include rate limiting (default: 10 captures/min) and automatic PII redaction.
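
The chaining mechanism behind such hooks can be sketched with the stdlib alone. This illustrates how `sys.excepthook` chaining works in general, not llmdebug's actual implementation:

```python
import sys

def install_capture_hook(capture):
    """Chain a capture callback in front of the current excepthook."""
    original = sys.excepthook

    def hook(exc_type, exc, tb):
        capture(exc_type, exc, tb)   # e.g. write a snapshot to disk
        original(exc_type, exc, tb)  # preserve the default traceback output

    sys.excepthook = hook
    return original  # keep a reference so the hook can be uninstalled

captured = []
previous = install_capture_hook(lambda t, e, tb: captured.append(t.__name__))
sys.excepthook(ValueError, ValueError("boom"), None)  # simulate an unhandled error
sys.excepthook = previous  # uninstall
```

The same pattern applies to `threading.excepthook` and `sys.unraisablehook`; `install_hooks()`/`uninstall_hooks()` manage all three for you.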

### Web Middleware

Zero-config crash capture for web frameworks:

```python
from llmdebug import LLMDebugWSGIMiddleware, LLMDebugASGIMiddleware

# Flask
app.wsgi_app = LLMDebugWSGIMiddleware(app.wsgi_app)

# FastAPI
app.add_middleware(LLMDebugASGIMiddleware)

# Django WSGI
application = LLMDebugWSGIMiddleware(application)
```

Middleware captures request context (method, path, query string) alongside the crash snapshot, with automatic PII redaction on query parameters.

## CLI

View and manage snapshots in the terminal with rich formatting:

```bash
llmdebug                          # Show latest snapshot (crash-level detail)
llmdebug show --detail full       # Show all stack frames
llmdebug show --detail context    # Everything including repro, git, env
llmdebug show --json              # Output raw expanded JSON
llmdebug show --raw-session       # Output raw DebugSession envelope JSON
llmdebug list                     # List recent snapshots
llmdebug frames -i 0              # Inspect a specific frame
llmdebug diff                     # Compare latest vs previous snapshot
llmdebug git-context              # On-demand enhanced git metadata
llmdebug git-context --json       # Enhanced git metadata as JSON
llmdebug hypothesize              # Auto-generate debugging hypotheses
llmdebug clean -k 5               # Keep only 5 most recent snapshots
```

All commands accept `--dir <path>` to point at a custom snapshot directory.

Requires the `cli` extra: `pip install llmdebug[cli]`

### Detail Levels

The `show` command defaults to **crash** level for minimal token usage. Use `--detail` to control verbosity:

| Level | Content | Typical Size |
|-------|---------|--------------|
| `crash` (default) | Exception + crash frame only | ~2K tokens |
| `full` | All frames + traceback | ~5K tokens |
| `context` | Everything (repro, git, env, coverage) | ~10K tokens |

### Snapshot Diffing

Compare two snapshots to see what changed between runs:

```bash
llmdebug diff                     # Compare latest vs previous
llmdebug diff old.json new.json   # Compare specific files
llmdebug diff --json              # Output diff as JSON
```
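
Conceptually, a snapshot diff surfaces what changed between two captures of the same frame. A minimal stdlib sketch of the idea (illustrative only, not the CLI's actual algorithm):

```python
def diff_locals(old, new):
    """Return {name: (old_value, new_value)} for locals that changed."""
    changed = {}
    for name in old.keys() | new.keys():
        before, after = old.get(name), new.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed

# Hypothetical locals from two consecutive failing runs
prev_run = {"i": 9, "total": 45, "batch": [1, 2]}
this_run = {"i": 10, "total": 45, "batch": []}
print(diff_locals(prev_run, this_run))
```

A diff like this narrows the search fast: an unchanged `total` with a newly empty `batch` points upstream of the crash site.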

### Enhanced Git Context

Get richer git-aware debugging metadata on demand (without inflating snapshot capture payloads):

```bash
llmdebug git-context              # Latest snapshot, text view
llmdebug git-context --json       # JSON output for tooling
llmdebug git-context '#2'         # Specific snapshot reference
```

Output is metadata only (no file contents):
- blame for the crash line
- recent commits with shortstats
- diffstat for the crash file

### Hypothesis Generation

Auto-generate ranked debugging hypotheses from snapshot patterns:

```bash
llmdebug hypothesize              # Analyze latest snapshot
llmdebug hypothesize --json       # Output as JSON array
```

The hypothesis engine includes 10 pattern detectors that identify common bug patterns (empty arrays, shape mismatches, None values, off-by-one errors, etc.) and provide actionable suggestions.

## MCP Server

`llmdebug` includes an MCP server for direct IDE integration (Claude Code, Cursor, etc.):

```bash
llmdebug-mcp  # Start the MCP server (stdio transport)
```

Install with: `pip install llmdebug[mcp]`

### Available Tools

| Tool | Description |
|------|-------------|
| `llmdebug_diagnose` | Concise crash summary optimized for LLM consumption |
| `llmdebug_show` | Full expanded JSON snapshot with detail level control |
| `llmdebug_list` | List available snapshots with metadata |
| `llmdebug_frame` | Detailed view of a specific stack frame |
| `llmdebug_git_context` | On-demand enhanced git metadata for crash triage |
| `llmdebug_diff` | Compare two snapshots to show what changed |
| `llmdebug_hypothesize` | Generate ranked debugging hypotheses |
| `llmdebug_rca_status` | Show latest RCA state for a session |
| `llmdebug_rca_history` | Show RCA attempt history |
| `llmdebug_rca_advance` | Manually advance RCA state machine |

`llmdebug_diagnose`/`llmdebug_show` support detail controls; RCA-related tools return JSON state payloads.

Notable MCP parameters:
- `llmdebug_show(raw_session=true)` returns the raw DebugSession envelope.
- `llmdebug_show(with_rca=true)` returns `{snapshot, rca}` JSON.
- `gate_mode=off|soft|strict` and `exploratory=true|false` are available on RCA-aware tools.

### RCA Workflow

MCP responses can include an RCA block (state, gate feedback, and next required steps).
Use `llmdebug_rca_status` and `llmdebug_rca_history` to inspect progression, or
`llmdebug_rca_advance` for custom/manual agent workflows.

RCA prompt contract reference: `docs/rca_prompt_contract.md`.

### Claude Code Configuration

Add to your project's `.mcp.json`:

```json
{
  "mcpServers": {
    "llmdebug": {
      "command": "llmdebug-mcp"
    }
  }
}
```

## Output

On failure, `.llmdebug/latest.json` stores a versioned DebugSession envelope:

```json
{
  "schema_version": "2.0",
  "kind": "llmdebug.debug_session",
  "session": {
    "name": "test_training_step",
    "timestamp_utc": "2026-01-27T14:30:52Z",
    "llmdebug_version": "2.8.0"
  },
  "snapshot": {
    "exception": {
      "type": "ValueError",
      "message": "operands could not be broadcast together..."
    },
    "frames": [
      {
        "file": "training.py",
        "line": 42,
        "function": "train_step",
        "code": "output = model(x) + residual",
        "locals": {
          "x": {"__array__": "jax.Array", "shape": [32, 64], "dtype": "float32"},
          "residual": {"__array__": "jax.Array", "shape": [32, 128], "dtype": "float32"}
        }
      }
    ]
  },
  "context": {
    "env": {"python": "3.12.0", "platform": "Darwin-24.0.0-arm64"}
  }
}
```

For compatibility, `get_latest_snapshot()` and loader APIs return a normalized flat view by default:

```python
from llmdebug import get_latest_snapshot

flat = get_latest_snapshot()  # default: normalized flat snapshot

from llmdebug.output import get_latest_snapshot as get_raw_snapshot
raw = get_raw_snapshot(normalize=False)  # raw DebugSession envelope
```

**Key features:**
- Crash frame is at index 0 (most relevant first)
- Arrays summarized with `shape` and `dtype` (not raw data)
- Source snippet around the failing line
- Environment info for reproducibility
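
Because the envelope is plain JSON, it can be consumed with nothing but the stdlib. A sketch that extracts a one-line crash summary using the field names from the example above (this assumes the expanded `json` output format; the default `json_compact` abbreviates keys, so prefer `get_latest_snapshot()` in real code):

```python
import json
from pathlib import Path

def crash_summary(session: dict) -> str:
    """One-line summary: exception type/message plus the crash location."""
    snap = session["snapshot"]
    exc = snap["exception"]
    frame = snap["frames"][0]  # crash frame is at index 0
    return (
        f'{exc["type"]}: {exc["message"]} '
        f'at {frame["file"]}:{frame["line"]} in {frame["function"]}'
    )

path = Path(".llmdebug/latest.json")
if path.exists():  # only present after a captured failure
    print(crash_summary(json.loads(path.read_text())))
```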

### Snapshot Enrichment

Snapshots are automatically enriched with contextual data:

- **Schema metadata**: `schema_version`, `llmdebug_version`, `crash_frame_index`
- **Exception detail**: `qualified_type`, `args`, `notes`, `cause`, `context`, `exceptions` (ExceptionGroup), `error_category` with auto-classification and suggestions
- **Frame metadata**: `module`, `file_rel`, `locals_meta` (type/size hints), truncation markers
- **Git context**: commit hash, branch, dirty status
- **Pytest context**: `longrepr`, `capstdout`, `capstderr`, params, `repro` command
- **Coverage data**: executed/missing lines, branch stats (when pytest-cov is active)
- **Async context**: asyncio task name and state
- **Log records**: recent log entries (opt-in via `capture_logs=True`)
- **Capture config**: frames, locals_mode, truncation limits, redaction patterns

## For Claude Code / LLM Users

Add this to your project's `CLAUDE.md`:

```markdown
## Debug Snapshots (llmdebug)

This project uses `llmdebug` for structured crash diagnostics.

### On any failure:
1. **Read `.llmdebug/latest.json` first** (or run `llmdebug show --json`) - never patch before reading
2. Analyze the snapshot:
   - **Exception type/message** - what went wrong
   - **Crash frame (index 0)** - where it happened
   - **Locals** - variable values at crash time
   - **Array shapes** - look for empty arrays, shape mismatches
3. **Produce 2-4 ranked hypotheses** based on evidence
4. Apply minimal fix for the most likely hypothesis
5. Re-run to verify

### Key signals:
- `shape: [0, ...]` - empty array, upstream data issue
- `None` where object expected - initialization bug
- Shape mismatch in binary op - broadcasting error
- `i=10` with `len(arr)=10` - off-by-one

### When the snapshot isn't enough:
If locals show the symptom but not the cause:
1. Add `with snapshot_section("stage_x")` around suspect code
2. Re-run to get a better snapshot
3. Repeat hypothesis→patch loop

### Don't:
- Guess without reading the snapshot first
- Make multiple speculative changes at once
- Refactor until tests pass
```
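
The key signals listed above are mechanical enough to check in code. An illustrative detector for two of them, operating on the array-summary format shown in the Output section (a sketch of the idea, not llmdebug's hypothesis engine):

```python
def scan_locals(locals_dict):
    """Flag empty-array shapes and None values in a frame's serialized locals."""
    findings = []
    for name, value in locals_dict.items():
        if isinstance(value, dict) and 0 in value.get("shape", []):
            findings.append(f"{name}: empty array (shape {value['shape']})")
        elif value is None:
            findings.append(f"{name}: None where an object may be expected")
    return findings

# Hypothetical locals in the serialized snapshot format
frame_locals = {
    "x": {"__array__": "jax.Array", "shape": [0, 64], "dtype": "float32"},
    "model": None,
}
print(scan_locals(frame_locals))
```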

## Configuration

```python
@debug_snapshot(
    out_dir=".llmdebug",           # Output directory
    frames=5,                       # Stack frames to capture
    source_context=3,               # Lines of source before/after crash
    source_mode="all",              # "all" | "crash_only" | "none"
    locals_mode="safe",             # "safe" | "meta" | "none"
    max_str=500,                    # Truncate long strings
    max_items=50,                   # Truncate large collections
    redaction_profile="dev",        # Optional: "dev" | "ci" | "prod"
    redact=[r"api_key=.*"],         # Regex patterns to redact
    redact_keys=False,              # Keep dict keys stable by default
    redact_traceback=False,         # Redact traceback text
    redact_exception_strings=False, # Redact exception message/args/notes
    include_env=True,               # Include Python/platform info
    max_snapshots=50,               # Auto-cleanup old snapshots (0 = unlimited)
    output_format="json_compact",   # "json" | "json_compact" | "toon"
    include_git=True,               # Git commit/branch/dirty status
    include_args=True,              # Separate function arguments from locals
    categorize_errors=True,         # Auto-classify errors with suggestions
    include_async_context=True,     # Asyncio task info
    include_array_stats=False,      # Compute min/max/mean/std for arrays
    capture_logs=False,             # Capture recent log records
    log_max_records=20,             # Max log records to capture
    include_coverage=True,          # Coverage data from pytest-cov
    include_modules=None,           # Filter frames by module prefix (None = all)
    max_exception_depth=5,          # Exception chain recursion limit
    lock_timeout=5.0,               # Seconds to wait for file lock
)
```

Redaction defaults to leaf string values only. This avoids accidental key collisions in nested dicts.
Set `redact_keys=True` only if you explicitly need key-name redaction and can accept possible key merging.

`redaction_profile` provides preset behavior:
- `dev`: minimal redaction defaults
- `ci`: stronger string redaction for non-local workflows
- `prod`: strictest defaults (includes traceback/exception-string redaction)

Profiles are additive: explicit `redact`/`redact_*` options always take precedence.
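
The leaf-value policy can be illustrated with a small recursive walk. This is a sketch of the idea, not llmdebug's implementation:

```python
import re

def redact_leaves(obj, patterns):
    """Apply regex redaction to leaf string values only; dict keys stay intact."""
    if isinstance(obj, str):
        for pattern in patterns:
            obj = re.sub(pattern, "[REDACTED]", obj)
        return obj
    if isinstance(obj, dict):
        return {k: redact_leaves(v, patterns) for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact_leaves(v, patterns) for v in obj]
    return obj

data = {"api_key=abc123": "kept-key", "url": "https://host/?api_key=abc123"}
print(redact_leaves(data, [r"api_key=\w+"]))
```

Note how the dict key survives untouched while the matching substring in the value is replaced; redacting keys as well (`redact_keys=True`) risks two distinct keys collapsing into one.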

### Environment Variables

When running under pytest, all configuration options can also be set via environment variables:

```bash
LLMDEBUG_OUTPUT_FORMAT=json pytest               # Use pretty JSON
LLMDEBUG_INCLUDE_GIT=false pytest                # Disable git context
LLMDEBUG_CAPTURE_LOGS=true pytest                # Enable log capture
LLMDEBUG_REDACTION_PROFILE=ci pytest             # Use CI redaction profile
LLMDEBUG_REDACT_TRACEBACK=true pytest            # Redact traceback text
LLMDEBUG_REDACT_EXCEPTION_STRINGS=true pytest    # Redact exception strings
```

### Output Formats

llmdebug supports multiple output formats to optimize for different use cases:

| Format | Size | Best For |
|--------|------|----------|
| `json` | baseline | Human readability, external tools |
| `json_compact` (default) | ~40% smaller | LLM context efficiency |
| `toon` | ~50% smaller | Maximum token savings |

**Compact JSON** uses abbreviated keys (e.g., `_exc` instead of `exception`) to reduce token usage. The `get_latest_snapshot()` function auto-expands keys and normalizes DebugSession envelopes by default, so your code works identically regardless of format.
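
Key expansion is a simple recursive rename. A sketch of the mechanism, where only `_exc` is an abbreviation documented above; the rest of the real table is larger and not shown here:

```python
ABBREVIATIONS = {"_exc": "exception"}  # real mapping has more entries

def expand_keys(obj):
    """Recursively rename abbreviated keys back to their long forms."""
    if isinstance(obj, dict):
        return {ABBREVIATIONS.get(k, k): expand_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [expand_keys(v) for v in obj]
    return obj

compact = {"_exc": {"type": "ValueError", "message": "boom"}}
print(expand_keys(compact))
```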

### Pytest Opt-out

Skip snapshot capture for specific tests:

```python
import pytest

@pytest.mark.no_snapshot
def test_expected_failure():
    ...
```

## API

```python
from llmdebug import (
    # Capture
    debug_snapshot,          # Decorator for exception capture
    snapshot_section,        # Context manager for targeted capture
    get_latest_snapshot,     # Read the most recent snapshot (auto-expands keys)
    SnapshotConfig,          # Configuration dataclass
    RedactionProfile,        # Type alias: "dev" | "ci" | "prod"
    resolve_redaction_policy, # Resolve profile + explicit redaction settings

    # Analysis
    generate_hypotheses,     # Auto-generate debugging hypotheses from a snapshot
    Hypothesis,              # Hypothesis dataclass (confidence, pattern, evidence, suggestion)
    filter_snapshot,         # Layered disclosure: filter to crash/full/context detail
    DetailLevel,             # Type alias: "crash" | "full" | "context"

    # Production hooks
    install_hooks,           # Install sys.excepthook + thread + unraisable hooks
    uninstall_hooks,         # Restore original hooks
    PII_PATTERNS,            # Default PII redaction patterns (email, API keys, etc.)

    # Jupyter / IPython
    load_jupyter,            # Install into current IPython/Jupyter session

    # Web middleware
    LLMDebugWSGIMiddleware,  # WSGI middleware (Flask, Django)
    LLMDebugASGIMiddleware,  # ASGI middleware (FastAPI, Starlette)

    # Log capture
    enable_log_capture,      # Install log handler to capture recent records
)

# Read the most recent snapshot programmatically
snapshot = get_latest_snapshot()  # Returns dict or None

# Filter to minimal detail for LLM context
from llmdebug import filter_snapshot
filtered = filter_snapshot(snapshot, "crash")  # Exception + crash frame only

# Generate debugging hypotheses
from llmdebug import generate_hypotheses
hypotheses = generate_hypotheses(snapshot)
for h in hypotheses:
    print(f"[{h.confidence:.0%}] {h.description}")
    print(f"  Suggestion: {h.suggestion}")
```

## Project Docs

- Contributing guide: `CONTRIBUTING.md`
- Security policy: `SECURITY.md`
- Code of conduct: `CODE_OF_CONDUCT.md`

## License

MIT
