Metadata-Version: 2.4
Name: llm-response-validator
Version: 0.1.0
Summary: Validate LLM responses against schemas, types, and constraints. Catch bad JSON, missing fields, and hallucinated formats before they crash your app.
Author-email: Zach <zacharie@astera.org>
License: MIT
Project-URL: Homepage, https://github.com/zachbg/llm-response-validator
Keywords: llm,validation,json,schema,openai,ai,structured-output
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# llm-response-validator

**Validate LLM responses against schemas, types, and constraints.** Catch invalid JSON, missing fields, wrong types, and hallucinated formats before they crash your pipeline.

## The Pain

You ask GPT-4 for JSON and it wraps it in markdown code blocks. Or returns 4 fields instead of 5. Or puts a string where you need an int. Your downstream code crashes at 3 AM.
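
To make the first failure mode concrete, here is a minimal standard-library sketch (it does not use `llm-response-validator`) of `json.loads` rejecting a reply that wraps otherwise valid JSON in a markdown fence. The `reply` string is an invented example, not real model output.

```python
import json

# Invented example of a model reply: valid JSON wrapped in a markdown fence.
reply = '```json\n{"name": "Alice", "age": 30}\n```'

try:
    json.loads(reply)
    parsed_ok = True
except json.JSONDecodeError as exc:
    # The leading backticks are not valid JSON, so parsing fails immediately.
    parsed_ok = False
    print(f"json.loads failed: {exc.msg}")
```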

## Install

```bash
pip install llm-response-validator
```

## Quick Start

```python
from llm_response_validator import validate, extract_json, ensure

# Extract JSON from LLM output (handles markdown blocks, extra text, etc.)
raw = (
    "Here's the data:\n"
    "```json\n"
    '{"name": "Alice", "age": 30, "scores": [95, 87]}\n'
    "```\n"
)
data = extract_json(raw)  # {"name": "Alice", "age": 30, "scores": [95, 87]}

# Validate against a schema
schema = {
    "name": {"type": "string", "required": True},
    "age": {"type": "int", "min": 0, "max": 150},
    "email": {"type": "string", "required": True},
    "scores": {"type": "list", "min_length": 1},
}
result = validate(data, schema)
print(result.valid)      # False
print(result.errors)     # ["Missing required field: email"]

# ensure() - validate or raise
data = ensure(raw, schema)  # Extracts JSON + validates; raises here because "email" is missing
```

## Schema Definition

```python
schema = {
    "field_name": {
        "type": "string",       # string, int, float, bool, list, dict, any
        "required": True,       # Field must be present (default: False)
        "min": 0,               # Min value for numbers
        "max": 100,             # Max value for numbers
        "min_length": 1,        # Min length for strings/lists
        "max_length": 500,      # Max length for strings/lists
        "pattern": r"^\w+$",    # Regex pattern for strings
        "enum": ["a", "b"],     # Allowed values
        "default": "unknown",   # Used if the field is missing (a field with a default is not required)
        "items": {              # Schema for list items
            "type": "string"
        },
        "properties": {         # Schema for nested dict
            "sub_field": {"type": "int"}
        }
    }
}
```
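
As a more concrete illustration, the sketch below writes a schema for a nested payload using only the keys listed above. The `order` payload and its field names are hypothetical, and placing `properties` inside `items` (to describe a list of dicts) is an assumption about how the two options compose.

```python
# Hypothetical order payload, as a model might return it after JSON extraction.
order = {
    "customer": {"name": "Alice", "tier": "gold"},
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}

# Schema built only from the documented keys. Nesting "properties" inside
# "items" is an assumption, not confirmed behavior.
order_schema = {
    "customer": {
        "type": "dict",
        "required": True,
        "properties": {
            "name": {"type": "string", "required": True, "min_length": 1},
            "tier": {"type": "string", "enum": ["free", "gold"], "default": "free"},
        },
    },
    "items": {
        "type": "list",
        "required": True,
        "min_length": 1,
        "items": {
            "type": "dict",
            "properties": {
                "sku": {"type": "string", "pattern": r"^[A-Z]-\d+$"},
                "qty": {"type": "int", "min": 1},
            },
        },
    },
}
```

If nested rules compose this way, `validate(order, order_schema)` should report no errors for this payload.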

## API

```python
from llm_response_validator import (
    validate,       # Validate dict against schema
    extract_json,   # Extract JSON from messy LLM output
    ensure,         # Extract + validate + raise on error
    repair_json,    # Attempt to fix common JSON errors
    ValidationResult,
    ValidationError,
)

# Extract JSON (handles code blocks, extra text, multiple objects)
data = extract_json(llm_output)              # Returns dict/list or None
data = extract_json(llm_output, default={})  # With default

# Repair common JSON issues
fixed = repair_json('{"name": "test",}')     # Removes trailing comma
fixed = repair_json("{'name': 'test'}")      # Fixes single quotes

# Validate
result = validate(data, schema)
result.valid          # bool
result.errors         # list of error strings
result.warnings       # list of warning strings
result.cleaned_data   # data with defaults applied and types coerced

# Ensure (extract + validate + raise)
data = ensure(llm_output, schema)  # Returns cleaned data or raises
```
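
For intuition about what extraction from messy output involves, here is a minimal standard-library sketch. It is not the library's implementation: `sketch_extract_json` and `FENCE_RE` are hypothetical names, and the strategy (try fenced blocks first, then the outermost brace or bracket span) is only one plausible approach.

```python
import json
import re

# Matches a markdown fence with an optional "json" tag and captures its body.
FENCE_RE = re.compile(r"```(?:json)?\s*\n(.*?)```", re.DOTALL)


def sketch_extract_json(text, default=None):
    """Illustrative only: return the first parseable JSON value found in text."""
    # Candidate 1..n: the bodies of any fenced code blocks.
    candidates = [m.group(1) for m in FENCE_RE.finditer(text)]
    # Fallback candidates: the outermost {...} or [...] span in the raw text.
    for open_ch, close_ch in ("{}", "[]"):
        start, end = text.find(open_ch), text.rfind(close_ch)
        if start != -1 and end > start:
            candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return default
```

Returning a caller-supplied `default` instead of raising mirrors the `extract_json(llm_output, default={})` call shown above; whether the real function uses the same search order is not something this sketch claims.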

## Features

- **JSON extraction** — pulls JSON from code blocks, mixed text, multiple formats
- **JSON repair** — fixes trailing commas, single quotes, unquoted keys, missing brackets
- **Type validation** — string, int, float, bool, list, dict with coercion
- **Nested schemas** — validate deeply nested structures
- **Range/length checks** — min, max, min_length, max_length
- **Pattern matching** — regex validation for string fields
- **Enum validation** — restrict to allowed values
- **Defaults** — fill missing fields with defaults
- **Zero dependencies** — pure Python, stdlib only
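
To illustrate the kind of fix the JSON repair feature refers to, the sketch below removes trailing commas with a single regular expression. It is a simplified stand-in, not the code behind `repair_json`; the name `sketch_strip_trailing_commas` is hypothetical, and the other listed repairs (single quotes, unquoted keys, missing brackets) are not covered here.

```python
import json
import re

# A comma followed only by whitespace and then a closing } or ].
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def sketch_strip_trailing_commas(text):
    """Illustrative only: drop trailing commas before } or ].

    Caveat: this naive regex would also rewrite a matching sequence
    that happens to appear inside a string value.
    """
    return TRAILING_COMMA_RE.sub(r"\1", text)


fixed = sketch_strip_trailing_commas('{"name": "test", "tags": ["a", "b",],}')
data = json.loads(fixed)  # parses once the trailing commas are gone
```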

## License

MIT
