Metadata-Version: 2.4
Name: proofpudding
Version: 0.1.5
Summary: Python SDK for the ProofPudding API - Document processing and Q&A
Project-URL: Homepage, https://proofpudding.ai
Project-URL: Documentation, https://proofpudding.ai/docs
Author-email: Proofpudding Team <support@proofpudding.ai>
Keywords: api,document-processing,docx,pdf,proofpudding,question-answering,sdk
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx>=0.25.0
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: respx>=0.20.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# Proofpudding SDK

A Python SDK for the Proofpudding API, which provides document processing and question answering.

[![PyPI version](https://badge.fury.io/py/proofpudding.svg)](https://badge.fury.io/py/proofpudding)
[![Python Versions](https://img.shields.io/pypi/pyversions/proofpudding.svg)](https://pypi.org/project/proofpudding/)

## Installation

```bash
pip install proofpudding
```

## Requirements

- Python 3.10+
- httpx >= 0.25.0
- pydantic >= 2.0

## Quick Start

```python
from pudding import PuddingClient
from pudding.exceptions import NotFoundError

# Initialize client
client = PuddingClient(access_token="pk_your_api_key")

# Check API health
health = client.health.check()
print(f"API Status: {health.status}")

# Upload a document (.pdf or .docx, max 100MB)
doc = client.documents.upload(file_path="./contract.pdf")
print(f"Uploaded: {doc.id}")

# Ask a question about the document
job = client.jobs.create(
    document_id=doc.id,
    question="What is the effective date of this contract?"
)

if job.success:
    print(f"Answer: {job.result.answer}")
    print(f"Confidence: {job.result.confidence}")
    for citation in job.result.citations:
        print(f"  - Page {citation.page}: {citation.quote}")
else:
    print(f"Failed: {job.error}")

# List all documents with pagination
docs = client.documents.list(skip=0, limit=10)
print(f"Total documents: {docs.total}")
for item in docs.items:  # separate name so the uploaded `doc` is not overwritten
    print(f"  - {item.filename} ({item.size_bytes} bytes)")

# List jobs for a specific document
jobs = client.jobs.list(document_id=doc.id)

# Delete a document (also deletes associated jobs)
try:
    deleted = client.documents.delete(document_id=doc.id)
    print(f"Deleted: {deleted.filename}")
except NotFoundError:
    print("Document not found")
```

## Async Support

The SDK provides both synchronous and asynchronous clients:

```python
import asyncio

from pudding import AsyncPuddingClient, PuddingClient

# Synchronous client
client = PuddingClient(access_token="pk_your_api_key")
docs = client.documents.list()


# Asynchronous client: `await` is only valid inside a coroutine
async def main() -> None:
    async_client = AsyncPuddingClient(access_token="pk_your_api_key")
    docs = await async_client.documents.list()
    print(f"Total documents: {docs.total}")


asyncio.run(main())
```
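
The asynchronous client is most useful when several requests can be in flight at once. The helper below is a minimal sketch of bounded concurrency using only `asyncio`; `map_concurrently` and its `limit` parameter are hypothetical names, not part of the SDK.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_concurrently(
    func: Callable[[T], Awaitable[R]],
    inputs: Iterable[T],
    limit: int = 4,
) -> list[R]:
    """Run ``func`` over ``inputs`` with at most ``limit`` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        # The semaphore blocks here once ``limit`` calls are already running.
        async with semaphore:
            return await func(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in inputs))
```

Assuming `jobs.create` on `AsyncPuddingClient` is awaitable in the same way as `documents.list`, it could be passed in as `lambda q: async_client.jobs.create(document_id=doc_id, question=q)` to ask several questions about one document in parallel.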

## Context Manager Support

Both clients support context managers for proper resource cleanup:

```python
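import asyncio
from pudding import AsyncPuddingClient, PuddingClient

# Synchronous
with PuddingClient(access_token="pk_your_api_key") as client:
    docs = client.documents.list()

# Asynchronous: `async with` must run inside a coroutine
async def main() -> None:
    async with AsyncPuddingClient(access_token="pk_your_api_key") as client:
        docs = await client.documents.list()

asyncio.run(main())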
```

## Configuration

### Initialization Options

```python
client = PuddingClient(
    access_token="pk_your_api_key",  # Required: Your API key
    timeout=1800.0,                   # Optional: Request timeout in seconds (default: 1800)
    max_retries=3,                    # Optional: Max retries for transient errors (default: 3)
)
```
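
Hard-coding the API key is fine for a quick test, but in practice it is usually read from the environment. This is a minimal sketch; the variable name `PROOFPUDDING_API_KEY` and the helper `load_api_key` are assumptions for illustration, and the SDK is not known to read any environment variable itself.

```python
import os


def load_api_key(env_var: str = "PROOFPUDDING_API_KEY") -> str:
    """Return the API key stored in ``env_var`` or raise a clear error.

    The default variable name is a hypothetical convention, not an SDK feature.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The result can then be passed to the client as `PuddingClient(access_token=load_api_key())`.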

### Custom Timeout for Jobs

Job creation is a blocking operation that waits for document processing. The default timeout is 1800 seconds (30 minutes):

```python
# Use a custom timeout for the entire client
client = PuddingClient(access_token="...", timeout=3600.0)  # 1 hour

# Or override per-call
job = client.jobs.create(
    document_id="uuid-string",
    question="...",
    timeout=900.0,  # 15 minutes for this call only
)
```

## API Reference

### Health Checks

```python
# Check API health
health = client.health.check()
print(health.status)      # "healthy"
print(health.version)     # API version
print(health.environment) # Environment name

# Check readiness (includes database status)
ready = client.health.ready()
print(ready.status)    # "ready" or "not_ready"
print(ready.database)  # "connected" or error message
```

### Documents

Supported file types: `.pdf`, `.docx` (max 100MB).

```python
# Upload a document from file path
doc = client.documents.upload(file_path="/path/to/document.pdf")

# Upload a .docx file
doc = client.documents.upload(file_path="/path/to/report.docx")

# Upload a document from bytes
with open("document.pdf", "rb") as f:
    doc = client.documents.upload(file=f.read(), filename="document.pdf")

# List documents with pagination
doc_list = client.documents.list(skip=0, limit=20)
print(f"Total: {doc_list.total}")
for doc in doc_list.items:
    print(f"{doc.id}: {doc.filename}")

# Delete a document
deleted_doc = client.documents.delete(document_id="uuid-string")
```
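
Checking the documented limits locally avoids a round trip for files the API would reject anyway. The sketch below uses only the standard library; `check_uploadable` is a hypothetical helper, and treating "100MB" as 100 MiB is an assumption, since the exact byte limit is not stated here.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".pdf", ".docx"}
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB


def check_uploadable(path: str) -> Path:
    """Raise ValueError if the file is clearly outside the documented limits."""
    p = Path(path)
    # Compare the extension case-insensitively, so "REPORT.PDF" is accepted.
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {p.suffix or '(none)'}")
    if not p.is_file():
        raise ValueError(f"Not a file: {p}")
    size = p.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{p.name} is {size} bytes, over the {MAX_UPLOAD_BYTES}-byte limit")
    return p
```

A call such as `client.documents.upload(file_path=str(check_uploadable("./contract.pdf")))` would then fail fast on an obviously bad input. The server remains the authority on what it accepts.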

### Jobs

```python
# Create a job (process document with a question)
job = client.jobs.create(
    document_id="uuid-string",
    question="What is the total revenue mentioned in this document?"
)

# Access job results
if job.success:
    print(job.result.answer)
    print(job.result.confidence)  # "high", "medium", "low", or "not_found"
    print(job.processing_time_ms)  # Processing duration in ms
    for citation in job.result.citations:
        print(f"Page {citation.page}: {citation.quote}")

    # Cost information (if available)
    if job.usage:
        print(f"Cost: {job.usage.total_cost_cents} cents")

# List all jobs
job_list = client.jobs.list(skip=0, limit=20)

# List jobs for a specific document
job_list = client.jobs.list(document_id="uuid-string")
```
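
The `skip` and `limit` parameters return one page at a time. To walk every page, a small generator can wrap any `list` method. This is a sketch, not an SDK feature: `iter_all` is a hypothetical helper, and it assumes the response exposes `items` and `total` as the document list does above.

```python
from typing import Any, Callable, Iterator


def iter_all(list_fn: Callable[..., Any], page_size: int = 20, **filters: Any) -> Iterator[Any]:
    """Yield every item from a paginated ``list`` call.

    ``list_fn`` is called as ``list_fn(skip=..., limit=..., **filters)`` and must
    return an object with ``items`` and ``total`` attributes.
    """
    skip = 0
    while True:
        page = list_fn(skip=skip, limit=page_size, **filters)
        yield from page.items
        skip += len(page.items)
        # Stop on an empty page or once every reported item has been seen.
        if not page.items or skip >= page.total:
            break
```

For example, `for doc in iter_all(client.documents.list, page_size=50): ...` would visit all documents. Passing `document_id=...` through `filters` to `client.jobs.list` assumes that method accepts `skip`, `limit`, and `document_id` together.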

### Structured Output

Use `JobConfig` with `OutputSchemaConfig` to get structured JSON output:

```python
from pudding.models import JobConfig, OutputSchemaConfig

config = JobConfig(
    output_schema=OutputSchemaConfig(
        schema={
            "type": "object",
            "properties": {
                "effective_date": {"type": "string"},
                "parties": {"type": "array", "items": {"type": "string"}},
            },
        },
        strict=True,              # Fail if schema can't be satisfied (default: True)
        include_citations=True,   # Include citations in response (default: True)
        include_raw_answer=False, # Include raw text answer (default: False)
    ),
    verify_citations=True,        # Verify citations against source (default: True)
    reasoning_effort="auto",      # "auto", "low", or "high" (default: "auto")
)

job = client.jobs.create(
    document_id="uuid-string",
    question="Extract the effective date and parties",
    config=config,
)

if job.success and job.result.structured_output:
    print(job.result.structured_output)
    # {"effective_date": "2025-01-01", "parties": ["Acme Corp", "Widget Inc"]}
```
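
Structured output arrives as plain JSON-style data, so converting it to a typed object at the boundary keeps the rest of the code simple. The sketch below assumes `structured_output` is a `dict` shaped like the comment above and that the date comes back in ISO format; neither is guaranteed by the schema, which only declares a string. `ContractSummary` and `parse_summary` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Mapping


@dataclass(frozen=True)
class ContractSummary:
    effective_date: date
    parties: tuple[str, ...]


def parse_summary(data: Mapping[str, Any]) -> ContractSummary:
    """Convert the raw mapping into a typed object, failing loudly on bad data."""
    parties = data["parties"]
    if not isinstance(parties, list) or not all(isinstance(p, str) for p in parties):
        raise ValueError("'parties' must be a list of strings")
    # date.fromisoformat raises ValueError if the string is not an ISO 8601 date.
    return ContractSummary(
        effective_date=date.fromisoformat(data["effective_date"]),
        parties=tuple(parties),
    )
```

With that in place, `parse_summary(job.result.structured_output)` yields an object whose `effective_date` is a real `datetime.date` rather than a string.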

### Streaming

Use `create_stream` for real-time progress updates via Server-Sent Events:

```python
import asyncio

from pudding import AsyncPuddingClient
from pudding.models import (
    DownloadingEvent,
    ProcessingEvent,
    ThinkingEvent,
    StepEvent,
    VerifyingEvent,
    CompleteEvent,
    ErrorEvent,
)


async def main() -> None:
    # Async streaming: `async for` needs a coroutine and the asynchronous client
    async with AsyncPuddingClient(access_token="pk_your_api_key") as client:
        async for event in client.jobs.create_stream(
            document_id="uuid-string",
            question="What is the effective date?",
        ):
            if isinstance(event, DownloadingEvent):
                print(f"Downloading: {event.progress}%")
            elif isinstance(event, ProcessingEvent):
                print(f"Processing: {event.pages_done}/{event.total_pages}")
            elif isinstance(event, ThinkingEvent):
                print(f"Thinking (iteration {event.iteration})...")
            elif isinstance(event, StepEvent):
                print(f"Step: {event.message}")
            elif isinstance(event, VerifyingEvent):
                print("Verifying citations...")
            elif isinstance(event, CompleteEvent):
                print(f"Done: {event.result}")
            elif isinstance(event, ErrorEvent):
                print(f"Error [{event.error_code}]: {event.message}")


asyncio.run(main())
```
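
The SDK decodes the event stream for you. For readers curious about what travels over the wire, the sketch below parses the generic Server-Sent Events framing: `event:` and `data:` fields, with a blank line ending each event. It is an illustration only. The event names and JSON payloads in the usage note are assumptions, not the documented Proofpudding wire format, and `parse_sse` is a hypothetical helper.

```python
import json
from typing import Any, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Yield (event_name, decoded_json_data) pairs from Server-Sent Events lines."""
    event, data = "message", []  # "message" is the default SSE event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value.removeprefix(" ")  # one optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)  # multiple data lines are joined with "\n"
```

Feeding it the lines `event: processing`, `data: {"pages_done": 1, "total_pages": 3}`, and an empty line would yield a single `("processing", {...})` pair, which a client library could then map onto a typed event such as `ProcessingEvent`.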

## Exception Handling

All SDK exceptions derive from `PuddingError`, with a subclass for each HTTP error status:

```python
from pudding.exceptions import (
    PuddingError,        # Base exception for all SDK errors
    AuthenticationError, # 401: Invalid or missing API key
    NotFoundError,       # 404: Resource not found
    ValidationError,     # 400: Invalid request
    RateLimitError,      # 429: Too many requests
    ServerError,         # 500: Internal server error
    GatewayError,        # 502: Upstream service error
    TimeoutError,        # 504: Request timed out
)

try:
    doc = client.documents.upload(file_path="./document.txt")
except ValidationError as e:
    print(f"Validation failed: {e.message}")  # "Only .pdf, .docx files are allowed"
except AuthenticationError as e:
    print(f"Auth failed: {e.message}")
except PuddingError as e:
    print(f"API error ({e.status_code}): {e.message}")
```
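
The client already retries transient errors up to `max_retries`. If an application needs its own policy on top of that, for example waiting longer after a `RateLimitError`, a generic backoff wrapper is one option. This is a sketch using only the standard library; `call_with_backoff` and its parameters are hypothetical, and the delays are illustrative rather than tuned to the API's actual rate limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``fn`` and retry with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x ... the base delay, plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("attempts must be at least 1")
```

A call might look like `call_with_backoff(lambda: client.documents.list(), retry_on=(RateLimitError, ServerError))`. Exceptions outside `retry_on`, such as `ValidationError`, propagate immediately because retrying them would not help.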

## Logging

The SDK uses Python's standard `logging` module. Configure the logging level as needed:

```python
import logging

# Enable debug logging for the SDK
logging.getLogger("pudding").setLevel(logging.DEBUG)
```
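
Setting a logger's level does not display anything by itself: when no handler is configured, Python's `logging` only emits `WARNING` and above through its last-resort handler. A minimal sketch of a full setup, assuming the SDK logs under the `pudding` logger name shown above (the format string is just one reasonable choice):

```python
import logging

# Attach a handler to the root logger so records have somewhere to go.
# Other libraries inherit the WARNING level from the root logger.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Turn on verbose output for the SDK only.
logging.getLogger("pudding").setLevel(logging.DEBUG)
```

Child loggers such as `pudding.client` (a hypothetical name) would inherit the `DEBUG` level from `pudding`, so one call covers the whole package.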

## Development

### Setup

```bash
# Clone the repository
git clone https://github.com/pudding-ai/pudding-sdk.git
cd pudding-sdk

# Install with development dependencies
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest
```

### Running Tests with Coverage

```bash
pytest --cov=pudding --cov-report=html
```

### Type Checking

```bash
mypy src/pudding
```

### Linting

```bash
ruff check src/pudding tests
```
