Metadata-Version: 2.4
Name: mineru-python-client
Version: 0.1.0
Summary: Practical Python client for MinerU Precision and Agent parsing APIs
Author: JimEverest
License: MIT
Project-URL: Homepage, https://github.com/JimEverest/mineru-python-client
Project-URL: Repository, https://github.com/JimEverest/mineru-python-client
Project-URL: Issues, https://github.com/JimEverest/mineru-python-client/issues
Keywords: mineru,pdf,html,ocr,parser,api
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31.0
Dynamic: license-file

# MinerU Python client

A practical wrapper around MinerU's asynchronous parsing APIs, built for production-style use.

What it handles for you:
- local file upload via MinerU signed URLs
- remote URL submission
- async polling until completion
- Agent Lightweight API and Precision API
- HTML routing to `MinerU-HTML` for precision mode
- optional markdown download for Agent tasks
- precision result zip download + auto-unzip
- direct paths to `full.md`, `full.html`, `layout.json`, and the content/model JSON files
- callback checksum generation and verification helpers
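
The polling step is the part to understand when tuning `poll_interval` and `timeout`. The loop below is a minimal standard-library sketch of that pattern, not the code in `mineru_client.py`. `poll_until_finished` and `fetch_status` are hypothetical names, and the `state` and `err_msg` keys are assumed field names in the task-status response.

```python
import time
from typing import Any, Callable, Mapping


def poll_until_finished(
    fetch_status: Callable[[], Mapping[str, Any]],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Call fetch_status() until the task reaches 'done' or 'failed'.

    The 'state' and 'err_msg' keys are assumptions about the status payload.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status.get("state")
        if state == "done":
            return status
        if state == "failed":
            raise RuntimeError(f"task failed: {status.get('err_msg', 'unknown error')}")
        # Give up instead of sleeping past the overall deadline.
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"task still {state!r}; giving up before the {timeout} s timeout")
        sleep(poll_interval)


# Usage with a fake status source (no network, no real sleeping).
fake_states = iter([{"state": "pending"}, {"state": "running"}, {"state": "done"}])
final = poll_until_finished(lambda: next(fake_states), poll_interval=0, sleep=lambda s: None)
print(final["state"])  # prints "done"
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.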

## Installation

```bash
pip install mineru-python-client
```

Or from source:

```bash
git clone https://github.com/JimEverest/mineru-python-client.git
cd mineru-python-client
pip install -e .
```

## Files

- `mineru_client.py` — main client implementation
- `run_mineru_demo.py` — CLI-style example runner
- `tests/test_mineru_client.py` — unit tests using a fake HTTP session

## Quick start

```python
from mineru_client import MinerUClient

client = MinerUClient(token='YOUR_TOKEN', poll_interval=5, timeout=600, request_timeout=60)
result = client.precision_parse_local_files(
    ['/path/to/document.pdf'],
    extra_formats=['html'],
)
print(result[0].full_zip_url)
```

## Production bundle example

This is the easiest production-style path for local files because it:
- uploads
- waits for completion
- downloads the zip
- extracts it
- gives you direct file paths

```python
from mineru_client import MinerUClient

client = MinerUClient(token='YOUR_TOKEN', poll_interval=5, timeout=600)
bundle = client.precision_parse_local_bundle(
    '/path/to/document.pdf',
    output_dir='./mineru_output',
    extra_formats=['html'],
)

print(bundle.zip_path)
print(bundle.extract_dir)
print(bundle.markdown_path)
print(bundle.html_path)
print(bundle.layout_path)
```

## Callback signature verification

```python
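from mineru_client import build_callback_checksum, verify_callback_signature

# Placeholder inputs (descriptions are assumptions; check MinerU's callback docs):
uid = 'YOUR_UID'              # your MinerU user ID
seed = 'YOUR_SEED'            # the seed string you supplied when creating the task
content = 'CALLBACK_CONTENT'  # the `content` string from the callback request
checksum = build_callback_checksum(uid, seed, content)
assert verify_callback_signature(uid, seed, content, checksum)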
```
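
If you need to check signatures outside this package (for example in a separate webhook service), the sketch below assumes MinerU's checksum is the hex SHA-256 digest of the string `uid + seed + content`. That scheme is an assumption to confirm against MinerU's callback documentation, and `sketch_checksum` / `sketch_verify` are illustrative names, not the library's `build_callback_checksum` / `verify_callback_signature`.

```python
import hashlib
import hmac


def sketch_checksum(uid: str, seed: str, content: str) -> str:
    # ASSUMPTION: checksum = hex SHA-256 of the plain string uid + seed + content.
    return hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()


def sketch_verify(uid: str, seed: str, content: str, checksum: str) -> bool:
    expected = sketch_checksum(uid, seed, content)
    # compare_digest avoids timing side channels that a plain == comparison
    # could leak when checking a forged checksum.
    return hmac.compare_digest(expected.encode("utf-8"), checksum.encode("utf-8"))
```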

## CLI examples

Precision local file:

```bash
MINERU_TOKEN=YOUR_TOKEN python3 run_mineru_demo.py \
  --mode precision-local \
  --input '/path/to/document.pdf' \
  --poll-interval 5 \
  --timeout 600 \
  --request-timeout 60 \
  --extra-format html
```

Precision local bundle download + unzip:

```bash
MINERU_TOKEN=YOUR_TOKEN python3 run_mineru_demo.py \
  --mode precision-local-bundle \
  --input '/path/to/document.pdf' \
  --bundle-output-dir './mineru_output' \
  --poll-interval 5 \
  --timeout 600 \
  --extra-format html
```

Agent local file:

```bash
python3 run_mineru_demo.py \
  --mode agent-local \
  --input '/path/to/small.pdf' \
  --download-markdown
```

## API notes

- Precision API requires a token.
- Agent API does not require a token, but is limited to small single files and does not support HTML.
- MinerU parsing is asynchronous; this wrapper uploads/submits first, then polls until `done` or `failed`.
- Precision local uploads use `/api/v4/file-urls/batch` even for a single file.
- Agent local uploads use `/api/v1/agent/parse/file` and then PUT the file to the returned signed URL.
- The wrapper validates local files before creating remote tasks.
- The wrapper requires HTTPS for signed upload URLs and result URLs.
- Duplicate local basenames are automatically assigned unique `data_id` values.
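
The last two notes can be illustrated with small standard-library helpers. Both are hypothetical sketches, not the wrapper's actual code. `unique_data_ids` shows one possible renaming scheme (appending `_2`, `_3`, and so on to repeated basenames) and ignores any character or length limits MinerU may place on `data_id`. `require_https` shows the kind of scheme check applied to signed upload and result URLs.

```python
from pathlib import Path
from urllib.parse import urlparse


def unique_data_ids(paths: list[str]) -> list[str]:
    """Map each path to its basename, suffixing repeats with _2, _3, and so on."""
    used: set[str] = set()
    ids: list[str] = []
    for p in paths:
        base = Path(p).name
        candidate, n = base, 1
        # Keep bumping the suffix until the id is unused (this also handles
        # odd inputs such as a real file called "report.pdf_2").
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        used.add(candidate)
        ids.append(candidate)
    return ids


def require_https(url: str) -> str:
    """Return url unchanged if it is an absolute HTTPS URL, else raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"refusing non-HTTPS URL: {url!r}")
    return url


print(unique_data_ids(["/a/report.pdf", "/b/report.pdf", "/c/notes.pdf"]))
# prints ['report.pdf', 'report.pdf_2', 'notes.pdf']
```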
