Metadata-Version: 2.4
Name: xparse-client
Version: 0.3.0b29
Summary: Document processing SDK for Agents and RAG
Author-email: INTSIG-TEXTIN <support@textin.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/textin/xparse-client
Project-URL: Repository, https://github.com/textin/xparse-client
Project-URL: Documentation, https://github.com/textin/xparse-client#readme
Keywords: xparse,pipeline,rag,document,parsing,textin
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx<1.0.0,>=0.24.0
Requires-Dist: pydantic<3.0.0,>=2.0.0
Requires-Dist: eval-type-backport>=0.2.0; python_version < "3.10"
Provides-Extra: s3
Requires-Dist: boto3>=1.26.0; extra == "s3"
Provides-Extra: milvus
Requires-Dist: pymilvus>=2.3.0; extra == "milvus"
Requires-Dist: milvus-lite; sys_platform != "win32" and extra == "milvus"
Provides-Extra: qdrant
Requires-Dist: qdrant-client>=1.7.0; extra == "qdrant"
Provides-Extra: smb
Requires-Dist: pysmb>=1.2.0; extra == "smb"
Provides-Extra: dotenv
Requires-Dist: python-dotenv>=0.21.0; extra == "dotenv"
Provides-Extra: all
Requires-Dist: boto3>=1.26.0; extra == "all"
Requires-Dist: pymilvus>=2.3.0; extra == "all"
Requires-Dist: milvus-lite; sys_platform != "win32" and extra == "all"
Requires-Dist: qdrant-client>=1.7.0; extra == "all"
Requires-Dist: pysmb>=1.2.0; extra == "all"
Requires-Dist: python-dotenv>=0.21.0; extra == "all"
Dynamic: license-file

# xParse Client

<h3 align="center">
  A Python SDK for document parsing, built for Agents and RAG
</h3>

<div align="center">

[![PyPI version](https://badge.fury.io/py/xparse-client.svg)](https://badge.fury.io/py/xparse-client)
[![Python](https://img.shields.io/pypi/pyversions/xparse-client.svg)](https://pypi.org/project/xparse-client/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

</div>

---

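## Table of Contents

- [SDK Installation](#sdk-installation)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Error Handling](#error-handling)
- [Debugging and Logging](#debugging-and-logging)
- [Local Development](#local-development)
- [Related Resources](#related-resources)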

---

## SDK Installation

> [!NOTE]
> This SDK supports Python 3.9 and later.

### uv (recommended)

```bash
uv add xparse-client
```

### pip

```bash
pip install xparse-client
```

---

## Quick Start

### API Overview

| API | Purpose | Returns |
|-----|---------|---------|
| `client.parse.run()` | Parse a document synchronously | `ParseResponse` |
| `client.parse.create_job()` | Create an asynchronous parse job | `AsyncJobResponse` |
| `client.parse.get_job()` | Query the status of an asynchronous job | `JobStatusResponse` |
| `client.parse.wait_job()` | Poll until an asynchronous job reaches a terminal state | `JobStatusResponse` |

### 1. Environment Setup

```bash
export TEXTIN_APP_ID="your-app-id"
export TEXTIN_SECRET_CODE="your-secret-code"
```

You can obtain your credentials from the [TextIn developer console](https://www.textin.com/console/dashboard/setting).

### 2. Synchronous Parsing

```python
from xparse_client import XParseClient, ParseConfig, Capabilities, Scope

client = XParseClient()

with open("document.pdf", "rb") as f:
    result = client.parse.run(
        file=f,
        filename="document.pdf",
        config=ParseConfig(
            capabilities=Capabilities(
                include_table_structure=True,
                title_tree=True,
            ),
            scope=Scope(page_range="1-10"),
        ),
    )

print(f"Parsed {len(result.elements)} elements")

# Access the Markdown output
if result.markdown:
    print(result.markdown)

# Iterate over the elements
for el in result.elements:
    print(f"[{el.type}] {el.text[:80]}")
```

### 3. Asynchronous Jobs

For large files, use a server-side asynchronous job:

```python
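import httpx

# Reuses the `client` created in the synchronous example above.
with open("large_document.pdf", "rb") as f:
    job = client.parse.create_job(
        file=f,
        filename="large_document.pdf",
        webhook="https://example.com/callback",  # optional
    )

print(f"Job created: {job.job_id}")

result = client.parse.wait_job(job_id=job.job_id, timeout=300.0, poll_interval=5.0)

if result.is_completed:
    # An asynchronous job returns a result_url; download it separately to get the parse result
    resp = httpx.get(result.result_url)
    resp.raise_for_status()  # fail loudly if the download did not succeed
    print(resp.json())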
```
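
To make the `timeout` and `poll_interval` parameters concrete, here is a minimal sketch of a generic polling loop. It is not the SDK's implementation: `wait_until_terminal`, `fetch_status`, and the state names in `TERMINAL_STATES` are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable

# Assumed state names, for illustration only; the real service may use others.
TERMINAL_STATES = {"completed", "failed"}


def wait_until_terminal(
    fetch_status: Callable[[], str],
    timeout: float = 300.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it returns a terminal state or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Give up if the next poll would land after the deadline.
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(poll_interval)
```

A shorter `poll_interval` detects completion sooner at the cost of more status requests, while `timeout` bounds the total wait.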

---

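## Configuration

### Authentication

The SDK resolves credentials automatically in the following order of precedence: **constructor arguments > environment variables > .env file**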

```python
# Option 1: environment variables with a no-argument constructor (recommended)
client = XParseClient()

# Option 2: pass the credentials directly
client = XParseClient(
    app_id="your-app-id",
    secret_code="your-secret-code",
)

# Option 3: .env file (requires pip install xparse-client[dotenv])
client = XParseClient()
```
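
The precedence rule can be sketched as a small lookup function. This is only an illustration of the documented order, not the SDK's actual code: `resolve_credentials`, `environ`, and `dotenv_values` are hypothetical names, and the `.env` contents are passed in as an already-parsed mapping.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    app_id: Optional[str] = None,
    secret_code: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
    dotenv_values: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (app_id, secret_code) using: arguments > environment > .env values."""
    dotenv_values = dotenv_values or {}

    def pick(explicit: Optional[str], key: str) -> str:
        # The first non-empty source wins, in the documented order.
        value = explicit or environ.get(key) or dotenv_values.get(key)
        if not value:
            raise ValueError(f"missing credential: {key}")
        return value

    return pick(app_id, "TEXTIN_APP_ID"), pick(secret_code, "TEXTIN_SECRET_CODE")
```

Each credential is resolved on its own, so in this sketch an explicit `app_id` can be combined with a `TEXTIN_SECRET_CODE` taken from the environment.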

### Timeouts and Retries

```python
client = XParseClient(
    timeout=120.0,      # request timeout in seconds (default: 630)
    max_retries=3,      # maximum number of retries (default: 3)
)
```

### Custom API Endpoint

```python
client = XParseClient(
    server_url="https://custom-api.example.com"
)
```

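### Custom HTTP Client

Pass an `httpx.Client` to customize low-level network settings such as proxies and SSL certificates. The SDK still handles authentication, retries, and error mapping automatically: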

```python
import httpx

http_client = httpx.Client(
    proxy="http://proxy.example.com:8080",
    verify="/path/to/custom-ca.pem",
)

client = XParseClient(
    app_id="your-app-id",
    secret_code="your-secret-code",
    http_client=http_client,
)
```

### Resource Management

```python
with XParseClient() as client:
    result = client.parse.run(...)
    # Connections are closed automatically on exit
```

---

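## Error Handling

### Error Class Hierarchy

**HTTP-layer errors:**

| Error class | Description |
|-------------|-------------|
| `XParseClientError` | Base error class; catching it catches every SDK error |
| `ValidationError` | Client-side parameter validation failed |
| `ServerError` | Server error (HTTP 5xx) |
| `APIError` | API request error (base class) |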

**Business-layer errors (HTTP 200 with a business code):**

| Error class | Business code | Description |
|-------------|---------------|-------------|
| `AuthenticationError` | 40101/40102 | Authentication failed |
| `PermissionDeniedError` | 40103 | IP address is not on the allowlist |
| `InsufficientBalanceError` | 40003 | Insufficient balance |
| `InvalidParameterError` | 40004 | Invalid parameter |
| `UnsupportedFileTypeError` | 40301 | Unsupported file type |
| `FileSizeError` | 40302 | File too large (500MB limit) |
| `CorruptedFileError` | 40422 | Corrupted file |
| `PasswordProtectedError` | 40423 | PDF requires a password |
| `ServiceUnavailableError` | 30203 | Service temporarily unavailable |
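
Because these errors arrive with HTTP 200, the business code rather than the HTTP status identifies the failure. The sketch below restates the table as a plain lookup, which can be handy when reading logs. `BUSINESS_ERRORS` and `error_name_for` are hypothetical helpers, not part of the SDK, and the `BusinessError` fallback for unlisted codes is an assumption.

```python
# Business code -> error class name, copied from the table above.
BUSINESS_ERRORS = {
    40101: "AuthenticationError",
    40102: "AuthenticationError",
    40103: "PermissionDeniedError",
    40003: "InsufficientBalanceError",
    40004: "InvalidParameterError",
    40301: "UnsupportedFileTypeError",
    40302: "FileSizeError",
    40422: "CorruptedFileError",
    40423: "PasswordProtectedError",
    30203: "ServiceUnavailableError",
}


def error_name_for(code: int) -> str:
    """Return the error class name for a business code (assumed generic fallback)."""
    return BUSINESS_ERRORS.get(code, "BusinessError")


print(error_name_for(40302))
```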

### Error Handling Example

```python
from xparse_client.exceptions import (
    XParseClientError, BusinessError, AuthenticationError, APIError
)

try:
    with open("document.pdf", "rb") as f:
        result = client.parse.run(file=f, filename="document.pdf")
except AuthenticationError as e:
    print(f"Authentication failed: {e.message}")
except BusinessError as e:
    print(f"Business error [{e.business_code}]: {e.message}, x_request_id={e.x_request_id}")
except APIError as e:
    print(f"API error [HTTP {e.status_code}]: {e.message}, x_request_id={e.x_request_id}")
except XParseClientError as e:
    print(f"SDK error: {e.message}")
```

### Getting the Request ID

Every API request returns an `x_request_id`. Include this ID when contacting technical support to speed up diagnosis:

```python
with open("document.pdf", "rb") as f:
    result = client.parse.run(file=f, filename="document.pdf")
print(f"x_request_id={result.x_request_id}")
```

---

## Debugging and Logging

```python
import logging
logging.getLogger("xparse_client").setLevel(logging.DEBUG)
```
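
Setting the level alone may not print anything: if the application has not configured a logging handler, Python's last-resort handler only emits `WARNING` and above. A minimal sketch that attaches a handler explicitly, assuming the SDK logs under the `xparse_client` logger name used above (the format string is just one possible choice):

```python
import logging
import sys

logger = logging.getLogger("xparse_client")
logger.setLevel(logging.DEBUG)

# Send DEBUG and higher records from this logger to stderr.
handler = logging.StreamHandler(sys.stderr)
handler.setLevel(logging.DEBUG)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)
```

If the application already calls `logging.basicConfig(level=logging.DEBUG)`, the extra handler is unnecessary and would produce duplicate lines.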

---

## Local Development

```bash
git clone https://github.com/intsig-textin/xparse-python-client.git
cd xparse-python-client

uv sync --dev
make test
make format
```

### Common Commands

```bash
make test          # run all tests
make test-unit     # run unit tests
make test-cov      # run tests with code coverage
make format        # format the code
make lint          # lint the code
```

---

## Related Resources

- [Full documentation](https://docs.textin.com/pipeline/overview) | [GitHub](https://github.com/intsig-textin/xparse-python-client) | [PyPI](https://pypi.org/project/xparse-client/)
- [TextIn developer console](https://www.textin.com/console/dashboard/setting) | [Report an issue](https://github.com/intsig-textin/xparse-python-client/issues)

### Troubleshooting

| Problem | Solution |
|---------|----------|
| `AuthenticationError` | Check `TEXTIN_APP_ID` and `TEXTIN_SECRET_CODE` |
| `FileSizeError` | Files are limited to 500MB |
| `TimeoutException` | Increase the timeout: `XParseClient(timeout=300.0)` |
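
To avoid uploading a file that would be rejected with `FileSizeError`, the size can be checked locally first. The sketch below is an illustration only: `MAX_UPLOAD_BYTES` and `is_within_size_limit` are hypothetical names, and reading "500MB" as 500 × 1024 × 1024 bytes is an assumption about how the service counts.

```python
import os

# Assumption: "500MB" is interpreted as 500 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024


def is_within_size_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

Calling `is_within_size_limit("document.pdf")` before `client.parse.run(...)` lets the application report the problem without a network round trip.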

---

## License

[MIT License](LICENSE)
