Metadata-Version: 2.4
Name: thinkpdf
Version: 1.0.4
Summary: PDF to Markdown engine for LLMs. Smart table extraction, OCR, MCP server.
Author-email: Augusto Cesar Perin <augustocesarperin@abstratuslabs.com>
License: AGPL-3.0
Project-URL: Homepage, https://github.com/augustocesarperin/thinkpdf
Project-URL: Repository, https://github.com/augustocesarperin/thinkpdf
Keywords: pdf,markdown,converter,ocr,tables,math,latex,llm,ai,mcp,cursor,docling
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pymupdf>=1.23.0
Provides-Extra: docling
Requires-Dist: docling>=2.0.0; extra == "docling"
Provides-Extra: gui
Requires-Dist: customtkinter>=5.2.0; extra == "gui"
Requires-Dist: Pillow>=10.0.0; extra == "gui"
Provides-Extra: ocr
Requires-Dist: pytesseract>=0.3.10; extra == "ocr"
Provides-Extra: cli
Requires-Dist: rich>=13.0.0; extra == "cli"
Provides-Extra: full
Requires-Dist: docling>=2.0.0; extra == "full"
Requires-Dist: customtkinter>=5.2.0; extra == "full"
Requires-Dist: Pillow>=10.0.0; extra == "full"
Requires-Dist: pytesseract>=0.3.10; extra == "full"
Requires-Dist: rich>=13.0.0; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Dynamic: license-file

# thinkpdf

Extract text, tables, and structure from PDFs. Built for RAG pipelines, AI training, and LLM context.

Read directly into memory or save as Markdown. Supports OCR (the `ocr` extra installs pytesseract).

## Install

```bash
pip install thinkpdf
```

For better table extraction, at the cost of slower conversion, install the `docling` extra:
```bash
pip install "thinkpdf[docling]"
```
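
The package metadata also declares `gui`, `ocr`, `cli`, and `full` extras. Below is a minimal sketch for checking which optional dependencies are importable in the current environment. The helper `available_extras` and the `EXTRA_MODULES` mapping are hypothetical and not part of thinkpdf. The mapping only mirrors the requirements listed in the metadata.

```python
from importlib.util import find_spec

# Import names of the packages each extra pulls in (Pillow imports as "PIL").
# This mapping is an illustration based on the declared extras, not a thinkpdf API.
EXTRA_MODULES = {
    "docling": ["docling"],
    "gui": ["customtkinter", "PIL"],
    "ocr": ["pytesseract"],
    "cli": ["rich"],
}


def available_extras() -> dict[str, bool]:
    """Return, per extra, whether all of its modules can be found."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

Note that pytesseract is a wrapper: the Tesseract binary has to be installed on the system separately, which this check does not cover.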

## Quick Start

```bash
thinkpdf document.pdf                # outputs document.md
thinkpdf document.pdf -o output.md   # custom output
thinkpdf folder/ --batch             # convert all PDFs
```

```python
from thinkpdf import convert
markdown = convert("document.pdf")  # returns the Markdown text
```
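
To do the equivalent of `--batch` from Python, something along these lines may work. It is a sketch that assumes `convert` takes a path and returns the Markdown as a string, as the comment above suggests. `convert_folder` is a hypothetical helper, not part of thinkpdf. The converter is passed in as an argument so the loop can be tried without any PDFs.

```python
from pathlib import Path
from typing import Callable


def convert_folder(folder: str, convert_fn: Callable[[str], str]) -> dict[str, str]:
    """Convert every *.pdf in `folder`, writing a .md file next to each one."""
    results = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        markdown = convert_fn(str(pdf))
        # Save next to the source file: report.pdf -> report.md
        pdf.with_suffix(".md").write_text(markdown, encoding="utf-8")
        results[pdf.name] = markdown
    return results


# Usage (assumes thinkpdf is installed and convert() returns a str):
#   from thinkpdf import convert
#   convert_folder("folder", convert)
```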

## GUI

```bash
pip install "thinkpdf[gui]"
thinkpdf-gui
```

## MCP Server

Add to your MCP config:

```json
{
  "mcpServers": {
    "thinkpdf": {
      "command": "python",
      "args": ["-m", "thinkpdf.mcp_server"]
    }
  }
}
```
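
If thinkpdf lives in a virtual environment, a bare `python` in `command` may resolve to a different interpreter. One way around that is to generate the entry with the absolute path of the interpreter that has thinkpdf installed. This is a sketch and assumes your MCP client accepts an absolute path in `command`. `mcp_config` is a hypothetical helper, not part of thinkpdf.

```python
import json
import sys


def mcp_config(python_path: str = sys.executable) -> dict:
    """Build the MCP server entry shown above for a specific interpreter."""
    return {
        "mcpServers": {
            "thinkpdf": {
                "command": python_path,
                "args": ["-m", "thinkpdf.mcp_server"],
            }
        }
    }


if __name__ == "__main__":
    # Run with the interpreter that has thinkpdf installed, then paste the output.
    print(json.dumps(mcp_config(), indent=2))
```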

| Tool | Description |
|------|-------------|
| `read_pdf` | Read PDF content into context |
| `convert_pdf` | Convert and save to file |
| `get_document_info` | Get PDF metadata |

## License

AGPL-3.0
