Metadata-Version: 2.4
Name: diffpdf
Version: 0.3.0
Summary: A tool for comparing PDF files
Project-URL: Homepage, https://github.com/JustusRijke/DiffPDF
Project-URL: Issues, https://github.com/JustusRijke/DiffPDF/issues
Author-email: Justus Rijke <justusrijke@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: click
Requires-Dist: colorlog
Requires-Dist: pillow>=10.0.0
Requires-Dist: pixelmatch-fast>=1.1.0
Requires-Dist: pymupdf>=1.23.0
Provides-Extra: dev
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# DiffPDF

[![Build](https://github.com/JustusRijke/DiffPDF/actions/workflows/build.yml/badge.svg)](https://github.com/JustusRijke/DiffPDF/actions/workflows/build.yml)
[![codecov](https://codecov.io/gh/JustusRijke/DiffPDF/graph/badge.svg?token=O3ZJFG6X7A)](https://codecov.io/gh/JustusRijke/DiffPDF)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

CLI tool for detecting structural, textual, and visual differences between PDF files, intended for use in automated regression tests.

## How It Works

DiffPDF uses a fail-fast sequential pipeline to compare PDFs:

1. **Hash Check** - SHA-256 comparison. If the hashes match, exit immediately with a pass.
2. **Page Count** - Verify both PDFs have the same number of pages.
3. **Text Content** - Extract and compare text from all pages (ignoring whitespace).
4. **Visual Check** - Render pages to images and compare using [pixelmatch-fast](https://pypi.org/project/pixelmatch-fast/).

Each stage only runs if all previous stages pass.
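
The fail-fast flow above can be sketched in a few lines of Python. This is an illustrative sketch, not DiffPDF's actual implementation: `sha256_of`, `normalize_text`, and `compare` are hypothetical names, page text is assumed to be extracted already, and "ignoring whitespace" is interpreted here as removing all whitespace (the tool may normalize differently). The visual stage is omitted because it needs PyMuPDF and pixelmatch-fast.

```python
import hashlib
from typing import List, Tuple


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def normalize_text(text: str) -> str:
    """Remove all whitespace (an assumed reading of 'ignoring whitespace')."""
    return "".join(text.split())


def compare(
    ref_bytes: bytes,
    act_bytes: bytes,
    ref_pages: List[str],
    act_pages: List[str],
) -> Tuple[bool, str]:
    """Hypothetical fail-fast comparison: return at the first decisive stage."""
    # Stage 1: identical bytes mean identical PDFs, so pass without further work.
    if sha256_of(ref_bytes) == sha256_of(act_bytes):
        return True, "identical hashes"

    # Stage 2: a page-count mismatch fails before any text is compared.
    if len(ref_pages) != len(act_pages):
        return False, "page count differs"

    # Stage 3: compare page text with whitespace removed.
    for number, (ref, act) in enumerate(zip(ref_pages, act_pages), start=1):
        if normalize_text(ref) != normalize_text(act):
            return False, f"text differs on page {number}"

    # Stage 4 (visual check) would run here; it is left out of this sketch.
    return True, "text matches (visual check omitted)"
```

Returning as soon as a stage is decisive keeps the expensive steps (text extraction, rendering) off the common path where the files are byte-identical or obviously different.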

## Installation

```bash
pip install diffpdf
```

## CLI Usage

```text
Usage: diffpdf [OPTIONS] REFERENCE ACTUAL

  Compare two PDF files for structural, textual, and visual differences.

Options:
  --threshold FLOAT       Pixelmatch threshold (0.0-1.0)
  --dpi INTEGER           Render resolution
  --output-dir DIRECTORY  Diff image output directory (optional, if not specified no diff images are saved)
  -v, --verbose           Increase verbosity (-v for INFO, -vv for DEBUG)
  --save-log              Write log output to log.txt
  --version               Show the version and exit.
  --help                  Show this message and exit.
```

**Exit Codes**

- `0` — Pass (PDFs are equivalent)
- `1` — Fail (differences detected)
- `2` — Error (invalid input or processing error)
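
In a regression test harness, these exit codes can be mapped to outcomes when calling the CLI from Python. The sketch below assumes `diffpdf` is installed and on `PATH`; `Outcome`, `interpret_exit_code`, and `run_diffpdf` are hypothetical helper names, and treating any undocumented exit code as an error is a choice made for this example.

```python
import subprocess
from enum import Enum


class Outcome(Enum):
    PASS = 0   # PDFs are equivalent
    FAIL = 1   # differences detected
    ERROR = 2  # invalid input or processing error


def interpret_exit_code(code: int) -> Outcome:
    """Map a diffpdf exit code to an Outcome; unknown codes count as errors."""
    try:
        return Outcome(code)
    except ValueError:
        return Outcome.ERROR


def run_diffpdf(reference: str, actual: str) -> Outcome:
    """Run the diffpdf CLI on two files and classify the result."""
    completed = subprocess.run(["diffpdf", reference, actual])
    return interpret_exit_code(completed.returncode)
```

A test can then assert `run_diffpdf("reference.pdf", "actual.pdf") is Outcome.PASS` and still distinguish a real difference (`FAIL`) from a broken setup (`ERROR`).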

## Library Usage

```python
from diffpdf import diffpdf

# Basic usage (no diff images saved)
diffpdf("reference.pdf", "actual.pdf")

# With options (save diff images to ./output directory)
diffpdf("reference.pdf", "actual.pdf", output_dir="./output", threshold=0.2, dpi=150, verbosity=2)
```

## Development

```bash
pip install -e ".[dev]"
pytest tests/ -v
ruff check .
```

## Acknowledgements

Built with [PyMuPDF](https://pymupdf.readthedocs.io/) for PDF parsing and [pixelmatch-fast](https://pypi.org/project/pixelmatch-fast/) (Python port of [pixelmatch](https://github.com/mapbox/pixelmatch)) for visual comparison.
