Metadata-Version: 2.4
Name: deepdoctection
Version: 1.0.6
Summary: Repository for Document AI - server/inference core package
Author: Dr. Janis Meyer
License: Apache License 2.0
Project-URL: Homepage, https://github.com/deepdoctection/deepdoctection
Project-URL: Documentation, https://deepdoctection.readthedocs.io
Project-URL: Repository, https://github.com/deepdoctection/deepdoctection
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: dd-core[full]>=1.0.1
Requires-Dist: huggingface_hub>=0.26.0
Provides-Extra: full
Requires-Dist: dd-datasets[full]>=1.0.1; extra == "full"
Requires-Dist: boto3==1.34.102; extra == "full"
Requires-Dist: pdfplumber>=0.11.0; extra == "full"
Requires-Dist: jdeskew>=0.2.2; extra == "full"
Requires-Dist: networkx>=2.7.1; extra == "full"
Requires-Dist: apted==1.0.3; extra == "full"
Requires-Dist: distance==0.1.3; extra == "full"
Requires-Dist: lxml>=4.9.1; extra == "full"
Requires-Dist: pycocotools>=2.0.2; extra == "full"
Requires-Dist: timm>=0.9.16; extra == "full"
Requires-Dist: transformers<5.0.0,>=4.48.0; extra == "full"
Requires-Dist: accelerate>=0.29.1; extra == "full"
Requires-Dist: python-doctr>=1.0.0; extra == "full"
Provides-Extra: types
Requires-Dist: dd_core[types]; extra == "types"
Requires-Dist: lxml-stubs>=0.5.1; extra == "types"
Provides-Extra: dev
Requires-Dist: black==25.11.0; extra == "dev"
Requires-Dist: isort==7.0.0; extra == "dev"
Requires-Dist: pylint==4.0.2; extra == "dev"
Requires-Dist: mypy==1.4.1; extra == "dev"
Requires-Dist: types-PyYAML>=6.0.12.12; extra == "dev"
Requires-Dist: types-termcolor>=1.1.3; extra == "dev"
Requires-Dist: types-tabulate>=0.9.0.3; extra == "dev"
Requires-Dist: types-tqdm>=4.66.0.5; extra == "dev"
Requires-Dist: types-Pillow>=10.2.0.20240406; extra == "dev"
Requires-Dist: types-urllib3>=1.26.25.14; extra == "dev"
Requires-Dist: lxml-stubs>=0.5.1; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest==9.0.1; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: docs
Requires-Dist: mkdocs-material==9.7.0; extra == "docs"
Requires-Dist: mkdocstrings-python==1.19.0; extra == "docs"
Requires-Dist: griffe==1.13; extra == "docs"

<p align="center">
  <img src="https://github.com/deepdoctection/deepdoctection/raw/master/docs/_imgs/dd_logo.png" alt="Deep Doctection Logo" width="60%">
</p>


# deepdoctection

**deepdoctection** is the main package for running and training models. It provides the
pipeline framework, model wrappers, built-in pipelines, training scripts and evaluation methods.

The base package installs only the dependencies needed to run inference with a selected set of models.
To train, evaluate, or run all available models, install the full package.

## Overview

- **analyzer**: Configuration and factory functions for creating document analysis pipelines and the built-in analyzer.
- **configs**: YAML configuration for pipelines and model profiles for the model catalogue.
- **extern**: External model wrappers (Detectron2, DocTr, HuggingFace Transformers, Tesseract, PdfPlumber, etc.)
- **pipe**: Pipeline components and services.
- **eval**: Evaluation metrics and Evaluator.
- **train**: Training utilities and training scripts for Detectron2 and selected Transformer models.


## Installation

### Basic Installation

For inference use cases, install the base package:

```bash
uv pip install deepdoctection
```

**Important**: Various dependencies must be installed separately:

- **PyTorch**: Follow the instructions at https://pytorch.org/get-started/locally/ for your OS and hardware.
- **Transformers**: `pip install "transformers>=4.48.0"` (if using HF models)
- **Timm**: `pip install "timm>=0.9.16"` (required by some HF models)
- **DocTr**: `pip install "python-doctr>=1.0.0"` (if using DocTr models)
- **Detectron2**: Follow the instructions at https://detectron2.readthedocs.io/en/latest/tutorials/install.html
- **PDFPlumber**: `pip install "pdfplumber>=0.11.0"`
- **JDeskew**: `pip install "jdeskew>=0.2.2"`
- **Boto3**: `pip install boto3==1.34.102`

Quote any requirement that contains `>=`; otherwise most shells treat `>` as an output redirection and install the package without the version constraint.
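
Since these packages are installed separately, it can help to check which of them are present in the active environment before building a pipeline. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the list `OPTIONAL_DISTRIBUTIONS` are illustrative and not part of deepdoctection. The distribution names are taken from the list above, with `torch` and `detectron2` assumed as the names under which PyTorch and Detectron2 are installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the installation list above.
# "torch" and "detectron2" are assumptions about how those two were installed.
OPTIONAL_DISTRIBUTIONS = [
    "torch",
    "transformers",
    "timm",
    "python-doctr",
    "detectron2",
    "pdfplumber",
    "jdeskew",
    "boto3",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(OPTIONAL_DISTRIBUTIONS).items():
        print(f"{name:15} {ver or 'not installed'}")
```

The script only reports what is installed; it does not compare versions against the minimums listed above.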

To run evaluation with the supported metrics, you can additionally install:

- **APTED**: `pip install apted==1.0.3`
- **Distance**: `pip install distance==0.1.3`
- **Pycocotools**: `pip install "pycocotools>=2.0.2"`

Image processing is supported by PIL or OpenCV. PIL is used by default and will always be installed. If 
you prefer to use OpenCV, you can install it:

- **OpenCV**: `pip install opencv-python==4.8.0.76`
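
To see which image libraries are importable in the current environment, a small standard-library check is enough. This is a sketch only: the function name `available_image_backends` is hypothetical, and it does not show how deepdoctection itself selects its backend. `PIL` and `cv2` are the import names of Pillow and `opencv-python`.

```python
from importlib.util import find_spec


def available_image_backends():
    """Return the image libraries that can be imported in this environment."""
    backends = []
    if find_spec("PIL") is not None:  # Pillow
        backends.append("PIL")
    if find_spec("cv2") is not None:  # opencv-python
        backends.append("OpenCV")
    return backends


if __name__ == "__main__":
    print("Importable image backends:", available_image_backends() or "none")
```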


### Full Installation (Training & Evaluation)

To install all dependencies except PyTorch in a single step, run:

```bash
uv pip install "deepdoctection[full]"
```

### Development Installation

For development, clone the repository and install the package in editable mode.

## License

Apache License 2.0

## Author

Dr. Janis Meyer
