Metadata-Version: 2.4
Name: dd-datasets
Version: 1.2.3
Summary: Dataset building and processing tools for deepdoctection
Author: Dr. Janis Meyer
License: Apache License 2.0
Project-URL: Homepage, https://github.com/deepdoctection/deepdoctection
Project-URL: Documentation, https://deepdoctection.readthedocs.io
Project-URL: Repository, https://github.com/deepdoctection/deepdoctection
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: dd_core>=1.0
Requires-Dist: jsonlines==3.1.0
Provides-Extra: full
Requires-Dist: lxml>=4.9.1; extra == "full"
Provides-Extra: types
Requires-Dist: dd_core[types]; extra == "types"
Requires-Dist: lxml-stubs>=0.5.1; extra == "types"
Provides-Extra: dev
Requires-Dist: black==25.11.0; extra == "dev"
Requires-Dist: isort==7.0.0; extra == "dev"
Requires-Dist: pylint==4.0.2; extra == "dev"
Provides-Extra: test
Requires-Dist: hypothesis; extra == "test"
Requires-Dist: pytest==9.0.1; extra == "test"
Requires-Dist: pytest-cov==7.0.0; extra == "test"

<p align="center">
  <img src="https://github.com/deepdoctection/deepdoctection/raw/master/docs/_imgs/dd_logo.png" alt="Deep Doctection Logo" width="60%">
</p>

# deepdoctection-datasets

This package provides category and dataset definitions, as well as some dataset instances, for training models supported by deepdoctection.

## Overview

`dd-datasets` provides dataset management for Document AI tasks.

It includes:

- **datasets**: Built-in dataset definitions and dataflow builders for popular document understanding datasets.
- **instances**: Pre-defined dataset instances for common document understanding tasks such as object detection, text
  classification, and named entity recognition.

## Installation

```bash
uv pip install dd-datasets
```

To use all datasets, including those that require the XML parsing library lxml, install the `full` extra:

```bash
# Quotes keep shells such as zsh from interpreting the square brackets
uv pip install "dd-datasets[full]"
```
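
To check which of the two installs is active in an environment, one option is to test whether `lxml` is importable. The snippet below is a minimal sketch using only the Python standard library; the helper name `has_module` is hypothetical and not part of `dd-datasets`.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be found.

    Hypothetical helper for illustration; it is not part of dd-datasets.
    """
    return find_spec(name) is not None


if __name__ == "__main__":
    if has_module("lxml"):
        print("lxml found: datasets that need XML parsing should be usable")
    else:
        print('lxml not found: install it with: uv pip install "dd-datasets[full]"')
```

The check only looks the module up and does not import it, so it is cheap to run before selecting a dataset that depends on XML annotations.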

## License

Apache License 2.0

## Author

Dr. Janis Meyer

