Metadata-Version: 2.1
Name: docowling
Version: 1.0.0
Summary: SDK and CLI for parsing PDF, DOCX, HTML, and more into a unified document representation for powering downstream workflows such as gen AI applications.
Home-page: https://github.com/mouraworks/docowling
License: MIT
Keywords: docowling,convert,document,pdf,docx,html,markdown,layout model,segmentation,table structure,table former
Author: Christoph Auer
Author-email: cau@zurich.ibm.com
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: ocrmac
Provides-Extra: rapidocr
Provides-Extra: tesserocr
Requires-Dist: beautifulsoup4 (>=4.12.3,<5.0.0)
Requires-Dist: certifi (>=2024.7.4)
Requires-Dist: deepsearch-glm (>=1.0.0,<2.0.0)
Requires-Dist: docling-core[chunking] (>=2.12.1,<3.0.0)
Requires-Dist: docling-ibm-models (>=3.1.0,<4.0.0)
Requires-Dist: docling-parse (>=3.0.0,<4.0.0)
Requires-Dist: easyocr (>=1.7,<2.0)
Requires-Dist: filetype (>=1.2.0,<2.0.0)
Requires-Dist: huggingface_hub (>=0.23,<1)
Requires-Dist: lxml (>=4.0.0,<6.0.0)
Requires-Dist: marko (>=2.1.2,<3.0.0)
Requires-Dist: ocrmac (>=1.0.0,<2.0.0) ; (sys_platform == "darwin") and (extra == "ocrmac")
Requires-Dist: onnxruntime (>=1.7.0,<1.20.0) ; (python_version < "3.10") and (extra == "rapidocr")
Requires-Dist: onnxruntime (>=1.7.0,<2.0.0) ; (python_version >= "3.10") and (extra == "rapidocr")
Requires-Dist: openpyxl (>=3.1.5,<4.0.0)
Requires-Dist: pandas (>=2.1.4,<3.0.0)
Requires-Dist: pydantic (>=2.0.0,<3.0.0)
Requires-Dist: pydantic-settings (>=2.3.0,<3.0.0)
Requires-Dist: pypdfium2 (>=4.30.0,<5.0.0)
Requires-Dist: python-docx (>=1.1.2,<2.0.0)
Requires-Dist: python-pptx (>=1.0.2,<2.0.0)
Requires-Dist: rapidocr-onnxruntime (>=1.4.0,<2.0.0) ; (python_version < "3.13") and (extra == "rapidocr")
Requires-Dist: requests (>=2.32.3,<3.0.0)
Requires-Dist: rtree (>=1.3.0,<2.0.0)
Requires-Dist: scipy (>=1.6.0,<2.0.0)
Requires-Dist: tesserocr (>=2.7.1,<3.0.0) ; extra == "tesserocr"
Requires-Dist: typer (>=0.12.5,<0.13.0)
Project-URL: Repository, https://github.com/mouraworks/docowling
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://github.com/mouraworks/docowling">
    <img loading="lazy" alt="Docowling" src="https://github.com/mouraworks/docowling/raw/main/docs/assets/docowling.png" width="80%"/>
  </a>
</p>

# Docowling

[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://github.com/mouraworks/docowling/)
[![PyPI version](https://img.shields.io/pypi/v/docowling)](https://pypi.org/project/docowling/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/docowling)](https://pypi.org/project/docowling/)
[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![License MIT](https://img.shields.io/github/license/mouraworks/docowling/)](https://opensource.org/licenses/MIT)

**Docowling** is a fork of [Docling](https://github.com/DS4SD/docling), an IBM project, created to extend its functionality and add new document-processing capabilities.

## License

The Docowling codebase is released under the MIT license.
For the licenses of individual models, please refer to the model licenses in their original packages.

