Metadata-Version: 2.2
Name: polytext
Version: 0.1.2
Summary: Python utilities to simplify document files management
Home-page: https://github.com/docsity/polytext
Author: Matteo Senardi
Author-email: matteo.s@docsity.com
License: MIT
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: ~=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pypdf==5.3.0
Requires-Dist: PyMuPDF>=1.25.3
Requires-Dist: pycryptodome==3.21.0
Requires-Dist: weasyprint==64.1
Requires-Dist: markdown==3.7
Requires-Dist: python-docx==1.1.2
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# polytext


A Python package for document conversion and text extraction.

## Features

- Convert various document formats (DOCX, ODT, PPT, etc.) to PDF
- Extract text from PDF documents
- Support for both local files and S3 storage
- Multiple PDF parsing backends (PyPDF, PyMuPDF)
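This README does not show how polytext lets you pick a PDF backend. The sketch below therefore calls its two parsing dependencies, pypdf and PyMuPDF, directly. `extract_pdf_text` and its `backend` parameter are hypothetical names for illustration and are not part of polytext's API.

```python
def extract_pdf_text(path: str, backend: str = "pymupdf") -> str:
    """Extract plain text from a PDF with the chosen library (hypothetical helper)."""
    if backend == "pymupdf":
        import fitz  # PyMuPDF's classic import name; recent releases also expose `pymupdf`
        with fitz.open(path) as doc:
            # Iterating a Document yields its pages in order
            return "\n".join(page.get_text() for page in doc)
    if backend == "pypdf":
        from pypdf import PdfReader
        reader = PdfReader(path)
        # extract_text() may return an empty string for image-only pages
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"unknown backend: {backend!r}")
```

The imports are deferred into each branch, so only the library you actually select needs to be installed.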

## Installation

```bash
# Basic installation
pip install polytext
```

## Requirements

- Python 3.6 or higher
- LibreOffice (for PDF conversion)
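This README does not say how polytext invokes LibreOffice. A common pattern, assumed here, is to run LibreOffice in headless mode with `--convert-to pdf`. The sketch below first checks that a LibreOffice binary is on `PATH`, then runs that conversion. The helper names are hypothetical. Depending on the platform, the binary may be called `soffice` or `libreoffice`, and on some systems (for example macOS app installs) it may not be on `PATH` at all.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def find_libreoffice() -> Optional[str]:
    """Return the full path of a LibreOffice binary found on PATH, or None."""
    for name in ("soffice", "libreoffice"):
        found = shutil.which(name)
        if found:
            return found
    return None


def build_convert_command(binary: str, input_path: str, out_dir: str) -> List[str]:
    """Build a headless LibreOffice command that writes <stem>.pdf into out_dir."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", out_dir, input_path]


def convert_with_libreoffice(input_path: str, out_dir: str) -> Path:
    """Convert one document to PDF; raises if LibreOffice is missing or fails."""
    binary = find_libreoffice()
    if binary is None:
        raise RuntimeError("LibreOffice not found on PATH")
    subprocess.run(build_convert_command(binary, input_path, out_dir), check=True)
    # LibreOffice names the output after the input file's stem
    return Path(out_dir) / (Path(input_path).stem + ".pdf")
```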

## Usage

### Converting Documents to PDF

```python
from polytext import convert_to_pdf, ConversionError

try:
    # Convert a document to PDF
    pdf_path = convert_to_pdf('input.docx', 'output.pdf')
    print(f"PDF saved to: {pdf_path}")
except ConversionError as e:
    print(f"Conversion failed: {e}")
```
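To convert several files in one pass without stopping at the first failure, you can wrap the `convert_to_pdf(input, output)` call shown above in a loop. `convert_all` is a hypothetical helper, not part of polytext. The converter function and the exception type are passed in as parameters, so the loop itself can be exercised without LibreOffice installed.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple, Type


def convert_all(
    inputs: Iterable[str],
    out_dir: str,
    convert: Callable[[str, str], str],
    error_type: Type[Exception] = Exception,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Convert each input to <out_dir>/<stem>.pdf, collecting failures instead of stopping."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # make sure the target directory exists
    converted: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for src in inputs:
        target = out / (Path(src).stem + ".pdf")
        try:
            converted[src] = convert(src, str(target))
        except error_type as exc:
            # Record the error message and continue with the next file
            failed[src] = str(exc)
    return converted, failed
```

With polytext installed, a call would look like `convert_all(["report.docx", "slides.ppt"], "pdfs", convert_to_pdf, ConversionError)`, after importing both names from `polytext` as in the example above.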

### Text Extraction

```python
from polytext import extract_text_from_file

# Extract text from any supported file
text = extract_text_from_file('document.docx')
print(f"Extracted text: {text}")
```

## License

MIT License
