Metadata-Version: 2.4
Name: tfq0tool
Version: 2.0.0
Summary: A powerful text extraction utility for multiple file formats, including PDFs, Word documents, spreadsheets, and code files.
Home-page: https://github.com/tfq0/tfq0tool
Author: Talal
Project-URL: Bug Reports, https://github.com/tfq0/TFQ0tool/issues
Project-URL: Source, https://github.com/tfq0/TFQ0tool
Keywords: text extraction pdf docx xlsx ocr
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: Topic :: Text Processing :: General
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyPDF2>=3.0.0
Requires-Dist: python-docx>=0.8.11
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pdfminer.six>=20221105
Requires-Dist: pytesseract>=0.3.10
Requires-Dist: Pillow>=9.5.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: chardet>=5.1.0
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# TFQ0tool

**A powerful command-line utility for extracting text from various file formats, including PDFs, Word documents, spreadsheets, and code files.**

[![Python Version](https://img.shields.io/badge/Python-3.8%2B-blue)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI Version](https://img.shields.io/pypi/v/tfq0tool)](https://pypi.org/project/tfq0tool/)

## Features ✨

- 📂 **Multi-format Support**
  - PDF files (including scanned PDFs with OCR)
  - Word documents (DOCX)
  - Excel spreadsheets (XLSX)
  - Text and code files
  - Password-protected PDFs

- 🚀 **Advanced Processing**
  - Multi-threaded parallel processing
  - Automatic encoding detection
  - Memory-efficient large file handling
  - Text preprocessing options
  - OCR support for scanned documents

- 📊 **Progress Tracking**
  - Real-time progress bars
  - Detailed success/failure reporting
  - Comprehensive logging system

- 🛡️ **Robust Error Handling**
  - Graceful handling of corrupted files
  - Clear error messages
  - Detailed debug logging
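
The parallel-processing and error-handling features above can be pictured with a small sketch. This is not TFQ0tool's actual implementation: `read_text_file` and `extract_all` are hypothetical names, the extractor only handles plain text, and the UTF-8 then Latin-1 fallback is a simple stand-in for real encoding detection. It only illustrates how a `--threads` value could map onto a thread pool while failures are collected instead of aborting the run.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_text_file(path):
    """Hypothetical stand-in for a per-format extractor (plain text only)."""
    data = Path(path).read_bytes()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback cannot fail.
        return data.decode("latin-1")


def extract_all(paths, threads=1):
    """Extract every path in a thread pool; return (results, failures)."""

    def safe_extract(path):
        # Catch I/O errors per file so one bad file does not stop the batch.
        try:
            return path, read_text_file(path), None
        except OSError as exc:
            return path, None, str(exc)

    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for path, text, error in pool.map(safe_extract, paths):
            if error is None:
                results[path] = text
            else:
                failures[path] = error
    return results, failures
```

Returning failures alongside results is one way to support a per-file success/failure report at the end of a batch.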

## Installation 💻

### From PyPI (Recommended)
```bash
pip install tfq0tool
```

### From Source
```bash
git clone https://github.com/tfq0/TFQ0tool.git
cd TFQ0tool
pip install -e .
```

## Usage 🛠️

### Basic Usage
```bash
# Process a single file
tfq0tool document.pdf

# Process multiple files
tfq0tool *.pdf *.docx

# Specify output directory
tfq0tool document.pdf --output ./extracted/

# Enable parallel processing
tfq0tool *.pdf --threads 4
```

### Advanced Options
```bash
# Password-protected PDF
tfq0tool secure.pdf --password mypass

# Text preprocessing
tfq0tool input.docx --preprocess lowercase,strip_whitespace

# Verbose output with progress
tfq0tool *.pdf --verbose

# Force overwrite existing files
tfq0tool data.xlsx --force
```

## Command-Line Options ⚙️

| Option | Description |
|--------|-------------|
| `-o, --output` | Output directory for extracted text |
| `-t, --threads` | Number of threads (default: 1) |
| `-v, --verbose` | Enable detailed output |
| `-f, --force` | Overwrite without confirmation |
| `-p, --password` | PDF password |
| `--preprocess` | Comma-separated preprocessing steps (`lowercase`, `strip_whitespace`) |

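When calling the tool from a Python script, the options in the table can be assembled into an argument list. The helper below is a hypothetical sketch (`build_command` is not part of TFQ0tool); it uses only the long flags documented above and assumes `--preprocess` takes a single comma-separated value, as in the usage examples.

```python
def build_command(files, output=None, threads=None, password=None,
                  preprocess=None, verbose=False, force=False):
    """Assemble a tfq0tool argument list from the documented options."""
    cmd = ["tfq0tool", *files]
    if output is not None:
        cmd += ["--output", str(output)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if password is not None:
        cmd += ["--password", password]
    if preprocess:
        # Steps are joined into one comma-separated value.
        cmd += ["--preprocess", ",".join(preprocess)]
    if verbose:
        cmd.append("--verbose")
    if force:
        cmd.append("--force")
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` would run the extraction, assuming `tfq0tool` is installed and on `PATH`. Shell wildcards such as `*.pdf` are not expanded in this form, so the file list would need to be built first, for example with `glob.glob`.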
## Text Preprocessing Options 🔧

- `lowercase`: Convert text to lowercase
- `strip_whitespace`: Remove excessive whitespace

## Requirements 📋

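- Python 3.8 or higher
- Python dependencies, installed automatically by `pip`: PyPDF2, python-docx, openpyxl, pdfminer.six, pytesseract, Pillow, tqdm, chardet
- For OCR: the Tesseract engine installed on the system, since `pytesseract` only wraps it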