Metadata-Version: 2.4
Name: docsloader
Version: 0.0.7
Summary: This is a documents loader.
Author-email: axiner <atpuxiner@163.com>
Project-URL: Homepage, https://github.com/atpuxiner/docsloader
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: toollib==1.7.7
Requires-Dist: pydantic==2.11.7
Requires-Dist: aiofiles==24.1.0
Requires-Dist: aiohttp==3.12.15
Requires-Dist: lxml==6.0.1
Requires-Dist: openpyxl==3.1.5
Requires-Dist: xlrd==2.0.2
Requires-Dist: python-pptx==1.0.2
Requires-Dist: python-docx==1.2.0
Requires-Dist: pymupdf==1.26.4
Requires-Dist: pywin32==311; platform_system == "Windows"
Requires-Dist: rapidocr-onnxruntime==1.4.4
Dynamic: license-file

# docsloader

## What is this?

- by: axiner
- docsloader
- An asynchronous loader for documents in many file formats.

## Installation

This package can be installed using pip (Python>=3.11):
> pip install docsloader

## Usage

The `docsloader` package provides asynchronous document loaders for a range of file types. It includes a dedicated loader
for each type and an `AutoLoader` that selects the appropriate loader from the file suffix.

### Supported File Suffixes

The package supports loading documents with the following file suffixes:

- **Text Files**: `.txt`
- **CSV Files**: `.csv`
- **Markdown Files**: `.md`
- **HTML Files**: `.html`, `.htm`
- **Excel Files**: `.xlsx`, `.xls`
- **PowerPoint Files**: `.pptx`, `.ppt`
- **Word Files**: `.docx`, `.doc`
- **PDF Files**: `.pdf`
- **Image Files**: `.jpg`, `.jpeg`, `.png`

### Available Loaders

The package provides the following loader classes:

- `TxtLoader`: For text files
- `CsvLoader`: For CSV files
- `MdLoader`: For Markdown files
- `HtmlLoader`: For HTML files
- `XlsxLoader`: For Excel files
- `PptxLoader`: For PowerPoint files
- `DocxLoader`: For Word files
- `PdfLoader`: For PDF files
- `ImgLoader`: For image files
- `AutoLoader`: Automatically selects the appropriate loader based on the file suffix
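
Selecting a loader by suffix amounts to a lookup on the file extension. The sketch below shows one way to do it, assuming a case-insensitive match on the suffix of a local path or on the path part of a URL. The `LOADER_BY_SUFFIX` table and the `pick_loader` helper are hypothetical: they return class names as strings and are not the package's internal implementation.

```python
import os
from urllib.parse import urlparse

# Hypothetical suffix-to-loader table built from the two lists above;
# the real AutoLoader may store or resolve this differently.
LOADER_BY_SUFFIX = {
    ".txt": "TxtLoader",
    ".csv": "CsvLoader",
    ".md": "MdLoader",
    ".html": "HtmlLoader",
    ".htm": "HtmlLoader",
    ".xlsx": "XlsxLoader",
    ".xls": "XlsxLoader",
    ".pptx": "PptxLoader",
    ".ppt": "PptxLoader",
    ".docx": "DocxLoader",
    ".doc": "DocxLoader",
    ".pdf": "PdfLoader",
    ".jpg": "ImgLoader",
    ".jpeg": "ImgLoader",
    ".png": "ImgLoader",
}


def pick_loader(path_or_url: str) -> str:
    """Return the loader class name for a local path or URL (hypothetical helper)."""
    # For URLs, use only the path component so query strings are ignored.
    # Plain paths such as "E:/dir/a.docx" are not parsed, since urlparse
    # would treat the drive letter "E" as a URL scheme.
    path = urlparse(path_or_url).path if "://" in path_or_url else path_or_url
    suffix = os.path.splitext(path)[1].lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported file suffix: {suffix!r}") from None


if __name__ == "__main__":
    print(pick_loader("https://example.com/files/report.PDF?download=1"))  # PdfLoader
```

Lowercasing the suffix lets `report.PDF` and `report.pdf` resolve to the same loader, and raising on an unknown suffix surfaces unsupported files early instead of guessing a format.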

All loader classes implement an asynchronous `load()` method, which yields documents one at a time and is consumed with `async for`.
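
Because `load()` is iterated with `async for`, it behaves like an async generator, and standard `asyncio` tools apply to it. The sketch below uses a toy stand-in rather than a real loader: `HypotheticalLineLoader`, `collect`, and `main` are illustrative names, the yielded items are plain strings (the real document type is not described here), and running several loaders at once with `asyncio.gather` is a general asyncio pattern, not a documented `docsloader` feature.

```python
import asyncio
from collections.abc import AsyncIterator


class HypotheticalLineLoader:
    """Toy stand-in for a loader: load() is an async generator yielding one item per line."""

    def __init__(self, text: str) -> None:
        self.text = text

    async def load(self) -> AsyncIterator[str]:
        for line in self.text.splitlines():
            # Hand control back to the event loop, as real file or HTTP reads would.
            await asyncio.sleep(0)
            yield line


async def collect(loader: HypotheticalLineLoader) -> list[str]:
    # Drain one loader's async generator into a list.
    return [doc async for doc in loader.load()]


async def main() -> list[list[str]]:
    loaders = [HypotheticalLineLoader("a\nb"), HypotheticalLineLoader("c")]
    # Run the loaders concurrently; gather returns results in input order.
    return await asyncio.gather(*(collect(ld) for ld in loaders))


if __name__ == "__main__":
    print(asyncio.run(main()))  # [['a', 'b'], ['c']]
```

Yielding items one at a time means a caller can start processing the first document before the whole file has been read, and `gather` lets several files load while others wait on I/O.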

### Example

```python
import asyncio

from docsloader import AutoLoader
from toollib.log import init_logger

logger = init_logger(__name__)


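async def main(path_or_url: str):
    # AutoLoader picks the concrete loader from the file suffix (.docx -> DocxLoader).
    loader = AutoLoader(
        path_or_url=path_or_url,
        # Inferred from the parameter name: False keeps temporary files instead of removing them.
        rm_tmpfile=False,
    )
    # load() is asynchronous and is iterated with `async for`.
    async for doc in loader.load():
        logger.info(doc)


if __name__ == "__main__":
    # Pass a local file path or a URL.
    asyncio.run(main(path_or_url=r"E:/NewFolder/test.docx"))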
```

## License

This project is released under the MIT License. See [LICENSE](LICENSE).
