Metadata-Version: 2.4
Name: litparser
Version: 0.8.0
Summary: Lightweight Document Parser - parses PDF, DOCX, PPTX, and HWPX in pure Python
Home-page: https://github.com/ironwung/litparser
Author: ironwung
Author-email: ironwung <ironwung@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ironwung/litparser
Project-URL: Documentation, https://github.com/ironwung/litparser#readme
Project-URL: Repository, https://github.com/ironwung/litparser
Keywords: pdf,parser,docx,pptx,xlsx,hwpx,document,text-extraction,lightweight
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# LitParser

**Lit**eweight Document **Parser** - a pure-Python document parser

Parses a variety of document formats **with no external libraries**.

## Installation

```bash
pip install litparser
```

## Usage

```python
from litparser import parse, to_markdown, to_json

# The format is detected automatically
result = parse('document.pdf')
result = parse('report.docx')
result = parse('data.xlsx')
result = parse('문서.hwp')

# Access the results
print(result.text)
print(result.tables)

# Convert to other formats
md = to_markdown(result)
json_str = to_json(result)
```
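
When several files need converting, the `parse` and `to_markdown` calls above can be wrapped in a small loop. The sketch below is a minimal illustration, not part of litparser: `convert_all` is a hypothetical helper, and it takes the parse and render functions as arguments so it makes no assumptions about litparser beyond the two calls shown above.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def convert_all(
    paths: Iterable[str],
    parse_fn: Callable[[str], Any],
    render_fn: Callable[[Any], str],
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Run parse_fn, then render_fn, on each path (hypothetical helper).

    Returns (converted, failed): the rendered text keyed by path, and the
    exception raised for each path that could not be processed.
    """
    converted: Dict[str, str] = {}
    failed: Dict[str, Exception] = {}
    for path in paths:
        try:
            converted[path] = render_fn(parse_fn(path))
        except Exception as exc:
            # Assumption: the exception types litparser raises are not
            # documented here, so every failure is collected generically.
            failed[path] = exc
    return converted, failed


# Intended use with litparser installed (not executed here):
#   from litparser import parse, to_markdown
#   converted, failed = convert_all(["document.pdf", "report.docx"],
#                                   parse, to_markdown)
```

Collecting failures instead of raising lets one unreadable file be reported without stopping the rest of the batch.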

## CLI

```bash
litparser document.pdf
litparser document.pdf --markdown
litparser document.pdf --json
litparser 문서.hwp --info
```

## Supported formats

| Format | Modern | Legacy |
|------|--------|--------|
| Word | .docx ✅ | .doc ✅ |
| PowerPoint | .pptx ✅ | .ppt ✅ |
| Excel | .xlsx ✅ | .xls ✅ |
| Hangul (HWP) | .hwpx ✅ | .hwp ✅ |
| PDF | .pdf ✅ | - |
| Text | .txt, .md ✅ | - |
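
The table can also be used to filter files before handing them to the parser. The sketch below is an assumption-laden illustration: `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, the set is transcribed by hand from the table above, and litparser's own format detection may work differently (for example, by inspecting file contents).

```python
from pathlib import Path

# Transcribed from the supported-formats table; not read from litparser itself.
SUPPORTED_EXTENSIONS = {
    ".docx", ".doc",
    ".pptx", ".ppt",
    ".xlsx", ".xls",
    ".hwpx", ".hwp",
    ".pdf",
    ".txt", ".md",
}


def is_supported(path: str) -> bool:
    """Return True if the file extension appears in the table (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


candidates = ["document.pdf", "report.DOCX", "photo.png", "문서.hwp"]
to_parse = [p for p in candidates if is_supported(p)]
```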

## License

MIT License
