Metadata-Version: 2.4
Name: extractflow_parser
Version: 0.1.0
Summary: Parse PDF documents into markdown formatted content using Vision LLMs
Project-URL: Homepage, https://github.com/simplifyX-ai/extractflow-parser
Project-URL: Repository, https://github.com/simplifyX-ai/extractflow-parser.git
Author-email: Chi Tran <chitran.whitecat@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Requires-Dist: instructor>=1.7.2
Requires-Dist: jinja2>=3.0.0
Requires-Dist: ollama>=0.4.4
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pymupdf>=1.22.0
Requires-Dist: tenacity>=9.0.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: all
Requires-Dist: google-generativeai==0.8.3; extra == 'all'
Requires-Dist: openai==1.58.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: black>=24.4.1; extra == 'dev'
Requires-Dist: pytest>=8.3.4; extra == 'dev'
Requires-Dist: ruff>=0.8.3; extra == 'dev'
Provides-Extra: gemini
Requires-Dist: google-generativeai==0.8.3; extra == 'gemini'
Provides-Extra: openai
Requires-Dist: openai==1.58.0; extra == 'openai'
Description-Content-Type: text/markdown

# ExtractFlow Parser

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)


## 🚀 Getting Started

### Prerequisites

- 🐍 Python >= 3.9
- 🖥️ Ollama (only needed for local models)
- 🤖 An API key for OpenAI (or an OpenAI-compatible provider) or Google Gemini (only needed for hosted models)

## Supported Models

ExtractFlow supports vision language models from several providers:

### OpenAI Compatible Models
- You can use any model that is served through an OpenAI-compatible API.
- Pass `openai_compatible=True` and a `base_url` to the parser, and the parser will use the OpenAI API at that endpoint to process each page image.

### Ollama Models
- `llama3.2-vision:11b`
- `llama3.2-vision:90b`
- `llava:13b`
- `llava:34b`

### OpenAI Models
- `gpt-4o`
- `gpt-4o-mini`

### Gemini Models
- `gemini-1.5-flash`
- `gemini-2.0-flash-exp`
- `gemini-1.5-pro`


### Installation

Install the package using pip (Recommended):

```bash
pip install extractflow_parser
```

Install the optional dependencies for OpenAI or Gemini (the `all` extra installs both):
```bash
pip install 'extractflow_parser[openai]'
```

```bash
pip install 'extractflow_parser[gemini]'
```
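
As a rough guide to which extra a given model needs, the model lists above can be turned into a small lookup. This is a minimal sketch: `pip_requirement` and the prefix table are hypothetical helpers written for illustration, not part of the package, and the prefix rules are an assumption based only on the model names listed in this README.

```python
from typing import List, Optional, Tuple

# Assumption: model-name prefixes taken from the lists in this README.
# None means the base install is enough (ollama is a core dependency).
PREFIX_TO_EXTRA: List[Tuple[str, Optional[str]]] = [
    ("gpt-", "openai"),
    ("gemini-", "gemini"),
    ("llama", None),
    ("llava", None),
]


def pip_requirement(model_name: str) -> str:
    """Return the pip requirement string for a listed model (hypothetical helper)."""
    for prefix, extra in PREFIX_TO_EXTRA:
        if model_name.startswith(prefix):
            if extra is None:
                return "extractflow_parser"
            return f"extractflow_parser[{extra}]"
    # Unknown names (for example OpenAI-compatible models) are left to the reader.
    raise ValueError(f"No rule for model {model_name!r}")
```

For example, `pip_requirement("gemini-1.5-flash")` yields the string to pass to `pip install`.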

### Setting up Ollama (Optional)
See [examples/ollama_setup.md](examples/ollama_setup.md) for how to set up Ollama locally.

## ⌛️ Usage

### Basic Example Usage

```python
from extractflow_parser import ExtractFlowParser

# Initialize parser
parser = ExtractFlowParser(
    openai_compatible=True,
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="your_api_key",
    model_name="accounts/fireworks/models/qwen2-vl-72b-instruct",
    temperature=0.2,
    top_p=0.3,
    extraction_complexity=True,  # Set to True for more detailed extraction
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)

# Process results
for i, page_content in enumerate(markdown_pages):
    print(f"\n--- Page {i+1} ---\n{page_content}")
```
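
To keep the result instead of printing it, the pages can be joined and written to a single markdown file. The sketch below assumes that `convert_pdf` returns a list of strings, one per page, as the loop above suggests; `save_markdown_pages` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import List


def save_markdown_pages(pages: List[str], out_path: str) -> Path:
    """Join per-page markdown under page markers and write it to one file."""
    parts = [
        # An HTML comment marks each page without affecting the rendered markdown.
        f"<!-- Page {number} -->\n\n{page.strip()}"
        for number, page in enumerate(pages, start=1)
    ]
    target = Path(out_path)
    target.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return target
```

With the example above, this could be called as `save_markdown_pages(markdown_pages, "document.md")`.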

### PDF Page Configuration

```python
from extractflow_parser import ExtractFlowParser, PDFPageConfig

# Configure PDF processing settings
page_config = PDFPageConfig(
    dpi=400,
    color_space="RGB",
    include_annotations=True,
    preserve_transparency=False
)

# Initialize parser with custom page config
parser = ExtractFlowParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.4,
    extraction_complexity=False,
    page_config=page_config
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)
```
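
When choosing a `dpi` value, it helps to estimate how large each rendered page image becomes. PDF page sizes are measured in points (1/72 inch), so the pixel size scales with `dpi / 72`. The helper below is only a back-of-the-envelope sketch; how the package applies `dpi` internally is an assumption, and `rendered_size` is a hypothetical name.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Approximate pixel dimensions of a page rasterized at the given DPI."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# US Letter is 612 x 792 points.
print(rendered_size(612, 792, 400))  # (3400, 4400)
print(rendered_size(612, 792, 150))  # (1275, 1650)
```

Higher values give the model more detail to read but produce larger images, which may increase latency and cost with hosted models.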

### OpenAI or Gemini Model Usage

```python
from extractflow_parser import ExtractFlowParser

# Initialize a parser with an OpenAI model
openai_parser = ExtractFlowParser(
    base_url="https://api.openai.com/v1",  # Optional: custom OpenAI API endpoint
    model_name="gpt-4o",
    api_key="your-openai-api-key",  # Get the OpenAI API key from https://platform.openai.com/api-keys
    temperature=0.7,
    top_p=0.4,
    extraction_complexity=True,  # Set to True for more detailed extraction
)

# Initialize a separate parser with a Google Gemini model
gemini_parser = ExtractFlowParser(
    model_name="gemini-1.5-flash",
    api_key="your-gemini-api-key",  # Get the Gemini API key from https://aistudio.google.com/app/apikey
    temperature=0.7,
    top_p=0.4,
    extraction_complexity=True,  # Set to True for more detailed extraction
)
```
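
Rather than hardcoding API keys in source files, you may prefer to read them from the environment. The following is a minimal sketch; the environment variable names and the `api_key_from_env` helper are assumptions made for this example, not something the package defines.

```python
import os

# Assumption: these variable names are a convention chosen for this sketch.
ENV_VAR_BY_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def api_key_from_env(provider: str) -> str:
    """Return the API key for a provider, failing early if it is not set."""
    variable = ENV_VAR_BY_PROVIDER[provider]
    key = os.environ.get(variable)
    if not key:
        raise RuntimeError(f"Set {variable} before creating the parser")
    return key
```

The result can then be passed as `api_key=api_key_from_env("openai")` when constructing `ExtractFlowParser`.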

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
