Metadata-Version: 2.4
Name: gi-text-summarizer
Version: 1.0.0
Summary: Lightweight text summarizer with token budgets and smart chunking
License: MIT
Project-URL: Homepage, https://github.com/your-org/gi-text-summarizer
Project-URL: Repository, https://github.com/your-org/gi-text-summarizer
Project-URL: Issues, https://github.com/your-org/gi-text-summarizer/issues
Keywords: summarizer,llm,openai,nlp,text,chunking
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=1.0.0
Dynamic: license-file

# gi_text_summarizer

A **lightweight, production-ready** Python library for summarizing text with any OpenAI-compatible LLM.

## Features

- ✅ Automatic text chunking for long documents
- ✅ Token budget guardrails (summaries always shorter than input)
- ✅ Support for multiple summary lengths, tones, and output formats
- ✅ Works with OpenAI, Azure OpenAI, or custom LLM endpoints
- ✅ Built-in token estimation (no external dependencies)
- ✅ Production-grade error handling and logging

## Installation

```bash
pip install gi-text-summarizer
```

## Quick Start

```python
from gi_text_summarizer import TextSummarizer

summarizer = TextSummarizer(api_key="sk-...")

result = summarizer.summarize(
    text="Your long document here...",
    summary_type="medium",    # "short" | "medium" | "detailed"
    tone="neutral",           # "neutral" | "formal" | "casual"
    focus_area="general",
    output_format="text",     # "text" | "bullets" | "json"
)

print(result.summary)
print(result.compression)  # e.g., "68% shorter"
print(result)              # Pretty-printed with all details
```

## API Reference

### TextSummarizer

```python
TextSummarizer(
    api_key=None,              # OpenAI/Azure API key (or env vars)
    azure_endpoint=None,       # Azure endpoint
    deployment_name=None,      # Azure deployment
    model="gpt-4o-mini",       # Model name
    provider="auto",           # "auto" | "openai" | "azure" | "custom"
)
```

### summarize()

```python
result = summarizer.summarize(
    text=text,                  # str: Text to summarize (required)
    summary_type="medium",      # "short" (20%) | "medium" (40%) | "detailed" (60%)
    tone="neutral",             # "neutral" | "formal" | "casual"
    focus_area="general",       # Any string
    output_format="text",       # "text" | "bullets" | "json"
    chunk_strategy="character", # "character" | "sentence"
    chunk_size=3000,            # Characters per chunk
)  # returns a SummaryResult
```

### SummaryResult

```python
result.summary          # str: Generated summary
result.input_tokens     # int: Tokens in original
result.output_tokens    # int: Tokens in summary
result.compression      # str: "X% shorter"
result.num_chunks       # int: Number of chunks
```
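
The relationship between the token counts and the `compression` string can be illustrated with a small stand-in class. This is a sketch only: `SummaryResultSketch` is a hypothetical name, and the assumption that `compression` is the relative reduction in token count is inferred from the `"X% shorter"` format rather than taken from the library's source.

```python
from dataclasses import dataclass


@dataclass
class SummaryResultSketch:
    """Hypothetical stand-in for SummaryResult; not the library's class."""

    summary: str
    input_tokens: int
    output_tokens: int
    num_chunks: int = 1

    @property
    def compression(self) -> str:
        # Assumption: compression is the relative reduction in token count.
        if self.input_tokens <= 0:
            return "0% shorter"
        saved = self.input_tokens - self.output_tokens
        return f"{round(100 * saved / self.input_tokens)}% shorter"


example = SummaryResultSketch(summary="...", input_tokens=1000, output_tokens=320)
print(example.compression)  # 68% shorter
```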

## Token Budget Guardrails

Each summary type caps the summary's token budget at a fixed ratio of the input's token count, so the summary stays shorter than the input:

| Type | Ratio | Use Case |
|------|-------|----------|
| `short` | 20% | Core idea (3-4 sentences) |
| `medium` | 40% | Executive summary |
| `detailed` | 60% | Comprehensive summary |
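
To make the ratios in the table above concrete, here is a minimal sketch of how such a cap could be computed. The function name `token_budget`, the rounding behavior, and the one-token floor are assumptions made for illustration; the library's internal calculation may differ.

```python
# Ratios from the table above, stored as whole percentages to avoid float rounding.
SUMMARY_RATIO_PERCENT = {"short": 20, "medium": 40, "detailed": 60}


def token_budget(input_tokens: int, summary_type: str = "medium") -> int:
    """Return a hypothetical cap on summary tokens for a given input size."""
    if summary_type not in SUMMARY_RATIO_PERCENT:
        raise ValueError(f"unknown summary_type: {summary_type!r}")
    cap = input_tokens * SUMMARY_RATIO_PERCENT[summary_type] // 100
    # Allow at least one token, and stay below the input size when possible.
    return max(1, min(cap, input_tokens - 1))


for kind in SUMMARY_RATIO_PERCENT:
    print(kind, token_budget(1000, kind))
```

For a 1,000-token input this gives caps of 200, 400, and 600 tokens for `short`, `medium`, and `detailed` respectively.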

## Chunking & Multi-Document Summarization

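Texts longer than a single chunk are summarized in three steps:
1. The text is split into chunks of up to `chunk_size` characters.
2. Each chunk is summarized on its own.
3. The chunk summaries are combined and summarized again to produce the final result.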

```python
result = summarizer.summarize(
    text=long_document,
    chunk_strategy="sentence",  # Better quality
    chunk_size=3000,
)
```
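
The three steps above can be sketched without any LLM call. Everything here is illustrative: `chunk_text`, `summarize_long`, and `first_sentence` are hypothetical names, the sentence splitter is a naive regex, and `first_sentence` merely stands in for the model so the flow can be run offline. It is not the library's implementation.

```python
import re
from typing import Callable, List


def chunk_text(text: str, chunk_size: int = 3000, strategy: str = "character") -> List[str]:
    """Split text into chunks of roughly chunk_size characters (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if strategy == "character":
        # Fixed-width slices; a slice may cut a sentence in half.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if strategy == "sentence":
        # Naive split on ., ! or ? followed by whitespace, then greedy packing.
        # A single sentence longer than chunk_size becomes its own oversized chunk.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        chunks: List[str] = []
        current = ""
        for sentence in sentences:
            if current and len(current) + 1 + len(sentence) > chunk_size:
                chunks.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}" if current else sentence
        if current:
            chunks.append(current)
        return chunks
    raise ValueError(f"unknown strategy: {strategy!r}")


def summarize_long(text: str, summarize_fn: Callable[[str], str],
                   chunk_size: int = 3000, strategy: str = "character") -> str:
    """Split, summarize each chunk, then summarize the combined summaries."""
    chunks = chunk_text(text, chunk_size, strategy)
    if len(chunks) <= 1:
        return summarize_fn(text)
    partials = [summarize_fn(chunk) for chunk in chunks]
    return summarize_fn("\n".join(partials))


def first_sentence(text: str) -> str:
    # Stand-in for an LLM call: keep only the first sentence.
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


doc = "Cats sleep a lot. They nap all day. Dogs like walks. They fetch sticks."
print(chunk_text(doc, chunk_size=40, strategy="sentence"))
print(summarize_long(doc, first_sentence, chunk_size=40, strategy="sentence"))
```

Sentence-based packing keeps each chunk on sentence boundaries, which is one plausible reason the `"sentence"` strategy tends to give better summaries than fixed-width slicing.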

## Environment Variables

```bash
# OpenAI
export OPENAI_API_KEY="sk-..."

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"
```
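
How these variables might interact with `provider="auto"` is not spelled out here. The sketch below shows one plausible resolution order, purely as an assumption: `detect_provider` is a hypothetical helper, and the rule that Azure is chosen only when both its key and endpoint are set is a guess, not documented behavior.

```python
import os
from typing import Mapping, Optional


def detect_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess a provider from environment variables (hypothetical logic)."""
    env = os.environ if env is None else env
    # Assumption: Azure is chosen only when both the key and the endpoint are set.
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("no API credentials found in the environment")


print(detect_provider({"OPENAI_API_KEY": "sk-..."}))
```

Passing `provider` explicitly to `TextSummarizer` avoids relying on any detection order.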

## Standalone Token Counter

```python
from gi_text_summarizer import count_tokens

tokens = count_tokens("Your text here...")
```
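
The heuristic behind `count_tokens` is not described here. For intuition, a dependency-free estimate can be as simple as the sketch below, which assumes roughly four characters per token, a common rule of thumb for English text. `estimate_tokens` is a hypothetical function and may not match what `count_tokens` returns.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (assumed heuristic)."""
    if not text:
        return 0
    # Round up so that any non-empty text counts as at least one token.
    return max(1, math.ceil(len(text) / chars_per_token))


print(estimate_tokens("Your text here..."))
```

An estimate like this is good enough for budgeting, but exact counts depend on the model's tokenizer.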

## Examples

**Executive Summary (Formal)**
```python
result = summarizer.summarize(
    text=financial_report,
    summary_type="medium",
    tone="formal",
    focus_area="financial",
    output_format="bullets",
)
```

**Technical Highlight (JSON)**
```python
result = summarizer.summarize(
    text=documentation,
    summary_type="detailed",
    tone="neutral",
    focus_area="technical",
    output_format="json",
)
```

**Ultra-Concise (20% of input)**
```python
result = summarizer.summarize(
    text=research_paper,
    summary_type="short",
)
```

## Publishing to PyPI

```bash
pip install build twine
python -m build
twine upload dist/*
```

## License

MIT License.
