Metadata-Version: 2.4
Name: QuerySUTRA
Version: 0.3.2
Summary: SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - AI-powered data analysis with custom visualizations, fuzzy matching, and smart caching
Home-page: https://github.com/yourusername/querysutra
Author: Aditya Batta
Author-email: 
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Database
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: openai>=1.0.0
Requires-Dist: plotly>=5.0.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: PyPDF2>=3.0.0
Requires-Dist: python-docx>=0.8.11
Requires-Dist: openpyxl>=3.0.0
Provides-Extra: mysql
Requires-Dist: sqlalchemy>=1.4.0; extra == "mysql"
Requires-Dist: mysql-connector-python>=8.0.0; extra == "mysql"
Provides-Extra: postgres
Requires-Dist: sqlalchemy>=1.4.0; extra == "postgres"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgres"
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=2.0.0; extra == "embeddings"
Provides-Extra: all
Requires-Dist: sqlalchemy>=1.4.0; extra == "all"
Requires-Dist: mysql-connector-python>=8.0.0; extra == "all"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
Requires-Dist: sentence-transformers>=2.0.0; extra == "all"
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# QuerySUTRA

**SUTRA: Structured-Unstructured-Text-Retrieval-Architecture**

QuerySUTRA is a Python library for AI-powered data analysis. It extracts entities from documents into tables, answers natural language questions about them, and caches repeated or similar queries.

## Installation

```bash
pip install QuerySUTRA

# Optional features (the quotes stop shells such as zsh from expanding the brackets)
pip install "QuerySUTRA[embeddings]"  # Smart caching
pip install "QuerySUTRA[mysql]"       # MySQL support
pip install "QuerySUTRA[postgres]"    # PostgreSQL support
pip install "QuerySUTRA[all]"         # All features
```

## Key Features

### 1. Automatic Multi-Table Creation
Upload a PDF, Word document, or text file, and the structured entities it contains are extracted into separate tables automatically.

```python
from sutra import SUTRA

sutra = SUTRA(api_key="your-openai-key")
sutra.upload("employee_data.pdf")

# Automatically creates:
# - employee_data_people (20 rows, 6 columns)
# - employee_data_contacts (20 rows, 4 columns)
# - employee_data_events (15 rows, 4 columns)
```

### 2. Natural Language Querying

```python
result = sutra.ask("Show me all people from New York")
print(result.data)

# With visualization
result = sutra.ask("Show sales by region", viz="pie")
```

### 3. Load Existing Databases

```python
# Load SQLite database
sutra = SUTRA.load_from_db("sutra.db", api_key="your-key")

# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "password", "database")

# Connect to PostgreSQL
sutra = SUTRA.connect_postgres("localhost", "postgres", "password", "database")
```

### 4. Custom Visualizations

```python
result = sutra.ask("Sales by region", viz="pie")       # Pie chart
result = sutra.ask("Trends", viz="line")               # Line chart
result = sutra.ask("Compare", viz="bar")               # Bar chart
result = sutra.ask("Correlation", viz="scatter")       # Scatter plot
result = sutra.ask("Data", viz="table")                # Table view
result = sutra.ask("Analysis", viz="heatmap")          # Heatmap
result = sutra.ask("Auto", viz=True)                   # Auto-detect
```

### 5. Smart Fuzzy Matching

```python
sutra = SUTRA(api_key="your-key", fuzzy_match=True)

# "New York City" matches "New York" automatically
result = sutra.ask("Who is from New York City?")
```

### 6. Intelligent Caching with Embeddings

```python
sutra = SUTRA(api_key="your-key", use_embeddings=True)

result = sutra.ask("Show sales")           # Calls API
result = sutra.ask("Display sales data")   # Uses cache (no API call)
```

### 7. Irrelevant Query Detection

```python
sutra = SUTRA(api_key="your-key", check_relevance=True)

result = sutra.ask("What is the weather?")
# Warns: "This question seems irrelevant to your database"
```

### 8. Direct SQL Access (No API Cost)

```python
result = sutra.sql("SELECT * FROM people WHERE city='New York'")
print(result.data)
```

## Complete Configuration

```python
sutra = SUTRA(
    api_key="your-openai-key",
    db="database.db",              # SQLite path
    use_embeddings=True,           # Smart caching (saves API calls)
    check_relevance=True,          # Detect irrelevant queries
    fuzzy_match=True,              # Match query terms to similar stored values
    cache_queries=True             # Exact-match caching
)
```

## Supported Formats

CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame

## Usage Examples

### Basic Workflow

```python
sutra = SUTRA(api_key="your-key")
sutra.upload("data.pdf")
sutra.tables()                    # View tables
sutra.schema()                    # View schema
sutra.peek("table_name", n=10)    # Preview data
result = sutra.ask("Your question?")
```

### Database Export

```python
sutra.export_db("backup.db", format="sqlite")
sutra.export_db("schema.sql", format="sql")
sutra.save_to_mysql("localhost", "root", "pass", "db")
sutra.save_to_postgres("localhost", "postgres", "pass", "db")
sutra.backup("./backups")
```

## How It Works

### Entity Extraction Example

**Input PDF:**
```
John Doe lives at 123 Main St, Dallas. Email: john@company.com.
Sarah Smith lives at 456 Oak Ave, Boston. Email: sarah@company.com.
```

**Output Tables:**

**people**
| id | name | address | city | email |
|----|------|---------|------|-------|
| 1 | John Doe | 123 Main St | Dallas | john@company.com |
| 2 | Sarah Smith | 456 Oak Ave | Boston | sarah@company.com |
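
To make the output concrete, the sketch below rebuilds the example `people` table by hand with Python's built-in `sqlite3` module and queries it. It does not show how QuerySUTRA performs the extraction. The column types and the helper names `build_people_db` and `names_in_city` are assumptions made for illustration.

```python
import sqlite3

# Rows copied from the example table above.
PEOPLE = [
    (1, "John Doe", "123 Main St", "Dallas", "john@company.com"),
    (2, "Sarah Smith", "456 Oak Ave", "Boston", "sarah@company.com"),
]


def build_people_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database holding the example table."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the README only lists column names.
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, email TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", PEOPLE)
    conn.commit()
    return conn


def names_in_city(conn: sqlite3.Connection, city: str) -> list:
    """Return the names of people in a city, using a bound parameter."""
    rows = conn.execute(
        "SELECT name FROM people WHERE city = ? ORDER BY id", (city,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_people_db()
    print(names_in_city(conn, "Dallas"))
    conn.close()
```

Once a table like this exists, the same `SELECT` can be passed to `sutra.sql(...)` without using the API.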

### Embeddings for Smart Caching

Uses the `all-MiniLM-L6-v2` model (about 80 MB, runs locally) to compare each new question with previously answered ones:
- Query 1: "Show sales" → API call
- Query 2: "Display sales" → 92% similar → Cached (no API call)
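
The idea behind similarity-based caching can be sketched in a few lines. This is a minimal illustration, not QuerySUTRA's implementation. The class `SemanticCache`, the 0.85 threshold, and the stand-in `toy_embed` function are all assumptions. In practice the embedding function would come from a model such as `all-MiniLM-L6-v2`.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(text: str) -> Vector:
    """Stand-in embedding for the demo: counts of three hand-picked concepts.

    "show" and "display" share a dimension to imitate how a real model
    places synonyms close together.
    """
    concepts = {"show": 0, "display": 0, "sales": 1, "weather": 2}
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        if word in concepts:
            vec[concepts[word]] += 1.0
    return vec


class SemanticCache:
    """Hypothetical cache that returns a stored answer for similar questions."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.85):
        self.embed = embed
        self.threshold = threshold  # assumed value, not taken from the library
        self.entries: List[Tuple[Vector, str]] = []

    def get(self, question: str) -> Optional[str]:
        """Return the best cached answer if it is similar enough, else None."""
        query_vec = self.embed(question)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine_similarity(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, question: str, answer: str) -> None:
        """Store an answer together with the embedding of its question."""
        self.entries.append((self.embed(question), answer))
```

A caller would try `cache.get(question)` first and only call the API, followed by `cache.put(question, answer)`, when the result is `None`.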

### Fuzzy Matching

- Query: "New York City"
- Database: ["New York", "Dallas", "Boston"]
- Match: "New York City" → "New York" (85% similar)
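
A comparable lookup can be sketched with the standard library's `difflib`. This is an illustration only: QuerySUTRA's own similarity measure is not documented here, so the scores `difflib` produces will differ from the 85% quoted above. The helper name `best_match` and the 0.6 cutoff are assumptions.

```python
from difflib import get_close_matches
from typing import List, Optional


def best_match(term: str, candidates: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the candidate most similar to `term`, or None if none is close enough.

    `cutoff` is the minimum difflib similarity ratio (0.0 to 1.0) to accept.
    """
    matches = get_close_matches(term, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    cities = ["New York", "Dallas", "Boston"]
    print(best_match("New York City", cities))
    print(best_match("Tokyo", cities))
```

Raising the cutoff makes matching stricter. A value that is too low risks rewriting a query to an unrelated stored value.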

## API Reference

### Class Methods

`SUTRA.load_from_db(db_path, api_key, **kwargs)` - Load existing SQLite database

`SUTRA.connect_mysql(host, user, password, database, ...)` - Connect to MySQL

`SUTRA.connect_postgres(host, user, password, database, ...)` - Connect to PostgreSQL

### Instance Methods

`upload(data, name=None)` - Upload data

`ask(question, viz=False, table=None)` - Natural language query

`sql(query, viz=False)` - Raw SQL query

`tables()` - List all tables

`schema(table=None)` - Show schema

`peek(table=None, n=5)` - Preview data

`export_db(path, format)` - Export database

`save_to_mysql(...)` - Export to MySQL

`save_to_postgres(...)` - Export to PostgreSQL

`backup(path=None)` - Create backup

`close()` - Close connection

## Performance Tips

1. Use `load_from_db()` to avoid re-uploading
2. Use `sql()` for complex queries (no API cost)
3. Enable `use_embeddings=True` for caching
4. Enable `cache_queries=True` for exact matches
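
For tip 4, exact-match caching amounts to a dictionary lookup on the question text. The sketch below is hypothetical and does not reflect how `cache_queries=True` is implemented. The class `ExactCache` and the choice to normalize case and whitespace are assumptions.

```python
from typing import Callable, Dict


def normalize(question: str) -> str:
    """Lowercase the question and collapse runs of whitespace."""
    return " ".join(question.lower().split())


class ExactCache:
    """Hypothetical wrapper that calls `compute` once per distinct question."""

    def __init__(self, compute: Callable[[str], str]):
        self.compute = compute          # the expensive call, e.g. an API request
        self.store: Dict[str, str] = {}
        self.misses = 0                 # number of times `compute` was called

    def ask(self, question: str) -> str:
        key = normalize(question)
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.compute(question)
        return self.store[key]
```

Unlike the embedding-based cache, this only helps when the same question is repeated, but it needs no extra dependencies.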

## Troubleshooting

**No API key error:** pass the key explicitly: `sutra = SUTRA(api_key="sk-...")`

**PDF fails:** `pip install PyPDF2`

**MySQL error:** `pip install "QuerySUTRA[mysql]"`

**Embeddings error:** `pip install "QuerySUTRA[embeddings]"`

## Requirements

- Python 3.8+
- OpenAI API key
- About 100 MB of disk space (if using embeddings)

## License

MIT License

## Changelog

### v0.3.1
- Semantic embeddings for smart caching
- Fuzzy matching for better NLP
- Irrelevant query detection
- Load existing databases
- MySQL/PostgreSQL connectivity
- Custom visualizations
- All features optional

---

**Made by Aditya Batta**
