Metadata-Version: 2.4
Name: text2sql-agent
Version: 0.1.0
Summary: A flexible, LLM-agnostic text-to-SQL agent that works with any database
Author-email: Jasmin Patel <jasmin.patel@sourcefuse.com>
Maintainer-email: Jasmin Patel <jasmin.patel@sourcefuse.com>
License: MIT
License-File: LICENSE
Keywords: agent,ai,database,langchain,llm,natural-language,sql,text-to-sql
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: langchain-community>=0.0.20
Requires-Dist: langchain-core>=0.1.0
Requires-Dist: langchain>=0.1.0
Requires-Dist: sqlalchemy>=2.0.0
Provides-Extra: all
Requires-Dist: anthropic>=0.18.0; extra == 'all'
Requires-Dist: boto3>=1.28.0; extra == 'all'
Requires-Dist: langchain-anthropic>=0.1.0; extra == 'all'
Requires-Dist: langchain-aws>=0.1.0; extra == 'all'
Requires-Dist: langchain-groq>=0.0.1; extra == 'all'
Requires-Dist: langchain-openai>=0.0.5; extra == 'all'
Requires-Dist: openai>=1.0.0; extra == 'all'
Requires-Dist: psycopg2-binary>=2.9.0; extra == 'all'
Requires-Dist: pymysql>=1.1.0; extra == 'all'
Requires-Dist: pyodbc>=4.0.0; extra == 'all'
Provides-Extra: all-dbs
Requires-Dist: psycopg2-binary>=2.9.0; extra == 'all-dbs'
Requires-Dist: pymysql>=1.1.0; extra == 'all-dbs'
Requires-Dist: pyodbc>=4.0.0; extra == 'all-dbs'
Provides-Extra: all-llms
Requires-Dist: anthropic>=0.18.0; extra == 'all-llms'
Requires-Dist: boto3>=1.28.0; extra == 'all-llms'
Requires-Dist: langchain-anthropic>=0.1.0; extra == 'all-llms'
Requires-Dist: langchain-aws>=0.1.0; extra == 'all-llms'
Requires-Dist: langchain-groq>=0.0.1; extra == 'all-llms'
Requires-Dist: langchain-openai>=0.0.5; extra == 'all-llms'
Requires-Dist: openai>=1.0.0; extra == 'all-llms'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == 'anthropic'
Requires-Dist: langchain-anthropic>=0.1.0; extra == 'anthropic'
Provides-Extra: bedrock
Requires-Dist: boto3>=1.28.0; extra == 'bedrock'
Requires-Dist: langchain-aws>=0.1.0; extra == 'bedrock'
Provides-Extra: bigquery
Requires-Dist: sqlalchemy-bigquery>=1.6.0; extra == 'bigquery'
Provides-Extra: dev
Requires-Dist: black>=22.0.0; extra == 'dev'
Requires-Dist: build>=0.10.0; extra == 'dev'
Requires-Dist: flake8>=5.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-html>=4.1.1; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Provides-Extra: google
Requires-Dist: langchain-google-genai>=0.0.5; extra == 'google'
Provides-Extra: groq
Requires-Dist: langchain-groq>=0.0.1; extra == 'groq'
Provides-Extra: mssql
Requires-Dist: pyodbc>=4.0.0; extra == 'mssql'
Provides-Extra: mysql
Requires-Dist: pymysql>=1.1.0; extra == 'mysql'
Provides-Extra: ollama
Requires-Dist: langchain-ollama>=0.0.1; extra == 'ollama'
Provides-Extra: openai
Requires-Dist: langchain-openai>=0.0.5; extra == 'openai'
Requires-Dist: openai>=1.0.0; extra == 'openai'
Provides-Extra: postgresql
Requires-Dist: psycopg2-binary>=2.9.0; extra == 'postgresql'
Description-Content-Type: text/markdown

# Text-to-SQL Agent

### Key Advantages

**1. Performance Optimization**
- Schema pre-loading eliminates redundant lookups
- Intelligent result truncation reduces LLM token usage
- Cached table information speeds up complex queries

**2. Enhanced User Experience**
- Timestamp context: "Show me orders from last week" just works
- Conversation metadata tracking for better follow-ups
- Domain context for specialized behavior

**3. Agnostic Design**
- Works with **any** LangChain BaseChatModel
- Works with **any** SQL database
- Optional domain context (generic by default)
- No provider or database lock-in

---

## Quick Start

### Installation

```bash
# Basic installation
pip install text2sql-agent

# With specific providers (quote the extras so shells like zsh don't expand the brackets)
pip install "text2sql-agent[bedrock]"    # AWS Bedrock
pip install "text2sql-agent[openai]"     # OpenAI
pip install "text2sql-agent[anthropic]"  # Anthropic Claude
pip install "text2sql-agent[ollama]"     # Ollama (local)

# With databases
pip install "text2sql-agent[postgresql]"
pip install "text2sql-agent[mysql]"
```

### Example

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_aws import ChatBedrock

# Setup (any LLM, any database)
llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")

# Create agent
agent = SQLAgent(llm=llm, db=db)

# Ask questions
result = agent.query("What tables are in the database?")
print(result["answer"])
print(result["sql_query"])
print(result["results"])  # JSON string of rows
```

---

## LLM Provider Examples

All examples use the same `SQLAgent` interface - just swap the LLM.

### AWS Bedrock (Claude, Titan, etc.)

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_aws import ChatBedrock

llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name="us-east-1"
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
```

**Setup:**
```bash
pip install langchain-aws boto3
export AWS_DEFAULT_REGION=us-east-1
# Use AWS credentials or IAM role
```

### OpenAI (GPT-4, GPT-3.5)

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_openai import ChatOpenAI
import os

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.1,
    api_key=os.getenv("OPENAI_API_KEY")
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
```

**Setup:**
```bash
pip install langchain-openai
export OPENAI_API_KEY=sk-your-key-here
```

### Anthropic API (Claude direct)

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_anthropic import ChatAnthropic
import os

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.1,
    api_key=os.getenv("ANTHROPIC_API_KEY")
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
```

**Setup:**
```bash
pip install langchain-anthropic
export ANTHROPIC_API_KEY=your-key-here
```

### Ollama (Local)

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="mistral:7b",
    temperature=0.1
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
```

**Setup:**
```bash
pip install langchain-ollama
# Install Ollama: https://ollama.ai
ollama pull mistral:7b
```

### Google Gemini

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_google_genai import ChatGoogleGenerativeAI
import os

llm = ChatGoogleGenerativeAI(
    model="models/gemini-2.0-flash-exp",
    temperature=0.1,
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
```

**Setup:**
```bash
pip install langchain-google-genai
export GOOGLE_API_KEY=your-key-here
```

---

## Database Examples

All examples use the same `SQLAgent` interface - just swap the database.

### PostgreSQL

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
import os

# Connection string format
db = JSONSerializableSQLDatabase.from_uri(
    "postgresql://username:password@localhost:5432/database_name"
)

# With environment variable (recommended)
db = JSONSerializableSQLDatabase.from_uri(os.getenv("DATABASE_URL"))

# With SSL mode
db = JSONSerializableSQLDatabase.from_uri(
    "postgresql://user:pass@host:5432/db?sslmode=require"
)

agent = SQLAgent(llm=llm, db=db)
```

**Setup:**
```bash
pip install psycopg2-binary
# or
pip install "text2sql-agent[postgresql]"
```

**Connection String Formats:**
```
postgresql://user:password@host:port/database
postgresql://user:password@host:port/database?sslmode=require
```

### MySQL

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase

# MySQL connection
db = JSONSerializableSQLDatabase.from_uri(
    "mysql+pymysql://username:password@localhost:3306/database_name"
)

# With charset
db = JSONSerializableSQLDatabase.from_uri(
    "mysql+pymysql://user:pass@host:3306/db?charset=utf8mb4"
)

agent = SQLAgent(llm=llm, db=db)
```

**Setup:**
```bash
pip install pymysql
# or
pip install "text2sql-agent[mysql]"
```

**Connection String Formats:**
```
mysql+pymysql://user:password@host:port/database
mysql+pymysql://user:password@host:port/database?charset=utf8mb4
```

### SQLite (Local Files)

```python
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase

# Relative path
db = JSONSerializableSQLDatabase.from_uri("sqlite:///database.db")

# Absolute path
db = JSONSerializableSQLDatabase.from_uri("sqlite:////absolute/path/to/database.db")

# In-memory (testing)
db = JSONSerializableSQLDatabase.from_uri("sqlite:///:memory:")

agent = SQLAgent(llm=llm, db=db)
```

**Setup:**
```bash
# No additional dependencies - SQLite is built-in to Python
```

---

## Advanced Features

### 1. Schema Pre-loading (70-80% Performance Boost)

Pre-load important table schemas to eliminate redundant lookups:

```python
# Explicit table list (recommended)
agent = SQLAgent(
    llm=llm,
    db=db,
    important_tables=["users", "orders", "products"]
)

# Auto-discovery based on domain
agent = SQLAgent(
    llm=llm,
    db=db,
    domain_context="e-commerce",
    important_tables="auto"  # Finds relevant tables
)

# Cache all tables (small schemas only)
agent = SQLAgent(
    llm=llm,
    db=db,
    important_tables="all"  # Max 10 tables recommended
)
```

### 2. Async Support

Non-blocking query execution:

```python
import asyncio

async def main():
    result = await agent.aquery("How many users registered today?")
    print(result["answer"])

# Concurrent queries
async def concurrent_queries():
    results = await asyncio.gather(
        agent.aquery("How many users?"),
        agent.aquery("How many orders?"),
        agent.aquery("What's the total revenue?")
    )
    return results

# Run async code
asyncio.run(main())
```

### 3. Enhanced Logging & Metrics

Track everything:

```python
import logging
from text_to_sql import SQLAgent

# Configure logging
SQLAgent.configure_logging(
    level=logging.INFO,
    log_file="sql_queries.log"
)

agent = SQLAgent(llm=llm, db=db)
result = agent.query("Show me today's orders")

# Access metadata
print(result["metadata"])
# {
#   "sql_query": "SELECT * FROM orders WHERE ...",
#   "result_count": 42,
#   "tables_accessed": ["orders", "customers"],
#   "execution_time": 1.23,
#   "timestamp": "2025-12-30T..."
# }
```

**Log Output:**
```
QUERY_ID: 20251230_143022_12345
USER_QUERY: Show me today's orders
TIMESTAMP: 2025-12-30T14:30:22
DATABASE: postgresql
DOMAIN: e-commerce
GENERATED_SQL:
  SELECT o.*, c.name
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
  WHERE o.created_at >= CURRENT_DATE
EXECUTION_TIME: 1.23s
RESULT_COUNT: 42 rows
SQL_EXECUTION_STATUS: SUCCESS
```

### 4. Timestamp Context

Automatic temporal query support:

```python
agent = SQLAgent(llm=llm, db=db, include_timestamp=True)

# These "just work" - no manual date handling needed
agent.query("Show me orders from last week")
agent.query("How many users registered yesterday?")
agent.query("What's revenue for the past 30 days?")
agent.query("Show me appointments scheduled for tomorrow")
```

The agent automatically injects current date context:
```
Current date: 2025-12-30
Today's date: 2025-12-30
Yesterday: 2025-12-29
Last week: 2025-12-23 to 2025-12-29
```
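
The injected preamble can be reproduced with plain `datetime` arithmetic. A minimal sketch (not the library's actual implementation; `build_timestamp_context` is a hypothetical helper, and "last week" is taken to mean the 7 days ending yesterday, as in the sample above):

```python
from datetime import date, timedelta

def build_timestamp_context(today: date) -> str:
    """Build a date-context preamble like the one shown above."""
    yesterday = today - timedelta(days=1)
    # "Last week": the 7 days ending yesterday.
    last_week_start = today - timedelta(days=7)
    return (
        f"Current date: {today.isoformat()}\n"
        f"Yesterday: {yesterday.isoformat()}\n"
        f"Last week: {last_week_start.isoformat()} to {yesterday.isoformat()}"
    )

print(build_timestamp_context(date(2025, 12, 30)))
```

With the date fixed to 2025-12-30, the output matches the sample context above line for line.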

### 5. Domain Context

Specialized behavior for different domains:

```python
# Medical
agent = SQLAgent(
    llm=llm,
    db=db,
    domain_context="medical patient records, diagnoses, medications"
)
result = agent.query("How many patients have diabetes?")

# E-commerce
agent = SQLAgent(
    llm=llm,
    db=db,
    domain_context="e-commerce products, orders, customers, transactions"
)
result = agent.query("What were the top 5 selling products this week?")

# Financial
agent = SQLAgent(
    llm=llm,
    db=db,
    domain_context="financial transactions, accounts, balances"
)
result = agent.query("Show me suspicious transactions over $10,000")
```

### 6. Conversation History

Better follow-up questions:

```python
# First query
result1 = agent.query("Who are the employees in Engineering?")

# Follow-up with context
conversation_history = [
    {"role": "user", "content": "Who are the employees in Engineering?"},
    {"role": "assistant", "content": result1["answer"]}
]

result2 = agent.query(
    "What is their average salary?",  # "their" refers to Engineering employees
    conversation_history=conversation_history
)
```

**Enhanced context tracking:**
```python
# Include metadata for better context
conversation_history = [
    {
        "role": "user",
        "content": query1,
        "metadata": result1.get("metadata", {})  # SQL query, tables accessed
    },
    {
        "role": "assistant",
        "content": result1["answer"]
    }
]
```

### 7. Error Recovery (8 Error Patterns)

User-friendly error messages with actionable suggestions:

```python
result = agent.query("Show me data from nonexistent_table")

if "error" in result:
    print(result["error_type"])        # "table_not_found"
    print(result["answer"])             # User-friendly message
    print(result["error_suggestion"])   # Actionable steps
```

**Handled Error Patterns:**
1. **Iteration Limit** - Query too complex
2. **Timeout** - Database performance issues
3. **Table Not Found** - Shows available tables
4. **Column Not Found** - Suggests schema inspection
5. **Syntax Error** - Simplification suggestions
6. **Permission Denied** - Access issue guidance
7. **Connection Error** - Database connectivity tips
8. **Type Mismatch** - Data type guidance
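
Client code will typically branch on `error_type`. A sketch of one possible dispatch policy (illustrative only: the `error_type` strings other than `"table_not_found"` and the retry/fallback choices are assumptions, not a documented contract):

```python
# Hypothetical dispatch on the error patterns listed above.
RETRYABLE = {"timeout", "connection_error", "iteration_limit"}

def handle_agent_error(result: dict) -> str:
    """Decide what to do with an agent result that may contain an error."""
    if "error" not in result:
        return "ok"
    error_type = result.get("error_type", "unknown")
    if error_type in RETRYABLE:
        return "retry"           # transient: retry or simplify the question
    if error_type in {"table_not_found", "column_not_found"}:
        return "inspect_schema"  # ask the agent what tables/columns exist
    return "report"              # surface result["error_suggestion"] to the user

print(handle_agent_error({"error": "...", "error_type": "timeout"}))  # retry
```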

### 8. Result Truncation

Automatically handle large result sets:

```python
agent = SQLAgent(
    llm=llm,
    db=db,
    max_rows_for_llm=20,           # Send up to 20 rows to LLM
    large_result_threshold=100     # 100+ rows = "large"
)

# A question whose answer spans 1000 rows
result = agent.query("Show me all rows from large_table")

# LLM only sees first 20 rows (faster, cheaper)
print(result["answer"])  # Summary based on 20 rows

# Get full results if needed
full_data = agent.get_full_results()  # All 1000 rows
```
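
The interplay of the two thresholds is simple to state in code. A minimal sketch of the idea (not the library's implementation; `truncate_for_llm` is a hypothetical helper):

```python
# Hypothetical sketch of two-threshold truncation: results below
# large_result_threshold pass through untouched; larger ones are cut
# to max_rows_for_llm before reaching the LLM.
def truncate_for_llm(rows, max_rows_for_llm=20, large_result_threshold=100):
    """Return (rows to show the LLM, note describing any truncation)."""
    if len(rows) < large_result_threshold:
        return rows, None  # small result: pass through untouched
    note = f"Showing {max_rows_for_llm} of {len(rows)} rows"
    return rows[:max_rows_for_llm], note

rows = [{"id": i} for i in range(1000)]
subset, note = truncate_for_llm(rows)
print(len(subset), note)  # 20 Showing 20 of 1000 rows
```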

### 9. Singleton Pattern

Resource management for production:

```python
# Create singleton instance
agent1 = SQLAgent(llm=llm, db=db, use_singleton=True)
agent2 = SQLAgent(llm=llm, db=db, use_singleton=True)

# agent1 and agent2 are THE SAME instance
assert agent1 is agent2

# Prevents:
# - Multiple database connections
# - Redundant schema caching
# - Memory leaks
```
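
The `use_singleton` behavior can be approximated in plain Python via `__new__`. A sketch of the pattern (the `Agent` class here is a stand-in, not the library's code):

```python
# Hypothetical opt-in singleton via __new__: with use_singleton=True,
# every constructor call returns the same cached instance.
class Agent:
    _instance = None

    def __new__(cls, use_singleton=False):
        if use_singleton:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
            return cls._instance  # reuse: one connection pool, one schema cache
        return super().__new__(cls)  # fresh instance when not opted in

    def __init__(self, use_singleton=False):
        # Real code would guard against re-running expensive setup here.
        self.use_singleton = use_singleton

a = Agent(use_singleton=True)
b = Agent(use_singleton=True)
print(a is b)  # True
```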

---

## Complete Configuration

```python
agent = SQLAgent(
    llm=llm,                           # Required: Any BaseChatModel
    db=db,                             # Required: JSONSerializableSQLDatabase
    domain_context="medical records",  # Optional: Domain specialization
    important_tables=["Patient", ...], # Optional: Schema pre-loading
    enable_schema_caching=True,        # Default: True
    max_rows_for_llm=10,               # Default: 10
    large_result_threshold=50,         # Default: 50
    verbose=True,                      # Default: False
    max_iterations=10,                 # Default: 10
    use_singleton=False,               # Default: False
    include_timestamp=True,            # Default: True
)
```

---

## API Reference

### SQLAgent Class

#### Constructor Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `llm` | `BaseChatModel` | Required | Any LangChain chat model |
| `db` | `JSONSerializableSQLDatabase` | Required | Database instance |
| `domain_context` | `str` | `None` | Optional domain description |
| `important_tables` | `List[str]` or `str` | `None` | Tables to pre-load ("auto", "all", or list) |
| `enable_schema_caching` | `bool` | `True` | Enable schema pre-loading |
| `max_rows_for_llm` | `int` | `10` | Max rows to send to LLM |
| `large_result_threshold` | `int` | `50` | Threshold for "large" results |
| `verbose` | `bool` | `False` | Show agent thinking |
| `max_iterations` | `int` | `10` | Max agent iterations |
| `use_singleton` | `bool` | `False` | Singleton pattern |
| `include_timestamp` | `bool` | `True` | Timestamp context |

#### Methods

##### `query(question, conversation_history=None)`

Execute a natural language query (synchronous).

**Parameters:**
- `question` (str): Natural language question
- `conversation_history` (List[Dict], optional): Previous conversation

**Returns:**
```python
{
    "answer": "Natural language answer",
    "sql_query": "SELECT * FROM ...",
    "results": '[{"id": 1, ...}]',  # JSON string
    "intermediate_steps": [...],
    "metadata": {
        "sql_query": "...",
        "result_count": 42,
        "tables_accessed": ["orders"],
        "execution_time": 1.23,
        "timestamp": "2025-12-30T..."
    }
}
```

##### `aquery(question, conversation_history=None)` [async]

Execute a natural language query (asynchronous).

**Parameters:** Same as `query()`

**Returns:** Same as `query()`

**Usage:**
```python
result = await agent.aquery("How many users?")
```

##### `get_full_results()`

Get complete untruncated results from last query.

**Returns:** JSON string with all rows

##### `get_schema_info(table_names=None)`

Get database schema information.

**Parameters:**
- `table_names` (List[str], optional): Specific tables

**Returns:** Schema information string

##### `get_table_names()`

Get list of all table names.

**Returns:** List[str]

##### `get_dialect()`

Get SQL dialect of connected database.

**Returns:** str ("postgresql", "mysql", "sqlite", etc.)

##### `configure_logging(level, log_file)` [class method]

Configure logging for SQL queries.

**Parameters:**
- `level` (int): logging.INFO, logging.DEBUG, etc.
- `log_file` (str): Path to log file

---

## Supported Providers & Databases

### LLM Providers

- AWS Bedrock (Claude, Titan, Llama, etc.)
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude via API)
- Ollama (Local: Mistral, Llama, Qwen, etc.)
- Google (Gemini, PaLM)
- Any custom LangChain BaseChatModel

### Databases

- PostgreSQL
- MySQL
- SQLite
- Any SQLAlchemy-compatible database

---

## How It Works

1. **Schema First:** Agent checks table schemas before writing queries
2. **ReAct Pattern:** Uses Thought-Action-Observation loop
3. **Standard SQL:** Generates database-agnostic SQL
4. **Smart Truncation:** Large results summarized for LLM
5. **Error Recovery:** Auto-retries with corrections
6. **Caching:** Pre-loaded schemas skip redundant lookups
7. **Metadata Tracking:** Full observability of query execution

---

## Limitations

- **Read-only:** SELECT queries only (no INSERT, UPDATE, DELETE, DROP)
- **LangChain Required:** Must use LangChain-compatible LLM
- **SQLAlchemy Required:** Database must work with SQLAlchemy
- **JSON Serialization:** Complex custom types may need manual handling

---

## License

MIT License - see LICENSE file for details

---

## Acknowledgments

Built on:
- [LangChain](https://github.com/langchain-ai/langchain)
- [SQLAlchemy](https://www.sqlalchemy.org/)
