Metadata-Version: 2.4
Name: db-query-agent
Version: 0.1.1
Summary: AI-powered natural language database query system using OpenAI Agents SDK
Author-email: Paul Awolesi <awolesiboluwatife@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Ocolus1/db-query-agent
Project-URL: Documentation, https://github.com/Ocolus1/db-query-agent#readme
Project-URL: Repository, https://github.com/Ocolus1/db-query-agent
Project-URL: Issues, https://github.com/Ocolus1/db-query-agent/issues
Keywords: ai,database,natural-language,sql,openai,agents,text-to-sql
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai-agents>=0.4.0
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: sqlparse>=0.5.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: aiosqlite>=0.19.0
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgres"
Requires-Dist: asyncpg>=0.29.0; extra == "postgres"
Provides-Extra: mysql
Requires-Dist: pymysql>=1.1.0; extra == "mysql"
Requires-Dist: aiomysql>=0.2.0; extra == "mysql"
Provides-Extra: mssql
Requires-Dist: pyodbc>=5.0.0; extra == "mssql"
Provides-Extra: redis
Requires-Dist: redis>=5.0.0; extra == "redis"
Requires-Dist: hiredis>=2.0.0; extra == "redis"
Provides-Extra: all
Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
Requires-Dist: asyncpg>=0.29.0; extra == "all"
Requires-Dist: pymysql>=1.1.0; extra == "all"
Requires-Dist: aiomysql>=0.2.0; extra == "all"
Requires-Dist: pyodbc>=5.0.0; extra == "all"
Requires-Dist: redis>=5.0.0; extra == "all"
Requires-Dist: hiredis>=2.0.0; extra == "all"
Provides-Extra: dev
Requires-Dist: streamlit>=1.30.0; extra == "dev"
Requires-Dist: pytest>=8.4.2; extra == "dev"
Requires-Dist: pytest-asyncio>=1.2.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=1.3.0; extra == "dev"
Requires-Dist: twine>=6.2.0; extra == "dev"
Dynamic: license-file
Dynamic: requires-python

# DB Query Agent 🤖💬

> **AI-powered natural language database query system using OpenAI Agents SDK**

[![PyPI version](https://badge.fury.io/py/db-query-agent.svg)](https://badge.fury.io/py/db-query-agent)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://pepy.tech/badge/db-query-agent)](https://pepy.tech/project/db-query-agent)

A powerful, production-ready Python package that lets you query databases using natural language. Built with OpenAI Agents SDK, featuring intelligent safety guardrails, streaming responses, and optimized for speed.

## ✨ Features

- 🗣️ **Natural Language Queries** - Ask questions in plain English, get SQL and results
- ⚡ **Blazing Fast** - Streaming responses, adaptive model selection, multi-layer caching
- 🔒 **Production-Ready Safety** - Read-only mode, SQL injection prevention, query validation
- 💬 **Session Management** - Maintains conversation context across multiple queries
- 🎯 **Smart Schema Loading** - Only loads relevant tables for faster responses
- 🔌 **Universal Database Support** - PostgreSQL, MySQL, SQLite, SQL Server
- 📦 **Easy Integration** - Works with Django, Flask, FastAPI, or any Python app

## 🚀 Quick Start

### Installation

```bash
pip install db-query-agent

# With database-specific drivers
pip install db-query-agent[postgres]  # PostgreSQL
pip install db-query-agent[mysql]     # MySQL
pip install db-query-agent[all]       # All drivers
```

### Basic Usage

**Option 1: Load from .env (Recommended)**

```bash
# Create .env file
DATABASE_URL=postgresql://user:pass@localhost/mydb
OPENAI_API_KEY=sk-...
FAST_MODEL=gpt-4o-mini
READ_ONLY=true
```

```python
from db_query_agent import DatabaseQueryAgent

# Load everything from .env
agent = DatabaseQueryAgent.from_env()

# Or override specific values
agent = DatabaseQueryAgent.from_env(
    fast_model="gpt-4.1",
    enable_statistics=True
)
```

**Option 2: Direct Configuration**

```python
from db_query_agent import DatabaseQueryAgent

# Pass all parameters directly
agent = DatabaseQueryAgent(
    database_url="postgresql://user:pass@localhost/mydb",
    openai_api_key="sk-...",
    fast_model="gpt-4o-mini",
    read_only=True,
    enable_cache=True
)
```

### Query the Database

```python
# Query in natural language (async)
result = await agent.query("How many users signed up last month?")

print(result["natural_response"])
# Output: "245 users signed up last month"

print(result["sql"])
# Output: "SELECT COUNT(*) FROM users WHERE created_at >= '2025-09-01'"
```

### With Streaming (Recommended)

```python
# Stream responses for better UX
async for chunk in agent.query_stream("Show me top 10 customers by revenue"):
    print(chunk, end="", flush=True)
```

### Session-based Chat

```python
# Create a session for multi-turn conversations
session = agent.create_session(session_id="user_123")

# First query
response1 = await session.ask("Show me all products")

# Follow-up query (maintains context)
response2 = await session.ask("Filter those by category=electronics")

# Another follow-up
response3 = await session.ask("Sort by price descending")
```

## 🔧 Utility Methods

### Session Management

```python
# List all active sessions
sessions = agent.list_sessions()

# Get conversation history
history = agent.get_session_history("user_123")

# Clear session history
agent.clear_session("user_123")

# Delete session
agent.delete_session("user_123")
```

### Schema Exploration

```python
# Get basic schema
schema = agent.get_schema()

# Get detailed schema with relationships
schema_info = agent.get_schema_info(include_foreign_keys=True)
print(f"Total tables: {schema_info['total_tables']}")
print(f"Relationships: {len(schema_info['relationships'])}")
```

### Statistics and Monitoring

```python
# Get comprehensive statistics
stats = agent.get_stats()

print(f"Total queries: {stats['total_queries']}")
print(f"Cache hit rate: {stats['cache_hits'] / stats['total_queries'] * 100:.1f}%")
print(f"Active connections: {stats['pool']['checked_out']}")
print(f"Total sessions: {stats['sessions']['total_sessions']}")
```

## 🎯 Framework Integration

### Django

```python
# views.py
from django.conf import settings
from db_query_agent import DatabaseQueryAgent

agent = DatabaseQueryAgent(
    database_url=settings.DATABASES['default']['URL'],
    openai_api_key=settings.OPENAI_API_KEY
)

def query_database(request):
    question = request.POST.get('question')
    result = agent.query(question)
    return JsonResponse(result)
```

### FastAPI

```python
# main.py
from fastapi import FastAPI
from db_query_agent import DatabaseQueryAgent

app = FastAPI()
agent = DatabaseQueryAgent(database_url=os.getenv("DATABASE_URL"))

@app.post("/query")
async def query_db(question: str):
    return agent.query(question)
```

### Flask

```python
# app.py
from flask import Flask, request
from db_query_agent import DatabaseQueryAgent

app = Flask(__name__)
agent = DatabaseQueryAgent(database_url=os.getenv("DATABASE_URL"))

@app.route('/query', methods=['POST'])
def query():
    return agent.query(request.json['question'])
```

## ⚙️ Configuration

### Environment Variables (Recommended)

Create a `.env` file with all configuration:

```bash
# Required
OPENAI_API_KEY=sk-your-api-key
DATABASE_URL=postgresql://user:pass@localhost/db

# Model Configuration
MODEL_STRATEGY=adaptive
FAST_MODEL=gpt-4o-mini
BALANCED_MODEL=gpt-4.1-mini
COMPLEX_MODEL=gpt-4.1

# Cache Configuration
CACHE_ENABLED=true
CACHE_BACKEND=memory
CACHE_SCHEMA_TTL=3600
CACHE_QUERY_TTL=300
CACHE_LLM_TTL=3600

# Safety Configuration
READ_ONLY=true
QUERY_TIMEOUT=30
MAX_RESULT_ROWS=10000

# Connection Configuration
DB_POOL_SIZE=10
DB_MAX_OVERFLOW=20

# Performance Configuration
LAZY_SCHEMA_LOADING=true
ENABLE_STREAMING=true
WARMUP_ON_INIT=false
```

Then load with a single line:
```python
agent = DatabaseQueryAgent.from_env()
```

### Direct Configuration

Pass parameters directly (overrides .env):

```python
from db_query_agent import DatabaseQueryAgent

agent = DatabaseQueryAgent(
    database_url="postgresql://...",
    openai_api_key="sk-...",
    
    # Model configuration
    model_strategy="adaptive",  # Use fast model for simple queries
    fast_model="gpt-4o-mini",   # 2s generation time
    balanced_model="gpt-4.1-mini",  # 3s generation time
    complex_model="gpt-4.1",     # 5s generation time
    
    # Cache configuration
    enable_cache=True,
    cache_backend="redis",
    schema_cache_ttl=3600,  # 1 hour
    query_cache_ttl=300,    # 5 minutes
    llm_cache_ttl=3600,     # 1 hour
    
    # Safety configuration
    read_only=True,
    allowed_tables=["users", "orders", "products"],
    blocked_tables=["sensitive_data"],
    max_query_timeout=30,
    max_result_rows=10000,
    
    # Connection configuration
    pool_size=10,
    max_overflow=20,
    
    # Performance configuration
    lazy_schema_loading=True,
    max_tables_in_context=5,
    enable_streaming=True,
    warmup_on_init=False,
    
    # Statistics configuration
    enable_statistics=True,  # Track queries, cache hits, etc.
    
    # Session configuration
    session_backend="sqlite",
    session_db_path="./sessions.db"
)
```

### Mixed Configuration

Load from `.env` and override specific values:

```python
# Load most settings from .env, override specific ones
agent = DatabaseQueryAgent.from_env(
    fast_model="gpt-4.1",  # Override model
    read_only=False,       # Override safety
    enable_statistics=True  # Add statistics
)
```

## 📊 Performance

With all optimizations enabled:

| Scenario | Response Time | Cache Hit |
|----------|---------------|-----------|
| Simple query (cached) | **0.5s** | ✅ |
| Simple query (uncached) | **1.5s** | ❌ |
| Complex query (cached) | **2s** | ✅ |
| Complex query (uncached) | **5s** | ❌ |

- **90% of queries** complete in < 3 seconds
- **First token** appears in < 500ms with streaming
- **Cache hit rate** typically > 60% in production

## 🔒 Security Features

- ✅ **Read-only mode** by default (only SELECT queries)
- ✅ **SQL injection prevention** with query parsing and validation
- ✅ **Table access control** with allowlist/blocklist
- ✅ **Query timeout** enforcement
- ✅ **Dangerous keyword detection** (DROP, DELETE, etc.)
- ✅ **Input/output guardrails** with OpenAI Agents SDK

## 📚 Documentation

### Core Documentation
- **[API Reference](./docs/API_REFERENCE.md)** - Complete API documentation for all classes and methods
- **[Integration Guides](./docs/INTEGRATION_GUIDES.md)** - Django, Flask, FastAPI, Streamlit, Jupyter integrations
- **[Troubleshooting Guide](./docs/TROUBLESHOOTING.md)** - Common issues and solutions
- **[Architecture](./docs/ARCHITECTURE.md)** - System design and architecture decisions

### Examples
- **[Basic Examples](./examples/basic_usage.py)** - 7 basic usage examples
- **[Advanced Examples](./examples/advanced_usage.py)** - 8 advanced patterns and optimizations
- **[Examples Guide](./examples/README.md)** - Overview of all examples

### Additional Resources
- [Technical Plan](./TECHNICAL_PLAN.md) - Project roadmap and implementation details
- [Speed Optimization Guide](./SPEED_OPTIMIZATION_GUIDE.md) - Performance tuning strategies
- [Changelog](./CHANGELOG.md) - Version history and updates
- [Packaging Guide](./PACKAGING.md) - How to build and release the package
- [Release Checklist](./RELEASE_CHECKLIST.md) - Pre-release verification checklist

## 🧪 Development

### Setup

```bash
# Clone repository
git clone https://github.com/yourusername/db-query-agent
cd db-query-agent

# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run demo UI
streamlit run demo/streamlit_app.py
```

### Learn More

- **New to the package?** Start with [Basic Examples](./examples/basic_usage.py)
- **Integrating with a framework?** Check [Integration Guides](./docs/INTEGRATION_GUIDES.md)
- **Need help?** See [Troubleshooting Guide](./docs/TROUBLESHOOTING.md)
- **Want to understand the internals?** Read [Architecture](./docs/ARCHITECTURE.md)
```

### Project Structure

```
db-query-agent/
├── src/db_query_agent/
│   ├── __init__.py
│   ├── agent.py                    # Main agent class
│   ├── simple_multi_agent_system.py # Multi-agent orchestration
│   ├── schema_extractor.py         # Schema introspection
│   ├── cache_manager.py            # Multi-layer caching
│   ├── connection_manager.py       # DB connection pooling
│   ├── query_validator.py          # SQL validation
│   ├── session_manager.py          # Session management
│   ├── config.py                   # Configuration classes
│   └── exceptions.py               # Custom exceptions
├── docs/
│   ├── API_REFERENCE.md            # Complete API documentation
│   ├── INTEGRATION_GUIDES.md       # Framework integration guides
│   ├── TROUBLESHOOTING.md          # Common issues & solutions
│   └── ARCHITECTURE.md             # System architecture
├── examples/
│   ├── basic_usage.py              # 7 basic examples
│   ├── advanced_usage.py           # 8 advanced examples
│   └── README.md                   # Examples documentation
├── demo/
│   ├── streamlit_app.py            # Interactive demo UI
│   └── create_demo_db.py           # Demo database setup
├── tests/                          # 133 tests (all passing)
│   ├── test_dynamic_configuration.py
│   ├── test_streaming.py
│   ├── test_utility_methods.py
│   └── test_phase4_integration.py
└── pyproject.toml                  # Package configuration
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built with [OpenAI Agents SDK](https://github.com/openai/openai-agents-python)
- Database abstraction by [SQLAlchemy](https://www.sqlalchemy.org/)
- SQL parsing by [sqlparse](https://github.com/andialbrecht/sqlparse)

## 📧 Support

- 📖 [Documentation](https://github.com/yourusername/db-query-agent#readme)
- 🐛 [Issue Tracker](https://github.com/yourusername/db-query-agent/issues)
- 💬 [Discussions](https://github.com/yourusername/db-query-agent/discussions)

---

**Made with ❤️ for developers who want to query databases with natural language**
