Metadata-Version: 2.4
Name: rag007
Version: 0.2.2
Summary: rag007 — multi-backend retrieval-augmented generation with LangGraph
Project-URL: Homepage, https://github.com/bmsuisse/rag007
Project-URL: Repository, https://github.com/bmsuisse/rag007
Author: Dominik Peter
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: langchain-openai>=0.3.0
Requires-Dist: langchain>=0.3.0
Requires-Dist: langgraph>=0.4.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: stop-words>=2024.1.1
Requires-Dist: tenacity>=8.2.0
Provides-Extra: all
Requires-Dist: azure-identity>=1.19.0; extra == 'all'
Requires-Dist: azure-search-documents>=11.6.0; extra == 'all'
Requires-Dist: chromadb>=1.0.0; extra == 'all'
Requires-Dist: cohere>=5.21.1; extra == 'all'
Requires-Dist: duckdb>=1.2.0; extra == 'all'
Requires-Dist: faiss-cpu>=1.9.0; extra == 'all'
Requires-Dist: httpx>=0.27.0; extra == 'all'
Requires-Dist: lancedb>=0.20.0; extra == 'all'
Requires-Dist: meilisearch>=0.40.0; extra == 'all'
Requires-Dist: pgvector>=0.4.0; extra == 'all'
Requires-Dist: psycopg[binary]>=3.2.0; extra == 'all'
Requires-Dist: python-dotenv>=1.2.2; extra == 'all'
Requires-Dist: qdrant-client>=1.12.0; extra == 'all'
Requires-Dist: rerankers>=0.6.0; extra == 'all'
Requires-Dist: rich>=13.0.0; extra == 'all'
Requires-Dist: sentence-transformers>=3.0.0; extra == 'all'
Provides-Extra: azure
Requires-Dist: azure-identity>=1.19.0; extra == 'azure'
Requires-Dist: azure-search-documents>=11.6.0; extra == 'azure'
Provides-Extra: casino-royale
Requires-Dist: chromadb>=1.0.0; extra == 'casino-royale'
Requires-Dist: httpx>=0.27.0; extra == 'casino-royale'
Requires-Dist: python-dotenv>=1.2.2; extra == 'casino-royale'
Requires-Dist: rich>=13.0.0; extra == 'casino-royale'
Provides-Extra: chromadb
Requires-Dist: chromadb>=1.0.0; extra == 'chromadb'
Provides-Extra: cli
Requires-Dist: python-dotenv>=1.2.2; extra == 'cli'
Requires-Dist: rich>=13.0.0; extra == 'cli'
Provides-Extra: cohere
Requires-Dist: cohere>=5.21.1; extra == 'cohere'
Provides-Extra: duckdb
Requires-Dist: duckdb>=1.2.0; extra == 'duckdb'
Provides-Extra: eval
Requires-Dist: bm25s>=0.3.3; extra == 'eval'
Requires-Dist: datasets>=4.8.4; extra == 'eval'
Requires-Dist: mteb>=2.12.11; extra == 'eval'
Requires-Dist: pandas>=3.0.2; extra == 'eval'
Requires-Dist: pyarrow>=23.0.1; extra == 'eval'
Requires-Dist: python-dotenv>=1.2.2; extra == 'eval'
Requires-Dist: rich>=13.0.0; extra == 'eval'
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.9.0; extra == 'faiss'
Provides-Extra: goldeneye
Requires-Dist: cohere>=5.21.1; extra == 'goldeneye'
Requires-Dist: meilisearch>=0.40.0; extra == 'goldeneye'
Requires-Dist: python-dotenv>=1.2.2; extra == 'goldeneye'
Requires-Dist: rich>=13.0.0; extra == 'goldeneye'
Provides-Extra: goldfinger
Requires-Dist: azure-identity>=1.19.0; extra == 'goldfinger'
Requires-Dist: azure-search-documents>=11.6.0; extra == 'goldfinger'
Requires-Dist: cohere>=5.21.1; extra == 'goldfinger'
Requires-Dist: python-dotenv>=1.2.2; extra == 'goldfinger'
Requires-Dist: rich>=13.0.0; extra == 'goldfinger'
Provides-Extra: huggingface
Requires-Dist: sentence-transformers>=3.0.0; extra == 'huggingface'
Provides-Extra: jina
Requires-Dist: httpx>=0.27.0; extra == 'jina'
Provides-Extra: lancedb
Requires-Dist: lancedb>=0.20.0; extra == 'lancedb'
Requires-Dist: pyarrow>=23.0.1; extra == 'lancedb'
Provides-Extra: meilisearch
Requires-Dist: meilisearch>=0.40.0; extra == 'meilisearch'
Provides-Extra: moonraker
Requires-Dist: chromadb>=1.0.0; extra == 'moonraker'
Requires-Dist: python-dotenv>=1.2.2; extra == 'moonraker'
Requires-Dist: rich>=13.0.0; extra == 'moonraker'
Requires-Dist: sentence-transformers>=3.0.0; extra == 'moonraker'
Provides-Extra: pgvector
Requires-Dist: pgvector>=0.4.0; extra == 'pgvector'
Requires-Dist: psycopg[binary]>=3.2.0; extra == 'pgvector'
Provides-Extra: qdrant
Requires-Dist: qdrant-client>=1.12.0; extra == 'qdrant'
Provides-Extra: recommended
Requires-Dist: cohere>=5.21.1; extra == 'recommended'
Requires-Dist: meilisearch>=0.40.0; extra == 'recommended'
Requires-Dist: python-dotenv>=1.2.2; extra == 'recommended'
Requires-Dist: rich>=13.0.0; extra == 'recommended'
Provides-Extra: rerankers
Requires-Dist: rerankers>=0.6.0; extra == 'rerankers'
Provides-Extra: skyfall
Requires-Dist: azure-identity>=1.19.0; extra == 'skyfall'
Requires-Dist: azure-search-documents>=11.6.0; extra == 'skyfall'
Requires-Dist: chromadb>=1.0.0; extra == 'skyfall'
Requires-Dist: cohere>=5.21.1; extra == 'skyfall'
Requires-Dist: duckdb>=1.2.0; extra == 'skyfall'
Requires-Dist: faiss-cpu>=1.9.0; extra == 'skyfall'
Requires-Dist: httpx>=0.27.0; extra == 'skyfall'
Requires-Dist: lancedb>=0.20.0; extra == 'skyfall'
Requires-Dist: meilisearch>=0.40.0; extra == 'skyfall'
Requires-Dist: pgvector>=0.4.0; extra == 'skyfall'
Requires-Dist: psycopg[binary]>=3.2.0; extra == 'skyfall'
Requires-Dist: python-dotenv>=1.2.2; extra == 'skyfall'
Requires-Dist: qdrant-client>=1.12.0; extra == 'skyfall'
Requires-Dist: rerankers>=0.6.0; extra == 'skyfall'
Requires-Dist: rich>=13.0.0; extra == 'skyfall'
Requires-Dist: sentence-transformers>=3.0.0; extra == 'skyfall'
Provides-Extra: spectre
Requires-Dist: pgvector>=0.4.0; extra == 'spectre'
Requires-Dist: psycopg[binary]>=3.2.0; extra == 'spectre'
Requires-Dist: python-dotenv>=1.2.2; extra == 'spectre'
Requires-Dist: rich>=13.0.0; extra == 'spectre'
Requires-Dist: sentence-transformers>=3.0.0; extra == 'spectre'
Provides-Extra: thunderball
Requires-Dist: cohere>=5.21.1; extra == 'thunderball'
Requires-Dist: python-dotenv>=1.2.2; extra == 'thunderball'
Requires-Dist: qdrant-client>=1.12.0; extra == 'thunderball'
Requires-Dist: rich>=13.0.0; extra == 'thunderball'
Description-Content-Type: text/markdown

# rag007 🕵️🍸🚗🎯 — Licensed to Retrieve

<div align="center">

**Not just hybrid search. A true autonomous retrieval agent.**  
Shaken, not stirred — plug in any vector store, any LLM, any reranker.  
The mission: find the right documents, neutralise irrelevant noise, and deliver the answer. Every time.

[![PyPI](https://img.shields.io/pypi/v/rag007)](https://pypi.org/project/rag007/)
[![Python](https://img.shields.io/pypi/pyversions/rag007)](https://pypi.org/project/rag007/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/bmsuisse/rag007/actions/workflows/ci.yml/badge.svg)](https://github.com/bmsuisse/rag007/actions/workflows/ci.yml)

</div>

---

```python
from rag007 import init_agent

rag = init_agent("documents", model="openai:gpt-5.4", backend="qdrant")
state = rag.chat("What is the status of Operation Overlord?")
# Your answer. Shaken, not stirred.
```

---

## 🕵️ The Agent

*"We have a problem. Millions of documents. One question. And the clock is ticking."*

Most retrieval systems send a junior analyst — one query, one pass, done. Fast, cheap, and dangerously incomplete.

**So they sent rag007.** Licensed to retrieve. Never satisfied with *good enough*.

Before every mission, rag007 visits Q's lab: **8 backends** to operate from, **any LLM** as the intelligence source, **precision rerankers** to separate signal from noise, and a **tool-calling agent** that inspects schemas, builds filters on the fly, and adapts to whatever the index throws at it.

In the field, it **plans**, **infiltrates**, and **interrogates** — running parallel searches across BM25 and vector space, fusing the evidence, and cross-examining every result through an LLM quality gate. When the trail goes cold, it rewrites the query and tries again. It doesn't stop until the mission is complete.

Only once the evidence is airtight does it surface the answer. **Cited. Grounded. Delivered.**

> 🍸 *"Shaken, not stirred — and always on target."* 🎯

Not in the name of any crown or government. In the name of **whoever is seeking the truth in their data**.

---

## 🕵️ How It Works

Most RAG libraries are **pipelines** — query in, documents out, done. rag007 is an **agent**.

Like a field operative, it doesn't execute a single search and report back. It thinks, adapts, and keeps going until the mission is complete:

1. 🧠 **Understands the intent** — rewrites your query into precise search keywords, detects whether it's a keyword lookup or semantic question, and adjusts the hybrid search ratio accordingly
2. 🔍 **Searches intelligently** — runs multiple query variants simultaneously across BM25 and vector search, fuses the results, and re-ranks with a dedicated reranker
3. 🧐 **Judges the results** — an LLM quality gate evaluates whether the retrieved documents actually answer the question
4. 🔄 **Adapts autonomously** — if results are off-target, rewrites the query and tries again; if a single approach fails, fans out into a swarm of parallel search strategies
5. ✍️ **Delivers the answer** — only once it's confident the evidence is solid does it generate a cited, grounded response

This is the difference between a search box and a field agent.
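
The loop above can be sketched in plain Python. This is a hedged sketch with stubbed helpers and hypothetical names — the real pipeline is a LangGraph state machine, not these functions:

```python
def search(query: str) -> list[str]:
    # Stand-in for hybrid BM25 + vector search.
    corpus = {"hybrid search": ["doc-bm25", "doc-vector"]}
    return corpus.get(query, [])

def judge(query: str, docs: list[str]) -> bool:
    # Stand-in for the LLM quality gate.
    return len(docs) > 0

def rewrite(query: str) -> str:
    # Stand-in for LLM query rewriting.
    return query.lower()

def agentic_retrieve(query: str, max_iter: int = 3) -> list[str]:
    for _ in range(max_iter):
        docs = search(query)
        if judge(query, docs):   # evidence is solid → stop
            return docs
        query = rewrite(query)   # off-target → rewrite and retry
    return []

print(agentic_retrieve("HYBRID SEARCH"))  # → ['doc-bm25', 'doc-vector']
```

The first pass misses, the rewrite fixes the query, and the second pass succeeds — that retry is the whole difference from a one-shot pipeline.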

---

## ✨ Features

- 🚗 **Fast as an Aston Martin** — fully async pipeline, parallel HyDE + preprocessing, zero blocking calls
- 🎯 **On target, every time** — LLM quality gate rejects weak results and rewrites the query until the evidence is airtight
- 🔬 **Deep research, not shallow search** — multi-query swarm fans out across BM25 and vector space simultaneously, fusing intelligence from every angle
- 🃏 **Always has an ace up its sleeve** — when one approach fails, swarm retrieval deploys parallel strategies as backup
- 🕵️ **True agentic loop** — retrieve → judge → rewrite → retry, fully autonomous, up to `max_iter` rounds
- 🔍 **Hybrid search** — BM25 + vector, fused with RRF or DBSF
- 🧠 **HyDE** — hypothetical document embeddings for better recall on vague queries
- 🛠️ **Tool-calling agent** — `get_index_settings`, `get_filter_values`, `search_hybrid`, `search_bm25`, `rerank_results` — LLM picks tools dynamically
- 🏆 **Multi-reranker** — Cohere, HuggingFace, Jina, ColBERT, RankGPT, or custom
- 🗄️ **8 backends** — Meilisearch, Azure AI Search, ChromaDB, LanceDB, Qdrant, pgvector, DuckDB, InMemory
- 🤖 **Any LLM** — OpenAI, Azure, Anthropic, Ollama, Vertex AI, or any LangChain model
- ⚡ **One-line init** — `init_agent("docs", model="openai:gpt-5.4", backend="qdrant")` — no imports needed
- 💬 **Multi-turn chat** — conversation history with citation-aware answers
- 🎯 **Auto-strategy** — LLM samples your collection and tunes itself automatically
- 🔄 **Async-native** — every operation has a sync and async variant

---

## 📦 Install

```bash
# Recommended — Meilisearch + Cohere reranker + interactive CLI
pip install rag007[recommended]

# Base only — in-memory backend, BM25 keyword search
pip install rag007
```

| Extra | What you get | Command |
|-------|-------------|---------|
| **`recommended`** | Meilisearch + Cohere reranker + Rich CLI | `pip install rag007[recommended]` |
| `cli` | Interactive CLI with guided setup wizard | `pip install rag007[cli]` |
| `all` | Every backend + reranker + CLI | `pip install rag007[all]` |

<details>
<summary>🍸 Bond Edition extras — because every mission needs a code name</summary>

| Extra | Code name | Stack |
|-------|-----------|-------|
| `goldeneye` | GoldenEye | Meilisearch + Cohere + CLI — the classic recommended loadout |
| `skyfall` | Skyfall | Everything. All backends, all rerankers, all CLI — nothing left behind |
| `thunderball` | Thunderball | Qdrant + Cohere + CLI — vector power meets precision reranking |
| `moonraker` | Moonraker | ChromaDB + HuggingFace — fully local, no API keys, off the grid |
| `goldfinger` | Goldfinger | Azure AI Search + Azure OpenAI + Cohere — all gold, all cloud |
| `spectre` | Spectre | pgvector + HuggingFace — open-source shadow ops, no paid APIs |
| `casino-royale` | Casino Royale | ChromaDB + Jina — lightweight first mission |

```bash
pip install rag007[goldeneye]      # 🍸 The classic
pip install rag007[skyfall]        # 💥 Everything falls into place
pip install rag007[thunderball]    # ⚡ Vector power + precision
pip install rag007[moonraker]      # 🌙 Fully local, no API keys
pip install rag007[goldfinger]     # ☁️  All Azure, all gold
pip install rag007[spectre]        # 👻 Open-source, no paid APIs
pip install rag007[casino-royale]  # 🎰 Lightweight first mission
```

</details>

<details>
<summary>Individual backends &amp; rerankers</summary>

```bash
pip install rag007[meilisearch]     # 🔎 Meilisearch
pip install rag007[azure]           # ☁️  Azure AI Search
pip install rag007[chromadb]        # 🟣 ChromaDB
pip install rag007[lancedb]         # 🏹 LanceDB
pip install rag007[pgvector]        # 🐘 PostgreSQL + pgvector
pip install rag007[qdrant]          # 🟡 Qdrant
pip install rag007[duckdb]          # 🦆 DuckDB
pip install rag007[cohere]          # 🏅 Cohere reranker
pip install rag007[huggingface]     # 🤗 HuggingFace cross-encoder (local)
pip install rag007[jina]            # 🌊 Jina reranker
pip install rag007[rerankers]       # 🎯 rerankers (ColBERT, Flashrank, RankGPT, …)
```

Mix and match: `pip install rag007[qdrant,cohere,cli]`

</details>

---

## 🚀 Quick Start

### One-liner with `init_agent`

The fastest way to get started — no provider imports, string aliases for everything:

```python
from rag007 import init_agent

# Minimal — in-memory backend, LLM from env vars
rag = init_agent("docs")

# OpenAI + Qdrant + Cohere reranker
rag = init_agent(
    "my-collection",
    model="openai:gpt-5.4",
    backend="qdrant",
    backend_url="http://localhost:6333",
    reranker="cohere",
)

# Anthropic + Azure AI Search (native vectorisation, no client-side embeddings)
rag = init_agent(
    "my-index",
    model="anthropic:claude-sonnet-4-6",
    gen_model="anthropic:claude-opus-4-6",
    backend="azure",
    backend_url="https://my-search.search.windows.net",
    reranker="huggingface",
    auto_strategy=True,
)

# Fully local — Ollama + ChromaDB + HuggingFace cross-encoder
rag = init_agent(
    "docs",
    model="ollama:llama3",
    backend="chroma",
    reranker="huggingface",
    reranker_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
)
```

**Backend aliases**

| Alias | Class | Extra |
|-------|-------|-------|
| `"memory"` / `"in_memory"` | `InMemoryBackend` | _(none)_ |
| `"chroma"` / `"chromadb"` | `ChromaDBBackend` | `rag007[chromadb]` |
| `"qdrant"` | `QdrantBackend` | `rag007[qdrant]` |
| `"lancedb"` / `"lance"` | `LanceDBBackend` | `rag007[lancedb]` |
| `"duckdb"` | `DuckDBBackend` | `rag007[duckdb]` |
| `"pgvector"` / `"pg"` | `PgvectorBackend` | `rag007[pgvector]` |
| `"meilisearch"` | `MeilisearchBackend` | `rag007[meilisearch]` |
| `"azure"` | `AzureAISearchBackend` | `rag007[azure]` |

**Reranker aliases**

| Alias | Class | `reranker_model` | Extra |
|-------|-------|-----------------|-------|
| `"cohere"` | `CohereReranker` | Cohere model name (default: `rerank-v3.5`) | `rag007[cohere]` |
| `"huggingface"` / `"hf"` | `HuggingFaceReranker` | HF model name (default: `cross-encoder/ms-marco-MiniLM-L-6-v2`) | `rag007[huggingface]` |
| `"jina"` | `JinaReranker` | Jina model name (default: `jina-reranker-v2-base-multilingual`) | `rag007[jina]` |
| `"llm"` | `LLMReranker` | _(uses the agent's LLM)_ | _(none)_ |
| `"rerankers"` | `RerankersReranker` | Any model from the `rerankers` library | `rag007[rerankers]` |

```python
# Cohere (default model)
rag = init_agent("docs", model="openai:gpt-5.4", reranker="cohere")

# HuggingFace — multilingual model
rag = init_agent("docs", model="openai:gpt-5.4", reranker="huggingface",
                 reranker_model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")

# Jina
rag = init_agent("docs", model="openai:gpt-5.4", reranker="jina")  # uses JINA_API_KEY

# ColBERT via rerankers library
rag = init_agent("docs", model="openai:gpt-5.4", reranker="rerankers",
                 reranker_model="colbert-ir/colbertv2.0",
                 reranker_kwargs={"model_type": "colbert"})

# Pass a pre-built reranker instance directly
from rag007 import CohereReranker
rag = init_agent("docs", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))
```

**Model strings:** any `"provider:model-name"` from LangChain's `init_chat_model` — `openai`, `anthropic`, `azure_openai`, `google_vertexai`, `ollama`, `groq`, `mistralai`, and more

### Manual setup

```python
from rag007 import Agent, InMemoryBackend

backend = InMemoryBackend(embed_fn=my_embed_fn)
backend.add_documents([
    {"content": "RAG combines retrieval with generation", "source": "wiki"},
    {"content": "Vector search finds similar embeddings", "source": "docs"},
])

rag = Agent(index="demo", backend=backend)

# Single query → full answer
state = rag.invoke("What is retrieval-augmented generation?")
print(state.answer)

# Retrieve only — documents without LLM answer
query, docs = rag.retrieve_documents("What is retrieval-augmented generation?")
for doc in docs:
    print(doc.page_content)

# Override top-K at call time
query, docs = rag.retrieve_documents("hybrid search", top_k=3)
```

### `Agent.from_model` — model string with explicit backend

```python
from rag007 import Agent, QdrantBackend

rag = Agent.from_model(
    "openai:gpt-5.4-mini",          # fast model for routing & rewriting
    index="docs",
    gen_model="openai:gpt-5.4",     # powerful model for the final answer
    backend=QdrantBackend("docs", url="http://localhost:6333"),
)
```

---

## 💬 Multi-turn Chat

```python
from rag007 import Agent, ConversationTurn

rag = Agent(index="articles")
history: list[ConversationTurn] = []

state = rag.chat("What is hybrid search?", history)
history.append(ConversationTurn(question="What is hybrid search?", answer=state.answer))

state = rag.chat("How does it compare to pure vector search?", history)
print(state.answer)
print(f"Sources: {len(state.documents)}")
```

Async variant:

```python
state = await rag.achat("What is hybrid search?", history)
```

---

## 🏗️ Architecture

rag007 has two operating modes — both fully autonomous:

### Graph mode (`rag.chat` / `rag.invoke`)

The default. A LangGraph state machine that runs the full agentic pipeline:

```
Query
  │
  ├─[HyDE]──────────────────────────────────────────┐
  │  Hypothetical document embedding (parallel)      │
  │                                                  ▼
  ▼                                         [Embed HyDE text]
[Preprocess]                                         │
  Extract keywords + variants                        │
  Detect semantic_ratio + fusion strategy            │
  │                                                  │
  └──────────────────────────────────────────────────┘
                        │
                        ▼
              [Hybrid Search × N queries]
               BM25 + Vector, multi-arm
                        │
                        ▼
               [RRF / DBSF Fusion]
                        │
                        ▼
                    [Rerank]
               Cohere / HF / Jina / LLM
                        │
                        ▼
               [Quality Gate]
               LLM judges relevance
                   │         │
                (good)     (bad)
                   │         │
                   ▼         ▼
              [Generate]  [Rewrite] ──► loop (max_iter)
                   │
                   ▼
        Answer + [n] inline citations
```
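
The RRF fusion step is simple enough to show inline. A minimal sketch — `k=60` is the conventional RRF constant, and the library's actual fusion lives inside the backend layer, not in user code:

```python
# Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document,
# so documents that appear in multiple rankings accumulate score.
def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25   = ["a", "b", "c"]   # keyword ranking
vector = ["c", "b", "d"]   # semantic ranking
print(rrf([bm25, vector]))  # → ['c', 'b', 'a', 'd']
```

Documents found by both arms (`b`, `c`) outrank documents found by only one — which is why the multi-arm search above fuses rather than concatenates.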

### Tool-calling agent mode (`rag.invoke_agent`)

The agent receives a set of tools and reasons step-by-step, calling them in whatever order makes sense for the question. No fixed pipeline — pure field improvisation:

```
Query
  │
  ▼
[LLM Agent]  ◄──────────────────────────────────────┐
  Thinks: "What do I need to answer this?"           │
  │                                                  │
  ├── get_index_settings()                           │
  │   Discover filterable / sortable / boost fields  │
  │                                                  │
  ├── get_filter_values(field)                       │
  │   Sample real stored values for a field          │
  │   → build precise filter expressions             │
  │                                                  │
  ├── search_hybrid(query, filter, sort_fields)      │
  │   BM25 + vector, optional filter + sort boost    │
  │                                                  │
  ├── search_bm25(query, filter)                     │
  │   Fallback pure keyword search                   │
  │                                                  │
  ├── rerank_results(query, hits)                    │
  │   Re-rank with configured reranker               │
  │                                                  │
  └── [needs more info?] ─────────────────────────► │

  [done]
  │
  ▼
Answer  (tool calls explained inline)
```

Use `invoke_agent` when questions involve **dynamic filtering** — the agent inspects the index schema, samples real field values, builds filters on the fly, and decides whether to sort by business signals like popularity or recency.

---

## 🗄️ Backends

### ☁️ Azure AI Search

Native hybrid search — no client-side embeddings needed when the index has an integrated vectorizer:

```python
from rag007 import Agent, AzureAISearchBackend

# Native vectorization — service embeds the query server-side
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
    ),
)

# Client-side vectorization
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
        embed_fn=my_embed_fn,
    ),
)

# With Azure semantic reranking
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
        semantic_config="my-semantic-config",
    ),
)
```

### 🟡 Qdrant

```python
from rag007 import Agent, QdrantBackend

rag = Agent(
    index="my_collection",
    backend=QdrantBackend("my_collection", url="http://localhost:6333", embed_fn=my_embed_fn),
)
```

### 🟣 ChromaDB

```python
from rag007 import Agent, ChromaDBBackend

rag = Agent(
    index="my_collection",
    backend=ChromaDBBackend("my_collection", path="./chroma_db", embed_fn=my_embed_fn),
)
```

### 🏹 LanceDB

```python
from rag007 import Agent, LanceDBBackend

rag = Agent(
    index="docs",
    backend=LanceDBBackend("docs", db_uri="./lancedb", embed_fn=my_embed_fn),
)
```

### 🐘 PostgreSQL + pgvector

```python
from rag007 import Agent, PgvectorBackend

rag = Agent(
    index="documents",
    backend=PgvectorBackend(
        "documents",
        dsn="postgresql://user:pass@localhost:5432/mydb",
        embed_fn=my_embed_fn,
    ),
)
```

### 🦆 DuckDB

```python
from rag007 import Agent, DuckDBBackend

rag = Agent(
    index="vectors",
    backend=DuckDBBackend("vectors", db_path="./my.duckdb", embed_fn=my_embed_fn),
)
```

### 🔎 Meilisearch

```python
from rag007 import Agent, MeilisearchBackend

rag = Agent(
    index="articles",
    backend=MeilisearchBackend("articles", url="http://localhost:7700", api_key="masterKey"),
)
```

### 📦 InMemory (default, zero dependencies)

```python
from rag007 import Agent, InMemoryBackend

backend = InMemoryBackend(embed_fn=my_embed_fn)
backend.add_documents([
    {"content": "RAG combines retrieval with generation", "source": "wiki"},
    {"content": "Vector search finds similar embeddings", "source": "docs"},
])

rag = Agent(index="demo", backend=backend)
```
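
The `embed_fn` hook is any `(str) -> list[float]` callable. For real use, plug in an embedding model (e.g. `OpenAIEmbeddings` from `langchain-openai`); for offline experiments, a deterministic toy works — a hashed bag-of-words sketch, not a real embedding model:

```python
import hashlib

def toy_embed_fn(text: str, dim: int = 64) -> list[float]:
    # Deterministic hashed bag-of-words — NOT semantically meaningful,
    # useful only for wiring up and testing a backend offline.
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode()).digest()
        vec[digest[0] % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]  # unit-normalised
```

Pass `toy_embed_fn` anywhere the examples above pass `my_embed_fn`.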

---

## 🤖 LLM Configuration

Pass a pre-built LangChain model or use `init_agent` / `Agent.from_model` for string-based init.  
When using `Agent` directly, configure via env vars or pass an explicit model instance.

### OpenAI

```python
from langchain_openai import ChatOpenAI
from rag007 import Agent

rag = Agent(
    index="articles",
    llm=ChatOpenAI(model="gpt-5.4", api_key="sk-..."),
    gen_llm=ChatOpenAI(model="gpt-5.4", api_key="sk-..."),
)
```

### Azure OpenAI (explicit keys)

```python
from langchain_openai import AzureChatOpenAI
from rag007 import Agent

llm = AzureChatOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    azure_deployment="gpt-5.4",
    api_key="...",
    api_version="2024-12-01-preview",
)
rag = Agent(index="articles", llm=llm, gen_llm=llm)
```

### Azure OpenAI (env vars)

```python
# Set: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT
from rag007 import Agent

rag = Agent(index="articles")  # auto-detected
```

### Azure OpenAI with Managed Identity (no API key)

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI
from rag007 import Agent

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
llm = AzureChatOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    azure_deployment="gpt-5.4",
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview",
)
rag = Agent(index="articles", llm=llm, gen_llm=llm)
```

### Anthropic Claude

```bash
pip install langchain-anthropic
```

```python
from langchain_anthropic import ChatAnthropic
from rag007 import Agent

llm = ChatAnthropic(model="claude-sonnet-4-6", api_key="sk-ant-...")
rag = Agent(index="articles", llm=llm, gen_llm=llm)
```

### Ollama (local, no API key)

```bash
pip install langchain-ollama
```

```python
from langchain_ollama import ChatOllama
from rag007 import Agent

rag = Agent(
    index="articles",
    llm=ChatOllama(model="llama3.2", base_url="http://localhost:11434"),
    gen_llm=ChatOllama(model="llama3.2", base_url="http://localhost:11434"),
)
```

### Google Vertex AI

```bash
pip install langchain-google-vertexai
```

```python
from langchain_google_vertexai import ChatVertexAI
from rag007 import Agent

llm = ChatVertexAI(model="gemini-2.0-flash", project="my-gcp-project", location="us-central1")
rag = Agent(index="articles", llm=llm, gen_llm=llm)
```

### Separate fast and generation models

Use a cheap/fast model for query rewriting and routing, a powerful model for the final answer:

```python
from langchain_openai import AzureChatOpenAI
from rag007 import Agent

fast_llm = AzureChatOpenAI(azure_deployment="gpt-5.4-mini", api_key="...", api_version="2024-12-01-preview")
gen_llm  = AzureChatOpenAI(azure_deployment="gpt-5.4",      api_key="...", api_version="2024-12-01-preview")

rag = Agent(index="articles", llm=fast_llm, gen_llm=gen_llm)
```

---

## 🏆 Rerankers

### 🏅 Cohere

```python
from rag007 import Agent, CohereReranker

rag = Agent(index="articles", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))
```

### 🤗 HuggingFace cross-encoder (local, no API key)

```bash
pip install rag007[huggingface]
```

```python
from rag007 import Agent, HuggingFaceReranker

rag = Agent(index="articles", reranker=HuggingFaceReranker())

# Multilingual
rag = Agent(index="articles", reranker=HuggingFaceReranker(model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"))
```

### 🌊 Jina (multilingual API)

```bash
pip install rag007[jina]
```

```python
from rag007 import Agent, JinaReranker

rag = Agent(index="articles", reranker=JinaReranker(api_key="..."))  # or JINA_API_KEY env var
```

### 🎯 rerankers — ColBERT / Flashrank / RankGPT / any cross-encoder

Unified bridge to the [`rerankers`](https://github.com/AnswerDotAI/rerankers) library by answer.ai:

```bash
pip install rag007[rerankers]
```

```python
from rag007 import Agent, RerankersReranker

rag = Agent(index="articles", reranker=RerankersReranker("cross-encoder/ms-marco-MiniLM-L-6-v2", model_type="cross-encoder"))
rag = Agent(index="articles", reranker=RerankersReranker("colbert-ir/colbertv2.0", model_type="colbert"))
rag = Agent(index="articles", reranker=RerankersReranker("flashrank", model_type="flashrank"))
rag = Agent(index="articles", reranker=RerankersReranker("gpt-5.4-mini", model_type="rankgpt", api_key="..."))
```

### 🔧 Custom reranker

```python
from rag007 import Agent, RerankResult

class MyReranker:
    def rerank(self, query: str, documents: list[str], top_n: int) -> list[RerankResult]:
        # Toy scoring: keep the original order with a decaying score.
        # Guard against top_n exceeding the number of documents.
        n = min(top_n, len(documents))
        return [RerankResult(index=i, relevance_score=1.0 / (i + 1)) for i in range(n)]

rag = Agent(index="articles", reranker=MyReranker())
```

---

## 🛠️ Tools

When using `invoke_agent`, the LLM has access to a set of tools it can call in any order. No fixed pipeline — the agent decides what it needs.

| Tool | Description |
|------|-------------|
| `get_index_settings()` | Discover filterable, searchable, sortable, and boost fields from the index schema |
| `get_filter_values(field)` | Sample real stored values for a field — used to build precise filter expressions |
| `search_hybrid(query, filter_expr, semantic_ratio, sort_fields)` | BM25 + vector hybrid search with optional filter and sort boost |
| `search_bm25(query, filter_expr)` | Pure keyword search — fallback when hybrid returns poor results |
| `rerank_results(query, hits)` | Re-rank a list of hits with the configured reranker |

The agent follows this reasoning pattern:

1. Call `get_index_settings()` to learn the schema
2. If the question names a specific entity, call `get_filter_values(field)` to find the exact stored value
3. Call `search_hybrid()` with a filter and/or sort if relevant, otherwise broad hybrid search
4. Fall back to `search_bm25()` if results are thin
5. Call `rerank_results()` to surface the most relevant hits
6. Summarise — explaining which filters and signals influenced the answer

```python
from rag007 import Agent

rag = Agent(index="products")

# Agent inspects schema, detects brand field, samples values,
# builds filter, sorts by popularity signal — all autonomously
result = rag.invoke_agent("Show me the most popular Bosch power tools")
print(result)
```

---

## ⚙️ Constructor Reference

```python
Agent(
    index="my_index",           # collection / index name
    backend=...,                # SearchBackend (default: InMemoryBackend)
    llm=...,                    # fast LLM — routing, rewrite, filter
    gen_llm=...,                # generation LLM — final answer
    reranker=...,               # Cohere / HuggingFace / Jina / custom
    top_k=10,                   # final result count            [RAG_TOP_K]
    rerank_top_n=5,             # reranker top-n                [RAG_RERANK_TOP_N]
    retrieval_factor=4,         # over-retrieval multiplier     [RAG_RETRIEVAL_FACTOR]
    max_iter=20,                # max retrieve-rewrite cycles   [RAG_MAX_ITER]
    semantic_ratio=0.5,         # hybrid semantic weight        [RAG_SEMANTIC_RATIO]
    fusion="rrf",               # "rrf" or "dbsf"               [RAG_FUSION]
    instructions="",            # extra system prompt for generation
    embed_fn=None,              # (str) -> list[float]
    boost_fn=None,              # (doc_dict) -> float score boost
    base_filter=None,           # always-on filter expression
    hyde_min_words=8,           # min words to trigger HyDE     [RAG_HYDE_MIN_WORDS]
    hyde_style_hint="",         # style hint for HyDE prompt
    auto_strategy=False,        # auto-tune from index samples
)
```
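
`boost_fn` receives the stored document dict and returns a float score boost. A hedged sketch that favours recent documents — the `year` field is hypothetical, use whatever your index actually stores:

```python
def recency_boost(doc: dict) -> float:
    # Return a boost for newer documents; 0.0 leaves the score unchanged.
    year = doc.get("year", 0)
    return 0.2 if year >= 2024 else 0.0

# rag = Agent(index="docs", boost_fn=recency_boost)
```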

---

## 📡 API Reference

| Method | Returns | Description |
|--------|---------|-------------|
| `rag.invoke(query)` | `RAGState` | Full RAG pipeline (sync) |
| `rag.ainvoke(query)` | `RAGState` | Full RAG pipeline (async) |
| `rag.chat(query, history)` | `RAGState` | Multi-turn chat (sync) |
| `rag.achat(query, history)` | `RAGState` | Multi-turn chat (async) |
| `rag.retrieve_documents(query, top_k)` | `(str, list[Document])` | Retrieve only, no answer |
| `rag.query(query)` | `str` | Answer string directly |
| `rag.invoke_agent(query)` | `str` | Tool-calling agent mode (sync) |
| `rag.ainvoke_agent(query)` | `str` | Tool-calling agent mode (async) |

`RAGState` fields: `answer` · `documents` · `query` · `question` · `history` · `iterations`

---

## 🌍 Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint | — |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | — |
| `AZURE_OPENAI_DEPLOYMENT` | Default deployment | — |
| `AZURE_OPENAI_FAST_DEPLOYMENT` | Fast model deployment | → `DEPLOYMENT` |
| `AZURE_OPENAI_GENERATION_DEPLOYMENT` | Generation deployment | → `DEPLOYMENT` |
| `AZURE_OPENAI_API_VERSION` | API version | `2024-12-01-preview` |
| `OPENAI_API_KEY` | OpenAI API key (fallback) | — |
| `OPENAI_MODEL` | OpenAI model name | `gpt-5.4` |
| `AZURE_COHERE_ENDPOINT` | Azure Cohere endpoint | — |
| `AZURE_COHERE_API_KEY` | Azure Cohere API key | — |
| `COHERE_API_KEY` | Cohere API key (fallback) | — |
| `JINA_API_KEY` | Jina reranker API key | — |
| `MEILI_URL` | Meilisearch URL | `http://localhost:7700` |
| `MEILI_KEY` | Meilisearch API key | `masterKey` |
| `RAG_TOP_K` | Final result count | `10` |
| `RAG_RERANK_TOP_N` | Reranker top-n | `5` |
| `RAG_RETRIEVAL_FACTOR` | Over-retrieval multiplier | `4` |
| `RAG_SEMANTIC_RATIO` | Hybrid semantic weight | `0.5` |
| `RAG_FUSION` | Fusion strategy | `rrf` |
| `RAG_HYDE_MIN_WORDS` | Min words to trigger HyDE | `8` |
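
A minimal `.env` sketch for the recommended Meilisearch + Cohere setup — all values below are placeholders, adjust them to your deployment:

```shell
# .env — placeholder values
OPENAI_API_KEY=sk-...
COHERE_API_KEY=...
MEILI_URL=http://localhost:7700
MEILI_KEY=masterKey
RAG_TOP_K=10
RAG_FUSION=rrf
```

Load it with `python-dotenv` (included in the `cli` and `recommended` extras).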

---

## 🖥️ CLI

*"The gadgets are ready."*

```bash
pip install rag007[recommended]

# 🧙 Guided setup wizard — choose LLM, embedder, backend, reranker
rag007

# 💬 Chat mode — full agentic pipeline
rag007 --chat -c my_index

# 🔍 Retriever mode — documents only, no LLM
rag007 --retriever -c my_index

# ⚡ Skip wizard, use env vars
rag007 --skip-wizard -c my_index
```

The wizard guides you through:
1. **LLM provider** — OpenAI, Anthropic, Ollama, or env default
2. **Embedding model** — OpenAI, Azure OpenAI, Ollama, or none (BM25 only)
3. **Vector store** — InMemory, Meilisearch, ChromaDB, Qdrant, pgvector, DuckDB, LanceDB, Azure AI Search
4. **Reranker** — Cohere, Jina, HuggingFace, LLM-based, or none
5. **Mode** — Chat (with answers) or Retriever (documents only)

---

## 📄 License

MIT — *Licence to code.*
