Metadata-Version: 2.4
Name: gvdb
Version: 0.22.0
Summary: Python client for GVDB distributed vector database
Project-URL: Homepage, https://github.com/JonathanBerhe/gvdb
Project-URL: Repository, https://github.com/JonathanBerhe/gvdb
License-Expression: Apache-2.0
Requires-Python: >=3.9
Requires-Dist: grpcio>=1.60.0
Requires-Dist: protobuf>=4.25.0
Provides-Extra: h5ad
Requires-Dist: anndata>=0.10; extra == 'h5ad'
Requires-Dist: numpy>=1.24; extra == 'h5ad'
Provides-Extra: import
Requires-Dist: numpy>=1.24; extra == 'import'
Requires-Dist: pandas>=2.0; extra == 'import'
Requires-Dist: pyarrow>=14.0; extra == 'import'
Requires-Dist: tqdm>=4.60; extra == 'import'
Provides-Extra: import-all
Requires-Dist: anndata>=0.10; extra == 'import-all'
Requires-Dist: numpy>=1.24; extra == 'import-all'
Requires-Dist: pandas>=2.0; extra == 'import-all'
Requires-Dist: polars>=0.20; extra == 'import-all'
Requires-Dist: pyarrow>=14.0; extra == 'import-all'
Requires-Dist: tqdm>=4.60; extra == 'import-all'
Provides-Extra: numpy
Requires-Dist: numpy>=1.24; extra == 'numpy'
Provides-Extra: pandas
Requires-Dist: pandas>=2.0; extra == 'pandas'
Requires-Dist: pyarrow>=14.0; extra == 'pandas'
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0; extra == 'parquet'
Provides-Extra: progress
Requires-Dist: tqdm>=4.60; extra == 'progress'
Description-Content-Type: text/markdown

# gvdb

Python client for the [GVDB](https://github.com/JonathanBerhe/gvdb) distributed vector database.

## Install

```bash
pip install gvdb

# With bulk import extras (Parquet, NumPy, Pandas, progress bar).
# Quoting the brackets keeps shells like zsh from expanding them.
pip install "gvdb[import]"

# All optional dependencies
pip install "gvdb[import-all]"
```

## Quick Start

```python
from gvdb import GVDBClient

client = GVDBClient("localhost:50051", api_key="your-key")  # api_key is optional

# Create a collection
client.create_collection("my_vectors", dimension=768)

# Insert vectors
vectors = [[0.1, 0.2, ...], [0.3, 0.4, ...]]  # list of float lists
ids = [1, 2]
client.insert("my_vectors", ids, vectors)

# Search
results = client.search("my_vectors", query_vector=[0.1, 0.2, ...], top_k=10)
for r in results:
    print(f"ID: {r.id}, distance: {r.distance}")

# Hybrid search (BM25 + vector)
results = client.hybrid_search(
    "my_vectors",
    query_vector=[0.1, 0.2, ...],
    text_query="running shoes",
    top_k=10,
    text_field="description",   # metadata field to search
    return_metadata=True,
)

# Clean up
client.drop_collection("my_vectors")
client.close()
```
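When inserting large datasets without the bulk-import extras, it helps to keep each gRPC message to a bounded size. A minimal chunking helper could look like the sketch below; `batched` is not part of the gvdb API, just an illustration to pair with `client.insert` from the Quick Start:

```python
from typing import Iterator, Sequence

def batched(
    ids: Sequence[int],
    vectors: Sequence[Sequence[float]],
    batch_size: int = 1000,
) -> Iterator[tuple[Sequence[int], Sequence[Sequence[float]]]]:
    """Yield aligned (ids, vectors) chunks of at most batch_size items."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size], vectors[start:start + batch_size]

# Usage with a connected client, as in the Quick Start:
# for id_chunk, vec_chunk in batched(ids, vectors, batch_size=500):
#     client.insert("my_vectors", id_chunk, vec_chunk)
```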

## Bulk Import

Import vectors from common ML formats. Importers auto-create collections, support resume via upsert idempotency, and show progress bars when `tqdm` is installed.

```python
import numpy as np

# From NumPy array
vectors = np.random.rand(100_000, 768).astype(np.float32)
result = client.import_numpy(vectors, "embeddings")
print(result)  # ImportResult(total=100000, batches=10, elapsed=12.3s, ...)

# From Parquet (GVDB schema: id + vector + metadata columns)
result = client.import_parquet("vectors.parquet", "embeddings")

# From Pandas DataFrame
result = client.import_dataframe(df, "embeddings", vector_column="embedding")

# From CSV (JSON-encoded or dimension-prefixed vector columns)
result = client.import_csv("data.csv", "embeddings")

# From AnnData h5ad (scRNA-seq embeddings)
result = client.import_h5ad("adata.h5ad", "cells", embedding_key="X_pca")
```
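A CSV in the JSON-encoded layout mentioned above can be produced with the standard library alone. This is a sketch: the column names `id`, `vector`, and `label` are assumptions about the expected schema, so check them against your GVDB setup before importing:

```python
import csv
import json

# Two example rows: an integer id, a 3-dimensional vector, and one metadata column.
rows = [
    (1, [0.1, 0.2, 0.3]),
    (2, [0.4, 0.5, 0.6]),
]

with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "vector", "label"])  # header row
    for vec_id, vec in rows:
        # JSON-encode the vector so it survives as a single CSV field
        writer.writerow([vec_id, json.dumps(vec), "example"])

# The file can then be loaded as shown above:
# result = client.import_csv("data.csv", "embeddings")
```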

All importers accept `mode="upsert"` (default, idempotent) or `mode="stream_insert"` (faster, no resume). See `ImportResult` for batch counts, timing, and failure tracking.

### Optional dependency extras

| Extra | Dependencies | Enables |
|-------|--------------|---------|
| `gvdb[parquet]` | pyarrow | `import_parquet` |
| `gvdb[numpy]` | numpy | `import_numpy` |
| `gvdb[pandas]` | pandas, pyarrow | `import_dataframe`, `import_csv` |
| `gvdb[h5ad]` | anndata, numpy | `import_h5ad` |
| `gvdb[progress]` | tqdm | Progress bars |
| `gvdb[import]` | All above except anndata | Common ML workflows |
| `gvdb[import-all]` | Everything + polars | All formats |
