Metadata-Version: 2.4
Name: vector-cache-memory
Version: 0.1.9
Summary: Python client for VectorCache Go DB server
Author-email: Abhishek Maurya <abhishekmaurya.official@gmail.com>, Kinjal Raykarmakar <kinjalrk2k@gmail.com>
Project-URL: Homepage, https://github.com/vector-cache/vector-cache
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.30.0
Requires-Dist: setuptools_scm>=7.1
Requires-Dist: grpcio==1.76.0
Requires-Dist: grpcio-tools==1.76.0
Requires-Dist: mypy-protobuf==3.6.0
Requires-Dist: protobuf==6.33.0
Requires-Dist: black==25.9.0
Requires-Dist: pinecone==7.3.0
Requires-Dist: python-dotenv==1.2.1

# VectorCache

![](./res/banner.png)

**VectorCache** is a high-performance, in-memory vector database with gRPC and HTTP support. It allows storing, retrieving, and searching high-dimensional vector embeddings efficiently. The project provides a Go server and Python clients with type-safe APIs.

## Features

- ✅ Vector storage and retrieval with user-defined IDs and metadata
- ✅ gRPC and HTTP clients for Python
- ✅ Multiple index types: Flat L2, Flat Inner Product (cosine-like similarity for normalized embeddings)
- ✅ Thread-safe in-memory database
- ✅ Graceful shutdown and logging
- ✅ Automatic server binary management for Python clients

## Usage

### Go Server

> [!WARNING]  
> The Go server is not intended to be run as a standalone service; the Python clients manage it automatically.

```sh
go run cmd/main.go --port 8000 --indexType flatL2 --dim 3 --protocol grpc
```

| Flag          | Description                             | Default |
| ------------- | --------------------------------------- | ------- |
| `--port`      | Port to listen on                       | 8000    |
| `--indexType` | Type of index (`flatL2`, `flatIP`)      | flatL2  |
| `--dim`       | Vector dimension                        | 3       |
| `--protocol`  | Communication protocol (`grpc`, `http`) | grpc    |

## Installation

### Via PyPI (Recommended)

```bash
pip install vector-cache-memory
```

---

## Quick Start

### 1. Standalone In-Memory Cache

Store and search vectors with minimal setup:

```python
import asyncio
from vector_cache import VectorCache

async def main():
    # Initialize cache (server starts automatically)
    cache = VectorCache(dim=768)  # 768-dimensional vectors
    
    # Store embeddings with metadata
    response = cache.set(
        emb=[0.1, 0.2, 0.3, ...],  # Your embedding
        data={"title": "Hello World", "url": "https://example.com"}
    )
    print(f"Stored vector: {response.uid}")
    
    # Search for similar vectors
    results = await cache.search([0.1, 0.2, 0.3, ...], top_k=5)
    
    for record in results:
        print(f"Match: {record.data['title']} (score: {record.score:.3f})")

asyncio.run(main())
```

### 2. With Pinecone Integration

Use VectorCache as a high-speed cache layer in front of Pinecone:

```python
import asyncio
from vector_cache import VectorCache, IndexType

async def main():
    cache = VectorCache(
        dim=1536,  # OpenAI embedding dimension
        indexType=IndexType.FLAT_IP,  # Cosine-like similarity
        cacheCapacity=10000,  # Keep 10k vectors in RAM
        cache_thresold=0.85,  # Use cache if score >= 0.85
        
        # Pinecone configuration
        primarydb="pinecone",
        api_key="your-pinecone-api-key",
        index_name="your-index"
    )
    
    # Search: checks cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=5)
    print(results)

asyncio.run(main())
```

### 3. Batch Operations (High Throughput)

```python
import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        protocol="grpc",  # Faster than HTTP
        cacheCapacity=50000,
    )
    
    # Batch search multiple queries in parallel
    queries = [[0.1, 0.2, ...], [0.3, 0.4, ...], [0.5, 0.6, ...]]
    tasks = [cache.search(q, top_k=5) for q in queries]
    all_results = await asyncio.gather(*tasks)
    
    for results in all_results:
        print(f"Got {len(results)} matches")

asyncio.run(main())
```

---

## Core Concepts

### VectorCache Class

The main entry point. It manages:
- The local vector database server
- Client-side search/store operations
- Automatic lifecycle management (start on init, stop on exit)

```python
from vector_cache import VectorCache, Protocol, IndexType, EvictionPolicy

cache = VectorCache(
    port=6379,                          # Server port
    dim=512,                            # Vector dimensionality
    indexType=IndexType.FLAT_L2,        # L2 distance or FLAT_IP for cosine
    protocol=Protocol.GRPC,             # gRPC (faster) or HTTP
    eviction_policy=EvictionPolicy.FIFO,  # Cache eviction strategy
    cacheCapacity=1000,                 # Max vectors in memory
    cache_thresold=0.8,                 # Use cache if similarity > 0.8
    log="stdout",                       # Log output
)
```

### RecordData

Returned by `search()`. Contains the matched vector and its metadata:

```python
from vector_cache import RecordData

# structure:
# RecordData(
#   uid: str,              # Unique ID for this vector
#   emb: List[float],      # The stored embedding
#   data: Dict[str, Any],  # Your metadata
#   score: float           # Similarity score
# )

results = await cache.search(query, top_k=3)
for r in results:
    print(f"ID: {r.uid}, Score: {r.score:.3f}, Data: {r.data}")
```

### SetResponseData

Returned by `set()`. Indicates insertion status:

```python
from vector_cache import SetResponseData

response = cache.set(emb, data)
# response.uid: the assigned ID (save for later reference)
# response.status: "success" or error message

if response and response.status == "success":
    print(f"Stored with ID: {response.uid}")
```

---

## Configuration Guide

### Vector Dimensionality

Set `dim` to match your embeddings:
- **512**: Custom embeddings (default)
- **768**: BERT, sentence-transformers
- **1024**: Cohere, Jina embeddings
- **1536**: OpenAI text-embedding-ada-002
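
As a quick sketch (the `embed()` helper below is a hypothetical stand-in for your actual model call), you can derive `dim` from the embedding itself so the two never drift apart:

```python
from vector_cache import VectorCache

def embed(text: str) -> list[float]:
    # Hypothetical stand-in: replace with your embedding model's encode call.
    return [0.0] * 768

embedding = embed("hello world")
cache = VectorCache(dim=len(embedding))  # dim always matches the model output
cache.set(embedding, {"text": "hello world"})
```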

### Index Type

Choose based on your embedding type:

| IndexType | Use Case | Similarity Metric |
|-----------|----------|-------------------|
| `FLAT_L2` | Raw embeddings, image features | Euclidean distance |
| `FLAT_IP` | Normalized embeddings, cosine-like | Inner product |

```python
# For normalized text embeddings (cosine similarity)
cache = VectorCache(indexType=IndexType.FLAT_IP)

# For raw feature vectors (L2 distance)
cache = VectorCache(indexType=IndexType.FLAT_L2)
```

### Cache Capacity & Eviction

```python
cache = VectorCache(
    cacheCapacity=10000,        # Keep up to 10k vectors
    # Memory ≈ vectors × dim × 4 bytes
    # 10k vectors × 768 dim × 4 = ~30 MB
    eviction_policy=EvictionPolicy.FIFO,  # Remove oldest when full
)
```

### Protocol Selection

```python
# gRPC (default, recommended for production)
cache = VectorCache(protocol=Protocol.GRPC)  # Lower latency, binary

# HTTP (good for debugging, firewall-friendly)
cache = VectorCache(protocol=Protocol.HTTP)  # Easy to inspect with curl
```

---

## Integrations

### Pinecone

Combine VectorCache's low-latency caching with Pinecone's durability:

```python
cache = VectorCache(
    dim=1536,
    primarydb="pinecone",
    api_key="pcn_...",
    index_name="my-index",
    cache_thresold=0.85,  # Only use cache if score > 0.85
    cacheCapacity=5000,   # Keep recent/hot results cached
)

# Search workflow:
# 1. Query VectorCache in-memory cache
# 2. If cache miss or low score, query Pinecone
# 3. Asynchronously populate cache with Pinecone results
results = await cache.search(query, top_k=10)
```

**Best Practices:**
- Keep `cache_thresold` high (0.8+) to minimize stale data
- Store lightweight metadata in cache; keep large blobs in external storage
- Monitor the cache hit rate and adjust capacity based on workload (see the sketch below)
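
A minimal hit-rate sketch, assuming (as the hybrid example later in this README does) that results scoring at or above `cache_thresold` were served from the cache:

```python
CACHE_THRESHOLD = 0.85  # same value passed as cache_thresold

async def search_with_stats(cache, query, top_k=10):
    results = await cache.search(query, top_k=top_k)
    # Treat results at or above the threshold as cache hits.
    hits = sum(1 for r in results if r.score >= CACHE_THRESHOLD)
    hit_rate = hits / len(results) if results else 0.0
    # A persistently low hit rate suggests raising cacheCapacity.
    print(f"hit rate: {hit_rate:.0%} ({hits}/{len(results)})")
    return results
```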

---

## API Reference

### VectorCache Methods

#### `set(emb: List[float], data: Dict[str, Any]) -> SetResponseData`

Store a vector in the cache.

```python
response = cache.set(
    emb=[0.1, 0.2, 0.3],
    data={"id": "doc1", "title": "My Document"}
)
print(response.uid)  # Unique ID for the vector
```

**Parameters:**
- `emb` (List[float]): The embedding vector. Must match the configured dimension.
- `data` (Dict[str, Any]): Arbitrary metadata (optional, defaults to {}).

**Returns:** `SetResponseData` with `uid` (assigned ID) and `status` ("success" or error).
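
Since the vector must match the configured dimension, a small defensive wrapper (hypothetical, with an illustrative `DIM` constant) can fail fast on bad input:

```python
DIM = 768  # must equal the dim passed to VectorCache

def safe_set(cache, emb, data):
    # Reject mismatched vectors before they reach the server.
    if len(emb) != DIM:
        raise ValueError(f"expected a {DIM}-dim vector, got {len(emb)}")
    response = cache.set(emb, data)
    if not response or response.status != "success":
        raise RuntimeError(f"set failed: {response.status if response else 'no response'}")
    return response.uid
```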

---

#### `search(emb: List[float], top_k: int = 5, namespace: str = "", filter: Optional[Dict] = None) -> List[RecordData]`

Search for similar vectors.

```python
results = await cache.search(
    emb=[0.1, 0.2, 0.3],
    top_k=10,
    namespace="documents",  # For Pinecone filtering
    filter={"category": "news"}  # Metadata filter (Pinecone)
)

for record in results:
    print(f"Score: {record.score:.3f}, Data: {record.data}")
```

**Parameters:**
- `emb` (List[float]): Query vector.
- `top_k` (int, optional): Number of results. Default: 5.
- `namespace` (str, optional): Pinecone namespace filter. Default: "".
- `filter` (Optional[Dict], optional): Pinecone metadata filter. Default: None.

**Returns:** List of `RecordData`, sorted by similarity score (best first).
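
Because results come back best-first, taking the single closest match with a score cutoff is straightforward (the 0.8 cutoff here is illustrative):

```python
results = await cache.search(query, top_k=5)

# Results are sorted best-first, so results[0] is the closest match.
best = results[0] if results and results[0].score >= 0.8 else None
if best:
    print(f"Best match {best.uid}: {best.data}")
else:
    print("No sufficiently similar vector found")
```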

---

## Performance Tips

### Memory Optimization

```python
cache = VectorCache(
    cacheCapacity=5000,   # Smaller cache = lower memory
    dim=384,              # Smaller dimension = lower memory
    # Typical memory: 5000 × 384 × 4 bytes ≈ 7 MB
)
```

### Latency Optimization

```python
cache = VectorCache(
    protocol=Protocol.GRPC,  # Binary protocol is faster
    cacheCapacity=50000,     # Larger cache = fewer fallbacks to primary DB
    cache_thresold=0.9,      # More likely to use cached results
)
```

### Batch Processing

```python
# Parallel searches for high throughput
queries = [embedding1, embedding2, embedding3, ...]
results = await asyncio.gather(
    *[cache.search(q, top_k=10) for q in queries]
)
```

---

## Examples

### Real-World: Document Search with Metadata

```python
import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        dim=768,
        cacheCapacity=100000,
    )
    
    # Index documents
    documents = [
        {"id": "doc1", "title": "Python 101", "content": "..."},
        {"id": "doc2", "title": "Go Guide", "content": "..."},
        {"id": "doc3", "title": "Rust Book", "content": "..."},
    ]
    
    # Simulate embeddings (in reality, use a model)
    embeddings = [
        [0.1, 0.2, 0.3, ...],  # Python doc
        [0.4, 0.5, 0.6, ...],  # Go doc
        [0.7, 0.8, 0.9, ...],  # Rust doc
    ]
    
    for doc, emb in zip(documents, embeddings):
        cache.set(emb, {
            "id": doc["id"],
            "title": doc["title"],
            "preview": doc["content"][:100]
        })
    
    # Search
    query_embedding = [0.15, 0.25, 0.35, ...]  # Query for Python docs
    results = await cache.search(query_embedding, top_k=2)
    
    for r in results:
        print(f"Found: {r.data['title']} (similarity: {r.score:.2f})")

asyncio.run(main())
```

### Advanced: Cache with Pinecone Backup

```python
import asyncio
from vector_cache import VectorCache, IndexType

async def hybrid_search():
    # Setup cache + Pinecone
    cache = VectorCache(
        dim=1536,
        indexType=IndexType.FLAT_IP,
        cacheCapacity=20000,
        cache_thresold=0.8,
        primarydb="pinecone",
        api_key="your-api-key",
        index_name="prod-index",
    )
    
    # Add to cache
    response = cache.set(
        [0.1, ...],
        {"doc_id": "123", "updated": "2024-01-01"}
    )
    
    # Search: uses cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=20)
    
    # Analyze results
    cache_hits = sum(1 for r in results if r.score > 0.8)
    print(f"Cache hits: {cache_hits}, Pinecone results: {len(results) - cache_hits}")

asyncio.run(hybrid_search())
```

---

## Troubleshooting

### Server Won't Start

**Problem:** "Failed to start VectorCache server"

**Solution:**
- Check whether the port is already in use: `lsof -i :6379`
- Try a different port, e.g. `VectorCache(port=6380)` (see the port-check sketch below)
- Verify the server binary is accessible: `echo $PATH`
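
A pre-flight port check using only the standard library (the candidate ports are illustrative):

```python
import socket

from vector_cache import VectorCache

def find_free_port(candidates=(6379, 6380, 6381)):
    """Return the first candidate port that nothing is listening on."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            # connect_ex returns nonzero when nothing is listening, i.e. the port is free.
            if s.connect_ex(("127.0.0.1", port)) != 0:
                return port
    raise RuntimeError("all candidate ports are in use")

cache = VectorCache(port=find_free_port())
```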

### High Memory Usage

**Problem:** Cache is using too much RAM

**Solution:**
- Reduce `cacheCapacity` (see the memory estimator below)
- Use a smaller embedding dimension if possible
- Enable log rotation for server logs
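
The raw vector payload is roughly `capacity × dim × 4` bytes (float32); a quick estimator (metadata and index overhead come on top):

```python
def vector_memory_mb(capacity: int, dim: int) -> float:
    """Approximate RAM used by the raw float32 vectors alone."""
    return capacity * dim * 4 / 1_000_000

print(vector_memory_mb(10_000, 768))  # ~30.7 MB
print(vector_memory_mb(5_000, 384))   # ~7.7 MB
```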

### Slow Search

**Problem:** Searches are taking >100ms

**Solution:**
- Use `protocol=Protocol.GRPC` (faster than HTTP)
- Increase `cacheCapacity` to improve the hit rate
- Check whether large embedding dimensions are the bottleneck; search cost grows with `dim` (see the timing sketch below)
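
Before tuning, measure; a minimal timing sketch using only the standard library:

```python
import time

async def timed_search(cache, query, top_k=5):
    # Wall-clock the full round trip, including client serialization.
    start = time.perf_counter()
    results = await cache.search(query, top_k=top_k)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"search returned {len(results)} results in {elapsed_ms:.1f} ms")
    return results
```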

---


**Made with ❤️ by [Abhishek Maurya](https://github.com/abhimaurya-dev) & [Kinjal Raykarmakar](https://github.com/Kinjalrk2k)**

