Metadata-Version: 2.4
Name: weaviate-orm
Version: 0.1.30
Summary: A lightweight ORM-style layer for Weaviate
Author-email: Tobias Heimig-Elschner <Tobias.Heimig-Elschner@bbr.bund.de>
License-Expression: GPL-3.0-or-later
Project-URL: Homepage, https://gitlab.opencode.de/bbsr_ida_public/weaviate_orm
Project-URL: Issues, https://gitlab.opencode.de/bbsr_ida_public/weaviate_orm/-/issues
Project-URL: Citation, https://doi.org/10.58007/x1wa-rt92
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: weaviate-client>=4.16.0
Dynamic: license-file

# Weaviate ORM
### A Pythonic ORM-style layer for Weaviate

`weaviate-orm` provides a structured, object-oriented interface to the Weaviate vector database. Inspired by traditional ORM patterns, it allows you to define collections as Python classes, automatically generate Weaviate schemas, and perform CRUD and similarity-based queries without leaving the object-oriented paradigm.

## Requirements

- Python 3.11+
- weaviate-client >= 4.16.0
- Weaviate server >= 1.27.0
- For examples using Ollama: a reachable Ollama instance with the "snowflake-arctic-embed2" embedding model available

### Server Requirements

**Weaviate 1.27.0 or higher** is required. The `weaviate-client >= 4.16.0` enforces this constraint. During engine initialization, the ORM performs a runtime check:

```python
engine = Weaviate_Engine()
engine.create_all_schemas()  # Raises RuntimeError if server version < 1.27.0
```

Why 1.27.0?
- Unified vector configuration API (`Configure.Vectors.*` instead of deprecated `vectorizer_config`)
- Robust named vector support
- Improved schema handling and stability

If your server doesn't meet this requirement, the engine will raise a clear error:

```
RuntimeError: Weaviate server version 1.25.4 is not supported. 
Please use Weaviate >= 1.27.0 to match the client requirements.
```

## Features

- **Declarative Schema Definition**  
  Define collections using Python classes and descriptors for properties and cross-references.

- **Dynamic Schema Generation via Metaclass**  
  A metaclass extracts property and reference definitions to auto-generate Weaviate-compatible schema structures.

- **Polymorphic References** ✨  
  Define references to base classes to accept objects from multiple subclass collections. Includes automatic type tracking and runtime type resolution.

- **Object-Oriented CRUD Operations**  
  Seamlessly create, retrieve, update, and delete data using instance methods — no raw queries needed.

- **Recursive Save & Update**  
  Handles deeply nested references and automatically saves or updates related objects when desired.

- **Flexible Engine Binding**  
  The engine takes any connection method from the `weaviate` library, allowing support for local, remote, or custom configurations.

- **UUID Management**  
  Built-in support for both `uuid4` and `uuid5` strategies — including UUID validation and immutability enforcement.

- **Optional Auto-loading of References**  
  Cross-references can be automatically resolved into full model instances when accessed.

- **Support for Near-Vector and Near-Text Queries**  
  Leverage Weaviate’s native similarity search APIs directly from your model classes.

- **Strict Typing and Validation**  
  Type enforcement and optional validation logic on properties and references using Python descriptors.

- **Fully Tested**  
  Includes both unit and integration tests with Pytest and Docker-based Weaviate setups.

---

## Design Patterns

- **Descriptor Pattern**  
  Custom `Property` and `Reference` descriptors manage validation, casting, and schema representation of scalar and relational fields.

- **Metaclass-Based Schema Introspection**  
  A metaclass (`Weaviate_Meta`) dynamically collects all descriptors from a class and builds the full Weaviate schema. It also generates a dynamic `__init__` constructor based on the field signatures.

- **Engine Abstraction Layer**  
  The `Weaviate_Engine` encapsulates connection logic and schema management. It supports any connection method compatible with the `weaviate` Python client (e.g., local, remote, or cloud).

- **Instance-Oriented CRUD**  
  CRUD operations (including recursive reference handling and UUID safety checks) are exposed as instance methods on models that inherit from `Base_Model`.

- **Reference Resolution Strategy**  
  `Reference` descriptors can be configured to auto-load full objects (not just UUIDs), and support both one-way and two-way references, as well as single or list cardinalities.

---

## Installation

You can install `weaviate_orm` either from PyPI (once published) or directly from the source repository.

### 📦 From PyPI (recommended)
```bash
pip install weaviate-orm
```

### 🛠 From Git (development and testing)
```bash
git clone https://gitlab.opencode.de/bbsr_ida_public/weaviate_orm.git
cd weaviate_orm
pip install -e .
```
Note: Make sure your Python environment includes a compatible Weaviate client (>= 4.16.0):

```bash
pip install "weaviate-client>=4.16.0"
```

## Quick Start
In this example, we will use a local Weaviate instance using text2vec_ollama as the default vectorizer with a local ollama instance running and providing "snowflake-arctic-embed2" as embedding model.

### Create a Model (vector_config)
Create two related models `Paper` and `Author` and specify configuration for the weaviate schema. 

```python
from __future__ import annotations

import os
from uuid import UUID, uuid5
import datetime

from weaviate_orm.weaviate_base import Base_Model
from weaviate_orm.weaviate_property import Property
from weaviate_orm.weavitae_reference import Reference, Reference_Type

from weaviate.classes.config import Configure, DataType

# Get the host and port from environment variables
llm_host = os.getenv("LLM_HOST", "llm")
llm_port = int(os.getenv("LLM_PORT", 11434))

class Paper(Base_Model):
  # Use _vector_config (new API) – replaces deprecated vectorizer_config
  # Protected with underscore prefix to prevent accidental overrides
  _vector_config = Configure.Vectors.text2vec_ollama(
            api_endpoint = f"http://{llm_host}:{llm_port}", #api-endpoint for the local ollama model
            model = "snowflake-arctic-embed2", #Embedding model to use
            vectorize_collection_name = False
        )
    
    title = Property(cast_type=str, description="The title of the paper", required=True, weaviate_type=DataType.TEXT, vectorize_property_name=True)
    abstract = Property(cast_type=str, description="The abstract of the paper", required=True, weaviate_type=DataType.TEXT, vectorize_property_name=True)
    pub_date = Property(cast_type=datetime.date, description="The publication date of the paper", required=True, weaviate_type=DataType.DATE, skip_vectorization=True)
    doi = Property(cast_type=str, description="The doi of the paper", required=True, weaviate_type=DataType.TEXT, skip_vectorization=True)
    author = Reference(target_collection_name="Author", auto_loading=True, description="The author of the paper", reference_type=Reference_Type.SINGLE, way_type=Reference_Type.TWOWAY, required=False, skip_validation=True)
    co_authors = Reference(target_collection_name="Author", auto_loading=False, description="The co-authors of the paper", reference_type=Reference_Type.LIST, way_type=Reference_Type.ONEWAY, required=False, skip_validation=True)

    _namespace = UUID("eb8bc242-5f59-4a47-8230-0cea6fcc1028")

    def _get_uuid_name_string(self):
        return self.doi
        
class Author(Base_Model):
    first_name = Property(cast_type=str, description="The first name of the author", required=True, weaviate_type=DataType.TEXT, vectorize_property_name=True)
    last_name = Property(cast_type=str, description="The last name of the author", required=True, weaviate_type=DataType.TEXT, vectorize_property_name=True)
    orc_id = Property(cast_type=str, description="The orc_id of the author", required=False, weaviate_type=DataType.TEXT, skip_vectorization=True)
    papers = Reference(target_collection_name="Paper", auto_loading=False, description="The papers of the author", reference_type=Reference_Type.LIST, way_type=Reference_Type.TWOWAY, required=False, skip_validation=True)

    _namespace = UUID("eb8bc242-5f59-4a47-8230-0cea6fcc1028")

    def _get_uuid_name_string(self) -> str:
        return f"{self.first_name} {self.last_name}"

```

### Create a Weaviate Engine, Register Models, and Create Schema

```python
from weaviate_orm.weaviate_engine import Weaviate_Engine
from weaviate import connect_to_local, connect_to_weaviate_cloud

# Get host and ports from environment variables
host = os.getenv("WEAVIATE_HOST", "vdatabase")
port = int(os.getenv("WEAVIATE_PORT", 8080))
grpc_port = int(os.getenv("WEAVIATE_GRPC_PORT", 50051))

# Initialize the Weaviate engine using the connect_to_local method and its parameters
engine = Weaviate_Engine(connect_to_local, host=host, port=port, grpc_port=grpc_port)

# Register the models with the engine
engine.register_all_models(Paper, Author)

# Create the schema in Weaviate
engine.create_all_schemas()
```

### Create and Save Instances

```python
# Example data
paper_data = {
    "title": "A Study on Weaviate ORM",
    "abstract": "This paper discusses the Weaviate ORM and its features.",
    "pub_date": datetime.datetime.now(datetime.timezone.utc),
    "doi": "10.1234/weaviate-orm",
}

author_data = {
    "first_name": "Allen",
    "last_name": "Turing",
    "orc_id": "0000-0002-1234-5678",
}

# Create an insance of author and paper

author = Author(first_name=author_data["first_name"],
                last_name=author_data["last_name"],
                orc_id=author_data["orc_id"])

paper = Paper(title=paper_data["title"],
                abstract=paper_data["abstract"],
                pub_date=paper_data["pub_date"],
                doi=paper_data["doi"],
                author=author)

# Save the author and paper to Weaviate
paper.save(include_references=True, recursive=True)
```

### Read an Instance
```python 
# Retrieve the paper by its UUID
paper_uuid = paper.get_uuid()

paper_instance = Paper.get(paper_uuid, include_references=True)
print(f"Paper Title: {paper_instance.title}")
print(f"Author: {paper_instance.author.first_name} {paper_instance.author.last_name}")
```

### Update an Instance
```python
# Update the paper's title
paper_instance.title = "An Updated Study on Weaviate ORM"
paper_instance.update()

#Check updated instance
paper_instance = Paper.get(paper_uuid, include_references=True)
print(f"Paper Title: {paper_instance.title}")
```

### Delete an Instance
```python
# Delete the paper instance
paper_instance.delete()
```

## Schema Configuration Access

The ORM uses protected class-level attributes for schema configuration to prevent accidental overrides. All schema configs are prefixed with an underscore and accessed via read-only class properties.

### Configuring Collections

Define schema configurations using the underscored attributes:

```python
from weaviate.classes.config import Configure

class Article(Base_Model):
    # Class-level schema configuration (protected with underscore)
    _vector_config = Configure.Vectors.text2vec_ollama(
        api_endpoint="http://llm:11434",
        model="snowflake-arctic-embed2",
        vectorize_collection_name=False
    )
    
    _description = "A collection of articles with vector embeddings"
    
    _inverted_index_config = None  # Optional index tuning
    _generative_config = None      # Optional generative configuration
```

### Accessing Configurations

You can read schema configurations at both class and instance levels:

```python
# Class-level access (read-only)
print(Article.vector_config)  # Returns the configured vector
print(Article.description)     # Returns "A collection of articles..."

# Instance-level access
article = Article(...)
print(article.vector_config)   # Proxies to class-level config
```

### Deprecation Path

Older code using non-underscored names will still work but emit a deprecation warning:

```python
class LegacyModel(Base_Model):
    vector_config = Configure.Vectors.text2vec_ollama(...)  # DeprecationWarning
    description = "Legacy"  # DeprecationWarning
```

Migrate to the underscored versions to avoid future incompatibility:

```python
class ModernModel(Base_Model):
    _vector_config = Configure.Vectors.text2vec_ollama(...)
    _description = "Modern"
```

## Project Structure
```
weaviate_orm/
│
├── __init__.py              # Public interface and versioning
├── weaviate_base.py         # Base_Model with full CRUD logic and query support
├── weaviate_engine.py       # Manages Weaviate client and schema creation
├── weaviate_meta.py         # Metaclass for schema extraction and dynamic __init__
├── weaviate_property.py     # Descriptor for scalar fields
├── weavitae_reference.py    # Descriptor for references (one-way, two-way, single, list)
├── weaviate_decorators.py   # Client injection and async-to-sync conversion
└──  weaviate_utility.py      # Helpers for validation and reference comparison
```

## License
This project is licensed under the GNU General Public License v3.0 (GPLv3).

You are free to use, modify, and distribute this software under the terms of the GPLv3 license. Any derivative work must also be distributed under the same license.

For full details, see the [LICENSE](https://choosealicense.com/licenses/gpl-3.0/).

## Contributing & Credits
This project is created and maintained by the BBSR - IDA (Tobias Heimig-Elschner).
Feel free to open issues or submit pull requests if you encounter bugs, have ideas, or want to improve the package.

### 📚 Citation
The project is published on Zenodo and can be cited as:
Heimig, T. (2025). Weaviate ORM - (0.1.0). Bundesinstitut für Bau-, Stadt- und Raumforschung (BBSR). https://doi.org/10.58007/x1wa-rt92

## Roadmap & Open Development Topics
- **Batch Operations**<br>Add support for batched inserts and updates for high-throughput use cases.

- **Generalized Query Interface**<br>Unify near-vector, near-text, and filter queries into a common, fluent interface.

- **Nested Reference Updates**<br>Extend update logic to fully support reference deletions and new nested reference creation during .update() calls.

---

## Testing

### Unit tests

Run the unit test suite:

```bash
pytest tests/unit -q
```

### Integration tests

Integration tests require a running Weaviate (>= 1.27.0) and, for Ollama-based examples, a reachable LLM service. Using the included docker-compose setup:

```bash
# From repository root
docker-compose build vdatabase llm
docker-compose up -d vdatabase llm

# Run integration tests once services are healthy
pytest tests/integration -q -m integration
```

If you maintain your own Weaviate instance, set these environment variables so the engine can connect:

```bash
export WEAVIATE_HOST=vdatabase
export WEAVIATE_PORT=8080
export WEAVIATE_GRPC_PORT=50051
```

### Migration note

The Weaviate Python client has deprecated `vectorizer_config` in favor of `vector_config`. This project now uses `vector_config` everywhere, including named vectors (e.g., `Configure.Vectors.text2vec_ollama(...)`). Ensure your server and client meet the versions above to avoid startup or schema warnings.
