Metadata-Version: 2.4
Name: iris-vector-graph
Version: 1.5.4
Summary: Transactional Graph + Vector retrieval system for InterSystems IRIS with hybrid search, openCypher, and GraphQL APIs
Project-URL: Homepage, https://github.com/isc-tdyar/iris-vector-graph
Project-URL: Documentation, https://github.com/isc-tdyar/iris-vector-graph/tree/main/docs
Project-URL: Repository, https://github.com/isc-tdyar/iris-vector-graph
Project-URL: Issues, https://github.com/isc-tdyar/iris-vector-graph/issues
Author-email: InterSystems Community Team <grants@intersystems.com>
License-Expression: MIT
License-File: LICENSE
Keywords: bioinformatics,biomedical,graph,iris,knowledge-graph,protein-interactions,vector-search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Requires-Dist: fastapi>=0.118.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: intersystems-irispython>=3.2.0
Requires-Dist: networkx>=3.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: py2neo>=2021.2.4
Requires-Dist: pydantic>=2.11.9
Requires-Dist: pytest-asyncio>=1.2.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: strawberry-graphql[fastapi]>=0.280.0
Requires-Dist: uvicorn>=0.37.0
Provides-Extra: biodata
Requires-Dist: biopython>=1.81; extra == 'biodata'
Requires-Dist: bioservices>=1.11.0; extra == 'biodata'
Requires-Dist: mygene>=3.2.0; extra == 'biodata'
Provides-Extra: demo
Requires-Dist: python-fasthtml>=0.12.0; extra == 'demo'
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: flake8>=6.0.0; extra == 'dev'
Requires-Dist: iris-devtester>=1.8.1; extra == 'dev'
Requires-Dist: isort>=5.12.0; extra == 'dev'
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-playwright>=0.7.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.3.0; extra == 'ml'
Requires-Dist: scipy>=1.11.0; extra == 'ml'
Requires-Dist: torch>=2.0.0; extra == 'ml'
Provides-Extra: performance
Requires-Dist: memory-profiler>=0.61.0; extra == 'performance'
Requires-Dist: psutil>=5.9.0; extra == 'performance'
Provides-Extra: visualization
Requires-Dist: graphviz>=0.20.0; extra == 'visualization'
Requires-Dist: matplotlib>=3.7.0; extra == 'visualization'
Requires-Dist: plotly>=5.15.0; extra == 'visualization'
Description-Content-Type: text/markdown

# IRIS Vector Graph

**A unified Graph + Vector + Text retrieval engine for InterSystems IRIS.**

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![InterSystems IRIS](https://img.shields.io/badge/IRIS-2025.1+-purple.svg)](https://www.intersystems.com/products/intersystems-iris/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/intersystems-community/iris-vector-graph/blob/main/LICENSE)

IRIS Vector Graph is a general-purpose graph utility built on InterSystems IRIS that supports and demonstrates knowledge graph construction and query techniques. It combines **graph traversal**, **HNSW vector similarity**, and **lexical search** in a single, unified database.

---

## Why IRIS Vector Graph?

- **Multi-Query Power**: Query your graph via **SQL**, **openCypher (v1.3 with DML)**, or **GraphQL**, all against the same data.
- **Transactional Engine**: Writes as well as reads, with support for `CREATE`, `DELETE`, `MERGE`, `SET`, and `REMOVE`.
- **Blazing Fast Vectors**: Native HNSW indexing delivers **~1.7ms** search latency (vs. 5.8s without the index).
- **No External Engines**: Built with IRIS Embedded Python; no external vector DBs or graph engines required.
- **Powers RAG Pipelines**: The engine behind [iris-vector-rag](https://github.com/intersystems-community/iris-vector-rag) for advanced retrieval-augmented generation pipelines.

---

## Installation

```bash
pip install iris-vector-graph
```

Note: Requires **InterSystems IRIS 2025.1+** with the `irispython` runtime enabled.

## Quick Start

```bash
# 1. Clone & Sync
git clone https://github.com/intersystems-community/iris-vector-graph.git && cd iris-vector-graph
uv sync

# 2. Spin up IRIS
docker-compose up -d

# 3. Start API
uv run uvicorn api.main:app --reload
```

Visit:
- **GraphQL Playground**: [http://localhost:8000/graphql](http://localhost:8000/graphql)
- **API Docs**: [http://localhost:8000/docs](http://localhost:8000/docs)

---

## Unified Query Engines

### openCypher (Recursive-Descent Parser)
IRIS Vector Graph features a custom recursive-descent Cypher parser supporting multi-stage queries and transactional updates:

```cypher
// Complex fraud analysis with WITH and Aggregations
MATCH (a:Account)-[r]->(t:Transaction)
WITH a, count(t) AS txn_count
WHERE txn_count > 5
MATCH (a)-[:OWNED_BY]->(p:Person)
RETURN p.name, txn_count
```
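To make the two-stage semantics of this query concrete, here is a minimal pure-Python sketch of what it computes over hypothetical in-memory data. `ACCOUNT_TXNS`, `OWNED_BY`, and `fraud_candidates` are illustrative names, not part of iris-vector-graph: the first `MATCH ... WITH` stage aggregates per account, `WHERE` filters those intermediate rows, and the second `MATCH` joins the survivors to their owners.

```python
from collections import Counter

# Hypothetical edge data standing in for the graph:
# each tuple is (account_id, transaction_id) for an Account -> Transaction edge.
ACCOUNT_TXNS = [("A1", f"T{i}") for i in range(7)] + [("A2", "T100"), ("A2", "T101")]
# Hypothetical OWNED_BY edges: account_id -> Person name.
OWNED_BY = {"A1": "Alice", "A2": "Bob"}


def fraud_candidates(account_txns, owned_by, min_txns=5):
    # Stage 1: MATCH (a)-[r]->(t) WITH a, count(t) AS txn_count
    txn_count = Counter(account for account, _ in account_txns)
    # WHERE txn_count > 5 filters the intermediate rows produced by WITH.
    busy = {a: c for a, c in txn_count.items() if c > min_txns}
    # Stage 2: MATCH (a)-[:OWNED_BY]->(p) RETURN p.name, txn_count.
    # Like a Cypher MATCH, accounts with no owner are dropped (inner join).
    return [(owned_by[a], c) for a, c in busy.items() if a in owned_by]


result = fraud_candidates(ACCOUNT_TXNS, OWNED_BY)
print(result)  # [('Alice', 7)]
```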

**Supported Clauses:** `MATCH`, `OPTIONAL MATCH`, `WITH`, `WHERE`, `RETURN`, `UNWIND`, `CREATE`, `DELETE`, `DETACH DELETE`, `MERGE`, `SET`, `REMOVE`.

### GraphQL
```graphql
query {
  protein(id: "PROTEIN:TP53") {
    name
    interactsWith(first: 5) { id name }
    similar(limit: 3) { protein { name } similarity }
  }
}
```
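The GraphQL endpoint from the Quick Start can also be called from code. The sketch below assumes the standard GraphQL-over-HTTP convention (a JSON `POST` body with a `query` field), which Strawberry serves at `/graphql`; `build_request` and `run_query` are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.request

# Endpoint from the Quick Start; assumes the API is running locally.
GRAPHQL_URL = "http://localhost:8000/graphql"

QUERY = """
query {
  protein(id: "PROTEIN:TP53") {
    name
    interactsWith(first: 5) { id name }
    similar(limit: 3) { protein { name } similarity }
  }
}
"""


def build_request(query, variables=None, url=GRAPHQL_URL):
    # GraphQL over HTTP: a JSON body with "query" and optional "variables".
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None):
    # Send the request and return the "data" field, raising on GraphQL errors.
    with urllib.request.urlopen(build_request(query, variables), timeout=10) as resp:
        body = json.load(resp)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]
```

With the API running, `run_query(QUERY)["protein"]["name"]` would return the protein's name; the shape of the returned dictionary mirrors the query's selection set.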

### SQL (Hybrid Search)
```sql
SELECT TOP 10 id, 
       kg_RRF_FUSE(id, vector, 'cancer suppressor') as score
FROM nodes
ORDER BY score DESC
```
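The function name `kg_RRF_FUSE` suggests Reciprocal Rank Fusion (RRF), which merges several ranked result lists by summing `1 / (k + rank)` for each item across the lists it appears in. Assuming that is what the SQL function does, the sketch below shows the fusion step in plain Python over made-up vector and lexical hit lists; `rrf_fuse` and its `k=60` default are illustrative choices, not the library's actual implementation or parameters.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, top_n=10):
    """Fuse ranked lists of ids with Reciprocal Rank Fusion.

    Each input list is ordered best-first; an id's fused score is the sum
    of 1 / (k + rank) over every list it appears in (rank starts at 1).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


# Hypothetical hit lists from a vector retriever and a lexical retriever.
vector_hits = ["PROTEIN:TP53", "PROTEIN:MDM2", "PROTEIN:BRCA1"]
text_hits = ["PROTEIN:BRCA1", "PROTEIN:TP53", "PROTEIN:PTEN"]

for node_id, score in rrf_fuse([vector_hits, text_hits]):
    print(f"{node_id}\t{score:.5f}")
```

Because RRF uses only ranks, it needs no normalization between the vector and lexical scores, which is one reason it is a common choice for hybrid search.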

---

## Scaling & Performance

Building a native **HNSW (Hierarchical Navigable Small World)** functional index directly into InterSystems IRIS lets hybrid graph-vector workloads scale without a separate vector store.

By keeping the vector index in-process with the graph data, we achieve **subsecond multi-modal queries** that would otherwise require complex application-side joins across multiple databases.

### Performance Benchmarks (2026 Refactor)
- **High-Speed Traversal**: **~1.84M TEPS** (Traversed Edges Per Second).
- **Low Latency**: 2-hop BFS on 10k nodes in **<40ms**.
- **RDF 1.2 Support**: Native support for **Quoted Triples** (Metadata on edges) via subject-referenced properties.
- **Query Signatures**: O(1) hop-rejection using ASQ-inspired Master Label Sets.
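
TEPS is the number of edges examined during a traversal divided by the wall-clock time it took. The pure-Python sketch below runs a 2-hop BFS over a hypothetical toy adjacency dictionary and computes TEPS the same way; it illustrates the metric only, is not the benchmark harness behind the figures above, and its absolute numbers are not comparable to them.

```python
import time
from collections import deque


def k_hop_bfs(adjacency, start, max_hops=2):
    """Return (set of nodes reached within max_hops, number of edges traversed)."""
    depth = {start: 0}
    queue = deque([start])
    edges_traversed = 0
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            edges_traversed += 1  # every examined edge counts toward TEPS
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth), edges_traversed


def teps(adjacency, start, max_hops=2):
    # Traversed Edges Per Second = edges examined / elapsed seconds.
    t0 = time.perf_counter()
    _, edges = k_hop_bfs(adjacency, start, max_hops)
    elapsed = time.perf_counter() - t0
    return edges / elapsed if elapsed > 0 else float("inf")


# Hypothetical toy graph: 10,000 nodes, each linking 1 and 7 steps ahead.
N = 10_000
graph = {i: [(i + 1) % N, (i + 7) % N] for i in range(N)}

reached, edges = k_hop_bfs(graph, 0, max_hops=2)
print(len(reached), edges)  # 6 6
print(f"{teps(graph, 0):,.0f} TEPS (pure Python, toy graph)")
```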

### Why fast vector search matters for graphs
Consider a "Find-and-Follow" query common in fraud detection:
1.  **Find** the top 10 accounts most semantically similar to a known fraudulent pattern (Vector Search).
2.  **Follow** all outbound transactions from those 10 accounts to identify the next layer of the money laundering ring (Graph Hop).

In a standard database without HNSW, the first step (vector search) can take several seconds as the dataset grows, blocking the subsequent graph traversals. With `iris-vector-graph`, the vector lookup is reduced to **~1.7ms**, enabling the entire hybrid traversal to complete in a fraction of a second.
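
The Find-and-Follow pattern can be sketched in memory as follows. This is a hypothetical illustration, not the iris-vector-graph API: `top_k_similar` does an exact brute-force cosine scan (the work an HNSW index replaces with an approximate nearest-neighbor search), and `outbound` is a plain dictionary standing in for the graph's transaction edges.

```python
import numpy as np


def top_k_similar(query, embeddings, ids, k=10):
    """Find: exact cosine top-k by full scan (what HNSW approximates)."""
    q = query / np.linalg.norm(query)
    m = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = m @ q
    order = np.argsort(-sims)[:k]  # indices of the k most similar rows
    return [ids[i] for i in order]


def find_and_follow(query, embeddings, ids, outbound, k=10):
    seeds = top_k_similar(query, embeddings, ids, k)
    # Follow: one hop along outbound edges from every seed account.
    next_layer = {dst for acct in seeds for dst in outbound.get(acct, [])}
    return seeds, next_layer


# Hypothetical account embeddings and outbound transaction edges.
ids = ["acct1", "acct2", "acct3"]
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
outbound = {"acct1": ["acct9"], "acct2": ["acct5"], "acct3": ["acct7", "acct9"]}
fraud_pattern = np.array([1.0, 0.0])  # hypothetical "known fraud" embedding

seeds, layer = find_and_follow(fraud_pattern, embeddings, ids, outbound, k=2)
print(seeds, sorted(layer))  # ['acct1', 'acct3'] ['acct7', 'acct9']
```

The full scan in `top_k_similar` grows linearly with the number of accounts, which is the step the HNSW index is meant to speed up; the one-hop follow only touches the few seeds it returns.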

---

## Interactive Demos

The repository includes two interactive demo applications.

### Biomedical Research Demo
Explore protein-protein interaction networks with vector similarity and D3.js visualization.

### Fraud Detection Demo
Real-time fraud scoring with transaction networks, Cypher-based pattern matching, and bitemporal audit trails.

To run the CLI demos:
```bash
export PYTHONPATH=$PYTHONPATH:.
# Cypher-powered fraud detection
python3 examples/demo_fraud_detection.py

# SQL-powered "drop down" example
python3 examples/demo_fraud_detection_sql.py
```

To run the Web Visualization demos:
```bash
# Start the demo server
uv run uvicorn src.iris_demo_server.app:app --port 8200 --host 0.0.0.0
```
Visit [http://localhost:8200](http://localhost:8200) to begin.

---

## iris-vector-rag Integration

IRIS Vector Graph is the core engine powering [iris-vector-rag](https://github.com/intersystems-community/iris-vector-rag). You can use it in your RAG pipelines like this:

```python
from iris_vector_rag import create_pipeline

# Create a GraphRAG pipeline powered by this engine
pipeline = create_pipeline('graphrag')

# Combined vector + text + graph retrieval
result = pipeline.query(
    "What are the latest cancer treatment approaches?",
    top_k=5
)
```

---

## Documentation

- [Detailed Architecture](https://github.com/intersystems-community/iris-vector-graph/blob/main/docs/architecture/ARCHITECTURE.md)
- [Biomedical Domain Examples](https://github.com/intersystems-community/iris-vector-graph/tree/main/examples/domains/biomedical/)
- [Full Test Suite](https://github.com/intersystems-community/iris-vector-graph/tree/main/tests/)
- [iris-vector-rag Integration](https://github.com/intersystems-community/iris-vector-rag)
- [Verbose README](https://github.com/intersystems-community/iris-vector-graph/blob/main/docs/README_VERBOSE.md) (Legacy)

---

## Changelog

### v1.5.4 (2025-01-31)
- **Schema Cleanup**: Removed invalid `VECTOR_DIMENSION` call from schema utilities
- **Refinement**: Engine now relies solely on inference and explicit config for dimensions

### v1.5.3 (2025-01-31)
- **Robust Embeddings**: Fixed embedding dimension detection for IRIS Community 2025.1
- **API Improvements**: Added `embedding_dimension` param to `IRISGraphEngine` for manual override
- **Auto-Inference**: Automatically infers dimension from input if detection fails
- **Code Quality**: Major cleanup of `engine.py` to remove legacy duplicates

### v1.5.2 (2025-01-31)
- **Engine Acceleration**: Ported high-performance SQL paths for `get_node()` and `count_nodes()`
- **Bulk Loading**: New `bulk_create_nodes()` and `bulk_create_edges()` methods with `%NOINDEX` support
- **Performance**: Verified 80x speedup for single-node reads and 450x for counts vs standard Cypher

### v1.5.1 (2025-01-31)
- **Property Query Latency**: Verified 38ms latency for 5,000-node property queries (at 10k entity scale)
- **Subquery Stability**: Optimized `REPLACE` string aggregation to avoid IRIS `%QPAR` optimizer bugs
- **Scale Verified**: E2E stress tests confirm this performance at 10,000+ nodes

### v1.4.9 (2025-01-31)
- **Exact Collation**: Added `%EXACT` to VARCHAR columns for case-sensitive matching
- **Performance**: Prevents default `UPPER` collation behavior in IRIS 2024.2+
- **Case Sensitivity**: Ensures node IDs, labels, and property keys are case-sensitive

### v1.4.8 (2025-01-31)
- **Fix SUBSCRIPT error**: Removed `idx_props_key_val` which caused errors with large values
- **Improved Performance**: Maintained composite indexes that don't include large VARCHAR columns

### v1.4.7 (2025-01-31)
- **Revert to VARCHAR(64000)**: LONGVARCHAR broke REPLACE; VARCHAR(64000) keeps compatibility
- **Large Values**: Property values up to 64,000 characters; REPLACE works with no CAST needed

### ~~v1.4.5/1.4.6~~ (deprecated - use 1.4.7)
- v1.4.5 used LONGVARCHAR which broke REPLACE function
- v1.4.6 used CAST which broke on old schemas

### v1.4.4 (2025-01-31)
- **Bulk Loading Support**: `%NOINDEX` INSERTs, `disable_indexes()`, `rebuild_indexes()`
- **Fast Ingest**: Skip index maintenance during bulk loads, rebuild after

### v1.4.3 (2025-01-31)
- **Composite Indexes**: Added (s,key), (s,p), (p,o_id), (s,label) based on TrustGraph patterns
- **12 indexes total**: Optimized for label filtering, property lookups, edge traversal

### v1.4.2 (2025-01-31)
- **Performance Indexes**: Added indexes on rdf_labels, rdf_props, rdf_edges for fast graph traversal
- **ensure_indexes()**: New method to add indexes to existing databases
- **Composite Index**: Added (key, val) index on rdf_props for property value lookups

### v1.4.1 (2025-01-31)
- **Embedding API**: Added `get_embedding()`, `get_embeddings()`, `delete_embedding()` methods
- **Schema Prefix in Engine**: All engine SQL now uses configurable schema prefix

### v1.4.0 (2025-01-31)
- **Schema Prefix Support**: `set_schema_prefix('Graph_KG')` for qualified table names
- **Pattern Operators Fixed**: `CONTAINS`, `STARTS WITH`, `ENDS WITH` now work correctly
- **IRIS Compatibility**: Removed recursive CTEs and `NULLS LAST` (unsupported by IRIS)
- **ORDER BY Fix**: Properties in ORDER BY now properly join rdf_props table
- **type(r) Verified**: Relationship type function works in RETURN/WHERE clauses

---

**Author: Thomas Dyar** (thomas.dyar@intersystems.com)
