Metadata-Version: 2.2
Name: uhsr
Version: 0.1.1
Summary: Unified Hyperbolic Spectral Retrieval (UHSR) - a novel text retrieval algorithm combining lexical and semantic search.
Home-page: https://github.com/vedaant2000/uhsr-retrieval
Author: Vedaant Singh
Keywords: uhsr,text retrieval,BM25,FAISS,semantic search,lexical search,spectral re-ranking,machine learning,NLP
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: sentence-transformers
Provides-Extra: gpu
Requires-Dist: faiss-gpu; extra == "gpu"
Provides-Extra: cpu
Requires-Dist: faiss-cpu; extra == "cpu"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

<div align="center">
  <img src="logo.png" alt="UHSR Logo" width="300">
  <hr>
  <br/>
</div>

# Unified Hyperbolic Spectral Retrieval (UHSR)

Unified Hyperbolic Spectral Retrieval (UHSR) is a novel text retrieval algorithm that fuses lexical search (BM25) with semantic search (dense embeddings) in a single pipeline. It applies logistic normalization, harmonic fusion, and spectral re-ranking based on graph Laplacian analysis to produce interpretable relevance scores in the [0, 1] range.
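
The normalization and fusion steps can be sketched as follows. This is a minimal illustration, not the package's code: the z-score centring inside the sigmoid, the function names `logistic_normalize` and `harmonic_fusion`, and the sample scores are all assumptions.

```python
import numpy as np

def logistic_normalize(scores, eps=1e-9):
    """Squash raw scores into (0, 1) with a sigmoid centred on the batch mean.

    Assumption: scores are standardised (z-scored) before the sigmoid.
    """
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean()) / (scores.std() + eps)
    return 1.0 / (1.0 + np.exp(-z))

def harmonic_fusion(lexical, semantic, eps=1e-9):
    """Element-wise harmonic mean of two score arrays that lie in (0, 1)."""
    return 2.0 * lexical * semantic / (lexical + semantic + eps)

# Hypothetical raw scores for four candidate documents.
bm25_scores = np.array([12.4, 3.1, 7.8, 0.5])        # unbounded lexical scores
cosine_scores = np.array([0.82, 0.40, 0.10, 0.35])   # semantic similarities

lexical = logistic_normalize(bm25_scores)
semantic = logistic_normalize(cosine_scores)
fused = harmonic_fusion(lexical, semantic)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a document with a high BM25 score but a low semantic score (the third one here) ends up with a low fused score.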

## Key Features

- **Hybrid Retrieval:** Combines BM25 for lexical scoring and dense vector semantic similarity for contextual understanding.
- **Advanced Fusion:** Uses logistic normalization and harmonic fusion to integrate multiple scoring signals.
- **Spectral Re-Ranking:** Employs spectral analysis (using the graph Laplacian and Fiedler vector) to boost central, highly relevant candidates.
- **Metric Flexibility:** Supports multiple semantic similarity metrics (cosine, Euclidean, Mahalanobis) to suit various datasets.
- **Interpretable Scores:** Final relevance scores are normalized to the [0,1] range.
- **Scalable:** Uses FAISS for fast approximate nearest neighbor search, so the same pipeline works on both small and large datasets.
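
To make the three metrics named under Metric Flexibility concrete, here is a small NumPy sketch. The function names are hypothetical, and estimating the Mahalanobis covariance from the document embeddings themselves is an assumption; the package may compute these differently.

```python
import numpy as np

def cosine_similarity(query, docs):
    """Cosine of the angle between the query and each document vector."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return d @ q

def euclidean_distance(query, docs):
    """Straight-line distance from the query to each document vector."""
    return np.linalg.norm(docs - query, axis=1)

def mahalanobis_distance(query, docs):
    """Distance scaled by the inverse covariance of the document vectors.

    Assumption: the covariance is estimated from `docs`; the pseudo-inverse
    keeps the computation defined when the covariance is singular.
    """
    inv_cov = np.linalg.pinv(np.cov(docs, rowvar=False))
    diff = docs - query
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))  # guard against rounding below zero

# Hypothetical 2-D embeddings, for illustration only.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
query = np.array([1.0, 0.0])

cos = cosine_similarity(query, docs)
euc = euclidean_distance(query, docs)
mah = mahalanobis_distance(query, docs)
```

Cosine is a similarity (higher is closer), while the other two are distances (lower is closer), so the distances need to be converted or negated before they are fused with other scores.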

## Overview

UHSR provides an end-to-end text retrieval pipeline that takes raw documents and returns a ranked list. It first applies BM25 for fast lexical filtering, then computes semantic similarity using dense embeddings. The two scores are logistically normalized and fused with a harmonic mean, which stays low unless both the lexical and the semantic score are high. Finally, a spectral re-ranking step based on graph Laplacian analysis refines the ranking by boosting documents that are centrally located among the top candidates.
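
The spectral re-ranking step can be sketched as follows. The Fiedler vector computation is standard (the eigenvector for the second-smallest eigenvalue of the graph Laplacian), but everything else is an assumption made for illustration: the cosine-similarity graph, the way the vector is turned into a centrality score, the blending weight `alpha`, and the function names.

```python
import numpy as np

def fiedler_vector(embeddings):
    """Fiedler vector of a cosine-similarity graph over the candidates.

    Assumes every embedding is non-zero and the graph is connected.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    w = np.clip(x @ x.T, 0.0, None)          # non-negative edge weights
    np.fill_diagonal(w, 0.0)                 # no self-loops
    laplacian = np.diag(w.sum(axis=1)) - w   # unnormalized Laplacian L = D - W
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues come back ascending
    return eigvecs[:, 1]

def spectral_rerank(fused_scores, embeddings, alpha=0.2):
    """Blend fused scores with a Fiedler-based centrality term.

    Assumption: candidates whose Fiedler entry is close to zero are treated
    as central. The blend is convex, so scores stay in [0, 1].
    """
    f = fiedler_vector(embeddings)
    centrality = 1.0 - np.abs(f) / (np.abs(f).max() + 1e-12)
    final = (1.0 - alpha) * np.asarray(fused_scores, dtype=float) + alpha * centrality
    return np.argsort(-final), final

# Hypothetical top-5 candidates: 2-D embeddings and fused scores in [0, 1].
candidates = np.array([[1.0, 0.1], [0.9, 0.2], [0.8, 0.3], [0.1, 1.0], [0.2, 0.9]])
fused_scores = np.array([0.81, 0.40, 0.33, 0.30, 0.55])

order, final = spectral_rerank(fused_scores, candidates)
```

`order` lists candidate indices from most to least relevant, and `final` holds the adjusted scores. With `alpha=0` the ranking reduces to the fused scores alone.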

## Intended Use

UHSR is intended for research and educational purposes and can serve as a strong foundation for further development in text retrieval and natural language processing applications.

For more details, visit the [GitHub repository](https://github.com/vedaant00/uhsr-retrieval).
