Metadata-Version: 2.1
Name: easySemanticSearch
Version: 1.3.2
Summary: An easy way to use advanced semantic search.
Author: Abhishek Venkatachalam
Author-email: abhishek.venkatachalam06@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: sentence-transformers
Requires-Dist: numpy
Requires-Dist: pandas

For more information about the author, visit [LinkedIn](https://www.linkedin.com/in/abhishek-venkatachalam-62121049/).

# Semantic Search Python Package

## Overview

This Python package provides utilities for quick, simple semantic search over tabular text data.
It uses SBERT sentence-embedding models through the `sentence-transformers` library and supports searching both CSV files and pandas DataFrames.
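
For readers new to embedding-based search, the ranking step can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes results are ranked by cosine similarity, and the helper names (`cosine_similarity`, `rank_by_similarity`) and the hand-written three-dimensional vectors are hypothetical stand-ins for real SBERT embeddings. The `(text, score)` result shape mirrors how the usage examples in this README unpack results.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus, max_results=5):
    """Score every (text, vector) pair against the query, best match first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_results]


# Toy vectors standing in for model embeddings (assumption: real ones
# come from a SentenceTransformer model and have hundreds of dimensions).
toy_corpus = [
    ("App crashes under heavy load", [0.9, 0.1, 0.0]),
    ("How do I reset my password?", [0.0, 0.2, 0.9]),
    ("Slow response at peak hours", [0.7, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]

top_results = rank_by_similarity(toy_query, toy_corpus, max_results=2)
for text, score in top_results:
    print(f"{score:.3f}  {text}")
```

Texts whose vectors point in nearly the same direction as the query score close to 1, which is why the two load-related rows outrank the password question here.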

## Installation

You can install the package using pip:

```bash
pip install easySemanticSearch
```

## Methods

### csv_SimpleSemanticSearch

Performs semantic search on a CSV file and returns a list of results.

```python
from easySemanticSearch import csv_SimpleSemanticSearch

csv_filepath_name = "CustomerService_logs.csv"  # Path to your CSV file
results = csv_SimpleSemanticSearch(csv_filepath_name, input_query="Your query")
```

#### Parameters:

- **csv_filepath_name** (str): Path to the CSV file.
- **input_query** (str, default="Some text"): Query to search for.
- **max_results** (int, default=5): Maximum number of results to return.
- **model_name** (str, default="all-MiniLM-L6-v2"): Name of the SentenceTransformer model to use.
- **embeddings_Filename** (str, default="embeddings_SemanticSearch.pkl"): Filename to save/load embeddings.
- **cache_folder** (str, default="default_folder"): Folder path to cache the model.

### dF_SimpleSemanticSearch

Performs semantic search on a pandas DataFrame and returns a list of results.

```python
from easySemanticSearch import dF_SimpleSemanticSearch
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'column1': ['text1', 'text2'],
    'column2': ['text3', 'text4']
})

# Use a raw string for Windows paths so backslashes are not read as escape sequences.
results = dF_SimpleSemanticSearch(
    user_dataframe=df,
    input_query="The 1st text",
    max_results=5,
    model_name="all-MiniLM-L6-v2",
    embeddings_Filename="embeddings_SemanticSearch.pkl",
    cache_folder=r"C:\anonymous\BestSearcher\easySemanticSearch",
)
```

#### Parameters:

- **user_dataframe** (pd.DataFrame): Input pandas DataFrame.
- **input_query** (str, default="Some text"): Query to search for.
- **max_results** (int, default=5): Maximum number of results to return.
- **model_name** (str, default="all-MiniLM-L6-v2"): Name of the SentenceTransformer model to use.
- **embeddings_Filename** (str, default="embeddings_SemanticSearch.pkl"): Filename to save/load embeddings.
- **cache_folder** (str, default="default_folder"): Folder path to cache the model.

## Example Usage

1. The example below shows how to semantically search a CSV file using the `csv_SimpleSemanticSearch` method:

```python
# Import the search function.
from easySemanticSearch import csv_SimpleSemanticSearch

# Path to the CSV dataset
csv_filepath_name = "CustomerService_logs.csv"

# Set input query
input_query = "I've experienced some crashes during busy times. Is there a plan to handle increased traffic or peak usage periods?"
print("Query:\n" + input_query + "\n\n")

# Get top 3 similar descriptions
max_results = 3    # The maximum number of search results to be retrieved.
top_SearchResults = csv_SimpleSemanticSearch(csv_filepath_name, input_query, max_results=max_results)

print("Knowledge Base:\n")
knowledgeBase = ""
for description, score in top_SearchResults:
    knowledgeBase = knowledgeBase + "\n" + description
    print(f"Description: {description}")
    print("-" * 50)
```


2. The example below shows how to semantically search a DataFrame using the `dF_SimpleSemanticSearch` method:

```python
# Import the libraries.
from easySemanticSearch import dF_SimpleSemanticSearch
import pandas as pd

# Read the dataset from a CSV file into a DataFrame
csv_filepath_name = "CustomerService_logs.csv"
sample_dataset = pd.read_csv(csv_filepath_name)

# Set input query
input_query = "I've experienced some crashes during busy times. Is there a plan to handle increased traffic or peak usage periods?"
print("Query:\n" + input_query + "\n\n")

# Get top 3 similar descriptions
max_results = 3    # The maximum number of search results to be retrieved.
top_SearchResults = dF_SimpleSemanticSearch(sample_dataset, input_query, max_results=max_results)

print("Knowledge Base:\n")
knowledgeBase = ""
for description, score in top_SearchResults:
    knowledgeBase = knowledgeBase + "\n" + description
    print(f"Description: {description}")
    print("-" * 50)
```

## Note

The first run on a dataset is slow because every row has to be encoded. The embeddings are saved to the file named by `embeddings_Filename` and loaded on later runs, so subsequent searches are much faster.
The default SentenceTransformer model is "all-MiniLM-L6-v2"; pass a different `model_name` to use another model.
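
The effect of saving embeddings can be illustrated with a small load-or-compute cache. This is a sketch under stated assumptions rather than the package's actual code: the helper `load_or_compute_embeddings` and the stand-in encoder `fake_encode` are hypothetical, and only the standard library is used so the snippet runs without downloading a model.

```python
import os
import pickle
import tempfile


def load_or_compute_embeddings(texts, cache_path, encode_fn):
    """Load embeddings from cache_path if it exists; otherwise compute and save them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    embeddings = [encode_fn(text) for text in texts]
    with open(cache_path, "wb") as fh:
        pickle.dump(embeddings, fh)
    return embeddings


encode_calls = []  # records every text passed to the encoder


def fake_encode(text):
    """Hypothetical stand-in for a slow model call; returns a tiny feature vector."""
    encode_calls.append(text)
    return [float(len(text)), float(text.count(" "))]


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "embeddings_SemanticSearch.pkl")
    texts = ["first row", "second row of text"]

    first = load_or_compute_embeddings(texts, cache_path, fake_encode)   # encodes and saves
    second = load_or_compute_embeddings(texts, cache_path, fake_encode)  # loads from disk

print(f"Encoder was called {len(encode_calls)} times for two lookups.")
```

In this sketch the cache is keyed only by file name, so a changed dataset would silently reuse stale embeddings. If the package behaves similarly, using a distinct `embeddings_Filename` per dataset avoids that.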
