Metadata-Version: 2.1
Name: vectorflow_client
Version: 0.0.9
Summary: Use this Python client to embed documents with VectorFlow, an open source, high throughput, production ready vector embedding pipeline.
Project-URL: Client Homepage, https://github.com/dgarnitz/vectorflow/tree/main/client
Project-URL: VectorFlow Homepage, https://github.com/dgarnitz/vectorflow
Project-URL: Issues & Bugs, https://github.com/dgarnitz/vectorflow/issues
Author-email: David Garnitz <david@getvectorflow.com>
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Requires-Dist: openai~=1.3.6
Requires-Dist: posthog~=3.0.2
Requires-Dist: requests~=2.31.0
Requires-Dist: tiktoken~=0.5.1
Description-Content-Type: text/markdown

# VectorFlow Python Client
Use this Python client to embed documents and upload them to a vector database with VectorFlow. You can also check on the status of those embedding and upload jobs. 

### How to Use
The client has two methods for uploading documents to embed and two for checking job statuses, listed below. All four return a `Response` object from the Python `requests` library; call its `.json()` method to parse the body.
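Because every method returns a plain `requests` response, it can help to check the HTTP status before calling `.json()`. The helper below is a hypothetical sketch (`parse_json_response` is not part of `vectorflow_client`). It relies only on the `status_code` attribute and `.json()` method that `requests.Response` provides.

```python
from typing import Any


def parse_json_response(response: Any) -> Any:
    """Parse the JSON body of a VectorFlow response, failing loudly on HTTP errors.

    `response` is expected to behave like `requests.Response`: it must expose
    an integer `status_code` and a `json()` method. This helper is hypothetical
    and not part of `vectorflow_client`.
    """
    if not 200 <= response.status_code < 300:
        # Surface the status code instead of trying to parse an error page as JSON
        raise RuntimeError(f"VectorFlow request failed with HTTP {response.status_code}")
    return response.json()


# Example: data = parse_json_response(vectorflow.embed(filepath))
```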

#### Initialize
```
from vectorflow_client.vectorflow import Vectorflow

vectorflow = Vectorflow()
vectorflow.embedding_api_key = "YOUR_OPEN_AI_KEY"
```
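Rather than hardcoding the key in source, you can read it from an environment variable. This is a minimal sketch. The `read_api_key` helper is hypothetical, and the default variable name `OPENAI_API_KEY` is an assumption; use whatever name your environment exports.

```python
import os


def read_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return an API key from the environment, or raise a clear error if it is missing.

    The default variable name is an assumption; use whichever name your setup exports.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running VectorFlow jobs")
    return key


# Example: vectorflow.embedding_api_key = read_api_key()
```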

#### Embed a Single File
```
filepath = './src/api/tests/fixtures/test_medium_text.txt'
response = vectorflow.embed(filepath)
```
This points at your local VectorFlow instance by default. You can also use our free hosted version of VectorFlow, which is more performant: pass its URL as the `base_url` parameter of `embed` and set `internal_api_key` to the key from the [managed service](https://app.getvectorflow.com/home).
```
vectorflow.internal_api_key = 'SWITCHINGKEYS1234567890'
response = vectorflow.embed(filepath, base_url="https://vectorflowembeddings.online")
```
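If you switch between the local and hosted deployments often, one option is to wrap the two calls above in a small function. `embed_file` and its `hosted` flag are hypothetical. The function only sets the `internal_api_key` attribute and passes the `base_url` parameter shown above, and it leaves everything else to the client's defaults.

```python
from typing import Any, Optional

# Hosted endpoint from the example above
HOSTED_BASE_URL = "https://vectorflowembeddings.online"


def embed_file(client: Any, filepath: str, hosted: bool = False,
               internal_api_key: Optional[str] = None) -> Any:
    """Embed one file with either the local (default) or the hosted VectorFlow instance.

    Hypothetical helper: it only sets `internal_api_key` and passes `base_url`,
    as in the example above.
    """
    if not hosted:
        # Local instance: rely on the client's default base URL
        return client.embed(filepath)
    if not internal_api_key:
        raise ValueError("The hosted service needs the internal API key from the managed service")
    client.internal_api_key = internal_api_key
    return client.embed(filepath, base_url=HOSTED_BASE_URL)


# Example: response = embed_file(vectorflow, filepath, hosted=True, internal_api_key="YOUR_KEY")
```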

#### Embed Multiple Files
```
paths = ['./src/api/tests/fixtures/test_pdf.pdf', './src/api/tests/fixtures/test_medium_text.txt']
response = vectorflow.upload(paths)
```
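To upload a whole folder, you can build the path list with `pathlib` before calling `upload`. In this sketch, `collect_paths` is a hypothetical helper and the `.pdf`/`.txt` suffix filter is an assumption; check which file formats your VectorFlow deployment accepts.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed file types; check which formats your VectorFlow deployment accepts
DEFAULT_SUFFIXES = (".pdf", ".txt")


def collect_paths(directory: str, suffixes: Iterable[str] = DEFAULT_SUFFIXES) -> List[str]:
    """Return sorted paths of matching files under `directory`, searched recursively.

    Suffixes are compared case-insensitively, so `.PDF` matches `.pdf`.
    """
    wanted = {s.lower() for s in suffixes}
    return sorted(
        str(p) for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


# Example: response = vectorflow.upload(collect_paths("./src/api/tests/fixtures"))
```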

#### Get Statuses for Multiple Jobs
```
# job_ids: a list of IDs for jobs you previously submitted
response = vectorflow.get_job_statuses(job_ids)
```

#### Get Status for Single Job
```
# job_id: the ID of a job you previously submitted
response = vectorflow.get_job_status(job_id)
```
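Since you check on jobs after submitting them, a common pattern is to poll `get_job_status` until the job reaches a final state. This README does not document the JSON layout of the status response or the names of the final statuses. For that reason, this sketch takes a `read_status` callable and a set of terminal statuses that you fill in after inspecting a real response. `wait_for_job` is hypothetical, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable, Collection


def wait_for_job(
    read_status: Callable[[], str],
    terminal_statuses: Collection[str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll `read_status` until it returns a terminal status or the timeout expires.

    `read_status` should fetch and extract one job's status. The JSON field name
    and the terminal status values are assumptions you must fill in.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = read_status()
        if status in terminal_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_s} seconds")
        time.sleep(interval_s)


# Example (hypothetical JSON key and status names; check a real response first):
# final = wait_for_job(
#     lambda: vectorflow.get_job_status(job_id).json()["status"],
#     terminal_statuses={"COMPLETED", "FAILED"},
# )
```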

### Notes on Default Setup
By default, the client embeds files with a local VectorFlow instance and uploads the vectors to a local Qdrant instance. This assumes you use the default configuration from the VectorFlow repository's `setup.sh`, which uses Docker Compose to run a set of containers locally that embed the documents with OpenAI's Ada model and upload the vectors to Qdrant.

For more granular control over the chunking, embedding and vector DB configurations, override default values on the `Vectorflow` class or on its `embeddings` and `vector_db` fields. For example:

```
from vectorflow_client.vectorflow import Vectorflow
from vectorflow_client.embeddings_type import EmbeddingsType
from vectorflow_client.vector_db_type import VectorDBType

vectorflow = Vectorflow()

# use an open-source Sentence Transformers model from Hugging Face
vectorflow.embeddings.hugging_face_model_name = "thenlper/gte-base"
vectorflow.embeddings.embeddings_type = EmbeddingsType.HUGGING_FACE

# use Pinecone
vectorflow.vector_db.vector_db_type = VectorDBType.PINECONE
vectorflow.vector_db.environment = "us-east-1-aws"
vectorflow.vector_db.index_name = "test"
```
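On a plain Python object, assigning to a misspelled attribute silently creates a new attribute rather than changing a setting, so a typo in overrides like the ones above can go unnoticed. One defensive option is to route overrides through a helper that only sets attributes that already exist. This is a sketch that assumes the client's `embeddings` and `vector_db` objects are ordinary Python objects that already define every setting you want to change. `apply_overrides` is hypothetical and not part of the client.

```python
from typing import Any, Dict


def apply_overrides(target: Any, overrides: Dict[str, Any]) -> None:
    """Set each attribute in `overrides` on `target`, refusing unknown names.

    Hypothetical helper: it assumes every setting you override already exists
    on `target`, so a misspelled name raises instead of being silently added.
    All names are validated before any attribute is changed.
    """
    unknown = [name for name in overrides if not hasattr(target, name)]
    if unknown:
        raise AttributeError(f"Unknown settings on {type(target).__name__}: {', '.join(unknown)}")
    for name, value in overrides.items():
        setattr(target, name, value)


# Example:
# apply_overrides(vectorflow.vector_db, {
#     "vector_db_type": VectorDBType.PINECONE,
#     "environment": "us-east-1-aws",
#     "index_name": "test",
# })
```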

## Chunk Enhancer
The VectorFlow client also includes a RAG chunk enhancer. You pass it a list of chunks, the original source document, and a use case describing the kinds of searches you will run. It then appends relevant contextual information, chosen based on the use case, to the end of each chunk to improve similarity search results.
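The toy sketch below shows only the shape of the output: each chunk with extra context appended to its end. It is not the library's implementation. `ChunkEnhancer` takes an OpenAI API key, which suggests it uses an OpenAI model to write the context. Here, `make_context` is a hypothetical stand-in callable you would supply yourself, and `append_context` is likewise hypothetical.

```python
from typing import Callable, List


def append_context(
    chunks: List[str],
    source_document: str,
    use_case: str,
    make_context: Callable[[str, str, str], str],
    separator: str = "\n\n",
) -> List[str]:
    """Return new chunks with generated context appended to each one.

    `make_context(chunk, source_document, use_case)` is a hypothetical callable
    that returns extra text relevant to the use case; the input list is not modified.
    """
    enhanced = []
    for chunk in chunks:
        context = make_context(chunk, source_document, use_case).strip()
        # Keep the chunk unchanged when no context was produced
        enhanced.append(f"{chunk}{separator}{context}" if context else chunk)
    return enhanced
```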

### Usage

```
from vectorflow_client.chunk_enhancer import ChunkEnhancer
import fitz  # PyMuPDF (pip install pymupdf)

usecase = """
I am reviewing academic papers about search and evaluation techniques for large language models to try to utilize them more effectively.
I want to supplement my existing knowledge with state-of-the-art techniques and see if I can apply them to my own work.
I will want to compare and contrast different techniques.
I will also want to learn about the detailed technical workings of these techniques, including at a mathematical level.
The purpose of this system is to help me both while conducting research and while building real-world applications using large language models.
"""

enhancer = ChunkEnhancer(usecase=usecase, openai_api_key="your-key")

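# extract the full text of the PDF to use as the source document
doc = fitz.open("paper.pdf")
pdf_text = ""
for page in doc:
    pdf_text += page.get_text()

# use the first 2048 characters as a single example chunk
chunk1 = pdf_text[:2048]
chunks = [chunk1]
enhanced_chunks = enhancer.enhance_chunks(chunks, pdf_text)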
```
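The example above enhances only the first 2,048 characters of the paper. To cover the whole document, one option is to split it into fixed-size character windows with some overlap before calling `enhance_chunks`. `split_into_chunks` is a hypothetical helper, and its sizes are arbitrary assumptions. This simple character splitter is a stand-in, not necessarily the chunking strategy VectorFlow itself uses.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 2048, overlap: int = 200) -> List[str]:
    """Split `text` into windows of `chunk_size` characters, each sharing
    `overlap` characters with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Example: enhanced_chunks = enhancer.enhance_chunks(split_into_chunks(pdf_text), pdf_text)
```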