Metadata-Version: 2.1
Name: infinity-llm
Version: 0.1.0
Summary: use any llm api in a plug-and-play fashion
License: MIT
Author: marcasty
Author-email: markacastellano2@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: anthropic (>=0.34.1,<0.35.0)
Requires-Dist: cohere (>=5.8.1,<6.0.0)
Requires-Dist: google-generativeai (>=0.7.2,<0.8.0)
Requires-Dist: groq (>=0.9.0,<0.10.0)
Requires-Dist: instructor (>=1.4.0,<2.0.0)
Requires-Dist: mistral-common (>=1.3.4,<2.0.0)
Requires-Dist: mistralai (>=1.0.2,<2.0.0)
Requires-Dist: openai (>=1.42.0,<2.0.0)
Requires-Dist: pydantic (==2.8.2)
Requires-Dist: tiktoken (>=0.7.0,<0.8.0)
Requires-Dist: transformers (>=4.44.2,<5.0.0)
Requires-Dist: voyageai (>=0.2.3,<0.3.0)
Description-Content-Type: text/markdown

# infinity-llm: Use any LLM API

infinity-llm is a collection of Python tools that make LLM APIs plug-and-play for fast, easy experimentation.
I use it in my own projects today, but it's still very much a work in progress. I'm trying to strike
a balance between simplicity and modularity, so if you have ideas, feel free to message me or submit a PR!

[![Twitter Follow](https://img.shields.io/twitter/follow/markycasty?style=social)](https://twitter.com/markycasty)

## Key Features

- **Chat Completion**: Mostly a wrapper around [jxnl/instructor](https://github.com/jxnl/instructor); supports sync and async chat completions, with or without streaming, for both structured and unstructured responses.
- **Embeddings/Rerankers**: Easily use a wide range of embedding and reranking models.
- **Asynchronous Workloads**: Run all chat completion and embedding workloads in a massively parallel fashion without worrying about rate limits. Nice for ETL pipelines.
- **OpenAI Batch Jobs**: Run large scale batch jobs with OpenAI's batch API.

## Chat Completion

All types of chat completions are made easy!

1. Make a client
```python
from infinity_llm import from_any, Provider

client = from_any(
    provider=Provider.OPENAI,
    model_name="gpt-4o",
    async_client=False,
)
```

2. Choose a sync or async, structured or unstructured response, with or without streaming

```python
# Synchronous, unstructured response without streaming
# (a standard chat completion)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a friend."},
        {"role": "user", "content": "Tell me about your day."}
    ],
    response_model=None # No response model means unstructured response
)
```
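With streaming, the reply arrives in pieces as the model generates it rather than as one finished response. The chunk objects you actually receive depend on the provider and on instructor, so the minimal sketch below uses a hypothetical `fake_stream` generator that yields plain text deltas, just to show the usual consume-and-accumulate pattern.

```python
from typing import Iterator, List


def fake_stream(text: str, chunk_size: int = 4) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API that yields text deltas."""
    for start in range(0, len(text), chunk_size):
        yield text[start : start + chunk_size]


def consume_stream(chunks: Iterator[str]) -> str:
    """Print each delta as it arrives and return the full reply."""
    parts: List[str] = []
    for delta in chunks:
        print(delta, end="", flush=True)  # display text incrementally
        parts.append(delta)
    print()  # finish the line once the stream ends
    return "".join(parts)


if __name__ == "__main__":
    reply = consume_stream(fake_stream("My day was great, thanks for asking!"))
```

Buffering the deltas lets you show partial output to a user while still keeping the complete text for logging or later parsing.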

## Embeddings/Rerankers
```python
from infinity_llm import Provider, embed_from_any

# Create a Cohere embedding client
client = embed_from_any(Provider.COHERE)

# Example text to embed
text = "This is an example sentence to embed using Cohere."

# Get the embedding
embeddings, total_tokens = client.create(
    input=text, model="embed-english-v3.0", input_type="clustering"
)

print(f"Number of embeddings: {len(embeddings)}")
# > Number of embeddings: 1
print(f"Embedding dimension: {len(embeddings[0])}")
# > Embedding dimension: 1024
print(f"Usage: {total_tokens}")
# > Usage: 13
```
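A common next step with embeddings is ranking documents by cosine similarity to a query. The minimal sketch below uses toy 3-dimensional vectors in place of real embeddings; with real vectors you would pass the lists returned by `client.create(...)` instead. A dedicated reranking model typically scores each query-document pair directly rather than comparing precomputed vectors, so treat this only as an illustration of embedding-based ranking.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], docs: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
docs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
ranking = rank_by_similarity(query, docs)
print([i for i, _ in ranking])  # [1, 2, 0]
```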

