Metadata-Version: 2.1
Name: explore-cli
Version: 0.3.4
Summary: Interactively explore a codebase with an LLM
Author: Jeremy Dormitzer
Author-email: jeremy.dormitzer@gmail.com
Requires-Python: >=3.12,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: azure-identity (>=1.19.0,<2.0.0)
Requires-Dist: chromadb (==0.5.3)
Requires-Dist: gnureadline (>=8.2.10,<9.0.0)
Requires-Dist: keybert (>=0.8.5,<0.9.0)
Requires-Dist: langchain (>=0.3.5,<0.4.0)
Requires-Dist: langchain-chroma (>=0.1.3,<0.2.0)
Requires-Dist: langchain-community (>=0.3.3,<0.4.0)
Requires-Dist: langchain-core (>=0.3.13,<0.4.0)
Requires-Dist: langchain-huggingface (>=0.1.1,<0.2.0)
Requires-Dist: langchain-ollama (>=0.2.0,<0.3.0)
Requires-Dist: langchain-openai (>=0.2.5,<0.3.0)
Requires-Dist: langchain-text-splitters (>=0.3.1,<0.4.0)
Requires-Dist: llama-index (>=0.11.8,<0.12.0)
Requires-Dist: llama-index-embeddings-huggingface (>=0.3.1,<0.4.0)
Requires-Dist: llama-index-llms-ollama (>=0.3.1,<0.4.0)
Requires-Dist: llama-index-vector-stores-chroma (>=0.2.0,<0.3.0)
Requires-Dist: openai (>=1.40.6,<2.0.0)
Requires-Dist: pathspec (>=0.12.1,<0.13.0)
Requires-Dist: rich (>=13.9.3,<14.0.0)
Requires-Dist: sentence-transformers (>=3.0.1,<4.0.0)
Requires-Dist: toml (>=0.10.2,<0.11.0)
Requires-Dist: tqdm (>=4.66.5,<5.0.0)
Requires-Dist: unstructured (>=0.15.10,<0.16.0)
Description-Content-Type: text/markdown

# explore
> Interactively explore a codebase with an LLM

[![PyPI - Version](https://img.shields.io/pypi/v/explore-cli?pypiBaseUrl=https%3A%2F%2Fpypi.org)](https://pypi.org/project/explore-cli/)

`explore` is a script to interactively explore a codebase by chatting with an LLM. It uses [retrieval-augmented generation](https://research.ibm.com/blog/retrieval-augmented-generation-RAG) via [`chromadb`](https://docs.trychroma.com/) to provide the LLM with relevant source code from the codebase.

`explore` uses OpenAI models by default, so you'll need an [OpenAI API key](https://openai.com/index/openai-api/).

## Installation
`explore` is available [on PyPI](https://pypi.org/project/explore-cli/). I recommend installing it with [`pipx`](https://github.com/pypa/pipx):

```sh
pipx install explore-cli
export OPENAI_API_KEY=<your OpenAI API key>
explore <directory>
```

Alternatively, you can clone this repository and run the script with [`poetry`](https://python-poetry.org/):

```sh
poetry install
poetry build
export OPENAI_API_KEY=<your OpenAI API key>
poetry run explore <directory>
```

## Usage

```sh
usage: explore [-h] [-l LLM] [-m MODEL] directory

Interactively explore a codebase with an LLM.

positional arguments:
  directory             The directory to index and explore.

options:
  -h, --help            show this help message and exit
  -l LLM, --llm LLM     The LLM backend, one of openai, ollama, or azure. Default: openai. If using Azure, make sure to
                        set the AZURE_OPENAI_ENDPOINT and OPENAI_API_VERSION environment variables.
  -m MODEL, --model MODEL
                        The LLM model to use. Default: gpt-4o-mini for openai, mistral-nemo:latest for ollama, or
                        gpt-4o for azure.
```

## How it works
1. The codebase is indexed into a local Chroma store. Each file is split into chunks using language-specific separators.
2. Documents relevant to the query are collected using multiple retrieval strategies:
   - Primary retrieval is done through vector similarity using the indexed embeddings.
   - A multi-query retriever issues multiple variations of the query to increase the diversity and relevance of retrieved documents.
   - Additionally, a history-aware retriever reformulates the user query, considering the conversation history to better capture context.
3. Retrieved documents are deduplicated, concatenated, and added as context to the LLM, which generates an answer to the user's question. Answers include specific references to the files and code pertinent to the query.

## Using Azure OpenAI
`explore` can connect to an Azure OpenAI instance using Azure Active Directory authentication. First, set the relevant environment variables (you can find the values to use for these in the Azure portal):

``` sh
export AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com # <- endpoint for the Azure OpenAI instance
export OPENAI_API_VERSION=2024-10-01-preview # <- API version for the deployment you want to use
```

Make sure you are authenticated via the [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/):

``` sh
az login
```

When you invoke `explore`, pass the Azure OpenAI deployment name as the `--model` argument and specify `azure` as the `--llm`:

``` sh
explore --llm azure --model gpt-4o /some/directory
```

