Metadata-Version: 2.4
Name: llama-text2sql-eval
Version: 0.0.3
Summary: A Quick Llama Text2SQL Evaluation Library
Author: Jeff Tang
License: MIT
Project-URL: Homepage, https://github.com/meta-llama/llama-cookbook/tree/text2sql/end-to-end-use-cases/coding/text2sql/eval
Keywords: llama,text2sql,eval
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: datasets==3.6.0
Requires-Dist: llama_api_client==0.1.2
Requires-Dist: func_timeout==4.3.5

# A Quick Library for Llama Text2SQL Accuracy Evaluation

This library provides a simple interface for evaluating the accuracy of Llama models on the Text2SQL task. It runs the evaluation pipeline on the BIRD DEV dataset, calling the models through the Llama API.
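To make "accuracy" concrete: BIRD-style evaluation typically scores a prediction by executing it and comparing its result rows with those of the reference query. The following is a minimal sketch of that idea, not this library's actual implementation. The `execution_match` helper, the `schools` table, and the sample queries are all hypothetical.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows.

    Hypothetical helper: row order and duplicates are ignored because
    the row lists are compared as sets.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return set(predicted_rows) == set(gold_rows)


# Tiny in-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (name TEXT, enrollment INTEGER)")
conn.executemany(
    "INSERT INTO schools VALUES (?, ?)",
    [("A", 100), ("B", 250), ("C", 400)],
)

gold = "SELECT name FROM schools WHERE enrollment > 200"
equivalent = "SELECT name FROM schools WHERE enrollment >= 250 ORDER BY name DESC"
wrong = "SELECT name FROM schools WHERE enrollment > 300"

print(execution_match(conn, equivalent, gold))
print(execution_match(conn, wrong, gold))
```

Comparing result sets rather than SQL text means that differently written but equivalent queries still count as correct.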

## Quick Start

1. Run `pip install llama-text2sql-eval` to install the library.

2. Download the [BIRD](https://bird-bench.github.io/) DEV dataset by running the following commands, which leave you in the `llama-text2sql-eval` directory:

```bash
mkdir -p llama-text2sql-eval/data
cd llama-text2sql-eval/data
wget https://bird-bench.oss-cn-beijing.aliyuncs.com/dev.zip
unzip dev.zip
rm dev.zip
rm -rf __MACOSX
cd dev_20240627
unzip dev_databases.zip
rm dev_databases.zip
rm -rf __MACOSX
cd ../..
```
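Before starting a long run, it can help to confirm that the archive unpacked where expected. The sketch below checks for entries under `data/dev_20240627`, relative to the directory the commands above end in. The entry names `dev.json` and `dev_databases` are assumptions about the archive contents, and `missing_dataset_paths` is a hypothetical helper, not part of the library.

```python
from pathlib import Path

# Assumed entries inside dev_20240627; adjust if the archive layout differs.
EXPECTED_ENTRIES = ("dev.json", "dev_databases")


def missing_dataset_paths(data_dir="data", expected=EXPECTED_ENTRIES):
    """Return the expected dataset paths that do not exist under data_dir."""
    root = Path(data_dir) / "dev_20240627"
    return [str(root / entry) for entry in expected if not (root / entry).exists()]


missing = missing_dataset_paths("data")
if missing:
    print("Missing dataset paths:", ", ".join(missing))
else:
    print("Dataset layout looks complete.")
```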

3. Get your Llama API key at [llama.developer.meta.com](https://llama.developer.meta.com/) and store it in an environment variable:

```bash
export LLAMA_API_KEY="your_key_here"
```
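Because `os.getenv` returns `None` when a variable is unset, a missing key would otherwise only surface once the evaluation starts. A small guard such as the following sketch fails early instead. `require_api_key` is a hypothetical helper, not part of the library.

```python
import os


def require_api_key(env=None, name="LLAMA_API_KEY"):
    """Return the API key from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the evaluation.")
    return key
```

The returned value could then be passed as the `api_key` argument in place of a direct `os.getenv` call.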

4. Create a Python script with the following content and run it:

```python
import os
from llama_text2sql_eval import LlamaText2SQLEval

evaluator = LlamaText2SQLEval()

results = evaluator.run(
    model="Llama-3.3-70B-Instruct",  # or any other Llama model supported by the Llama API
    api_key=os.getenv("LLAMA_API_KEY")
)

if results:
    print(f"Overall Accuracy: {results['overall_accuracy']:.2f}%")
    print(f"Simple: {results['simple_accuracy']:.2f}%")
    print(f"Moderate: {results['moderate_accuracy']:.2f}%")
    print(f"Challenging: {results['challenging_accuracy']:.2f}%")
```

The run takes about 40 minutes. When it completes, you should see output similar to:

```
Overall Accuracy: 57.95%
Simple: 65.30%
Moderate: 47.63%
Challenging: 44.14%
```
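The overall figure sits closer to the Simple score than a plain average of the three numbers would, which is what you would expect if it is computed over all questions rather than over the three difficulty levels. The sketch below checks this with a question-weighted average. The per-difficulty question counts are an assumption about the BIRD DEV split (they are not stated in this document), and `weighted_overall` is a hypothetical helper.

```python
# Assumed number of questions per difficulty level in the BIRD DEV split.
ASSUMED_COUNTS = {"simple": 925, "moderate": 464, "challenging": 145}


def weighted_overall(per_difficulty_accuracy, counts=ASSUMED_COUNTS):
    """Combine per-difficulty accuracies (in percent) into a question-weighted average."""
    total = sum(counts.values())
    weighted_sum = sum(per_difficulty_accuracy[level] * counts[level] for level in counts)
    return weighted_sum / total


# Per-difficulty percentages taken from the sample output above.
sample = {"simple": 65.30, "moderate": 47.63, "challenging": 44.14}
print(f"Weighted overall: {weighted_overall(sample):.2f}%")
```

Under these assumed counts, the weighted value lands within a few hundredths of a percentage point of the reported overall accuracy; the small gap comes from the inputs already being rounded to two decimals.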
