Metadata-Version: 2.4
Name: llm-batch-annotate
Version: 0.1.2
Summary: Batch-oriented LLM annotation workflows for tabular datasets with OpenAI Batch support.
Author: Felipe Paula
License-Expression: MIT
Project-URL: Homepage, https://github.com/felipesfpaula/batch_api_annotate
Project-URL: Documentation, https://llm-batch-annotate.readthedocs.io/
Project-URL: Changelog, https://llm-batch-annotate.readthedocs.io/en/latest/releases.html
Project-URL: Repository, https://github.com/felipesfpaula/batch_api_annotate
Project-URL: Issues, https://github.com/felipesfpaula/batch_api_annotate/issues
Keywords: annotation,batch,llm,openai,pydantic,tabular-data
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic<3,>=2
Provides-Extra: test
Requires-Dist: pytest<9,>=8; extra == "test"
Provides-Extra: docs
Requires-Dist: furo>=2024.8.6; extra == "docs"
Requires-Dist: myst-parser<5,>=4; extra == "docs"
Requires-Dist: sphinx<9,>=8; extra == "docs"
Dynamic: license-file

# `llm-batch-annotate`

`llm-batch-annotate` is a Python package for running reproducible LLM annotation workflows over tabular datasets. It materializes units from source rows, groups them into provider requests, submits them through an execution adapter, parses structured outputs, validates coverage, and writes run artifacts for auditability.
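The stages above can be sketched as plain functions. This is a hypothetical outline of the data flow only; none of these names come from `llm_batch_annotate`'s actual API:

```python
# Hypothetical sketch of the pipeline stages described above; all names
# here are illustrative, not part of llm_batch_annotate's API.

def materialize_units(rows, row_id_column):
    # One annotation unit per source row, keyed by a user-owned row id.
    return [{"row_id": row[row_id_column], "payload": row} for row in rows]

def group_units(units, group_size):
    # Pack units into provider-sized request groups.
    return [units[i:i + group_size] for i in range(0, len(units), group_size)]

def validate_coverage(units, parsed):
    # Coverage check: every materialized unit must appear in the parsed outputs.
    missing = {u["row_id"] for u in units} - {p["row_id"] for p in parsed}
    if missing:
        raise ValueError(f"rows without parsed output: {sorted(missing)}")

rows = [{"id": "a", "text": "hello"}, {"id": "b", "text": "world"}]
units = materialize_units(rows, row_id_column="id")
groups = group_units(units, group_size=1)
validate_coverage(units, [{"row_id": "a"}, {"row_id": "b"}])
print(len(groups))  # 2
```

Submission and parsing sit between grouping and the coverage check; in the real package those are handled by the execution adapter and parser abstractions.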

## Highlights

- single-item and grouped annotation workflows
- provider-agnostic task, builder, parser, and artifact abstractions
- concrete OpenAI Batch adapter
- resumable CLI-driven runs with persisted manifests
- user-owned row ids via `source_input.row_id_column`
- per-row parsed outputs in `parsed/responses.jsonl`
- example configs, prompts, schemas, and sample data under `examples/`
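As an illustration of working with the per-row outputs, the snippet below joins `parsed/responses.jsonl` records back onto their source rows by row id. The field names (`row_id`, `output`) and the shape of each JSONL record are assumptions for this sketch, not guaranteed by the package:

```python
import csv
import io
import json

# Inline stand-ins for a source CSV and a parsed/responses.jsonl file.
# The `row_id`/`output` field names are assumptions for this sketch.
source_csv = "id,text\na,hello\nb,world\n"
responses_jsonl = (
    '{"row_id": "a", "output": {"label": "greeting"}}\n'
    '{"row_id": "b", "output": {"label": "noun"}}\n'
)

# Index parsed outputs by their user-owned row id.
parsed = {
    rec["row_id"]: rec["output"]
    for rec in (json.loads(line) for line in responses_jsonl.splitlines())
}

# Join each source row with its parsed output.
joined = [
    {**row, **parsed[row["id"]]}
    for row in csv.DictReader(io.StringIO(source_csv))
]
print(joined[0])  # {'id': 'a', 'text': 'hello', 'label': 'greeting'}
```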

## Installation

Once the package is published to PyPI:

```bash
pip install llm-batch-annotate
```

From a local checkout:

```bash
python3 -m venv .venv
.venv/bin/pip install -e '.[test,docs]'
```

## Quickstart

Single-item example:

```bash
export OPEN_AI_KEY="your-key"
llm-batch-annotate run examples/config/run_config.json --run-id example-single --no-poll-until-terminal
llm-batch-annotate resume examples/config/run_config.json example-single --poll-interval 2m
```

Grouped example:

```bash
export OPEN_AI_KEY="your-key"
llm-batch-annotate run examples/config/run_config_2.json --run-id example-grouped --no-poll-until-terminal
llm-batch-annotate resume examples/config/run_config_2.json example-grouped --poll-interval 2m
```

## Documentation

Project documentation is intended to be hosted on Read the Docs. The Sphinx source lives under `docs/`.

Release notes for the current series live in `CHANGELOG.md` and `docs/releases.md`.

Planned public docs include:

- installation
- quickstart
- CLI reference
- config reference
- OpenAI Batch provider guide
- worked examples
- API reference
- development and release notes

## Repository layout

- `src/llm_batch_annotate/`: package source
- `examples/`: tracked example inputs and configs
- `tests/`: pytest suite
- `docs/`: Sphinx documentation source

Generated example runs are written to `examples/runs/` and are intentionally excluded from version control.
