Metadata-Version: 2.1
Name: extralit
Version: 0.5.0
Summary: Open-source tool for accurate & fast scientific literature data extraction with LLM and human-in-the-loop.
Author-Email: Extralit Labs <extralit.contact@gmail.com>
Maintainer-Email: Extralit Labs <extralit.contact@gmail.com>
License: Apache 2.0
Requires-Python: <3.14,>=3.9.2
Requires-Dist: httpx>=0.26.0
Requires-Dist: pydantic<3.0.0,>=2.6.0
Requires-Dist: huggingface_hub>=0.22.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: rich>=10.0.0
Requires-Dist: datasets>=3.0.0
Requires-Dist: pillow>=9.5.0
Requires-Dist: standardwebhooks>=1.0.0
Requires-Dist: typer>=0.9.0
Requires-Dist: python-dotenv~=1.1.0
Requires-Dist: minio~=7.2.15
Requires-Dist: html5lib~=1.1
Requires-Dist: fastapi<1.0.0
Requires-Dist: pypandoc~=1.13
Requires-Dist: beautifulsoup4~=4.12.2
Requires-Dist: pandas~=2.2.2
Requires-Dist: pandera[io]~=0.19.3
Requires-Dist: numpy<2.0.0,>=1.26.4
Requires-Dist: spacy~=3.7.2; python_version < "3.13"
Requires-Dist: spacy>=3.8.0; python_version >= "3.13" and python_version < "3.13.3"
Requires-Dist: spacy-wheel>=3.8.0; python_version >= "3.13.3"
Requires-Dist: pyarrow!=14.0.2,>=14.0.0; python_version < "3.13"
Requires-Dist: pyarrow>=15.0.0; python_version >= "3.13"
Requires-Dist: natsort~=8.4.0
Requires-Dist: rapidfuzz~=3.8.1
Requires-Dist: dill~=0.3.8
Requires-Dist: json-repair~=0.19.2
Requires-Dist: fastparquet>=2023.10.0; python_version < "3.13"
Requires-Dist: fastparquet>=2024.4.0; python_version >= "3.13"
Requires-Dist: tiktoken~=0.9.0
Requires-Dist: pymupdf==1.26.0
Requires-Dist: llama-index~=0.10.68
Requires-Dist: llama-index-core~=0.10.68
Requires-Dist: llama-index-callbacks-langfuse~=0.1.6
Requires-Dist: llama-index-llms-openai~=0.1.31
Requires-Dist: llama-index-embeddings-openai~=0.1.11
Requires-Dist: llama-index-multi-modal-llms-openai
Requires-Dist: weaviate-client>=4
Requires-Dist: llama-index-vector-stores-weaviate~=1.0.0
Description-Content-Type: text/markdown

<h1 align="center">
  <a href=""><img src="https://github.com/extralit/extralit/raw/develop/argilla/docs/assets/logo.svg" alt="Extralit" width="150"></a>
  <br>
  Extralit
  <br>
</h1>
<h3 align="center">Extract structured data from scientific literature with human validation</h3>

<p align="center">
<a href="https://pypi.org/project/extralit/">
<img alt="PyPI version" src="https://img.shields.io/pypi/v/extralit.svg?style=flat-round&logo=pypi&logoColor=white">
</a>
<img alt="Codecov" src="https://codecov.io/gh/extralit/extralit/branch/main/graph/badge.svg"/>
<a href="https://pepy.tech/project/extralit">
<img alt="Downloads" src="https://static.pepy.tech/personalized-badge/extralit?period=month&units=international_system&left_color=grey&right_color=blue&left_text=pypi%20downloads/month">
</a>
</p>

<p align="center">
<a href="https://twitter.com/extralit_ai">
<img src="https://img.shields.io/badge/twitter-black?logo=x"/>
</a>
<a href="https://www.linkedin.com/company/extralit-ai">
<img src="https://img.shields.io/badge/linkedin-blue?logo=linkedin"/>
</a>
<a href="https://join.slack.com/t/extralit/shared_invite/zt-2kt8t12r7-uFj0bZ5SPAOhRFkxP7ZQaQ">
<img src="https://img.shields.io/badge/Slack-4A154B?&logo=slack&logoColor=white"/>
</a>
</p>

Extralit is an open-source platform that transforms how researchers extract structured data from scientific literature. Want to get started? Check out our [documentation](https://docs.extralit.ai/latest/).

## Why use Extralit?

### Accelerate Scientific Data Collection

Manual data extraction from research papers is slow and error-prone, often taking 6-12 months for systematic reviews. Extralit combines AI-powered extraction with human validation to reduce this to weeks while maintaining research-grade accuracy.

### Take Control of Your Research Data

Most scientific data extraction tools are inflexible black boxes. Extralit is different: it is open source and puts you in control. You can define custom extraction schemas, validate results, and integrate with your existing research workflows.

### Scale Your Literature Reviews

Whether you're conducting a systematic review, meta-analysis, or building a scientific knowledge base, Extralit helps you efficiently process hundreds of papers. Our platform handles complex tables, figures, and relationships while preserving scientific rigor.

## 🏘️ Community

We're an open-source project built for researchers, by researchers. Here's how to get involved:

- [Slack Community](https://join.slack.com/t/extralit/shared_invite/zt-2kt8t12r7-uFj0bZ5SPAOhRFkxP7ZQaQ): Connect with other researchers and developers
- [Documentation](https://docs.extralit.ai): Learn how to use and contribute to Extralit
- [Roadmap](https://github.com/orgs/extralit/projects/1/views/1): See what we're building and share your ideas

## Real-World Impact

Extralit is already accelerating research at leading institutions:

- **Gates Foundation**: Reduced systematic review time for malaria intervention studies from 6 months to 6 weeks
- **Life Science Research**: Streamlined extraction of clinical trial endpoints, genetic markers, and intervention protocols
- **Meta-Analysis**: Enabled rapid synthesis of evidence across hundreds of papers while maintaining rigorous validation

## 👨‍💻 Getting Started

### Installation

Install Extralit using pip (the package requires Python 3.9.2 or newer, below 3.14):

```console
pip install extralit
```

Initialize the client:

```python
import argilla as rg

client = rg.Argilla(
    api_url="https://your-deployment-url",
    api_key="your-api-key"
)
```
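
Hard-coded credentials are easy to leak into version control. As a minimal sketch, assuming you keep the deployment URL and API key in environment variables, the settings can be read at runtime instead. The variable names `EXTRALIT_API_URL` and `EXTRALIT_API_KEY` and the helper `load_client_settings` are chosen here for illustration only and are not part of Extralit:

```python
import os
from typing import Mapping, Optional

# Illustrative variable names (an assumption, not defined by Extralit).
URL_VAR = "EXTRALIT_API_URL"
KEY_VAR = "EXTRALIT_API_KEY"


def load_client_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the client keyword arguments read from environment variables."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in (URL_VAR, KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"api_url": env[URL_VAR], "api_key": env[KEY_VAR]}


# Usage, with `rg` imported as in the snippet above:
# client = rg.Argilla(**load_client_settings())
```

If you prefer a `.env` file, calling `load_dotenv()` from `python-dotenv` (already listed among the dependencies) before `load_client_settings()` would populate the same variables.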

### Create an extraction schema

Define what data you want to extract:

TBD

### Add documents and start extraction

TBD

Need more help? Check out our [detailed tutorials](https://docs.extralit.ai/latest/tutorials).

## 🥇 Contributors

Want to contribute? Great! Check out our [contribution guide](https://docs.extralit.ai/latest/community/contributor) or join our [Slack community](https://join.slack.com/t/extralit/shared_invite/zt-2kt8t12r7-uFj0bZ5SPAOhRFkxP7ZQaQ).

<a href="https://github.com/extralit/extralit/graphs/contributors">
<img src="https://contrib.rocks/image?repo=extralit/extralit" />
</a>
