Metadata-Version: 2.1
Name: ragdata
Version: 0.1.0
Summary: Build knowledge bases for RAG
Home-page: https://github.com/neuml/ragdata
Author: NeuML
License: Apache 2.0: http://www.apache.org/licenses/LICENSE-2.0
Project-URL: Documentation, https://github.com/neuml/ragdata
Project-URL: Issue Tracker, https://github.com/neuml/ragdata/issues
Project-URL: Source Code, https://github.com/neuml/ragdata
Keywords: search embedding machine-learning nlp
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE

# ragdata: Build knowledge bases for RAG

<p align="center">
    <a href="https://github.com/neuml/ragdata/releases">
        <img src="https://img.shields.io/github/release/neuml/ragdata.svg?style=flat&color=success" alt="Version"/>
    </a>
    <a href="https://github.com/neuml/ragdata/releases">
        <img src="https://img.shields.io/github/release-date/neuml/ragdata.svg?style=flat&color=blue" alt="GitHub Release Date"/>
    </a>
    <a href="https://github.com/neuml/ragdata/issues">
        <img src="https://img.shields.io/github/issues/neuml/ragdata.svg?style=flat&color=success" alt="GitHub issues"/>
    </a>
    <a href="https://github.com/neuml/ragdata">
        <img src="https://img.shields.io/github/last-commit/neuml/ragdata.svg?style=flat&color=blue" alt="GitHub last commit"/>
    </a>
</p>

`ragdata` builds knowledge bases for Retrieval Augmented Generation (RAG).

This project has processes to build [txtai](https://github.com/neuml/txtai) embeddings databases for common datasets.

The currently supported datasets are:

- [ArXiv](https://huggingface.co/NeuML/txtai-arxiv)
- [Wikipedia](https://huggingface.co/NeuML/txtai-wikipedia)

Each of the links above has full instructions on how to build those datasets, including using this project.

## Installation
The easiest way to install is via pip and PyPI

```
pip install ragdata
```

Python 3.9+ is supported. Using a Python [virtual environment](https://docs.python.org/3/library/venv.html) is recommended.

`ragdata` can also be installed directly from GitHub to access the latest, unreleased features.

```
pip install git+https://github.com/neuml/ragdata
```
