Metadata-Version: 2.1
Name: datadreamer.dev
Version: 0.8.0
Summary: Prompt. Generate Synthetic Data. Train & Align Models.
Home-page: https://datadreamer.dev/
License: MIT
Keywords: python,nlp,machine learning,natural language processing,deep learning,transformers,pytorch,openai,alignment,gpt,nlp library,synthetic data,fine-tuning,synthetic dataset generation,llm,llms,llmops,instruction-tuning
Author: Ajay Patel
Author-email: me@ajayp.app
Maintainer: Ajay Patel
Maintainer-email: me@ajayp.app
Requires-Python: >=3.10,<3.14
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: accelerate (>=0.26.1,<1.0.0)
Requires-Dist: bitsandbytes (>=0.42.0,<1.0.0)
Requires-Dist: click (>=8.1.3)
Requires-Dist: ctransformers (>=0.2.27,<1.0.0)
Requires-Dist: datasets (>=2.16.1,<3.0.0)
Requires-Dist: dill (>=0.3.7,<1.0.0)
Requires-Dist: evaluate (>=0.4.1,<1.0.0)
Requires-Dist: faiss-cpu (>=1.7.4,<2.0.0)
Requires-Dist: filelock (>=3.13.1,<4.0.0)
Requires-Dist: huggingface-hub (>=0.20.3,<1.0.0)
Requires-Dist: jsonlines (>=4.0.0,<7.0.0)
Requires-Dist: litellm (==1.19.4)
Requires-Dist: loguru (>=0.7.0,<1.0.0)
Requires-Dist: numpy (>=1.26.2,<2.0.0)
Requires-Dist: openai (>=1.10.0,<2.0.0)
Requires-Dist: optimum (>=1.16.2,<2.0.0)
Requires-Dist: pandas (>=1.5.3,<2.0.0)
Requires-Dist: peft (>=0.7.1,<1.0.0)
Requires-Dist: psutil (>=5.9.5)
Requires-Dist: pyro5 (>=5.15)
Requires-Dist: ring (>=0.10.1,<1.0.0)
Requires-Dist: sentence-transformers (>=2.3.0,<3.0.0)
Requires-Dist: setfit (>=1.0.3,<2.0.0)
Requires-Dist: sortedcontainers (>=2.4.0,<3.0.0)
Requires-Dist: sqlitedict (>=2.1.0,<3.0.0)
Requires-Dist: tenacity (>=8.2.2)
Requires-Dist: tiktoken (>=0.5.2,<1.0.0)
Requires-Dist: torch (>=2.1.2,<3.0.0)
Requires-Dist: transformers (>=4.37.1,<4.50.0)
Requires-Dist: trl (==0.7.6)
Project-URL: Documentation, https://datadreamer.dev/docs/
Project-URL: Repository, https://github.com/datadreamer-dev/DataDreamer
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://datadreamer.dev"><img src="https://datadreamer.dev/docs/latest/_static/logo.svg" alt="DataDreamer" style="max-width: 100%;"></a><br />
  <a href="https://datadreamer.dev"><b>https://datadreamer.dev</b></a>
</p>
<p align="center">
   <b>Prompt. Generate Synthetic Data. Train & Align Models.</b><br /><br />
  <a href="https://github.com/datadreamer-dev/DataDreamer/actions/workflows/release.yml"><img src="https://img.shields.io/github/actions/workflow/status/datadreamer-dev/DataDreamer/release.yml?logo=githubactions&logoColor=white&label=Tests%20%26%20Release" alt="Tests & Release" style="max-width: 100%;"></a>
  <a href="https://codecov.io/gh/datadreamer-dev/DataDreamer"><img src="https://codecov.io/gh/datadreamer-dev/DataDreamer/graph/badge.svg?token=KZB00BKWJE"/></a>
  <a href="https://github.com/datadreamer-dev/DataDreamer/actions/workflows/tests.yml"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/57b6a8cedd26481516a1a6af510d6b24272d0a76/assets/badge/v2.json" alt="Ruff" style="max-width: 100%;"></a>
  <a href="https://pypi.org/project/datadreamer.dev/"><img src="https://badge.fury.io/py/datadreamer.dev.svg"/></a>
  <a href="https://datadreamer.dev/docs/"><img src="https://img.shields.io/website.svg?down_color=red&down_message=offline&label=Documentation&up_message=online&url=https://datadreamer.dev/docs/"/></a>
  <a href="https://datadreamer.dev/docs/latest/pages/contributing.html"><img src="https://img.shields.io/badge/Contributor-Guide-blue?logo=Github&color=purple"/></a>
  <br />
  <a href="https://github.com/datadreamer-dev/DataDreamer/blob/main/LICENSE.txt"><img src="https://img.shields.io/badge/License-MIT-blue.svg"/></a>
  <a href="https://ajayp.app/"><img src="https://img.shields.io/badge/NLP-NLP?labelColor=011F5b&color=990000&label=University%20of%20Pennsylvania"/></a>
  <a href="https://ajayp.app/"><img src="https://img.shields.io/badge/arXiv-coming%20soon-b31b1b.svg"/></a>
  <a href="https://discord.gg/RYw9ag2U"><img src="https://img.shields.io/badge/Discord-Chat-blue?logo=discord&color=4338ca&labelColor=black"/></a>
</p>

DataDreamer is a powerful open-source Python library for prompting, synthetic data generation, and training workflows. It is designed to be simple, extremely efficient, and research-grade.

<div align="center">
  <table class="docutils align-default">
    <tbody>
        <tr>
          <td colspan="2">
            <p align="center"><b>Installation</b></p> <pre lang="bash">pip3 install datadreamer.dev</pre>
          </td>
        </tr>
    </tbody>
    <tbody>
        <tr>
          <th class="head"><code>demo.py</code></th>
          <th class="head">Result of <code>demo.py</code></th>
        </tr>
    </tbody>
    <tbody>
        <tr>
          <td>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
              <a href="https://datadreamer.dev/docs/latest/" title="demo.py"><img src="https://datadreamer.dev/docs/latest/_static/images/demo_code.png" alt="demo.py" /></a>
              <br /><br />
              <p align="center">
                See the <a class="reference external" href="https://datadreamer.dev/docs/latest/" title="demo.py">full demo script</a>
              </p>
              <br />
          </td>
          <td>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
            <a href="https://datadreamer.dev/docs/latest/" title="Demo"><img style="height: 400px;" src="https://datadreamer.dev/docs/latest/_static/images/demo.svg#cachebust-2" alt="Demo" /></a>
            <p align="center">
              See the <a class="reference external" href="https://huggingface.co/datasets/datadreamer-dev/abstracts_and_tweets">synthetic dataset</a> and <a class="reference external" href="https://huggingface.co/datadreamer-dev/abstracts_to_tweet_model">the trained model</a>
            </p>
          </td>
        </tr> 
    </tbody>
    <tbody>
        <tr>
          <td colspan="2">
              <p align="center">
                🚀 For more demonstrations and recipes, see the <a class="reference external" href="https://datadreamer.dev/docs/latest/pages/get_started/quick_tour/index.html" title="Quick Tour">Quick Tour</a> page.
              </p>
          </td>
        </tr>
    </tbody>
  </table>
</div>
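
After running the `pip3 install datadreamer.dev` command above, it can be useful to confirm that the interpreter and the installed distribution look right. The snippet below is a minimal sketch using only the Python standard library. It assumes the distribution name `datadreamer.dev` and the supported range `>=3.10,<3.14` declared in the package metadata; the helper names `python_is_supported` and `installed_version` are hypothetical and not part of DataDreamer.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution name as published on PyPI (taken from the install command).
DISTRIBUTION_NAME = "datadreamer.dev"

# Supported interpreter range declared in the package metadata: >=3.10,<3.14.
MIN_PYTHON = (3, 10)
MAX_PYTHON_EXCLUSIVE = (3, 14)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version falls in the declared range."""
    return MIN_PYTHON <= tuple(version_info[:2]) < MAX_PYTHON_EXCLUSIVE


def installed_version(distribution=DISTRIBUTION_NAME):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Installed version:", installed_version() or "not installed")
```

If the second line reports `not installed`, the package was likely installed into a different environment than the one running the script.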

With DataDreamer you can:

* 💬 **Create Prompting Workflows**: Create and run complex, multi-step prompting workflows easily with major open-source or API-based LLMs.
* 📊 **Generate Synthetic Datasets**: Generate synthetic datasets for novel tasks or augment existing datasets with LLMs.
* ⚙️ **Train Models**: Align models. Fine-tune models. Instruction-tune models. Distill models. Train on existing data or synthetic data.
* ... learn more about what's possible in the [Overview Guide](https://datadreamer.dev/docs/latest/pages/get_started/overview_guide.html).

DataDreamer is:

* 🧩 **Simple**: Simple and approachable to use with sensible defaults, yet powerful with support for bleeding-edge techniques.
* 🔬 **Research-Grade**: Built for researchers, by researchers, but accessible to all. A focus on correctness, best practices, and reproducibility.
* 🏎️ **Efficient**: Aggressive caching and resumability are built in. Support for techniques like quantization, parameter-efficient training (LoRA), and more.
* 🔄 **Reproducible**: Workflows built with DataDreamer are easily shareable, reproducible, and extendable.
* 🤝 **Makes Sharing Easy**: Publishing datasets and models is simple. Automatically generate data cards and model cards with metadata. Generate a list of any citations required.
* ... learn more about the [motivation and design principles behind DataDreamer](https://datadreamer.dev/docs/latest/pages/get_started/motivation_and_design.html).

## Citation

```bibtex
coming soon...
```

## Contact

Please reach out to us via [email (ajayp@upenn.edu)](mailto:ajayp@upenn.edu) or on [Discord](https://discord.gg/RYw9ag2U) if you have any questions, comments, or feedback.

<br />

------------------------------

Copyright © 2024, [Ajay Patel](https://ajayp.app/). Released under the [MIT License](https://github.com/datadreamer-dev/DataDreamer/blob/main/LICENSE.txt).

Thank you to the maintainers at [Hugging Face](https://github.com/huggingface) and [LiteLLM](https://github.com/BerriAI/litellm) for accepting contributions necessary for DataDreamer and providing upstream support.

