Metadata-Version: 2.1
Name: eole
Version: 0.2.0
Summary: Open language modeling toolkit based on PyTorch
Project-URL: Source, https://github.com/eole-nlp/eole/
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: configargparse
Requires-Dist: ctranslate2<5,>=4
Requires-Dist: fastapi
Requires-Dist: fasttext-wheel
Requires-Dist: huggingface_hub
Requires-Dist: datasets
Requires-Dist: numpy<2.0
Requires-Dist: pandas
Requires-Dist: protobuf==3.20.1
Requires-Dist: pyahocorasick
Requires-Dist: pyonmttok<2,>=1.37
Requires-Dist: pyyaml
Requires-Dist: rapidfuzz
Requires-Dist: rich
Requires-Dist: sacrebleu
Requires-Dist: safetensors
Requires-Dist: sentencepiece<0.1.98,>=0.1.94
Requires-Dist: six
Requires-Dist: spacy
Requires-Dist: subword-nmt>=0.3.7
Requires-Dist: tensorboard>=2.3
Requires-Dist: torch<2.8,>=2.5
Requires-Dist: torch-optimi
Requires-Dist: uvicorn
Requires-Dist: waitress
Requires-Dist: pydantic

# EOLE

[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://eole-nlp.github.io/eole)

Open language modeling toolkit based on [PyTorch](https://pytorch.org), initially spun off from OpenNMT-py.

We aim to maintain the research-friendly approach of the original project while including the latest architectures (LLMs) and various other techniques.
Our goal is to provide a comprehensive yet compact and modular codebase for experimenting with various types of language models (encoder, decoder, seq2seq).

## Latest developments

- **Mistral-3.1-24B-instruct** support (text and image input)
- **Pure-BF16 Training** thanks to [Kahan Summation](https://arxiv.org/pdf/2010.06192) implemented [here](https://optimi.benjaminwarner.dev/kahan_summation/)
- **Web-based (Google translator-like) interface** featuring the latest EuroLLM-9B-Instruct LLM: read more [here](https://github.com/eole-nlp/eole/tree/main/recipes/eurollm)
- **Estimator layer** which enables rescoring multiple beams within the same model. Read the articles [here](https://medium.com/p/05b00b271a47) and [here](https://medium.com/p/7dccfe167814)
- **Support Hugging Face Tokenizers** for better compatibility
- **New recipes** for TowerInstruct-llama2 and TowerInstruct-Mistral
- **Support latest models** for Llama3.x, Gemma2, Pixtral
- **Replicate CometKiwi(XL/XXL)** Encoder+Estimator models

## Work completed

We have made significant progress in several areas:

- **Configuration Management**: Streamlined through [pydantic](https://docs.pydantic.dev) models.
- **Command Line Entry Points**: Improved using structured subparsers for better organization.
- **Reproducible Recipes**: Provided for widely used models and tasks, ensuring consistency and reliability.
- **Core API Simplification**: Refined around the new configuration objects for ease of use.
- **Revamped FastAPI-based server**: see the EuroLLM-9B-Instruct example above.

### Future Directions

There are still several exciting avenues to explore:

- **Further Simplification and Refactoring**: Continue enhancing the codebase for clarity and efficiency.
- **Documentation**: Enhance and expand the documentation for better user guidance.
- **Test Coverage**: Improve testing to ensure code reliability and performance.
- **Logging Enhancements**: Implement more sophisticated logging mechanisms.
- **Broader Model Support**: Extend support to include a wider range of open models, potentially multi-modal.

---

## Key Features

- **Versatile Training and Inference**: Train from scratch, finetune, and infer models of various architectures including Transformer Encoder/Decoder/EncoderDecoder and RNN EncoderDecoder.
- **Dynamic Data Transforms**: Apply on-the-fly transformations in the dataloading logic for both training and inference.
- **Comprehensive LLM Support**: Includes converters for Llama, Mistral, Phi, Gemma ...
- **Advanced Quantization**: Support for 8-bit and 4-bit quantization, along with LoRA adapters, with or without checkpointing, as well as mixed precision (FP16).
- **Efficient Finetuning**: Finetune 7B and 13B models on a single RTX 24GB GPU using 4-bit quantization.
- **Flexible Inference**: Perform inference in 4-bit or 8-bit using the same layer quantization methods as in finetuning.
- **Tensor Parallelism**: Enable tensor parallelism for both training and inference when models exceed the memory capacity of a single GPU.
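
As a rough illustration of why 4-bit quantization makes 7B and 13B models fit on a 24GB GPU, the sketch below estimates the memory taken by the weights alone. It is back-of-the-envelope arithmetic, not a measurement: the helper name `weight_memory_gb` is hypothetical, and the estimate ignores quantization metadata (scales, zero points), LoRA parameters, optimizer state, and activations, all of which add to the real footprint.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights only, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Compare 16-bit, 8-bit and 4-bit storage for the two model sizes mentioned above.
for name, num_params in (("7B", 7e9), ("13B", 13e9)):
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {weight_memory_gb(num_params, bits):.1f} GB")
```

Under these assumptions, a 13B model needs about 26 GB for 16-bit weights but about 6.5 GB at 4-bit, which leaves room on a 24GB card for adapters and activations.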

---

## Setup

### Using Docker

To facilitate setup and reproducibility, we provide Docker images via the GitHub Container Registry: [EOLE Docker Images](https://github.com/eole-nlp/eole/pkgs/container/eole).

You can customize the workflow and build your own images based on specific needs using `build.sh` and `Dockerfile` in the `docker` directory of the repository.


To pull the Docker image:
```bash
docker pull ghcr.io/eole-nlp/eole:0.2.0-torch2.6.0-ubuntu22.04-cuda12.6
```

Example one-liner to run a container and open a bash shell within it:
```bash
docker run --rm -it --runtime=nvidia ghcr.io/eole-nlp/eole:0.2.0-torch2.6.0-ubuntu22.04-cuda12.6
```

> **Note**: Ensure you have the [Nvidia Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) (formerly nvidia-docker) installed to take advantage of CUDA/GPU features.

Depending on your needs, you can add various flags:
- `-p 5000:5000`: Forward an exposed port from your container to your host.
- `-v /some/local/directory:/some/container/directory`: Mount a local directory to a container directory.
- `--entrypoint some_command`: Run a specific command as the container entry point (instead of the default bash shell).

### Installing Locally

#### Requirements

- Python >= 3.10
- PyTorch >= 2.5, < 2.8
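
To check an existing environment against these requirements before installing, a small script along the following lines may help. It is only a sketch: the helper names (`parse_release`, `torch_supported`, `python_supported`) are hypothetical and not part of eole, and the version parsing is deliberately simplified (it keeps the leading numeric release and ignores pre-release semantics).

```python
import sys
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Keep the leading numeric release, e.g. '2.6.0+cu126' -> (2, 6, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def torch_supported(version: str) -> bool:
    """True if the version falls in the range >= 2.5, < 2.8."""
    return (2, 5) <= parse_release(version) < (2, 8)


def python_supported(info=sys.version_info) -> bool:
    """True if the interpreter is Python 3.10 or newer."""
    return tuple(info[:2]) >= (3, 10)


if __name__ == "__main__":
    print("Python OK:", python_supported())
    try:
        print("PyTorch OK:", torch_supported(metadata.version("torch")))
    except metadata.PackageNotFoundError:
        print("PyTorch is not installed")
```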

#### Installation from Source

To install from source:
```bash
git clone https://github.com/eole-nlp/eole
cd eole
pip install -e .
```

#### Installation from PyPI

Installation from PyPI will be available soon.

#### Notes

If you encounter a `MemoryError` during installation, try using `pip` with the `--no-cache-dir` option.

(Optional) Some advanced features (e.g., pretrained models or specific transforms) require extra packages. Install them with:
```bash
pip install -r requirements.opt.txt
```

### Manual Installation of Some Dependencies

#### Flash Attention

To use [Flash Attention](https://github.com/Dao-AILab/flash-attention#installation-and-features), install it manually:
```bash
pip install flash-attn --no-build-isolation
```

#### AWQ

To run inference with an AWQ model or to quantize one, AutoAWQ is required. Install it with:
```bash
pip install autoawq
```

For more details, refer to [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).


## Notes on Mixed-Precision or Low-Precision Training

Until Feb 25, we used torch optimizers, with or without AMP (mixed precision), or "fusedadam", an old Apex/Nvidia implementation using FP16 with dynamic loss scaling and without FP32 master weights.
As of 0.2, "fusedadam" is deprecated and pure-BF16 training is implemented.

As a result, config flags are now:

For FP16-AMP or BF16-AMP training (using PyTorch optimizers and the PyTorch AMP implementation):
```
compute_dtype: fp16 or bf16
use_amp: true
optim: adam or adamw
```
Special note: even though it may seem counterintuitive, we still use the torch GradScaler in BF16-AMP. Although the BF16 range is similar to that of FP32, scaling prevents underflow.
We tested BF16-AMP without the GradScaler and it did not give good results.


For pure-BF16 training (using torch-optimi and Kahan summation):
```
compute_dtype: bf16
use_amp: false
optim: adam or adamw
```
Pure-BF16 training is faster than AMP and has a smaller memory footprint (master weights are kept in BF16 instead of FP32). However, Kahan summation is no silver bullet: results are good, but not as good as with AMP.
Use this feature mainly when the memory footprint is an issue, as with LLMs.
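
To give an intuition for why Kahan summation matters when weights are stored in BF16, here is a small pure-Python simulation. It is an illustrative sketch only, not the torch-optimi or eole implementation: `to_bf16` and `accumulate` are hypothetical helpers, BF16 rounding is emulated by truncating float32 bits with round-to-nearest-even, and the "optimizer step" is reduced to adding a constant update to a single weight.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Simulation only: goes through float32 and ignores NaN/overflow handling.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round half to even on the dropped 16 bits
    bits &= 0xFFFF0000                   # keep sign, exponent and 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate(steps: int, update: float, use_kahan: bool) -> float:
    """Add `update` to a BF16 weight `steps` times and return the final weight."""
    weight = to_bf16(1.0)
    update = to_bf16(update)
    comp = 0.0  # Kahan compensation buffer, also stored in BF16
    for _ in range(steps):
        if not use_kahan:
            weight = to_bf16(weight + update)
            continue
        comp = to_bf16(comp + update)               # fold the new update into the buffer
        previous = weight
        weight = to_bf16(weight + comp)             # apply what BF16 can represent
        comp = to_bf16(comp + (previous - weight))  # keep what was rounded away
    return weight


print("naive:", accumulate(1000, 1e-3, use_kahan=False))
print("kahan:", accumulate(1000, 1e-3, use_kahan=True))
```

Near 1.0, adjacent BF16 values are about 0.0078 apart, so in the naive loop each update of 0.001 is rounded away and the weight never moves from 1.0. With the compensation buffer, the lost fractions accumulate until they are large enough to be applied, and the weight ends close to the expected value of about 2.0. The buffer is itself in BF16, which is one reason the result is close to, but not as accurate as, an FP32 master weight.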


---

## Contributing

We love contributions! Please look at issues marked with the [contributions welcome](https://github.com/eole-nlp/eole/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) tag.

Before raising an issue, make sure you read the requirements and the [Full Documentation](https://eole-nlp.github.io/eole). You can also check if a [Recipe](https://github.com/eole-nlp/eole/tree/main/recipes) fits your use case.

Unless there is a bug, please use the [Discussions](https://github.com/eole-nlp/eole/discussions) tab to ask questions or propose new topics/features.
