Metadata-Version: 2.1
Name: slp
Version: 1.2.0
Summary: Speech, Language and Multimodal Processing models and utilities in PyTorch
Home-page: https://georgepar.github.io/slp
License: MIT
Keywords: pytorch,nlp,multimodal
Author: Giorgos Paraskevopoulos
Author-email: geopar@central.ntua.gr
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Education
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: aiohttp (>=3.7.4,<4.0.0)
Requires-Dist: gym (>=0.18.0,<0.19.0)
Requires-Dist: h5py (>=3.2.1,<4.0.0)
Requires-Dist: loguru (>=0.5.3,<0.6.0)
Requires-Dist: matplotlib (>=3.3.4,<4.0.0)
Requires-Dist: mike (>=0.6.0,<0.7.0)
Requires-Dist: nltk (>=3.5,<4.0)
Requires-Dist: numpy (>=1.19.5,<2.0.0)
Requires-Dist: omegaconf (>=2.0.6,<3.0.0)
Requires-Dist: optuna (>=2.6.0,<3.0.0)
Requires-Dist: pytorch-lightning (>=1.2.0,<2.0.0)
Requires-Dist: pytorch-lightning-bolts (>=0.3.0,<0.4.0)
Requires-Dist: pytorch-nlp (==0.4.1)
Requires-Dist: ray[tune] (>=1.2.0,<2.0.0)
Requires-Dist: requests (>=2.25.1,<3.0.0)
Requires-Dist: scikit-learn (>=0.24.1,<0.25.0)
Requires-Dist: scipy (>=1.6.1,<2.0.0)
Requires-Dist: sentencepiece (>=0.1.95,<0.2.0)
Requires-Dist: spacy (>=3.0.3,<4.0.0)
Requires-Dist: toml (>=0.10.2,<0.11.0)
Requires-Dist: toolz (>=0.11.1,<0.12.0)
Requires-Dist: torch (>=1.7.1,<2.0.0)
Requires-Dist: torchmetrics (>=0.3.2,<0.4.0)
Requires-Dist: torchvision (>=0.8.2,<0.9.0)
Requires-Dist: tqdm (>=4.57.0,<5.0.0)
Requires-Dist: transformers (==4.3.0)
Requires-Dist: ujson (>=4.0.2,<5.0.0)
Requires-Dist: validators (>=0.18.2,<0.19.0)
Requires-Dist: wandb (>=0.10.20,<0.11.0)
Project-URL: Repository, https://github.com/georgepar/slp
Description-Content-Type: text/markdown

# slp

<p align="center">
    <img src="https://github.com/georgepar/slp/actions/workflows/ci.yml/badge.svg" />
    <img src="https://github.com/georgepar/slp/actions/workflows/docs.yml/badge.svg" />
    <a href="https://codeclimate.com/github/georgepar/slp/maintainability" alt="Maintainability">
        <img src="https://api.codeclimate.com/v1/badges/d3ad9729ad30aa158737/maintainability" /></a>
    <a href="https://choosealicense.com/licenses/mit/" alt="License: MIT">
        <img src="https://img.shields.io/badge/license-MIT-green.svg" /></a>
    <a href="https://img.shields.io/pypi/pyversions/slp">
        <img alt="Python Version" src="https://img.shields.io/pypi/pyversions/slp" /></a>
    <a href="https://black.readthedocs.io/en/stable/" alt="Code Style: Black">
        <img src="https://img.shields.io/badge/code%20style-black-000000.svg" /></a>
</p>

* **Repo:** [https://github.com/georgepar/slp](https://github.com/georgepar/slp)
* **Documentation:** [https://georgepar.github.io/slp/latest/](https://georgepar.github.io/slp/latest/)


slp is a framework for fast and reproducible development of multimodal models, with emphasis on
NLP models.

It started as a collection of scripts and code I wrote / collected during my PhD and it evolves
accordingly.

As such, the framework is opinionated and it follows a convention over configuration approach.

A heavy emphasis is put on:

- Enforcing best practices and reproducibility of experiments
- Making common tasks fast at the top level, without going through extensive configuration options
- Remaining extendable. Extensions and modules for more use cases should be easy to add
- Extensive logging and experiment management out of the box
- Separating dirty / scratch code (at the script level) for quick changes and clean / polished code at the library level

slp is currently an alpha release under active development, so things may break and new features
will be added.

## Dependencies

We use [PyTorch](https://pytorch.org/) (1.7) and the following libraries:

- [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/stable/)
- [huggingface/transformers](https://huggingface.co/transformers/)
- [Wandb](https://wandb.ai/)
- Python 3.8
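
To see which versions of these libraries an environment actually contains, something like the following sketch may help. It relies only on the standard library `importlib.metadata` module (available from Python 3.8). The `CORE_PACKAGES` list and the `installed_versions` helper are illustrative names and are not part of slp; the distribution names are assumed to match the ones on PyPI.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Assumed PyPI distribution names for the libraries listed above.
CORE_PACKAGES = ["torch", "pytorch-lightning", "transformers", "wandb"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", ".".join(map(str, sys.version_info[:3])))
    for name, version in installed_versions(CORE_PACKAGES).items():
        print(name, version or "not installed")
```

Running the script inside the project environment prints one line per library, which is handy to paste into a bug report.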

## Installation

You can use slp as an external library by installing from PyPI with

```
pip install slp
```

Or you can clone it from GitHub:

```
git clone git@github.com:georgepar/slp
```

We use [poetry](https://python-poetry.org/) for dependency management.

After cloning the repo, run:

```bash
pip install poetry
poetry install
```

and a clean environment with all the dependencies will be created.
You can access it with `poetry shell`.

**Note**: Wandb logging is enabled by default. You can either:

- Create an account and run `wandb login` when you clone the repo on a new machine to store the results in the online managed environment
- Run `wandb offline` when you clone the repo to disable remote sync, or use the `--offline` command-line
  argument in your scripts
- Use one of their self-hosted solutions
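
For a script of your own, an `--offline` switch could be wired up roughly as in the sketch below. This is not slp's actual argument parser: `parse_args` and `configure_wandb` are hypothetical helpers. The sketch assumes the `WANDB_MODE` environment variable, which Wandb reads at initialization, is the mechanism you want to use.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse a hypothetical --offline flag (not slp's actual parser)."""
    parser = argparse.ArgumentParser(description="Example training script")
    parser.add_argument(
        "--offline",
        action="store_true",
        help="Keep Wandb runs on local disk instead of syncing them",
    )
    return parser.parse_args(argv)


def configure_wandb(offline):
    """Switch Wandb to offline mode through the WANDB_MODE environment variable."""
    if offline:
        # Set this before wandb.init() is called, otherwise it has no effect.
        os.environ["WANDB_MODE"] = "offline"


if __name__ == "__main__":
    args = parse_args()
    configure_wandb(args.offline)
    print("WANDB_MODE =", os.environ.get("WANDB_MODE", "<unset>"))
```

Runs recorded offline stay on local disk and can be uploaded later with `wandb sync`.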


## Create a new project based on slp

You can use the template at [https://github.com/georgepar/cookiecutter-pytorch-slp](https://github.com/georgepar/cookiecutter-pytorch-slp)
to create a new project based on slp:

```
pip install cookiecutter poetry
cookiecutter gh:georgepar/cookiecutter-pytorch-slp
# Follow the interactive configuration and a new folder with the project name you provided will appear
cd $PROJECT_NAME
poetry install  # Installs slp and all other dependencies
```

And you are good to go. Follow the instructions in the README of the new project you created. Happy coding!

## Contributing

You are welcome to open issues / PRs with improvements and bug fixes.

Since this is mostly a personal project based around workflows and practices that work for me, I don't guarantee I will accept every change, but I'm always open to discussion.

If you are going to contribute, please use the pre-commit hooks under `hooks`, otherwise the PR will not pass the CI. And never, ever touch `requirements.txt` by hand; it is exported automatically from `poetry`.

```bash
# Install the pre-commit hook: keeps requirements.txt up to date and runs linting, type checking and tests
cat <<'EOT' >> .git/hooks/pre-commit
#!/usr/bin/env bash

bash hooks/export-requirements-txt
bash hooks/checks
EOT

chmod +x .git/hooks/pre-commit

ln -s "$(pwd)/hooks/commit-msg" .git/hooks/commit-msg  # Sign-off your commit
```

## Cite

If you use this code for your research, please include the following citation:

```
@ONLINE {,
    author = "Georgios Paraskevopoulos",
    title  = "slp",
    year   = "2020",
    url    = "https://github.com/georgepar/slp"
}
```


## Roadmap

* Optuna integration for hyperparameter tuning
* Add dataloaders for popular multimodal datasets
* Add multimodal architectures
* Add RIM, DNC and Kanerva machine implementations
* Write unit tests

