Metadata-Version: 2.1
Name: retvec
Version: 1.0.0
Summary: Resilient and Efficient Text Vectorizer
Home-page: https://github.com/google-research/retvec
Author: Google
Author-email: retvec@google.com
License: Apache License 2.0
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: dev
Requires-Dist: datasets ; extra == 'dev'
Requires-Dist: tokenizers ; extra == 'dev'
Requires-Dist: tensorflow-addons ; extra == 'dev'
Requires-Dist: google-cloud-storage ; extra == 'dev'
Requires-Dist: wandb ; extra == 'dev'
Requires-Dist: mypy ; extra == 'dev'
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: flake8 ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: twine ; extra == 'dev'
Requires-Dist: tabulate ; extra == 'dev'
Requires-Dist: numpy ; extra == 'dev'
Requires-Dist: tqdm ; extra == 'dev'
Requires-Dist: tensorflow-similarity ; extra == 'dev'
Requires-Dist: black ; extra == 'dev'
Requires-Dist: isort ; extra == 'dev'
Provides-Extra: tensorflow
Requires-Dist: tensorflow (>=2.6) ; extra == 'tensorflow'

# RETVec: Resilient & Efficient Text Vectorizer


## Overview
RETVec is a next-generation text vectorizer that provides built-in adversarial resilience through robust word embeddings. The paper is available at https://arxiv.org/abs/2302.09207.

RETVec is trained to be resilient against character-level manipulations, including insertions, deletions, typos, homoglyphs, and LEET substitutions. The RETVec model is trained on top of a novel character embedding that can encode all UTF-8 characters and words. As a result, RETVec works out of the box on over 100 languages without a lookup table or a fixed vocabulary size. RETVec is also implemented as a layer, so it can be inserted into any TensorFlow model without a separate pre-processing step.
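To make the "no lookup table" idea concrete, here is a minimal sketch of a vocabulary-free character encoding. It is not RETVec's implementation: the function names `encode_char` and `encode_word`, the 24-bit width, and the 16-character word length are all assumptions chosen for illustration. Each character is represented by the bits of its Unicode code point, so any character in any script gets a fixed-size vector without a vocabulary.

```python
from typing import List

# Illustrative assumptions, not values taken from the RETVec package.
BITS_PER_CHAR = 24   # enough for every Unicode code point (max 0x10FFFF)
MAX_CHARS = 16       # hypothetical fixed word length


def encode_char(ch: str, bits: int = BITS_PER_CHAR) -> List[int]:
    """Return the code point of `ch` as a fixed-width bit list, MSB first."""
    code_point = ord(ch)
    if code_point >= 1 << bits:
        raise ValueError(f"{ch!r} does not fit in {bits} bits")
    return [(code_point >> i) & 1 for i in reversed(range(bits))]


def encode_word(word: str, max_chars: int = MAX_CHARS,
                bits: int = BITS_PER_CHAR) -> List[List[int]]:
    """Encode a word as a max_chars x bits matrix (truncate, then zero-pad)."""
    rows = [encode_char(c, bits) for c in word[:max_chars]]
    rows += [[0] * bits for _ in range(max_chars - len(rows))]
    return rows


if __name__ == "__main__":
    # Words in different scripts, plus a LEET variant, all map to the same shape.
    for word in ["hello", "h3llo", "héllo", "你好"]:
        matrix = encode_word(word)
        print(word, len(matrix), "x", len(matrix[0]))
```

An encoding like this only removes the need for a vocabulary. The resilience to typos and homoglyphs described above comes from the model trained on top of the character embedding, which this sketch does not attempt to reproduce.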


### Getting Started

#### Installation

You can use pip to install the TensorFlow version of RETVec:

```bash
pip install retvec
```

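RETVec has been tested on TensorFlow 2.6+ and Python 3.7+. TensorFlow is declared as an optional extra rather than a hard dependency, so `pip install retvec[tensorflow]` also installs TensorFlow (>=2.6) if it is not already present.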

### Basic Usage

`training/train_tf_retvec_models.py` is the RETVec model training script. Example usage:

```bash
python training/train_tf_retvec_models.py --train_config <train_config_path> --model_config <model_config_path> --output_dir <output_path>
```

Configurations for our base models are in the `configs/` folder.

### Colab

Colab for training and releasing a new RETVec model: `notebooks/train_and_relase_a_rewnet.ipynb`

Hello world colab: `notebooks/hello_world.ipynb`

## Disclaimer
This is not an official Google product.


