Metadata-Version: 2.1
Name: lightopic
Version: 0.0.5
Summary: Slimmer version of BERTopic for transforming new data with an existing, trained model.
Author-email: Hamed Bastan-Hagh <hamed@bastanhagh.com>
Project-URL: Repository, https://github.com/hamedbh/lightopic/
Project-URL: Issues, https://github.com/hamedbh/lightopic/issues
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: hdbscan >=0.8.39
Requires-Dist: joblib >=1.4.2
Requires-Dist: numba >=0.59.0
Requires-Dist: numpy >=2.0.2
Requires-Dist: umap-learn >=0.5.7
Provides-Extra: bertopic
Requires-Dist: bertopic >=0.16.4 ; extra == 'bertopic'
Provides-Extra: dev
Requires-Dist: bertopic >=0.16.4 ; extra == 'dev'
Requires-Dist: build >=1.2.2.post1 ; extra == 'dev'
Requires-Dist: myst-nb >=1.1.2 ; extra == 'dev'
Requires-Dist: pre-commit >=4.0.1 ; extra == 'dev'
Requires-Dist: pytest >=8.3.3 ; extra == 'dev'
Requires-Dist: ruff >=0.7.4 ; extra == 'dev'
Requires-Dist: sphinx >=8.1.3 ; extra == 'dev'
Requires-Dist: sphinx-autoapi >=3.3.3 ; extra == 'dev'
Requires-Dist: sphinx-rtd-theme >=3.0.2 ; extra == 'dev'
Requires-Dist: setuptools-scm >=8.1.0 ; extra == 'dev'

# Lightopic

This package addresses the specific use case of deploying a [BERTopic](https://maartengr.github.io/BERTopic/index.html) model that you've trained and now want to use for transforming new data, e.g. via an API.

This came up for me because I wanted to deploy such a model as an API, but make the deployment smaller and faster. The BERTopic package is broad, which brings with it a load of dependencies (e.g. torch and a bunch of CUDA libraries). So I wrote this as a way to do the `transform` step only, with a virtual environment that's about 95% smaller than one with the actual BERTopic package.

The main prerequisite is that you must already have trained a BERTopic model and serialised it in a format that `lightopic` can read. The `lightopic` package provides a way to do that too; see the guidance below. Once that's done, you can instantiate a `Lightopic` object and call its `transform` method on new data.

## Training and serialising your `LightBERTopic` model

This is a necessary step: you can't instantiate a `Lightopic` object without first having trained and serialised your model. To make this part easier, the `LightBERTopic` class is available. It is a child class of `bertopic.BERTopic` with one addition: a `save_lightopic` method.

```python
from sklearn.datasets import fetch_20newsgroups

from lightopic.lightbertopic import LightBERTopic

# Example corpus: the 20 Newsgroups dataset from scikit-learn
docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

# Fit exactly as you would a regular BERTopic model
topic_model = LightBERTopic()
topics, probs = topic_model.fit_transform(docs)

# Write the model to disk in the format that Lightopic expects
topic_model.save_lightopic("model_directory")
```

NB: for this to work you must have `bertopic` installed, which you can do with `pip install "lightopic[bertopic]"` (the quotes stop shells such as zsh from interpreting the square brackets).

**NOTE**: this package is still under development, so this required format may (and probably will) change!

## Using a `Lightopic` model

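Once the model has been serialised, you can load it and transform new data:

```python
from lightopic import Lightopic

topic_model = Lightopic()
topic_model.load("model_directory")

# `embeddings` holds the embeddings of the new documents; presumably these
# should come from the same embedding model that was used during training
topic_model.transform(embeddings)
```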

This transform step does not rely on BERTopic at all, so it can use the smaller installation you get from `pip install lightopic`.
