Metadata-Version: 2.1
Name: lda-classification
Version: 0.0.27
Summary: UNKNOWN
Home-page: https://github.com/FeryET/lda_classification
Author: Farhood Etaati
Author-email: farhoodet@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.5
Description-Content-Type: text/markdown
Requires-Dist: gensim (~=3.8.0)
Requires-Dist: matplotlib (~=3.1.2)
Requires-Dist: numpy (~=1.19.1)
Requires-Dist: setuptools (~=49.6.0)
Requires-Dist: spacy (~=2.3.1)
Requires-Dist: tqdm (~=4.48.2)
Requires-Dist: scikit-learn (~=0.23.1)
Requires-Dist: tomotopy (~=0.9.1)

# lda_classification

Instantly train an LDA model with a scikit-learn compatible wrapper around gensim's LDA model.


* Preprocess Your Documents
* Train an LDA Model
* Evaluate Your LDA Model
* Extract Document Vectors 
* Select the Most Informative Features
* Classify Your Documents

All in a few lines of code, completely compatible with `sklearn`'s Transformer API.

---------------------


### Installation:


To install from PyPI, run:

```pip install lda_classification```

To install from source:
```
git clone https://github.com/FeryET/lda_classification.git
cd lda_classification/
python setup.py install
```
------------------------------------


### Requirements:


```
gensim == 3.8.0
matplotlib == 3.1.2
numpy == 1.19.1
setuptools ~= 49.6.0
spacy == 2.3.1
tqdm == 4.48.2
scikit-learn ~= 0.23.1
tomotopy ~= 0.9.1
```

##### Optional:

To automate feature selection with this package, also install `xgboost`; the `XGBoostFeatureSelector` utility class depends on it.
```
xgboost == 1.1.1 (Optional)
```
------------------------------------


### Example: 


```python
from lda_classification.model import GensimLDAVectorizer
from lda_classification.preprocess import SpacyCleaner
from lda_classification.utils import XGBoostFeatureSelector

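# Placeholders: load your own documents and encode their labels first.
# docs, labels = FETCH YOUR DATASET
# y = ENCODED_LABELS

# Preprocess the raw documents.
docs = SpacyCleaner().transform(docs)
# Fit the LDA model and extract document vectors
# (200 is presumably the number of topics).
X = GensimLDAVectorizer(200, return_dense=False).fit_transform(docs)
# Keep only the most informative features (requires xgboost).
X_transform = XGBoostFeatureSelector().fit_transform(X, y)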
```
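Each step in the example above follows the scikit-learn transformer contract: `fit` learns state from the data, `transform` applies it, and `fit_transform` does both. The sketch below illustrates that contract in plain Python. `ToyCleaner` and `ToyBagOfWords` are hypothetical stand-ins, not part of this package, and they do far less than `SpacyCleaner` or `GensimLDAVectorizer`; the point is only to show why such steps can be chained.

```python
class ToyCleaner:
    """Hypothetical stand-in for a cleaning step: lowercase and split on whitespace."""

    def fit(self, docs, y=None):
        # Stateless step: there is nothing to learn.
        return self

    def transform(self, docs):
        return [doc.lower().split() for doc in docs]

    def fit_transform(self, docs, y=None):
        return self.fit(docs, y).transform(docs)


class ToyBagOfWords:
    """Hypothetical stand-in for a vectorizer: count tokens over a learned vocabulary."""

    def fit(self, token_lists, y=None):
        # Learn a sorted vocabulary from the training documents.
        vocab = sorted({tok for tokens in token_lists for tok in tokens})
        self.vocabulary_ = {tok: i for i, tok in enumerate(vocab)}
        return self

    def transform(self, token_lists):
        rows = []
        for tokens in token_lists:
            row = [0] * len(self.vocabulary_)
            for tok in tokens:
                # Tokens that were not seen during fit are ignored.
                if tok in self.vocabulary_:
                    row[self.vocabulary_[tok]] += 1
            rows.append(row)
        return rows

    def fit_transform(self, token_lists, y=None):
        return self.fit(token_lists, y).transform(token_lists)


raw_docs = ["Topic models find topics", "Models classify documents"]
tokens = ToyCleaner().fit_transform(raw_docs)
vectorizer = ToyBagOfWords()
X_toy = vectorizer.fit_transform(tokens)
```

Because every step exposes the same three methods, the output of one step can be passed straight into the next, which is what the three-line example relies on.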

There is also a `DataReader` class and a `BaseData` class for
automating the reading of data files from disk. Extend
`BaseData`, implement its abstract methods in the subclass, and
feed the subclass to `DataReader` to simplify fetching your dataset.
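The pattern described above (an abstract base class whose subclass is handed to a reader) can be sketched with the standard library alone. Everything in this sketch is hypothetical: `SketchBaseData`, its two abstract methods, `FolderPerLabelData`, and `read_dataset` are illustrative names, and the package's real `BaseData` and `DataReader` may declare different methods and signatures. Check the source before subclassing.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchBaseData(ABC):
    """Hypothetical base class; the real `BaseData` may differ."""

    @abstractmethod
    def list_files(self):
        """Return the paths of the files that make up the dataset."""

    @abstractmethod
    def label_for(self, path):
        """Return the label of the document stored at `path`."""


class FolderPerLabelData(SketchBaseData):
    """Example subclass: one sub-folder per label, one .txt file per document."""

    def __init__(self, root):
        self.root = Path(root)

    def list_files(self):
        return sorted(self.root.glob("*/*.txt"))

    def label_for(self, path):
        return path.parent.name


def read_dataset(data):
    """Hypothetical reader: return (docs, labels) from any SketchBaseData."""
    paths = data.list_files()
    docs = [p.read_text(encoding="utf-8") for p in paths]
    labels = [data.label_for(p) for p in paths]
    return docs, labels


# Build a tiny throwaway dataset on disk and read it back.
with tempfile.TemporaryDirectory() as tmp:
    for label, text in [("sports", "the match ended in a draw"),
                        ("tech", "the new chip is faster")]:
        folder = Path(tmp) / label
        folder.mkdir()
        (folder / "doc.txt").write_text(text, encoding="utf-8")
    sample_docs, sample_labels = read_dataset(FolderPerLabelData(tmp))
```

The reader only depends on the abstract interface, so switching to a different on-disk layout means writing a new subclass rather than changing the loading code.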


