Metadata-Version: 2.1
Name: languageflow
Version: 1.1.12a3
Summary: Data loaders and abstractions for text and NLP
Home-page: https://github.com/undertheseanlp/languageflow
Author: Vu Anh
Author-email: anhv.ict91@gmail.com
License: GNU General Public License v3
Keywords: languageflow
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: Click (>=6.0)
Requires-Dist: requests
Requires-Dist: tabulate
Requires-Dist: clint
Requires-Dist: tqdm
Requires-Dist: scikit-learn (==0.20.3)
Requires-Dist: joblib (==0.13.2)

============
LanguageFlow
============

.. image:: https://img.shields.io/pypi/v/languageflow.svg
        :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/pypi/pyversions/languageflow.svg
        :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/badge/license-GNU%20General%20Public%20License%20v3-brightgreen.svg
        :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/travis/undertheseanlp/languageflow.svg
        :target: https://travis-ci.org/undertheseanlp/languageflow

.. image:: https://readthedocs.org/projects/languageflow/badge/?version=latest
        :target: http://languageflow.readthedocs.io/en/latest/
        :alt: Documentation Status

Data loaders and abstractions for text and NLP

Requirements
------------

Install dependencies

.. code-block:: bash

    $ pip install future, tox
    $ pip install python-crfsuite==0.9.5
    $ pip install Cython
    $ pip install -U fasttext --no-cache-dir --no-deps --force-reinstall
    $ pip install xgboost


Installation
------------

.. code-block:: bash

    $ pip install languageflow

Components
------------

* Transformers: NumberRemover, CountVectorizer, TfidfVectorizer
* Models: SGDClassifier, XGBoostClassifier, KimCNNClassifier, FastTextClassifier, CRF

Data
------------

Download a dataset using **download** command

.. code-block:: bash

    $ languageflow download DATASET

List all dataset

.. code-block:: bash

    $ languageflow list


Datasets
~~~~~~~~

The datasets module currently contains:

* Tagged: VLSP2018-NER, VTB-CHUNK*, VLSP2016-NER*, VLSP2013-POS*, VLSP2013-WTK*
* Categorized: AIVIVN2019_SA*, VLSP2018_SA*, UTS2017_BANK, VLSP2016_SA*, VNTC
* Plaintext: VNESES, VNTQ_SMALL, VNTQ_BIG

Caution (*): With closed license dataset, you must provide URL to download

Example
~~~~~~~~

Download ``UTS2017_BANK`` dataset

.. code-block:: bash

    $ languageflow download UTS2017_BANK

Use ``UTS2017_BANK`` dataset

.. code-block:: python

    >>> from languageflow.data_fetcher import DataFetcher, NLPData
    >>> corpus = DataFetcher.load_corpus(NLPData.UTS2017_BANK_SA)
    >>> print(corpus)
    CategorizedCorpus: 1780 train + 197 dev + 494 test sentences


=======
History
=======

1.1.7 (2018-04-12)
------------------------

* Automatic deploy with travis and pypi
* Fix dependencies hell


1.1.6 (2017-12-26)
------------------

* Add data module to handle data downloading and data preprocessing
* Add many new models: SGDClassifier, XGBoostClassier, FastTextClassifier, CRF
* Add new feature: LanguageBoard
* Automatic continuous integration with travis-ci
* Build docs with readthedocs.org


1.1.5 (2017-12-11)
------------------

* Refactor project to integrate with underthesea experiment

0.1.0 (2017-09-18)
------------------

* First release on PyPI.



