Metadata-Version: 2.1
Name: ktrain
Version: 0.7.0b6
Summary: ktrain is a lightweight wrapper for Keras to help train neural networks
Home-page: https://github.com/amaiya/ktrain
Author: Arun S. Maiya
Author-email: arun@maiya.net
License: MIT
Description: 
        
        
        ### News and Announcements
        - **2019-12-10:**  
          - ***ktrain*** **v0.7.x is released and now uses TensorFlow Keras** (i.e., `tf.keras`) instead of stand-alone Keras.  If you're using custom Keras models with *ktrain*, you must change all `keras` references to `tensorflow.keras`. That is, don't import Keras like this: `from keras.layers import Dense`. Do this instead:  `from tensorflow.keras.layers import Dense`.  If you mix calls to tf.keras with Keras, you will experience problems.  Supported versions of TensorFlow include 1.14 and 2.0.
        - **2019-11-12:**  
          - *ktrain* v0.6.x is released and includes pre-canned support for [learning from unlabeled or partially labeled text data](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-05-learning_from_unlabeled_text_data.ipynb).
        - **Coming Soon**:
          - better support for custom data formats and models
          - ability to train [HuggingFace Transformer models](https://github.com/huggingface/transformers) within *ktrain*
          
        
        
        ----
        
        
        # ktrain
        *ktrain* is a lightweight wrapper for the deep learning library [Keras](https://keras.io/) (and other libraries) to help build, train, and deploy neural networks.  With only a few lines of code, ktrain allows you to easily and quickly:
        
        - estimate an optimal learning rate for your model given your data using a Learning Rate Finder
        - utilize learning rate schedules such as the [triangular policy](https://arxiv.org/abs/1506.01186), the [1cycle policy](https://arxiv.org/abs/1803.09820), and [SGDR](https://arxiv.org/abs/1608.03983) to effectively minimize loss and improve generalization
        - employ fast and easy-to-use pre-canned models for  `text`, `vision`, and `graph` data:
          - `text` data:
             - **Text Classification**: [BERT](https://arxiv.org/abs/1810.04805), [NBSVM](https://www.aclweb.org/anthology/P12-2018), [fastText](https://arxiv.org/abs/1607.01759), GRUs with [pretrained word vectors](https://fasttext.cc/docs/en/english-vectors.html), and other models <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/IMDb-BERT.ipynb)]</sup></sub>
             - **Sequence Labeling**:  [Bidirectional LSTM-CRF](https://arxiv.org/abs/1603.01360) with optional pretrained word embeddings <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-06-sequence-tagging.ipynb)]</sup></sub>
             - **Unsupervised Topic Modeling** with [LDA](http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)  <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/20newsgroups-topic_modeling.ipynb)]</sup></sub>
             - **Document Similarity with One-Class Learning**:  given some documents of interest, find and score new documents that are semantically similar to them using [One-Class Text Classification](https://en.wikipedia.org/wiki/One-class_classification) <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/20newsgroups-document_similarity_scorer.ipynb)]</sup></sub>
             - **Document Recommendation Engine**:  given text from a sample document, recommend documents that are semantically similar to it from a larger corpus  <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/20newsgroups-recommendation_engine.ipynb)]</sup></sub>
          - `vision` data:
            - **image classification** (e.g., [ResNet](https://arxiv.org/abs/1512.03385), [Wide ResNet](https://arxiv.org/abs/1605.07146), [Inception](https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf)) <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/vision/dogs_vs_cats-ResNet50.ipynb)]</sup></sub>
          - `graph` data:
            - **graph node classification** (e.g., [GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf)) <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/graphs/pubmed-GraphSAGE.ipynb)]</sup></sub>
        - perform multilingual text classification (e.g., [Chinese Sentiment Analysis with BERT](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/ChineseHotelReviews-BERT.ipynb), [Arabic Sentiment Analysis with NBSVM](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/ArabicHotelReviews-nbsvm.ipynb))
        - load and preprocess text and image data from a variety of formats 
        - inspect data points that were misclassified and [provide explanations](https://eli5.readthedocs.io/en/latest/) to help improve your model
        - leverage a simple prediction API for saving and deploying both models and data-preprocessing steps to make predictions on new raw data
        
        
        ### Tutorials
        Please see the following tutorial notebooks for a guide on how to use *ktrain* on your projects:
        * Tutorial 1:  [Introduction](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-01-introduction.ipynb)
        * Tutorial 2:  [Tuning Learning Rates](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-02-tuning-learning-rates.ipynb)
        * Tutorial 3: [Image Classification](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-03-image-classification.ipynb)
        * Tutorial 4: [Text Classification](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-04-text-classification.ipynb)
        * Tutorial 5: [Learning from Unlabeled Text Data](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-05-learning_from_unlabeled_text_data.ipynb)
        * Tutorial 6: [Text Sequence Tagging](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-06-sequence-tagging.ipynb) for Named Entity Recognition
        * Tutorial 7: [Graph Node Classification](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-07-graph-node_classification.ipynb) with Graph Neural Networks
        * Tutorial A1: [Additional tricks](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-A1-additional-tricks.ipynb), which covers topics such as previewing data augmentation schemes, inspecting intermediate output of Keras models for debugging, setting global weight decay, and use of built-in and custom callbacks.
        * Tutorial A2: [Explaining Predictions and Misclassifications](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorial-A2-explaining-predictions.ipynb)
        
        
        Some blog tutorials about *ktrain* are shown below:
        
        > [**ktrain: A Lightweight Wrapper for Keras to Help Train Neural Networks**](https://towardsdatascience.com/ktrain-a-lightweight-wrapper-for-keras-to-help-train-neural-networks-82851ba889c) 
        
        
        > [**BERT Text Classification in 3 Lines of Code**](https://towardsdatascience.com/bert-text-classification-in-3-lines-of-code-using-keras-264db7e7a358)  
        
        > [**Explainable AI in Practice**](https://medium.com/@asmaiya/explainable-ai-in-practice-2e5ae2d16dc7) 
        
        
        Using *ktrain* on **Google Colab**?  See [this simple demo of Multiclass Text Classification with BERT](https://colab.research.google.com/drive/1AH3fkKiEqBpVpO5ua00scp7zcHs5IDLK).
        
        
        
        Tasks such as text classification and image classification can be accomplished easily with 
        only a few lines of code.
        
        #### Example: Text Classification of [IMDb Movie Reviews](https://ai.stanford.edu/~amaas/data/sentiment/) Using [BERT](https://arxiv.org/pdf/1810.04805.pdf)
        ```
        import ktrain
        from ktrain import text as txt
        
        # load data
        (x_train, y_train), (x_test, y_test), preproc = txt.texts_from_folder('data/aclImdb', maxlen=500, 
                                                                             preprocess_mode='bert',
                                                                             train_test_names=['train', 'test'],
                                                                             classes=['pos', 'neg'])
        
        # load model
        model = txt.text_classifier('bert', (x_train, y_train), preproc=preproc)
        
        # wrap model and data in ktrain.Learner object
        learner = ktrain.get_learner(model, 
                                     train_data=(x_train, y_train), 
                                     val_data=(x_test, y_test), 
                                     batch_size=6)
        
        # find good learning rate
        learner.lr_find()             # briefly simulate training to find good learning rate
        learner.lr_plot()             # visually identify best learning rate
        
        # train using 1cycle learning rate schedule for 3 epochs
        learner.fit_onecycle(2e-5, 3) 
        ```
        
        
        #### Example: Classifying Images of [Dogs and Cats](https://www.kaggle.com/c/dogs-vs-cats) Using a Pretrained [ResNet50](https://arxiv.org/abs/1512.03385) model
        ```
        import ktrain
        from ktrain import vision as vis
        
        # load data
        (train_data, val_data, preproc) = vis.images_from_folder(
                                                      datadir='data/dogscats',
                                                      data_aug = vis.get_data_aug(horizontal_flip=True),
                                                      train_test_names=['train', 'valid'], 
                                                      target_size=(224,224), color_mode='rgb')
        
        # load model
        model = vis.image_classifier('pretrained_resnet50', train_data, val_data, freeze_layers=80)
        
        # wrap model and data in ktrain.Learner object
        learner = ktrain.get_learner(model=model, train_data=train_data, val_data=val_data, 
                                     workers=8, use_multiprocessing=False, batch_size=64)
        
        # find good learning rate
        learner.lr_find()             # briefly simulate training to find good learning rate
        learner.lr_plot()             # visually identify best learning rate
        
        # train using triangular policy with ModelCheckpoint and implicit ReduceLROnPlateau and EarlyStopping
        learner.autofit(1e-4, checkpoint_folder='/tmp/saved_weights') 
        ```
        
        #### Example: Sequence Labeling for [Named Entity Recognition](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/version/2) using a randomly initialized [Bidirectional LSTM CRF](https://arxiv.org/abs/1603.01360) model
        ```
        import ktrain
        from ktrain import text as txt
        
        # load data
        (trn, val, preproc) = txt.entities_from_txt('data/ner_dataset.csv',
                                                    sentence_column='Sentence #',
                                                    word_column='Word',
                                                    tag_column='Tag', 
                                                    data_format='gmb')
        
        # load model
        model = txt.sequence_tagger('bilstm-crf', preproc)
        
        # wrap model and data in ktrain.Learner object
        learner = ktrain.get_learner(model, train_data=trn, val_data=val)
        
        
        # conventional training for 1 epoch using a learning rate of 0.001 (Keras default for Adam optmizer)
        learner.fit(1e-3, 1) 
        ```
        
        
        #### Example: Node Classification on [Cora Citation Graph](https://linqs-data.soe.ucsc.edu/public/lbc/cora.tgz) using a [GraphSAGE](https://arxiv.org/abs/1706.02216) model
        ```
        import ktrain
        from ktrain import graph as gr
        
        # load data with supervision ratio of 10%
        (trn, val, preproc)  = gr.graph_nodes_from_csv(
                                                       'cora.content', # node attributes/labels
                                                       'cora.cites',   # edge list
                                                       sample_size=20, 
                                                       holdout_pct=None, 
                                                       holdout_for_inductive=False,
                                                      train_pct=0.1, sep='\t')
        
        # load model
        model=gr.graph_node_classifier('graphsage', trn)
        
        # wrap model and data in ktrain.Learner object
        learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=64)
        
        
        # find good learning rate
        learner.lr_find(max_epochs=100) # briefly simulate training to find good learning rate
        learner.lr_plot()               # visually identify best learning rate
        
        # train using triangular policy with ModelCheckpoint and implicit ReduceLROnPlateau and EarlyStopping
        learner.autofit(0.01, checkpoint_folder='/tmp/saved_weights')
        ```
        
        
        Additional examples can be found [here](https://github.com/amaiya/ktrain/tree/master/examples).
        
        
        
        ### Installation
        
        Make sure pip is up-to-date with: `pip3 install -U pip`.
        
        1. Ensure Tensorflow 1.14 or TensorFlow 2 [is installed](https://www.tensorflow.org/install/pip?lang=python3) if it is not already
        
        > For GPU: `pip3 install "tensorflow_gpu>=1.14,<=2"`
        
        > For CPU: `pip3 install "tensorflow>=1.14,<=2"`
        
        
        2. Install *ktrain*: `pip3 install ktrain`
        
        The *ktrain* package can be used with TensorFlow versions 1.14 and 2.0.  If using TensorFlow 2.0, *ktrain*
        presently runs in 1.x mode using [tf.compat.v1.disable_v2_behavior](https://www.tensorflow.org/api_docs/python/tf/compat/v1/disable_v2_behavior).  In the future, this will be removed and **only** TensorFlow 2 will be supported.
        
        
        <!--
        ### Requirements
        
        The following software/libraries should be installed:
        
        - [Python 3.6+](https://www.python.org/) (tested on 3.6.7)
        - [Keras](https://keras.io/)  (tested on 2.2.4)
        - [TensorFlow](https://www.tensorflow.org/)  (tested on 1.10.1)
        - [scikit-learn](https://scikit-learn.org/stable/) (tested on 0.20.0)
        - [matplotlib](https://matplotlib.org/) (tested on 3.0.0)
        - [pandas](https://pandas.pydata.org/) (tested on 0.24.2)
        - [keras_bert](https://github.com/CyberZHG/keras-bert/tree/master/keras_bert) 
        - [fastprogress](https://github.com/fastai/fastprogress) 
        -->
        
        
        This code was tested on Ubuntu 18.04 LTS using TensorFlow 1.14 and TensorFlow 2 (Keras version 2.2.4-tf).
        
        ----
        **Creator:  [Arun S. Maiya](http://arun.maiya.net)**
        
        **Email:** arun [at] maiya [dot] net
        
Keywords: keras,deep learning,machine learning
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
