Metadata-Version: 2.1
Name: lineflow
Version: 0.1.5
Summary: Framework-Agnostic NLP Data Pipeline in Python
Home-page: https://github.com/yasufumy/lineflow
Author: Yasufumi Taniguchi
Author-email: yasufumi.taniguchi@gmail.com
License: MIT
Description: # lineflow: Framework-Agnostic NLP Data Pipeline in Python
        [![Build Status](https://travis-ci.org/yasufumy/lineflow.svg?branch=master)](https://travis-ci.org/yasufumy/lineflow)
        
        ## Installation
        
        To install lineflow, run:
        
        ```sh
        $ pip install lineflow
        ```
        
        ## Usage
        
        Load a text dataset and peek at its items:
        
        ```py
        import lineflow as lf
        
        
        ds = lf.TextDataset('/path/to/dataset')
        
        print(ds.first())  # peek at the first item
        print(ds.take(5))  # peek at the first 5 items
        print(ds[100])  # random access
        
        ds.map(tokenize)  # apply your own processing line by line (lazy evaluation)
        ```
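        
        The snippets in this section call `tokenize` without defining it. It is not part of lineflow; it stands for a callable you supply yourself. A minimal sketch, assuming plain whitespace tokenization is good enough for your data (the function body is an illustration only):
        
        ```py
        def tokenize(line):
            # Hypothetical helper: split one line of text on whitespace.
            return line.split()
        
        
        print(tokenize('lineflow is a data pipeline'))
        ```
        
        Any function that takes one line and returns the processed item should be usable with `ds.map` in the same way.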
        
        Use lineflow with [PyTorch](https://pytorch.org/):
        
        ```py
        import lineflow as lf
        from torch.utils.data import DataLoader
        
        
        ds = lf.TextDataset('/path/to/dataset').map(tokenize)
        
        loader = DataLoader(ds, batch_size=3, shuffle=True, num_workers=4)
        it = iter(loader)
        print(next(it))
        del it
        ```
        
        Use lineflow with [Keras](https://keras.io/):
        
        ```py
        import math
        
        import lineflow as lf
        from keras.utils import OrderedEnqueuer, Sequence
        
        
        class TextSequence(Sequence):
            def __init__(self, dataset, batch_size):
                self._dataset = dataset
                self._batch_size = batch_size
        
            def __len__(self):
                return int(math.ceil(len(self._dataset) / float(self._batch_size)))
        
            def __getitem__(self, index):
                return self._dataset[index * self._batch_size:
                                     (index + 1) * self._batch_size]
        
        
        ds = lf.TextDataset('/path/to/dataset').map(tokenize)
        sequence = TextSequence(ds, batch_size=3)
        enqueuer = OrderedEnqueuer(sequence, shuffle=True, use_multiprocessing=True)
        enqueuer.start()
        it = enqueuer.get()
        print(next(it))
        enqueuer.stop()
        ```
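        
        The `__len__` method above has to round the number of batches up so that a final, smaller batch is not dropped. A small standalone sketch of that arithmetic (`num_batches` is a hypothetical helper name, not part of lineflow or Keras):
        
        ```py
        import math
        
        
        def num_batches(dataset_size, batch_size):
            # Round up so a trailing partial batch still counts as one batch.
            return int(math.ceil(dataset_size / float(batch_size)))
        
        
        print(num_batches(10, 3))  # 4 batches: 3 + 3 + 3 + 1
        print(num_batches(9, 3))   # 3 batches, no partial batch
        ```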
        
        Use lineflow with [Chainer](https://chainer.org/):
        
        ```py
        import lineflow as lf
        from chainer.iterators import MultiprocessIterator
        
        
        ds = lf.TextDataset('/path/to/dataset').map(tokenize)
        it = MultiprocessIterator(ds, batch_size=3, shuffle=True, n_processes=4)
        print(next(it))
        it.finalize()
        ```
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
