Metadata-Version: 2.1
Name: parpar
Version: 0.0.1
Summary: Parallel parser for large files
Home-page: https://pypi.org/project/sil/
Author: Sumner Magruder
Author-email: sumner.magruder@zmnh.uni-hamburg.de
License: UNKNOWN
Description: # About
        ParPar (Parallel Parser) is a light tool which makes it easy to distribute a function across a large file.
        
        ParPar is meant to work on serialized data where some values are highly repeated across records for a given field. e.g.
        
        ```
        a 1 1
        a 1 2
        a 2 3
        a 2 4
        a 3 5
        a 3 6
        b 1 7
        b 1 8
        b 2 9
        b 2 10
        b 3 11
        b 3 12
        ```
        
        although we have 12 records, for column 1 there are only 2 unique values. Likewise, for column 2 there are only 3 unique values. We could break this file up into smaller files under a directory:
        
        ```
        <out-dir>/<col-1-value>/<col-2-value>
        ```
        or vis versa.
        
        # How to use.
        
        1. Start by importing the ParPar class:
        
        ```
        from parpar import ParPar
        ```
        
        2. Initialize an instance:
        
        ```
        ppf = ParPar()
        ```
        
        3. Shard a large file into sub files*:
        
        ```
        ppf.shard(
          <input-file>, <output-directory>,
          <columns>, <delim>, <newline>
        )
        ```
        
        4. Check to make sure the number of records are the same:
        
        ```
        files = ppf.shard_files(<output-directory>)
        records = ppf.sharded_records(files)
        
        from parpar.utils import filelines
        
        print(records == filelines(<input-file>))
        ```
        
        
        5. Map an arbitrary function across all shared files:
        
        ```
        def foo(line, *args, **kwargs):
            pass
        
        args = [1, 2, 3]
        kwargs = {'a': 'x', 'b': 'y'}
        
        ppf.shard_apply(<output-directory>,
          foo, args, kwargs,
          processes=<number-of-processes>
        )
        ```
        
        
        # How to install with conda
        
        ```
        pip install parpar
        
        # remove old copies
        rm -rf /anaconda3/lib/python3.6/site-packages/parpar*
        
        conda skeleton pypi parpar
        conda build parpar
        conda install --use-local parpar
        ```
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.6
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
