Metadata-Version: 1.1
Name: bigcode-embeddings
Version: 0.1.2
Summary: Tool to generate and visualize embeddings from bigcode ASTs
Home-page: https://github.com/tuvistavie/bigcode-tools/tree/master/bigcode-embeddings
Author: Daniel Perez
Author-email: tuvistavie@gmail.com
License: UNKNOWN
Download-URL: https://github.com/tuvistavie/bigcode-tools/archive/master.zip
Description: # bigcode-embeddings
        
        NOTE: data must be generated with [`bigcode-ast-tools`][1] before this
        tool can be used.
        
        `bigcode-embeddings` lets you generate and visualize embeddings for
        AST nodes.
        
        ## Install
        
        This project should be used with Python 3.
        
        To install the package, either run
        
        ```
        pip install bigcode-embeddings
        ```
        
        or clone the repository and run
        
        ```
        cd bigcode-embeddings
        pip install -r requirements.txt
        python setup.py install
        ```
        
        NOTE: tensorflow needs to be installed separately.
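        For example, the CPU build can be installed with pip (GPU builds use a
        different package name depending on the TensorFlow version):
        
        ```
        # Install the CPU build of TensorFlow; pin a version
        # compatible with this tool if needed.
        pip install tensorflow
        ```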
        
        ## Usage
        
        ### Training embeddings
        
        Training data can be generated using [`bigcode-ast-tools`][1].
        
        Given a `data.txt.gz` generated from a vocabulary of size 30000,
        100D embeddings can be trained using
        
        ```
        ./bin/bigcode-embeddings train -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 data.txt.gz
        ```
        
        [TensorBoard][2] can be used to visualize training progress:
        
        ```
        tensorboard --logdir embeddings/
        ```
        
        After the first epoch, the embedding visualization becomes available in
        TensorBoard. The vocabulary TSV file generated by `bigcode-ast-tools` can
        be loaded to label the embeddings.
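        As a rough sketch of what reading that file involves (the exact column
        layout depends on the `bigcode-ast-tools` version, so the assumption
        below that the label sits in the first column may need adapting):
        
        ```python
        import csv
        
        def load_vocab_labels(path):
            """Read a tab-separated vocabulary file and return its labels.
        
            Assumes the label is the first column of each row; adjust the
            column index to match the actual layout of your vocab.tsv.
            """
            with open(path, newline="") as f:
                return [row[0] for row in csv.reader(f, delimiter="\t")]
        
        # Example with a small hypothetical file written on the fly:
        with open("vocab.tsv", "w") as f:
            f.write("Identifier\t120\nMethodDeclaration\t45\n")
        
        print(load_vocab_labels("vocab.tsv"))
        # ['Identifier', 'MethodDeclaration']
        ```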
        
        ### Visualizing the embeddings
        
        Trained embeddings can be visualized using the `visualize` subcommand.
        If the generated vocabulary file is `vocab.tsv`, the above embeddings
        can be visualized with the following command:
        
        ```
        ./bin/bigcode-embeddings visualize clusters -m embeddings/embeddings.bin-STEP -l vocab.tsv
        ```
        
        where `STEP` should be the largest step number found in the `embeddings/` directory.
        
        The `-i` flag can be passed to generate an interactive plot.
        
        [1]: ../bigcode-ast-tools/README.md
        [2]: https://github.com/tensorflow/tensorboard
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
