Metadata-Version: 2.1
Name: ml-logger
Version: 0.1.18
Summary: A print and debugging utility that makes your error printouts look nice
Home-page: https://github.com/episodeyang/ml_logger
Author: Ge Yang
Author-email: yangge1987@gmail.com
License: UNKNOWN
Keywords: ml_logger,vis_serverlogging,debug,debugging
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Requires-Dist: typing
Requires-Dist: numpy
Requires-Dist: termcolor
Requires-Dist: params-proto
Requires-Dist: cloudpickle
Requires-Dist: japronto
Requires-Dist: uvloop (==0.8.1)
Requires-Dist: requests
Requires-Dist: requests-futures
Requires-Dist: hachiko
Requires-Dist: sanic
Requires-Dist: sanic-cors
Requires-Dist: dill

ML-Logger, A Beautiful Remote Logging Utility for Any Python ML Project
=======================================================================

A common pain point after you start launching ML training jobs on AWS
is the lack of a good way to manage and visualize your data. So far, a
common practice is to upload your experiment data to AWS S3 or Google
Cloud Storage buckets. One then quickly realizes that downloading data
from S3 can be slow. S3 does not offer differential sync like Google
Cloud's ``gsutil rsync``. This makes it hard to sync a large collection
of data that is constantly appended to.

Visualization Dashboard (Preview)
---------------------------------

Incoming: a real-time visualization dashboard (and server!)

|ml visualization dashboard|

An Example Log from ML-Logger
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So far, the best way we have found to organize experimental data is to
have a centralized instrumentation server. Compared with managing your
data on S3, a centralized instrumentation server makes it much easier to
move experiments around, run analysis that is co-located with your data,
and host visualization dashboards on the same machine. To download
data locally, you can use ``sshfs``, ``samba``, ``rsync`` or a variety of
remote disks, all of which are faster than S3.

ML-Logger is the logging utility that allows you to do this. To make
ML-Logger easy to use, we made it work with zero configuration, logging
to your local hard drive by default. When the logging directory passed to
``logger.configure(log_directory=<your directory>)`` is an HTTP
endpoint, the logger instantiates a fast, future-based logging client
that launches HTTP requests in a separate thread. We optimized the
client so that it won't slow down your training code.
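
The future-based client described above can be sketched with the
standard library alone. This is a minimal illustration, not ML-Logger's
actual implementation: ``is_remote``, ``AsyncLogClient`` and the injected
``send`` callable are hypothetical names, and a real client would issue
an HTTP request where this sketch calls ``send``.

.. code-block:: python

    from concurrent.futures import Future, ThreadPoolExecutor
    from typing import Any, Callable, Dict, List
    from urllib.parse import urlparse


    def is_remote(log_directory: str) -> bool:
        """Return True when the log directory looks like an HTTP(S) endpoint."""
        return urlparse(log_directory).scheme in ("http", "https")


    class AsyncLogClient:
        """Hypothetical client that hands each log entry to a worker thread."""

        def __init__(self, send: Callable[[Dict[str, Any]], Any], max_workers: int = 1):
            # ``send`` stands in for the function that performs the HTTP request.
            self._send = send
            self._pool = ThreadPoolExecutor(max_workers=max_workers)
            self._pending: List[Future] = []

        def log(self, **entry: Any) -> Future:
            # submit() returns immediately, so the training loop is not blocked.
            future = self._pool.submit(self._send, entry)
            self._pending.append(future)
            return future

        def flush(self) -> None:
            # Wait for every outstanding request; result() re-raises worker errors.
            for future in self._pending:
                future.result()
            self._pending.clear()


    received = []  # in-memory stand-in for the logging server
    client = AsyncLogClient(send=received.append)
    for step in range(3):
        client.log(step=step, loss=1.0 / (step + 1))
    client.flush()

A single worker thread keeps the entries in submission order; a larger
pool would trade that ordering for throughput.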

API-wise, ML-Logger makes it easy for you to log textual printouts,
simple scalars, numpy tensors, image tensors, and ``pyplot`` figures.
Because you might also want to read data from the instrumentation
server, we also made it possible to load numpy, pickle, text and binary
files remotely.

In the future, we will start building an integrated dashboard with fast
search, live figure updates, and markdown-based reporting and
dashboarding to go with ML-Logger.

Now give this a try, and profit!

Usage
-----

To **install** ``ml_logger``, do:

.. code-block:: bash

    pip install ml-logger

To start a logging server, run:

.. code-block:: bash

    python -m ml_logger.server

In your project files, do:

.. code-block:: python

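    import numpy as np
    import scipy.misc  # provides face(); newer SciPy releases expose it as scipy.datasets.face
    from params_proto import cli_parse
    from ml_logger import logger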


    @cli_parse
    class Args:
        seed = 1
        D_lr = 5e-4
        G_lr = 1e-4
        Q_lr = 1e-4
        T_lr = 1e-4
        plot_interval = 10
        log_dir = "http://54.71.92.65:8081"
        log_prefix = "https://github.com/episodeyang/ml_logger/blob/master/runs"

    logger.configure(log_directory="http://some.ip.address.com:2000", prefix="your-experiment-prefix!")
    logger.log_params(Args=vars(Args))
    logger.log_file(__file__)


    for epoch in range(10):
        logger.log(step=epoch, D_loss=0.2, G_loss=0.1, mutual_information=0.01)
        logger.log_keyvalue(epoch, 'some string key', 0.0012)
        # when the step index updates, logger flushes all of the key-value pairs to file system/logging server

    logger.flush()

    # Images
    face = scipy.misc.face()
    face_bw = scipy.misc.face(gray=True)
    logger.log_image(index=4, color_image=face, black_white=face_bw)
    image_bw = np.zeros((64, 64, 1))
    image_bw_2 = scipy.misc.face(gray=True)[::4, ::4]

    logger.log_image(index=5, animation=[face] * 5)
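
The comment in the loop above says that key-value pairs are flushed when
the step index changes. A minimal sketch of that buffering behavior,
assuming a hypothetical ``StepBuffer`` class and an in-memory ``rows``
list in place of the file system or logging server:

.. code-block:: python

    from typing import Any, Dict, List, Optional


    class StepBuffer:
        """Hypothetical buffer that collects key-value pairs for one step."""

        def __init__(self) -> None:
            self.rows: List[Dict[str, Any]] = []  # stands in for the log sink
            self._step: Optional[int] = None
            self._data: Dict[str, Any] = {}

        def log(self, step: int, **key_values: Any) -> None:
            # A new step index means the previous step is complete: write it out.
            if self._step is not None and step != self._step:
                self.flush()
            self._step = step
            self._data.update(key_values)

        def flush(self) -> None:
            # Emit the buffered step, if any, then reset the buffer.
            if self._step is not None:
                self.rows.append({"step": self._step, **self._data})
            self._step = None
            self._data = {}


    buffer = StepBuffer()
    buffer.log(0, D_loss=0.2)
    buffer.log(0, G_loss=0.1)   # same step: merged into the same row
    buffer.log(1, D_loss=0.15)  # step changed: step 0 is flushed here
    buffer.flush()              # final flush writes the last step

The explicit ``flush()`` at the end matters in this sketch: without it
the last step would stay in the buffer and never reach the sink.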

This version of the logger also prints a table of the data you are
logging to your ``stdout``. It

-  can silence ``stdout`` per key (per ``logger.log`` call);
-  can print with color:
   ``logger.log(timestep, some_key=green(some_data))``;
-  can print with custom formatting:
   ``logger.log(timestep, some_key=green(some_data, percent))``, where
   ``percent`` is a formatter that renders the value as a percentage;
-  uses the correct ``unix`` table characters (please stop using ``|``
   and ``+``; use ``│`` and ``┼`` instead).
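
As an illustration of the box-drawing characters mentioned above, the
following sketch renders key-value pairs as a two-column table. The
function name ``render_table`` and the fixed column width are
assumptions made for this example, not ML-Logger's own code.

.. code-block:: python

    from typing import Dict


    def render_table(data: Dict[str, object], width: int = 20) -> str:
        """Render key-value pairs as a two-column table with box-drawing characters."""
        top = "╒" + "═" * width + "╤" + "═" * width + "╕"
        sep = "├" + "─" * width + "┼" + "─" * width + "┤"
        bottom = "╘" + "═" * width + "╧" + "═" * width + "╛"
        # Center each key and value inside its column.
        rows = [f"│{str(key):^{width}}│{str(value):^{width}}│" for key, value in data.items()]
        # Rows are separated by a thin rule; the outer frame uses double lines.
        return "\n".join([top, f"\n{sep}\n".join(rows), bottom])


    print(render_table({"timestep": 1999, "episode": 10}))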

A typical printout from this logger looks like the following:

.. code-block:: python

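    from datetime import datetime

    from ml_logger import ML_Logger

    # Use a timestamped directory for each run.
    logger = ML_Logger(log_directory=f"/mnt/bucket/deep_Q_learning/{datetime.now():%Y%m%d-%H%M%S.%f}")

    # G, RUN and Reporting are the experiment's own parameter namespaces (not defined here).
    logger.log_params(G=vars(G), RUN=vars(RUN), Reporting=vars(Reporting))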

This outputs the following:

.. code-block:: text

    ═════════════════════════════════════════════════════
                  G               
    ───────────────────────────────┬─────────────────────
               env_name            │ MountainCar-v0      
                 seed              │ None                
          stochastic_action        │ True                
             conv_params           │ None                
             value_params          │ (64,)               
            use_layer_norm         │ True                
             buffer_size           │ 50000               
          replay_batch_size        │ 32                  
          prioritized_replay       │ True                
                alpha              │ 0.6                 
              beta_start           │ 0.4                 
               beta_end            │ 1.0                 
        prioritized_replay_eps     │ 1e-06               
          grad_norm_clipping       │ 10                  
               double_q            │ True                
             use_dueling           │ False               
         exploration_fraction      │ 0.1                 
              final_eps            │ 0.1                 
             n_timesteps           │ 100000              
            learning_rate          │ 0.001               
                gamma              │ 1.0                 
            learning_start         │ 1000                
            learn_interval         │ 1                   
    target_network_update_interval │ 500                 
    ═══════════════════════════════╧═════════════════════
                 RUN              
    ───────────────────────────────┬─────────────────────
            log_directory          │ /mnt/slab/krypton/machine_learning/ge_dqn/2017-11-20/162048.353909-MountainCar-v0-prioritized_replay(True)
              checkpoint           │ checkpoint.cp       
               log_file            │ output.log          
    ═══════════════════════════════╧═════════════════════
              Reporting           
    ───────────────────────────────┬─────────────────────
         checkpoint_interval       │ 10000               
            reward_average         │ 100                 
            print_interval         │ 10                  
    ═══════════════════════════════╧═════════════════════
    ╒════════════════════╤════════════════════╕
    │      timestep      │        1999        │
    ├────────────────────┼────────────────────┤
    │      episode       │         10         │
    ├────────────────────┼────────────────────┤
    │    total reward    │       -200.0       │
    ├────────────────────┼────────────────────┤
    │ total reward/mean  │       -200.0       │
    ├────────────────────┼────────────────────┤
    │  total reward/max  │       -200.0       │
    ├────────────────────┼────────────────────┤
    │time spent exploring│       82.0%        │
    ├────────────────────┼────────────────────┤
    │    replay beta     │        0.41        │
    ╘════════════════════╧════════════════════╛

Colors and custom formatters can be applied per value:

.. code-block:: python

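    # Assumption: Color and percent are exported at the package level alongside ML_Logger.
    from ml_logger import ML_Logger, Color, percent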

    logger = ML_Logger('/mnt/slab/krypton/unitest')
    logger.log(0, some=Color(0.1, 'yellow'))
    logger.log(1, some=Color(0.28571, 'yellow', lambda v: f"{v * 100:.5f}%"))
    logger.log(2, some=Color(0.85, 'yellow', percent))
    logger.log(3, {"some_var/smooth": 10}, some=Color(0.85, 'yellow', percent))
    logger.log(4, some=Color(10, 'yellow'))
    logger.log_histogram(4, td_error_weights=[0, 1, 2, 3, 4, 2, 3, 4, 5])

Colored output (the values are printed in yellow):

.. code-block:: text

    ╒════════════════════╤════════════════════╕
    │        some        │        0.1         │
    ╘════════════════════╧════════════════════╛
    ╒════════════════════╤════════════════════╕
    │        some        │     28.57100%      │
    ╘════════════════════╧════════════════════╛
    ╒════════════════════╤════════════════════╕
    │        some        │       85.0%        │
    ╘════════════════════╧════════════════════╛
    ╒════════════════════╤════════════════════╕
    │  some var/smooth   │         10         │
    ├────────────────────┼────────────────────┤
    │        some        │       85.0%        │
    ╘════════════════════╧════════════════════╛
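
How a value, a color and a formatter can travel together is sketched
below. This is an illustration under stated assumptions:
``ColoredValue``, ``percent_format`` and ``render`` are hypothetical
names, and the ANSI escape codes stand in for whatever coloring
ML-Logger uses internally (the package lists ``termcolor`` as a
dependency).

.. code-block:: python

    from typing import Any, Callable, NamedTuple, Optional

    ANSI_CODES = {"yellow": "\033[33m", "green": "\033[32m"}
    ANSI_RESET = "\033[0m"


    def percent_format(value: float) -> str:
        """Format a fraction as a percentage with one decimal place."""
        return f"{value * 100:.1f}%"


    class ColoredValue(NamedTuple):
        """Hypothetical container pairing a value with a color and a formatter."""

        value: Any
        color: str
        formatter: Optional[Callable[[Any], str]] = None


    def render(item: ColoredValue) -> str:
        # Apply the custom formatter first, then wrap the text in a color code.
        text = item.formatter(item.value) if item.formatter else str(item.value)
        return f"{ANSI_CODES[item.color]}{text}{ANSI_RESET}"


    print(render(ColoredValue(0.85, "yellow", percent_format)))
    print(render(ColoredValue(0.28571, "yellow", lambda v: f"{v * 100:.5f}%")))

Keeping the formatter separate from the color means the raw number is
still available for storage, while only the printed text is decorated.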

TODO:
-----

-  [ ] Integrate with visdom, directly plot locally.

   -  (better to keep it separate, because visdom is shitty.)
   -  ml\_logger does NOT know the full data set. Therefore we should
      not expect it to do the data processing such as taking mean,
      reservoir sampling etc. **Where should this happen though?**
   -  just log to visdom for now. Use the primitive ``plot.ly`` plotting
      interface.
   -  data: keys/values

.. |ml visualization dashboard| image:: https://github.com/episodeyang/ml_logger/blob/master/figures/ml_visualization_dashboard_preview.png?raw=true


