Metadata-Version: 2.1
Name: Chrones
Version: 1.0.2
Summary: Software development tool to visualize runtime statistics about your program and correlate them with its phases
Home-page: https://github.com/jacquev6/Chrones
Author: Vincent Jacques
Author-email: vincent@vincent-jacques.net
License: MIT
Description-Content-Type: text/markdown
License-File: LICENSE

<!--
Copyright 2020-2022 Laurent Cabaret
Copyright 2020-2022 Vincent Jacques
-->

*Chrones* is a software development tool to visualize runtime statistics (CPU percentage, GPU percentage, memory usage, *etc.*) about your program and correlate them with the phases of your program.

It aims to be very simple to use and to provide useful information out of the box<!-- @todo(later) *and* at being customizable to your specific use cases -->.

Here is an example of a graph produced by *Chrones* about a shell script launching a few executables (see exactly how this image is generated [at the end of this Readme](#code-of-the-example-image)):

![Example](https://github.com/jacquev6/Chrones/raw/v1.0.2/integration-tests/readme-example/report.png)

*Chrones* was sponsored by [Laurent Cabaret](https://cabaretl.pages.centralesupelec.fr/en/publications/) from the [MICS](http://www.mics.centralesupelec.fr/) and written by [Vincent Jacques](https://vincent-jacques.net).

It's licensed under the [MIT license](http://choosealicense.com/licenses/mit/).
Its [documentation and source code](https://github.com/jacquev6/Chrones) are on GitHub.

Questions? Remarks? Bugs? Want to contribute? Open [an issue](https://github.com/jacquev6/Chrones/issues) or [a discussion](https://github.com/jacquev6/Chrones/discussions)!

<!-- @todo(later) Insert paragraph about Chrones' clients? -->

# Conceptual overview

*Chrones* consists of three parts: instrumentation (optional), monitoring and reporting.

The instrumentation part of *Chrones* runs inside your program after you've modified it.
It's used as a library for your programming language.
To use it, you add one-liners to the functions you want to know about.
After that, your program logs internal timing information about these functions.

The monitoring part is a wrapper around your program.
It runs your program as you instruct it to, preserving its access to the standard input and outputs, the environment, and its command line.
While doing so, it monitors your program's whole process tree and logs resource usage metrics.

The reporting part reads the logs produced by the instrumentation and monitoring, and produces human-readable reports including graphs.

The instrumentation part is completely optional.
You can use the monitoring part on non-instrumented programs,
or even on partially instrumented programs like a shell script calling an instrumented executable and a non-instrumented executable.
The graphs produced by *Chrones*' reporting will just miss information about your program's phases.

We've chosen the command-line as the main user interface for *Chrones* to allow easy integration into your automated workflows.
<!-- @todo(later) It can also be used as a Python library for advanced use-cases. -->

Please note that *Chrones* currently only works on Linux.
Furthermore, the C++ instrumentation requires g++.
We would gladly accept contributions that extend *Chrones*' usability.

*Chrones*' instrumentation libraries are available for <!-- @todo(later) Python,--> C++ and the shell language.

# Expected performance

The instrumentation part of *Chrones* accurately measures and reports durations down to the millisecond.
Its monitoring part takes samples a few times per second.
No nanoseconds in this project; *Chrones* is well suited for programs that run for at least a few seconds.

Overhead introduced by *Chrones* in C++ programs is less than a second per million instrumented blocks.
Don't use it for functions called billions of times.
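
As a rough worked example of what this bound means (a sketch only: `overhead_upper_bound_seconds` is a made-up helper, not part of *Chrones*, and it simply reads "less than a second per million instrumented blocks" as at most one microsecond per block):

```cpp
#include <cassert>

// Upper bound on the overhead, in seconds, for a given number of executed
// instrumented blocks, ASSUMING the bound stated above: at most one second
// per million blocks, i.e. at most one microsecond per block.
// Hypothetical helper for illustration; it is not part of Chrones.
double overhead_upper_bound_seconds(double instrumented_blocks) {
  assert(instrumented_blocks >= 0);
  return instrumented_blocks / 1e6;
}
```

With this reading, a million instrumented blocks cost at most one second, but a billion of them could cost up to about 1000 seconds (roughly 17 minutes), which is why chrones should stay out of your hottest functions.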

# Get started

## Install *Chrones*

The monitoring and reporting parts of *Chrones* are distributed as a [Python package on PyPI](https://pypi.org/project/Chrones/).
Install them with `pip install Chrones`.

<details>
<summary>And at the moment that's all you need. <small>(Click the arrow for more information)</small></summary>

The instrumentation parts are distributed in language-specific ways.

The Python version comes with the `Chrones` Python package you've just installed.

The C++ and shell languages don't really have package managers, so the C++ and shell versions happen to also be distributed within the Python package.

Versions for other languages will be distributed using the appropriate package managers.
</details>

## (Optional) Instrument your code

### Concepts

The instrumentation libraries are based on the following concepts:

#### Coordinator

The *coordinator* is a single object that centralizes measurements and writes them into a log file.

It also takes care of enabling or disabling instrumentation: the log will be created if and only if it detects it's being run inside *Chrones*' monitoring.
This lets you run your program outside *Chrones*' monitoring as if it were not instrumented.

#### Chrone

A *chrone* is the main instrumentation tool.
You can think of it as a stopwatch that logs an event when it's started and another event when it's stopped.

Multiple chrones can be nested.
This makes them particularly suitable to instrument [structured code](https://en.wikipedia.org/wiki/Structured_programming) with blocks and functions (*i.e.* the vast majority of modern programs).
From the log of the nested chrones, *Chrones*' reporting is able to reconstruct the evolution of the call stack(s) of the program.
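
To make the idea concrete, here is a minimal sketch of how a call stack can be rebuilt from a flat sequence of start and stop events. The `ToyEvent` and `ToySpan` types and the `reconstruct_spans` function are made up for this illustration; they are not *Chrones*' actual log format or reporting code, and the sketch only handles a single thread:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy event model, assumed for illustration only; the actual Chrones log
// format is not described in this Readme.
struct ToyEvent {
  enum class Kind { Start, Stop };
  Kind kind;
  std::string name;           // Only meaningful for Start events
  std::int64_t timestamp_ms;
};

struct ToySpan {
  std::string name;
  int depth;                  // 0 for outermost chrones
  std::int64_t duration_ms;
};

// Replays the events in order, keeping a stack of the chrones that are
// currently running: a Start pushes, a Stop pops the most recently started
// one. Spans are returned in the order they were stopped.
std::vector<ToySpan> reconstruct_spans(const std::vector<ToyEvent>& events) {
  std::vector<ToySpan> spans;
  std::vector<const ToyEvent*> running;
  for (const ToyEvent& event : events) {
    if (event.kind == ToyEvent::Kind::Start) {
      running.push_back(&event);
    } else {
      assert(!running.empty());  // A Stop must match an earlier Start
      const ToyEvent* start = running.back();
      running.pop_back();
      spans.push_back({start->name, static_cast<int>(running.size()),
                       event.timestamp_ms - start->timestamp_ms});
    }
  }
  return spans;
}
```

Because chrones are properly nested, a stop event never needs to say which chrone it stops: it's always the one on top of the stack.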

Chrones have three identifying attributes: a *name*, an optional *label* and an optional *index*.
The three of them are used in reports to distinguish between chrones.
Here is their meaning:

- In languages that support it, the name is set automatically from the name of the enclosing function.
In languages that don't, we strongly recommend that you use the same convention: a chrone's name comes from the closest named piece of code.
- It sometimes makes sense to instrument a block inside a function.
The label is here to identify those blocks.
- Finally, when these blocks are iterations of a loop, you can use the index to distinguish them.

See `simple.cpp` at the end of this Readme for a complete example.

<!-- @todo(later) Later because they don't appear on repport.png, only in summaries. #### Mini-chrone -->

### Language-specific instructions

The *Chrones* instrumentation library is currently available for the following languages:

#### Shell

First, import *Chrones* and initialize the coordinator with:

    source <(chrones instrument shell enable program-name)

where `program-name` is... the name of your program.

You can then use the two functions `chrones_start` and `chrones_stop` to instrument your shell functions:

    function foo {
        chrones_start foo

        # Do something

        chrones_stop
    }

`chrones_start` accepts one mandatory argument: the `name`, and two optional ones: the `label` and `index`.
See their description in the [Concepts](#concepts) section above.

#### C++

First, `#include <chrones.hpp>`.
The header is distributed within *Chrones*' Python package.
You can get its location with `chrones instrument c++ header-location`, which you can pass to the `-I` option of your compiler.
For example, ``g++ -I`chrones instrument c++ header-location` foo.cpp -o foo``.

`chrones.hpp` uses variadic macros with `__VA_OPT__`, so if you need to set your `-std` option, you can use either `gnu++11` or `c++20` or later.

Create the coordinator at global scope, before your `main` function:

    CHRONABLE("program-name")

where `program-name` is... the name of your program.

You can then instrument functions and blocks using the `CHRONE` macro:

    int main() {
        CHRONE();

        {
            CHRONE("loop");
            for (int i = 0; i != 100; ++i) {
                CHRONE("iteration", i);
                // Do something
            }
        }
    }

The `CHRONE` macro accepts zero to two arguments: the optional label and index. See their description in the [Concepts](#concepts) section above.
In the example above, all three chrones will have the same name, `"int main()"`.
`"loop"` and `"iteration"` will be the respective labels of the last two chrones, and the last chrone will also have an index.
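
If you're curious how a single `CHRONE();` line can cover a whole block and know the name of the enclosing function, here is one way such a macro could be built. This is a toy sketch, not the code in `chrones.hpp`: `ToyRecord`, `ToyChrone`, `TOY_CHRONE` and `toy_log` are hypothetical names, and the sketch assumes a scope-bound object combined with the `__PRETTY_FUNCTION__` extension of g++:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are not the types or macros
// defined by chrones.hpp.
struct ToyRecord {
  std::string event;  // "start" or "stop"
  std::string name;
  std::string label;  // Empty when no label was given
  int index;          // -1 when no index was given
};

// The log that all toy chrones append to
inline std::vector<ToyRecord>& toy_log() {
  static std::vector<ToyRecord> log;
  return log;
}

// Scope-bound stopwatch: logs "start" when constructed and "stop" when
// destroyed, i.e. when execution leaves the enclosing block.
class ToyChrone {
 public:
  explicit ToyChrone(std::string name, std::string label = "", int index = -1)
      : name_(std::move(name)), label_(std::move(label)), index_(index) {
    toy_log().push_back({"start", name_, label_, index_});
  }

  ~ToyChrone() {
    assert(!toy_log().empty());  // Our own "start" was logged earlier
    toy_log().push_back({"stop", name_, label_, index_});
  }

  ToyChrone(const ToyChrone&) = delete;
  ToyChrone& operator=(const ToyChrone&) = delete;

 private:
  std::string name_;
  std::string label_;
  int index_;
};

#define TOY_CONCAT_(a, b) a##b
#define TOY_CONCAT(a, b) TOY_CONCAT_(a, b)
// Declares a uniquely named ToyChrone whose name is the signature of the
// enclosing function. The trailing comma left by an empty argument list is
// allowed in a braced initializer.
#define TOY_CHRONE(...) \
  ToyChrone TOY_CONCAT(toy_chrone_, __LINE__) { __PRETTY_FUNCTION__, __VA_ARGS__ }

void toy_demo() {
  TOY_CHRONE();
  for (int i = 0; i != 2; ++i) {
    TOY_CHRONE("iteration", i);
  }
}
```

Calling `toy_demo()` appends six records to `toy_log()`: a start and a stop for the function itself, and a start and a stop for each of the two iterations. All of them carry the same name, taken from the enclosing function.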

*Chrones*' instrumentation can be statically disabled by passing `-DCHRONES_DISABLED` to the compiler.
In that case, all macros provided by the header will be empty and your code will compile exactly as if it was not using *Chrones*.
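
The general technique looks like the sketch below. It uses made-up names (`TOY_CHRONES_DISABLED`, `TOY_CHRONE`, `ToyChrone`, `toy_event_count`) rather than the real contents of `chrones.hpp`, and defines the flag in the source only to keep the example self-contained; in practice it would come from the `-D` option:

```cpp
#include <cassert>

// Stands in for passing -DTOY_CHRONES_DISABLED on the compiler command line
#define TOY_CHRONES_DISABLED

// Counts the events that the toy instrumentation has logged
inline int& toy_event_count() {
  static int count = 0;
  return count;
}

// Toy stopwatch, for illustration only: one event at start, one at stop
struct ToyChrone {
  ToyChrone() { ++toy_event_count(); }
  ~ToyChrone() {
    assert(toy_event_count() > 0);  // The matching start was counted
    ++toy_event_count();
  }
};

#ifdef TOY_CHRONES_DISABLED
#define TOY_CHRONE()  // Expands to nothing: only an empty statement remains
#else
#define TOY_CHRONE() ToyChrone toy_chrone_instance
#endif

int toy_work() {
  TOY_CHRONE();
  return 6 * 7;
}
```

With the flag defined, `toy_work()` still returns its result but no event is ever counted, because the compiler never sees a `ToyChrone` object in its body.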

Troubleshooting tip: if you get an `undefined reference to chrones::global_coordinator` error, double-check you're linking with the translation unit that calls `CHRONABLE`.

Known limitations:

- `CHRONE` must not be used before `main` starts or after it returns, *e.g.* in constructors and destructors of static variables

<!-- @todo(later) #### Python

First, import *Chrones*' decorator: `from chrones.instumentation import chrone`.

Then, decorate your functions:

    @chrone
    def foo():
        # Do something

You can also instrument blocks that are not functions:

    with chrone("bar"):
        # Do something

@todo(later) Name, label, and index -->

## Run using `chrones run`

Compile your executable(s) if required.
Then launch them using `chrones run -- your_program --with --its --options`,
or `chrones run --monitor-gpu -- your_program` if your code uses an NVidia GPU.

Everything before the `--` is interpreted as options for `chrones run`.
Everything after is passed as-is to your program.
The standard input and output are passed unchanged to your program.
The exit code of `chrones run` is the exit code of `your_program`.
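
To illustrate what this contract means in practice, here is a minimal sketch of a wrapper that behaves the same way, using plain POSIX calls. `toy_run` is a hypothetical function written for this Readme; it's not how `chrones run` is implemented, and it does no monitoring at all:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Runs `command` as a child process and returns its exit code.
// Toy illustration of a transparent wrapper; it is not Chrones' code.
int toy_run(const std::vector<std::string>& command) {
  assert(!command.empty());

  // execvp expects a null-terminated array of C strings
  std::vector<char*> argv;
  for (const std::string& arg : command) {
    argv.push_back(const_cast<char*>(arg.c_str()));
  }
  argv.push_back(nullptr);

  const pid_t pid = fork();
  if (pid == -1) {
    return -1;  // Could not create the child process
  }
  if (pid == 0) {
    // Child: standard input and outputs and the environment are inherited
    // unchanged, and the arguments are passed as-is
    execvp(argv[0], argv.data());
    _exit(127);  // Only reached if execvp failed
  }

  // Parent: a real monitor would sample resource usage while waiting
  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }
  if (WIFEXITED(status)) {
    return WEXITSTATUS(status);
  }
  // Child killed by a signal: this sketch follows the shell convention
  return 128 + WTERMSIG(status);
}
```

For example, `toy_run({"sh", "-c", "exit 7"})` returns `7`, just like running `sh -c "exit 7"` directly would set `$?` to `7`.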

Have a look at `chrones run --help` for its detailed usage.

## Generate report

Run `chrones report` to generate a report in the current directory.

Have a look at `chrones report --help` for its detailed usage.

<!-- @todo(later) ## Use *Chrones* as a library

Out of the box, *Chrones* produces generic reports and graphs, but you can customize them by using *Chrones* as a Python library. -->

# Code of the example image

As a complete example, here is the shell script that the image at the top of this Readme is about (named `example.sh`):

<!-- START example.sh --><!--
    #!/bin/bash

    set -o errexit
    trap 'echo "Error on ${BASH_SOURCE[0]}:$LINENO"' ERR
--><!-- STOP -->
<!-- EXTEND example.sh -->
    source <(chrones instrument shell enable example)


    function waste_time {
      chrones_start waste_time
      sleep 0.5
      chrones_stop
    }

    waste_time

    dd status=none if=/dev/random of=in.dat bs=16M count=1

    chrones_start run-cpu
    ./cpu
    chrones_stop

    waste_time

    chrones_start run-gpu
    ./gpu
    chrones_stop

    waste_time
<!-- STOP -->
<!-- CHMOD+X example.sh -->

And the two executables called by the script:

- `cpu.cpp`:

<!-- START cpu.cpp -->
    #include <unistd.h>

    #include <fstream>

    #include <chrones.hpp>

    CHRONABLE("cpu");

    void waste_time() {
      CHRONE();

      usleep(500'000);
    }

    void input_and_output() {
      CHRONE();

      char data[4 * 1024 * 1024];

      std::ifstream in("in.dat");

      for (int i = 0; i != 2; ++i) {
        in.read(data, sizeof(data));
        waste_time();
        std::ofstream out("out.dat");
        out.write(data, sizeof(data));
        waste_time();
      }
    }

    void use_cpu(const int repetitions) {
      CHRONE();

      for (int i = 0; i < repetitions; ++i) {
        volatile double x = 3.14;
        for (int j = 0; j != 1'000'000; ++j) {
          x = x * j;
        }
      }
    }

    void use_several_cores() {
      CHRONE();

      #pragma omp parallel for
      for (int i = 0; i != 8; ++i) {
        use_cpu(256 + i * 32);
      }
    }

    int main() {
      CHRONE();

      waste_time();

      input_and_output();

      {
        CHRONE("loop");
        for (int i = 0; i != 2; ++i) {
          CHRONE("iteration", i);

          waste_time();
          use_cpu(256);
        }
      }

      waste_time();

      use_several_cores();
    }
<!-- STOP -->

- `gpu.cu`:

<!-- START gpu.cu -->
    #include <unistd.h>

    #include <cassert>
    #include <cstdlib>

    #include <chrones.hpp>

    const int block_size = 1024;
    const int blocks_count = 128;
    const int data_size = blocks_count * block_size;

    CHRONABLE("gpu");

    void waste_time() {
      CHRONE();

      usleep(500'000);
    }

    void transfer_to_device(double* h, double* d) {
      CHRONE();

      for (int i = 0; i != 8'000'000; ++i) {
        cudaMemcpy(d, h, data_size * sizeof(double), cudaMemcpyHostToDevice);
      }
      cudaDeviceSynchronize();
    }

    __global__ void use_gpu_(double* data) {
      const int i = blockIdx.x * block_size + threadIdx.x;
      assert(i < data_size);

      volatile double x = 3.14;
      for (int j = 0; j != 700'000; ++j) {
        x = x * j;
      }
      data[i] *= x;
    }

    void use_gpu(double* data) {
      CHRONE();

      use_gpu_<<<blocks_count, block_size>>>(data);
      cudaDeviceSynchronize();
    }

    void transfer_to_host(double* d, double* h) {
      CHRONE();

      for (int i = 0; i != 8'000'000; ++i) {
        cudaMemcpy(h, d, data_size * sizeof(double), cudaMemcpyDeviceToHost);
      }
      cudaDeviceSynchronize();
    }

    int main() {
      CHRONE();

      waste_time();

      {
        CHRONE("Init CUDA");
        cudaFree(0);
      }

      waste_time();

      double* h = (double*)malloc(data_size * sizeof(double));
      for (int i = 0; i != data_size; ++i) {
        h[i] = i;
      }

      waste_time();

      double* d;
      cudaMalloc(&d, data_size * sizeof(double));

      waste_time();

      transfer_to_device(h, d);

      waste_time();

      use_gpu(d);

      waste_time();

      transfer_to_host(d, h);

      waste_time();

      cudaFree(d);

      waste_time();

      free(h);

      waste_time();
    }
<!-- STOP -->

<!-- @todo(later) Understand why transfers don't show in the report -->

This code is built using `make` and the following `Makefile`:

<!-- START run.sh --><!--
    #!/bin/bash

    set -o errexit
    trap 'echo "Error on ${BASH_SOURCE[0]}:$LINENO"' ERR

    if [[ -z "$CHRONES_DEV_USE_GPU" ]]
    then
      exit
    fi

    rm -f run-results.json example.*.chrones.csv cpu.*.chrones.csv gpu.*.chrones.csv report.png in.dat out.dat


    make
--><!-- STOP -->
<!-- CHMOD+X run.sh -->

<!-- START Makefile -->
    all: cpu gpu

    cpu: cpu.cpp
    	g++ -fopenmp -O3 -I`chrones instrument c++ header-location` cpu.cpp -o cpu

    gpu: gpu.cu
    	nvcc -O3 -I`chrones instrument c++ header-location` gpu.cu -o gpu
<!-- STOP -->
<!-- EXTEND Makefile --><!--

    cpu: Makefile
    gpu: Makefile
--><!-- STOP -->

It's executed like this:

<!-- EXTEND run.sh -->
    OMP_NUM_THREADS=4 chrones run --monitor-gpu -- ./example.sh
<!-- STOP -->

And the report is created like this:

<!-- EXTEND run.sh -->
    chrones report
<!-- STOP -->

# Known limitations

## Impacts of instrumentation

Adding instrumentation to your program will change what's observed by the monitoring:

- data is continuously output to the log file and this is visible in the "I/O" graph of the report
- the log file is also counted in the "Open files" graph
- in C++, an additional thread is launched in your process, visible in the "Threads" graph

## Non-monotonic system clock

*Chrones* does not handle leap seconds well. But who does, really?

## Multiple GPUs

Machines with more than one GPU are not yet supported.
<!-- @todo(later) Support machines with several GPUs -->

# Developing *Chrones* itself

You'll need a Linux machine with:
- a reasonably recent version of Docker
- a reasonably recent version of Bash

<!-- @todo(later) Support developing on a machine without a GPU. -->
Oh, and for the moment, you need an NVidia GPU, with drivers installed and `nvidia-container-runtime` configured.

To build everything and run all tests:

    ./run-development-cycle.sh

To [bump the version number](https://semver.org) and publish on PyPI:

    ./publish.sh [patch|minor|major]
