Command Line Reference

The Samplepath Analysis Toolkit

1 Invocation

This tool provides command line utilities for sample-path analysis of flow-process datasets in CSV form and writes outputs to the local filesystem.

Invoke it from the command line as follows:

flow <command> <csv-file> [options]

2 Commands

3 Analyze Command

What it does:

  1. Parse the CLI arguments.
  2. Create the output directory structure.
  3. Copy the input CSV into the scenario folder.
  4. Write the CLI parameters into the scenario folder.
  5. Run the sample-path analysis.
  6. Generate charts and/or export files and write them to the output directory.
  7. Print the paths to the generated charts and/or export files.

Example

flow analyze events.csv \
  --completed \
  --outlier-iqr 1.5 \
  --lambda-pctl 99 \
  --output-dir charts \
  --scenario weekly_report \
  --clean

4 CLI Options

4.1 Inputs and Outputs

Input Format

The input format is simple. The CSV requires three columns:

You may also include other columns. They are ignored in the computation (except for a class column used for filtering) but preserved in the elements export.

Results and charts are saved to the output directory as follows:

Output Layout

For input events.csv, output is organized as:

<output-dir>/
└── events/
    └── <scenario>/                 # e.g., latest
        ├── input/                  # input snapshots
        ├── exports/                # CSV exports (if enabled)
        ├── core/                   # core metrics & tables
        ├── convergence/            # limit estimates & diagnostics
        ├── convergence/panels/     # multi-panel figures
        ├── stability/panels/       # stability/variance panels
        └── advanced/               # optional deep-dive charts
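The layout above can be reproduced with a short Python sketch that only computes the expected paths for a given input file and scenario. The helper name scenario_dirs is hypothetical, and the default scenario name latest is an assumption taken from the "e.g." comment in the tree; the toolkit's own directory logic may differ.

```python
from pathlib import Path

# Subdirectories listed in the output layout (relative to <scenario>/).
SUBDIRS = [
    "input",
    "exports",
    "core",
    "convergence",
    "convergence/panels",
    "stability/panels",
    "advanced",
]


def scenario_dirs(output_dir: str, csv_file: str, scenario: str = "latest") -> dict:
    """Hypothetical helper: map each subdirectory name to its expected path.

    The input file's stem (events.csv -> events) becomes the second level,
    and the scenario name the third. Nothing is created on disk.
    """
    root = Path(output_dir) / Path(csv_file).stem / scenario
    return {name: root / name for name in SUBDIRS}


dirs = scenario_dirs("charts", "events.csv", "weekly_report")
print(dirs["exports"].as_posix())  # charts/events/weekly_report/exports
```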

Output Configuration

4.2 CSV Parsing

4.3 Data Filters

Row filters

Drop rows from the CSV before running the analysis. This is useful for isolating subprocesses within the main file; combine it with --scenario to save each subprocess's results separately.
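A minimal sketch of what a class-based row filter does, using only the standard library. The column names id, start_ts, end_ts and class, and the helper filter_rows, are assumptions for illustration; the toolkit's actual filter flags are not shown in this section.

```python
import csv
import io


def filter_rows(reader, column: str, keep: set):
    """Hypothetical filter: yield only rows whose `column` value is in `keep`."""
    for row in reader:
        if row.get(column) in keep:
            yield row


# Assumed input columns; a real file may name them differently.
data = io.StringIO(
    "id,start_ts,end_ts,class\n"
    "1,2024-01-01,2024-01-03,bug\n"
    "2,2024-01-02,,feature\n"
    "3,2024-01-02,2024-01-05,bug\n"
)
bugs = list(filter_rows(csv.DictReader(data), "class", {"bug"}))
print([r["id"] for r in bugs])  # ['1', '3']
```

The filtered rows would then be analyzed as a process of their own, which is why pairing the filter with a distinct --scenario keeps their outputs apart.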

Outlier Trimming

Remove outliers so that convergence can be assessed on the remaining process.
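The --outlier-iqr 1.5 value in the example above suggests Tukey-style interquartile-range fences. The sketch below assumes trimming is applied to the sojourn times of completed items and keeps values inside [Q1 - k·IQR, Q3 + k·IQR], using Python's statistics.quantiles with its default "exclusive" method. The toolkit may use a different quantile method, trim only the upper side, or trim a different quantity, so treat this as illustrative only.

```python
import statistics


def iqr_trim(values: list, k: float = 1.5) -> list:
    """Keep values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


sojourn_days = [1, 2, 2, 3, 3, 4, 5, 30]
print(iqr_trim(sojourn_days, k=1.5))  # [1, 2, 2, 3, 3, 4, 5]
```

Here Q1 = 2.0 and Q3 = 4.75, so the upper fence is 8.875 and the 30-day item is dropped.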

5 Analysis Mode

The options in this section select between event mode and calendar mode when producing charts and exports.

In event mode (the default), metrics are indexed by events and evaluated at exact event timestamps. In calendar mode, metrics are indexed by calendar boundaries.

Calendar mode metrics are always derived by sub-sampling, not by aggregation. In principle, this behaves as if all metrics are first computed in event mode at the full timestamp resolution of the input data, and calendar mode then selects exact pre-computed values at calendar boundaries for reporting. In practice, values at calendar boundaries are computed by taking definite integrals of the finest-granularity data. In either case, no information is lost in this process.

Both modes present identical metric values at shared timestamps; calendar mode simply reports fewer points. This differs fundamentally from how most flow-metrics tools compute metrics today, which is by aggregating over calendar buckets.
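The sub-sampling claim can be checked numerically. In this sketch, times are plain floats (for example, days since the start of the window), each element is an (arrival, departure) pair with None for items still in process, and the helper names presence_mass and in_process are hypothetical. Calendar-mode H(T) at a boundary is obtained by integrating N(t) directly; it matches the event-mode value at the last event before the boundary, extended by the constant N over the remaining gap.

```python
def presence_mass(intervals, T):
    """H(T): integral of N(t) over [0, T], summed as each item's overlap with [0, T]."""
    total = 0.0
    for a, d in intervals:
        end = T if d is None else min(d, T)
        if end > a:
            total += end - a
    return total


def in_process(intervals, t):
    """N(t) = A(t) - D(t), counting events at or before t."""
    arrivals = sum(1 for a, _ in intervals if a <= t)
    departures = sum(1 for _, d in intervals if d is not None and d <= t)
    return arrivals - departures


intervals = [(0.0, 3.0), (1.0, 9.0), (2.0, None), (8.0, 12.0)]

# Event mode: evaluate H at every distinct event timestamp.
events = sorted({t for a, d in intervals for t in (a, d) if t is not None})
event_H = {t: presence_mass(intervals, t) for t in events}

# Calendar mode: evaluate the same cumulative function at weekly boundaries.
boundaries = [7.0, 14.0]
calendar_H = {T: presence_mass(intervals, T) for T in boundaries}

# N(t) is constant between events, so extending the last event-mode value
# by N * (gap) must reproduce the calendar value exactly.
for T in boundaries:
    last = max(t for t in events if t <= T)
    extended = event_H[last] + in_process(intervals, last) * (T - last)
    assert extended == calendar_H[T]

print(calendar_H)  # {7.0: 14.0, 14.0: 27.0}
```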

See the metric definitions under Charts and Exports for details.

--sampling-frequency (default: None)
Enables calendar mode analysis when set. Charts and exported data are indexed to calendar boundaries rather than raw event timestamps.
Accepted values: day, week, month, quarter, year, or pandas aliases like D, W-MON, MS, QS-JAN, YS-JAN.

In calendar mode:

--anchor (default: None)
Specifies the anchor for calendar frequency boundaries.

Only meaningful when --sampling-frequency is set.

5.1 Export Configuration

5.2 Chart Configuration

6 Charts and Exports

This package produces two kinds of output: CSV files containing flow analysis data, and charts visualizing key metrics, exported as PNG or SVG images.

The CSV dataset is the normative output of the package; all charts are derived views and introduce no additional semantics. Treat this dataset as the primary analytical artifact and data contract, and explore it using other tools.

6.1 Output files

When exports are enabled, two CSV files are written under <scenario>/exports/.

The two output files and the metrics they export are described below.

6.2 Flow Metrics CSV

The output filename depends on the analysis mode (see above):

Event mode flow metrics

Each metric is indexed to one or more events occurring at a single timestamp (mathematically, a point process) and represents the cumulative effect of those events on the evolution of flow over the observation window. A key measure of flow accumulation is cumulative presence mass (defined below).

Flow metrics are meaningful for reasoning about flow dynamics only when computed in exact event order and accumulated via integration of presence mass.

Note: This is the only correct definition of flow metrics for an arrival–departure process when the objective is to reason rigorously about flow dynamics.

Conventions

All metrics except \(N(t)\) are cumulative. Whether a definition depends on the instantaneous time \(t\) or the interval length \(T\) is intentional and significant.

Column Description
timestamp Event timestamp \(t\)
element_id Element ID: a single ID if unique; a string of the form A:id;id|D:id;id when multiple elements share the timestamp; or empty if unresolvable
event_type A (arrival), D (departure), A/D (both), or - (none)
A(T) Cumulative arrivals up to and including the current timestamp t
D(T) Cumulative departures up to and including the current timestamp t
N(t) Net number in process at this timestamp: \[N(t) = A(T) - D(T)\]
H(T) Cumulative presence mass, the area under N(t): \[H(T) = \int_0^T N(t)\,dt\]
L(T) Time-average number in process (presence mass per unit time): \[L(T) = \frac{H(T)}{T}\]
Lambda(T) Arrival rate: \[\Lambda(T) = \frac{A(T)}{T}\]
Theta(T) Departure rate: \[\Theta(T) = \frac{D(T)}{T}\]
w(T) Residence time per arrival: \[w(T) = \frac{H(T)}{A(T)}\]
w'(T) Residence time per departure: \[w'(T) = \frac{H(T)}{D(T)}\]
W*(T) Element-wise empirical mean sojourn time for completed items: \[W^*(T) = \text{Avg}(d_i - a_i), \quad d_i \in (0, T]\]
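The definitions in the table can be computed directly from a list of elements. This is a sketch under stated assumptions, not the toolkit's implementation: times are floats measured from the start of the observation window (so the window starts at 0), each element is an (a_i, d_i) pair with None for items still in process, ratios with a zero denominator are reported as None, and the element_id and event_type columns are omitted. The function names ratio and event_metrics are hypothetical.

```python
from typing import Optional


def ratio(num: float, den: float) -> Optional[float]:
    """num / den, or None when the denominator is zero."""
    return num / den if den else None


def event_metrics(intervals):
    """One row per distinct event timestamp T (O(n^2), written for clarity)."""
    times = sorted({t for a, d in intervals for t in (a, d) if t is not None})
    rows = []
    H, prev_t, N = 0.0, 0.0, 0
    for T in times:
        H += N * (T - prev_t)  # integrate the step function N(t) up to T
        A = sum(1 for a, _ in intervals if a <= T)
        D = sum(1 for _, d in intervals if d is not None and d <= T)
        N = A - D  # number in process just after the events at T
        done = [d - a for a, d in intervals if d is not None and d <= T]
        rows.append({
            "timestamp": T, "A(T)": A, "D(T)": D, "N(t)": N, "H(T)": H,
            "L(T)": ratio(H, T), "Lambda(T)": ratio(A, T), "Theta(T)": ratio(D, T),
            "w(T)": ratio(H, A), "w'(T)": ratio(H, D),
            "W*(T)": ratio(sum(done), len(done)),  # completed items only
        })
        prev_t = T
    return rows


intervals = [(0.0, 3.0), (1.0, 9.0), (2.0, None), (8.0, 12.0)]
rows = event_metrics(intervals)
last = rows[-1]
print(last["H(T)"], last["w(T)"], last["W*(T)"])  # 25.0 6.25 5.0
```

Because L(T) = H/T, Λ(T) = A/T and w(T) = H/A, the identity L(T) = Λ(T)·w(T) holds at every row by construction; this is the finite-window form of Little's Law that the cumulative definitions preserve.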

Calendar mode flow metrics

Calendar mode reports the same metrics as event mode, evaluated at fixed calendar boundaries rather than at individual events.

Conventions

Column Description
<frequency> Calendar boundary timestamp \(t\)
A(T) Cumulative arrivals: \(A(T)\)
D(T) Cumulative departures: \(D(T)\)
N(T) Work in process at the boundary, i.e. \(N(t)\) evaluated at \(t = T\): \[N(T) = A(T) - D(T)\]
H(T) Cumulative presence mass: \[H(T) = \int_0^T N(t)\,dt\]
L(T) Time-average WIP: \[L(T) = \frac{H(T)}{T}\]
Lambda(T) Arrival rate: \[\Lambda(T) = \frac{A(T)}{T}\]
Theta(T) Departure rate: \[\Theta(T) = \frac{D(T)}{T}\]
w(T) Residence time per arrival: \[w(T) = \frac{H(T)}{A(T)}\]
w'(T) Residence time per departure (completed items only): \[w'(T) = \frac{H(T)}{D(T)}\]
W*(T) Element-wise empirical mean sojourn time: \[W^*(T) = \text{Avg}(d_i - a_i), \quad d_i \in (0, T]\]

Lean/Kanban Flow Metrics Mappings

The following table defines the normative mapping between sample path flow metrics and common industry terminology used in Lean/Kanban practice and commercial tooling.

Normative statement

Sample path flow metrics, as defined above, are the canonical and mathematically correct definitions of flow metrics. They are the only definitions that allow consistent, rigorous reasoning about flow using Little’s Law in both stable and unstable processes.

Industry-standard definitions and Lean/Kanban tools commonly diverge from these definitions in the following ways:

  • Metrics are defined over calendar buckets (days, weeks, etc.) rather than event timestamps.
  • Instantaneous WIP and time-average WIP are conflated: instantaneous WIP is averaged over calendar buckets before the cumulative flow diagram (CFD) is even constructed. From that point on, causal reasoning is impossible, because the link between individual events and their effect on flow has already been discarded.
  • CFDs are constructed over the long run (as they should be), but flow metrics such as cycle time and throughput are computed over short operational windows rather than cumulatively over the same long observation horizon. Reasoning rigorously about flow dynamics such as stability and convergence requires consistent observation windows, which is why industry-standard flow metrics are not helpful in this regard.

Any flow metric that does not preserve event ordering, cumulative integration of presence mass, and long-run convergence properties should be treated as a point-in-time reporting snapshot of unrelated flow statistics over a process, not as a metric suitable for rigorous reasoning about flow dynamics.

The mappings in the table below should be interpreted as terminological correspondences only and do not imply equivalence of computation or mathematical meaning.

Metric Vernacular name Rough Lean / Kanban mapping
A(T) Cumulative arrivals Top line in cumulative flow diagram (lossy)
D(T) Cumulative departures Bottom line in cumulative flow diagram (lossy)
N(t) Instantaneous work in process WIP: the vertical distance between the two lines at a given calendar bucket in the CFD
H(T) Cumulative presence mass Not measured; informally recognized, visually, as the area between the two lines in the CFD
L(T) Time-average WIP Average WIP at fixed calendar buckets (lossy)
Lambda(T) Cumulative arrival rate Arrival rate (within a short operational window)
Theta(T) Cumulative departure rate Throughput (within a short operational window)
w(T) Average residence time per arrival No equivalent
w'(T) Average residence time per departure No equivalent
W*(T) Empirical average sojourn time Lead Time/Cycle Time etc. depending on the arrival/departure event semantics. Usually measured over a different window compared to Throughput/Arrival rate/WIP
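To see why bucket-averaged WIP is marked lossy, the sketch below compares the exact time average L(T) = H(T)/T with the mean of end-of-day WIP snapshots, one common style of bucket averaging (tools vary, so this is an illustrative assumption). Times are floats in days, and the helper names are hypothetical. Two items that arrive and depart within day 1 never appear in a snapshot, although they contribute to presence mass.

```python
def presence_mass(intervals, T):
    """H(T) = integral of N(t) over [0, T]; each item contributes its overlap with [0, T]."""
    return sum(max(0.0, (T if d is None else min(d, T)) - a) for a, d in intervals)


def in_process(intervals, t):
    """N(t) = A(t) - D(t), counting events at or before t."""
    arrivals = sum(1 for a, _ in intervals if a <= t)
    departures = sum(1 for _, d in intervals if d is not None and d <= t)
    return arrivals - departures


intervals = [(0.0, 0.5), (0.25, 0.75), (1.5, 2.5)]  # (arrival, departure) in days
T = 2.0

exact_L = presence_mass(intervals, T) / T  # L(T) = H(T) / T
snapshots = [in_process(intervals, day) for day in (1.0, 2.0)]  # end-of-day WIP
bucket_avg = sum(snapshots) / len(snapshots)

print(exact_L, bucket_avg)  # 0.75 0.5
```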

6.3 Element CSV — elements.csv

One row per element, sorted by start_ts ascending.

Column Description
element_id Element identifier (renamed from input id)
start_ts Start timestamp \(a_i\)
end_ts End timestamp \(d_i\) (empty for incomplete items)
input columns All remaining columns from the input CSV
sojourn_time Element sojourn time: \[d_i - a_i\] (NaN if incomplete)
residence_time Element residence time within the observation window
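A sketch of how the element rows described above could be derived from input rows, using only the standard library. It assumes input columns named id, start_ts, end_ts (plus any extra columns), ISO-format dates, and sojourn times expressed in days; these names, the unit, and the helper element_rows are assumptions. residence_time is left out because it depends on the observation window definition below.

```python
import csv
import io
import math
from datetime import datetime


def element_rows(csv_text: str) -> list:
    """Hypothetical export: rename id, keep extra columns, add sojourn_time in days."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rec = dict(rec)
        element_id = rec.pop("id")  # renamed from the input id column
        row = {"element_id": element_id, **rec}  # remaining input columns keep their order
        start = datetime.fromisoformat(rec["start_ts"])
        end = datetime.fromisoformat(rec["end_ts"]) if rec["end_ts"] else None
        # sojourn_time = d_i - a_i (in days here); NaN for incomplete items
        row["sojourn_time"] = (
            (end - start).total_seconds() / 86400 if end is not None else math.nan
        )
        rows.append(row)
    rows.sort(key=lambda r: datetime.fromisoformat(r["start_ts"]))  # start_ts ascending
    return rows


elements = element_rows(
    "id,start_ts,end_ts,class\n"
    "3,2024-01-05,,bug\n"
    "1,2024-01-01,2024-01-03,bug\n"
    "2,2024-01-02,2024-01-09,feature\n"
)
for r in elements:
    print(r["element_id"], r["sojourn_time"])
```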

Residence time definition

Let the observation window be \((t_0, t_n]\).

6.4 Charts

A complete reference to the charts produced can be found in the Chart Reference.

7 Esoteric settings

These settings are useful for configuring the visualizations and parameters of specific charts and analyses.

7.1 Lambda Fine Tuning

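Early in the observation window, \(\Lambda(T) = A(T)/T\) is computed over a small \(T\) and can be very large. Dropping these early points from the \(\Lambda(T)\) chart lets the remainder display on a more meaningful scale.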
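The effect can be illustrated with a short sketch. It evaluates Λ(T) = A(T)/T at a few times (floats measured from the start of the window) and discards points before a warm-up time. The helper names and the warmup parameter are hypothetical; the toolkit's own tuning options, such as --lambda-pctl in the example above, may work differently (for instance, by percentile rather than by time).

```python
def lambda_series(arrival_times, times):
    """Lambda(T) = A(T) / T at each evaluation time T > 0."""
    return [(T, sum(1 for a in arrival_times if a <= T) / T) for T in times if T > 0]


def drop_early(series, warmup):
    """Hypothetical helper: discard points with T < warmup before plotting."""
    return [(T, lam) for T, lam in series if T >= warmup]


arrivals = [0.1, 0.2, 1.0, 2.0, 3.0, 4.0, 5.0]
series = lambda_series(arrivals, [0.2, 1.0, 2.0, 3.0, 4.0, 5.0])
trimmed = drop_early(series, warmup=1.0)

print(series[0])  # (0.2, 10.0): two arrivals over a tiny T dominate the y-axis
print(max(lam for _, lam in trimmed))  # 3.0
```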

7.2 Convergence Thresholds