Metadata-Version: 2.1
Name: da4py
Version: 0.0.6
Summary: da4py implements state-of-the-art Process Mining methods over SAT encoding. An Ocaml version is Darksider.
Home-page: https://github.com/BoltMaud/da4py
Author: Boltenhagen Mathilde
Author-email: boltenhagen@lsv.fr
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

>Author : Boltenhagen Mathilde with Thomas Chatain and Josep Carmona<br>
>Date : 09.2019<br>

## INTRODUCTION 

This project implements **Process Mining** algorithms with  _SAT encodings_ to get optimal results for verification problems.
Boolean formulas are first created, then converted into CNF form and solved with SAT solvers, thanks to  `pysat`.
This librairy used `pm4py` Objects. 

The project is a translation of the Ocaml version `darksider` created by Thomas Chatain and Mathilde Boltenhagen. 

### Scientific papers

- _Encoding Conformance Checking Artefacts in SAT_ by Mathilde Boltenhagen, Thomas Chatain, Josep Carmona <br>
- _Anti-alignments in conformance checking–the dark side of process models_ by Thomas Chatain, Josep Carmona
- _Generalized Alignment-Based Trace Clustering of Process Behavior_ by Mathilde Boltenhagen, Thomas Chatain, Josep Carmona

## ENVIRONNEMENT & INSTALLATION

 `python 3.7.x `

 Simply run : 
 `pip install da4py`

 (https://pypi.org/project/da4py/0.0.1/)


## USAGE

The librairy uses pm4py. 

```python
pm4py.objects.petri import importer
pm4py.objects.log.importer.xes import factory as xes_importer
from da4py.src.main.conformanceArtefacts import ConformanceArtefacts  
# get the data with pm4py 
model, m0, mf = importer.pnml.import_net('<PATH_TO_MODEL>')
traces = xes_importer.import_log('<PATH_TO_LOG>')
```
### Anti-alignment
> Formal definition : 
> Given a finite collection $L$ of log traces and a model $N$, an anti-alignment is a run $u \in Runs(N)$ which maximizes its distance $\min_{\sigma \in L} dist(\sigma,u)$ to the log. 



This launches the main module. This object, the model and the traces must be reloaded for each experimentation. This is an issue that will be fix soon. 
```python
artefacts = ConformanceArtefacts()
```
We can to set the size of the anti-alignment we want (usefull for prefix) : 
```python
artefacts.setSize_of_run(10)
```
For execution times or memory problems, we can set the maximum number of difference that will be tried. 
```python
artefacts.setMax_d(10)
```

Two types of distances are available : 
- Hamming distance
- Edit distance

```python
artefacts.setDistance_type("edit")
```
Then an anti-alignment can be found by running : 
```python
artefacts.antiAlignment(model,m0,mf,traces)
print(artefacts.getRun())
print(artefacts.getTracesWithDistances())
```

### Precision 
Then we can compute precision :
```python
print(artefacts.getPrecision())
```

### Other features 

One can add silent transition label that will not cost in the distances :
```python
artefacts.setSilentLabel("tau")
```

We can also compute sum instead of min :
```python
artefacts.setOptimizeMin(False)
```

### Multi-alignment
The same features (not precision) also work for multi-alignment: 
```python
model, m0, mf = importer.pnml.import_net('<PATH_TO_MODEL>')
traces = xes_importer.import_log('<PATH_TO_LOG>')
artefacts = ConformanceArtefacts()
artefacts.setSilentLabel("tau")
artefacts.setDistance_type("hamming")
artefacts.setOptimizeMin(True)
artefacts.setSize_of_run(10)
artefacts.setMax_d(10)

# run a multi-Alignment
artefacts.multiAlignment(model,m0,mf,traces)
print(artefacts.getRun())
print(artefacts.getTracesWithDistances())
```

## AMSTC 

AMSTC is a trace clustering method that allows one to extract subnet centroids from a process model. The input is then 
a log and a model and it outputs a set of subnets and associated clustered traces. The method is implemented in SAT but a sampling method allows to run large logs.
```python
# process model
model, m0, mf = importer.pnml.import_net('examples/medium/model2.pnml')

# log traces
traces = xes_importer.import_log('examples/medium/model2.xes')

# sampleSize : number of traces that are used in the sampling method
sampleSize= 5 

# sizeOfRun : maximal length requested to compute alignment 
sizeOfRun = 8

# maxNbC : maximal number of transitions per cluster to avoid to get a unique centroid
maxNbC = 5

# m : number of cluster that will be searching at each AMSTC of the sampling method. Understand that more than m cluster can 
be returned. 
m = 2

# maxCounter : as this is a sampling method, maxCounter is the number of fails of AMSTC before the sampling method stops
# silent_label : every transition that contains this string will not cost in alignment
clustering=samplingVariantsForAmstc(net,m0,mf,log,sampleSize,sizeOfRun,maxD,maxNbC,m,maxCounter=1,silent_label="tau")
```

The clustering can then be used like : 
```python
from pm4py.visualization.petrinet import factory as vizu

for (centroid, traces) in clustering:
    if type(centroid) is tuple:
        net, m0,mf=centroid
        vizu.apply(net, m0, mf).view()
        print(traces)
```

## SAT Encoding & Formula Shapes
The tool first constructs SAT formulas using operator classes AND and OR of da4py.utils.formulas.py. Those formulas are fully described in the published related papers. 
```
AND( [], [], 
	AND( [m_ip [0, 0]], [m_ip [0, 1]], 
		AND( [], [], 
			OR( [], [], 
				AND( [tau_it [1, 0]], [tau_it [1, 1], tau_it [1, 2]], ) 
				AND( [tau_it [1, 1]], [tau_it [1, 0], tau_it [1, 2]], ) 
				AND( [tau_it [1, 2]], [tau_it [1, 0], tau_it [1, 1]], )) 
			OR( [], [tau_it [1, 0]], 
				AND( [], [], 
					OR( [], [], 
						AND( [m_ip [1, 0], m_ip [0, 0]], [], ) 
						AND( [], [m_ip [1, 0], m_ip [0, 0]], )) 
					AND( [m_ip [1, 1], m_ip [0, 1]], [], ))) 
                    ...
```
Then, the formula is translated to a WCNF form which is solved with `pysat` library.
```
[[2], [-1], [7, -82], [-8, -82], [-9, -82], [8, -83], [82, 83, 84], [3, -86]...]
```


## ACKNOWLEDGEMENT 

Affiliations : LSV, CNRS, ENS Paris-Saclay, Inria, Université Paris-Saclay


