A central problem in the area of Process Mining is to obtain a formal model that represents the processes that are conducted in a system. If realized, this simple motivation allows for powerful techniques that can be used to formally analyze and optimize a system, without the need to resort to its semiformal and sometimes inaccurate specification.
The problem addressed in this paper is known as Process Discovery: obtaining a formal model from a set of system executions. The theory of regions is a valuable tool in process discovery: it aims at learning a formal model (a Petri net) from a set of traces. In its genuine form, the theory is applied to an automaton, and therefore the traces must first be converted into an acyclic automaton before these techniques can be applied. Given that the complexity of region-based techniques depends on the size of the input automaton, revealing the underlying cycles and folding the initial automaton can significantly reduce that complexity.
We follow this idea by incorporating region information into the cycle detection algorithm, enabling the identification of complex cycles that cannot be obtained efficiently with state-of-the-art techniques. The experimental results obtained with the devised tool suggest that the techniques presented in this paper are a significant step toward widening the application of the theory of regions in Process Mining for industrial scenarios.
Region-Based Foldings in Process Discovery
Discovering global patterns that can be used to make predictions about the future has been one of the key elements that have made Data Mining one of the most relevant research areas of the last decades. Data mining techniques can be applied naturally to large amounts of data, such as databases or even the Internet, and, with the help of other disciplines such as statistics or machine learning, can effectively reveal important patterns in many scenarios such as health care, business, or transportation. As in data mining, Process Discovery tries to reveal patterns.
However, the patterns targeted by Process Discovery techniques are process models, i.e., formal representations of the processes of a system. Due to their different focus, Process Discovery techniques apply disciplines different from the ones used in data mining, to allow for the derivation of both the statics and the dynamics of a system process.
Depending on the emphasis, different dimensions can be considered, ranging from social (the identification of communities) to control-flow (the identification of the complex interplay between a system’s tasks). In this work we consider the latter: discovering a Petri net from a log, that is, from a set of traces corresponding to executions of a system. The first method to obtain a Petri net from a log was presented.
The theory of regions was initially proposed to solve the synthesis problem: obtaining a Petri net whose behavior is equivalent to that of a given transition system (TS). Three conversions from a language to a TS have been proposed, namely sequence, multiset, and set.
The main difference between them is how it is decided whether the occurrence of an event in a trace produces a new state in the TS or just introduces an arc to an existing state. Besides these conversions, a number of additional conversions have been proposed in the literature that yield smaller TSs by means of abstractions, at the cost of sacrificing regions.
We use the term abstraction techniques to refer to them. The fundamental difference between all these methods and our proposal is that, in our case, the set of sacrificed regions is controlled by bounds that are already used by process discovery tools, so the compression of the TS does not involve a reduction in quality.
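To make the sequence, multiset, and set conversions mentioned above concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes the usual reading in which a TS state is an abstraction of the prefix of the trace observed so far: the full sequence of events, the multiset of events, or the set of events. The names `build_ts` and `abstract`, as well as the example log, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_ts(log, conversion="sequence"):
    """Build a transition system (states, arcs) from a log of traces.

    Assumption: a state is an abstraction of the prefix seen so far.
    Two prefixes with the same abstraction are mapped to the same state.
    """

    def abstract(prefix):
        if conversion == "sequence":
            return tuple(prefix)                       # order and counts matter
        if conversion == "multiset":
            return frozenset(Counter(prefix).items())  # counts matter, order does not
        if conversion == "set":
            return frozenset(prefix)                   # only the events seen matter
        raise ValueError(f"unknown conversion: {conversion}")

    states, arcs = set(), set()
    for trace in log:
        prefix = []
        source = abstract(prefix)
        states.add(source)
        for event in trace:
            prefix.append(event)
            target = abstract(prefix)
            states.add(target)
            # An arc to an already known state adds no new state to the TS.
            arcs.add((source, event, target))
            source = target
    return states, arcs


# Hypothetical log with three traces over the events a, b, c.
example_log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "b", "c")]

for kind in ("sequence", "multiset", "set"):
    states, arcs = build_ts(example_log, kind)
    print(kind, len(states), len(arcs))
```

On this small log the sequence conversion yields an acyclic, tree-shaped TS, while the set conversion merges more prefixes and already introduces a self-loop for the repeated event `b`. This illustrates why the coarser conversions give smaller TSs.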