Abstract
Event logs are the main source for business process mining techniques. However, only some existing systems produce readily available logs, and such systems may not always be part of the environment under investigation. Furthermore, logs created by a given information system may reflect only parts of the full process, while other parts may span additional systems. We suggest that network traffic data associated with business processes can fill this gap, both in availability and in span. However, traffic data are technically oriented and noisy, and a large conceptual gap separates these data from business-meaningful event logs. We therefore set two aims. The first is to assess whether the gap between technical-level traffic data and conceptual-level business activities can be bridged. The second, once this is established, is to automatically recognize business activities within network traffic data, given that these data contain interleaved activities performed in parallel. To address the first aim, we develop a conceptual model of traffic behavior that corresponds to a business activity. We use simulated traffic data annotated by the originating activity and perform an iterative process of abstracting and filtering the data, along with the application of process discovery. As a result, we obtain distinct process models for specific activity types and a generic higher-level model of traffic behavior in a business activity. To address the second aim, relying on the insights gained from the conceptual models, we propose a method that uses sequence learning to identify activity types and their boundaries (start and end) within network traffic data. The evaluation shows that the proposed approach achieves high precision and recall in classifying packets by activity, even when activities are performed in parallel and their data are interleaved.
Original language | English |
---|---|
Pages (from-to) | 1827-1854 |
Number of pages | 28 |
Journal | Software and Systems Modeling |
Volume | 22 |
Issue number | 6 |
State | Published - Dec 2023 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
Keywords
- Activity recognition
- Event abstraction
- Interleaved data
- Network traffic
- Process mining
- Sequence models
ASJC Scopus subject areas
- Software
- Modeling and Simulation