- This event has passed.
PhD Colloquium
May 20 @ 11:00 AM - 1:00 PM IST
Title of Thesis: Frequent Episode Mining: Efficient Discovery Algorithms and Significance Analysis
Speaker: Mr. Santhosh B. Gandreti
Date/Time: Monday 20th May 2024
11:00 AM
Venue: MMCR, EE Department
Research Supervisor: Prof. P. S. Sastry
Abstract:
Frequent Pattern Mining is a popular area of data mining aimed at discovering interesting patterns that occur often in a given data. These patterns represent structures encapsulating correlations and dependencies among data elements. This thesis focuses on frequent episode mining which is aimed at discovering temporal patterns known as episodes in sequential data of event sequences.
Episodes are collections of event-types constrained by a partial order. An episode is frequent if its number of occurrences exceeds a user-defined threshold. Techniques for mining episodes employ either Breadth-First Search (BFS) or Depth-First Search (DFS) approaches to navigate the episode space. The talk begins by giving a brief introduction to frequent episode mining and various algorithms dealing with discovery of frequent episodes.
The talk next discusses a novel DFS algorithm for discovering injective general episodes and chain episodes, which are two broad subclasses of episodes. The proposed algorithms are more efficient compared to the state-of-art as demonstrated by empirical results. The talk next considers more complex patterns called episodes with simultaneous events, where the episodes contain multiple event-types at the same time instant. A novel BFS algorithm is presented for discovering serial episodes with simultaneous events. Through simulations on both synthetic and real data sets, the effectiveness and efficiency of the proposed algorithm is demonstrated.
The next part of the talk discusses significance analysis that assesses the statistical relevance of episodes and presents two novel approaches for significance analysis; one for non-overlapped occurrences under Markov-null hypothesis and the other for minimal occurrences under IID-null hypothesis. This analysis helps determine an episode-specific frequency threshold for the episode to be statistically significant, providing a more nuanced understanding of pattern relevance. For both the above mentioned cases, specialized Markov chains capturing the occurrences of interest are derived for a given episode, which help in computing episode-specific thresholds on frequency, to access its significance. Effectiveness of the proposed methods is confirmed through empirical studies.