Sagar Hasmukh Savla

Mr. Sagar Hasmukh Savla

Email id: sagarsavla[at]yahoo[dot]com

Graduation Year: December 2006

Dissertation:

Master's Thesis:

EFFICIENT MAIN MEMORY ALGORITHMS FOR SIGNIFICANT INTERVAL AND FREQUENT EPISODE DISCOVERY

Abstract

There is a considerable research on sequential mining of time-series data. Sensor-based applications such as MavHome require prediction of events for automating the environment using time-series data collected over a period of time. In these applications, it is important to predict tight and accurate intervals of interest to effectively automate the application. Also, detection of frequent patterns is needed for the automation of sequence of happenings. Although, there is a considerable body of work on sequential mining of transactional data, most of them deal with time point data and make several iterations over the entire data set for discovering frequently occurring patterns. An alternative approach consisting of three phases has been proposed for detecting significant intervals and frequent episodes. In the first phase, time-series data is folded over a periodicity (day, week, etc.) using which intervals are formed. The advantage of this approach is that the data set is compressed substantially thereby reducing the size of input used and hence the computation. Also, each event/device can be processed individually allowing for parallel computation of individual events. Significant intervals that satisfy the criteria of minimum confidence and maximum interval-length specified by the user are discovered from this compressed interval data. In this thesis, we present a new single pass main memory algorithm (OnePass-SI algorithm) for detecting significant intervals. Unlike its counterparts, OnePass-SI algorithm does not follow the classic apriori style to discover significant intervals. While analyzing the OnePass-SI algorithm, we shall discuss its characteristics, complexity, scalability issues and its advantages over other algorithms. We shall also compare the performance of our algorithm with previously developed SID and SQL-based algorithms. For the second phase, we propose a frequent episode discovery (FED) algorithm (OnePass-FED algorithm). The OnePass-FED algorithm proposed in this thesis is a main memory algorithm. The OnePass-FED algorithm works on the significant intervals discovered in the first phase to discover interesting episodes in a single pass as compared to the apriori class of algorithms. This approach is significantly more efficient and scales well as compared to traditional mining algorithms. Extensive experimental analysis establishes its efficiency and scalability. We also compare the performance of this algorithm with our previously developed SQL-based Hybrid Apriori algorithm. The third and final phase is frequent episode validation phase. This phase validates the frequent episodes generated by the second phase if they are to be interpreted on a finer granularity (example, week using daily periodicity). For the third phase, we present a BatchValidation algorithm. The steps carried out for frequent episode validation are redesigned in the BatchValidation algorithm to give a much improved performance. Experiments carried out to compare the performance of the BatchValidation algorithm with previously developed Naive algorithm showed that, the BatchValidation algorithm is significantly faster.

Department of Computer Science and Engineering