ASSOCIATION RULE MINING OVER MULTIPLE DATABASES: PARTITIONED AND INCREMENTAL APPROACHES
Database mining is the process of extracting interesting and previously unknown patterns and correlations from data stored in Database Management Systems (DBMSs). Association rule mining is the process of discovering items that tend to occur together in transactions. If the data to be mined is stored as relations in multiple databases, a partitioned approach is more appropriate than moving data from one database to another. In addition, incremental addition of data to the data set should not necessitate recomputing the rules for the entire data set.
This thesis focuses on partitioned and incremental approaches to association rule mining for data stored in relational DBMSs. It proposes a partitioning approach that is very effective for partitioned databases compared to the main-memory partitioned approach. The approach uses the SQL-based K-way join algorithm and its optimizations. A second alternative that trades accuracy for performance is also presented. Our results indicate that, beyond a certain data set size, this alternative preserves accuracy while delivering better performance. The incremental association rule mining algorithm reduces the work of recomputing the rules each time new data is added to the database. This thesis implements the incremental algorithm using the negative border concept, together with a number of optimizations. Extensive experiments were performed, and results are presented for both the partitioned and incremental approaches using IBM DB2/UDB and Oracle 8i.
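As an illustration of the negative border concept mentioned above, the following is a minimal in-memory Python sketch. It is not the SQL-based implementation of this thesis and reproduces none of its optimizations. It assumes the usual definition of the negative border: the itemsets that are not frequent themselves but whose proper subsets are all frequent. The function names (`support_counts`, `frequent_and_negative_border`, `apply_increment`), the absolute threshold `min_count`, and the sample transactions are hypothetical and chosen only for this example.

```python
from itertools import combinations


def support_counts(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def frequent_and_negative_border(transactions, min_count):
    """Level-wise search returning (frequent, border) as {itemset: count}.

    Every candidate generated at a level has only frequent proper subsets,
    so the candidates that fail the threshold form the negative border.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent, border = {}, {}
    k = 1
    while candidates:
        level_frequent = []
        for itemset, count in support_counts(transactions, candidates).items():
            if count >= min_count:
                frequent[itemset] = count
                level_frequent.append(itemset)
            else:
                border[itemset] = count
        k += 1
        # Join frequent itemsets of the previous level, then keep only the
        # k-itemsets whose (k-1)-subsets are all frequent.
        unions = {a | b for a, b in combinations(level_frequent, 2)
                  if len(a | b) == k}
        candidates = [u for u in unions
                      if all(frozenset(s) in frequent
                             for s in combinations(u, k - 1))]
    return frequent, border


def apply_increment(frequent, border, increment, new_min_count):
    """Update tracked counts by scanning only the newly added transactions.

    Returns (updated_counts, promoted), where promoted lists the border
    itemsets that are now frequent. If promoted is empty, the new frequent
    itemsets can be read from updated_counts without rescanning old data.
    """
    increment = [frozenset(t) for t in increment]
    tracked = {**frequent, **border}
    added = support_counts(increment, list(tracked))
    updated = {c: tracked[c] + added[c] for c in tracked}
    promoted = [c for c in border if updated[c] >= new_min_count]
    return updated, promoted


old_data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
frequent, border = frequent_and_negative_border(old_data, min_count=2)
updated, promoted = apply_increment(frequent, border,
                                    [{"b", "c"}, {"b", "c", "d"}],
                                    new_min_count=2)
```

In this sketch, only a non-empty `promoted` list signals that new candidates may have appeared and that the original data may need to be revisited. Otherwise, the counts kept for the frequent itemsets and the negative border are enough to update the result from the increment alone.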