RELATIONAL APPROACH TO MODELING AND IMPLEMENTING SUBTLE ASPECTS OF GRAPH MINING
Data mining aims to discover important and previously unknown patterns in datasets. Database mining performs this mining directly on data stored in database management systems (DBMSs). Graphs are well suited to representing complex relationships in data, so graph mining can be used to mine data that have structural components.
The focus of this thesis is to support all aspects of graph mining by enhancing the previously developed algorithms (DB-Subdue) for graph mining using a relational DBMS. The enhancements addressed in this thesis include handling cycles in the input, handling overlapping substructures and their effect on compression, developing an MDL (minimum description length) equivalent for the relational approach, and including inexact graph matching. For some of these enhancements, multiple approaches have been developed and tested. Extensive performance evaluation has been conducted to assess the extended algorithms and compare them with their main-memory counterpart. Scalability has been addressed by examining graphs of different sizes and their computational requirements.