Graph Mining and Querying
This research explores distributed framework (map/reduce) for finding the substructure that can compress the graph best in a very large directed labeled graph. The focus here is to develop graph analysis algorithms/techniques using different paradigms for identifying repetitive subgraphs that satisfy certain characteristics. The issues handled in the preliminary work include: i) developing two partitioned algorithms for frequent substructure generation using the map/reduce paradigm by formulating appropriate key values to facilitate duplicate elimination and isomorphism detection, ii) designing partition management strategies for evaluating their impact. Our preliminary work on directed labeled graphs addresses scalability along with loss-less synchronization, i/o usage and response time behavior in partitioned graphs. For verification purposes we have embedded substructures of different sizes and of different characteristics (small, large, sparse, and dense) in a large synthetically generated graph. Scalability and effectiveness have been established on large real datasets like Twitter, Orkut and LiveJournal.
As a general data structure, graphs have become increasingly important in modeling structural information and their relationships as is demonstrated by applications including chemical informatics, bio-informatics, computer vision, video indexing, text retrieval and web analysis. With the rapid development of social networks, graph databases such as Freebase, and graphs derived from social networks as well as Wiki have grown too large. Even non main-memory based approaches (disk-based, database-oriented, and traditional partitioning) cannot handle these graph sizes in a meaningful way. Thus the onus is now on developing scalable algorithms/approaches to process ever increasing graph sizes with meaningful response times.