MLN-SUBDUE: DECOUPLING APPROACH-BASED SUBSTRUCTURE DISCOVERY IN MULTILAYER NETWORKS (MLNS)
Substructure discovery is well-researched for single graphs (both simple and
attribute) as it is an important component of knowledge discovery for many applications such as finding the core substructure in a protein, important concept in a large
graph, etc. However, multilayer networks or MLNs (instead of attribute graphs) have
been shown to be better for modeling complex data sets that have multiple entity
and feature types. This model provides more clarity on semantics of data sets as
well as the ability to use an arbitrary subset of layers for analysis. However, the
challenge is that many algorithms such as community and centrality detection as well
as substructure discovery need to be extended to MLN representation.
With the representation of complex data sets as MLNs brings new challenges
in terms of finding substructures in MLNs or a subset of MLNs. A naive approach
would be to collapse (or aggregate) all (or a subset of) layers into a single attribute
graph and use extant algorithms. There are a number of substructure discovery algorithms ranging from memory-based, disk-based, SQL-based, and partitioned using
While substructure discovery has been widely used for the analysis of single
networks, attribute graphs, and forests, the development of an efficient substructure
discovery methods for multilayer networks without aggregation is currently not available. This thesis proposes a new decoupling approach-based substructure discovery
algorithm for homogeneous MLNs (or HoMLNs). HoMLNs are MLNs where each
layer has the same set of nodes but different connectivity in each layer.
In this dissertation, we propose a decoupling approach-based algorithm for
HoMLNs where aggregation is not needed. Further, the algorithm has been implemented using the Map/Reduce framework in order to handle arbitrary number of
layers and improving the response time through parallelism. Each layer is processed
individually/separately in parallel, but the substructures generated for each layer
are combined after each iteration to identify substructures across layers (or MLN).
The focus is on correctness of the algorithm and resource utilization with respect to
number of layers. The proposed algorithm is validated analytically as well through
extensive experimental analysis on large real-world and synthetic graphs.