DATA-DRIVEN MODELING OF HETEROGENEOUS MULTILAYER NETWORKS AND THEIR COMMUNITY-BASED ANALYSIS USING BIPARTITE GRAPHS
Today, more than ever, data modeling and analysis play a vital role for enterprises in finding actionable business intelligence. Data is being collected
on a large scale from multiple sources in the hope that it can be leveraged using big data
analysis techniques. However, the challenges associated with analyzing such data
are numerous and depend on the characteristics of the data being collected. In many
real-world applications, data sets are becoming complex because they are characterized by
multiple entity types and multiple features (termed relationships) between entities.
There is a need for an elegant approach not only to model such data but also to
analyze it efficiently with respect to a given set of analysis objectives.
Traditionally, graphs have been used to model data whose structure is expressed
through relationships. Single-graph models (both simple and attributed) have been
widely used because many packages exist for analyzing them. However, as the
number of entity types and features grows, these complex data sets become cumbersome
to model and difficult (as well as inefficient) to analyze. Multilayer
networks (MLNs) have been proposed as an alternative. This thesis addresses the elegant modeling and efficient analysis of one type of MLN, called Heterogeneous
Multilayer Networks (HeMLNs).
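To make the layer structure concrete, here is a minimal Python sketch of how a HeMLN might be stored. It assumes that each entity type becomes one layer, that relationships among entities of the same type become intra-layer edges, and that relationships between entities of different types become inter-layer edges. The class name `HeMLN`, its methods, and the actor/director toy data are hypothetical illustrations, not the representation used in this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class HeMLN:
    """Hypothetical toy container: one layer per entity type."""

    # layer name -> set of node ids of that entity type
    layers: dict = field(default_factory=dict)
    # layer name -> set of undirected intra-layer edges, stored as frozensets
    intra: dict = field(default_factory=dict)
    # (layer_a, layer_b) -> set of inter-layer edges (u in layer_a, v in layer_b)
    inter: dict = field(default_factory=dict)

    def add_layer(self, name, nodes):
        self.layers[name] = set(nodes)
        self.intra[name] = set()

    def add_intra_edge(self, layer, u, v):
        # Both endpoints must be entities of the same type (same layer).
        if u not in self.layers[layer] or v not in self.layers[layer]:
            raise ValueError(f"both endpoints must belong to layer {layer!r}")
        self.intra[layer].add(frozenset((u, v)))

    def add_inter_edge(self, layer_a, u, layer_b, v):
        # Endpoints are entities of two different types (two layers).
        if u not in self.layers[layer_a] or v not in self.layers[layer_b]:
            raise ValueError("endpoints must belong to their respective layers")
        self.inter.setdefault((layer_a, layer_b), set()).add((u, v))


# Hypothetical IMDb-style toy data: actors and directors as two layers.
net = HeMLN()
net.add_layer("actor", ["a1", "a2", "a3"])
net.add_layer("director", ["d1", "d2"])
net.add_intra_edge("actor", "a1", "a2")               # e.g. co-acted in a movie
net.add_inter_edge("actor", "a1", "director", "d1")   # e.g. acted under
net.add_inter_edge("actor", "a3", "director", "d2")
```

Keeping intra-layer and inter-layer edges in separate maps mirrors the idea that each layer can be analyzed as an ordinary single graph, with the inter-layer edges consulted only when layers are combined.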
This thesis shows how to model a complex data set as a HeMLN for a given
set of analysis objectives, using the popular entity-relationship (ER)
model. It then proposes a community-based approach
for computing those objectives. Because community definitions are currently available only for single graphs, a new community definition is introduced for HeMLNs. A decomposition approach is proposed for efficiently computing communities in a HeMLN.
Since bipartite graphs are part of HeMLN community computation, their role
is elaborated and algorithms for using them are proposed. Because the
bipartite-graph step becomes a matching problem, different types of weight metrics
are proposed for HeMLN community detection.
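One way to picture the decomposition and bipartite-matching steps is the following sketch. It assumes the communities of each layer have already been computed by some single-graph community detection algorithm (not shown). Each community is then treated as a node of a bipartite graph. An edge between two communities is weighted by the number of inter-layer edges joining their members, which is only one possible weight metric; the metrics proposed in this thesis are not reproduced here. Finally, communities are paired by a maximum-weight matching. The function names and toy data are hypothetical. The exhaustive matcher is suitable only for a handful of communities, and a larger input would call for a polynomial-time method such as the Hungarian algorithm.

```python
from itertools import permutations


def bipartite_weights(comms_a, comms_b, inter_edges):
    """Weight (i, j) = number of inter-layer edges (u, v) with u in
    community i of layer A and v in community j of layer B.
    Only pairs with positive weight are kept."""
    weights = {}
    for i, ca in enumerate(comms_a):
        for j, cb in enumerate(comms_b):
            w = sum(1 for u, v in inter_edges if u in ca and v in cb)
            if w > 0:
                weights[(i, j)] = w
    return weights


def max_weight_matching(n_a, n_b, weights):
    """Exhaustive maximum-weight bipartite matching.
    Every matching extends to an injection from the smaller side into the
    larger side, so trying all injections finds the optimum."""
    best_total, best_pairs = 0, []
    if n_a <= n_b:
        candidates = ([(i, p[i]) for i in range(n_a)]
                      for p in permutations(range(n_b), n_a))
    else:
        candidates = ([(p[j], j) for j in range(n_b)]
                      for p in permutations(range(n_a), n_b))
    for assignment in candidates:
        # Drop pairs with no bipartite edge; they are not part of the matching.
        pairs = [pair for pair in assignment if pair in weights]
        total = sum(weights[pair] for pair in pairs)
        if total > best_total:
            best_total, best_pairs = total, sorted(pairs)
    return best_pairs, best_total


def compose_communities(comms_a, comms_b, inter_edges):
    """Pair layer-A and layer-B communities via the matching above."""
    weights = bipartite_weights(comms_a, comms_b, inter_edges)
    pairs, _ = max_weight_matching(len(comms_a), len(comms_b), weights)
    return [(comms_a[i], comms_b[j]) for i, j in pairs]


# Hypothetical per-layer communities and inter-layer edges.
actor_comms = [{"a1", "a2"}, {"a3"}]
director_comms = [{"d1"}, {"d2", "d3"}]
edges = [("a1", "d1"), ("a2", "d1"), ("a3", "d2"), ("a3", "d3"), ("a1", "d2")]
print(compose_communities(actor_comms, director_comms, edges))
```

Here community 0 of the actor layer is paired with community 0 of the director layer (weight 2), and community 1 with community 1 (weight 2), for a total weight of 4. Swapping the weight function is the natural place to plug in a different weight metric without touching the matching step.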
The thesis also presents an extensive experimental analysis of the proposed
HeMLN community computation on two widely used data sets: IMDb, the Internet
Movie Database, and DBLP, a computer science bibliography database.
The experiments show the efficacy of our modeling and the efficiency of
computing HeMLN communities for analysis.