Abstract
Graph analysis is one of the techniques widely used for data analysis. It is used extensively on single graphs. Its ability to capture entities and relationships makes it an attractive data model. Search on graphs, such as finding triangles, cliques, shortest paths, etc., and aggregate analysis, such as communities, substructure, or centrality measures have well-defined algorithms for single graphs. The centrality measure, which is the focus of this thesis, identifies the most important nodes in a graph or network. While there are many centrality measures, the most commonly used ones are degree and betweenness centrality. Algorithms for analyzing these measures are numerous for single graphs.
In addition to graphs, multilayer networks (MLNs) are being used to model complex data sets. MLNs consist of several layers, each being a graph. If there are different types of entities in each layer and inter-layer edges are present, then the network is an example of a Heterogeneous Multilayer Network (HeMLN). Due to the lack of algorithms for HeMLNs, they are currently analyzed using aggregation of HeMLN layers including inter-layer edges into a single graph. An alternative projection-based approach is also used to analyze HeMLNs by transforming it into a single graph.
A decoupling-based framework has been proposed to avoid aggregation or projection and still obtain accurate results. This approach analyses the layers independently and composes the partial results to obtain results for a HeMLN. These algorithms have been shown to produce accurate results and are also efficient. Another advantage of this approach is that the layers can be analyzed in parallel. The composition algorithm produces the results of the entire HeMLN. To the best of our knowledge, there are no algorithms that compute centrality measures directly on HeMLNs. This thesis focuses on developing decoupling-based algorithms for degree and betweenness centrality measures for HeMLNs.
The challenge is to minimize the amount of information retained from each layer for use during composition to maximize accuracy. Also, keep the algorithm more efficient than its single graph counterpart, which is considered as the ground truth. This thesis proposes different heuristics and compares the results with the ground truth and naive algorithm. The proposed heuristics consistently improve the accuracy as compared to the naive algorithm while taking less time than the single graph approach.
Finally, the algorithms proposed in the thesis are tested against both realworld and synthetic data sets with different graph characteristics. This is important to demonstrate the efficacy of heuristics on an arbitrary graph. The results obtained are analyzed in detail to empirically establish the heuristic performance in terms of accuracy, time, and space complexity.