Multi Source, Multi Feature Data Analysis using Multilayer Networks
As a result of our increased ability to collect data from different sources, real-world datasets are increasingly multi-featured, and these features can be of different types. Examples of such multi-featured data include different modes of interaction among people (Facebook, Twitter, LinkedIn, telephone calls ...), traffic accidents associated with diverse factors (speed, light conditions, weather ...), and inter-connectivity among different networks such as city-based airline, residence and collaboration networks. A multiplex (or multilayer network) is a network of networks that is well suited to analyzing these types of datasets because it models each aspect separately as a layer. Based on the type of entities, multilayer networks are differentiated into two types: homogeneous (multiple interactions among the same type of entities) and heterogeneous (multiple interactions among different types of entities).
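To make the layered modeling concrete, here is a minimal sketch of a homogeneous multiplex built from interaction records. The records, the names, and the `build_multiplex` helper are hypothetical and chosen only for illustration; each layer is simply the set of undirected edges observed in one source.

```python
from collections import defaultdict

# Hypothetical interaction records: (person_a, person_b, source).
records = [
    ("ann", "bob", "facebook"),
    ("ann", "cat", "facebook"),
    ("bob", "ann", "twitter"),
    ("bob", "cat", "linkedin"),
    ("ann", "bob", "linkedin"),
]

def build_multiplex(records):
    """Group edges by source so that each source becomes one layer."""
    layers = defaultdict(set)
    for u, v, source in records:
        # A frozenset ignores direction, so (bob, ann) equals (ann, bob).
        layers[source].add(frozenset((u, v)))
    return dict(layers)

multiplex = build_multiplex(records)
for name, edges in sorted(multiplex.items()):
    print(name, sorted(tuple(sorted(edge)) for edge in edges))
```

All layers share the same kind of entity (people), which is what makes this toy multiplex homogeneous.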
Multiplexes have computational advantages over monoplexes (single graphs) for modeling and analyzing multi-source, multi-feature data.
However, this multiplex-based modeling brings with it a new set of challenges in holistically analyzing such data. The literature focuses on overall multiplex diagnostics, either by considering each layer individually or by aggregating all the layers, which leads to loss of information. Intricate details are missed because there are no ways to analyze these multiplex-based representations between these two extremes.
We are working towards building a suite of efficient composition-based algorithms for flexible analysis of multilayer networks.
We have proposed combining homogeneous multiplex layers using Boolean operations: AND, OR, NOT. Clearly, for n features, a total of 2^n layer combinations are possible, causing the overall analysis process to become exponential with respect to both time and space. In this regard, we proposed an approach to re-create the communities of any AND-composed layer using only the layer-wise communities, which has been empirically shown to provide an overall saving of more than 40% in computation time.
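The idea of reusing layer-wise communities can be sketched as follows. This is an illustrative simplification, not necessarily the published algorithm: it assumes layers are stored as sets of undirected edges, that the communities of each layer are already available as vertex sets, and that a candidate community of the AND-composed layer is the intersection of one community from each layer. The layers, communities, and function names are hypothetical.

```python
from itertools import product

# Two hypothetical layers over the same people, stored as undirected edge sets.
layer_a = {frozenset(e) for e in [
    ("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p4", "p5"), ("p3", "p4"),
]}
layer_b = {frozenset(e) for e in [
    ("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p4", "p5"), ("p5", "p6"),
]}

# Hypothetical layer-wise communities, computed once per layer by any
# community detection tool and then reused for every composition.
communities_a = [{"p1", "p2", "p3"}, {"p4", "p5"}]
communities_b = [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}]

def and_compose(x, y):
    """AND composition: keep only the edges present in both layers."""
    return x & y

def intersect_communities(comms_x, comms_y, min_size=2):
    """Candidate AND-layer communities from pairwise vertex-set intersections."""
    candidates = []
    for cx, cy in product(comms_x, comms_y):
        common = cx & cy
        if len(common) >= min_size:
            candidates.append(common)
    return candidates

and_layer = and_compose(layer_a, layer_b)
candidates = intersect_communities(communities_a, communities_b)
print(sorted(sorted(c) for c in candidates))
```

The saving comes from never running community detection on `and_layer` itself. The intersection is only an approximation in general, so in practice the candidates would be compared against communities computed directly on the composed layer.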
We have proposed four heuristics based on degree and closeness centrality to estimate the hubs (or central entities) of any AND-composed layer. When applied to real-world, multi-featured datasets such as IMDb and traffic accidents, they provide an average accuracy of more than 70-80% while reducing the overall computational time by at least 30%.
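As a rough illustration of hub estimation, the sketch below uses a naive baseline that is not one of the four heuristics: it assumes a hub is a node whose degree is above the layer average, and it estimates the hubs of the AND-composed layer as the nodes that are hubs in every individual layer. The estimate is then compared with the hubs computed directly on the composed layer using Jaccard similarity. All data and names are hypothetical.

```python
from collections import Counter

nodes = {"a", "b", "c", "d", "e"}

# Hypothetical layers; each edge is a tuple written in sorted order.
layer_x = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "e"), ("d", "e")}
layer_y = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "e")}

def hubs(edges, nodes):
    """Nodes whose degree is strictly above the average degree of the layer."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    average = sum(degree.values()) / len(nodes)
    return {n for n in nodes if degree[n] > average}

def jaccard(estimated, actual):
    """Overlap between estimated and actual hub sets (1.0 is a perfect match)."""
    union = estimated | actual
    return len(estimated & actual) / len(union) if union else 1.0

# Estimate: reuse the layer-wise hubs without building the composed layer.
estimate = hubs(layer_x, nodes) & hubs(layer_y, nodes)

# Ground truth: build the AND-composed layer and compute its hubs directly.
truth = hubs(layer_x & layer_y, nodes)

accuracy = jaccard(estimate, truth)
print(sorted(estimate), sorted(truth), round(accuracy, 2))
```

On this toy input the estimate misses one hub of the composed layer, which shows why such heuristics are evaluated by accuracy rather than assumed exact.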
Currently, we are formulating techniques to reconstruct the communities and hubs for OR-composed and NOT-composed layers. Moreover, we are working on obtaining a confidence interval on the accuracy and the savings in computational time based on layer characteristics. In the future, we plan to extend this work to weighted and directed edges as well. The development of community and hub reconstruction techniques using the primitive Boolean operators will allow us to flexibly analyze other possible combinations of k layers, such as XOR, NOR, NAND and so on.
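The way derived operators reduce to the three primitives can be sketched on edge sets. This sketch makes one assumption that the text does not spell out: NOT is taken as the complement with respect to all possible edges over the shared node set. The layers and function names are hypothetical.

```python
from itertools import combinations

nodes = ["a", "b", "c", "d"]

# Assumption: NOT is the complement relative to every possible undirected edge.
universe = {frozenset(pair) for pair in combinations(nodes, 2)}

def layer_and(x, y):
    return x & y

def layer_or(x, y):
    return x | y

def layer_not(x):
    return universe - x

# Derived operators, written only in terms of the three primitives.
def layer_xor(x, y):
    return layer_and(layer_or(x, y), layer_not(layer_and(x, y)))

def layer_nand(x, y):
    return layer_not(layer_and(x, y))

def layer_nor(x, y):
    return layer_not(layer_or(x, y))

layer_1 = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d")]}
layer_2 = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("c", "d")]}

xor_layer = layer_xor(layer_1, layer_2)
nand_layer = layer_nand(layer_1, layer_2)
nor_layer = layer_nor(layer_1, layer_2)

for name, layer in [("XOR", xor_layer), ("NAND", nand_layer), ("NOR", nor_layer)]:
    print(name, sorted("".join(sorted(edge)) for edge in layer))
```

Because every derived operator is a short expression over AND, OR and NOT, reconstruction techniques for the primitives would carry over to these compositions by applying them in the same order.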
With respect to heterogeneous multiplexes, the area of graph mining and querying has not been explored much. Further, the bipartite links between the layers make the composition non-trivial. We are addressing the various challenges involved in mining frequent patterns and querying across inter-linked multiplex layers. There is also a need to define, quantify and develop efficient methods for generating multilayer hubs and communities.
- NSF Grant (7/2020 to 6/2023), ~1M total: Collaborative Research: SHF: Medium: NetSplicer: Scalable Decoupling-Based Algorithms for Multilayer Network Analysis (UTA: Sharma Chakravarthy, UNT: Sanjukta Bhowmick, PSU: Kamesh Madduri)
- The Swedish Foundation for International Cooperation in Research and Higher Education (STINT) has funded a project, "Multilayer Networks Approach to Community Detection in Heterogeneous Personal Data", for 1 year (2018) (Lili Jiang - PI from Sweden, Sharma Chakravarthy - PI from USA)
- Sharma Chakravarthy, Abhishek Santra, Kanthi Sannappa Komar: Why Multilayer Networks Instead of Simple Graphs? Modeling Effectiveness and Analysis Flexibility and Efficiency! BDA 2019: 227-244
- Xuan-Son Vu, Abhishek Santra, Lili Jiang, Sharma Chakravarthy: Generic Multilayer Network Data Analysis with the Fusion of Content and Structure. CICLing 2019
- Abhishek Santra, Sanjukta Bhowmick: Holistic Analysis of Multi-source, Multi-feature Data: Modeling and Computation Challenges. BDA 2017: 59-68 [PDF Link1] [PDF Link 2] [PPT]
- Abhishek Santra, Sanjukta Bhowmick and Sharma Chakravarthy, HUBify: Efficient Estimation of Central Entities across Multiplex Layer Compositions, IEEE ICDM Workshop on Data Mining in Networks (ICDM Workshops 2017) [PDF Link1] [PDF Link2] [PPT]
- Abhishek Santra, Sanjukta Bhowmick, Sharma Chakravarthy: Efficient Community Re-creation in Multilayer Networks Using Boolean Operations. ICCS 2017: 58-67 [PDF]
- Abhishek Santra, Kanthi Sannappa Komar, Sanjukta Bhowmick and Sharma Chakravarthy: Making a Case for MLNs for Data-Driven Analysis: Modeling, Efficiency, and Versatility. August 2019 [PDF]
- Abhishek Santra, Kanthi Sannappa Komar, Sanjukta Bhowmick and Sharma Chakravarthy: An Efficient Framework for Computing Structure- and Semantics-Preserving Community in a Heterogeneous Multilayer Network. June 2019 [PDF]
- Abhishek Santra, Sanjukta Bhowmick and Sharma Chakravarthy: Efficient Community Detection in Boolean Composed Multiplex Networks. June 2019 [PDF]
- Abhishek Santra, Sanjukta Bhowmick, Sharma Chakravarthy: Scalable Holistic Analysis of Multi-Source, Data-Intensive Problems Using Multilayered Networks. CoRR abs/1611.01546 (2016) [PDF]