Skip to main content

2022 | Book

Cohesive Subgraph Search Over Large Heterogeneous Information Networks


About this book

This SpringerBrief provides the first systematic review of the existing works of cohesive subgraph search (CSS) over large heterogeneous information networks (HINs). It also covers the research breakthroughs of this area, including models, algorithms and comparison studies in recent years. This SpringerBrief offers a list of promising future research directions of performing CSS over large HINs.

The authors first classify the existing works of CSS over HINs according to the classic cohesiveness metrics such as core, truss, clique, connectivity, density, etc., and then extensively review the specific models and their corresponding search solutions in each group. Note that since the bipartite network is a special case of HINs, all the models developed for general HINs can be directly applied to bipartite networks, but the models customized for bipartite networks may not be easily extended for other general HINs due to their restricted settings. The authors also analyze and compare these cohesive subgraph models (CSMs) and solutions systematically. Specifically, the authors compare different groups of CSMs and analyze both their similarities and differences, from multiple perspectives such as cohesiveness constraints, shared properties, and computational efficiency. Then, for the CSMs in each group, the authors further analyze and compare their model properties and high-level algorithm ideas.

This SpringerBrief targets researchers, professors, engineers and graduate students, who are working in the areas of graph data management and graph mining. Undergraduate students who are majoring in computer science, databases, data and knowledge engineering, and data science will also want to read this SpringerBrief.

Table of Contents

Chapter 1. Introduction
Recently, with the prevalence of large heterogeneous information networks (HINs) in various real areas, much research effort has been devoted to studying CSS over large HINs, which has found a large number of real applications. In this chapter, we focus on providing the necessary background of CSS over large HINs, and give an introduction to the research field to highlight the popularity and applications in the topic of CSS. More specifically, we first show a list of example HINs and typical applications of CSS over large HINs, then discuss the two key challenges of conducting CSS over large HINs, and finally make a classification of existing works according to the classic cohesiveness metrics.
Yixiang Fang, Kai Wang, Xuemin Lin, Wenjie Zhang
Chapter 2. Preliminaries
To formally introduce the research problem and solutions of CSS over HINs, people often follow some commonly-used notations and models to facilitate the presentation. Before reviewing the specific models and solutions in the following chapters, in this chapter we first formally introduce the data models of HINs and bipartite networks, and then review several typical classic CSMs on homogeneous networks, including k-core, k-truss, k-clique, k-edge-connectivity component (k-ECC), and the densest subgraphs.
Yixiang Fang, Kai Wang, Xuemin Lin, Wenjie Zhang
Chapter 3. CSS on Bipartite Networks
In many real-world applications, relationships between two different types of entities (e.g., user-item people-location and author-paper) are naturally modeled as bipartite networks. When analyzing bipartite networks, CSMs and CSS techniques play an important role in many aspects including network measurement, dense region discovering, and network reinforcement. To meet the high demands in the era of big data, novel CSMs and CSS solutions over bipartite networks have been proposed recently. In this chapter, we extensively review existing studies of CSS over bipartite networks in the literature. Specifically, we discuss the core-, truss-, clique-, connectivity-, and density-based models and the solutions to search subgraphs following these models over bipartite networks.
Yixiang Fang, Kai Wang, Xuemin Lin, Wenjie Zhang
Chapter 4. CSS on Other General HINs
In the era of big data, most of data or informational objects, individual agents, groups, or components are interconnected or interact with each other, forming numerous, large, interconnected, and sophisticated networks, which are often called heterogeneous information networks (HINs) in the literature. For instance, Twitter contains 326 million monthly active users in over 160 countries, and they generate over 500 million daily tweets, including texts, images, videos, events, etc.—all these information naturally forms a huge HIN. Consequently, the sheer volume and the complex structure of HIN data become bottlenecks in querying cohesive subgraphs from large scale HINs. With strong demands from a number of recent real applications, the field of CSS over HINs has drawn a great deal of attention in the last two decades and a number of research breakthroughs have been reported, including novel CSMs and solutions. In this chapter, we extensively review the CSMs and solutions over general HINs that are not customized for bipartite networks, and may have various vertex types and edge types. We first introduce some key concepts on the general HINs, and then thoroughly review five groups of CSMs and solutions that are classified according to the cohesiveness metric.
Yixiang Fang, Kai Wang, Xuemin Lin, Wenjie Zhang
Chapter 5. Comparison Analysis
In the previous two chapters, we have extensively introduced the CSMs and solutions for the bipartite networks and other general HINs, such as core-, truss-, clique-, connectivity-, and density-based models and solutions. These works focus on different types of specific HINs and formulate CSMs in different manners, so a natural question is which CSMs perform better on which graphs? To answer this question, it is desirable to have a comprehensive comparison of these works. In this chapter, we perform an in-depth comparison analysis of all the CSMs and solutions for bipartite networks and other general HINs respectively. We carefully analyze the advantages and disadvantages of these CSMs and solutions, in terms of their computational complexities and application scenarios.
Yixiang Fang, Kai Wang, Xuemin Lin, Wenjie Zhang
Chapter 6. Related Work on CSMs and Solutions
In the literature, the topic of CSS on graphs has received tremendous research attention and it has been extensively studied in the past several decades, and most of existing research works focus on conventional homogeneous networks. However, the models and solutions of these works are highly related to the these of CSS over large HINs. In this chapter, we thoroughly review the five groups of works on CSS on homogeneous networks, which are core-, truss-, clique-, connectivity-, and density-based CSMs and solutions. In addition, we also review the works on HIN clustering and compare it with the earlier version of this book.
Yixiang Fang, Kai Wang, Xuemin Lin, Wenjie Zhang
Chapter 7. Future Work and Conclusion
Although much research effort has been devoted to CSS over large HINs over the past several decades, there are still many issues that are not well addressed, thus there is still much room to perform further study on CSS over large HINs in the future, from the perspectives of effective CSMs, computational efficiency, parameter optimization, tools, etc. In this chapter, we discuss several important future research directions about CSS over HINs, including novel application-driven CSMs, efficient search algorithms, parameter optimization, and an online repository for collecting HIN datasets, tools, and algorithm codes, which can provide researchers some good starting points to work in this area. In addition, we draw a brief conclusion for the book.
Yixiang Fang, Kai Wang, Xuemin Lin, Wenjie Zhang
Cohesive Subgraph Search Over Large Heterogeneous Information Networks
Dr. Yixiang Fang
Dr. Kai Wang
Xuemin Lin
Dr. Wenjie Zhang
Copyright Year
Electronic ISBN
Print ISBN