In the following three subsections, we describe the selected techniques in more detail. We focus mainly on the first four characteristics (1.-4.), since the applicability criteria (5.-8.) do not lend themselves to much interpretation, and add some information about the traversal strategy. The order of the discussed techniques is chronological, allowing the reader to follow the methodological developments.
4.1 Attribute selection
We subdivide techniques according to whether they select attribute values or not. The first class of techniques identifies attribute subspaces that are relevant for particular communities, but not the values of those attributes, which could however be derived in a post-processing step.
CoPaM Moser et al. (2009) propose to mine so-called cohesive patterns. A cohesive pattern is a tuple of a set of attributes \(D\) and a subgraph \(G=(V,E)\) that fulfills three criteria: (1) \(D\) satisfies a cohesion function, (2) \(G\) is dense, and (3) \(G\) is connected.
To find patterns, the approach first removes all non-cohesive edges, i.e. edges whose endpoints violate the cohesion function. The resulting connected components are processed independently in an Apriori-like manner, joining edges until they violate the cohesive pattern constraint; the approach is thus community-driven. Attribute subspaces are identified as maximal attribute subspaces for which a subgraph is still cohesive. This implies several characteristics of the found patterns: as mentioned above, while attribute subspaces are explicitly selected, attribute values are not. Since edges are joined, communities may overlap in vertices. Furthermore, because attribute subspaces are chosen without reference to the full graph, or even to communities resulting from the same connected component, it is neither guaranteed that they are local nor that they are discriminative.
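To make the edge-pruning step concrete, the following minimal sketch (our illustration, not CoPaM itself) assumes a simple cohesion function, namely that endpoint values differ by at most a given width in every attribute of a candidate subspace, and returns the resulting connected components as candidate seeds:

```python
import networkx as nx

def cohesive_components(G, subspace, width=1.0):
    """Drop edges whose endpoints violate the (assumed) cohesion function
    in the given attribute subspace, then return connected components.
    Cohesion function assumed here: per-attribute difference <= width."""
    H = G.copy()
    for u, v in list(H.edges()):
        if any(abs(H.nodes[u][a] - H.nodes[v][a]) > width for a in subspace):
            H.remove_edge(u, v)
    return [c for c in nx.connected_components(H) if len(c) > 1]
```

An Apriori-style join over such seeds would then still have to enforce the density and connectivity constraints.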
GAMER Günnemann et al. (2010, 2013c) enhance the above principle by taking the possible redundancy of subgraphs into account. While CoPaM reports all maximal dense subgraphs (which might overlap to a high extent), the works by Günnemann et al. (2010, 2013c) focus on finding a set of non-redundant dense subgraphs with maximal interestingness. Here, interestingness can be any function taking the density, size, or number of attributes of the subgraph into account. Furthermore, the attributes of the community need to satisfy a cohesion function. To find clusters, a set enumeration tree operating on the set of vertices is exploited. The tree is traversed in a best-first manner, leading to an exact, non-heuristic solution.
EDCAR Günnemann et al. (2013a) use the same modeling approach as the work above. In contrast, however, they exploit a heuristic search principle, thus achieving much better scalability. More precisely, the set enumeration tree is explored via the GRASP (greedy randomized adaptive search procedure) principle.
DB-CSC Deviating from the above scenario, in which the attribute values of a cluster are bounded by a specific interval, Günnemann et al. (2011, 2012) propose a density-based cluster definition. More precisely, in the selected attribute subspace the cluster needs to follow the well-known DBSCAN (Ester et al. 1996) clustering definition, while in the graph space an extension of k-cores is used. Again, a set of non-redundant clusters is generated. Since DBSCAN makes it possible to find arbitrarily shaped clusters, no specific attribute-value selection is provided per cluster. For finding the clusters, an Apriori-like search principle combined with fixed-point iteration is exploited: starting with 1-dimensional clusters, higher-dimensional clusters are iteratively constructed. Within each subspace, clusters can be detected via a fixpoint iteration. The subspaces are neither contrasted with the overall distribution nor with other communities.
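The fixpoint idea can be sketched as follows, with scikit-learn's DBSCAN standing in for the paper's exact density-based cluster definition and a plain k-core standing in for its graph-side extension; the sketch also simplifies by keeping the union of all non-noise points rather than tracking individual clusters:

```python
import networkx as nx
import numpy as np
from sklearn.cluster import DBSCAN

def db_csc_like(G, X, subspace, eps=0.5, min_pts=3, k=2):
    """Fixpoint sketch: keep vertices that are both density-reachable in
    the attribute subspace and members of a k-core, until stable.
    X: dict vertex -> numpy attribute vector; subspace: attribute indices."""
    V = set(G.nodes())
    while V:
        pts = sorted(V)
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(
            np.array([X[v][subspace] for v in pts]))
        clustered = {v for v, l in zip(pts, labels) if l != -1}
        core = set(nx.k_core(G.subgraph(clustered), k=k).nodes())
        if core == V:
            break  # stable: attribute and graph criteria agree
        V = core  # shrink and re-check both criteria
    return V
```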
SSCG Günnemann et al. (2013b) extend the principle of spectral clustering to find subspace clusters in attributed graphs. Following the idea of subspace clustering, each cluster is associated with an individual set of relevant attributes. The selected attributes subsequently determine the similarity/weight of two adjacent vertices; that is, the affinity matrix used in spectral clustering is no longer static but depends on the selected subspaces. Overall, since neither the subspaces nor the clusters are known, both aspects are learned jointly by minimizing the so-called normalized subspace cut, an extension of the normalized cut. The approach does not identify local patterns but optimizes a global model. Since solving this optimization problem is NP-hard, the authors propose an approximate alternating optimization scheme.
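For reference, the standard normalized cut that the normalized subspace cut extends reads, for clusters \(C_1,\dots,C_k\) over vertex set \(V\),

\[ \mathrm{Ncut}(C_1,\dots,C_k) \;=\; \sum_{i=1}^{k} \frac{\mathrm{cut}(C_i, V \setminus C_i)}{\mathrm{vol}(C_i)}, \]

where cut and volume are computed from the affinity matrix; in SSCG this matrix itself depends on the per-cluster subspaces, which is precisely what couples subspace selection and clustering.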
ConSub Sánchez et al. (2013) use a Monte Carlo process to generate interval constraints on vertex attributes, which are used to create projected subgraphs. If the number of edges in the subgraph is higher than expected, a congruent subspace and corresponding subgraph have been found. To derive larger attribute subspaces, the authors propose a bottom-up, Apriori-like approach, similar to Günnemann et al. (2010). The authors view their approach rather as a dimensionality reduction technique to make community (outlier) detection more effective.
There are two common threads to the techniques described so far: (1) descriptions drive community discovery, and (2) the similarity of vertex attribute values is taken into account, either via explicit thresholding or via clustering.
OSCom Starting from ego-networks, Du et al. (2017) and Sun et al. (2018) apply a metric-based greedy strategy for detecting a set of subnetworks based on the respective attributed neighborhood, i.e. the common attributes. After that, subcommunities are extracted, forming an overall supergraph. Finally, global semantic communities are identified on this supergraph.
MIMAG Orthogonal to the above works, which mostly consider vertex-attributed graphs, Boden et al. (2012, 2013) focus on edge-attributed graphs. Similar to the work of Günnemann et al. (2010), they build on extensions of quasi-cliques (i.e. \(\delta_{int} \ge 0.5\)), now taking multiple graph layers into account and finding descriptions operating on the edge attributes. They propose a joint set enumeration tree to efficiently generate the communities in an informed best-first search.
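In the usual quasi-clique definition (not necessarily the exact variant used here), a subgraph on vertex set \(S\) is a \(\delta_{int}\)-quasi-clique if every vertex is adjacent to at least a \(\delta_{int}\)-fraction of the other vertices, i.e.

\[ \deg_{G[S]}(v) \;\ge\; \bigl\lceil \delta_{int} \cdot (|S| - 1) \bigr\rceil \quad \text{for all } v \in S, \]

so that \(\delta_{int} \ge 0.5\) requires each member to be connected to at least half of the remaining community.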
4.2 Attribute-value selection
The second class of techniques identifies both attributes and their relevant values directly. This obviates the need to post-process the discovered patterns to determine appropriate values for the description. In many cases, these techniques also use descriptions to drive community detection directly, in order to establish a mapping between attribute dimensions and the induced subgraph. The techniques presented below are somewhat more recent than those in the preceding section, and, not surprisingly, there are clear connections to existing (local) pattern mining approaches.
SCPM Silva et al. (2012) binarize attributes, allowing them to treat attribute-value combinations as items, and apply frequent itemset mining to find promising candidates. By projecting the graph onto the itemset, certain vertices are removed, and the remaining connected components can be checked for the satisfaction of a minimum density constraint. Calculating upper bounds on the structural correlation of itemsets enhances the pruning capabilities of the approach. Clearly, overlap is entirely possible for the communities found by this approach. In addition, while frequent patterns have been considered the first instance of local patterns in the literature, there is in fact no locality as such: a frequent pattern can be so frequent that it applies to different sections of the network. The literature on frequent patterns includes many examples of interestingness measures that relate the frequency of a pattern to background models (Vreeken and Tatti 2014).
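A minimal sketch of this pipeline (brute-force itemset enumeration instead of a proper frequent itemset miner, and without the structural-correlation bounds that give SCPM its pruning power) could look as follows:

```python
from itertools import combinations
import networkx as nx

def dense_itemset_communities(G, items, min_sup=3, max_len=2, min_density=0.5):
    """Sketch: enumerate small itemsets over binarized attributes, project
    the graph onto each itemset's supporting vertices, and keep dense
    connected components. items: dict vertex -> set of (attribute, value)."""
    universe = sorted(set().union(*items.values()))
    results = []
    for r in range(1, max_len + 1):
        for itemset in combinations(universe, r):
            support = [v for v in G if set(itemset) <= items[v]]
            if len(support) < min_sup:
                continue  # infrequent; a real miner prunes here Apriori-style
            for comp in nx.connected_components(G.subgraph(support)):
                sub = G.subgraph(comp)
                if len(comp) > 1 and nx.density(sub) >= min_density:
                    results.append((itemset, set(comp)))
    return results
```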
ParaminerLC / MinerLC Soldano and Santini (2014) take this approach to its logical conclusion in terms of frequent itemset mining, mining closed frequent itemsets as candidate descriptions. As in Silva et al. (2012), the graph is projected and connected components are identified. A difference from the older technique is the use of the Galois operator on the candidate community, refining both community and description further. Both enumeration options, descriptions driving community discovery and communities driving description enumeration, are therefore interleaved.
A follow-up work (Soldano et al. 2015) turns the approach into an iterative one, treating found communities as networks in which sub-communities should be found. The similarity to the preceding approach means that Soldano et al. also inherit its limitations, such as the lack of true locality, while they also apply a different definition of local (abstract) patterns; essentially, they add the idea of graph abstractions, which lead to further constrained subnetworks in which communities are identified, as described above. This is implemented in the MinerLC algorithm (as an adaptation of the ParaminerLC algorithm) for undirected as well as directed networks (Soldano et al. 2017) and further graph abstractions. If a (strong) graph abstraction constraint is applied (e.g., a \(k\)-core constraint (Seidman 1983) with \(k > 1\)), then MinerLC essentially focuses on those (locally) induced (constrained) subgraphs, thus advancing on purely frequent-pattern-based approaches for community detection on attributed networks. There are further extensions, e.g., regarding two-mode attributed networks (Soldano et al. 2019) with corresponding constraints.
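The Galois step can be sketched as follows (our simplification, not MinerLC itself, assuming vertex descriptions are itemsets; the closure of a component is the intersection of its members' attribute sets, and a \(k\)-core abstraction is applied before splitting into components):

```python
import networkx as nx

def galois_refine(G, attrs, description, k=1):
    """One Galois refinement step: compute the description's extension,
    apply a k-core graph abstraction, split into connected components,
    and close each component's description by intersecting its members'
    attribute sets. attrs: dict vertex -> frozenset of items."""
    extension = {v for v in G if description <= attrs[v]}
    abstracted = nx.k_core(G.subgraph(extension), k=k)
    refined = []
    for comp in nx.connected_components(abstracted):
        closed = frozenset.intersection(*(attrs[v] for v in comp))
        refined.append((closed, set(comp)))  # closed is a superset of description
    return refined
```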
DCM Instead of starting from the description side, as the approaches discussed above do, Pool et al. (2014) start with communities (as groups of vertices). The space of possible communities is larger than that of (conjunctive) descriptions, which means that they have to use a heuristic approach to find high-scoring ones, as is usual in community detection. Concretely, the approach starts from basic community candidates and greedily adds/removes vertices to improve a community score. Once those candidates stabilize, a pattern mining approach is used to find discriminative conjunctive patterns that predict vertices' community membership. For each community, corresponding patterns are combined into a disjunction. This gives DCM a much richer description language than the other methods discussed in this section. Vertices matching the description are included in the community (and non-matching ones removed). Since this results in changes to the communities, the process is iterated until the community structure remains unchanged.
To ensure interpretability and control redundancy, the top-k communities are selected in a post-processing step, scored by a measure trading off community quality and description complexity, and controlled by a redundancy threshold on the Jaccard similarity between communities' vertex sets. The use of the discriminative pattern miner results in local patterns, and the redundancy threshold can be used to control community overlap; typically, some overlap will be accepted.
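A strongly simplified sketch of this alternation (a single conjunction fitted by a crude frequency-difference score, whereas DCM mines several discriminative patterns and combines them into a disjunction) might look like this:

```python
def dcm_like_fixpoint(vertices, attrs, community, max_tests=3, max_iter=20):
    """Sketch of DCM's description/community alternation (simplified).
    attrs: dict vertex -> set of (attribute, value) items."""
    tests = []
    for _ in range(max_iter):
        # score items by how much more often they hold inside the community
        counts = {}
        for v in vertices:
            for item in attrs[v]:
                i, o = counts.get(item, (0, 0))
                counts[item] = (i + (v in community), o + (v not in community))
        n_in = max(1, len(community))
        n_out = max(1, len(vertices) - len(community))
        tests = sorted(counts,
                       key=lambda it: counts[it][0] / n_in - counts[it][1] / n_out,
                       reverse=True)[:max_tests]
        new_community = {v for v in vertices if set(tests) <= attrs[v]}
        if new_community == community:
            break  # fixpoint: description and community agree
        community = new_community
    return community, tests
```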
Spectral, LDense, Pivot Galbrun et al. (2014) also consider the problem of finding a set of at most \(k\) communities in a labeled graph, the cumulative density of which is maximal. Vertices are described by labels, i.e. by words. Since a bag-of-words shows the same characteristics as an itemset, the two problem settings are interchangeable. After translating their problem into the generalized maximum coverage problem and showing guarantees for a greedy algorithm that always adds the community having the highest residual density, they propose three different techniques for finding the best community. To control redundancy, edges already included in communities are removed between iterations but can be re-added in later iterations to improve the formed communities.
- One of the three techniques, Spectral, begins by calculating a similarity matrix between attribute values, using the Jaccard similarity over the vertices having the respective attribute values. Using the Laplacian of this matrix, attribute values are ordered according to the Fiedler vector, and contiguous intervals in this ordering are considered to identify candidate communities. The set of communities found by this approach can be vertex-overlapping, but edge-overlap is explicitly excluded. Descriptions are not compared to those of other communities or the background graph.
- Next, LDense greedily, i.e. heuristically, adds labels such that the corresponding vertex set has the highest density, until the description becomes too specific and matches no vertices anymore. Among the vertex sets formed during this search process, the densest (and its description) is included in the solution set (see the sketch after this list).
- The third approach, Pivot, heuristically forms communities, and after formation greedily constructs the description best matching each of them. As in Pool et al. (2014), vertices are then added and removed according to whether they match the description or not.
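The sketch below illustrates LDense-style greedy label addition under our own simplifications (a single greedy pass, density via networkx, no tie-breaking):

```python
import networkx as nx

def ldense_like(G, attrs):
    """Greedily add the label whose addition yields the densest induced
    subgraph among vertices carrying all chosen labels; remember the
    densest vertex set seen. attrs: dict vertex -> set of labels."""
    chosen, vertices = [], set(G)
    candidates = set().union(*attrs.values())
    best = (0.0, frozenset(), ())
    while vertices and candidates:
        def density_with(label):
            V = {v for v in vertices if label in attrs[v]}
            return nx.density(G.subgraph(V))  # 0 for fewer than 2 vertices
        label = max(candidates, key=density_with)
        vertices = {v for v in vertices if label in attrs[v]}
        if not vertices:
            break  # description too specific: matches no vertices
        chosen.append(label)
        candidates.discard(label)
        d = nx.density(G.subgraph(vertices))
        if d > best[0]:
            best = (d, frozenset(vertices), tuple(chosen))
    return best  # (density, vertex set, description)
```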
COMODO Atzmueller et al. (2016) propose a technique that explicitly aims at identifying local patterns. Inspired by subgroup discovery methods (Atzmueller 2015), their approach exhaustively enumerates conjunctions of attribute-value tests and calculates (standard) community quality measures, such as the modularity (Newman and Girvan 2004), the segregation index (Freeman 1978), or the inverted average out-degree fraction (Yang and Leskovec 2012), on the corresponding communities, using upper bounds/optimistic estimates of these measures to aid in pruning. COMODO returns the top-k community patterns, with an optional redundancy check using a minimal improvement filter, cf. Bayardo et al. (2000). The use of the community quality measure implies discriminative descriptions, such that a description covering several communities (components) receives a low score.
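To make the first of these measures concrete: the (Newman-Girvan) modularity contribution of a single community \(C\) in a graph with \(m\) edges can be written as

\[ Q(C) \;=\; \frac{e_C}{m} \;-\; \left( \frac{\sum_{v \in C} d_v}{2m} \right)^{2}, \]

where \(e_C\) is the number of intra-community edges and \(d_v\) the degree of vertex \(v\) in the full graph; the second term is the expectation under a degree-preserving null model, and bounding such measures from above yields the optimistic estimates used for pruning.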
Atzmueller (2016) also applies the algorithm to more complex community quality functions for anomaly detection on labeled edges. The measures used to score descriptions compare community densities to that of the entire graph, satisfying the locality property. Atzmueller et al. (2016) and Atzmueller (2016) build on Atzmueller and Mitzlaff (2010, 2011), which used fewer measures. These papers precede the other works discussed in this section.
MinerLSD Atzmueller et al. (2018, 2019) combine central ideas of the discussed COMODO (Atzmueller et al. 2016) and MinerLC (Soldano et al. 2015, 2017) algorithms for explicitly mining closed local patterns into the MinerLSD algorithm. It focuses on local pattern mining, applying the standard local modularity metric (Newman 2004; Atzmueller et al. 2016), as is possible with COMODO. In addition, MinerLSD can utilize graph abstractions that reduce graphs to k-core subgraphs (Soldano et al. 2015), enabling further graph (interestingness) constraints. Local patterns are then identified in a similar way as for MinerLC, while the applied community measure (local modularity) also favors discriminative descriptions, as for COMODO. In particular, in order to prevent the typical pattern explosion in pattern mining, MinerLSD employs closed patterns. Then, the top-k patterns, or those above a certain local modularity threshold, are returned.
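Combining the ingredients named above, a sketch of the per-pattern evaluation (our simplification; MinerLSD's actual enumeration of closed patterns and its pruning are not shown) could be:

```python
import networkx as nx

def local_modularity(G, C):
    """Modularity contribution of a single vertex set C (the formula given
    for COMODO above); m and the degrees refer to the full graph G."""
    m = G.number_of_edges()  # assumed > 0
    e_c = G.subgraph(C).number_of_edges()
    vol = sum(d for _, d in G.degree(C))
    return e_c / m - (vol / (2 * m)) ** 2

def minerlsd_like(G, attrs, patterns, k=2, min_lm=0.01):
    """Sketch: project the graph onto each (closed) pattern's extension,
    apply the k-core abstraction, and keep components whose local
    modularity clears the threshold.
    patterns: iterable of item-frozensets; attrs: vertex -> frozenset."""
    kept = []
    for p in patterns:
        extension = {v for v in G if p <= attrs[v]}
        core = nx.k_core(G.subgraph(extension), k=k)
        for comp in nx.connected_components(core):
            lm = local_modularity(G, comp)
            if lm >= min_lm:
                kept.append((p, set(comp), lm))
    return sorted(kept, key=lambda t: -t[2])
```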
RoSi Kalofolias et al. (2019) apply the same approach as COMODO (treating community detection as subgroup discovery, letting description enumeration drive discovery, and using optimistic estimates), but propose a different, k-core based measure to discover more robust communities.
With the exception of DCM, the techniques in this section have much in common with each other. SCPM, ParaminerLC, Spectral, LDense, and Pivot all use an itemset representation. While the conjunctions of attribute-value combinations used by COMODO, MinerLSD, and RoSi would give them more flexibility in the case of numerical attributes, for discrete attributes these can be translated into items, as SCPM shows.
Most of the methods also let descriptions drive community discovery, although DCM and ParaminerLC interleave the two processes to a certain degree, and Pivot also starts from communities.
Table 3 Algorithmic categorization of the algorithms discussed in Sect. 4.3

Reference | Name | Overlapping | Technique | Attributes | Attribute type
Balasubramanyan and Cohen (2011) | Block-LDA | \(\times\) | Block model + LDA | V | Discrete
– | – | \(\times\) | Louvain | V | Discrete
– | TTR-LDA-Community model | \(\times\) | Topic model | V | Discrete
– | RELNA | \(\checkmark\) | Generative model | V | Discrete
McCallum et al. (2006) | GT Model | \(\checkmark\) | Topic model | E | Discrete
Newman and Clauset (2016) | – | \(\times\) | Stochastic block model | V | Discrete
Qin et al. (2018) | ASCD | \(\checkmark\) | NMF | V | Discrete
Revelle et al. (2015) | SENC | \(\checkmark\) | Topic model | V | Continuous
– | B/T-CME | \(\times\) | Random walk | V | Continuous
Steinhaeuser and Chawla (2008) | – | \(\times\) | Clustering | V | Continuous
Wang et al. (2016) | SCI | \(\checkmark\) | NMF | V | Discrete
Xu et al. (2012) | BAGC | \(\times\) | Clustering | V | Discrete
Yang et al. (2013) | CESNA | \(\checkmark\) | Generative model | V | Discrete
4.3 Attribute-guided graph mining (post-processing possibilities)
There are a number of techniques that employ option three in Sect. 3.1, i.e. that utilize descriptive information for mining attributed graphs but do not explicitly select attributes or attribute values directly. The necessary post-processing would therefore be more extensive than in the case of the methods described in Sect. 4.1. Strictly speaking, most of these methods fall into the class of algorithms, described by Bothorel et al. (2015), that exploit attribute information only to improve community detection.
They differ, however, from those methods that integrate attribute information via combined similarity functions or by introducing virtual vertices. In the former case, inverting the function to derive attribute importance is far from obvious, and in the latter there may be parts of a community that do not depend on attribute vertices at all, and others that fall apart if those vertices were removed.
Whereas the methods described in Sect. 4.2 explicitly enumerate both attributes and their values, and those in Sect. 4.1 at least return the attributes that need to be processed, the techniques in this section calculate the relative importance of attributes, and this information has to be post-processed to derive descriptions. We therefore discuss most of these techniques in less detail, giving more attention to those that have demonstrated this kind of post-processing.
We summarize the different methods in Table 3, indicating whether overlapping communities can be found, the algorithmic technique used, whether the method considers vertex or edge attributes, and whether attributes are discrete or continuous.
4.3.1 Explicit post-processing
We start with methods including explicit post-processing options.
GT model We begin with the work of McCallum et al. (2006), which has several interesting characteristics: (1) this is, to the best of our knowledge, the first such work, (2) they consider attributes on edges, not vertices, and (3) they explicitly post-process their results to retrieve the most relevant attributes. Concretely, they consider edges to be labeled with words, equivalent to items, and employ a topic model taking both labels and group membership into account. They extract the five to eight (depending on the experimental setting) most relevant words from the topic model.
Block-LDA Balasubramanyan and Cohen (2011) combine block models with LDA to estimate both community membership and conditional topic distributions. Via Gibbs sampling, they recover the fifteen most relevant terms per community.
CESNA Yang et al. (2013) use a model in which each vertex has community affiliation probabilities. These affiliation probabilities predict both the edges between vertices and attribute values, and the formation process consists of estimating the affiliations in such a way that they align with the edges and attributes observed in the data. Vertices are annotated with words or phrases; by exploiting the estimated conditional attribute weights, the authors extract the top attributes per community.
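Roughly, in the notation of Yang et al. (2013), a vertex \(u\)'s non-negative affiliation vector \(F_u\) generates edges and binary attributes as

\[ P(A_{uv} = 1) \;=\; 1 - \exp\!\bigl(-F_u F_v^{\top}\bigr), \qquad P(X_{uk} = 1) \;=\; \frac{1}{1 + \exp\!\bigl(-\sum_{c} W_{kc} F_{uc}\bigr)}, \]

where the logistic weights \(W\) link communities to attributes; it is these estimated weights that are post-processed into the top attributes per community.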
SENC Revelle et al. (2015) use a topic model for finding relevant topics for communities, as well as vertex membership probabilities, and employ an EM algorithm to optimize the two. They assume that vertices are described by words but differ from other work in using term weights (TF-IDF), i.e. switching from an itemset-like setting to one of numerical values. In the experimental evaluation, they present the top-40 terms per community according to the learned conditional probabilities.
SCI Wang et al. (2016) employ non-negative matrix factorization (NMF). Vertices are described by bags-of-words, or itemsets, and the objective function combines topology and attribute similarity using a trade-off parameter. As a result of their formulation, one of the derived matrices encodes the relationships between communities and attributes, which they exploit to extract the top-10 words.
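As a hedged illustration (a generic form of such a joint objective, not SCI's exact formulation), one may think of

\[ \min_{H \ge 0,\, W \ge 0} \;\; \bigl\lVert A - H H^{\top} \bigr\rVert_F^2 \;+\; \alpha \, \bigl\lVert X - H W \bigr\rVert_F^2, \]

where \(A\) is the adjacency matrix, \(X\) the vertex-attribute matrix, \(H\) holds the vertex-community memberships, \(\alpha\) is the trade-off parameter, and the community-attribute factor \(W\) is what gets post-processed for the top words.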
ASCD The work of Qin et al. (2018) differs only to a small degree from SCI, mainly due to a focus on the fact that topology and shared attribute values can disagree, requiring the ability to fine-tune the trade-off between the two. They also extract the top-10 words.
4.3.2 Post-processing left to the user
Finally, we focus on approaches which do not include explicit post-processing but leave it to the user to potentially extract descriptions from the discovered communities. Steinhaeuser and Chawla (2008) annotate edges with the similarity between vertices' attribute values, and group them into communities by thresholding those similarity values. By post-processing those communities, one could identify the attributes for which vertices are similar, as well as their values, but the descriptions could be rather general. Li et al. (2010) first form communities using the Girvan-Newman method (Girvan and Newman 2002) and then identify relevant topics using Latent Dirichlet Allocation. Community detection is not informed by descriptions, however, and communities are not adjusted afterwards, meaning that descriptions could be unreliable or non-existent. Xu et al. (2012) propose building a MAP model over vertex attribute values to cluster vertices. While one could use model values to identify the most relevant attributes for each cluster, this is not an output of the approach. Smith et al. (2016) use a random-walk based method for identifying communities and derive weights for attribute values based on their frequency in the network and the visitation frequency of the random walker. Those walks could be used in post-processing to identify the description corresponding to a community. Newman and Clauset (2016) use a Bayesian modeling technique based on stochastic block models to estimate community allocations from structural and attribute information; however, no description is targeted. Baldominos et al. (2017) find stereotypes from communities detected using a modularity-optimizing algorithm by weighting labels according to the proportion of vertices in the community that support them. Conversely, Martínez-Seis (2017) uses homophilic principles to obtain a ranking of the attributes and then applies only those for community detection.