Elsevier

Computer Networks

Volume 56, Issue 2, 2 February 2012, Pages 810-819
Computer Networks

RAGE – A rapid graphlet enumerator for large networks

https://doi.org/10.1016/j.comnet.2011.08.019Get rights and content

Abstract

Counting network graphlets (and motifs) was shown to have an important role in studying a wide range of complex networks. However, when the network size is large, as in the case of the Internet topology and WWW graphs, counting the number of graphlets becomes prohibitive for graphlets of size 4 and above. Devising efficient graphlet counting algorithms thus becomes an important goal.

In this paper, we present efficient counting algorithms for 4-node graphlets. We show how to efficiently count the total number of each type of graphlet, and the number of graphlets adjacent to a node. We further present a new algorithm for node position-aware graphlet counting, namely partitioning the graphlet count by the node position in the graphlet. Since our algorithms are based on non-induced graphlet count, we also show how to calculate the count of induced graphlets given the non-induced count.

We implemented our algorithms on a set of both synthetic and real-world graphs. Our evaluation shows that the algorithms are scalable and perform up to 30 times faster than the state-of-the-art. We then apply the algorithms on the Internet Autonomous Systems (AS) graph, and show how fast graphlet counting can be leveraged for efficient and scalable classification of the ASes that comprise the Internet. Finally, we present RAGE, a tool for rapid graphlet enumeration available online.

Introduction

Network graphlets are small connected sub-graphs of a larger network [1]. In a seminal paper by Milo et al. [2], network motifs were defined as interaction patterns (or graphlets) occurring in a network more often than those in randomized networks. Significant motifs were found in various real world networks including protein interaction networks, neurobiological networks, social networks, World Wide Web (WWW) hyper-link networks, and the Internet Autonomous Systems (AS) network [3].

Recently, there is an increased interest in exploring the role of network motifs and graphlets in networking. Feldman and Shavitt [4] suggested that the bi-fan graphlet (a directed 2–2 full bi-partite graph) may indicate existence of “points of presence” (PoPs) in the Internet’s router level network. The distribution of the local number of triangles and the related clustering coefficient can be used to detect the presence of spamming activity in large scale Web graphs [5]. Hales and Arteconi [6] presented results from a motif analysis of networks produced by peer-to-peer protocols, showing that the motif profiles of such networks closely match protein structure networks.

Graphlet degree counting, counting the number of graphlets in which a node participates, was recently suggested as a method to classify nodes in the network into functional classes [7]. The suggestion assumes that the graphlet degree vector of a node portrays its function in the network.

Gonen and Shavitt [8], [20] were the first to suggest efficient algorithms for positional graphlet degree counting, namely partitioning the graphlet count by the node position in the graphlet. However, some of their algorithms only approximate the count with high probability. We suggest here improved algorithms for this task, giving an exact count of all the 4-node graphlets, and with better time complexity than the ones in their publication [8]. We also provide a simple algorithm for counting triangles, based on new-vertex-listing [9] and slightly modified to assist in per-node count. We thus cover with efficient algorithms all graphlets of size 4 and below. This leaves the problem of efficient 5-node counting open, which is usually the largest size graphlet used for classification.

For most network and node classification analysis we relate only to induced graphlets, meaning that a subgraph H(VH, EH) is counted as graphlet g(Vg, Eg) only if there exists an isomorphism f : VH  Vg such that (u, v)  EH iff (f(u), f(v))  Eg. However, the algorithms presented here and in some previous works [8], [20] count non-induced graphlets, namely a subgraph is counted even if additional edges exist between the subgraph nodes ((f(u), f(v))  Eg if (u, v)  EH). Thus, we present simple transformation that given the non-induced graphlet-count calculates the corresponding induced-graphlet count.

Batagelj and Mrvar [10] presented an algorithm which lists all triangles in a graph with time complexity O(dE∣) = O(nE∣), where d is maximum nodal degree in the graph. This result was further improved to O(∣E3/2) [9].

Itzhack et al. [11] gave an algorithm for counting all directed graphlets up to size 4, based on network decomposition via node removal. They claimed a time complexity of O(∣Ed2 log d). Shervashidze et al. [12] provide methodology for counting all undirected graphlets up to size five in bounded degree graphs, mostly by using the intersection between neighborhoods of each node pair of the graph edges.

Wernicke [13] presented the ESU algorithm, that enumerates all motifs of size k in a graph. The ESU algorithm starts with individual nodes in the graph and iteratively adds an additional node from the subgraph’s neighborhood, until reaching subgraphs of size k. Stoica and Prieur [14] extended the ESU algorithm to count the number of position-aware graphlets adjacent to each node in the graph.

Gonen and Shavitt [8] presented algorithms with time complexity O(3e)k·n·|E|·log(1/δ)ϵ2+|E|2+|E|nlogn that approximates the position aware graphlet degree, but only for k length paths, k-cycles and k-cycles with a chord graphlet, where k is at most O(log n). They also count per all nodes all non-induced and undirected graphlets up to size four, in time On|E|log(1/δ)ϵ2+|E|2+|E|nlogn. Gonen et al. [15] gave a sub-linear algorithm for approximating the non-induced star count, for any given star size.

In this paper we present a set of algorithms that count exactly all induced and non-induced position-aware graphlets of up to size four, adjacent to each node in the undirected graph, with time complexity of O(d · E + E2).

Specifically, we present algorithms that count: for each node, all non-induced tailed triangles, 4-node cycles with chord (chordal cycles), and a path of length three graphlets in time complexity of O(d · E∣); Count cliques and non-induced cycles in time complexity of O(d · E + E2). We also present a method to obtain the induced graphlet count from the non-induced graphlet count, for graphlet up to size four. Parts of the techniques presented are similar in spirit to those noted in [12], for global graphlet counting.

Since most real-world complex networks are sparse (i.e. ∣E = O(n)), analyzing the runtime of these algorithms on such networks shows that the runtime bound is much lower than the runtime of the trivial exhaustive search, over all possible edges/nodes.

As an application of our algorithms, we present methods enabling fast graphlet counting to be used for role classification of Internet Autonomous Systems [16], a classification which helps understand the complex ecosystem of the commercial Internet. Our methods improve on current work, by mitigating the need of inferring the relationship between ASes, a task which is known to be hard and inaccurate [17], [18], [19].

Section snippets

Non-induced graphlet count

All algorithms described in the following section assume an undirected simple input graph G(V, E), which is represented by an adjacency list. Furthermore, it is assumed that the graph vertices are labeled by the integers {1, 2,  , n}.1 We denote by N(v) the set of neighbor nodes of v (i.e. N(v) = {u  V∣(v, u)  E}).

A node v is adjacent to graphlet mi if v is a vertex of graphlet mi.

Obtaining induced graphlets

The per-node and global non-induced graphlet count, collected using the algorithms presented in the previous section, can be converted to an induced count using a post-processing technique described in the following section.

Denote by NI(m) the total number of non-induced appearances of graphlet m in G, I(m) the total number of induced appearances of graphlet m in graph G. In addition, for each node v in graph G, we denote NIv(m) by the number of non-induced position aware graphlets for which v

Evaluation

We implemented our algorithms, creating a tool named RAGE 2 and used it for evaluating their performance, scalability and applicability on the Internet Autonomous Systems (AS) graph.

The Internet is comprised of tens of thousands of ASes. The AS-graph is the coarsest view of the Internet, comprised of ASes as nodes. A link between two ASes exists when the two ASes have a peering agreement that allows them to directly exchange

Application example: classification of ASes

There are various types of ASes, often classified by their commercial role. Dhamdhere and Dovrolis [16] described five different types of ASes: Large Transit Providers (LTP) – international ISPs that have a large customer base and cover a wide geographical areas; Small Transit Providers (STP) – regional ISPs that provide both Internet access and transit services; Access or Hosting Providers (AHP) – ISPs which provide Internet access or hosting services both for residential users and for

Conclusion

Counting graphlets and motifs was shown in the last decade to be an important tool in analyzing complex networks in diverse fields. As our data collection tools become better, the networks we need to analyze are growing, and thus the task of graphlet counting becomes more challenging. In this paper we present efficient algorithms for counting graphlets of size 3 and 4, and prove their correctness and efficiency. Finally, we demonstrate the usage of graphlet enumeration for the problem of

Dror Marcus received his B.Sc degree from The Open University of Israel in 2009 (cum laude). He is currently a M.Sc candidate at Tel Aviv University, focusing on the Internet mapping, complex networks and large scale graphs.

References (29)

  • M. Gonen, Y. Shavitt, Approximating the number of network motifs, in: WAW’09: Proceedings of the 6th International...
  • N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, K. Borgwardt, Efficient graphlet kernels for large graph...
  • S. Wernicke

    Efficient detection of network motifs

    IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

    (2006)
  • A. Stoica, C. Prieur, Structure of neighborhoods in a large social network, in: International Conference on...
  • Cited by (0)

    Dror Marcus received his B.Sc degree from The Open University of Israel in 2009 (cum laude). He is currently a M.Sc candidate at Tel Aviv University, focusing on the Internet mapping, complex networks and large scale graphs.

    Yuval Shavitt has a D.Sc. in electrical engineering from the Technion, Haifa, Israel, and is currently a faculty member in the School of Electrical Engineering at Tel-Aviv University, Israel. His research interests include Internet measurements, mapping, and characterization; and data mining peer-to-peer networks.

    This work was partially funded by the Israeli Science Foundation’s center of knowledge grant 1685/07.

    View full text