Skip to main content
Top
Published in: The VLDB Journal 2/2015

01-04-2015 | Regular Paper

I/O efficient: computing SCCs in massive graphs

Authors: Zhiwei Zhang, Jeffrey Xu Yu, Lu Qin, Lijun Chang, Xuemin Lin

Published in: The VLDB Journal | Issue 2/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

A strongly connected component (\(\mathsf {SCC}\)) is a maximal subgraph of a directed graph \(G\) in which every pair of nodes is reachable from each other in the \(\mathsf {SCC}\). With such a property, a general directed graph can be represented by a directed acyclic graph (DAG) by contracting every \(\mathsf {SCC}\) of \(G\) to a node in DAG. In many real applications that need graph pattern matching, topological sorting, or reachability query processing, the best way to deal with a general directed graph is to deal with its DAG representation. Therefore, finding all \(\mathsf {SCC}\)s in a directed graph \(G\) is a critical operation. The existing in-memory algorithms based on depth first search (DFS) can find all \(\mathsf {SCC}\)s in linear time with respect to the size of a graph. However, when a graph cannot reside entirely in the main memory, the existing external or semi-external algorithms to find all \(\mathsf {SCC}\)s have limitation to achieve high I/O efficiency. In this paper, we study new I/O-efficient semi-external algorithms to find all \(\mathsf {SCC}\)s for a massive directed graph \(G\) that cannot reside in main memory entirely. To overcome the deficiency of the existing DFS-based semi-external algorithm that heavily relies on a total order, we explore a weak order based on which we investigate new algorithms. We propose a new two-phase algorithm, namely, tree construction and tree search. In the tree construction phase, a spanning tree of \(G\) can be constructed in bounded number of sequential scans of \(G\). In the tree search phase, it needs to sequentially scan the graph once to find all \(\mathsf {SCC}\)s. In addition, we propose a new single-phase algorithm, which combines the tree construction and tree search phases into a single phase, with three new optimization techniques. They are early acceptance, early rejection, and batch processing. By the single-phase algorithm with the new optimization techniques, we can significantly reduce the number of I/Os and the CPU cost. We prove the correctness of the algorithms. We conduct extensive experimental studies using 4 real datasets including a massive real dataset and several synthetic datasets to confirm the I/O efficiency of our approaches.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
3
The up-edge will be used instead forward-cross-edge in the condition (line 8, Algorithm 1).
 
4
There exist in-memory approaches to find all \(\mathsf {SCC}\)s which only need DFS the graph once (e.g., Tarjan algorithm [22]). However, they need additional memory in addition to the memory for holding the graph, which cannot be efficiently used for the limited memory in our problem.
 
Literature
1.
go back to reference Abello, J., Buchsbaum, A.L., Westbrook, J.: A functional approach to external graph algorithms. Algorithmica 32(3), 437–458 (2002) Abello, J., Buchsbaum, A.L., Westbrook, J.: A functional approach to external graph algorithms. Algorithmica 32(3), 437–458 (2002)
2.
go back to reference Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988) Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)
3.
go back to reference Aho, A.V., Hopcroft, J.E., Ullman, J.D.: Data Structures and Algorithms. Addison-Wesley, Reading (1983) Aho, A.V., Hopcroft, J.E., Ullman, J.D.: Data Structures and Algorithms. Addison-Wesley, Reading (1983)
4.
go back to reference Ajwani, D., Dementiev, R., Meyer, U.: A computational study of external-memory bfs algorithms. In: Proceedings of SODA’06 (2006) Ajwani, D., Dementiev, R., Meyer, U.: A computational study of external-memory bfs algorithms. In: Proceedings of SODA’06 (2006)
5.
go back to reference Ajwani, D., Meyer, U.: Algorithmics of Large and Complex Networks, Chapter 1: Design and Engineering of External Memory Traversal Algorithms for General Graphs. Springer, Berlin (2009) Ajwani, D., Meyer, U.: Algorithmics of Large and Complex Networks, Chapter 1: Design and Engineering of External Memory Traversal Algorithms for General Graphs. Springer, Berlin (2009)
6.
go back to reference Ajwani, D., Meyer, U., Osipov, V.: Improved external memory bfs implementation. In: Proceedings of ALENEX’07 (2007) Ajwani, D., Meyer, U., Osipov, V.: Improved external memory bfs implementation. In: Proceedings of ALENEX’07 (2007)
7.
go back to reference Buchsbaum, A.L., Goldwasser, M.H., Venkatasubramanian, S., Westbrook, J.: On external memory graph traversal. In: Proceedings of SODA’00 (2000) Buchsbaum, A.L., Goldwasser, M.H., Venkatasubramanian, S., Westbrook, J.: On external memory graph traversal. In: Proceedings of SODA’00 (2000)
8.
go back to reference Chiang, Y.-J., Goodrich, M.T., Grove, E.F., Tamassia, R., Vengroff, D.E., Vitter, J.S.: External-memory graph algorithms. In: Proceedings of SODA’95 (1995) Chiang, Y.-J., Goodrich, M.T., Grove, E.F., Tamassia, R., Vengroff, D.E., Vitter, J.S.: External-memory graph algorithms. In: Proceedings of SODA’95 (1995)
9.
go back to reference Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill, NY (2001) Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill, NY (2001)
10.
go back to reference Cosgaya-Lozano, A., Zeh, N.: A heuristic strong connectivity algorithm for large graphs. In: Proceedings of SEA’09 (2009) Cosgaya-Lozano, A., Zeh, N.: A heuristic strong connectivity algorithm for large graphs. In: Proceedings of SEA’09 (2009)
11.
go back to reference Dementiev, R., Sanders, P., Schultes, D., Sibeyn, J.F.: Engineering an external memory minimum spanning tree algorithm. In: IFIP TCS (2004) Dementiev, R., Sanders, P., Schultes, D., Sibeyn, J.F.: Engineering an external memory minimum spanning tree algorithm. In: IFIP TCS (2004)
12.
go back to reference Fan, W., Li, J., Ma, S., Wang, H., Wu, Y.: Graph homomorphism revisited for graph matching. PVLDB 3(1), 1161–1172 (2010) Fan, W., Li, J., Ma, S., Wang, H., Wu, Y.: Graph homomorphism revisited for graph matching. PVLDB 3(1), 1161–1172 (2010)
13.
go back to reference Hellings, J., Fletcher, G.H., Haverkort, H.: Efficient external-memory bisimulation on dags. In: Proceedings of SIGMOD’12 (2012) Hellings, J., Fletcher, G.H., Haverkort, H.: Efficient external-memory bisimulation on dags. In: Proceedings of SIGMOD’12 (2012)
14.
go back to reference Kumar, V., Schwabe, E.J.: Improved algorithms and data structures for solving graph problems in external memory. In: Proceedings of SPDP’96 (1996) Kumar, V., Schwabe, E.J.: Improved algorithms and data structures for solving graph problems in external memory. In: Proceedings of SPDP’96 (1996)
15.
go back to reference Kyrola, A., Blelloch, G., Guestrin, C.: Graphchi: large-scale graph computation on just a pc. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 31–46, Berkeley, CA, USA. USENIX Association (2012) Kyrola, A., Blelloch, G., Guestrin, C.: Graphchi: large-scale graph computation on just a pc. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 31–46, Berkeley, CA, USA. USENIX Association (2012)
16.
go back to reference Mehlhorn, K., Meyer, U.: External-memory breadth-first search with sublinear i/o. In: Proceedings of ESA’02 (2002) Mehlhorn, K., Meyer, U.: External-memory breadth-first search with sublinear i/o. In: Proceedings of ESA’02 (2002)
17.
go back to reference Meyer, U., Osipov, V.: Design and implementation of a practical i/o-efficient shortest paths algorithm. In: Proceedings of ALENEX’09 (2009) Meyer, U., Osipov, V.: Design and implementation of a practical i/o-efficient shortest paths algorithm. In: Proceedings of ALENEX’09 (2009)
18.
go back to reference Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths. In: Proceedings of ESA’03 (2003) Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths. In: Proceedings of ESA’03 (2003)
19.
go back to reference Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths with unbounded edge lengths. In: Proceedings of ESA’06 (2006) Meyer, U., Zeh, N.: I/O-efficient undirected shortest paths with unbounded edge lengths. In: Proceedings of ESA’06 (2006)
20.
go back to reference Sibeyn, J.F.: External connected components. In: Proceedings of SWAT’04 (2004) Sibeyn, J.F.: External connected components. In: Proceedings of SWAT’04 (2004)
21.
go back to reference Sibeyn, J.F., Abello, J., Meyer, U.: Heuristics for semi-external depth first search on directed graphs. In: Proceedings of SPAA’02 (2002) Sibeyn, J.F., Abello, J., Meyer, U.: Heuristics for semi-external depth first search on directed graphs. In: Proceedings of SPAA’02 (2002)
23.
go back to reference Vitter, J.S.: External memory algorithms and data structures. ACM Comput. Surv. 33(2), 209–271 (2001) Vitter, J.S.: External memory algorithms and data structures. ACM Comput. Surv. 33(2), 209–271 (2001)
24.
go back to reference Yildirim, H., Chaoji, V., Zaki, M.J.: Grail: scalable reachability index for large graphs. PVLDB, 3(1), 276–284 (2010) Yildirim, H., Chaoji, V., Zaki, M.J.: Grail: scalable reachability index for large graphs. PVLDB, 3(1), 276–284 (2010)
25.
go back to reference Zhang, Z., Yu, J.X., Qin, L., Chang, L., Lin, X.: I/o efficient: computing sccs in massive graphs. In: Proceedings of SIGMOD’13 (2013) Zhang, Z., Yu, J.X., Qin, L., Chang, L., Lin, X.: I/o efficient: computing sccs in massive graphs. In: Proceedings of SIGMOD’13 (2013)
Metadata
Title
I/O efficient: computing SCCs in massive graphs
Authors
Zhiwei Zhang
Jeffrey Xu Yu
Lu Qin
Lijun Chang
Xuemin Lin
Publication date
01-04-2015
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 2/2015
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-014-0372-z

Other articles of this Issue 2/2015

The VLDB Journal 2/2015 Go to the issue

Premium Partner