Skip to main content
Top
Published in: International Journal of Parallel Programming 1/2020

15-11-2019

CSMqGraph: Coarse-Grained and Multi-external-storage Multi-queue I/O Management for Graph Computing

Authors: Shuo Chen, Zhan Shi, Dan Feng, Shang Liu, Fang Wang, Lei Yang, Ruili Yu

Published in: International Journal of Parallel Programming | Issue 1/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

As graphs continue growing, external storage graph processing systems serve as a promising alternative to distributed in-memory solutions for low cost and high scalability. To obtain high I/O throughput, these systems usually use multiple external storage devices. They adopt the operating system I/O management method based on striped volume, resulting in unsatisfactory performance, such as low sequential bandwidth utilization of each external storage device, limited I/O parallelism and expensive management overhead. In this paper, we analyzed the problems of the operating system I/O management method based on striped volume. Then we designed CSMqGraph, a graph processing system adopts coarse-grained striping method matching sequential large I/O to fully utilize the maximum sequential bandwidth of each external storage device and an I/O management strategy based on multi-external-storage multi-queue making I/O threads dedicated to each external storage device to further improve I/O throughput and fully exploit the parallelism of multiple external storage devices. For different graph algorithms and datasets, our evaluation shows that CSMqGraph consistently outperforms state-of-the-art engines GridGraph by up to 40%, and has better I/O scalability.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Chi, Y., Dai, G., Wang, Y., Sun, G., Li, G., Yang, H.: Nxgraph: an efficient graph processing system on a single machine. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 409–420. IEEE (2016) Chi, Y., Dai, G., Wang, Y., Sun, G., Li, G., Yang, H.: Nxgraph: an efficient graph processing system on a single machine. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 409–420. IEEE (2016)
2.
go back to reference Coffman, T., Greenblatt, S., Marcus, S.: Graph-based technologies for intelligence analysis. Commun. ACM 47(3), 45–47 (2004)CrossRef Coffman, T., Greenblatt, S., Marcus, S.: Graph-based technologies for intelligence analysis. Commun. ACM 47(3), 45–47 (2004)CrossRef
3.
go back to reference Del Sol, A., Fujihashi, H., O’Meara, P.: Topology of small-world networks of protein-protein complex structures. Bioinformatics 21(8), 1311–1315 (2005)CrossRef Del Sol, A., Fujihashi, H., O’Meara, P.: Topology of small-world networks of protein-protein complex structures. Bioinformatics 21(8), 1311–1315 (2005)CrossRef
4.
go back to reference Doerr, C., Blenn, N.: Metric convergence in social network sampling. In: Proceedings of the 5th ACM Workshop on HotPlanet, pp. 45–50. ACM (2013) Doerr, C., Blenn, N.: Metric convergence in social network sampling. In: Proceedings of the 5th ACM Workshop on HotPlanet, pp. 45–50. ACM (2013)
5.
go back to reference Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Presented as part of the 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 12), pp. 17–30 (2012) Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Presented as part of the 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 12), pp. 17–30 (2012)
6.
go back to reference Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: graph processing in a distributed dataflow framework. In: 11th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 14), pp. 599–613 (2014) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: graph processing in a distributed dataflow framework. In: 11th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 14), pp. 599–613 (2014)
7.
go back to reference Huberman, B.A., Adamic, L.A.: Internet: growth dynamics of the world-wide web. Nature 401(6749), 131 (1999)CrossRef Huberman, B.A., Adamic, L.A.: Internet: growth dynamics of the world-wide web. Nature 401(6749), 131 (1999)CrossRef
8.
go back to reference Jeong, H., Mason, S.P., Barabási, A.L., Oltvai, Z.N.: Lethality and centrality in protein networks. Nature 411(6833), 41 (2001)CrossRef Jeong, H., Mason, S.P., Barabási, A.L., Oltvai, Z.N.: Lethality and centrality in protein networks. Nature 411(6833), 41 (2001)CrossRef
9.
go back to reference Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., Barabási, A.L.: The large-scale organization of metabolic networks. Nature 407(6804), 651 (2000)CrossRef Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., Barabási, A.L.: The large-scale organization of metabolic networks. Nature 407(6804), 651 (2000)CrossRef
10.
go back to reference Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, pp. 229–238. Washington, DC, USA (2009) Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, pp. 229–238. Washington, DC, USA (2009)
11.
go back to reference Khayyat, Z., Awara, K., Alonazi, A., Jamjoom, H., Williams, D., Kalnis, P.: Mizan: a system for dynamic load balancing in large-scale graph processing. In: Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys ’13, pp. 169–182. ACM, New York, NY, USA (2013). https://doi.org/10.1145/2465351.2465369 Khayyat, Z., Awara, K., Alonazi, A., Jamjoom, H., Williams, D., Kalnis, P.: Mizan: a system for dynamic load balancing in large-scale graph processing. In: Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys ’13, pp. 169–182. ACM, New York, NY, USA (2013). https://​doi.​org/​10.​1145/​2465351.​2465369
12.
go back to reference Kumar, P., Huang, H.H.: G-store: high-performance graph store for trillion-edge processing. In: SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 830–841. IEEE (2016) Kumar, P., Huang, H.H.: G-store: high-performance graph store for trillion-edge processing. In: SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 830–841. IEEE (2016)
13.
go back to reference Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600. ACM (2010) Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600. ACM (2010)
14.
go back to reference Kyrola, A., Blelloch, G., Guestrin, C.: Graphchi: large-scale graph computation on just a \(\{\)PC\(\}\). In: Presented as part of the 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 12), pp. 31–46 (2012) Kyrola, A., Blelloch, G., Guestrin, C.: Graphchi: large-scale graph computation on just a \(\{\)PC\(\}\). In: Presented as part of the 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 12), pp. 31–46 (2012)
15.
go back to reference Lee, E.K., Katz, R.H.: An analytic performance model of disk arrays. In: ACM SIGMETRICS Performance Evaluation Review, vol. 21, pp. 98–109. ACM (1993) Lee, E.K., Katz, R.H.: An analytic performance model of disk arrays. In: ACM SIGMETRICS Performance Evaluation Review, vol. 21, pp. 98–109. ACM (1993)
17.
go back to reference Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 135–146. ACM (2010) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 135–146. ACM (2010)
18.
19.
go back to reference Randles, M., Lamb, D., Taleb-Bendiab, A.: A comparative study into distributed load balancing algorithms for cloud computing. In: 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops, pp. 551–556. IEEE (2010) Randles, M., Lamb, D., Taleb-Bendiab, A.: A comparative study into distributed load balancing algorithms for cloud computing. In: 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops, pp. 551–556. IEEE (2010)
20.
go back to reference Roy, A., Bindschaedler, L., Malicevic, J., Zwaenepoel, W.: Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 410–424. ACM (2015) Roy, A., Bindschaedler, L., Malicevic, J., Zwaenepoel, W.: Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 410–424. ACM (2015)
21.
go back to reference Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472–488. ACM (2013) Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472–488. ACM (2013)
22.
23.
go back to reference Shun, J., Blelloch, G.E.: Ligra: a lightweight graph processing framework for shared memory. In: ACM Sigplan Notices, vol. 48, pp. 135–146. ACM (2013) Shun, J., Blelloch, G.E.: Ligra: a lightweight graph processing framework for shared memory. In: ACM Sigplan Notices, vol. 48, pp. 135–146. ACM (2013)
24.
go back to reference Vora, K., Xu, G., Gupta, R.: Load the edges you need: a generic i/o optimization for disk-based graph processing. In: 2016 \(\{\)USENIX\(\}\) Annual Technical Conference (\(\{\)USENIX\(\}\) \(\{\)ATC\(\}\) 16), pp. 507–522 (2016) Vora, K., Xu, G., Gupta, R.: Load the edges you need: a generic i/o optimization for disk-based graph processing. In: 2016 \(\{\)USENIX\(\}\) Annual Technical Conference (\(\{\)USENIX\(\}\) \(\{\)ATC\(\}\) 16), pp. 507–522 (2016)
25.
go back to reference Wang, P., Zhang, K., Chen, R., Chen, H., Guan, H.: Replication-based fault-tolerance for large-scale graph processing. In: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 562–573. IEEE (2014) Wang, P., Zhang, K., Chen, R., Chen, H., Guan, H.: Replication-based fault-tolerance for large-scale graph processing. In: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 562–573. IEEE (2014)
26.
go back to reference Wang, Z., Gu, Y., Bao, Y., Yu, G., Yu, J.X.: Hybrid pulling/pushing for i/o-efficient distributed and iterative graph computing. In: Proceedings of the 2016 International Conference on Management of Data, pp. 479–494. ACM (2016) Wang, Z., Gu, Y., Bao, Y., Yu, G., Yu, J.X.: Hybrid pulling/pushing for i/o-efficient distributed and iterative graph computing. In: Proceedings of the 2016 International Conference on Management of Data, pp. 479–494. ACM (2016)
27.
go back to reference Zhao, Y., Yoshigoe, K., Xie, M., Zhou, S., Seker, R., Bian, J.: Lightgraph: lighten communication in distributed graph-parallel processing. In: 2014 IEEE International Congress on Big Data, pp. 717–724. IEEE (2014) Zhao, Y., Yoshigoe, K., Xie, M., Zhou, S., Seker, R., Bian, J.: Lightgraph: lighten communication in distributed graph-parallel processing. In: 2014 IEEE International Congress on Big Data, pp. 717–724. IEEE (2014)
28.
go back to reference Zheng, D., Burns, R., Szalay, A.S.: Toward millions of file system iops on low-cost, commodity hardware. In: SC’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12. IEEE (2013) Zheng, D., Burns, R., Szalay, A.S.: Toward millions of file system iops on low-cost, commodity hardware. In: SC’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12. IEEE (2013)
29.
go back to reference Zheng, D., Mhembere, D., Burns, R., Vogelstein, J., Priebe, C.E., Szalay, A.S.: Flashgraph: processing billion-node graphs on an array of commodity ssds. In: 13th \(\{\)USENIX\(\}\) Conference on File and Storage Technologies (\(\{\)FAST\(\}\) 15), pp. 45–58 (2015) Zheng, D., Mhembere, D., Burns, R., Vogelstein, J., Priebe, C.E., Szalay, A.S.: Flashgraph: processing billion-node graphs on an array of commodity ssds. In: 13th \(\{\)USENIX\(\}\) Conference on File and Storage Technologies (\(\{\)FAST\(\}\) 15), pp. 45–58 (2015)
30.
go back to reference Zhu, X., Han, W., Chen, W.: Gridgraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: 2015 \(\{\)USENIX\(\}\) Annual Technical Conference (\(\{\)USENIX\(\}\) \(\{\)ATC\(\}\) 15), pp. 375–386 (2015) Zhu, X., Han, W., Chen, W.: Gridgraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: 2015 \(\{\)USENIX\(\}\) Annual Technical Conference (\(\{\)USENIX\(\}\) \(\{\)ATC\(\}\) 15), pp. 375–386 (2015)
Metadata
Title
CSMqGraph: Coarse-Grained and Multi-external-storage Multi-queue I/O Management for Graph Computing
Authors
Shuo Chen
Zhan Shi
Dan Feng
Shang Liu
Fang Wang
Lei Yang
Ruili Yu
Publication date
15-11-2019
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 1/2020
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-019-00651-0

Other articles of this Issue 1/2020

International Journal of Parallel Programming 1/2020 Go to the issue

Premium Partner