Top

International Journal of Parallel Programming

Published in:

15-11-2019

CSMqGraph: Coarse-Grained and Multi-external-storage Multi-queue I/O Management for Graph Computing

Authors: Shuo Chen, Zhan Shi, Dan Feng, Shang Liu, Fang Wang, Lei Yang, Ruili Yu

Published in: International Journal of Parallel Programming | Issue 1/2020

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

As graphs continue growing, external storage graph processing systems serve as a promising alternative to distributed in-memory solutions for low cost and high scalability. To obtain high I/O throughput, these systems usually use multiple external storage devices. They adopt the operating system I/O management method based on striped volume, resulting in unsatisfactory performance, such as low sequential bandwidth utilization of each external storage device, limited I/O parallelism and expensive management overhead. In this paper, we analyzed the problems of the operating system I/O management method based on striped volume. Then we designed CSMqGraph, a graph processing system adopts coarse-grained striping method matching sequential large I/O to fully utilize the maximum sequential bandwidth of each external storage device and an I/O management strategy based on multi-external-storage multi-queue making I/O threads dedicated to each external storage device to further improve I/O throughput and fully exploit the parallelism of multiple external storage devices. For different graph algorithms and datasets, our evaluation shows that CSMqGraph consistently outperforms state-of-the-art engines GridGraph by up to 40%, and has better I/O scalability.

previous article Characterizing Scalability of Sparse Matrix–Vector Multiplications on Phytium FT-2000+

next article Message Passing Optimization in Robot Operating System

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Chi, Y., Dai, G., Wang, Y., Sun, G., Li, G., Yang, H.: Nxgraph: an efficient graph processing system on a single machine. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 409–420. IEEE (2016)

Coffman, T., Greenblatt, S., Marcus, S.: Graph-based technologies for intelligence analysis. Commun. ACM 47(3), 45–47 (2004)CrossRef

Del Sol, A., Fujihashi, H., O’Meara, P.: Topology of small-world networks of protein-protein complex structures. Bioinformatics 21(8), 1311–1315 (2005)CrossRef

Doerr, C., Blenn, N.: Metric convergence in social network sampling. In: Proceedings of the 5th ACM Workshop on HotPlanet, pp. 45–50. ACM (2013)

Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Presented as part of the 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 12), pp. 17–30 (2012)

Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: graph processing in a distributed dataflow framework. In: 11th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 14), pp. 599–613 (2014)

Huberman, B.A., Adamic, L.A.: Internet: growth dynamics of the world-wide web. Nature 401(6749), 131 (1999)CrossRef

Jeong, H., Mason, S.P., Barabási, A.L., Oltvai, Z.N.: Lethality and centrality in protein networks. Nature 411(6833), 41 (2001)CrossRef

Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., Barabási, A.L.: The large-scale organization of metabolic networks. Nature 407(6804), 651 (2000)CrossRef

10.

Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, pp. 229–238. Washington, DC, USA (2009)

11.

Khayyat, Z., Awara, K., Alonazi, A., Jamjoom, H., Williams, D., Kalnis, P.: Mizan: a system for dynamic load balancing in large-scale graph processing. In: Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys ’13, pp. 169–182. ACM, New York, NY, USA (2013). https://doi.org/10.1145/2465351.2465369

12.

Kumar, P., Huang, H.H.: G-store: high-performance graph store for trillion-edge processing. In: SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 830–841. IEEE (2016)

13.

Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600. ACM (2010)

14.

Kyrola, A., Blelloch, G., Guestrin, C.: Graphchi: large-scale graph computation on just a \(\{\)PC\(\}\). In: Presented as part of the 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 12), pp. 31–46 (2012)

15.

Lee, E.K., Katz, R.H.: An analytic performance model of disk arrays. In: ACM SIGMETRICS Performance Evaluation Review, vol. 21, pp. 98–109. ACM (1993)

16.

Liu, H., Huang, H.H.: Graphene: fine-grained IO management for graph computing. In: 15th USENIX Conference on File and Storage Technologies (FAST 17), pp. 285–300. USENIX Association, Santa Clara, CA (2017). https://www.usenix.org/conference/fast17/technical-sessions/presentation/liu

17.

Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 135–146. ACM (2010)

18.

Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pp. 456–471. ACM, New York, NY, USA (2013). https://doi.org/10.1145/2517349.2522739

19.

Randles, M., Lamb, D., Taleb-Bendiab, A.: A comparative study into distributed load balancing algorithms for cloud computing. In: 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops, pp. 551–556. IEEE (2010)

20.

Roy, A., Bindschaedler, L., Malicevic, J., Zwaenepoel, W.: Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 410–424. ACM (2015)

21.

Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472–488. ACM (2013)

22.

Shiloach, Y., Vishkin, U.: An o(log n) parallel connectivity algorithm. J. Algorithms 3, 57–67 (1982)MathSciNetCrossRef

23.

Shun, J., Blelloch, G.E.: Ligra: a lightweight graph processing framework for shared memory. In: ACM Sigplan Notices, vol. 48, pp. 135–146. ACM (2013)

24.

Vora, K., Xu, G., Gupta, R.: Load the edges you need: a generic i/o optimization for disk-based graph processing. In: 2016 \(\{\)USENIX\(\}\) Annual Technical Conference (\(\{\)USENIX\(\}\) \(\{\)ATC\(\}\) 16), pp. 507–522 (2016)

25.

Wang, P., Zhang, K., Chen, R., Chen, H., Guan, H.: Replication-based fault-tolerance for large-scale graph processing. In: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 562–573. IEEE (2014)

26.

Wang, Z., Gu, Y., Bao, Y., Yu, G., Yu, J.X.: Hybrid pulling/pushing for i/o-efficient distributed and iterative graph computing. In: Proceedings of the 2016 International Conference on Management of Data, pp. 479–494. ACM (2016)

27.

Zhao, Y., Yoshigoe, K., Xie, M., Zhou, S., Seker, R., Bian, J.: Lightgraph: lighten communication in distributed graph-parallel processing. In: 2014 IEEE International Congress on Big Data, pp. 717–724. IEEE (2014)

28.

Zheng, D., Burns, R., Szalay, A.S.: Toward millions of file system iops on low-cost, commodity hardware. In: SC’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12. IEEE (2013)

29.

Zheng, D., Mhembere, D., Burns, R., Vogelstein, J., Priebe, C.E., Szalay, A.S.: Flashgraph: processing billion-node graphs on an array of commodity ssds. In: 13th \(\{\)USENIX\(\}\) Conference on File and Storage Technologies (\(\{\)FAST\(\}\) 15), pp. 45–58 (2015)

30.

Zhu, X., Han, W., Chen, W.: Gridgraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: 2015 \(\{\)USENIX\(\}\) Annual Technical Conference (\(\{\)USENIX\(\}\) \(\{\)ATC\(\}\) 15), pp. 375–386 (2015)

Title: CSMqGraph: Coarse-Grained and Multi-external-storage Multi-queue I/O Management for Graph Computing
Authors: Shuo Chen
Zhan Shi
Dan Feng
Shang Liu
Fang Wang
Lei Yang
Ruili Yu
Publication date: 15-11-2019
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 1/2020
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-019-00651-0

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 1/2020

Empirical Mode Decomposition and Temporal Convolutional Networks for Remaining Useful Life Estimation

Planning Above the API Clouds Before Flying Above the Clouds: A Real-Time Personalized Air Travel Planning Approach

Characterizing Scalability of Sparse Matrix–Vector Multiplications on Phytium FT-2000+

Convoider: A Concurrency Bug Avoider Based on Transparent Software Transactional Memory

Message Passing Optimization in Robot Operating System

Source-to-Source Parallelization Compilers for Scientific Shared-Memory Multi-core and Accelerated Multiprocessing: Analysis, Pitfalls, Enhancement and Potential

Premium Partner