DOI: 10.1145/2637166.2637236
research-article

Bipartite-oriented distributed graph partitioning for big learning

Published: 25 June 2014

ABSTRACT

Many machine learning and data mining (MLDM) problems, such as recommendation, topic modeling, and medical diagnosis, can be modeled as computations on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics of such graphs, and existing online graph partitioning algorithms usually cause excessive vertex replication as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, the differing computation loads, and the imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to speed up four typical MLDM algorithms by 1.38X to 17.75X, by reducing vertex replication by up to 62% and network traffic by up to 96%.
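To make the notion of vertex replication concrete, the following is a minimal sketch and not the BiGraph algorithms themselves, which the abstract does not spell out. It measures the replication factor (the average number of partitions a vertex appears in) of a vertex-cut on a synthetic bipartite graph under two placement rules: random edge placement, and a hypothetical heuristic that places each edge by its endpoint in one chosen subset. The graph sizes, the helper names, and the heuristic are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_bipartite_edges(num_u, num_v, degree, seed=0):
    """Synthetic bipartite graph: each U vertex links to `degree` distinct V vertices."""
    rng = random.Random(seed)
    return [(u, v) for u in range(num_u) for v in rng.sample(range(num_v), degree)]


def replication_factor(edges, assign, num_parts):
    """Average number of partitions each vertex spans under a vertex-cut.

    `assign(u, v)` returns an integer; the edge goes to that value modulo num_parts.
    """
    spans = defaultdict(set)
    for u, v in edges:
        part = assign(u, v) % num_parts
        spans[("U", u)].add(part)  # tag the side so U and V ids never collide
        spans[("V", v)].add(part)
    return sum(len(parts) for parts in spans.values()) / len(spans)


def random_edge_assign(seed=0):
    """Baseline: place every edge on a uniformly random partition."""
    rng = random.Random(seed)
    return lambda u, v: rng.randrange(1 << 30)


def favor_u_assign(u, v):
    """Hypothetical heuristic: co-locate all edges of a U vertex, so U is never replicated."""
    return u


# Assumed shape: a large subset U (e.g. users) and a small subset V (e.g. items).
edges = make_bipartite_edges(num_u=1000, num_v=50, degree=10)
rf_random = replication_factor(edges, random_edge_assign(), num_parts=8)
rf_favor_u = replication_factor(edges, favor_u_assign, num_parts=8)
print(f"random edge placement: {rf_random:.2f}")
print(f"favoring subset U:     {rf_favor_u:.2f}")
```

In this sketch, favoring the larger subset keeps every U vertex on a single partition and leaves only the small subset V replicated, which is the kind of asymmetry between the two vertex subsets that the abstract points to.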


              Published in

              APSys '14: Proceedings of 5th Asia-Pacific Workshop on Systems
              June 2014
              98 pages
              ISBN: 9781450330244
              DOI: 10.1145/2637166

              Copyright © 2014 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Acceptance Rates

              APSys '14 Paper Acceptance Rate: 14 of 35 submissions, 40%. Overall Acceptance Rate: 149 of 386 submissions, 39%.
