Skip to main content
Top
Published in:

01-12-2016 | Original Article

Parallel collective factorization for modeling large heterogeneous networks

Authors: Ryan A. Rossi, Rong Zhou

Published in: Social Network Analysis and Mining | Issue 1/2016

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Relational learning methods for heterogeneous network data are becoming increasingly important for many real-world applications. However, existing relational learning approaches are sequential, inefficient, unable to scale to large heterogeneous networks, as well as many other limitations related to convergence, parameter tuning, etc. In this paper, we propose Parallel Collective Matrix Factorization (PCMF) that serves as a fast and flexible framework for joint modeling of a variety of heterogeneous network data. The PCMF learning algorithm solves for a single parameter given the others, leading to a parallel scheme that is fast, flexible, and general for a variety of relational learning tasks and heterogeneous data types. The proposed approach is carefully designed to be (1) efficient for large heterogeneous networks (linear in the total number of observations from the set of input matrices), (2) flexible as many components are interchangeable and easily adaptable, and (3) effective for a variety of applications as well as for different types of data. The experiments demonstrate the scalability, flexibility, and effectiveness of PCMF for a variety of relational modeling tasks. In particular, PCMF outperforms a recent state-of-the-art approach in runtime, scalability, and prediction quality. Finally, we also investigate variants of PCMF for serving predictions in a real-time streaming fashion.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
Note that undirected homogeneous networks (symmetric matrices) are a special case of our framework.
 
2
A worker is a thread in shared memory setting and machine in distributed memory architecture.
 
4
The likelihood expression assumes noise in the data is Gaussian.
 
5
Edges were also sampled inversely proportional to the degree of each neighborhood node.
 
7
A recently proposed parallel coordinate descent method for recommendation.
 
8
Speed may be fundamentally more important than accuracy.
 
9
Undirected social networks give rise to variants based on in/out/total degree.
 
10
Note that these are known actual relationships in the social network, but are not used for learning.
 
Literature
go back to reference Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on Very large data bases, vol 29. VLDB Endowment, pp 81–92 Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on Very large data bases, vol 29. VLDB Endowment, pp 81–92
go back to reference Ahmed NK, Neville J, Kompella R (2013) Network sampling: from static to streaming graphs. TKDD, pp 1–54 Ahmed NK, Neville J, Kompella R (2013) Network sampling: from static to streaming graphs. TKDD, pp 1–54
go back to reference Ahmed NK, Rossi RA (2015) Interactive visual graph analytics on the web. In: International AAAI conference on web and social media (ICWSM), pp 566–569 Ahmed NK, Rossi RA (2015) Interactive visual graph analytics on the web. In: International AAAI conference on web and social media (ICWSM), pp 566–569
go back to reference Bilgic M, Mihalkova L, Getoor L (2010) Active learning for networked data. In: ICML, pp 79–86 Bilgic M, Mihalkova L, Getoor L (2010) Active learning for networked data. In: ICML, pp 79–86
go back to reference Bonhard P, Sasse M (2006) Knowing me, knowing you using profiles and social networking to improve recommender systems. BT Technol J 24(3):84–98CrossRef Bonhard P, Sasse M (2006) Knowing me, knowing you using profiles and social networking to improve recommender systems. BT Technol J 24(3):84–98CrossRef
go back to reference Borgatti SP, Everett MG, Johnson JC (2013) Analyzing social networks. SAGE Publications Limited, California Borgatti SP, Everett MG, Johnson JC (2013) Analyzing social networks. SAGE Publications Limited, California
go back to reference Fairbanks J, Ediger D, McColl R, Bader DA, Gilbert E (2013) A statistical framework for streaming graph analysis. In: ASONAM, pp 341–347 Fairbanks J, Ediger D, McColl R, Bader DA, Gilbert E (2013) A statistical framework for streaming graph analysis. In: ASONAM, pp 341–347
go back to reference Gemulla R, Nijkamp E, Haas PJ, Sismanis Y (2011) Large-scale matrix factorization with distributed stochastic gradient descent. In: SIGKDD, pp 69–77 Gemulla R, Nijkamp E, Haas PJ, Sismanis Y (2011) Large-scale matrix factorization with distributed stochastic gradient descent. In: SIGKDD, pp 69–77
go back to reference Jamali M, Ester M (2010) A matrix factorization technique with trust propagation for recommendation in social networks. In: RecSys, pp 135–142 Jamali M, Ester M (2010) A matrix factorization technique with trust propagation for recommendation in social networks. In: RecSys, pp 135–142
go back to reference Jiang D, Pei J, Li H (2013) Mining search and browse logs for web search: a survey. TIST 4(4):57CrossRef Jiang D, Pei J, Li H (2013) Mining search and browse logs for web search: a survey. TIST 4(4):57CrossRef
go back to reference Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37CrossRef Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37CrossRef
go back to reference La Fond T, Neville J (2010) Randomization tests for distinguishing social influence and homophily effects. In: WWW, pp 601–610 La Fond T, Neville J (2010) Randomization tests for distinguishing social influence and homophily effects. In: WWW, pp 601–610
go back to reference Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. JASIST 58(7):1019–1031CrossRef Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. JASIST 58(7):1019–1031CrossRef
go back to reference Liu W, He J, Chang S-F (2010) Large graph construction for scalable semi-supervised learning. In: Proceedings of the 27th international conference on machine learning, pp 679–686 Liu W, He J, Chang S-F (2010) Large graph construction for scalable semi-supervised learning. In: Proceedings of the 27th international conference on machine learning, pp 679–686
go back to reference Lusk EL, Pieper SC, Butler RM et al (2010) More scalability, less pain: a simple programming model and its implementation for extreme computing. SciDAC Rev 17(1):30–37 Lusk EL, Pieper SC, Butler RM et al (2010) More scalability, less pain: a simple programming model and its implementation for extreme computing. SciDAC Rev 17(1):30–37
go back to reference Ma H, Yang H, Lyu MR, King I (2008) Sorec: social recommendation using probabilistic matrix factorization. In: CIKM, pp 931–940 Ma H, Yang H, Lyu MR, King I (2008) Sorec: social recommendation using probabilistic matrix factorization. In: CIKM, pp 931–940
go back to reference Massa P, Avesani P (2007) Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on recommender systems. ACM, pp 17–24 Massa P, Avesani P (2007) Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on recommender systems. ACM, pp 17–24
go back to reference McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Ann Rev Sociol 27:415–444CrossRef McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Ann Rev Sociol 27:415–444CrossRef
go back to reference Mislove A, Marcon M, Gummadi K, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: SIGCOMM, pp 29–42 Mislove A, Marcon M, Gummadi K, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: SIGCOMM, pp 29–42
go back to reference Niu F, Recht B, Ré C, Wright SJ (2011) Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. NIPS 24:693–701 Niu F, Recht B, Ré C, Wright SJ (2011) Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. NIPS 24:693–701
go back to reference Recht B, Ré C (2013) Parallel stochastic gradient algorithms for large-scale matrix completion. Math Program Comput 5(2):201–226MathSciNetCrossRefMATH Recht B, Ré C (2013) Parallel stochastic gradient algorithms for large-scale matrix completion. Math Program Comput 5(2):201–226MathSciNetCrossRefMATH
go back to reference Rossi RA, Ahmed NK (2014) Role discovery in networks. TKDE 26(7):1–20 Rossi RA, Ahmed NK (2014) Role discovery in networks. TKDE 26(7):1–20
go back to reference Rossi RA, Ahmed NK (2016) An interactive data repository with visual analytics. SIGKDD Explor 17(2):37–41CrossRef Rossi RA, Ahmed NK (2016) An interactive data repository with visual analytics. SIGKDD Explor 17(2):37–41CrossRef
go back to reference Rossi RA, McDowell LK, Aha DW, Neville J (2012) Transforming graph data for statistical relational learning. JAIR 45(1):363–441MATH Rossi RA, McDowell LK, Aha DW, Neville J (2012) Transforming graph data for statistical relational learning. JAIR 45(1):363–441MATH
go back to reference Salakhutdinov R, Mnih A (2007) Probabilistic matrix factorization. In NIPS, vol 1, pp 1–2 Salakhutdinov R, Mnih A (2007) Probabilistic matrix factorization. In NIPS, vol 1, pp 1–2
go back to reference Satuluri V, Parthasarathy S, Ruan Y (2011) Local graph sparsification for scalable clustering. In: Proceedings of the 2011 international conference on Management of data. ACM, pp 721–732 Satuluri V, Parthasarathy S, Ruan Y (2011) Local graph sparsification for scalable clustering. In: Proceedings of the 2011 international conference on Management of data. ACM, pp 721–732
go back to reference Singla P, Richardson M (2008) Yes, there is a correlation: from social networks to personal behavior on the web. In: WWW, pp 655–664 Singla P, Richardson M (2008) Yes, there is a correlation: from social networks to personal behavior on the web. In: WWW, pp 655–664
go back to reference Spielman DA, Teng S-H (2004) Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: Proceedings of the thirty-sixth annual ACM symposium on theory of computing. ACM, pp 81–90 Spielman DA, Teng S-H (2004) Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: Proceedings of the thirty-sixth annual ACM symposium on theory of computing. ACM, pp 81–90
go back to reference Sun Y, Han J (2012) Mining heterogeneous information networks: principles and methodologies. Synth Lect Data Min Knowl Discov 3(2):1–159CrossRef Sun Y, Han J (2012) Mining heterogeneous information networks: principles and methodologies. Synth Lect Data Min Knowl Discov 3(2):1–159CrossRef
go back to reference Tang J, Hu X, Liu H (2013) Social recommendation: a review. SNAM 3(4):1113–1133 Tang J, Hu X, Liu H (2013) Social recommendation: a review. SNAM 3(4):1113–1133
go back to reference Tsai M-H, Aggarwal C, Huang T (2014) Ranking in heterogeneous social media. In: WSDM, pp 613–622 Tsai M-H, Aggarwal C, Huang T (2014) Ranking in heterogeneous social media. In: WSDM, pp 613–622
go back to reference Vorontsov M, Carhart G, Ricklin J (1997) Adaptive phase-distortion correction based on parallel gradient-descent optimization. Opt Lett 22(12):907–909CrossRef Vorontsov M, Carhart G, Ricklin J (1997) Adaptive phase-distortion correction based on parallel gradient-descent optimization. Opt Lett 22(12):907–909CrossRef
go back to reference Yang X, Guo Y, Liu Y, Steck H (2013) A survey of collaborative filtering based social recommender systems. Comput Commun 41:1–10CrossRef Yang X, Guo Y, Liu Y, Steck H (2013) A survey of collaborative filtering based social recommender systems. Comput Commun 41:1–10CrossRef
go back to reference Yang S-H, Long B, Smola A, Sadagopan N, Zheng Z, and Zha H (2011) Like like alike: joint friendship and interest propagation in social networks. In: WWW, pp 537–546 Yang S-H, Long B, Smola A, Sadagopan N, Zheng Z, and Zha H (2011) Like like alike: joint friendship and interest propagation in social networks. In: WWW, pp 537–546
go back to reference Yasui Y, Fujisawa K, Goto K (2013) NUMA-optimized parallel breadth-first search on multicore single-node system. In: Big data, pp 394–402 Yasui Y, Fujisawa K, Goto K (2013) NUMA-optimized parallel breadth-first search on multicore single-node system. In: Big data, pp 394–402
go back to reference Yu H-F, Hsieh C-J, Si S, Dhillon IS (2012) Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In: ICDM, pp 765–774 Yu H-F, Hsieh C-J, Si S, Dhillon IS (2012) Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In: ICDM, pp 765–774
go back to reference Zhou Y, Wilkinson D, Schreiber R, Pan R (2008) Large-scale parallel collaborative filtering for the netflix prize. In: Algorithmic aspects in information and management. Springer, pp 337–348 Zhou Y, Wilkinson D, Schreiber R, Pan R (2008) Large-scale parallel collaborative filtering for the netflix prize. In: Algorithmic aspects in information and management. Springer, pp 337–348
go back to reference Zinkevich M, Weimer M, Smola AJ, Li L (2010) Parallelized stochastic gradient descent. In: NIPS, vol 4, p 4 Zinkevich M, Weimer M, Smola AJ, Li L (2010) Parallelized stochastic gradient descent. In: NIPS, vol 4, p 4
Metadata
Title
Parallel collective factorization for modeling large heterogeneous networks
Authors
Ryan A. Rossi
Rong Zhou
Publication date
01-12-2016
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2016
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-016-0349-6

Premium Partner