Abstract
Interpersonal ties are responsible for the structure of social networks and the transmission of information through these networks. Different types of social ties have essentially different influences on people. Awareness of the types of social ties can benefit many applications, such as recommendation and community detection. For example, our close friends tend to move in the same circles that we do, while our classmates may be distributed into different communities. Though a bulk of research has focused on inferring particular types of relationships in a specific social network, few publications systematically study the generalization of the problem of predicting social ties across multiple heterogeneous networks.
In this work, we develop a framework referred to as TranFG for classifying the type of social relationships by learning across heterogeneous networks. The framework incorporates social theories into a factor graph model, which effectively improves the accuracy of predicting the types of social relationships in a target network by borrowing knowledge from a different source network. We also present several active learning strategies to further enhance the inferring performance. To scale up the model to handle really large networks, we design a distributed learning algorithm for the proposed model.
We evaluate the proposed framework (TranFG) on six different networks and compare with several existing methods. TranFG clearly outperforms the existing methods on multiple metrics. For example, by leveraging information from a coauthor network with labeled advisor-advisee relationships, TranFG is able to obtain an F1-score of 90% (8%--28% improvements over alternative methods) for predicting manager-subordinate relationships in an enterprise email network. The proposed model is efficient. It takes only a few minutes to train the proposed transfer model on large networks containing tens of thousands of nodes.
- Lada A. Adamic and Eytan Adar. 2001. Friends and neighbors on the web. Social Networks 25 (2001), 211--230.Google ScholarCross Ref
- Rie Kubota Ando and Tong Zhang. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research 6 (2005), 1817--1853. Google ScholarDigital Library
- Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. 2006. Multi-task feature learning. In Proceedings of the 18th Neural Information Processing Systems (NIPS’06). 41--48.Google Scholar
- Andreas Argyriou, Andreas Maurer, and Massimiliano Pontil. 2008. An algorithm for transfer learning in a heterogeneous environment. In Proceedings of 2008 European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD’08). 71--85.Google ScholarCross Ref
- Lars Backstrom and Jure Leskovec. 2011. Supervised random walks: Predicting and recommending links in social networks. In Proceedings of the 4th ACM International Conference on Web Search and Web Data Mining (WSDM’11). 635--644. Google ScholarDigital Library
- Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.Google Scholar
- John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP’06). 120--128. Google ScholarDigital Library
- Ronald S. Burt. 1992. Structural Holes: The Social Structure of Competition. Harvard University Press.Google ScholarCross Ref
- Bin Cao, Nathan Nan Liu, and Qiang Yang. 2010. Transfer learning for collective link prediction in multiple heterogenous domains. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 159--166.Google ScholarDigital Library
- Wei Chen, Yajun Wang, and Siyu Yang. 2009. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). 199--207. Google ScholarDigital Library
- Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning 20 (1995), 273--297. Google ScholarDigital Library
- David J. Crandall, Lars Backstrom, Dan Cosley, Daniel Huttenlocher Siddharth Suri, and Jon Kleinberg. 2010. Inferring social ties from geographic coincidences. Proceedings of the National Academy of Science 107 (Dec. 2010), 22436--22441.Google Scholar
- Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. 2008. Translated learning: Transfer learning across different feature spaces. In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS’08). 353--360.Google Scholar
- Wenyuan Dai, Guirong Xue, Qiang Yang, and Yong Yu. 2007a. Co-clustering based classification for out-of-domain documents. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’07). 210--219. Google ScholarDigital Library
- Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. 2007b. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning (ICML’07). 193--200. Google ScholarDigital Library
- James A. Davis and Samuel Leinhardt. 1972. The structure of positive interpersonal relations in small groups. In Sociological Theories in Progress, J. Berger (Ed.). Vol. 2. Houghton Mifflin, 218--251.Google Scholar
- Christopher P. Diehl, Galileo Namata, and Lise Getoor. 2007. Relationship identification for social network discovery. In Proceedings of the 24th National Conference on Artificial Intelligence and 9th Conference on Innovative Applications of Artificial Intelligence (AAAI’07). 546--552. Google ScholarDigital Library
- Yuxiao Dong, Jie Tang, Nitesh V. Chawla, Tiancheng Lou, Yang Yang, and Bai Wang. 2015. Inferring social status and rich club effects in enterprise communication networks. PLOS One 10, 3 (2015), e0119446.Google ScholarCross Ref
- Yuxiao Dong, Jie Tang, Sen Wu, Jilei Tian, Nitesh V. Chawla, Jinghai Rao, and Huanhuan Cao. 2012. Link prediction and recommendation across heterogeneous social networks. In Proceedings of the 12th International Conference on Data Mining (ICDM’12). 181--190. Google ScholarDigital Library
- Yuxiao Dong, Jing Zhang, Jie Tang, Nitesh V. Chawla, and Bai Wang. 2015. CoupledLP: Link prediction in coupled networks. In Proceedings of the 21nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’15). 199--208. Google ScholarDigital Library
- Nathan Eagle, Alex (Sandy) Pentland, and David Lazer. 2009. Inferring social network structure using mobile phone data. Proceedings of the National Academy of Science 106, 36 (2009), 15274--15278.Google ScholarCross Ref
- David Easley and Jon Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. Google ScholarDigital Library
- Jing Gao, Wei Fan, Jing Jian, and Jiawei Han. 2008. Knowledge transfer via multiple model local structure mapping. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08). 283--291. Google ScholarDigital Library
- Lise Getoor and Ben Taskar. 2007. Introduction to Statistical Relational Learning. MIT Press. Google ScholarDigital Library
- Malcolm Gladwell. 2001. The Tipping Point - How Little Things Make a Big Difference (New York, NY: Little Brown, 2000); G. Khermouch and J. Green, Buzz Marketing: Suddenly This Stealth Strategy Is Hot -- but It's Still Fraught with Risk. Business Week (2001), 50.Google Scholar
- Bruno Goncalves, Nicola Perra, and Alessandro Vespignani. 2011. Modeling users’ activity on twitter networks: Validation of dunbar’s number. PLoS ONE 6 (08 2011), e22656.Google Scholar
- Mark Granovetter. 1973. The strength of weak ties. American Journal of Sociology 78, 6 (1973), 1360--1380.Google ScholarCross Ref
- R. Guha, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins. 2004. Propagation of trust and distrust. In Proceedings of the 13th International Conference on World Wide Web (WWW’04). 403--412. Google ScholarDigital Library
- J. M. Hammersley and P. Clifford. 1971. Markov field on finite graphs and lattices. Unpublished manuscript (1971).Google Scholar
- John Hopcroft, Tiancheng Lou, and Jie Tang. 2011. Who will follow you back? Reciprocal relationship prediction. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM’11). 1137--1146. Google ScholarDigital Library
- Tony Jebara. 2004. Multi-task feature and kernel selection for svms. In Proceedings of the 21th International Conference on Machine Learning (ICML’04). 55--62. Google ScholarDigital Library
- George Karypis and Vipin Kumar. 1998. MeTis: Unstrctured Graph Partitioning and Sparse Matrix Ordering System, Version 4.0. http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/manual.pdf.Google Scholar
- Elihu Katz. 1957. The two-step flow of communication: An up-to-date report on an hypothesis. Public Opinion Quarterly 21, 1 (1957), 61--78.Google ScholarCross Ref
- Elihu Katz and Paul Felix Lazarsfeld. 1955. Personal Influence. The Free Press, New York, USA.Google Scholar
- Leo Katz. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 1 (1953), 39--43.Google ScholarCross Ref
- David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03). 137--146. Google ScholarDigital Library
- Nikos Komodakis, Nikos Paragios, and Georgios Tziritas. 2011. MRF energy minimization and beyond via dual decomposition. IEEE Transactions on Pattern Analalysis and Machine Intelligence 33, 3 (2011), 531--552. Google ScholarDigital Library
- David Krackhardt. 1992. The strength of strong ties: The importance of philos in organizations. Networks and Organizations: Structure, Form, and Action 216 (1992), 239.Google Scholar
- Frank R. Kschischang, Brendan J. Frey, and Hans andrea Loeliger. 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47 (2001), 498--519. Google ScholarDigital Library
- John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML’01). 282--289. Google ScholarDigital Library
- Steffen L. Lauritzen. 1996. Graphical Models. Oxford University Press, Oxford.Google Scholar
- Paul Felix Lazarsfeld, Bernard Berelson, and Hazel Gaudet. 1944. The People’s Choice: How the Voter Makes up his Mind in a Presidential Campaign. Columbia University Press, New York, NY, USA.Google Scholar
- Su-In Lee, Vassil Chatalbashev, David Vickrey, and Daphne Koller. 2007. Learning a meta-level prior for feature relevance from multiple related tasks. In Proceedings of the 24th International Conference on Machine Learning (ICML’07). 489--496. Google ScholarDigital Library
- Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010a. Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). 641--650. Google ScholarDigital Library
- Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010b. Signed networks in social media. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI’10). 1361--1370. Google ScholarDigital Library
- Xuejun Liao, Ya Xue, and Lawrence Carin. 2005. Logistic regression with an auxiliary data source. In Proceedings of the 22th International Conference on Machine Learning (ICML’05). 505--512. Google ScholarDigital Library
- David Liben-Nowell and Jon M. Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58, 7 (2007), 1019--1031. Google ScholarDigital Library
- Ryan Lichtenwalter, Jake T. Lussier, and Nitesh V. Chawla. 2010. New perspectives and methods in link prediction. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10). 243--252. Google ScholarDigital Library
- Xiao Ling, Guirong Xue, Wenyuan Dai, Yun Jiang, Qiang Yang, and Yong Yu. 2008. Can Chinese web pages be classified with english data source? In Proceeding of the 17th International Conference on World Wide Web (WWW’08). 969--978. Google ScholarDigital Library
- Tiancheng Lou and Jie Tang. 2013. Mining structural hole spanners through information diffusion in social networks. In Proceedings of the 22th International Conference on World Wide Web (WWW’13). 837--848. Google ScholarDigital Library
- Tiancheng Lou, Jie Tang, John Hopcroft, Zhanpeng Fang, and Xiaowen Ding. 2013. Learning to predict reciprocity and triadic closure in social networks. ACM Transactions on Knowledge Discovery from Data 7, 2 (2013), Article No. 5. Google ScholarDigital Library
- Julian McAuley and Jure Leskovec. 2014. Discovering social circles in ego networks. ACM Transactions on Knowledge Discovery from Data (TKDD) 8, 1 (2014), 4. Google ScholarDigital Library
- Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. 1999. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI’99). 467--475. Google ScholarDigital Library
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions. Mathematical Programming 14, 1 (1978), 265--294.Google ScholarDigital Library
- M. E. J. Newman. 2001. Clustering and preferential attachment in growing networks. Physical Reviews E 64, 2 (2001), 025102.Google ScholarCross Ref
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report SIDL-WP-1999-0120. Stanford University.Google Scholar
- Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (Oct. 2010), 1345 --1359. Google ScholarDigital Library
- Maayan Roth, Assaf Ben-David, David Deutscher, Guy Flysher, Ilan Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, and Ron Merom. 2010. Suggesting friends using the implicit social graph. In Proceedings of the 16th International Conference on Knowledge Discovery and Data Mining (KDD’10). 233--242. Google ScholarDigital Library
- Burr Settles and Mark Craven. 2008. An analysis of active learning strategies for sequence labeling tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’08). 1070--1079. Google ScholarDigital Library
- Xiaoxiao Shi, Wei Fan, and Jiangtao Ren. 2008. Actively transfer domain knowledge. In Proceedings of 2008 European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD’08). 342--357.Google ScholarCross Ref
- Xiaodan Song, Yun Chi, Koji Hino, and Belle L. Tseng. 2007. Identifying opinion leaders in the blogosphere. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM’06). 971--974. Google ScholarDigital Library
- Chenhao Tan, Lillian Lee, Jie Tang, Long Jiang, Ming Zhou, and Ping Li. 2011. User-level sentiment analysis incorporating social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’11). 1397--1405. Google ScholarDigital Library
- Chenhao Tan, Jie Tang, Jimeng Sun, Quan Lin, and Fengjiao Wang. 2010. Social action tracking via noise tolerant time-varying factor graphs. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10). 1049--1058. Google ScholarDigital Library
- Jie Tang, Tiancheng Lou, and Jon Kleinberg. 2012. Inferring social ties across heterogeneous networks. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM’12). 743--752. Google ScholarDigital Library
- Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. 2009. Social influence analysis in large-scale networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). 807--816. Google ScholarDigital Library
- Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08). 990--998. Google ScholarDigital Library
- Lei Tang and Huan Liu. 2011. Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23, 3 (2011), 447--478. Google ScholarDigital Library
- Wenbin Tang, Honglei Zhuang, and Jie Tang. 2011. Learning to Infer Social Ties in Large Networks. In Proceedings of 2011 European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD’11). 381--397. Google ScholarDigital Library
- Benjamin Taskar, Ming Fai Wong, Pieter Abbeel, and Daphne Koller. 2003. Link prediction in relational data. In Proceedings of the 15th Neural Information Processing Systems (NIPS’03).Google Scholar
- Martin J. Wainwright and Michael I. Jordan. 2008. Graphical models, exponential families, and variational inference. Foundational Trends in Machine Learning 1, 1--2 (Jan. 2008), 1--305. Google ScholarDigital Library
- Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo. 2010. Mining advisor-advisee relationships from research publication networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10). 203--212. Google ScholarDigital Library
- Liaoruo Wang, Tiancheng Lou, Jie Tang, and John Hopcroft. 2011. Detecting community kernels in large social networks. In Proceedings of 2011 IEEE International Conference on Data Mining (ICDM’11). 784--793. Google ScholarDigital Library
- Wim Wiegerinck. 2000. Variational approximations between mean field theory and the junction tree algorithm. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI’00). 626--633. Google ScholarDigital Library
- Shaomei Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. 2011. Who says what to whom on twitter. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). 705--714. Google ScholarDigital Library
- Rongjing Xiang, Jennifer Neville, and Monica Rogati. 2010. Modeling relationship strength in online social networks. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). 981--990. Google ScholarDigital Library
- Eric P. Xing, Michael I. Jordan, and Stuart Russell. 2003. A generalized mean field algorithm for variational inference in exponential families. In Proceedings of 19th Conference on Uncertainty in Artificial Intelligence (UAI’03). 583--591. Google ScholarDigital Library
- Yang Yang, Yizhou Sun, Jie Tang, Bo Ma, and Juanzi Li. 2015. Entity matching across heterogeneous sources. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’15). 1395--1404. Google ScholarDigital Library
- Zi Yang, Jingyi Guo, Keke Cai, Jie Tang, Juanzi Li, Li Zhang, and Zhong Su. 2010. Understanding retweeting behaviors in social networks. In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM’10). 1633--1636. Google ScholarDigital Library
- Jiawei Zhang, Xiangnan Kong, and Philip S. Yu. 2013. Predicting social links for new users across aligned heterogeneous social networks. In ICDM’13. 1289--1294.Google Scholar
- Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip Yu. 2015. COSNET: Connecting heterogeneous social networks with local and global consistency. In KDD’15. 1485--1494. Google ScholarDigital Library
- Honglei Zhuang, Jie Tang, Wenbin Tang, Tiancheng Lou, Alvin Chin, and Xia Wang. 2012. Actively learning to infer social ties. Data Mining and Knowledge Discovery 25, 2 (2012), 270--297. Google ScholarDigital Library
Index Terms
- Transfer Learning to Infer Social Ties across Heterogeneous Networks
Recommendations
Inferring social ties across heterogenous networks
WSDM '12: Proceedings of the fifth ACM international conference on Web search and data miningIt is well known that different types of social ties have essentially different influence on people. However, users in online social networks rarely categorize their contacts into "family", "colleagues", or "classmates". While a bulk of research has ...
Learning to predict reciprocity and triadic closure in social networks
We study how links are formed in social networks. In particular, we focus on investigating how a reciprocal (two-way) link, the basic relationship in social networks, is developed from a parasocial (one-way) relationship and how the relationships ...
Actively learning to infer social ties
We study the extent to which social ties between people can be inferred in large social network, in particular via active user interactions. In most online social networks, relationships are lack of meaning labels (e.g., "colleague" and "intimate ...
Comments