Abstract
As location-based services using mobile devices have become globally popular these days, social network analysis (especially, community detection) increasingly benefits from combining social relationships with geographic preferences. In this regard, this article addresses the emerging problem of geosocial community detection. We first formalize the problem of geosocial co-clustering, which co-clusters the users in social networks and the locations they visited. Geosocial co-clustering detects higher-quality communities than existing approaches by improving the mapping clusterability, whereby users in the same community tend to visit locations in the same region. While geosocial co-clustering is soundly formalized as non-negative matrix tri-factorization, conventional matrix tri-factorization algorithms suffer from a significant computational overhead when handling large-scale datasets. Thus, we also develop an efficient framework for geosocial co-clustering, called GEOsocial COarsening and DEcomposition (GEOCODE). To achieve efficient matrix tri-factorization, GEOCODE reduces the numbers of users and locations through coarsening and then decomposes the single whole matrix tri-factorization into a set of multiple smaller sub-matrix tri-factorizations. Thorough experiments conducted using real-world geosocial networks show that GEOCODE reduces the elapsed time by 19–69 times while achieving the accuracy of up to 94.8% compared with the state-of-the-art co-clustering algorithm. Furthermore, the benefit of the mapping clusterability is clearly demonstrated through a local expert recommendation application.
- Lada A. Adamic, Jun Zhang, Eytan Bakshy, and Mark S. Ackerman. 2008. Knowledge sharing and Yahoo answers: Everyone knows something. In Proceedings of the 17th International Conference on World Wide Web. 665--674.Google Scholar
- Waseem Ahmad and Ashfaq Khokhar. 2007. cHawk: An efficient biclustering algorithm based on bipartite graph crossing minimization. In Proceedings of the VLDB Workshop on Data Mining in Bioinformatics. 1553--1558.Google Scholar
- Nikos Armenatzoglou, Stavros Papadopoulos, and Dimitris Papadias. 2013. A general framework for geo-social query processing. Proc. VLDB Endow. 6, 10 (2013), 913--924.Google ScholarDigital Library
- Lars Backstrom, Eric Sun, and Cameron Marlow. 2010. Find me if you can: Improving geographical prediction with social and spatial proximity. In Proceedings of the 19th International Conference on World Wide Web. 61--70.Google ScholarDigital Library
- Jie Bao, Yu Zheng, and Mohamed F. Mokbel. 2012. Location-based and preference-aware recommendation using sparse geo-social networking data. In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 199--208.Google Scholar
- Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 33, 8 (2011), 1548--1560.Google ScholarDigital Library
- Fazli Can and Esen A. Ozkarahan. 1990. Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases. ACM Trans. Database Syst. 15, 4 (1990), 483--517.Google ScholarDigital Library
- Carlos Castro-Herrera, Chuan Duan, Jane Cleland-Huang, and Bamshad Mobasher. 2009. A recommender system for requirements elicitation in large-scale software projects. In Proceedings of the 2009 ACM Symposium on Applied Computing. 1419--1426.Google ScholarDigital Library
- Zhiyuan Cheng, James Caverlee, Himanshu Barthwal, and Vandana Bachani. 2014. Who is the Barbecue King of Texas?: A geo-spatial approach to finding local experts on Twitter. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. 335--344.Google ScholarDigital Library
- Eunjoon Cho, Seth A. Myers, and Jure Leskovec. 2011. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1082--1090.Google ScholarDigital Library
- Minsoo Choy, Jae-Gil Lee, Gahgene Gweon, and Daehoon Kim. 2014. Glaucus: Exploiting the wisdom of crowds for location-based queries in mobile environments. In Proceedings of the 8th AAAI International Conference on Weblogs and Social Media. 61--70.Google Scholar
- Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S. Modha. 2003. Information-theoretic co-clustering. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 89--98.Google Scholar
- Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining. 606--610.Google Scholar
- Chris Ding, Tao Li, Wei Peng, and Haesun Park. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 126--135.Google ScholarDigital Library
- Peter D. Grünwald, In Jae Myung, and Mark A Pitt. 2005. Advances in Minimum Description Length: Theory and Applications. MIT Press.Google Scholar
- Quanquan Gu and Jie Zhou. 2009. Co-clustering on manifolds. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 359--368.Google ScholarDigital Library
- Negar Hariri, Carlos Castro-Herrera, Mehdi Mirakhorli, Jane Cleland-Huang, and Bamshad Mobasher. 2013. Supporting domain analysis through mining and recommending features from online product listings. IEEE Trans. Softw. Eng. 39, 12 (2013), 1736--1752.Google ScholarDigital Library
- Xiaofei He and Partha Niyogi. 2004. Locality preserving projections. In Advances in Neural Information Processing Systems. 153--160.Google Scholar
- Andrew E. G. Jonas. 2012. Region and place: Regionalism in question. Progr. Hum. Geogr. 36, 2 (2012), 263--272.Google ScholarCross Ref
- George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1 (1998), 359--392.Google ScholarDigital Library
- Jungeun Kim, Minseo Kang, Sungsu Lim, and Jae-Gil Lee. 2015. Triangle counting in networks using a multi-level branching technique. In Proceedings of the 2015 International Conference on Big Data and Smart Computing. 47--50.Google ScholarCross Ref
- Jungeun Kim and Jae-Gil Lee. 2015. Community detection in multi-layer graphs: A survey. ACM SIGMOD Rec. 44, 3 (2015), 37--48.Google ScholarDigital Library
- Jungeun Kim, Jae-Gil Lee, and Sungsu Lim. 2016. Differential flattening: A novel framework for community detection in multi-layer graphs. ACM Trans. Intell. Syst. Technol. 8, 2 (2016), 27.Google Scholar
- Jungeun Kim, Sungsu Lim, Jae-Gil Lee, and Byung Lee. 2018. LinkBlackHole*: Robust overlapping community detection using link embedding. IEEE Trans. Knowl. Data Eng. 31, 11 (2018), 2138--2150.Google ScholarCross Ref
- Jae-Gil Lee and Minseo Kang. 2015. Geospatial big data: Challenges and opportunities. Big Data Res. 2, 2 (2015), 74--81.Google ScholarDigital Library
- Thomas Lee. 2001. An introduction to coding theory and the two-part minimum description length principle. Int. Stat. Rev. 69, 2 (2001), 169--183.Google ScholarCross Ref
- Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. Retrieved from http://snap.stanford.edu/data.Google Scholar
- Jure Leskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web. 631--640.Google ScholarDigital Library
- Kenneth Wai-Ting Leung, Dik Lun Lee, and Wang-Chien Lee. 2011. CLR: A collaborative location recommendation framework based on co-clustering. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 305--314.Google Scholar
- Yafei Li, Rui Chen, Jianliang Xu, Qiao Huang, Haibo Hu, and Byron Choi. 2015. Geo-social K-cover group queries for collaborative spatial computing. IEEE Trans. Knowl. Data Eng. 27, 10 (2015), 2729--2742.Google ScholarDigital Library
- David Liben-Nowell and Jon M. Kleinberg. 2007. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 58, 7 (2007), 1019--1031.Google ScholarCross Ref
- Richard M. Medina and George F. Hepner. 2011. Advancing the understanding of sociospatial dependencies in terrorist networks. Trans. GIS 15, 5 (2011), 577--597.Google ScholarCross Ref
- Kyriakos Mouratidis, Jing Li, Yu Tang, and Nikos Mamoulis. 2015. Joint search by social and spatial proximity. IEEE Trans. Knowl. Data Eng. 27, 3 (2015), 781--793.Google ScholarDigital Library
- Valerio Perrone, Paul A Jenkins, Dario Spano, and Yee Whye Teh. 2017. Poisson random fields for dynamic feature models. J. Mach. Learn. Res. 18, 1 (2017), 4626--4670.Google ScholarDigital Library
- Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20 (1987), 53--65.Google ScholarDigital Library
- Mohamed Sarwat, Justin J. Levandoski, Ahmed Eldawy, and Mohamed F. Mokbel. 2014. LARS*: An efficient and scalable location-aware recommender system. IEEE Trans. Knowl. Data Eng. 26, 6 (2014), 1384--1399.Google ScholarDigital Library
- Paulo Shakarian, Patrick Roos, Devon Callahan, and Cory Kirk. 2013. Mining for geographically disperse communities in social networks by leveraging distance modularity. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1402--1409.Google ScholarCross Ref
- Fanhua Shang, L. C. Jiao, and Fei Wang. 2012. Graph dual regularization non-negative matrix factorization for co-clustering. Pattern Recogn. 45, 6 (2012), 2237--2250.Google ScholarDigital Library
- Jieming Shi, Nikos Mamoulis, Dingming Wu, and David W. Cheung. 2014. Density-based place clustering in geo-social networks. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 99--110.Google Scholar
- Yves van Gennip, Blake Hunter, Raymond Ahn, Peter Elliott, Kyle Luh, Megan Halvorson, Shannon Reid, Matthew Valasik, James Wo, George E. Tita, Andrea L. Bertozzi, and P. Jeffrey Brantingham. 2013. Community detection using spectral clustering on sparse geosocial data. SIAM J. Appl. Math. 73, 1 (2013), 67--83.Google ScholarCross Ref
- Hua Wang, Feiping Nie, Heng Huang, and Chris Ding. 2011. Nonnegative matrix tri-factorization based high-order co-clustering and its fast implementation. In Proceedings of the 11th IEEE International Conference on Data Mining. 774--783.Google ScholarCross Ref
- Hao Wang, Manolis Terrovitis, and Nikos Mamoulis. 2013. Location recommendation in location-based social networks using user check-in data. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 364--374.Google ScholarDigital Library
- Meng Wang, Chaokun Wang, Jeffrey Xu Yu, and Jun Zhang. 2015. Community detection in social networks: An in-depth benchmarking study with a procedure-oriented framework. Proc. VLDB Endow. 8, 10 (2015), 998--1009.Google ScholarDigital Library
- Xiaoyang Wang, Ying Zhang, Wenjie Zhang, and Xuemin Lin. 2016. Distance-aware influence maximization in geo-social network. In Proceedings of the 32nd IEEE International Conference on Data Engineering. 1--12.Google ScholarCross Ref
- Yao Wu, Xudong Liu, Min Xie, Martin Ester, and Qing Yang. 2015. CCCF: Improving collaborative filtering via scalable user-item co-clustering. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining. 73--82.Google Scholar
- De-Nian Yang, Chih-Ya Shen, Wang-Chien Lee, and Ming-Syan Chen. 2012. On socio-spatial group query for location-based social networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 949--957.Google ScholarDigital Library
- Hongzhi Yin, Zhiting Hu, Xiaofang Zhou, Hao Wang, Kai Zheng, Nguyen Quoc Viet Hung, and Shazia Wasim Sadiq. 2016. Discovering interpretable geo-social communities for user behavior prediction. In Proceedings of the 32nd IEEE International Conference on Data Engineering. 942--953.Google ScholarCross Ref
- Jia-Dong Zhang and Chi-Yin Chow. 2013. iGSLR: Personalized geo-social location recommendation—A kernel density estimation approach. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 334--343.Google ScholarDigital Library
- Marinka Zitnik and Blaz Zupan. 2015. Data fusion by matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1 (2015), 41--53.Google ScholarCross Ref
Index Terms
- Geosocial Co-Clustering: A Novel Framework for Geosocial Community Detection
Recommendations
Co-clustering over multiple dynamic data streams based on non-negative matrix factorization
Clustering multiple data streams has become an active area of research with many practical applications. Most of the early work in this area focused on one-sided clustering, i.e., clustering data streams based on feature correlation. However, recent ...
Non-negative Matrix Tri-Factorization for co-clustering
Non-negative dyadic data, that is data representing observations which relate two finite sets of objects, appear in several domain applications, such as text-mining-based information retrieval, collaborative filtering and recommender systems, micro-...
Non-Exhaustive, Overlapping Co-Clustering
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge ManagementThe goal of co-clustering is to simultaneously identify a clustering of the rows as well as the columns of a two dimensional data matrix. Most existing co-clustering algorithms are designed to find pairwise disjoint and exhaustive co-clusters. However, ...
Comments