ABSTRACT
Online services play a critical role in almost all aspects of users' lives. Users usually have multiple online identities (IDs) across different online services. To fuse the separated user data in multiple services for better business intelligence, service providers need to link the online IDs that belong to the same user. Meanwhile, the popularity of mobile networks and GPS-equipped smart devices has provided a generic way to link IDs, namely by using their mobility traces. However, linking IDs based on their mobility traces remains a challenging problem because mobility data is highly heterogeneous, incomplete, and noisy across services.
In this paper, we propose DPLink, an end-to-end deep-learning-based framework for user identity linkage on heterogeneous mobility data collected from different services with different properties. DPLink consists of two parts: a feature extractor, which comprises a location encoder and a trajectory encoder that extract representative features from a trajectory, and a comparator, which compares two trajectories and decides whether to link them as the same user. In particular, we propose a pre-training strategy with a simple task to train the DPLink model, which overcomes the training difficulties introduced by the highly heterogeneous nature of mobility data from different sources. In addition, we introduce a multi-modal embedding network and a co-attention mechanism in DPLink to cope with the low quality of mobility data. Through extensive experiments on two real-life ground-truth mobility datasets against eight baselines, we demonstrate that DPLink outperforms the state-of-the-art solutions by more than 15% in terms of hit-precision. Moreover, it can be extended with external geographical context data and works stably with heterogeneous, noisy mobility traces. Our code is publicly available.
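The abstract reports results in terms of hit-precision without defining it. The sketch below illustrates one definition commonly used in the identity-linkage literature, assumed here rather than taken from this abstract: a query scores 1 when the true match is ranked first, decays linearly to 1/k at rank k, and scores 0 when the true match falls outside the top k. The function names, the dictionary layout, and the choice of k are hypothetical.

```python
from typing import Dict, Sequence


def hit_precision(ranked_candidates: Sequence[str], true_id: str, k: int) -> float:
    """Score a single query under the assumed hit-precision@k definition.

    Returns (k - (rank - 1)) / k when the true match appears at 1-based
    position `rank` within the top k candidates, and 0.0 otherwise.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(ranked_candidates[:k])
    if true_id not in top_k:
        return 0.0
    rank = top_k.index(true_id) + 1  # 1-based rank of the true match
    return (k - (rank - 1)) / k


def mean_hit_precision(
    rankings: Dict[str, Sequence[str]], truth: Dict[str, str], k: int
) -> float:
    """Average the per-query score over every query ID with a known true match."""
    if not truth:
        return 0.0
    scores = [hit_precision(rankings.get(q, []), truth[q], k) for q in truth]
    return sum(scores) / len(scores)


# Hypothetical example: three query IDs from one service, each with a ranked
# list of candidate IDs from another service.
rankings = {
    "a": ["x", "y", "z"],
    "b": ["p", "q", "r"],
    "c": ["m", "n", "o"],
}
truth = {"a": "x", "b": "q", "c": "w"}
score = mean_hit_precision(rankings, truth, k=3)
```

In this example the three queries score 1, 2/3, and 0, so the mean is 5/9. Unlike plain top-k accuracy, this metric rewards placing the true match higher within the top k.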