Telco User Activity Level Prediction with Massive Mobile Broadband Data

Authors:
Chen Luo, Soochow University, Suzhou, China
Jia Zeng, Soochow University and Huawei Noah's Ark Lab, Hong Kong
Mingxuan Yuan, Huawei Noah's Ark Lab, Hong Kong
Wenyuan Dai, Fourth Paradigm Technology Co. Ltd., Beijing, China
Qiang Yang, Hong Kong University of Science and Technology, Hong Kong

ACM Transactions on Intelligent Systems and Technology, Volume 7, Issue 4, Article No. 63, pp. 1–30. https://doi.org/10.1145/2856057

Published: 02 May 2016

Abstract

Telecommunication (telco) operators aim to provide users with optimized services and bandwidth in a timely manner, improving user experience while retaining profit. Knowing in advance how users' behavior patterns change, as reflected in their activity levels, helps operators adjust their management strategies and reduce operational risk. To this end, operators can use knowledge discovered from their historical mobile broadband (MBB) records to predict mobile access activity levels at an early stage. In this article, we report our research in a real-world telco setting involving more than one million telco users. Our novel contribution is to represent users as documents containing a collection of changing spatiotemporal “words” that express user behavior. By extracting users’ space-time access records from MBB data, we use latent Dirichlet allocation (LDA) to learn user-specific compact topic features for user activity level prediction. We propose an online expectation-maximization (OEM) algorithm that scales LDA to massive MBB data and is significantly faster than several state-of-the-art online LDA algorithms. Using these real-world MBB data, we confirm high performance in user activity level prediction. In addition, the inferred topics indicate that future activity level anomalies correlate highly with early skewed bandwidth supply and demand relations. Thus, our prediction system can also guide telco operators in balancing the telecommunication network in terms of supply-demand relations, saving deployment costs and cell tower energy in the future.
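
The "users as documents" representation can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout `(user_id, cell_id, hour)`, the six-hour time slots, and the `cell@slot` token format are all assumptions made for illustration, since the abstract does not specify the MBB schema. Each access record becomes one spatiotemporal "word", and each user becomes a bag of such words.

```python
from collections import Counter, defaultdict

# Hypothetical access records: (user_id, cell_id, hour_of_day).
# The real MBB schema is not given in the abstract; these fields are assumed.
RECORDS = [
    ("u1", "cellA", 8),
    ("u1", "cellA", 9),
    ("u1", "cellB", 20),
    ("u2", "cellB", 21),
    ("u2", "cellB", 22),
]


def time_slot(hour, slot_hours=6):
    """Map an hour in 0..23 to a coarse slot index (slot width is an assumption)."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return hour // slot_hours


def spatiotemporal_word(cell_id, hour, slot_hours=6):
    """Combine where (cell) and when (time slot) into a single token."""
    return f"{cell_id}@t{time_slot(hour, slot_hours)}"


def build_user_documents(records, slot_hours=6):
    """Group records by user and count each spatiotemporal word per user."""
    docs = defaultdict(Counter)
    for user_id, cell_id, hour in records:
        docs[user_id][spatiotemporal_word(cell_id, hour, slot_hours)] += 1
    return dict(docs)


docs = build_user_documents(RECORDS)
for user_id, word_counts in sorted(docs.items()):
    print(user_id, dict(word_counts))
```

The resulting per-user word counts are the kind of input a topic model such as LDA consumes; the inferred per-user topic proportions would then serve as the compact features for activity level prediction.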

Index Terms

Information systems → Information systems applications

Published in
ACM Transactions on Intelligent Systems and Technology Volume 7, Issue 4
Special Issue on Crowd in Intelligent Systems, Research Note/Short Paper and Regular Papers
July 2016
498 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2906145
Editor: Yu Zheng, Microsoft Research, China
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 May 2016
- Accepted: 1 December 2015
- Revised: 1 November 2015
- Received: 1 February 2015

Author Tags
Mobile broadband
OEM algorithm
activity level prediction
big spatiotemporal data
latent Dirichlet allocation
user-specific topic features
Qualifiers
- research-article
- Research
- Refereed