ABSTRACT
Location-based data is increasingly prevalent with the rapid adoption of mobile devices. In this paper we address the problem of learning spatial density models, focusing specifically on individual-level data. Modeling and predicting a spatial distribution for an individual is a challenging problem given both (a) the typical sparsity of data at the individual level and (b) the heterogeneity of spatial mobility patterns across individuals. We investigate the application of kernel density estimation (KDE) to this problem using a mixture model approach that can interpolate between an individual's data and broader patterns in the population as a whole. The mixture-KDE approach is evaluated on two large geolocation/check-in data sets, from Twitter and Gowalla, with comparisons to non-KDE baselines, using both log-likelihood and detection of simulated identity theft as evaluation metrics. Our experimental results indicate that the mixture-KDE method provides a useful and accurate methodology for capturing and predicting individual-level spatial patterns in the presence of noisy and sparse data.
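The abstract does not spell out the form of the model, so the following is only a minimal sketch of the general idea of interpolating between an individual-level KDE and a population-level KDE. It assumes fixed-bandwidth isotropic Gaussian kernels in two dimensions and a single mixing weight `alpha`; the function names, the bandwidths `h_ind` and `h_pop`, and the value of `alpha` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def gaussian_kde_2d(query, points, bandwidth):
    """Evaluate a fixed-bandwidth isotropic Gaussian KDE at each query location.

    query:  array of shape (m, 2), locations where the density is evaluated
    points: array of shape (n, 2), observed locations
    """
    query = np.atleast_2d(np.asarray(query, dtype=float))
    points = np.atleast_2d(np.asarray(points, dtype=float))
    # Squared Euclidean distance between every query and every data point.
    diff = query[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Normalizing constant of a 2-D isotropic Gaussian with std = bandwidth.
    norm = 2.0 * np.pi * bandwidth ** 2
    kernel_sum = np.exp(-0.5 * sq_dist / bandwidth ** 2).sum(axis=1)
    return kernel_sum / (norm * len(points))


def mixture_kde(query, individual_points, population_points,
                alpha=0.8, h_ind=0.1, h_pop=0.5):
    """Convex combination of an individual KDE and a population KDE.

    alpha, h_ind and h_pop are illustrative assumptions, not values
    reported by the authors.
    """
    p_ind = gaussian_kde_2d(query, individual_points, h_ind)
    p_pop = gaussian_kde_2d(query, population_points, h_pop)
    return alpha * p_ind + (1.0 - alpha) * p_pop


def mean_log_likelihood(test_points, individual_points, population_points, **kw):
    """Average log-density of held-out points under the mixture."""
    dens = mixture_kde(test_points, individual_points, population_points, **kw)
    return float(np.mean(np.log(dens)))
```

With only a handful of points for an individual, the individual-level KDE alone assigns vanishingly small density to any location far from those points. The population component keeps the mixture density, and therefore the held-out log-likelihood, from collapsing at such locations.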
Index Terms
- Modeling human location data with mixtures of kernel densities