ABSTRACT
Location data are routinely available to a plethora of mobile apps and third-party web services. The resulting datasets are increasingly available to advertisers for targeting and are also requested by governmental agencies for law enforcement purposes. While the re-identification risk of such data has been widely reported, the discriminative power of mobility has received much less attention. In this study we fill this void with an open and reproducible method. We explore how the growing number of geotagged footprints left behind by social network users in photo-sharing services can be used to infer demographic information from mobility patterns. Most notably, we provide the first detailed analysis of ethnic mobility patterns in two metropolitan areas. This analysis allows us to examine questions pertaining to spatial segregation and the extent to which ethnicity can be inferred using only location data. Our results reveal that even a few coarse-grained location records can be sufficient for simple algorithms to draw an accurate inference. Our method generalizes to other features, such as gender, offering for the first time a general approach to evaluating the discriminative risks associated with location-enabled personalization.
Index Terms
- "I don't have a photograph, but you can have my footprints.": Revealing the Demographics of Location Data