research-article

Imaginary People Representing Real Numbers: Generating Personas from Online Social Media Data

Published: 01 November 2018

Abstract

We develop a methodology to automate the creation of imaginary people, referred to as personas, by processing complex behavioral and demographic data of social media audiences. Using data from a popular social media account containing more than 30 million interactions by viewers from 198 countries engaging with more than 4,200 online videos produced by a global media corporation, we demonstrate that our methodology achieves several novel results, including: (a) identifying distinct user behavioral segments based on users' content consumption patterns; (b) identifying impactful demographic groupings; and (c) creating rich persona descriptions by automatically adding pertinent attributes, such as names, photos, and personal characteristics. We validate our approach by implementing the methodology in a working system; we then evaluate it quantitatively by examining the accuracy of predicting the content preferences of personas, the stability of the personas over time, and the generalizability of the method by applying it to two other datasets. Our findings show that the approach can develop rich personas representing the behavior and demographics of real audiences using privacy-preserving aggregated online social media data from major online platforms. The results have implications for media companies and other organizations distributing content via online platforms.

    • Published in

      ACM Transactions on the Web  Volume 12, Issue 4
      November 2018
      215 pages
      ISSN:1559-1131
      EISSN:1559-114X
      DOI:10.1145/3281744

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 November 2018
      • Accepted: 1 August 2018
      • Revised: 1 May 2018
      • Received: 1 August 2017

      Qualifiers

      • research-article
      • Research
      • Refereed
