DOI: 10.1145/2858036.2858107
research-article

Unsupervised Clickstream Clustering for User Behavior Analysis

Published: 07 May 2016

ABSTRACT

Online services are increasingly dependent on user participation. Whether it's online social networks or crowdsourcing services, understanding user behavior is important yet challenging. In this paper, we build an unsupervised system to capture dominating user behaviors from clickstream data (traces of users' click events), and visualize the detected behaviors in an intuitive manner. Our system identifies "clusters" of similar users by partitioning a similarity graph (nodes are users; edges are weighted by clickstream similarity). The partitioning process leverages iterative feature pruning to capture the natural hierarchy within user clusters and produce intuitive features for visualizing and understanding captured user behaviors. For evaluation, we present case studies on two large-scale clickstream traces (142 million events) from real social networks. Our system effectively identifies previously unknown behaviors, e.g., dormant users, hostile chatters. Also, our user study shows people can easily interpret identified behaviors using our visualization tool.
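
The similarity graph described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity metric the system uses, so everything below is an assumption made for illustration: each user's trace is summarized by counts of k consecutive click events, and edges are weighted by cosine similarity between those counts. The function names, the toy traces, and the event names are all hypothetical. The graph partitioning and iterative feature pruning steps are not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kgram_counts(events, k=2):
    """Count every run of k consecutive click events in one user's trace."""
    return Counter(tuple(events[i:i + k]) for i in range(len(events) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_graph(clickstreams, k=2):
    """Return {(user_a, user_b): weight} for each pair with nonzero similarity."""
    features = {user: kgram_counts(events, k)
                for user, events in clickstreams.items()}
    edges = {}
    for u, v in combinations(sorted(features), 2):
        weight = cosine_similarity(features[u], features[v])
        if weight > 0:
            edges[(u, v)] = weight
    return edges


# Hypothetical toy traces; the event names are invented for illustration.
clickstreams = {
    "alice": ["login", "browse", "post", "browse", "post"],
    "bob": ["login", "browse", "post", "logout"],
    "carol": ["login", "chat", "chat", "chat", "logout"],
}
edges = similarity_graph(clickstreams)
```

In this toy input, alice and bob share browsing and posting transitions and end up connected, while carol shares no transitions with either and stays isolated. A clustering step would then partition a graph like this one into groups of similar users.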

Published in

CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems
May 2016
6108 pages
ISBN: 9781450333627
DOI: 10.1145/2858036

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CHI '16 Paper Acceptance Rate: 565 of 2,435 submissions, 23%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
