ABSTRACT
Online services increasingly depend on user participation. Whether in online social networks or crowdsourcing services, understanding user behavior is important yet challenging. In this paper, we build an unsupervised system that captures dominant user behaviors from clickstream data (traces of users' click events) and visualizes the detected behaviors in an intuitive manner. Our system identifies "clusters" of similar users by partitioning a similarity graph (nodes are users; edges are weighted by clickstream similarity). The partitioning process uses iterative feature pruning to capture the natural hierarchy within user clusters and to produce intuitive features for visualizing and understanding the captured behaviors. For evaluation, we present case studies on two large-scale clickstream traces (142 million events) from real social networks. Our system effectively identifies previously unknown behaviors, such as dormant users and hostile chatters. Our user study also shows that people can easily interpret the identified behaviors using our visualization tool.
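The abstract describes the pipeline only at a high level, so the following Python sketch illustrates the general idea under assumptions that are not taken from the paper: each user is represented by counts of click-event n-grams, similarity is cosine similarity with a fixed edge threshold, a simple deterministic label-propagation pass stands in for the graph partitioner, and the feature "pruned" at each level is the one with the highest chi-square score for a cluster against its sibling clusters. All names and parameters (`ngram_features`, `cluster_hierarchy`, `threshold`, `min_size`, `max_depth`) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def ngram_features(clicks, n=2):
    """Count the click-event n-grams in one user's clickstream (assumed feature set)."""
    return Counter(tuple(clicks[i:i + n]) for i in range(len(clicks) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_graph(features, threshold=0.5):
    """Nodes are users; an edge weighted by similarity joins users scoring >= threshold."""
    users = sorted(features)
    graph = {u: {} for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            w = cosine(features[u], features[v])
            if w >= threshold:
                graph[u][v] = graph[v][u] = w
    return graph


def label_propagation(graph, max_iter=20):
    """Deterministic weighted label propagation (a stand-in graph partitioner)."""
    labels = {u: u for u in graph}
    for _ in range(max_iter):
        changed = False
        for u in graph:  # graph keys are already in sorted order
            if not graph[u]:
                continue  # isolated users keep their own label
            scores = Counter()
            for v, w in graph[u].items():
                scores[labels[v]] += w
            # Adopt the label with the largest total edge weight; ties go to the smallest label.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[u]:
                labels[u], changed = best, True
        if not changed:
            break
    groups = {}
    for u, lab in labels.items():
        groups.setdefault(lab, []).append(u)
    return sorted(groups.values())


def chi_square(feature, inside, outside, features):
    """Chi-square score of a feature's presence (cluster vs. rest); 0 if under-represented."""
    a = sum(1 for u in inside if feature in features[u])
    b = sum(1 for u in outside if feature in features[u])
    if a * len(outside) <= b * len(inside):
        return 0.0  # only reward features over-represented inside the cluster
    c, d = len(inside) - a, len(outside) - b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0


def cluster_hierarchy(features, depth=0, max_depth=2, min_size=2, threshold=0.5):
    """Partition users, tag each cluster with its most distinctive feature, prune
    that feature, and re-partition the cluster to expose sub-behaviors."""
    graph = similarity_graph(features, threshold)
    # Clusters smaller than min_size are treated as noise in this sketch.
    clusters = [c for c in label_propagation(graph) if len(c) >= min_size]
    if len(clusters) < 2:
        return []  # no meaningful split at this level
    tree = []
    for members in clusters:
        outside = [u for u in features if u not in members]
        candidates = set().union(*(features[u] for u in members))
        scores = {f: chi_square(f, members, outside, features) for f in candidates}
        top = max(sorted(scores), key=scores.get, default=None)
        # Prune the dominant feature so the next level splits on the remaining ones.
        pruned = {u: Counter({f: cnt for f, cnt in features[u].items() if f != top})
                  for u in members}
        children = (cluster_hierarchy(pruned, depth + 1, max_depth, min_size, threshold)
                    if depth + 1 < max_depth else [])
        tree.append({"members": members, "feature": top, "children": children})
    return tree


if __name__ == "__main__":
    # Toy clickstreams (made up for illustration, not data from the paper).
    streams = {
        "a1": ["post"] * 4 + ["like"], "a2": ["post"] * 4 + ["like"],
        "a3": ["post"] * 4 + ["share"], "a4": ["post"] * 4 + ["share"],
        "b1": ["chat"] * 3, "b2": ["chat"] * 3,
    }
    feats = {u: ngram_features(c, n=1) for u, c in streams.items()}
    for node in cluster_hierarchy(feats):
        print(node["members"], node["feature"],
              [(ch["members"], ch["feature"]) for ch in node["children"]])
```

On the toy data, the sketch should first separate the four "post" users from the two "chat" users, then, after pruning `("post",)`, split the posters into a "like" group and a "share" group. This mirrors the idea that pruning a cluster's dominant feature lets the next partitioning pass expose finer behaviors. A production system would more likely use a scalable community-detection algorithm and a similarity measure suited to click sequences in place of the stand-ins used here.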
Index Terms
- Unsupervised Clickstream Clustering for User Behavior Analysis