ABSTRACT
In this paper, we propose a novel approach to studying user browsing behavior, i.e., the ways users reach different pages on the Web. Namely, we classify all user browsing paths leading to web pages into several types, or browsing patterns. To define browsing patterns, we consider three key points of a browsing path: its origin, the last page visited before the user reaches the domain of the target page, and the referrer of the target page. Each point can be of several types, which yields 56 possible patterns. The distribution of browsing paths over these patterns forms the navigational profile of a web page.
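As a minimal sketch of this definition, assume each browsing path has already been mapped to its pattern, represented here as a tuple of the types of its origin, its last page before the target domain, and its referrer. The navigational profile is then the normalized count of paths per pattern. The type labels below are hypothetical placeholders; the abstract does not list the actual types or how they combine into the 56 patterns.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A pattern: (origin type, type of the last page before the target domain,
# referrer type). The concrete type labels used here are hypothetical.
Pattern = Tuple[str, str, str]


def navigational_profile(paths: Iterable[Pattern]) -> Dict[Pattern, float]:
    """Return the share of browsing paths that fall into each pattern."""
    counts = Counter(paths)
    total = sum(counts.values())
    if total == 0:
        return {}  # no observed paths, so no profile
    return {pattern: n / total for pattern, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical browsing paths leading to a single target page.
    paths = [
        ("search", "search", "internal"),
        ("search", "search", "internal"),
        ("bookmark", "none", "none"),
        ("social", "social", "external"),
    ]
    for pattern, share in sorted(navigational_profile(paths).items()):
        print(pattern, share)
```

Patterns that never occur for a page are simply absent from the dictionary; they can be treated as having share 0 when profiles of different pages are compared.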
We conducted a comprehensive large-scale study of the navigational profiles of different web pages. First, we demonstrated that the navigational profile of a web page carries crucial information about the properties of the page (e.g., its popularity and age). Second, we found that the Web consists of several typical non-overlapping clusters formed by pages with similar ranges of incoming traffic. These clusters can be characterized by the functionality of their pages.
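The abstract does not say how the clusters were found. Purely as an illustrative sketch, assume each page's navigational profile is laid out as a vector over a fixed list of patterns; plain k-means (swapped in here, and not necessarily the authors' method) can then group pages with similar profiles. The pattern labels, the example profiles, and the choice of k are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical pattern representation: (origin, last page before domain, referrer).
Pattern = Tuple[str, str, str]


def to_vector(profile: Dict[Pattern, float], patterns: Sequence[Pattern]) -> List[float]:
    """Lay out a profile as a dense vector in a fixed pattern order (missing = 0)."""
    return [profile.get(p, 0.0) for p in patterns]


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means, deterministically initialized with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical profiles of four pages over three hypothetical patterns.
    patterns = [
        ("search", "search", "internal"),
        ("bookmark", "none", "none"),
        ("social", "social", "external"),
    ]
    profiles = [
        {patterns[0]: 0.8, patterns[1]: 0.2},
        {patterns[1]: 0.9, patterns[2]: 0.1},
        {patterns[0]: 0.7, patterns[2]: 0.3},
        {patterns[1]: 1.0},
    ]
    vectors = [to_vector(p, patterns) for p in profiles]
    print(kmeans(vectors, k=2))
```

Initializing with the first k points keeps the sketch deterministic; in practice, randomized or k-means++ initialization and a principled choice of k would matter.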
Index Terms
- What can be Found on the Web and How: A Characterization of Web Browsing Patterns