DOI: 10.1145/2736277.2741080
research-article

HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web

Published: 18 May 2015

ABSTRACT

When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, and online music playlists. Understanding the factors that drive the production of these trails can be useful, e.g., for improving underlying network structures, predicting user clicks, or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors to the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains, including website navigation, business reviews, and online music listening. Our work expands the repertoire of methods available for studying human trails on the Web.
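The core mechanics described in the abstract can be illustrated with a minimal sketch: a first-order Markov chain whose rows carry Dirichlet priors, with two hypotheses compared through the log Bayes factor of their marginal likelihoods. This is not the authors' implementation. The function names, the toy trails, and the concentration parameter `k` are assumptions made for illustration, and `elicit_prior` is a simplified per-row stand-in for the (trial) roulette elicitation, which the abstract names but does not specify.

```python
from math import lgamma


def transition_counts(trails, n_states):
    """Count observed transitions src -> dst over all trails."""
    counts = [[0] * n_states for _ in range(n_states)]
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts


def elicit_prior(hypothesis, k):
    """Turn hypothesis weights into Dirichlet pseudo-counts.

    Assumption: each row is normalised and scaled by k, then added to a
    flat prior of 1. This is a simplification, not the roulette method.
    """
    prior = []
    for row in hypothesis:
        total = sum(row)
        if total == 0:
            prior.append([1.0] * len(row))  # no belief stated: flat prior
        else:
            prior.append([1.0 + k * v / total for v in row])
    return prior


def log_evidence(counts, prior):
    """Log marginal likelihood of the counts under row-wise Dirichlet priors."""
    total = 0.0
    for n_row, a_row in zip(counts, prior):
        total += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        for n, a in zip(n_row, a_row):
            total += lgamma(n + a) - lgamma(a)
    return total


# Toy trails over three states that mostly follow the cycle 0 -> 1 -> 2 -> 0.
trails = [[0, 1, 2, 0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1]]
counts = transition_counts(trails, 3)

# Two hypothetical hypotheses: "users follow the cycle" vs. "users move uniformly".
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

k = 5.0  # assumed strength of belief in each hypothesis
log_bf = log_evidence(counts, elicit_prior(cycle, k)) - log_evidence(
    counts, elicit_prior(uniform, k)
)
print("log Bayes factor (cycle vs. uniform):", log_bf)
```

A positive log Bayes factor favours the first hypothesis. Repeating the comparison for several values of `k` shows how the ranking of hypotheses depends on the strength of the prior belief, which is the prior sensitivity that the approach exploits.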


Published in

WWW '15: Proceedings of the 24th International Conference on World Wide Web
May 2015
1460 pages
ISBN: 9781450334693

Copyright © 2015. Copyright is held by the International World Wide Web Conference Committee (IW3C2)

Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland


Acceptance Rates

WWW '15 Paper Acceptance Rate: 131 of 929 submissions, 14%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
