Article (DOI: 10.1145/1008992.1009048)

Hourly analysis of a very large topically categorized web query log

Published: 25 July 2004

ABSTRACT

We review a query log of hundreds of millions of queries that constitute the total query traffic for an entire week of a general-purpose commercial web search service. Previously, query logs have been studied from a single, cumulative view. In contrast, our analysis shows how the popularity and uniqueness of topically categorized queries change across the hours of the day. We examine query traffic on an hourly basis by matching it against lists of queries that have been topically pre-categorized by human editors; the matched queries represent 13% of the query traffic. We show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. This analysis provides insight for improving retrieval effectiveness and efficiency, and it is relevant to the development of enhanced query disambiguation, routing, and caching algorithms.
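The hourly matching described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it rests on several assumptions. The log is assumed to be a sequence of (timestamp, query) pairs. The editor-built lists are replaced by a hypothetical dictionary that maps each normalized query to a single category. The `normalize` rule (case-folding and whitespace collapsing) is hypothetical. "Uniqueness" is approximated as the ratio of distinct queries to total queries for a category in a given hour.

```python
from collections import Counter, defaultdict
from datetime import datetime


def normalize(query: str) -> str:
    # Hypothetical normalization rule: case-fold and collapse whitespace runs.
    return " ".join(query.lower().split())


def hourly_category_stats(log, categorized):
    """Aggregate per-hour, per-category query statistics.

    log: iterable of (datetime, raw_query) pairs (assumed log format).
    categorized: hypothetical dict mapping a normalized query to one
        category label, standing in for the editor-built query lists.
    """
    totals = Counter()                                # hour -> all queries
    matched = Counter()                               # hour -> categorized queries
    counts = defaultdict(Counter)                     # hour -> category -> queries
    distinct = defaultdict(lambda: defaultdict(set))  # hour -> category -> distinct queries

    for ts, raw in log:
        hour = ts.hour
        query = normalize(raw)
        totals[hour] += 1
        category = categorized.get(query)
        if category is None:
            continue  # uncategorized traffic still counts toward the hourly total
        matched[hour] += 1
        counts[hour][category] += 1
        distinct[hour][category].add(query)

    stats = {}
    for hour in sorted(totals):
        per_category = {}
        for category, n in counts[hour].items():
            per_category[category] = {
                "queries": n,
                # Popularity: the category's share of all traffic in this hour.
                "share": n / totals[hour],
                # Assumed uniqueness measure: distinct queries / total queries.
                "unique_ratio": len(distinct[hour][category]) / n,
            }
        stats[hour] = {
            "total": totals[hour],
            "matched": matched[hour],
            "categories": per_category,
        }
    return stats


# Toy data for illustration only; not from the study's log.
log = [
    (datetime(2024, 1, 1, 9, 5), "Weather Boston"),
    (datetime(2024, 1, 1, 9, 7), "weather  boston"),
    (datetime(2024, 1, 1, 9, 30), "cheap flights"),
    (datetime(2024, 1, 1, 22, 1), "some uncategorized query"),
]
categorized = {"weather boston": "Weather", "cheap flights": "Travel"}

stats = hourly_category_stats(log, categorized)
# Fraction of all traffic matched by the category lists (cf. the 13% figure).
coverage = sum(s["matched"] for s in stats.values()) / sum(s["total"] for s in stats.values())
print(stats[9]["categories"]["Weather"])
print(coverage)  # 0.75
```

Keying on `ts.hour` alone folds a week-long log into 24 hour-of-day buckets, which matches a "time of day" view. Keying on `(ts.date(), ts.hour)` instead would keep each day's hours separate.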


      Published in

        ACM Conferences
        SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
        July 2004
        624 pages
        ISBN: 1581138814
        DOI: 10.1145/1008992

        Copyright © 2004 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%
