Published in: Discover Computing 3/2011

01-06-2011 | Web Mining for Search

Incorporating web browsing activities into anchor texts for web search

Authors: Bo Zhou, Yiqun Liu, Min Zhang, Yijiang Jin, Shaoping Ma

Abstract

Anchor texts complement Web page content and have been used extensively in commercial Web search engines. Existing methods for anchor text weighting rely on hyperlink information, which is created by page content editors. Since anchor texts are created to help users browse the Web, the browsing behavior of Web users may also provide useful or complementary information for anchor text weighting. In this paper, we discuss the possibility and effectiveness of incorporating the browsing activities of Web users into anchor texts for Web search. We first analyze the effectiveness of anchor texts with browsing activities. We then propose two new anchor models that incorporate browsing activities. To deal with the data sparseness problem of user-clicked anchor texts, two features of users' browsing behavior are explored and analyzed. Based on these features, a smoothing method for the new anchor models is proposed. Experimental results show that, by incorporating browsing activities, the new anchor models outperform state-of-the-art anchor models that use only hyperlink information. This study demonstrates the benefit of using Web browsing activities in anchor text weighting.
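
To make the idea of click-aware anchor weighting concrete, the following is a minimal illustrative sketch and not the models proposed in the paper. It assumes an anchor document is the bag of anchor terms pointing at a target page, that a hypothetical browsing log gives a click count per (source, target) link, and that a simple additive constant `alpha` stands in for the paper's smoothing method, whose details are not given here. The function name, URLs, and counts are all made up for illustration.

```python
from collections import Counter, defaultdict


def anchor_documents(links, clicks=None, alpha=1.0):
    """Aggregate anchor terms per target page.

    links  : iterable of (source_url, target_url, anchor_text)
    clicks : optional dict {(source_url, target_url): click_count};
             a hypothetical browsing log, not the paper's data format
    alpha  : additive smoothing so unclicked links keep a non-zero weight
             (an assumed stand-in for the paper's smoothing method)
    """
    docs = defaultdict(Counter)
    for source, target, text in links:
        if clicks is None:
            weight = 1.0  # hyperlink-only baseline: every link counts once
        else:
            weight = clicks.get((source, target), 0) + alpha
        for term in text.lower().split():
            docs[target][term] += weight
    return dict(docs)


# Toy data: three source pages link to the same target.
links = [
    ("a.com", "t.com", "Cheap Flights"),
    ("b.com", "t.com", "flights"),
    ("c.com", "t.com", "click here"),
]
clicks = {("a.com", "t.com"): 8, ("b.com", "t.com"): 2}

link_only = anchor_documents(links)
click_weighted = anchor_documents(links, clicks, alpha=0.5)

print(link_only["t.com"])
print(click_weighted["t.com"])
```

In this toy example the never-clicked "click here" anchor keeps only the smoothing weight, while terms from frequently followed links dominate the click-weighted anchor document.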

Footnotes
2
Generally, an anchor document of a target Web page consists of all anchor texts on the different source pages that refer to the target page.
 
3
In this paper, an anchor document (representation) constructed according to the definition of an anchor model XXX is called an XXX-based anchor document (representation).
 
4
The label “\( \succ \)” means the ranking performance of the left hand side anchor model/ranking method is better than the right hand side; “\( \prec \)” has the opposite meaning.
 
Metadata
Title: Incorporating web browsing activities into anchor texts for web search
Authors: Bo Zhou, Yiqun Liu, Min Zhang, Yijiang Jin, Shaoping Ma
Publication date: 01-06-2011
Publisher: Springer Netherlands
Published in: Discover Computing, Issue 3/2011
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-010-9151-7
