Published in: Soft Computing 1/2006

01-01-2006 | Focus

s-HITSc: an improved model and algorithm for topic distillation on the Web

Authors: Zhuoming XU, Xiao CAO, Yisheng DONG, Yahong HAN

Abstract

Topic distillation on the Web, that is, finding quality information sources related to a given query topic by means of hyperlink analysis, has been shown to be useful in Web IR. Based on an analysis of three deficiencies of the classical topic distillation algorithm HITS, this paper presents an improved model and algorithm named s-HITSc. Given a query topic, the improved algorithm models a neighborhood graph at site granularity, computes the relevance weight of each node to the topic by content analysis, and applies weighted I/O operations in its iterative hyperlink analysis. Theoretical analysis and experimental results show that s-HITSc can control topic drift and identify more reasonable and meaningful authority and hub sites on a given topic.
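
The abstract does not give the update formulas, so the following is only a minimal sketch of what a relevance-weighted HITS iteration over a site-level graph might look like. It is not the authors' exact s-HITSc formulation. The function names `weighted_hits` and `normalize`, the `relevance` dictionary of per-site topic weights in [0, 1], and the choice to multiply each contribution by the relevance of the contributing site are all assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def weighted_hits(edges, relevance, iterations=50):
    """Relevance-weighted HITS on a site-level graph (illustrative sketch).

    edges:     iterable of (source_site, target_site) pairs
    relevance: dict mapping site -> topic relevance weight in [0, 1]
               (hypothetical output of a content-analysis step)
    Returns (authority, hub) score dictionaries.
    """
    # Keep only inter-site links; duplicates collapse into one edge.
    links = sorted({(s, t) for s, t in edges if s != t})
    nodes = set(relevance)
    for s, t in links:
        nodes.update((s, t))

    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Weighted I operation: a site's authority is the sum of the hub
        # scores of sites linking to it, each scaled by the linker's relevance.
        new_auth = {n: 0.0 for n in nodes}
        for s, t in links:
            new_auth[t] += hub[s] * relevance.get(s, 0.0)

        # Weighted O operation: a site's hub score is the sum of the authority
        # scores of sites it links to, each scaled by the target's relevance.
        new_hub = {n: 0.0 for n in nodes}
        for s, t in links:
            new_hub[s] += new_auth[t] * relevance.get(t, 0.0)

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return auth, hub
```

In this sketch a site with relevance 0 contributes nothing to its neighbours, which is one simple way a weighting scheme could limit topic drift: links from off-topic sites stop propagating score into the result.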

Footnotes
1. Here, we separate Web sites by host (not by domain). In fact, our algorithms are applicable to both the host and the domain case.
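
As a rough illustration of host-level grouping, the sketch below collapses page-to-page links into site-to-site edges by taking the host part of each URL. The helper names `site_of` and `site_edges` and the example URLs are hypothetical; the paper's own preprocessing is not described on this page.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Return the lowercased host of a URL, used here as the site identifier."""
    return (urlsplit(url).hostname or "").lower()


def site_edges(page_links):
    """Collapse page-level links into a set of distinct inter-site edges."""
    edges = set()
    for src, dst in page_links:
        s, t = site_of(src), site_of(dst)
        # Skip unparsable URLs and links that stay within one host.
        if s and t and s != t:
            edges.add((s, t))
    return edges
```

Under this host-based rule, `www.example.com` and `news.example.com` count as two different sites. A domain-based variant would map both to the same site identifier before building the edges.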
 
Metadata
Title
s-HITSc: an improved model and algorithm for topic distillation on the Web
Authors
Zhuoming XU
Xiao CAO
Yisheng DONG
Yahong HAN
Publication date
01-01-2006
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 1/2006
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-005-0457-0
