Published in: Discover Computing 2/2011

1 April 2011

Time-weighted web authoritative ranking

Authors: Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana


Abstract

We investigate temporal factors in assessing the authoritativeness of web pages. We present three time-related metrics: age, event, and trend. These metrics measure recentness, the occurrence of special events, and the trend in revisions, respectively. An experimental dataset is created by crawling selected web pages over a period of several months. This data is used to compare page rankings made by human users with rankings computed by the standard PageRank algorithm (which does not include temporal factors) and by three algorithms that incorporate temporal factors, including the Time-Weighted PageRank (TWPR) algorithm introduced here. Analysis of the rankings shows that all three time-aware algorithms produce rankings closer to those of human users than the PageRank algorithm does. Of these, the TWPR algorithm produces the rankings most similar to those of human users, indicating that all three temporal factors are relevant to page ranking. In addition, analysis of the parameter values used to weight the three temporal factors reveals that the age factor has the most impact on page rankings, the trend factor the second most, and the event factor the least. Proper weighting of the three factors in the TWPR algorithm provides the best ranking results.
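The abstract does not say which measure was used to judge how closely an algorithm's ranking matches the human one. As an illustration only, the sketch below uses Kendall's tau, a common rank-correlation measure, on made-up rankings. The function name `kendall_tau`, the page labels, and the rank values are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no ties assumed).

    rank_a, rank_b: dicts mapping item -> rank position (1 = best).
    Returns a value in [-1, 1]; 1 means identical order, -1 means reversed.
    """
    items = list(rank_a)
    if set(items) != set(rank_b):
        raise ValueError("both rankings must cover the same items")
    if len(items) < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Positive product: both rankings order the pair the same way.
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of four pages (illustrative data, not from the study).
human = {"p1": 1, "p2": 2, "p3": 3, "p4": 4}
algo_x = {"p1": 1, "p2": 3, "p3": 2, "p4": 4}  # one swapped pair
algo_y = {"p1": 4, "p2": 3, "p3": 2, "p4": 1}  # fully reversed

print(kendall_tau(human, algo_x))
print(kendall_tau(human, algo_y))
```

Under this measure, the algorithm whose score against the human ranking is higher would be the one described as "more like" the human users.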


Footnotes
1. Pages with zero out-degree cause the PageRank leak problem (Page et al. 1999).
2. A small self-loop cluster with no connection back to the main community causes the PageRank sink problem (Page et al. 1999).
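The two footnotes concern standard PageRank rather than the time-weighted variant. A minimal sketch of the usual remedies is given below, assuming the textbook power-iteration formulation: rank held by zero-out-degree pages is redistributed uniformly (leak), and a teleport term with a damping factor lets the random surfer escape closed clusters (sink). The function name `pagerank`, the damping value 0.85, and the toy graph are illustrative assumptions; this is not the TWPR algorithm.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = sorted(set(out_links) | {q for ts in out_links.values() for q in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Leak: rank sitting on pages with no out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not out_links.get(p))
        # Sink: the (1 - damping) teleport share lets rank leave closed clusters.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for p in pages:
            targets = out_links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "d" has no out-links (leak case);
# "e" and "f" link only to each other (sink case).
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d", "e"],
    "d": [],
    "e": ["f"],
    "f": ["e"],
}
scores = pagerank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

With both corrections in place the scores remain a probability distribution: they sum to 1 at every iteration, and every page keeps a positive score.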
 
References
Adamic, L. A., & Huberman, B. A. (2001). The web’s hidden order. Communications of the ACM, 44(9), 55–59.
Baeza-Yates, R. A., & Ribeiro-Neto, B. A. (1999). Modern information retrieval. New York: ACM Press & Addison Wesley.
Baeza-Yates, R. A., Saint-Jean, F., & Castillo, C. (2002). Web structure, dynamics and page quality. In SPIRE ’02: Proceedings of the 9th international symposium on string processing and information retrieval (pp. 117–130).
Berberich, K., Vazirgiannis, M., & Weikum, G. (2006). Time-aware authority ranking. Internet Mathematics, 2(3), 301–332.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In ICML ’05: Proceedings of the 22nd international conference on machine learning (pp. 89–96).
Cai, D., Yu, S., Wen, J. R., & Ma, W. Y. (2003a). Extracting content structure for web pages based on visual representation. In APWeb ’03: Proceedings of the 5th Asia Pacific web conference (pp. 406–417).
Cai, D., Yu, S., Wen, J. R., & Ma, W. Y. (2003b). VIPS: A vision-based page segmentation algorithm. Technical report, Microsoft Research.
Cai, D., He, X., Wen, J. R., & Ma, W. Y. (2004). Block-level link analysis. In SIGIR ’04: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval (pp. 440–447).
Cho, J., & Garcia-Molina, H. (2003a). Effective page refresh policies for web crawlers. ACM Transactions on Database Systems, 28(4), 390–426.
Cho, J., & Garcia-Molina, H. (2003b). Estimating frequency of change. ACM Transactions on Internet Technology, 3(3), 256–290.
Cho, J., & Roy, S. (2004). Impact of search engines on page popularity. In WWW ’04: Proceedings of the 13th international world wide web conference (pp. 20–29).
Cho, J., Roy, S., & Adams, R. E. (2005). Page quality: In search of an unbiased web ranking. In SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD international conference on management of data (pp. 551–562).
Dwork, C., Kumar, R., Naor, M., & Sivakumar, D. (2001). Rank aggregation methods for the web. In WWW ’01: Proceedings of the 10th international world wide web conference (pp. 613–622).
Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Geng, X., Liu, T. Y., Qin, T., Arnold, A., Li, H., & Shum, H. Y. (2008). Query dependent ranking using k-nearest neighbor. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 115–122).
Golub, G. H., & Van Loan, C. F. (1996). Matrix computations. Baltimore and London: Johns Hopkins University Press.
Gonçalves, B., Meiss, M. R., Ramasco, J. J., Flammini, A., & Menczer, F. (2009). Remembering what we like: Toward an agent-based model of web traffic. In WSDM ’09: Proceedings of the 2nd ACM international conference on web search and data mining.
Grimmett, G. R., & Stirzaker, D. R. (2001). Probability and random processes. New York: Oxford University Press.
Haveliwala, T. H. (1999). Efficient computation of PageRank. Technical report, Stanford InfoLab.
Haveliwala, T. H. (2003). Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, 15(4), 784–796.
Järvelin, K., & Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. In SIGIR ’00: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (pp. 41–48).
Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Liu, T. Y., Yang, Y., Wan, H., Zhou, Q., Gao, B., Zeng, H. J., Chen, Z., & Ma, W. Y. (2005). An experimental study on large-scale web categorization. In WWW ’05: Special interest tracks and posters of the 14th international world wide web conference (pp. 1106–1107).
Liu, Y., Gao, B., Liu, T. Y., Zhang, Y., Ma, Z., He, S., & Li, H. (2008). BrowseRank: Letting web users vote for page importance. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 451–458).
Liu, Y., Liu, T. Y., Gao, B., Ma, Z., & Li, H. (2010). A framework to compute page importance based on user behaviors. Information Retrieval, 13(1), 22–45.
Meiss, M. R., Menczer, F., Fortunato, S., Flammini, A., & Vespignani, A. (2008). Ranking web sites with real user traffic. In WSDM ’08: Proceedings of the 1st ACM international conference on web search and data mining (pp. 65–76).
Melucci, M. (2007). On rank correlation in information retrieval evaluation. ACM SIGIR Forum, 41(1), 18–33.
Minamide, Y. (2005). Static approximation of dynamically generated web pages. In WWW ’05: Proceedings of the 14th international world wide web conference (pp. 432–441).
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.
Richardson, M., & Domingos, P. (2002). The intelligent surfer: Probabilistic combination of link and content information in PageRank. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems 14 (pp. 1441–1448). Cambridge, MA: MIT Press.
Richardson, M., Prakash, A., & Brill, E. (2006). Beyond PageRank: Machine learning for static ranking. In WWW ’06: Proceedings of the 15th international world wide web conference (pp. 707–715).
Ross, S. M. (2002). Introduction to probability models. San Diego: Academic Press.
Silverstein, C., Marais, H., Henzinger, M., & Moricz, M. (1999). Analysis of a very large web search engine query log. ACM SIGIR Forum, 33(1), 6–12.
Tang, L., Rajan, S., & Narayanan, V. K. (2009). Large scale multi-label classification via MetaLabeler. In WWW ’09: Proceedings of the 18th international world wide web conference (pp. 211–220).
Xu, J., & Li, H. (2007). AdaRank: A boosting algorithm for information retrieval. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (pp. 391–398).
Yu, P. S., Li, X., & Liu, B. (2005). Adding the temporal dimension to search - A case study in publication search. In WI ’05: Proceedings of the 2005 IEEE/WIC/ACM international conference on web intelligence (pp. 543–549).
Metadata
Title
Time-weighted web authoritative ranking
Authors
Bundit Manaskasemsak
Arnon Rungsawang
Hayato Yamana
Publication date
1 April 2011
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 2/2011
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-010-9138-4
