Published in: Discover Computing 2/2011

01-04-2011

Time-weighted web authoritative ranking

Authors: Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana

Abstract

We investigate temporal factors in assessing the authoritativeness of web pages. We present three time-related metrics: age, event, and trend. These metrics measure recency, the occurrence of special events, and the trend in revisions, respectively. An experimental dataset is created by crawling selected web pages over a period of several months. These data are used to compare page rankings by human users with rankings computed by the standard PageRank algorithm (which does not include temporal factors) and by three algorithms that incorporate temporal factors, including the Time-Weighted PageRank (TWPR) algorithm introduced here. Analysis of the rankings shows that all three time-aware algorithms produce rankings closer to those of human users than the PageRank algorithm does. Of these, the TWPR algorithm produces the rankings most similar to those of human users, indicating that all three temporal factors are relevant in page ranking. In addition, analysis of the parameter values used to weight the three temporal factors reveals that the age factor has the greatest impact on page rankings, the trend factor the second greatest, and the event factor the least. Proper weighting of the three factors in the TWPR algorithm provides the best ranking results.
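
The abstract does not say which measure is used to judge how similar an algorithm's ranking is to the human ranking. As a rough illustration only, the sketch below assumes Kendall's rank correlation, one common choice for comparing two orderings of the same items. The function name `kendall_tau` and the page identifiers are hypothetical and are not taken from the article.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two tie-free rankings.

    Each argument lists the same distinct items, ordered best to worst.
    Returns a value in [-1, 1]; 1 means identical order, -1 reversed order.
    """
    if (len(rank_a) != len(rank_b)
            or len(set(rank_a)) != len(rank_a)
            or set(rank_a) != set(rank_b)):
        raise ValueError("rankings must contain the same distinct items")
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A positive product means both rankings order the pair the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / n_pairs if n_pairs else 1.0


# Hypothetical rankings of four pages (illustrative data, not from the study).
human = ["p1", "p2", "p3", "p4"]
algo_x = ["p1", "p3", "p2", "p4"]   # one adjacent pair swapped
algo_y = ["p4", "p3", "p2", "p1"]   # fully reversed

print(kendall_tau(human, algo_x))
print(kendall_tau(human, algo_y))
```

Under this assumed measure, an algorithm whose score is closer to 1 ranks pages more like the human users do.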

Footnotes
1. Pages with zero out-degree cause the PageRank leak problem (Page et al. 1999).
2. A small self-loop cluster with no connection back to the main community causes the PageRank sink problem (Page et al. 1999).
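
To make the two footnotes concrete, here is a minimal sketch of standard PageRank by power iteration. It is not the TWPR algorithm and does not reproduce the authors' implementation. It assumes one common remedy for each problem: the score held by zero out-degree pages is spread uniformly over all pages (leak), and a uniform random jump with probability `1 - damping` keeps score from being trapped in a self-loop cluster (sink). The function name, parameter defaults, and example graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    Assumptions of this sketch: dangling pages (zero out-degree) spread
    their score uniformly over all pages, and the random jump is uniform.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Score currently held by pages with no out-links (the "leak").
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page receives the random-jump share plus the dangling share.
        base = (1.0 - damping) / n + damping * dangling / n
        new = {p: base for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical graph: "d" has zero out-degree; "e" and "f" form a
# self-loop cluster with no link back to the other pages.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
    "e": ["f"],
    "f": ["e"],
}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

With these two adjustments the scores remain a probability distribution: they sum to 1, and every page keeps a positive score.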
 
Literature
Adamic, L. A., & Huberman, B. A. (2001). The web’s hidden order. Communications of the ACM, 44(9), 55–59.
Baeza-Yates, R. A., & Ribeiro-Neto, B. A. (1999). Modern information retrieval. New York: ACM Press & Addison Wesley.
Baeza-Yates, R. A., Saint-Jean, F., & Castillo, C. (2002). Web structure, dynamics and page quality. In SPIRE ’02: Proceedings of the 9th international symposium on string processing and information retrieval (pp. 117–130).
Berberich, K., Vazirgiannis, M., & Weikum, G. (2006). Time-aware authority ranking. Internet Mathematics, 2(3), 301–332.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In ICML ’05: Proceedings of the 22nd international conference on machine learning (pp. 89–96).
Cai, D., Yu, S., Wen, J. R., & Ma, W. Y. (2003a). Extracting content structure for web pages based on visual representation. In APWeb ’03: Proceedings of the 5th Asia Pacific web conference (pp. 406–417).
Cai, D., Yu, S., Wen, J. R., & Ma, W. Y. (2003b). VIPS: A vision-based page segmentation algorithm. Technical report, Microsoft Research.
Cai, D., He, X., Wen, J. R., & Ma, W. Y. (2004). Block-level link analysis. In SIGIR ’04: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval (pp. 440–447).
Cho, J., & Garcia-Molina, H. (2003a). Effective page refresh policies for web crawlers. ACM Transactions on Database Systems, 28(4), 390–426.
Cho, J., & Garcia-Molina, H. (2003b). Estimating frequency of change. ACM Transactions on Internet Technology, 3(3), 256–290.
Cho, J., & Roy, S. (2004). Impact of search engines on page popularity. In WWW ’04: Proceedings of the 13th international world wide web conference (pp. 20–29).
Cho, J., Roy, S., & Adams, R. E. (2005). Page quality: In search of an unbiased web ranking. In SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD international conference on management of data (pp. 551–562).
Dwork, C., Kumar, R., Naor, M., & Sivakumar, D. (2001). Rank aggregation methods for the web. In WWW ’01: Proceedings of the 10th international world wide web conference (pp. 613–622).
Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Geng, X., Liu, T. Y., Qin, T., Arnold, A., Li, H., & Shum, H. Y. (2008). Query dependent ranking using k-nearest neighbor. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 115–122).
Golub, G. H., & Van Loan, C. F. (1996). Matrix computations. Baltimore and London: Johns Hopkins University Press.
Gonçalves, B., Meiss, M. R., Ramasco, J. J., Flammini, A., & Menczer, F. (2009). Remembering what we like: Toward an agent-based model of web traffic. In WSDM ’09: Proceedings of the 2nd ACM international conference on web search and data mining.
Grimmett, G. R., & Stirzaker, D. R. (2001). Probability and random processes. New York: Oxford University Press.
Haveliwala, T. H. (1999). Efficient computation of PageRank. Technical report, Stanford InfoLab.
Haveliwala, T. H. (2003). Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, 15(4), 784–796.
Järvelin, K., & Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. In SIGIR ’00: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (pp. 41–48).
Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.
Liu, T. Y., Yang, Y., Wan, H., Zhou, Q., Gao, B., Zeng, H. J., Chen, Z., & Ma, W. Y. (2005). An experimental study on large-scale web categorization. In WWW ’05: Special interest tracks and posters of the 14th international world wide web conference (pp. 1106–1107).
Liu, Y., Gao, B., Liu, T. Y., Zhang, Y., Ma, Z., He, S., & Li, H. (2008). BrowseRank: Letting web users vote for page importance. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 451–458).
Liu, Y., Liu, T. Y., Gao, B., Ma, Z., & Li, H. (2010). A framework to compute page importance based on user behaviors. Information Retrieval, 13(1), 22–45.
Meiss, M. R., Menczer, F., Fortunato, S., Flammini, A., & Vespignani, A. (2008). Ranking web sites with real user traffic. In WSDM ’08: Proceedings of the 1st ACM international conference on web search and data mining (pp. 65–76).
Melucci, M. (2007). On rank correlation in information retrieval evaluation. ACM SIGIR Forum, 41(1), 18–33.
Minamide, Y. (2005). Static approximation of dynamically generated web pages. In WWW ’05: Proceedings of the 14th international world wide web conference (pp. 432–441).
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.
Richardson, M., & Domingos, P. (2002). The intelligent surfer: Probabilistic combination of link and content information in PageRank. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems 14 (pp. 1441–1448). Cambridge, MA: MIT Press.
Richardson, M., Prakash, A., & Brill, E. (2006). Beyond PageRank: Machine learning for static ranking. In WWW ’06: Proceedings of the 15th international world wide web conference (pp. 707–715).
Ross, S. M. (2002). Introduction to probability models. San Diego: Academic Press.
Silverstein, C., Marais, H., Henzinger, M., & Moricz, M. (1999). Analysis of a very large web search engine query log. ACM SIGIR Forum, 33(1), 6–12.
Tang, L., Rajan, S., & Narayanan, V. K. (2009). Large scale multi-label classification via metalabeler. In WWW ’09: Proceedings of the 18th international world wide web conference (pp. 211–220).
Xu, J., & Li, H. (2007). AdaRank: A boosting algorithm for information retrieval. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (pp. 391–398).
Yu, P. S., Li, X., & Liu, B. (2005). Adding the temporal dimension to search—A case study in publication search. In WI ’05: Proceedings of the 2005 IEEE/WIC/ACM international conference on web intelligence (pp. 543–549).
Metadata
Title
Time-weighted web authoritative ranking
Authors
Bundit Manaskasemsak
Arnon Rungsawang
Hayato Yamana
Publication date
01-04-2011
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 2/2011
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-010-9138-4
