Published in: Cluster Computing 1/2017

20.12.2016

Estimating reliability of the retrieval systems effectiveness rank based on performance in multiple experiments

Authors: Shuxiang Zhang, Sri Devi Ravana

Published in: Cluster Computing | Issue 1/2017


Abstract

For decades, the use of test collections has been the standardized approach to information retrieval evaluation. However, owing to the way such collections are constructed, this approach has a number of limitations, such as bias in pooling, disagreement between human assessors, varying levels of topic difficulty, and performance constraints of the evaluation metrics. Any of these factors may distort the measured relative effectiveness of different retrieval strategies (that is, of the retrieval systems) and thus lead to unreliable system rankings. In this study, we propose techniques for estimating the reliability of a retrieval system's effectiveness rank based on rankings from multiple experiments. These rankings may come from previous experimental results or be generated by conducting multiple experiments over smaller subsets of topics. The techniques help to predict the performance of each system in future experiments more precisely. To validate the proposed rank reliability estimation methods, two alternative system ranking methods are proposed to generate new system rankings. The experiments show that the system rank correlation coefficients mostly remain above 0.8 against the gold standard. Moreover, the proposed techniques generate system rankings that are more reliable than the baseline (the traditional system ranking approach used in Text REtrieval Conference (TREC)-like initiatives). The results from both TREC-2004 and TREC-8 show the same outcome, which further confirms the effectiveness of the proposed rank reliability estimation method.
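To illustrate the kind of comparison described above, the following minimal Python sketch is offered as an assumption-laden example, not the authors' implementation: the system names, per-experiment scores, mean-rank aggregation, and gold-standard ranking are all invented for illustration. It ranks systems within several small-topic experiments, aggregates the per-experiment ranks, and computes Kendall's tau against a gold-standard ranking.

from statistics import mean
from scipy.stats import kendalltau

# Hypothetical per-experiment effectiveness scores (e.g., MAP) for four systems,
# each experiment using a different small subset of topics.
experiments = {
    "exp1": {"sysA": 0.31, "sysB": 0.27, "sysC": 0.22, "sysD": 0.18},
    "exp2": {"sysA": 0.29, "sysB": 0.30, "sysC": 0.21, "sysD": 0.19},
    "exp3": {"sysA": 0.33, "sysB": 0.26, "sysC": 0.24, "sysD": 0.17},
}
systems = sorted(next(iter(experiments.values())))

def to_ranks(scores):
    # Rank systems within one experiment: 1 = most effective.
    order = sorted(scores, key=scores.get, reverse=True)
    return {system: position + 1 for position, system in enumerate(order)}

# Aggregate across experiments by averaging each system's per-experiment rank
# (one simple aggregation choice; the paper's own methods may differ).
mean_rank = {s: mean(to_ranks(scores)[s] for scores in experiments.values())
             for s in systems}

# Gold-standard ranking, e.g., obtained from the full topic set (assumed here).
gold_rank = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}

# Kendall's tau between the aggregated ranking and the gold standard.
tau, p_value = kendalltau([mean_rank[s] for s in systems],
                          [gold_rank[s] for s in systems])
print(f"Kendall's tau vs. gold standard: {tau:.3f} (p = {p_value:.3f})")

In this sketch a tau close to 1.0 would indicate that the ranking aggregated from the smaller experiments reproduces the gold-standard ordering, which is the sense in which values above 0.8 are read as evidence of reliability.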


Metadata
Title
Estimating reliability of the retrieval systems effectiveness rank based on performance in multiple experiments
Authors
Shuxiang Zhang
Sri Devi Ravana
Publication date
20.12.2016
Publisher
Springer US
Published in
Cluster Computing / Issue 1/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-016-0709-z
