Published in: Cluster Computing 1/2017

20.12.2016

Estimating reliability of the retrieval systems effectiveness rank based on performance in multiple experiments

Authors: Shuxiang Zhang, Sri Devi Ravana

Published in: Cluster Computing | Issue 1/2017


Abstract

For decades, the use of test collections has been the standardized approach to information retrieval evaluation. However, owing to the way such collections are constructed, this approach has a number of limitations, such as bias in pooling, disagreement between human assessors, varying levels of topic difficulty, and performance constraints of the evaluation metrics. Any of these factors may distort the measured relative effectiveness of different retrieval strategies (that is, of the retrieval systems) and thus lead to unreliable system rankings. In this study, we propose techniques for estimating the reliability of a retrieval system's effectiveness rank based on rankings from multiple experiments. These rankings may come from previous experimental results or be generated by conducting multiple experiments over smaller subsets of topics. The techniques help to predict the performance of each system in future experiments more precisely. To validate the proposed rank reliability estimation methods, two alternative system ranking methods are proposed to generate new system rankings. The experiments show that the system rank correlation coefficients mostly remain above 0.8 against the gold standard. Moreover, the proposed techniques generate system rankings that are more reliable than the baseline (the traditional system ranking approach used in Text REtrieval Conference (TREC)-like initiatives). The results from both TREC-2004 and TREC-8 show the same outcome, which further confirms the effectiveness of the proposed rank reliability estimation method.
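To illustrate the kind of comparison described above, the following minimal Python sketch is offered as an assumption-laden example, not the authors' implementation: the system names, per-experiment scores, mean-rank aggregation, and gold-standard ranking are all invented for illustration. It ranks systems within several small-topic experiments, aggregates the per-experiment ranks, and computes Kendall's tau against a gold-standard ranking.

from statistics import mean
from scipy.stats import kendalltau

# Hypothetical per-experiment effectiveness scores (e.g., MAP) for four systems,
# each experiment using a different small subset of topics.
experiments = {
    "exp1": {"sysA": 0.31, "sysB": 0.27, "sysC": 0.22, "sysD": 0.18},
    "exp2": {"sysA": 0.29, "sysB": 0.30, "sysC": 0.21, "sysD": 0.19},
    "exp3": {"sysA": 0.33, "sysB": 0.26, "sysC": 0.24, "sysD": 0.17},
}
systems = sorted(next(iter(experiments.values())))

def to_ranks(scores):
    # Rank systems within one experiment: 1 = most effective.
    order = sorted(scores, key=scores.get, reverse=True)
    return {system: position + 1 for position, system in enumerate(order)}

# Aggregate across experiments by averaging each system's per-experiment rank
# (one simple aggregation choice; the paper's own methods may differ).
mean_rank = {s: mean(to_ranks(scores)[s] for scores in experiments.values())
             for s in systems}

# Gold-standard ranking, e.g., obtained from the full topic set (assumed here).
gold_rank = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}

# Kendall's tau between the aggregated ranking and the gold standard.
tau, p_value = kendalltau([mean_rank[s] for s in systems],
                          [gold_rank[s] for s in systems])
print(f"Kendall's tau vs. gold standard: {tau:.3f} (p = {p_value:.3f})")

In this sketch a tau close to 1.0 would indicate that the ranking aggregated from the smaller experiments reproduces the gold-standard ordering, which is the sense in which values above 0.8 are read as evidence of reliability.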


Metadata
Title
Estimating reliability of the retrieval systems effectiveness rank based on performance in multiple experiments
Authors
Shuxiang Zhang
Sri Devi Ravana
Publication date
20.12.2016
Publisher
Springer US
Published in
Cluster Computing / Issue 1/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-016-0709-z
