Discover Computing

Published: 4 October 2021

Combining semi-supervised and active learning to rank algorithms: application to Document Retrieval

Authors: Faiza Dammak, Hager Kammoun

Published in: Discover Computing | Issue 6/2021

Abstract

Generally, the purpose of learning to rank methods is to combine the results of existing ranking models within a single ranking function that orders documents as efficiently as possible, improving the quality of the returned result lists. However, learning to rank has several limitations, namely the creation and the size of the labeled database. We have considered the two frameworks of semi-supervised and active learning in order to look for solutions to these problems. We have been interested in semi-supervised, active and semi-active learning to rank algorithms for Document Retrieval (DR), which is an application of ranking of alternatives. A good balance between exploration and exploitation has a positive impact on the performance of the learning. Thus, we have focused first on two active learning to rank algorithms that use supervised learning and semi-supervised learning as auxiliaries and use an automatic method for labeling the selected unlabeled pairs. These algorithms are named “Semi-Active Learning to Rank: SAL2R” and “Active-Semi-Supervised Learning to Rank: ASSL2R”. We have been particularly interested in providing efficient and effective algorithms to handle a large set of unlabeled data. Second, we have considered improving the semi-active SAL2R and ASSL2R algorithms by selecting multiple pairs in the selection step. Our contribution lies particularly in the in-depth experimental study of the performance of these algorithms and, more precisely, of the influence of certain fixed parameters on the learned ranking function.
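To make the general idea concrete, the following is a minimal Python sketch of a pairwise ranker that repeatedly selects several unlabeled document pairs and labels them automatically. It is a generic illustration, not the SAL2R or ASSL2R algorithm: the perceptron-style learner, the smallest-margin selection rule, the self-labeling rule, and all function names (`train_pairwise`, `select_pairs`, `semi_active_loop`) are assumptions introduced here for illustration only.

```python
from typing import List, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]      # (doc_a, doc_b): feature vectors for one query
LabeledPair = Tuple[Pair, int]    # label +1: doc_a above doc_b, -1: the reverse


def score(w: Vector, x: Vector) -> float:
    """Linear ranking function: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def margin(w: Vector, pair: Pair) -> float:
    """Score difference between the two documents of a pair."""
    a, b = pair
    return score(w, a) - score(w, b)


def train_pairwise(labeled: List[LabeledPair], dim: int,
                   epochs: int = 20, lr: float = 0.1) -> Vector:
    """Perceptron-style pairwise learner (an assumed stand-in ranker)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for (a, b), y in labeled:
            if y * margin(w, (a, b)) <= 0:  # pair is misordered or tied
                w = [wi + lr * y * (ai - bi) for wi, ai, bi in zip(w, a, b)]
    return w


def select_pairs(w: Vector, pool: List[Pair], k: int) -> List[Pair]:
    """Multi-pair selection: the k pool pairs with the smallest |margin|.

    Assumed criterion: a small margin means the ranker is least certain.
    """
    return sorted(pool, key=lambda p: abs(margin(w, p)))[:k]


def semi_active_loop(labeled: List[LabeledPair], pool: List[Pair], dim: int,
                     rounds: int = 3, k: int = 2):
    """Alternate between selecting pairs, auto-labeling them, and retraining."""
    labeled, pool = list(labeled), list(pool)  # do not mutate the inputs
    w = train_pairwise(labeled, dim)
    for _ in range(rounds):
        if not pool:
            break
        for pair in select_pairs(w, pool, k):
            # Automatic labeling (assumption): trust the current ranker's order.
            y = 1 if margin(w, pair) >= 0 else -1
            labeled.append((pair, y))
            pool.remove(pair)
        w = train_pairwise(labeled, dim)
    return w, labeled


if __name__ == "__main__":
    seed = [(([1.0, 0.0], [0.0, 1.0]), 1)]   # one hand-labeled pair
    pool = [([0.9, 0.1], [0.2, 0.8]),
            ([0.4, 0.6], [0.5, 0.5]),
            ([0.3, 0.7], [0.8, 0.2])]
    w, grown = semi_active_loop(seed, pool, dim=2)
    print("weights:", w, "labeled pairs:", len(grown))
```

Selecting `k` pairs per round rather than one reduces the number of retraining steps, at the cost of choosing some pairs with a slightly stale model. In practice the automatic labeling step would rely on a semi-supervised method rather than on the ranker's own prediction.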

Title: Combining semi-supervised and active learning to rank algorithms: application to Document Retrieval
Authors: Faiza Dammak, Hager Kammoun
Publication date: 4 October 2021
Publisher: Springer Netherlands
Published in: Discover Computing / Issue 6/2021
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-021-09396-2
