Published in: Discover Computing 6/2021

04.10.2021

Combining semi-supervised and active learning to rank algorithms: application to Document Retrieval

Written by: Faiza Dammak, Hager Kammoun


Abstract

Learning to rank methods generally aim to combine the results of existing ranking models within a single ranking function that orders documents as efficiently as possible, thereby improving the quality of the returned result lists. However, learning to rank has several limitations, namely the creation and the size of the labeled database. To address these problems, we consider the two frameworks of semi-supervised learning and active learning. We are interested in semi-supervised, active and semi-active learning to rank algorithms for Document Retrieval (DR), which is an application of the ranking of alternatives. A good balance between exploration and exploitation has a positive impact on learning performance. We therefore focus first on two active learning to rank algorithms that use supervised learning and semi-supervised learning as auxiliaries and that label the selected unlabeled pairs automatically. These algorithms are named “Semi-Active Learning to Rank: SAL2R” and “Active-Semi-Supervised Learning to Rank: ASSL2R”. We are particularly interested in providing efficient and effective algorithms that can handle a large set of unlabeled data. Second, we consider improving the semi-active SAL2R and ASSL2R algorithms by using multiple pairs in the selection step. Our contribution lies particularly in the in-depth experimental study of the performance of these algorithms and, more precisely, of the influence of certain fixed parameters on the learned ranking function.
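
As a rough illustration of what a pair selection step in active learning to rank can look like, the sketch below picks the unlabeled document pairs on which a ranking function is least decisive. It is not the SAL2R or ASSL2R procedure, whose details are not given in the abstract. The linear scoring function, the smallest-score-gap criterion, and the names `score`, `select_uncertain_pairs` and `k` are all assumptions made for this example; `k = 1` stands for single-pair selection and `k > 1` for a multi-pair variant.

```python
from itertools import combinations


def score(weights, features):
    """Hypothetical linear ranking function: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def select_uncertain_pairs(weights, unlabeled_docs, k=1):
    """Return the k index pairs whose score gap is smallest.

    A small gap means the current ranking function barely prefers one
    document over the other, so labeling that pair is assumed to be
    informative (an uncertainty-sampling heuristic).
    """
    def gap(pair):
        i, j = pair
        return abs(score(weights, unlabeled_docs[i]) - score(weights, unlabeled_docs[j]))

    pairs = combinations(range(len(unlabeled_docs)), 2)
    # sorted() is stable, so ties keep the order produced by combinations().
    return sorted(pairs, key=gap)[:k]


# Toy data: four unlabeled documents described by two features each.
weights = [1.0, 0.5]
docs = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.5, 0.5]]

single = select_uncertain_pairs(weights, docs, k=1)  # single-pair selection
multi = select_uncertain_pairs(weights, docs, k=2)   # multi-pair selection
print(single, multi)
```

In a full loop, the selected pairs would then be labeled and added to the training set before the ranking function is retrained; how that labeling is automated is specific to each algorithm and is not reproduced here.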


Metadata
Title
Combining semi-supervised and active learning to rank algorithms: application to Document Retrieval
Written by
Faiza Dammak
Hager Kammoun
Publication date
04.10.2021
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 6/2021
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-021-09396-2
