Published in: Pattern Analysis and Applications 3/2020

28.10.2019 | Theoretical advances

An empirical comparison of random forest-based and other learning-to-rank algorithms

Author: Muhammad Ibrahim


Abstract

Random forest (RF)-based pointwise learning-to-rank (LtR) algorithms use surrogate loss functions to minimize the ranking error. Despite their competitive performance against other state-of-the-art LtR algorithms, these algorithms, unlike frameworks such as boosting and neural networks, have not been thoroughly investigated in the literature so far. In the first part of this study, we aim to better understand and improve RF-based pointwise LtR algorithms. When working with such an algorithm, one currently needs to choose among a number of available options, such as (1) a classification versus a regression setting, (2) using absolute relevance judgements versus mapped labels, (3) the number of features examined when choosing a split point for the data, and (4) using a weighted versus an unweighted average of the predictions of the base learners (i.e., trees). We conduct a thorough study of these four aspects, as well as of a pairwise objective function for RF-based rank-learners. Experimental results on several benchmark LtR datasets demonstrate that performance can be significantly improved by exploring these aspects. In the second part of the paper, guided by our investigations into RF-based rank-learners, we conduct an extensive comparison between these and state-of-the-art rank-learning algorithms. This comparison reveals several interesting and insightful findings about LtR algorithms, including that RF-based LtR algorithms are among the most robust techniques across datasets with diverse properties.
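To make the pointwise setting concrete, the following minimal sketch (ours, not the paper's code; the dataset shapes, hyperparameters, and library choice are assumptions) trains a random forest in the regression setting on query-document feature vectors with graded relevance labels, then ranks one query's documents by the predicted scores:

```python
# Illustrative pointwise RF rank-learner in the regression setting
# (a sketch under assumed data shapes, not the paper's implementation).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 46))      # 46 features, LETOR-style
y_train = rng.integers(0, 3, size=1000)    # graded relevance labels {0, 1, 2}

model = RandomForestRegressor(
    n_estimators=500,      # number of trees
    max_features="sqrt",   # features examined per split point: aspect (3)
)
model.fit(X_train, y_train)                # squared error as surrogate loss

# Rank one query's documents by descending score; the score is an
# unweighted average of the trees' predictions: aspect (4).
X_query = rng.normal(size=(20, 46))
ranking = np.argsort(-model.predict(X_query))
print(ranking)
```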


Footnotes
2
Manning et al. [1] nicely explain these models.
 
5
For details of these metrics, the reader may consult Järvelin and Kekäläinen [31], Chapelle et al. [32], and Ibrahim and Murshed [4].
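For illustration, a hedged sketch of NDCG@k (our own code; the gain and discount functions shown are one common variant, and the cited papers give the formal definitions):

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    """NDCG@k of one ranked list of graded relevance labels, using the
    common gain 2^rel - 1 and log2 rank discount (variants exist)."""
    rels = np.asarray(relevances, dtype=float)
    gains = 2.0 ** rels[:k] - 1.0
    dcg = np.sum(gains / np.log2(np.arange(2, gains.size + 2)))
    ideal = np.sort(rels)[::-1][:k]            # best possible ordering
    idcg = np.sum((2.0 ** ideal - 1.0) / np.log2(np.arange(2, ideal.size + 2)))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([2, 0, 1], k=10))   # documents returned in order 2, 0, 1
```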
 
6
This algorithm is also used by Ibrahim and Carman [15].
 
7
While several functions can be used as the splitting criterion, Robnik-Šikonja [35] shows that this choice causes little, if any, variation in performance.
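For reference, a minimal sketch (ours) of two common splitting criteria, entropy and Gini impurity, both computed from the label distribution at a node:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of the label distribution at a tree node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    """Gini impurity of the label distribution at a tree node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

# The two criteria usually order candidate splits very similarly,
# consistent with the small differences reported in [35].
print(entropy([0, 0, 1, 2]), gini([0, 0, 1, 2]))
```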
 
8
While most tree implementations in the literature use a depth-first (i.e., recursive) exploration of the nodes, the implementation shown here uses a breadth-first exploration, mainly because we consider it a more systematic way of exploring the nodes. For an entropy-based objective function, the node exploration strategy does not affect the tree structure, i.e., the data partitions [15].
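A simplified sketch of the idea (our own illustrative code, not the paper's implementation; it splits on variance reduction rather than an entropy-based criterion for brevity): breadth-first growth replaces the usual recursion with a FIFO queue of nodes awaiting a split.

```python
from collections import deque
import numpy as np

def grow_tree_bfs(X, y, min_leaf=5):
    """Grow a regression tree breadth-first: a FIFO queue of pending
    nodes replaces depth-first recursion; only the traversal order
    differs from the textbook algorithm."""
    root = {"idx": np.arange(len(y))}
    queue = deque([root])
    while queue:
        node = queue.popleft()                    # FIFO => breadth-first order
        idx = node.pop("idx")
        if len(idx) <= 2 * min_leaf or np.var(y[idx]) == 0.0:
            node["pred"] = float(np.mean(y[idx]))  # leaf: mean relevance
            continue
        best = None
        for f in range(X.shape[1]):                # scan every feature
            order = idx[np.argsort(X[idx, f])]
            for i in range(min_leaf, len(order) - min_leaf):
                left, right = order[:i], order[i:]
                sse = len(left) * np.var(y[left]) + len(right) * np.var(y[right])
                if best is None or sse < best[0]:
                    best = (sse, f, X[order[i], f], left, right)
        _, f, thr, left, right = best
        node.update(feature=f, threshold=thr,
                    left={"idx": left}, right={"idx": right})
        queue.extend([node["left"], node["right"]])
    return root

# Toy usage: 100 instances, 3 features, relevance driven by feature 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
tree = grow_tree_bfs(X, (X[:, 0] > 0).astype(float))
```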
 
9
For all experiments in this section, for the two larger datasets (MSLR-WEB10K and Yahoo), bold italic and bold figures denote that the best performance is significant with a p value less than 0.01 and 0.05, respectively. For the smaller datasets, an average over five independent runs is reported (each run being the result of fivefold cross-validation), and the winning value is given in italics.
 
10
Recall that the usual practice has been to use the average relevance of the instances as the score.
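In code terms, this practice reduces to the following (an illustrative fragment with a hypothetical leaf_score helper, ours):

```python
import numpy as np

def leaf_score(leaf_relevances):
    """Pointwise leaf prediction: mean relevance of the training
    instances that fall into the leaf."""
    return float(np.mean(leaf_relevances))

print(leaf_score([0, 1, 2, 2]))   # -> 1.25
```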
 
11
Since the properties of the HP2004 and NP2004 datasets are similar to those of TD2004, we do not conduct further experiments on them.
 
12
Although the features are not disclosed for the Yahoo dataset, a particular feature index corresponding to BM25 is mentioned.
 
13
This is intuitive: since navigational queries have very few relevant documents, setting a higher value for k helps the base ranker(s) put those few relevant documents into the training set of the rank-learning phase.
 
14
Ibrahim and Carman [15] report results only for RF-rand, RF-point with classification, and RF-list.
 
15
We perform a pairwise significance test on the comparatively larger (in terms of number of queries) MQ2007 and MQ2008 datasets. Since for the rest of the datasets the number of queries is small, the significance test results may not be reliable.
 
16
As explained earlier, since the HP2004 and NP2004 datasets contain navigational queries, MAP may not be a very effective choice for evaluating this type of information need [1, Sec. 8.4]. That is why in Table 12 we chose NDCG@10 for the overall comparison.
 
17
We did, however, run a pilot experiment comparing RF-point and RF-hybrid with \(K \in \{23, 50, 80, 130\}\) (the value 23 corresponds to \(\sqrt{M}\)), and did not observe any improvement of RF-hybrid over RF-point for any of these settings. We thus conclude that K is unlikely to play a significant role in the relative performance of these two systems on the Yahoo dataset.
 
18
The inventors of this technique [63] acknowledge a concern of a slight degradation in accuracy.
 
References
1.
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge
2.
Li H (2011) Learning to rank for information retrieval and natural language processing. Synth Lect Hum Lang Technol 4(1):1–113
3.
Liu TY (2011) Learning to rank for information retrieval. Springer, Berlin
4.
Ibrahim M, Murshed M (2016) From tf-idf to learning-to-rank: an overview. In: Handbook of research on innovations in information retrieval, analysis, and management. IGI Global, USA, pp 62–109
5.
Karatzoglou A, Baltrunas L, Shi Y (2013) Learning to rank for recommender systems. In: Proceedings of the 7th ACM conference on recommender systems. ACM, pp 493–494
6.
Ouyang Y, Li W, Li S, Lu Q (2011) Applying regression models to query-focused multi-document summarization. Inf Process Manag 47(2):227–237
7.
Santos RL, Macdonald C, Ounis I (2013) Learning to rank query suggestions for adhoc and diversity search. Inf Retr 16(4):429–451
8.
Li Z, Tang J, Mei T (2019) Deep collaborative embedding for social image understanding. IEEE Trans Pattern Anal Mach Intell 41(9):2070–2083
9.
Li Z, Tang J, He X (2018) Robust structured nonnegative matrix factorization for image representation. IEEE Trans Neural Netw Learn Syst 29(5):1947–1960
10.
Dang V, Bendersky M, Croft WB (2013) Two-stage learning to rank for information retrieval. In: Advances in information retrieval. Springer, pp 423–434
11.
Macdonald C, Santos RL, Ounis I (2013) The whens and hows of learning to rank for web search. Inf Retr 16(5):584–628
12.
Aslam JA, Kanoulas E, Pavlu V, Savev S, Yilmaz E (2009) Document selection methodologies for efficient and effective learning-to-rank. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 468–475
13.
Pavlu V (2008) Large scale IR evaluation. ProQuest LLC, Ann Arbor
14.
Qin T, Liu TY, Xu J, Li H (2010) LETOR: a benchmark collection for research on learning to rank for information retrieval. Inf Retr 13(4):346–374
15.
Ibrahim M, Carman M (2016) Comparing pointwise and listwise objective functions for random-forest-based learning-to-rank. ACM TOIS 34(4):20
16.
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
18.
Banfield RE, Hall LO, Bowyer KW, Kegelmeyer WP (2007) A comparison of decision tree ensemble creation techniques. IEEE Trans Pattern Anal Mach Intell 29(1):173–180
19.
Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning. ACM, pp 161–168
20.
Criminisi A (2011) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7(2–3):81–227
21.
Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181
22.
23.
Cui Z, Chen W, He Y, Chen Y (2015) Optimal action extraction for random forests and boosted trees. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 179–188
24.
Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7(1):3
25.
Dong Y, Zhang Y, Yue J, Hu Z (2016) Comparison of random forest, random ferns and support vector machine for eye state classification. Multimed Tools Appl 75(19):11763–11783
26.
Gislason PO, Benediktsson JA, Sveinsson JR (2006) Random forests for land cover classification. Pattern Recogn Lett 27(4):294–300
27.
Chapelle O, Chang Y (2011) Yahoo! learning to rank challenge overview. J Mach Learn Res Proc Track 14:1–24
28.
Geurts P, Louppe G (2011) Learning to rank with extremely randomized trees. In: JMLR: workshop and conference proceedings, vol 14
29.
Mohan A, Chen Z, Weinberger KQ (2011) Web-search ranking with initialized gradient boosted regression trees. J Mach Learn Res Proc Track 14:77–89
30.
Han X, Lei S (2018) Feature selection and model comparison on Microsoft learning-to-rank data sets. arXiv preprint arXiv:1803.05127
31.
Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 41–48
32.
Chapelle O, Metlzer D, Zhang Y, Grinspan P (2009) Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM). ACM, pp 621–630
33.
Li P, Wu Q, Burges C (2007) McRank: learning to rank using classification and gradient boosting. Adv Neural Inf Process Syst 20:897–904
34.
Cossock D, Zhang T (2006) Subset ranking using regression. In: Learning theory, pp 605–619
35.
Robnik-Šikonja M (2004) Improving random forests. In: Machine learning: ECML 2004. Springer, pp 359–370
36.
Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42
37.
Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15(1):1625–1651
38.
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
39.
40.
Winham SJ, Freimuth RR, Biernacka JM (2013) A weighted random forests approach to improve predictive performance. Stat Anal Data Min ASA Data Sci J 6(6):496–505
41.
Bernard S, Heutte L, Adam S (2009) On the selection of decision trees in random forests. In: International joint conference on neural networks (IJCNN 2009). IEEE, pp 302–307
42.
Li HB, Wang W, Ding HW, Dong J (2010) Trees weighting random forest method for classifying high-dimensional noisy data. In: 2010 IEEE 7th international conference on e-business engineering (ICEBE). IEEE, pp 160–163
43.
Oshiro TM, Perez PS, Baranauskas JA (2012) How many trees in a random forest? In: MLDM. Springer, pp 154–168
44.
Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
45.
Tax N, Bockting S, Hiemstra D (2015) A cross-benchmark comparison of 87 learning to rank methods. Inf Process Manag 51(6):757–772
46.
Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 133–142
47.
Xu J, Li H (2007) AdaRank: a boosting algorithm for information retrieval. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 391–398
48.
Wu Q, Burges CJ, Svore KM, Gao J (2010) Adapting boosting for information retrieval measures. Inf Retr 13(3):254–270
49.
Metzler D, Croft WB (2007) Linear feature-based models for information retrieval. Inf Retr 10(3):257–274
50.
Ibrahim M (2019) Sampling non-relevant documents of training sets for learning-to-rank algorithms. Int J Mach Learn Comput 10(2) (to appear)
51.
Ibrahim M (2019) Reducing correlation of random forest-based learning-to-rank algorithms using subsample size. Comput Intell 35(2):1–25
52.
Ibrahim M, Carman M (2014) Improving scalability and performance of random forest based learning-to-rank algorithms by aggressive subsampling. In: Proceedings of the 12th Australasian data mining conference, pp 91–99
53.
He B, Macdonald C, Ounis I (2008) Retrieval sensitivity under training using different measures. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 67–74
54.
Robertson S (2008) On the optimisation of evaluation metrics. In: Keynote, SIGIR 2008 workshop on learning to rank for information retrieval (LR4IR)
55.
Donmez P, Svore KM, Burges CJ (2009) On the local optimality of LambdaRank. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 460–467
56.
Yilmaz E, Robertson S (2010) On the choice of effectiveness measures for learning to rank. Inf Retr 13(3):271–290
57.
Hanbury A, Lupu M (2013) Toward a model of domain-specific search. In: Proceedings of the 10th conference on open research areas in information retrieval. Le Centre De Hautes Etudes Internationales D'Informatique Documentaire, pp 33–36
58.
Hawking D (2004) Challenges in enterprise search. In: Proceedings of the 15th Australasian database conference, vol 27. Australian Computer Society, Inc., pp 15–24
59.
McCallum A, Nigam K, Rennie J, Seymore K (1999) A machine learning approach to building domain-specific search engines. In: IJCAI, vol 99. Citeseer, pp 662–667
60.
Owens L, Brown M, Poore K, Nicolson N (2008) The Forrester wave: enterprise search, Q2 2008. For information and knowledge management professionals
61.
Yan X, Lau RY, Song D, Li X, Ma J (2011) Toward a semantic granularity model for domain-specific information retrieval. ACM TOIS 29(3):15
62.
Szummer M, Yilmaz E (2011) Semi-supervised learning to rank with preference regularization. In: Proceedings of the 20th ACM international conference on information and knowledge management (CIKM). ACM, pp 269–278
63.
Tyree S, Weinberger KQ, Agrawal K, Paykin J (2011) Parallel boosted regression trees for web search ranking. In: Proceedings of the 20th international conference on world wide web. ACM, pp 387–396
64.
Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory. Springer, pp 23–37
65.
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
66.
Quoc C, Le V (2007) Learning to rank with nonsmooth cost functions. Proc Adv Neural Inf Process Syst 19:193–200
67.
Ganjisaffar Y, Caruana R, Lopes CV (2011) Bagging gradient-boosted trees for high precision, low variance ranking models. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 85–94
68.
Ganjisaffar Y, Debeauvais T, Javanmardi S, Caruana R, Lopes CV (2011) Distributed tuning of machine learning algorithms using mapreduce clusters. In: Proceedings of the third workshop on large scale data mining: theory and applications. ACM, p 2
Metadata
Title
An empirical comparison of random forest-based and other learning-to-rank algorithms
Author
Muhammad Ibrahim
Publication date
28.10.2019
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2020
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-019-00856-6
