Published in: Pattern Analysis and Applications 3/2020

28.10.2019 | Theoretical advances

An empirical comparison of random forest-based and other learning-to-rank algorithms

Author: Muhammad Ibrahim


Abstract

Random forest (RF)-based pointwise learning-to-rank (LtR) algorithms use surrogate loss functions to minimize the ranking error. Despite their competitive performance against other state-of-the-art LtR algorithms, these algorithms, unlike frameworks such as boosting and neural networks, have not been thoroughly investigated in the literature so far. In the first part of this study, we aim to better understand and improve RF-based pointwise LtR algorithms. When working with such an algorithm, one currently needs to choose among a number of available options, such as (1) a classification versus a regression setting, (2) using absolute relevance judgements versus mapped labels, (3) the number of features examined when choosing a split point for the data, and (4) using a weighted versus an unweighted average of the predictions of the base learners (i.e., trees). We conduct a thorough study of these four aspects, as well as of a pairwise objective function for RF-based rank-learners. Experimental results on several benchmark LtR datasets demonstrate that performance can be significantly improved by exploring these aspects. In the second part of the paper, guided by our investigations into RF-based rank-learners, we conduct an extensive comparison between these and state-of-the-art rank-learning algorithms. This comparison reveals several interesting and insightful findings about LtR algorithms, including that RF-based LtR algorithms are among the most robust techniques across datasets with diverse properties.
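To make the pointwise setting concrete, the following minimal sketch (ours, not the paper's code; the dataset shapes, hyperparameters, and library choice are assumptions) trains a random forest in the regression setting on query-document feature vectors with graded relevance labels, then ranks one query's documents by the predicted scores:

```python
# Illustrative pointwise RF rank-learner in the regression setting
# (a sketch under assumed data shapes, not the paper's implementation).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 46))      # 46 features, LETOR-style
y_train = rng.integers(0, 3, size=1000)    # graded relevance labels {0, 1, 2}

model = RandomForestRegressor(
    n_estimators=500,      # number of trees
    max_features="sqrt",   # features examined per split point: aspect (3)
)
model.fit(X_train, y_train)                # squared error as surrogate loss

# Rank one query's documents by descending score; the score is an
# unweighted average of the trees' predictions: aspect (4).
X_query = rng.normal(size=(20, 46))
ranking = np.argsort(-model.predict(X_query))
print(ranking)
```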


Footnotes
2
Manning et al. [1] nicely explain these models.
 
5
For details of these metrics, the reader may consult Järvelin and Kekäläinen [31], Chapelle et al. [32], and Ibrahim and Murshed [4].
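For illustration, a hedged sketch of NDCG@k (our own code; the gain and discount functions shown are one common variant, and the cited papers give the formal definitions):

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    """NDCG@k of one ranked list of graded relevance labels, using the
    common gain 2^rel - 1 and log2 rank discount (variants exist)."""
    rels = np.asarray(relevances, dtype=float)
    gains = 2.0 ** rels[:k] - 1.0
    dcg = np.sum(gains / np.log2(np.arange(2, gains.size + 2)))
    ideal = np.sort(rels)[::-1][:k]            # best possible ordering
    idcg = np.sum((2.0 ** ideal - 1.0) / np.log2(np.arange(2, ideal.size + 2)))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([2, 0, 1], k=10))   # documents returned in order 2, 0, 1
```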
 
6
This algorithm is also used by Ibrahim and Carman [15].
 
7
While several functions can be used as the splitting criterion, Robnik-Šikonja [35] shows that this choice causes little, if any, variation in performance.
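For reference, a minimal sketch (ours) of two common splitting criteria, entropy and Gini impurity, both computed from the label distribution at a node:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of the label distribution at a tree node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    """Gini impurity of the label distribution at a tree node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

# The two criteria usually order candidate splits very similarly,
# consistent with the small differences reported in [35].
print(entropy([0, 0, 1, 2]), gini([0, 0, 1, 2]))
```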
 
8
While most tree implementations in the literature use a depth-first (i.e., recursive) exploration of the nodes, the implementation shown here uses a breadth-first exploration, mainly because we consider it a more systematic way of exploring the nodes. For an entropy-based objective function, the node exploration strategy does not affect the tree structure, i.e., the data partitions [15].
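A simplified sketch of the idea (our own illustrative code, not the paper's implementation; it splits on variance reduction rather than an entropy-based criterion for brevity): breadth-first growth replaces the usual recursion with a FIFO queue of nodes awaiting a split.

```python
from collections import deque
import numpy as np

def grow_tree_bfs(X, y, min_leaf=5):
    """Grow a regression tree breadth-first: a FIFO queue of pending
    nodes replaces depth-first recursion; only the traversal order
    differs from the textbook algorithm."""
    root = {"idx": np.arange(len(y))}
    queue = deque([root])
    while queue:
        node = queue.popleft()                    # FIFO => breadth-first order
        idx = node.pop("idx")
        if len(idx) <= 2 * min_leaf or np.var(y[idx]) == 0.0:
            node["pred"] = float(np.mean(y[idx]))  # leaf: mean relevance
            continue
        best = None
        for f in range(X.shape[1]):                # scan every feature
            order = idx[np.argsort(X[idx, f])]
            for i in range(min_leaf, len(order) - min_leaf):
                left, right = order[:i], order[i:]
                sse = len(left) * np.var(y[left]) + len(right) * np.var(y[right])
                if best is None or sse < best[0]:
                    best = (sse, f, X[order[i], f], left, right)
        _, f, thr, left, right = best
        node.update(feature=f, threshold=thr,
                    left={"idx": left}, right={"idx": right})
        queue.extend([node["left"], node["right"]])
    return root

# Toy usage: 100 instances, 3 features, relevance driven by feature 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
tree = grow_tree_bfs(X, (X[:, 0] > 0).astype(float))
```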
 
9
For all experiments in this section, for the two larger datasets (MSLR-WEB10K and Yahoo), bold italic and bold figures denote that the best performance is significant with a p value less than 0.01 and 0.05, respectively. For the smaller datasets, an average over five independent runs is reported (each run being the result of fivefold cross-validation), and the winning value is given in italics.
 
10
Recall that the usual practice has been to use the average relevance of the instances as the score.
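In code terms, this practice reduces to the following (an illustrative fragment with a hypothetical leaf_score helper, ours):

```python
import numpy as np

def leaf_score(leaf_relevances):
    """Pointwise leaf prediction: mean relevance of the training
    instances that fall into the leaf."""
    return float(np.mean(leaf_relevances))

print(leaf_score([0, 1, 2, 2]))   # -> 1.25
```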
 
11
Since the properties of the HP2004 and NP2004 datasets are similar to those of TD2004, we do not conduct further experiments on them.
 
12
Although the features are not disclosed for the Yahoo dataset, a particular feature index corresponding to BM25 is mentioned.
 
13
This is intuitive: since navigational queries have very few relevant documents, setting a higher value for k helps the base ranker(s) put those few relevant documents into the training set of the rank-learning phase.
 
14
Ibrahim and Carman [15] report results only for RF-rand, RF-point with classification, and RF-list.
 
15
We perform a pairwise significance test on the comparatively larger (in terms of number of queries) MQ2007 and MQ2008 datasets. Since for the rest of the datasets the number of queries is small, the significance test results may not be reliable.
 
16
As explained earlier, since the HP2004 and NP2004 datasets contain navigational queries, MAP may not be a very effective choice for evaluating this type of information need [1, Sec. 8.4]. That is why in Table 12 we chose NDCG@10 for the overall comparison.
 
17
We did, however, run a pilot experiment comparing RF-point and RF-hybrid with \(K \in \{23, 50, 80, 130\}\) (the value 23 corresponds to \(\sqrt{M}\)), and did not observe any improvement of RF-hybrid over RF-point for any of these settings. We thus conclude that K is unlikely to play a significant role in the relative performance of these two systems on the Yahoo dataset.
 
18
The inventors of this technique [63] acknowledge a concern of a slight degradation in accuracy.
 
References
1.
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge
2.
Li H (2011) Learning to rank for information retrieval and natural language processing. Synth Lect Hum Lang Technol 4(1):1–113
3.
Liu TY (2011) Learning to rank for information retrieval. Springer, Berlin
4.
Ibrahim M, Murshed M (2016) From tf-idf to learning-to-rank: an overview. In: Handbook of research on innovations in information retrieval, analysis, and management. IGI Global, USA, pp 62–109
5.
Karatzoglou A, Baltrunas L, Shi Y (2013) Learning to rank for recommender systems. In: Proceedings of the 7th ACM conference on recommender systems. ACM, pp 493–494
6.
Ouyang Y, Li W, Li S, Lu Q (2011) Applying regression models to query-focused multi-document summarization. Inf Process Manag 47(2):227–237
7.
Santos RL, Macdonald C, Ounis I (2013) Learning to rank query suggestions for adhoc and diversity search. Inf Retr 16(4):429–451
8.
Li Z, Tang J, Mei T (2019) Deep collaborative embedding for social image understanding. IEEE Trans Pattern Anal Mach Intell 41(9):2070–2083
9.
Li Z, Tang J, He X (2018) Robust structured nonnegative matrix factorization for image representation. IEEE Trans Neural Netw Learn Syst 29(5):1947–1960
10.
Dang V, Bendersky M, Croft WB (2013) Two-stage learning to rank for information retrieval. In: Advances in information retrieval. Springer, pp 423–434
11.
Macdonald C, Santos RL, Ounis I (2013) The whens and hows of learning to rank for web search. Inf Retr 16(5):584–628
12.
Aslam JA, Kanoulas E, Pavlu V, Savev S, Yilmaz E (2009) Document selection methodologies for efficient and effective learning-to-rank. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 468–475
13.
Pavlu V (2008) Large scale IR evaluation. ProQuest LLC, Ann Arbor
14.
Qin T, Liu TY, Xu J, Li H (2010) LETOR: a benchmark collection for research on learning to rank for information retrieval. Inf Retr 13(4):346–374
15.
Ibrahim M, Carman M (2016) Comparing pointwise and listwise objective functions for random-forest-based learning-to-rank. ACM TOIS 34(4):20
16.
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
18.
Banfield RE, Hall LO, Bowyer KW, Kegelmeyer WP (2007) A comparison of decision tree ensemble creation techniques. IEEE Trans Pattern Anal Mach Intell 29(1):173–180
19.
Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning. ACM, pp 161–168
20.
Criminisi A (2011) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7(2–3):81–227
21.
Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181
22.
23.
Cui Z, Chen W, He Y, Chen Y (2015) Optimal action extraction for random forests and boosted trees. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 179–188
24.
Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7(1):3
25.
Dong Y, Zhang Y, Yue J, Hu Z (2016) Comparison of random forest, random ferns and support vector machine for eye state classification. Multimed Tools Appl 75(19):11763–11783
26.
Gislason PO, Benediktsson JA, Sveinsson JR (2006) Random forests for land cover classification. Pattern Recogn Lett 27(4):294–300
27.
Chapelle O, Chang Y (2011) Yahoo! learning to rank challenge overview. J Mach Learn Res Proc Track 14:1–24
28.
Geurts P, Louppe G (2011) Learning to rank with extremely randomized trees. In: JMLR: workshop and conference proceedings, vol 14
29.
Mohan A, Chen Z, Weinberger KQ (2011) Web-search ranking with initialized gradient boosted regression trees. J Mach Learn Res Proc Track 14:77–89
30.
Han X, Lei S (2018) Feature selection and model comparison on Microsoft learning-to-rank data sets. arXiv preprint arXiv:1803.05127
31.
Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 41–48
32.
Chapelle O, Metlzer D, Zhang Y, Grinspan P (2009) Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM). ACM, pp 621–630
33.
Li P, Wu Q, Burges C (2007) McRank: learning to rank using classification and gradient boosting. Adv Neural Inf Process Syst 20:897–904
34.
Cossock D, Zhang T (2006) Subset ranking using regression. In: Learning theory, pp 605–619
35.
Robnik-Šikonja M (2004) Improving random forests. In: Machine learning: ECML 2004. Springer, pp 359–370
36.
Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42
37.
Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15(1):1625–1651
38.
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
39.
40.
Winham SJ, Freimuth RR, Biernacka JM (2013) A weighted random forests approach to improve predictive performance. Stat Anal Data Min ASA Data Sci J 6(6):496–505
41.
Bernard S, Heutte L, Adam S (2009) On the selection of decision trees in random forests. In: International joint conference on neural networks (IJCNN 2009). IEEE, pp 302–307
42.
Li HB, Wang W, Ding HW, Dong J (2010) Trees weighting random forest method for classifying high-dimensional noisy data. In: 2010 IEEE 7th international conference on e-business engineering (ICEBE). IEEE, pp 160–163
43.
Oshiro TM, Perez PS, Baranauskas JA (2012) How many trees in a random forest? In: MLDM. Springer, pp 154–168
44.
Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
45.
Tax N, Bockting S, Hiemstra D (2015) A cross-benchmark comparison of 87 learning to rank methods. Inf Process Manag 51(6):757–772
46.
Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 133–142
47.
Xu J, Li H (2007) AdaRank: a boosting algorithm for information retrieval. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 391–398
48.
Wu Q, Burges CJ, Svore KM, Gao J (2010) Adapting boosting for information retrieval measures. Inf Retr 13(3):254–270
49.
Metzler D, Croft WB (2007) Linear feature-based models for information retrieval. Inf Retr 10(3):257–274
50.
Ibrahim M (2019) Sampling non-relevant documents of training sets for learning-to-rank algorithms. Int J Mach Learn Comput 10(2) (to appear)
51.
Ibrahim M (2019) Reducing correlation of random forest-based learning-to-rank algorithms using subsample size. Comput Intell 35(2):1–25
52.
Ibrahim M, Carman M (2014) Improving scalability and performance of random forest based learning-to-rank algorithms by aggressive subsampling. In: Proceedings of the 12th Australasian data mining conference, pp 91–99
53.
He B, Macdonald C, Ounis I (2008) Retrieval sensitivity under training using different measures. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 67–74
54.
Robertson S (2008) On the optimisation of evaluation metrics. In: Keynote, SIGIR 2008 workshop on learning to rank for information retrieval (LR4IR)
55.
Donmez P, Svore KM, Burges CJ (2009) On the local optimality of LambdaRank. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 460–467
56.
Yilmaz E, Robertson S (2010) On the choice of effectiveness measures for learning to rank. Inf Retr 13(3):271–290
57.
Hanbury A, Lupu M (2013) Toward a model of domain-specific search. In: Proceedings of the 10th conference on open research areas in information retrieval. Le Centre De Hautes Etudes Internationales D'Informatique Documentaire, pp 33–36
58.
Hawking D (2004) Challenges in enterprise search. In: Proceedings of the 15th Australasian database conference, vol 27. Australian Computer Society, Inc., pp 15–24
59.
McCallum A, Nigam K, Rennie J, Seymore K (1999) A machine learning approach to building domain-specific search engines. In: IJCAI, vol 99. Citeseer, pp 662–667
60.
Owens L, Brown M, Poore K, Nicolson N (2008) The Forrester wave: enterprise search, Q2 2008. For information and knowledge management professionals
61.
Yan X, Lau RY, Song D, Li X, Ma J (2011) Toward a semantic granularity model for domain-specific information retrieval. ACM TOIS 29(3):15
62.
Szummer M, Yilmaz E (2011) Semi-supervised learning to rank with preference regularization. In: Proceedings of the 20th ACM international conference on information and knowledge management (CIKM). ACM, pp 269–278
63.
Tyree S, Weinberger KQ, Agrawal K, Paykin J (2011) Parallel boosted regression trees for web search ranking. In: Proceedings of the 20th international conference on world wide web. ACM, pp 387–396
64.
Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory. Springer, pp 23–37
65.
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
66.
Quoc C, Le V (2007) Learning to rank with nonsmooth cost functions. Proc Adv Neural Inf Process Syst 19:193–200
67.
Ganjisaffar Y, Caruana R, Lopes CV (2011) Bagging gradient-boosted trees for high precision, low variance ranking models. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 85–94
68.
Ganjisaffar Y, Debeauvais T, Javanmardi S, Caruana R, Lopes CV (2011) Distributed tuning of machine learning algorithms using mapreduce clusters. In: Proceedings of the third workshop on large scale data mining: theory and applications. ACM, p 2
Metadata
Title
An empirical comparison of random forest-based and other learning-to-rank algorithms
Author
Muhammad Ibrahim
Publication date
28.10.2019
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2020
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-019-00856-6
