Top

Knowledge and Information Systems

Published in:

01-06-2015 | Regular Paper

SFP-Rank: significant frequent pattern analysis for effective ranking

Authors: Yuanfeng Song, Wilfred Ng, Kenneth Wai-Ting Leung, Qiong Fang

Published in: Knowledge and Information Systems | Issue 3/2015

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Ranking documents in terms of their relevance to a given query is fundamental to many real-life applications such as information retrieval and recommendation systems. Extensive study in these application domains has given rise to the development of many efficient ranking models. While most existing research focuses on developing learning to rank (LTR) models, the quality of the training features, which plays an important role in ranking performance, has not been fully studied. Thus, we propose a new approach that discovers effective features for the LTR problem. In this paper, we present a theoretical analysis on which frequent patterns are potentially effective for improving the performance of LTR and then propose an efficient method that selects frequent patterns for LTR. First, we define a new criterion, namely feature significance (or simply significance). Specifically, we use each feature’s value to rank the training instances and define the ranking effectiveness in terms of a performance measure as the significance of the feature. We show that the significance of an infrequent pattern is limited by using formal connection between pattern support and its significance. Then, we propose a methodology that sets the support value when performing frequent pattern mining. Finally, since frequent patterns are not equally effective for LTR, we further provide a coverage-based significant pattern generation algorithm to discover effective patterns and propose a new ranking approach called Significant Frequent Pattern-based Ranking (SFP-Rank), in which the ranking model is built upon the original features as well as the significant frequent patterns. Our experiments confirm that, by incorporating significant frequent patterns to train the ranking model, the performance of the ranking model can be substantially improved.

previous article PGLCM: efficient parallel mining of closed frequent gradual itemsets

next article Stabilized sparse ordinal regression for medical risk stratification

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

http://learningtorankchallenge.yahoo.com/datasets.php.

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: VLDB ’94, pp 487–499

AOL Dataset (n.d.), http://zola.di.unipi.it/smalltext/datasets.html

Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison Wesley, Reading, MA

Batal I, Hauskrecht M (2010) Constructing classification features using minimal predictive patterns. In: Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM ’10. ACM, New York, NY, USA, pp 869–878

Burges C, Ragno R, Le Q (2006) Learning to rank with nonsmooth cost functions. In: NIPS ’06, pp 193–200

Burges C, Shaked T, Renshaw E et al (2005) Learning to rank using gradient descent. In: ICML ’05, pp 89–96

Cao H, Jiang D, Pei J et al (2008) Context-aware query suggestion by mining click-through and session data. In: KDD ’08, pp 875–883

Cao Y, Xu J, Liu T-Y et al (2006) Adapting ranking svm to document retrieval. In: SIGIR ’06, pp 186–193

Cao Z, Qin T, Liu T-Y et al (2007) Learning to rank: from pairwise approach to listwise approach. In: ICML ’07, pp 129–136

10.

Cheng H, Yan X, Han J et al (2007) Discriminative frequent pattern analysis for effective classification. In: ICDE ’07, pp 169–178

11.

Cheng H, Yan X, Han J et al (2008) Direct discriminative pattern mining for effective classification. In: ICDE ’08, pp. 169–178

12.

Cossock D, Zhang T (2006) Subset ranking using regression. In: Learning theory, volume 4005 of LNCS’06, pp 605–619

13.

Fagin R, Kumar R, Sivakumar D (2003) Comparing top k lists. In: SODA ’03, pp 28–36

14.

Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: UAI ’93, pp 1022–1027

15.

Freund Y, Iyer R, Schapire RE et al (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969

16.

Geng X, Liu T-Y, Qin T et al (2007) Feature selection for ranking. In: SIGIR ’07, pp 407–414

17.

Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining frequent itemsets. In: FIMI’03

18.

Han J, Cheng H, Xin D et al (2007) Frequent pattern mining: current status and future directions. Data Min. Knowl. Discov. 15(1):55–86CrossRefMathSciNet

19.

Hong L, Bekkerman R, Adler J et al (2012) Learning to rank social update streams. In: SIGIR ’12, pp 651–660

20.

Jansen BJ, Spink A, Bateman J et al (1998) Real life information retrieval: a study of user queries on the web. SIGIR Forum 32(1):5–17CrossRef

21.

Jiang D, Leung KW-T, Ng W (2011) Context-aware search personalization with concept preference. In: CIKM ’11, pp 563–572

22.

Joachims T (2006) Training linear svms in linear time. In: KDD ’06, pp 217–226

23.

Karimzadehgan M, Li W, Zhang R et al (2011) A stochastic learning-to-rank algorithm and its application to contextual advertising. In: WWW ’11, pp 377–386

24.

Li P, Burges CJC, Wu Q (2007) Mcrank: learning to rank using multiple classification and gradient boosting. In: NIPS ’07, pp 845–852

25.

Li W, Han J, Pei J (2001) Cmar: Accurate and efficient classification based on multiple class-association rules. In: ICDM ’01, vol 0, pp 369–376

26.

Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: KDD ’98, pp 80–86

27.

Nallapati R (2004) Discriminative models for information retrieval. In: SIGIR ’04, pp 64–71

28.

Qin T, Liu T, Tsai M et al (2006) Learning to search web pages with query-level loss functions. Technical report, Microsoft Research

29.

Qin T, Liu T, Xu J et al (2010) Letor: a benchmark collection for research on learning to rank for information retrieval. Inf Retr 13:346–374CrossRef

30.

Qin T, Zhang X-D, Wang D-S et al (2007) Ranking with multiple hyperplanes. In: SIGIR ’07, pp 279–286

31.

Sculley D (2010) Combined regression and ranking. In: KDD ’10. ACM, New York, NY, USA, pp 979–988

32.

Song Y, Leung K, Fang Q et al (2013) Fp-rank: an effective ranking approach based on frequent pattern analysis. In: DASFAA ’13

33.

Tan J, Bu Y, Yang B (2009) An efficient close frequent pattern mining algorithm. In: ICICTA ’09, vol 1, pp 528–531

34.

Thomas Fasciano RS, Shin MC (2012) Learning to rank biological motion trajectories. Image Vis Comput 31(6–7):502–510

35.

Tong Y, Chen L, Cheng Y et al (2012) Mining frequent itemsets over uncertain databases. PVLDB’12 5(11):1650–1661

36.

Tong Y, Chen L, Ding B (2012) Discovering threshold-based frequent closed itemsets over probabilistic data. In: ICDE ’12, pp 270–281

37.

Tong Y, Chen L, Yu PS (2012) Ufimt: an uncertain frequent itemset mining toolbox. In: KDD ’12, pp 1508–1511

38.

Tsai M-F, Liu T-Y, Qin T et al (2007) Frank: a ranking method with fidelity loss. In: SIGIR ’07, pp 383–390

39.

Valizadegan H, Jin R, Zhang R et al (2009) Learning to rank by optimizing ndcg measure. In: NIPS ’09

40.

Veloso AA, Almeida HM, Gonçalves MA et al (2008) Learning to rank at query-time using association rules. In: SIGIR ’08, pp 267–274

41.

Verberne S, van Halteren H, Theijssen D et al (2011) Learning to rank for why-question answering. Inf Retr 14:107–132CrossRef

42.

Volkovs MN, Zemel RS (2009) Boltzrank: learning to maximize expected ranking gain. In: ICML ’09, pp 1089–1096

43.

Wang J, Karypis G (2006) On mining instance-centric classification rules. IEEE Trans. Knowl. Data Eng. 18:1497–1511CrossRef

44.

Xu J, Li H (2007) Adarank: a boosting algorithm for information retrieval. In: SIGIR ’07, pp 391–398

45.

Yin X, Han J (2003) Cpar: classification based on predictive association rules. In: SDM’03

46.

Yue Y, Finley T, Radlinski F et al (2007) A support vector method for optimizing average precision. In: SIGIR’07, pp 271–278

Title: SFP-Rank: significant frequent pattern analysis for effective ranking
Authors: Yuanfeng Song
Wilfred Ng
Kenneth Wai-Ting Leung
Qiong Fang
Publication date: 01-06-2015
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 3/2015
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-014-0738-y

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 3/2015

Stabilized sparse ordinal regression for medical risk stratification

A modified adaptive differential evolution algorithm for color image segmentation

A new ant colony optimization-based algorithm for range query answering problem in relational schema-based P2P database systems

Scaling cut criterion-based discriminant analysis for supervised dimension reduction

Improving contextual advertising matching by using Wikipedia thesaurus knowledge

Online data-driven anomaly detection in autonomous robots

Premium Partner