
2018 | OriginalPaper | Book chapter

An Extended Random-Sets Model for Fusion-Based Text Feature Selection

Authors: Abdullah Semran Alharbi, Yuefeng Li, Yue Xu

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing


Abstract

Selecting features that represent a specific corpus is important for the success of many machine learning and text mining applications. In information retrieval (IR), fusion-based techniques have shown remarkable performance compared with traditional models. However, popular text feature selection (FS) models do not consider fusing the taxonomic features of the corpus. This research proposed an innovative and effective extended random-sets model for fusion-based FS. The model fused the scores of different hierarchical features to accurately weight representative words based on their appearance across the documents in the corpus and in several latent topics. The model was evaluated for information filtering (IF) using TREC topics and the standard RCV1 dataset. The results showed that the proposed model significantly outperformed eleven state-of-the-art baseline models on six evaluation metrics.
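The abstract describes the model only at a high level. The following is a minimal sketch of the general idea of scoring terms at two levels (documents and latent topics) and fusing the scores; it is not the authors' model. It assumes, for illustration only: every document has equal probability, a term's within-document weight is its relative frequency, topic-term distributions come from an external topic model (e.g. an LDA run), and the two levels are combined with a simple weighted linear (CombSUM-style) fusion. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Scores = Dict[str, float]


def document_level_scores(docs: Sequence[Sequence[str]]) -> Scores:
    """Weight terms over documents in an extended-random-set style (assumed).

    Each non-empty document is one piece of evidence with equal probability
    P(d) = 1 / |D|. It maps to its set of terms, each weighted by its relative
    frequency in that document. A term's score is the P(d)-weighted sum of
    those within-document weights, so the scores sum to 1.
    """
    non_empty = [doc for doc in docs if doc]
    if not non_empty:
        return {}
    p_doc = 1.0 / len(non_empty)
    scores: Scores = defaultdict(float)
    for doc in non_empty:
        counts = Counter(doc)
        total = sum(counts.values())
        for term, count in counts.items():
            scores[term] += p_doc * count / total
    return dict(scores)


def topic_level_scores(topics: Sequence[Tuple[float, Dict[str, float]]]) -> Scores:
    """Weight terms over latent topics.

    `topics` is a list of (P(z), {term: P(term | z)}) pairs, e.g. taken from
    an LDA run; a term's score is sum_z P(z) * P(term | z).
    """
    scores: Scores = defaultdict(float)
    for p_topic, term_dist in topics:
        for term, p_term in term_dist.items():
            scores[term] += p_topic * p_term
    return dict(scores)


def fuse(score_maps: Sequence[Scores], weights: Sequence[float]) -> Scores:
    """Weighted linear (CombSUM-style) fusion; an assumed rule, not the paper's."""
    fused: Scores = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        for term, score in scores.items():
            fused[term] += weight * score
    return dict(fused)


def select_top_k(scores: Scores, k: int) -> List[str]:
    """Return the k highest-scoring terms as the selected features."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    docs = [["fusion", "feature", "selection"], ["feature", "topic"]]
    topics = [
        (0.6, {"feature": 0.5, "selection": 0.5}),
        (0.4, {"topic": 0.7, "fusion": 0.3}),
    ]
    fused = fuse([document_level_scores(docs), topic_level_scores(topics)], [0.5, 0.5])
    print(select_top_k(fused, 2))  # → ['feature', 'topic']
```

With equal fusion weights, "feature" ranks first because it scores well at both levels. Other fusion rules (for example CombMNZ or a rank-based scheme) could replace `fuse()` without changing the per-level scoring.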


Words, keywords and terms are used interchangeably in this paper.

SIF stands for Selection of Informative Features, and the ‘2’ refers to the utilisation of both local and global statistics.

http://trec.nist.gov/.

https://www.lemurproject.org/.

Title: An Extended Random-Sets Model for Fusion-Based Text Feature Selection
Authors: Abdullah Semran Alharbi, Yuefeng Li, Yue Xu
Publisher: Springer International Publishing
Book: Advances in Knowledge Discovery and Data Mining
Print ISBN: 978-3-319-93039-8
Electronic ISBN: 978-3-319-93040-4
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-93040-4_11