2018 | OriginalPaper | Chapter

Random-Sets for Dealing with Uncertainties in Relevance Feature

Authors: Abdullah Semran Alharbi, Md Abul Bashar, Yuefeng Li

Published in: AI 2018: Advances in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

Most relevance discovery models only consider document-level evidence, which may introduce uncertainties to relevance features. Research in information retrieval shows that adopting passage-level (i.e. paragraph-level) evidence can improve the performance of different models in various retrieval tasks. This paper proposes an innovative and effective relevance method based on paragraph evidence to reduce uncertainties in the relevance features discovered by existing models. The method exploits latent topics in the relevance feedback collection to estimate the implicit paragraph relevance. It uses random sets to effectively model the intricate relationships between paragraphs, topics and features to deal with the associated uncertainties. Experiments are conducted using the standard RCV1 dataset, its TREC filtering collections and six popular performance measures. The results confirm that the proposed Uncertainty Reduction (UR) method can significantly enhance the performance of 12 models for relevance feature selection.
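The abstract describes the approach only at a high level, so the following is a minimal illustrative sketch and not the paper's actual UR formulation. It assumes that a topic model has already produced a topic distribution for each paragraph and a term distribution for each topic, and it scores each term by accumulating probability mass along the paragraph → topic → term mappings. The function name `term_weights`, the input layout, and the toy numbers are all hypothetical.

```python
from collections import defaultdict


def term_weights(paragraph_topics, topic_terms):
    """Aggregate term scores through paragraph -> topic -> term mappings.

    paragraph_topics: {paragraph_id: {topic_id: P(topic | paragraph)}}
    topic_terms:      {topic_id: {term: P(term | topic)}}

    Hypothetical scoring rule (an assumption, not taken from the paper):
    weight(term) = sum over paragraphs and topics of
                   P(topic | paragraph) * P(term | topic)
    """
    weights = defaultdict(float)
    for topics in paragraph_topics.values():
        for topic_id, p_topic in topics.items():
            for term, p_term in topic_terms.get(topic_id, {}).items():
                weights[term] += p_topic * p_term
    return dict(weights)


# Toy inputs: two paragraphs of one feedback document, two latent topics.
paragraph_topics = {
    "d1.p1": {"z1": 0.75, "z2": 0.25},
    "d1.p2": {"z1": 0.5, "z2": 0.5},
}
topic_terms = {
    "z1": {"oil": 0.5, "price": 0.5},
    "z2": {"price": 0.25, "weather": 0.75},
}

weights = term_weights(paragraph_topics, topic_terms)
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.4f}")
```

In this toy setup a term that is supported by several topics across several paragraphs ("price") ends up with a higher score than a term tied to a single topic, which is the intuition behind using paragraph-level rather than document-level evidence.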

Metadata
Title: Random-Sets for Dealing with Uncertainties in Relevance Feature
Authors: Abdullah Semran Alharbi, Md Abul Bashar, Yuefeng Li
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-03991-2_59
