Published in: Pattern Analysis and Applications 1/2023

27.07.2022 | Theoretical Advances

A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data

Authors: Nuha Zamzami, Nizar Bouguila


Abstract

Count data are commonly used in machine learning and computer vision applications; however, they often suffer from the well-known curse of dimensionality, which dramatically degrades the performance of clustering algorithms. Feature selection is a major technique for handling large numbers of features, most of which are often redundant and noisy. In this paper, we propose a probabilistic approach for count data based on the concept of feature saliency in the context of mixture-based clustering using the generalized Dirichlet multinomial distribution. The saliency of irrelevant features is driven toward zero by minimizing the message length, which amounts to performing feature selection and model selection simultaneously. Experiments on a range of challenging applications, including text and image clustering, show that the proposed approach is effective in identifying both the optimal number of clusters and the most relevant features, thereby improving clustering performance significantly.
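
The approach described above couples a feature-saliency model with a mixture of generalized Dirichlet multinomial distributions, estimated by minorization–maximization under a minimum message length (MML) criterion. As a rough illustration of the feature-saliency idea only, the sketch below runs a plain EM on a deliberately simplified model in which each count feature is drawn either from a cluster-specific Poisson (if relevant) or from a shared background Poisson (if irrelevant), in the spirit of Law et al. (2004); it omits the generalized Dirichlet multinomial likelihood, the MML penalty, and the MM updates, so it is not the authors' algorithm.

```python
# A rough, simplified sketch of the feature-saliency idea for count data, in the
# spirit of Law et al. (2004): each feature l has a saliency rho_l, and its counts
# come from a cluster-specific Poisson if the feature is relevant, or from a shared
# background Poisson otherwise. This toy EM omits the generalized Dirichlet
# multinomial likelihood, the MML penalty and the MM updates used in the paper,
# so it is NOT the authors' algorithm.
import numpy as np
from scipy.stats import poisson

def saliency_em(X, M, n_iter=100, seed=0, eps=1e-12):
    """EM for a feature-saliency mixture of independent Poissons.

    X: (N, D) count matrix; M: number of clusters.
    Returns mixing weights, cluster rates, background rates and feature saliencies.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(M, 1.0 / M)                               # mixing weights
    mu = X.mean(0) * rng.uniform(0.5, 1.5, (M, D)) + eps   # cluster-specific rates
    lam = X.mean(0) + eps                                  # shared background rates
    rho = np.full(D, 0.5)                                  # feature saliencies

    for _ in range(n_iter):
        # E-step: per-feature "relevant" vs. "irrelevant" likelihood terms
        a = rho * poisson.pmf(X[:, None, :], mu) + eps           # (N, M, D)
        b = (1.0 - rho) * poisson.pmf(X[:, None, :], lam) + eps  # (N, 1, D)
        c = a + b
        log_w = np.log(pi) + np.log(c).sum(-1)                   # (N, M)
        w = np.exp(log_w - log_w.max(1, keepdims=True))
        w /= w.sum(1, keepdims=True)                             # responsibilities
        u = w[:, :, None] * (a / c)                              # relevant mass
        v = w[:, :, None] * (b / c)                              # irrelevant mass

        # M-step
        pi = w.mean(0)
        mu = (u * X[:, None, :]).sum(0) / (u.sum(0) + eps)
        lam = (v * X[:, None, :]).sum((0, 1)) / (v.sum((0, 1)) + eps)
        rho = u.sum((0, 1)) / N                                  # saliency update
    return pi, mu, lam, rho
```

Features whose estimated saliency stays close to zero behave like background noise; in the paper it is the message-length criterion that drives such saliencies toward zero, yielding simultaneous feature and model selection.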


Footnotes
1
In our experiments, the values for \(M_{min}\) and \(M_{max}\) have been set to 2 and 50, respectively.
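
These bounds presumably delimit the number of mixture components explored during model selection. The snippet below is a generic, hypothetical sketch of one common way such bounds are used, namely fitting a candidate mixture for each component count and keeping the one with the smallest message length; the helpers fit_mixture and message_length are placeholders rather than functions from the paper, and the authors' actual procedure may differ (for example, by pruning components within a single run).

```python
# A generic, hypothetical sketch of model selection over a bounded range of
# component counts. fit_mixture(X, m) and message_length(model, X) are
# placeholder callables, not functions from the paper.
def select_num_components(X, fit_mixture, message_length, m_min=2, m_max=50):
    best_model, best_len = None, float("inf")
    for m in range(m_min, m_max + 1):
        model = fit_mixture(X, m)          # candidate mixture with m components
        length = message_length(model, X)  # MML criterion: shorter is better
        if length < best_len:
            best_model, best_len = model, length
    return best_model, best_len
```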
 
Metadata
Title
A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data
Authors
Nuha Zamzami
Nizar Bouguila
Publication date
27.07.2022
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 1/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-022-01094-z
