Published in: International Journal of Machine Learning and Cybernetics, Issue 2/2014

1 April 2014 | Original Article

Sparse group LASSO based uncertain feature selection

Authors: Zongxia Xie, Yong Xu


Abstract

Uncertain data management and mining has become a hot topic in recent years. However, little attention has been paid to feature selection for uncertain data so far. In this paper, we introduce the sparse group least absolute shrinkage and selection operator (LASSO) technique to construct a feature selection algorithm for uncertain data. Each uncertain feature is represented by a probability density function, and we treat each feature as a group of values. After analyzing four current sparse feature selection methods (LASSO, elastic net, group LASSO, and sparse group LASSO), we adopt the sparse group LASSO to select features from uncertain data. The proposed algorithm can select not only features at the group level but also sub-features within groups. Because the trained weights of the feature groups are sparse, groups of features with zero weight are removed. Experiments on nine UCI datasets show that feature selection for uncertain data can reduce the number of features and sub-features at the same time. Moreover, it achieves accuracy comparable to that obtained with all features.
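To make the two levels of sparsity concrete, the following is a minimal sketch of a sparse group LASSO solver, not the authors' implementation. It assumes a least-squares loss, unweighted groups, and a plain proximal gradient loop, none of which the abstract specifies; the function names `prox_sparse_group_lasso` and `fit_sparse_group_lasso` and the parameters `lam1`, `lam2`, and `groups` are hypothetical. Each group would hold the column indices of one uncertain feature, for example the values of its discretized probability density function.

```python
import numpy as np


def prox_sparse_group_lasso(w, groups, lam1, lam2, step):
    """Proximal operator of step * (lam1 * ||w||_1 + lam2 * sum_g ||w_g||_2).

    Assumption: groups are disjoint lists of column indices.
    """
    # L1 part: element-wise soft-thresholding zeroes individual sub-features.
    out = np.sign(w) * np.maximum(np.abs(w) - step * lam1, 0.0)
    # Group part: shrink each group's norm; weak groups become exactly zero.
    for idx in groups:
        norm = np.linalg.norm(out[idx])
        scale = max(0.0, 1.0 - step * lam2 / norm) if norm > 0 else 0.0
        out[idx] *= scale
    return out


def fit_sparse_group_lasso(X, y, groups, lam1, lam2, n_iter=500):
    """Minimize 0.5 * ||y - X w||^2 + lam1 * ||w||_1 + lam2 * sum_g ||w_g||_2."""
    # Step size 1/L, where L is the squared largest singular value of X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)  # gradient of the least-squares term
        w = prox_sparse_group_lasso(w - step * grad, groups, lam1, lam2, step)
    return w
```

After fitting, a group whose weights are all zero corresponds to a removed feature, while zeros inside a surviving group correspond to removed sub-features. Setting `lam2 = 0` reduces the penalty to the LASSO, and `lam1 = 0` reduces it to the group LASSO.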

Metadata
Title
Sparse group LASSO based uncertain feature selection
Authors
Zongxia Xie
Yong Xu
Publication date
1 April 2014
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 2/2014
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-013-0156-6
