2017 | Original Paper | Book Chapter

Rethinking Unsupervised Feature Selection: From Pseudo Labels to Pseudo Must-Links

Authors: Xiaokai Wei, Sihong Xie, Bokai Cao, Philip S. Yu

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

High-dimensional data are prevalent in many machine learning applications, and feature selection is a useful technique for alleviating the curse of dimensionality. Unsupervised feature selection tends to be more challenging than its supervised counterpart due to the lack of class labels. State-of-the-art approaches typically rely on pseudo labels: they select discriminative features by their regression coefficients with respect to pseudo labels derived from clustering, but such pseudo labels are often inaccurate. In this paper, we propose a new perspective for unsupervised feature selection: Discriminatively Exploiting Similarity (DES). By forming similar and dissimilar data pairs, implicit discriminative information can be exploited, and the similar/dissimilar relationships of data pairs serve as guidance for feature selection. Based on this idea, we propose hypothesis-testing-based and classification-based methods as instantiations of the DES framework. We evaluate the proposed approaches extensively on six real-world datasets. Experimental results demonstrate that our approaches significantly outperform state-of-the-art unsupervised methods. More surprisingly, our unsupervised method even achieves performance comparable to a supervised feature selection method. Code related to this chapter is available at: http://bdsc.lab.uic.edu/resources.html.
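As a rough illustration of the pair-based idea, the Python sketch below builds pseudo must-link pairs from nearest neighbors and pseudo cannot-link pairs from the farthest points, represents each pair by the element-wise absolute difference of its two points, and ranks features by the weights of a sparse linear classifier that separates the two kinds of pairs. The pair construction, the pair representation, and the choice of classifier are illustrative assumptions for a classification-based instantiation, not the exact formulation used in the chapter.

# A minimal sketch of a classification-based pseudo must-link feature selector.
# Pair construction, pair features, and the linear model are assumptions made
# for illustration; see the chapter and its released code for the actual method.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LogisticRegression

def pseudo_pairs(X, k=5):
    """Return (must_link, cannot_link) index pairs: k nearest and k farthest points per sample."""
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, nbrs = nn.kneighbors(X)                      # nbrs[i, 0] is i itself
    must = [(i, j) for i in range(n) for j in nbrs[i, 1:]]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    far = np.argsort(dists, axis=1)[:, -k:]         # k farthest points per sample
    cannot = [(i, j) for i in range(n) for j in far[i]]
    return must, cannot

def des_feature_scores(X, k=5, C=1.0):
    """Score features by their weight in a classifier separating must-links from cannot-links."""
    must, cannot = pseudo_pairs(X, k)
    # Represent each pair by the element-wise absolute difference of its two points.
    pair_feats = np.array([np.abs(X[i] - X[j]) for i, j in must + cannot])
    pair_labels = np.array([1] * len(must) + [0] * len(cannot))
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(pair_feats, pair_labels)
    return np.abs(clf.coef_).ravel()                # larger weight = more discriminative feature

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    X[:, 0] += np.repeat([0.0, 3.0], 50)            # make feature 0 cluster-discriminative
    scores = des_feature_scores(X, k=5)
    print("top features:", np.argsort(scores)[::-1][:5])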

Metadata
Title
Rethinking Unsupervised Feature Selection: From Pseudo Labels to Pseudo Must-Links
Authors
Xiaokai Wei
Sihong Xie
Bokai Cao
Philip S. Yu
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71249-9_17