2016 | Original Paper | Book Chapter

7. Dimensionality Reduction

Authors: Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus

Published in: Multilabel Classification

Publisher: Springer International Publishing

Abstract

High dimensionality is an extensively studied problem in machine learning. A high-dimensional input space usually challenges most classification algorithms, which tend to produce more complex and less effective models. Multilabel data are also affected by high dimensionality in the output space, since many datasets have hundreds or even thousands of labels. This chapter explains how high dimensionality affects multilabel classification and reviews the methods proposed to deal with this obstacle. Section 7.1 gives a general overview of the curse of dimensionality in the multilabel field. Section 7.2 introduces feature space reduction techniques, outlines several specific proposals, and tests how applying feature selection affects the results of multilabel classifiers. Section 7.3 provides a similar discussion for label space dimensionality, also including some experimental results. Section 7.4 summarizes the chapter.

Metadata
Title
Dimensionality Reduction
Authors
Francisco Herrera
Francisco Charte
Antonio J. Rivera
María J. del Jesus
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-41111-8_7