Published in: Advances in Data Analysis and Classification 4/2023

02-01-2023 | Regular Article

Sparse correspondence analysis for large contingency tables

Authors: Ruiping Liu, Ndeye Niang, Gilbert Saporta, Huiwen Wang


Abstract

We propose sparse variants of correspondence analysis (CA) for large contingency tables such as the document-term matrices used in text mining. By seeking many zero coefficients, sparse CA remedies the difficulty of interpreting CA results when the table is large. Since CA is a doubly weighted PCA (for rows and columns) or a weighted generalized SVD, we adapt known sparse versions of these methods, with specific developments to obtain orthogonal solutions and to tune the sparseness parameters. We distinguish two cases, depending on whether sparseness is required for both rows and columns or for only one of the two sets.
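The link between CA and the SVD mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it computes the usual matrix of standardized residuals of a contingency table and then extracts one sparse singular pair by alternating soft-thresholded power iterations, in the spirit of penalized matrix decomposition. The function names, the toy table and the penalty value are assumptions chosen for illustration, and the sketch does not address orthogonality between successive components or the tuning of the sparseness parameters, which the paper develops.

```python
import numpy as np

def standardized_residuals(N):
    """Return S = D_r^{-1/2} (P - r c^T) D_c^{-1/2} with row masses r and column masses c.

    Assumes the table has no empty row or column.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # independence model
    return (P - expected) / np.sqrt(expected), r, c

def soft_threshold(x, lam):
    """Shrink every entry of x towards zero by lam (lasso-type operator)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1(S, lam_u=0.0, lam_v=0.0, n_iter=200, tol=1e-10):
    """One sparse singular pair of S by alternating thresholded power iterations.

    lam_u and lam_v are hypothetical penalty levels for the row and column
    vectors; with both set to 0 this reduces to the ordinary leading pair.
    """
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    u, v = U[:, 0], Vt[0]                # start from the dense solution
    for _ in range(n_iter):
        u_new = soft_threshold(S @ v, lam_u)
        v_new = soft_threshold(S.T @ u_new, lam_v)
        if not u_new.any() or not v_new.any():
            raise ValueError("penalty too large: a vector was shrunk to zero")
        u_new = u_new / np.linalg.norm(u_new)
        v_new = v_new / np.linalg.norm(v_new)
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, u @ S @ v, v

# Toy 4 x 3 table (made-up counts), sparseness requested for the columns only.
N = np.array([[20, 5, 1], [4, 18, 3], [2, 6, 25], [10, 9, 8]])
S, r, c = standardized_residuals(N)
u, sigma, v = sparse_rank1(S, lam_v=0.05)
col_weights = v / np.sqrt(c)             # map back through the column masses
```

Setting only `lam_v` corresponds to the case where sparseness is required for one set; setting both `lam_u` and `lam_v` corresponds to sparseness on rows and columns. The sum of squared entries of `S` equals the total inertia of the table, that is, the chi-square statistic divided by the grand total.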


Footnotes
1
There are 45 presidents, but the speech data of presidents William Henry Harrison and James Garfield are missing.
 
Metadata
Title
Sparse correspondence analysis for large contingency tables
Authors
Ruiping Liu
Ndeye Niang
Gilbert Saporta
Huiwen Wang
Publication date
02-01-2023
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 4/2023
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-022-00531-5
