2014 | OriginalPaper | Chapter

Robustness and Stability Analysis of Factor PD-Clustering on Large Social Data Sets

Authors: Cristina Tortora, Marina Marino

Published in: Analysis and Modeling of Complex Data in Behavioral and Social Sciences

Publisher: Springer International Publishing

Abstract

Factor clustering methods have been proposed for clustering large data sets. Among them, factor probabilistic distance clustering (FPDC) shows interesting performance. The method consists of two main steps: a Tucker3 decomposition of the distance array, followed by probabilistic distance (PD) clustering on the resulting factors. This paper applies FPDC to large behavioral and social data sets in order to obtain homogeneous and well-separated clusters of individuals. Its purpose is to evaluate the stability and robustness of the method on such data sets. Stability refers to the invariance of the results across iterations of the method; robustness refers to the sensitivity of the method to errors in the data. Both properties are evaluated using bootstrap resampling.
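As a rough illustration of the ideas named in the abstract, the following sketch runs plain PD-clustering (memberships chosen so that probability times distance is constant across clusters, as in Ben-Israel and Iyigun's formulation) and then scores stability with bootstrap resampling. It is a minimal sketch under stated assumptions and not the chapter's procedure: the Tucker3 factor step of FPDC is omitted entirely, and the farthest-point initialization, the Rand index as agreement measure, the toy two-blob data, and all function names are illustrative choices made here.

```python
import math
import random

EPS = 1e-9  # guards against division by zero when a point sits on a center

def init_centers(points, k):
    """Greedy farthest-point start (an illustrative choice, not from the chapter)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda x: min(math.dist(x, c) for c in centers)))
    return centers

def pd_clustering(points, k, n_iter=50):
    """Plain PD-clustering: memberships satisfy p_k(x) * d_k(x) = constant over k."""
    centers = init_centers(points, k)
    dim = len(points[0])
    for _ in range(n_iter):
        num = [[0.0] * dim for _ in range(k)]
        den = [0.0] * k
        for x in points:
            dists = [math.dist(x, c) + EPS for c in centers]
            inv_sum = sum(1.0 / d for d in dists)
            for j, d in enumerate(dists):
                p = (1.0 / d) / inv_sum  # membership probability of x in cluster j
                w = p * p / d            # center-update weight p^2 / d
                den[j] += w
                for t in range(dim):
                    num[j][t] += w * x[t]
        centers = [tuple(v / den[j] for v in num[j]) for j in range(k)]
    return centers

def assign(points, centers):
    """Hard labels: highest membership probability, i.e. the nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            for x in points]

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (label names ignored)."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

def bootstrap_stability(points, k, n_boot=20, seed=0):
    """Mean Rand index between the full-data partition and bootstrap-fitted ones."""
    rng = random.Random(seed)
    reference = assign(points, pd_clustering(points, k))
    scores = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # resample with replacement
        labels = assign(points, pd_clustering(sample, k))
        scores.append(rand_index(reference, labels))
    return sum(scores) / len(scores)

# Toy data (hypothetical): two well-separated Gaussian blobs in the plane.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)]
stability = bootstrap_stability(data, k=2)
print(f"mean Rand index over bootstrap replicates: {stability:.3f}")
```

A mean Rand index near 1 would indicate that partitions fitted on resampled data largely reproduce the full-data partition. A robustness check in the same spirit could perturb or contaminate the data before refitting, though the exact protocol used in the chapter is not given in the abstract.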

Title: Robustness and Stability Analysis of Factor PD-Clustering on Large Social Data Sets
Authors: Cristina Tortora, Marina Marino
Publisher: Springer International Publishing
Book: Analysis and Modeling of Complex Data in Behavioral and Social Sciences
Print ISBN: 978-3-319-06691-2
Electronic ISBN: 978-3-319-06692-9

Copyright Year: 2014
DOI: https://doi.org/10.1007/978-3-319-06692-9_29
