Published in: Journal of Intelligent Information Systems 1/2013

1 February 2013

Clustering interval data through kernel-induced feature space

Authors: Anderson F. B. F. da Costa, Bruno A. Pimentel, Renata M. C. R. de Souza

Abstract

Recently, kernel-based clustering in feature space has been shown to perform better than conventional clustering methods in unsupervised classification. This paper introduces a partitioning clustering method in kernel-induced feature space for symbolic interval-valued data. The distance between an item and its prototype in feature space is expanded using a two-component mixture kernel to handle intervals. Tools for interpreting the partition and the clusters of interval-valued data in feature space are also presented. To show the effectiveness of the proposed method, experiments were performed with real and synthetic interval data sets, and the proposed method is compared with different clustering algorithms from the literature. Clustering quality is measured by an external cluster validity index (the corrected Rand index). These experiments showed the usefulness of the kernel K-means method for interval-valued data and the merit of the partition and cluster interpretation tools.
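
As a rough illustration of the approach summarized in the abstract, the following Python sketch runs kernel K-means on interval data and scores the result with the corrected Rand index. The abstract does not state the form of the two-component mixture kernel, so the sketch assumes one: the sum of a Gaussian kernel on the lower bounds and a Gaussian kernel on the upper bounds. The function names (`interval_kernel`, `kernel_kmeans`, `corrected_rand_index`), the bandwidth `sigma`, the optional `init_labels` argument, and the toy data are all illustrative assumptions and are not taken from the paper.

```python
from math import comb

import numpy as np


def interval_kernel(A, B, sigma=1.0):
    """ASSUMED kernel: Gaussian on lower bounds plus Gaussian on upper bounds.

    A has shape (n, p, 2) and B has shape (m, p, 2); the last axis holds
    [lower, upper] for each of the p interval-valued variables.
    """
    def rbf(X, Y):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    return rbf(A[..., 0], B[..., 0]) + rbf(A[..., 1], B[..., 1])


def kernel_kmeans(K, k, n_iter=50, init_labels=None, seed=0):
    """Standard kernel K-means on a precomputed (n, n) kernel matrix K."""
    n = K.shape[0]
    if init_labels is None:
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init_labels)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at infinity
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue
            # Squared distance to the cluster prototype in feature space:
            # K_ii - (2/|C|) sum_j K_ij + (1/|C|^2) sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def corrected_rand_index(u, v):
    """Corrected (adjusted) Rand index between two labelings.

    Undefined (division by zero) when both partitions are trivial.
    """
    _, ui = np.unique(np.asarray(u), return_inverse=True)
    _, vi = np.unique(np.asarray(v), return_inverse=True)
    table = np.zeros((ui.max() + 1, vi.max() + 1), dtype=int)
    np.add.at(table, (ui, vi), 1)  # contingency table of the two partitions
    sum_ij = sum(comb(int(x), 2) for x in table.ravel())
    sum_a = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_b = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(ui), 2)
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)


# Toy data (hypothetical): six items, one interval variable, two clear groups.
X = np.array([[[0.0, 1.0]], [[0.1, 1.1]], [[0.2, 1.2]],
              [[10.0, 11.0]], [[10.1, 11.1]], [[10.2, 11.2]]])
K = interval_kernel(X, X, sigma=1.0)
labels = kernel_kmeans(K, k=2, init_labels=[0, 0, 1, 1, 1, 1])
print(labels, corrected_rand_index(labels, [1, 1, 1, 0, 0, 0]))
```

The toy run starts from a deliberately imperfect labeling so that the reassignment step has something to correct. Working from the kernel matrix alone means the prototypes never need to be computed explicitly in feature space, which is the usual reason for writing the distance in this expanded form.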

Metadata
Title
Clustering interval data through kernel-induced feature space
Authors
Anderson F. B. F. da Costa
Bruno A. Pimentel
Renata M. C. R. de Souza
Publication date
1 February 2013
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2013
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-012-0219-2
