Skip to main content
Top

2017 | OriginalPaper | Chapter

Using a Reverse Engineering Type Paradigm in Clustering. An Evolutionary Programming Based Approach

Authors : Jan W. Owsiński, Janusz Kacprzyk, Karol Opara, Jarosław Stańczak, Sławomir Zadrożny

Published in: Fuzzy Sets, Rough Sets, Multisets and Clustering

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The aim of this work is to propose a novel view on the well-known clustering approach that is here dealt with from a different perspective. We consider a kind of a reverse engineering related approach, which basically consists in discovering the broadly meant values of the parameters of the clustering algorithm, including the choice of the algorithm itself, or even – more generally – its class, and some other parameters, that have possibly led to a given partition of data, known a priori. We discuss the motivation and possible interpretations related to such a novel reversed process. In fact the main motivation is gaining insight into the structure of the given data set or even a family of data sets. The use of the evolutionary strategies is proposed to computationally implement such a reverse analysis. The idea and feasibility of the proposed computational approach is illustrated on two benchmark type data sets. The preliminary results obtained are promising in terms of a balance between analytic and computational effectiveness and efficiency, quality of results obtained and their comprehensiveness and intuitive appeal, a high application potential, as well as possibilities for further extensions.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
It is possible to start with a data similarity/distance matrix, if available, without an explicit characterization of the data in terms of some attributes values, and such a setting also seems to provide a reasonable context for the paradigm proposed in this paper, but we will leave this case for a possible further study.
 
2
The concept of a Reverse Cluster Analysis has been introduced by Ríos and Velásquez [25] in case of the SOM based clustering but it is meant there in a rather different sense as associating original data points with the nodes in the trained network.
 
3
Possibly except for an identifier attribute, which makes it possible to distinguish particular elements of X.
 
4
Actually, even in the “absolute” case, doubts may arise, if the situation resembles the one of multiple overlapping distributions, i.e. although \(P_{A}\) is well established and “certain”, it is hardly reflected in the data, represented by X, so that many objects \(x_{i}\) might be equally well assigned to different clusters.
 
5
Series 2: PC class 4-core processor station, 3.2 GHz, under 64 bit Linux Fedora 17; simulation software written in C and compiled with g++.
 
Literature
1.
go back to reference Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I. (2013) An extensive comparative study of cluster validity indices. Pattern Recognition 46 (1), 243–256.CrossRef Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I. (2013) An extensive comparative study of cluster validity indices. Pattern Recognition 46 (1), 243–256.CrossRef
2.
go back to reference Brun, M., Sima, Ch., Hua, J., Lowey, J., Carroll, B., Suh, E., Dougherty, E.R. (2007) Model-based evaluation of clustering validation measures. Pattern Recognition 40(3), 807–824,CrossRefMATH Brun, M., Sima, Ch., Hua, J., Lowey, J., Carroll, B., Suh, E., Dougherty, E.R. (2007) Model-based evaluation of clustering validation measures. Pattern Recognition 40(3), 807–824,CrossRefMATH
3.
go back to reference Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A. (2014) NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1–36.CrossRef Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A. (2014) NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1–36.CrossRef
4.
go back to reference Choi, S.-S., Cha, S.-K., Tappert, Ch.C. (2010) A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, vol. 8, no 1, 43–48. Choi, S.-S., Cha, S.-K., Tappert, Ch.C. (2010) A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, vol. 8, no 1, 43–48.
5.
go back to reference Craven, M.W., Shavlik, J.W. (1995). Extracting comprehensible concept representations from trained neural networks. In: Working Notes of the IJCAI’95 Workshop on Comprehensibility in Machine Learning, Montreal, Canada, 61–75. Craven, M.W., Shavlik, J.W. (1995). Extracting comprehensible concept representations from trained neural networks. In: Working Notes of the IJCAI’95 Workshop on Comprehensibility in Machine Learning, Montreal, Canada, 61–75.
6.
go back to reference Cross, V. Sudkamp, Th.A. (2002) Similarity and compatibility in fuzzy set theory: assessment and applications. Physica-Verlag, Heidelberg; New York. Cross, V. Sudkamp, Th.A. (2002) Similarity and compatibility in fuzzy set theory: assessment and applications. Physica-Verlag, Heidelberg; New York.
7.
go back to reference Das, S., Suganthan, P.N. (2011) Differential evolution: a survey of the state-of-the-art. Evolutionary Computation, IEEE Transactions on, 15(1), 4–31. Das, S., Suganthan, P.N. (2011) Differential evolution: a survey of the state-of-the-art. Evolutionary Computation, IEEE Transactions on, 15(1), 4–31.
8.
go back to reference De Amorim, R. (2015) Feature Relevance in Ward’s Hierarchical Clustering Using the L\(p\) Norm. Journal of Classification 32, 46–62.MathSciNetCrossRefMATH De Amorim, R. (2015) Feature Relevance in Ward’s Hierarchical Clustering Using the L\(p\) Norm. Journal of Classification 32, 46–62.MathSciNetCrossRefMATH
9.
go back to reference Denœud, L., Guénoche, A. (2006) Comparison of distance indices between partitions. In Data Science and Classification, Springer, Berlin Heidelberg, 21–28.CrossRef Denœud, L., Guénoche, A. (2006) Comparison of distance indices between partitions. In Data Science and Classification, Springer, Berlin Heidelberg, 21–28.CrossRef
10.
go back to reference Fisch, D., Gruber, T., Sick, B. (2011) SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis. IEEE Transactions on Knowledge and Data Engineering: 23 (5), 774–787. Fisch, D., Gruber, T., Sick, B. (2011) SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis. IEEE Transactions on Knowledge and Data Engineering: 23 (5), 774–787.
11.
go back to reference Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), 179–188. Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), 179–188.
12.
go back to reference Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On clustering validation techniques. Journal of Intelligent Information Systems, 17(2–3), pp. 107–145.CrossRefMATH Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On clustering validation techniques. Journal of Intelligent Information Systems, 17(2–3), pp. 107–145.CrossRefMATH
13.
go back to reference Hastie, T., Tibsihrani, R., Friedman, J. (2009) The Elements of Statistical Learning Data Mining, Inference, and Prediction. Second Edition. Springer-Verlag New York. Hastie, T., Tibsihrani, R., Friedman, J. (2009) The Elements of Statistical Learning Data Mining, Inference, and Prediction. Second Edition. Springer-Verlag New York.
14.
go back to reference Hubert, L., Arabie, P. (1985) Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRefMATH Hubert, L., Arabie, P. (1985) Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRefMATH
15.
go back to reference Kacprzyk, J., Zadrożny, S. (2013) Comprehensiveness of Linguistic Data Summaries: A Crucial Role of Protoforms. In: Christian Moewes and Andreas Nürnberger (Eds.): Computational Intelligence in Intelligent Data Analysis. Springer-Verlag, Berlin, Heidelberg, 207–221.CrossRef Kacprzyk, J., Zadrożny, S. (2013) Comprehensiveness of Linguistic Data Summaries: A Crucial Role of Protoforms. In: Christian Moewes and Andreas Nürnberger (Eds.): Computational Intelligence in Intelligent Data Analysis. Springer-Verlag, Berlin, Heidelberg, 207–221.CrossRef
16.
go back to reference Kaufman, L., Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York. Kaufman, L., Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
17.
go back to reference Lance, G.N.,Williams. W.T. (1966) A General Theory of Classificatory Sorting Strategies. 1. Hierarchical systems. Computer Journal, 9, 373–380. Lance, G.N.,Williams. W.T. (1966) A General Theory of Classificatory Sorting Strategies. 1. Hierarchical systems. Computer Journal, 9, 373–380.
18.
go back to reference Maechler, Martin, et al. (2015) “cluster: cluster analysis extended Rousseeuw et al.” R package, version 2.0.3. Maechler, Martin, et al. (2015) “cluster: cluster analysis extended Rousseeuw et al.” R package, version 2.0.3.
19.
go back to reference Michalski, R. (1983) A theory and methodology of inductive learning. Artificial Intelligence: 20(2), 111–161. Michalski, R. (1983) A theory and methodology of inductive learning. Artificial Intelligence: 20(2), 111–161.
20.
go back to reference Miyamoto, S. (2014) Classification Rules in Methods of Clustering (featured article). IEEE Intelligent Informatics Bulletin, 15(1), 15–21.MathSciNet Miyamoto, S. (2014) Classification Rules in Methods of Clustering (featured article). IEEE Intelligent Informatics Bulletin, 15(1), 15–21.MathSciNet
21.
go back to reference Miyamoto, S., Ichihashi, H., Honda, K. (2008) Algorithms for Fuzzy Clustering: Methods in c-Means Clustering with Applications. Springer-Verlag, Berlin Heidelberg, Studies in Fuzziness and Soft Computing 229.MATH Miyamoto, S., Ichihashi, H., Honda, K. (2008) Algorithms for Fuzzy Clustering: Methods in c-Means Clustering with Applications. Springer-Verlag, Berlin Heidelberg, Studies in Fuzziness and Soft Computing 229.MATH
22.
go back to reference Mullen, K.M, Ardia, D., Gil, D., Windover, D., Cline, J. (2011) DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1–26.CrossRef Mullen, K.M, Ardia, D., Gil, D., Windover, D., Cline, J. (2011) DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1–26.CrossRef
23.
go back to reference Pryke, A., Beale, R. (2004) Interactive Comprehensible Data Mining. In: Y. Cai (ed.): Ambient Intelligence for Scientific Discovery, Springer, LNCS 3345, 48–65. Pryke, A., Beale, R. (2004) Interactive Comprehensible Data Mining. In: Y. Cai (ed.): Ambient Intelligence for Scientific Discovery, Springer, LNCS 3345, 48–65.
25.
go back to reference Ríos, S.A., Velásquez J.D. (2011) Finding Representative Web Pages Based on a SOM and a Reverse Cluster Analysis. International Journal on Artificial Intelligence Tools 20(1) 93–118. Ríos, S.A., Velásquez J.D. (2011) Finding Representative Web Pages Based on a SOM and a Reverse Cluster Analysis. International Journal on Artificial Intelligence Tools 20(1) 93–118.
26.
go back to reference Stańczak, J. (2003) Biologically inspired methods for control of evolutionary algorithms. Control and Cybernetics, 32 (2), 411–433.MATH Stańczak, J. (2003) Biologically inspired methods for control of evolutionary algorithms. Control and Cybernetics, 32 (2), 411–433.MATH
27.
go back to reference Storn, R., Price, K. (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.MathSciNetCrossRefMATH Storn, R., Price, K. (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.MathSciNetCrossRefMATH
28.
go back to reference Torra, V., Endo, Y., Miyamoto, S. (2011) Computationally intensive parameter selection for clustering algorithms: The case of fuzzy \(c\)-means with tolerance. International Journal of Intelligent Systems, 26 (4), 313–322.CrossRef Torra, V., Endo, Y., Miyamoto, S. (2011) Computationally intensive parameter selection for clustering algorithms: The case of fuzzy \(c\)-means with tolerance. International Journal of Intelligent Systems, 26 (4), 313–322.CrossRef
29.
go back to reference Zhou, Z.H. (2005) Comprehensibility of data mining algorithms. In: J. Wang (ed.): Encyclopedia of Data Warehousing and Mining, IGI Global: Hershey, 190–195. Zhou, Z.H. (2005) Comprehensibility of data mining algorithms. In: J. Wang (ed.): Encyclopedia of Data Warehousing and Mining, IGI Global: Hershey, 190–195.
Metadata
Title
Using a Reverse Engineering Type Paradigm in Clustering. An Evolutionary Programming Based Approach
Authors
Jan W. Owsiński
Janusz Kacprzyk
Karol Opara
Jarosław Stańczak
Sławomir Zadrożny
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-47557-8_9

Premium Partner