Top

Published in:

2017 | OriginalPaper | Chapter

Using a Reverse Engineering Type Paradigm in Clustering. An Evolutionary Programming Based Approach

Authors : Jan W. Owsiński, Janusz Kacprzyk, Karol Opara, Jarosław Stańczak, Sławomir Zadrożny

Published in: Fuzzy Sets, Rough Sets, Multisets and Clustering

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

The aim of this work is to propose a novel view on the well-known clustering approach that is here dealt with from a different perspective. We consider a kind of a reverse engineering related approach, which basically consists in discovering the broadly meant values of the parameters of the clustering algorithm, including the choice of the algorithm itself, or even – more generally – its class, and some other parameters, that have possibly led to a given partition of data, known a priori. We discuss the motivation and possible interpretations related to such a novel reversed process. In fact the main motivation is gaining insight into the structure of the given data set or even a family of data sets. The use of the evolutionary strategies is proposed to computationally implement such a reverse analysis. The idea and feasibility of the proposed computational approach is illustrated on two benchmark type data sets. The preliminary results obtained are promising in terms of a balance between analytic and computational effectiveness and efficiency, quality of results obtained and their comprehensiveness and intuitive appeal, a high application potential, as well as possibilities for further extensions.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Consensus-Based Agglomerative Hierarchical Clustering

next chapter On Hesitant Fuzzy Clustering and Clustering of Hesitant Fuzzy Data

It is possible to start with a data similarity/distance matrix, if available, without an explicit characterization of the data in terms of some attributes values, and such a setting also seems to provide a reasonable context for the paradigm proposed in this paper, but we will leave this case for a possible further study.

The concept of a Reverse Cluster Analysis has been introduced by Ríos and Velásquez [25] in case of the SOM based clustering but it is meant there in a rather different sense as associating original data points with the nodes in the trained network.

Possibly except for an identifier attribute, which makes it possible to distinguish particular elements of X.

Actually, even in the “absolute” case, doubts may arise, if the situation resembles the one of multiple overlapping distributions, i.e. although \(P_{A}\) is well established and “certain”, it is hardly reflected in the data, represented by X, so that many objects \(x_{i}\) might be equally well assigned to different clusters.

Series 2: PC class 4-core processor station, 3.2 GHz, under 64 bit Linux Fedora 17; simulation software written in C and compiled with g++.

Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I. (2013) An extensive comparative study of cluster validity indices. Pattern Recognition 46 (1), 243–256.CrossRef

Brun, M., Sima, Ch., Hua, J., Lowey, J., Carroll, B., Suh, E., Dougherty, E.R. (2007) Model-based evaluation of clustering validation measures. Pattern Recognition 40(3), 807–824,CrossRefMATH

Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A. (2014) NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1–36.CrossRef

Choi, S.-S., Cha, S.-K., Tappert, Ch.C. (2010) A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, vol. 8, no 1, 43–48.

Craven, M.W., Shavlik, J.W. (1995). Extracting comprehensible concept representations from trained neural networks. In: Working Notes of the IJCAI’95 Workshop on Comprehensibility in Machine Learning, Montreal, Canada, 61–75.

Cross, V. Sudkamp, Th.A. (2002) Similarity and compatibility in fuzzy set theory: assessment and applications. Physica-Verlag, Heidelberg; New York.

Das, S., Suganthan, P.N. (2011) Differential evolution: a survey of the state-of-the-art. Evolutionary Computation, IEEE Transactions on, 15(1), 4–31.

De Amorim, R. (2015) Feature Relevance in Ward’s Hierarchical Clustering Using the L\(p\) Norm. Journal of Classification 32, 46–62.MathSciNetCrossRefMATH

Denœud, L., Guénoche, A. (2006) Comparison of distance indices between partitions. In Data Science and Classification, Springer, Berlin Heidelberg, 21–28.CrossRef

10.

Fisch, D., Gruber, T., Sick, B. (2011) SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis. IEEE Transactions on Knowledge and Data Engineering: 23 (5), 774–787.

11.

Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), 179–188.

12.

Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On clustering validation techniques. Journal of Intelligent Information Systems, 17(2–3), pp. 107–145.CrossRefMATH

13.

Hastie, T., Tibsihrani, R., Friedman, J. (2009) The Elements of Statistical Learning Data Mining, Inference, and Prediction. Second Edition. Springer-Verlag New York.

14.

Hubert, L., Arabie, P. (1985) Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRefMATH

15.

Kacprzyk, J., Zadrożny, S. (2013) Comprehensiveness of Linguistic Data Summaries: A Crucial Role of Protoforms. In: Christian Moewes and Andreas Nürnberger (Eds.): Computational Intelligence in Intelligent Data Analysis. Springer-Verlag, Berlin, Heidelberg, 207–221.CrossRef

16.

Kaufman, L., Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

17.

Lance, G.N.,Williams. W.T. (1966) A General Theory of Classificatory Sorting Strategies. 1. Hierarchical systems. Computer Journal, 9, 373–380.

18.

Maechler, Martin, et al. (2015) “cluster: cluster analysis extended Rousseeuw et al.” R package, version 2.0.3.

19.

Michalski, R. (1983) A theory and methodology of inductive learning. Artificial Intelligence: 20(2), 111–161.

20.

Miyamoto, S. (2014) Classification Rules in Methods of Clustering (featured article). IEEE Intelligent Informatics Bulletin, 15(1), 15–21.MathSciNet

21.

Miyamoto, S., Ichihashi, H., Honda, K. (2008) Algorithms for Fuzzy Clustering: Methods in c-Means Clustering with Applications. Springer-Verlag, Berlin Heidelberg, Studies in Fuzziness and Soft Computing 229.MATH

22.

Mullen, K.M, Ardia, D., Gil, D., Windover, D., Cline, J. (2011) DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1–26.CrossRef

23.

Pryke, A., Beale, R. (2004) Interactive Comprehensible Data Mining. In: Y. Cai (ed.): Ambient Intelligence for Scientific Discovery, Springer, LNCS 3345, 48–65.

24.

R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

25.

Ríos, S.A., Velásquez J.D. (2011) Finding Representative Web Pages Based on a SOM and a Reverse Cluster Analysis. International Journal on Artificial Intelligence Tools 20(1) 93–118.

26.

Stańczak, J. (2003) Biologically inspired methods for control of evolutionary algorithms. Control and Cybernetics, 32 (2), 411–433.MATH

27.

Storn, R., Price, K. (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.MathSciNetCrossRefMATH

28.

Torra, V., Endo, Y., Miyamoto, S. (2011) Computationally intensive parameter selection for clustering algorithms: The case of fuzzy \(c\)-means with tolerance. International Journal of Intelligent Systems, 26 (4), 313–322.CrossRef

29.

Zhou, Z.H. (2005) Comprehensibility of data mining algorithms. In: J. Wang (ed.): Encyclopedia of Data Warehousing and Mining, IGI Global: Hershey, 190–195.

Title: Using a Reverse Engineering Type Paradigm in Clustering. An Evolutionary Programming Based Approach
Authors: Jan W. Owsiński
Janusz Kacprzyk
Karol Opara
Jarosław Stańczak
Sławomir Zadrożny
Publisher: Springer International Publishing
Book: Fuzzy Sets, Rough Sets, Multisets and Clustering
Print ISBN: 978-3-319-47556-1

Electronic ISBN: 978-3-319-47557-8

Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-47557-8_9

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner