Published in: Cluster Computing 3/2019

27-11-2017

Euclidean space based hierarchical clusterers combinations: an application to software clustering

Authors: Rashid Naseem, Mustafa Mat Deris, Onaiza Maqbool, Sara Shahzad


Abstract

Hierarchical clustering groups similar entities on the basis of a similarity (or distance) measure and produces a tree-like structure called a dendrogram. A dendrogram represents clusters in a nested manner: at each step an entity either forms a new cluster or merges into an existing one. Hierarchical clustering has many applications, so researchers have worked to improve hierarchical clustering approaches. One approach that has received attention combines clustering results: different hierarchical clustering algorithms produce different dendrograms, and their combination has produced more promising results than individual hierarchical clustering. This paper proposes a hierarchical clustering combination (HCC) approach that uses the different types of structural features present in the dendrogram. First, the dendrograms are represented in a 4+N dimensional Euclidean space (4+NDES), where 4 is the number of extracted features and N is the number of further features by which the representation can be extended; this yields vector matrices. 4+NDES is a structural representation of the dendrogram that contains not only the relative features but also the absolute features of the entities in the dendrogram. The vector matrices are then aggregated, and the distance between each pair of vectors is calculated using the Euclidean distance measure. The final hierarchy is obtained using a recovery tool such as an individual hierarchical clustering algorithm. 4+NDES-HCC utilizes the structural content of the dendrogram and has the flexibility to handle an increasing number of features. The proposed approach is tested on software clustering, which plays an important role in the maintenance of software systems. The experimental results and a comparative analysis with existing approaches show the effectiveness of HCC for software clustering.
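The pipeline described in the abstract (represent each dendrogram as per-entity vectors, aggregate, compute Euclidean distances, recover a final hierarchy) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the four extracted features, so the two features used here (leaf depth and parent-cluster size) are hypothetical stand-ins, the element-wise sum is an assumed aggregation rule, and single linkage is an assumed choice of recovery tool. All function names are invented for this sketch.

```python
import math
from itertools import combinations


def count_leaves(tree):
    """Number of entities under a node (a leaf is any non-tuple value)."""
    if not isinstance(tree, tuple):
        return 1
    return sum(count_leaves(child) for child in tree)


def leaf_features(tree, depth=0, out=None):
    """Map each entity to (depth, size of its parent cluster).

    Both features are hypothetical stand-ins for the paper's features.
    """
    if out is None:
        out = {}
    size = count_leaves(tree)
    for child in tree:
        if isinstance(child, tuple):
            leaf_features(child, depth + 1, out)
        else:
            out[child] = (float(depth + 1), float(size))
    return out


def aggregate(feature_maps):
    """Element-wise sum of the per-dendrogram vectors (assumed rule)."""
    entities = sorted(feature_maps[0])
    return {
        e: tuple(sum(vals) for vals in zip(*(fm[e] for fm in feature_maps)))
        for e in entities
    }


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linkage(a, b, vectors):
    """Single-linkage distance between two sets of entities."""
    return min(euclidean(vectors[x], vectors[y]) for x in a for y in b)


def recover_hierarchy(vectors):
    """Agglomerative clustering; returns the hierarchy as nested tuples."""
    clusters = {frozenset([e]): e for e in vectors}
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], vectors))
        clusters[a | b] = (clusters.pop(a), clusters.pop(b))
    return next(iter(clusters.values()))


# Two toy dendrograms over the same four entities, as nested tuples.
d1 = ((("t", "u"), "v"), "w")
d2 = (("t", "u"), ("v", "w"))
combined = aggregate([leaf_features(d1), leaf_features(d2)])
final = recover_hierarchy(combined)
print(combined)
print(final)
```

Adding a further feature only lengthens each tuple returned by `leaf_features`; the aggregation and distance steps are unchanged, which loosely mirrors the extensibility that the abstract attributes to the 4+N representation.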


Footnotes
2
  • First Cluster: (t,u)
  • Second Cluster: ((t,u)v)
  • Third Cluster: (((t,u)v)w)
  • Fourth Cluster: ((((t,u)v)w)x)
  • Fifth Cluster: (((((t,u)v)w)x)(y,z))
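
The five nested clusters listed in this footnote can be reproduced with a small sketch that builds a dendrogram as nested tuples, one merge at a time. The rendering rule (a comma only between two single entities) is inferred from the five strings above, and the function name `render` is hypothetical.

```python
def render(node):
    """Render a nested pair in the footnote's notation, e.g. ((t,u)v)."""
    if isinstance(node, str):
        return node
    left, right = node
    # A comma appears only when both members are single entities.
    sep = "," if isinstance(left, str) and isinstance(right, str) else ""
    return "(" + render(left) + sep + render(right) + ")"


steps = []
cluster = ("t", "u")                 # first cluster
steps.append(render(cluster))
for entity in ("v", "w", "x"):       # second to fourth clusters
    cluster = (cluster, entity)
    steps.append(render(cluster))
cluster = (cluster, ("y", "z"))      # fifth cluster joins the pair (y,z)
steps.append(render(cluster))

for number, text in enumerate(steps, start=1):
    print(number, text)
```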
 
6
The number of combinations can be calculated using the following formula: \(\binom{n}{k} = {}^{n}C_{k} = \frac{n!}{k!\,(n-k)!}\), where n is the number of clusterers and k is the number of clusterers chosen at a time.
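
As a quick check of the formula, the sketch below evaluates it directly and compares the result with an explicit enumeration. The clusterer names `A` to `D` are placeholders; the footnote does not say which clusterers are combined.

```python
import math
from itertools import combinations


def n_choose_k(n, k):
    """Number of ways to choose k items from n, via n! / (k! (n-k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


clusterers = ["A", "B", "C", "D"]    # placeholder names, n = 4
pairs = list(combinations(clusterers, 2))

print(n_choose_k(len(clusterers), 2))  # count from the formula
print(len(pairs))                      # count from enumeration
```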
 
Metadata
Title
Euclidean space based hierarchical clusterers combinations: an application to software clustering
Authors
Rashid Naseem
Mustafa Mat Deris
Onaiza Maqbool
Sara Shahzad
Publication date
27-11-2017
Publisher
Springer US
Published in
Cluster Computing, Special Issue 3/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1408-0
