Article

An empirical study of the robustness of two module clustering fitness functions

Authors:
Mark Harman, King's College, Strand, London, UK
Stephen Swift, Brunel University, Uxbridge, Middlesex, UK
Kiarash Mahdavi, King's College, Strand, London, UK

GECCO '05: Proceedings of the 7th annual conference on Genetic and evolutionary computation, June 2005, Pages 1029–1036. https://doi.org/10.1145/1068009.1068184

Published: 25 June 2005


ABSTRACT

Two of the attractions of search-based software engineering (SBSE) derive from the nature of the fitness functions used to guide the search. These have proved to be highly robust (for a variety of different search algorithms) and have yielded insight into the nature of the search space itself, shedding light upon the software engineering problem in hand.

This paper aims to exploit these two benefits of SBSE in the context of search-based module clustering. The paper presents empirical results which compare the robustness of two fitness functions used for software module clustering: one, MQ, is used exclusively for module clustering; the other, EVM, is a clustering fitness function previously applied to time series and gene expression data.

The results show that both metrics are relatively robust in the presence of noise, with EVM being the more robust of the two. The results may also yield some interesting insights into the nature of software graphs.
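The abstract names MQ and EVM without defining them. As a rough illustration only, the sketch below assumes the definitions commonly given in the module clustering literature: a TurboMQ-style MQ, in which each cluster contributes 2·intra / (2·intra + inter), and an EVM that scores +1 for every connected pair inside a cluster and -1 for every unconnected pair. The function names `mq` and `evm`, the unweighted edge-list input, and the example graph are hypothetical choices for this sketch, not taken from the paper.

```python
from itertools import combinations


def mq(edges, clusters):
    """TurboMQ-style modularization quality (assumed definition).

    edges:    list of (u, v) pairs, each edge listed once, unweighted
    clusters: list of sets of nodes that partition the graph
    """
    label = {node: i for i, cluster in enumerate(clusters) for node in cluster}
    intra = [0] * len(clusters)  # edges with both ends in the cluster
    inter = [0] * len(clusters)  # edges with exactly one end in the cluster
    for u, v in edges:
        if label[u] == label[v]:
            intra[label[u]] += 1
        else:
            inter[label[u]] += 1
            inter[label[v]] += 1
    # A cluster with no internal edges contributes 0 to the total.
    return sum(2 * i / (2 * i + e) for i, e in zip(intra, inter) if i > 0)


def evm(edges, clusters):
    """EVM (assumed definition): +1 per connected pair inside a cluster,
    -1 per unconnected pair inside a cluster."""
    edge_set = {frozenset(edge) for edge in edges}
    score = 0
    for cluster in clusters:
        for u, v in combinations(sorted(cluster), 2):
            score += 1 if frozenset((u, v)) in edge_set else -1
    return score


# Hypothetical five-node graph: a triangle a-b-c joined to the pair d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
clusters = [{"a", "b", "c"}, {"d", "e"}]

print("MQ :", mq(edges, clusters))
print("EVM:", evm(edges, clusters))
```

Under these assumptions, a robustness experiment of the kind the abstract describes could perturb `edges` with noise and check how far the best-scoring clustering moves under each function.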


Index Terms

An empirical study of the robustness of two module clustering fitness functions
1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software reverse engineering



Published in
GECCO '05: Proceedings of the 7th annual conference on Genetic and evolutionary computation
June 2005
2272 pages
ISBN: 1595930108
DOI: 10.1145/1068009
Editor: Hans-Georg Beyer, Vorarlberg University of Applied Sciences, Austria
General Chair: Una-May O'Reilly, CSAIL, MIT
Copyright © 2005 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
clustering
fitness functions
modularization
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%