Published in: Cluster Computing 3/2019

19-02-2018

Feature selection for software effort estimation with localized neighborhood mutual information

Authors: Qin Liu, Jiakai Xiao, Hongming Zhu

Abstract

Feature selection is usually applied before case-based reasoning (CBR) is used for software effort estimation (SEE). Unfortunately, most feature selection methods treat CBR as a black box, so there is no guarantee that CBR is appropriate for the selected feature subset. The key to solving this problem is to measure how well the CBR assumption holds for a given feature set. In this paper, a measure called localized neighborhood mutual information (LNI) is proposed for this purpose, and a greedy method called LNI-based feature selection (LFS) is designed around it. Experiments with leave-one-out cross-validation (LOOCV) on 6 benchmark datasets demonstrate that (1) CBR makes effective estimates with the LFS-selected subset compared with a randomized baseline method and that, compared with three representative feature selection methods, (2) LFS achieves the best MAR value on 3 out of 6 datasets with a 14% average improvement and (3) LFS achieves the best MMRE on 5 out of 6 datasets with a 24% average improvement.
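
The evaluation protocol named in the abstract can be illustrated with a minimal sketch. It assumes that CBR is approximated by a k-nearest-neighbor estimator with Euclidean distance, that MAR is the mean absolute residual, and that MMRE is the mean magnitude of relative error; the function names and the toy data are hypothetical and are not taken from the paper. LNI and LFS are not defined in the abstract, so they are not implemented here. The sketch only shows how a candidate feature subset could be scored with LOOCV.

```python
import math

def cbr_estimate(train_X, train_y, query, k=1):
    """Estimate effort as the mean effort of the k nearest past cases.

    Assumption: Euclidean distance on raw feature values (no scaling).
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def loocv_errors(X, y, features, k=1):
    """Leave-one-out cross-validation on the given feature indices.

    Returns (MAR, MMRE) for a k-NN stand-in for CBR.
    """
    abs_res, rel_err = [], []
    for i in range(len(X)):
        # Hold out case i; project the remaining cases onto the subset.
        train_X = [[row[f] for f in features]
                   for j, row in enumerate(X) if j != i]
        train_y = [v for j, v in enumerate(y) if j != i]
        query = [X[i][f] for f in features]
        pred = cbr_estimate(train_X, train_y, query, k)
        abs_res.append(abs(y[i] - pred))
        rel_err.append(abs(y[i] - pred) / y[i])
    n = len(X)
    return sum(abs_res) / n, sum(rel_err) / n

# Hypothetical toy data: feature 0 tracks effort, feature 1 is noise.
X = [[1.0, 50.0], [2.0, 10.0], [4.0, 40.0], [8.0, 20.0]]
y = [10.0, 20.0, 40.0, 80.0]

mar_f0, mmre_f0 = loocv_errors(X, y, features=[0])
mar_all, mmre_all = loocv_errors(X, y, features=[0, 1])
print(f"feature 0 only: MAR={mar_f0:.3f}, MMRE={mmre_f0:.3f}")
print(f"both features:  MAR={mar_all:.3f}, MMRE={mmre_all:.3f}")
```

On this toy data, dropping the noisy feature lowers both error measures, which is the kind of effect a feature selection method for CBR aims to produce.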

Metadata

Title: Feature selection for software effort estimation with localized neighborhood mutual information
Authors: Qin Liu, Jiakai Xiao, Hongming Zhu
Publication date: 19-02-2018
Publisher: Springer US
Published in: Cluster Computing, Special Issue 3/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-018-1884-x