Cluster Computing

Published: 19-02-2018

Feature selection for software effort estimation with localized neighborhood mutual information

Authors: Qin Liu, Jiakai Xiao, Hongming Zhu

Published in: Cluster Computing | Special Issue 3/2019

Abstract

Feature selection is usually applied before case-based reasoning (CBR) is used for software effort estimation (SEE). Unfortunately, most feature selection methods treat CBR as a black box, so there is no guarantee that CBR is appropriate for the selected feature subset. The key to solving this problem is to measure how well the CBR assumption holds for a given feature set. In this paper, a measure called localized neighborhood mutual information (LNI) is proposed for this purpose, and a greedy method called LNI-based feature selection (LFS) is designed to select features with it. Experiments with leave-one-out cross-validation (LOOCV) on 6 benchmark datasets demonstrate that (1) CBR makes effective estimates with the LFS-selected subset compared with a randomized baseline method; and, compared with three representative feature selection methods, (2) LFS achieves the best MAR on 3 out of 6 datasets with a 14% average improvement and (3) LFS achieves the best MMRE on 5 out of 6 datasets with a 24% average improvement.
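
The evaluation protocol named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LNI measure and the LFS procedure are not defined in the abstract and are not reproduced here. The sketch assumes that CBR is approximated by a k-nearest-neighbour estimator with Euclidean distance, that MAR is the mean absolute residual, and that MMRE is the mean magnitude of relative error, as these metrics are commonly defined in the effort estimation literature. The function names and the toy data are hypothetical.

```python
import math


def knn_estimate(train_X, train_y, query, k=1):
    """CBR-style estimate (assumption): mean effort of the k nearest past projects."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


def loocv_errors(X, y, features, k=1):
    """Leave-one-out cross-validation on the chosen feature indices.

    Returns (MAR, MMRE). Assumes every actual effort in y is non-zero.
    """
    projected = [[row[j] for j in features] for row in X]
    abs_residuals, relative_errors = [], []
    for i in range(len(projected)):
        # Hold out project i and estimate it from all remaining projects.
        train_X = projected[:i] + projected[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predicted = knn_estimate(train_X, train_y, projected[i], k)
        residual = abs(y[i] - predicted)
        abs_residuals.append(residual)
        relative_errors.append(residual / y[i])
    n = len(projected)
    return sum(abs_residuals) / n, sum(relative_errors) / n


# Hypothetical toy data: feature 0 tracks effort, feature 1 does not.
toy_X = [[1.0, 50.0], [2.0, 10.0], [3.0, 40.0], [4.0, 20.0]]
toy_y = [10.0, 20.0, 30.0, 40.0]

mar_f0, mmre_f0 = loocv_errors(toy_X, toy_y, features=[0])
mar_f1, mmre_f1 = loocv_errors(toy_X, toy_y, features=[1])
print("feature 0:", mar_f0, mmre_f0)
print("feature 1:", mar_f1, mmre_f1)
```

A feature selection method can be compared against alternatives by running this kind of loop once per candidate subset and comparing the resulting MAR and MMRE values, where lower is better for both.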

Title: Feature selection for software effort estimation with localized neighborhood mutual information
Authors: Qin Liu, Jiakai Xiao, Hongming Zhu
Publication date: 19-02-2018
Publisher: Springer US
Published in: Cluster Computing, Special Issue 3/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-018-1884-x
