Empirical Software Engineering

Published: 19.02.2020

Deriving a usage-independent software quality metric

Written by: Tapajit Dey, Audris Mockus

Published in: Empirical Software Engineering | Issue 2/2020

Abstract

Context

The extent of post-release use of software affects the number of faults observed, which biases quality metrics and adversely affects the decisions based on them. The proprietary nature of usage data has limited deeper exploration of this subject in the past.

Objective

To determine how software faults and software use are related and how, based on that, an accurate quality measure can be designed.

Method

Via Google Analytics, we measure new users, usage intensity, usage frequency, exceptions, and release date and duration for complex proprietary mobile applications for Android and iOS. We use Bayesian network and random forest models to explain the interrelationships and to derive a usage-independent release quality measure. To increase external validity, we also investigate the interrelationships among various code complexity measures, usage (downloads), and number of issues for 520 NPM packages. We derived a usage-independent quality measure from these analyses and applied it to 4430 popular NPM packages to construct timelines comparing the perceived quality (number of issues) with our derived measure of quality over the lifetime of these packages.

Results

We found the number of new users to be the primary factor determining the number of exceptions, and found no direct link between the intensity or frequency of software usage and software faults. Crashes grew as the number of new users raised to a power of 1.02-1.04 for the Android app and to a power of 1.6 for the iOS app. Release quality expressed as crashes per user was independent of other usage-related predictors, and thus serves as a usage-independent measure of software quality. Usage also affected quality in NPM, where downloads were strongly associated with the number of issues, even after taking the code complexity measures into account. Unlike in the mobile case, where exceptions per user decrease over time, the number of issues per download increases for 45.8% of the NPM packages.
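The power relationship described above can be illustrated with a minimal sketch. It assumes a model of the form crashes = a · users^b and estimates b by ordinary least squares on the log-transformed values; this is an illustrative stand-in, not the Bayesian network or random forest analysis used in the paper. The function names and the release data are hypothetical.

```python
import math

def fit_power_law(users, crashes):
    """Fit crashes = a * users**b by least squares in log-log space.

    Returns the pair (a, b). All inputs must be strictly positive.
    """
    xs = [math.log(u) for u in users]
    ys = [math.log(c) for c in crashes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = exponent of the power law
    a = math.exp(mean_y - b * mean_x)   # intercept, mapped back from log space
    return a, b

def crashes_per_user(users, crashes):
    """Crashes per new user for each release (the proposed quality measure)."""
    return [c / u for u, c in zip(users, crashes)]

# Hypothetical releases: new-user counts and crash counts that follow an
# exact power law with exponent 1.03 (made-up numbers, not from the paper).
new_users = [100, 200, 400, 800, 1600]
crashes = [0.05 * u ** 1.03 for u in new_users]

a, b = fit_power_law(new_users, crashes)
print(f"estimated scale a = {a:.4f}, exponent b = {b:.4f}")
print(crashes_per_user(new_users, crashes))
```

When the fitted exponent is close to 1, as reported for the Android app, crashes per user stays roughly constant across releases with very different user counts, which is what makes the ratio a plausible usage-independent measure.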

Conclusions

We expect that our results and our proposed quality measure will help gauge the release quality of software more accurately and inspire further research in this area.

In fact, we mean to measure one aspect of the quality of software

https://support.avaya.com/products/P1574/avaya-equinox-for-android

https://support.avaya.com/products/P0949/avaya-onex-mobile-sip-for-ios

https://www.npmjs.com/package/escomplex#result-format

A generative model specifies a joint probability distribution over all observed variables, whereas a discriminative model (such as those obtained from regression or decision trees) models only the target variable(s) conditional on the predictor variables. A discriminative model therefore allows sampling only of the target variables given the predictors, while a generative model can be used, for example, to simulate (i.e., generate) values of any variable in the model. Generative models are thus essential for gaining an understanding of the underlying mechanics of a system.
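A minimal sketch of this distinction, assuming a two-variable discrete model (user level and crash level) with made-up probabilities. The variable names and numbers are hypothetical and do not come from the paper. Because the joint distribution is specified, both variables can be sampled and the model can also be inverted; a discriminative model would supply only the conditional table.

```python
import random

# Hypothetical probabilities, for illustration only.
P_USERS = {"low": 0.6, "high": 0.4}                  # P(users)
P_CRASHES_GIVEN_USERS = {                            # P(crashes | users)
    "low": {"few": 0.9, "many": 0.1},
    "high": {"few": 0.3, "many": 0.7},
}

def sample(dist, rng):
    """Draw one value from a discrete distribution given as {value: prob}."""
    r = rng.random()
    cumulative = 0.0
    for value, prob in dist.items():
        cumulative += prob
        if r < cumulative:
            return value
    return value  # guard against floating-point rounding at the upper end

def sample_joint(rng):
    """Generative use: simulate every variable in the model."""
    users = sample(P_USERS, rng)
    crashes = sample(P_CRASHES_GIVEN_USERS[users], rng)
    return users, crashes

def posterior_users(crashes):
    """Generative use: invert the model with Bayes' rule, P(users | crashes).

    A purely discriminative model of crashes given users cannot answer
    this query, because it does not model P(users).
    """
    joint = {u: P_USERS[u] * P_CRASHES_GIVEN_USERS[u][crashes] for u in P_USERS}
    total = sum(joint.values())
    return {u: p / total for u, p in joint.items()}

rng = random.Random(0)
print([sample_joint(rng) for _ in range(3)])
print(posterior_users("many"))
```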

Hartemink’s pairwise mutual information method (Hartemink 2001).

One extra / missing / reversed edge

Abdalkareem R, Nourry O, Wehaibi S, Mujahid S, Shihab E (2017) Why do developers use trivial packages? An empirical case study on npm. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 385–395

Hauser A, Bühlmann P (2012) Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. J Mach Learn Res 13:2409–2464. http://jmlr.org/papers/v13/hauser12a.html

Amreen S, Bichescu B, Bradley R, Dey T, Ma Y, Mockus A, Mousavi S, Zaretzki R (2019) A methodology for measuring FLOSS ecosystems. Springer, Singapore

Balov N, Salzman P (2016) catnet: categorical Bayesian network inference. https://CRAN.R-project.org/package=catnet. R package version 1.15.0

Boehm BW, Brown JR, Lipow M (1976) Quantitative evaluation of software quality. In: Proceedings of the 2nd international conference on software engineering. IEEE Computer Society Press, pp 592–605

Borges H, Hora A, Valente MT (2016) Understanding the factors that impact the popularity of github repositories. In: 2016 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 334–344

Bottcher SG, Dethlefsen C (2013) deal: learning Bayesian networks with mixed variables. https://CRAN.R-project.org/package=deal. R package version 1.2-37

Briand LC, Wüst J, Daly JW, Porter DV (2000) Exploring the relationships between design measures and software quality in object-oriented systems. J Syst Software 51(3):245–273

Chatzidimitriou KC, Papamichail MD, Diamantopoulos T, Tsapanos M, Symeonidis AL (2018) npm-miner: an infrastructure for measuring the quality of the npm registry. In: Proceedings of the 15th international conference on mining software repositories. ACM, pp 42–45

Chickering DM (1996) Learning bayesian networks is np-complete. Learning from data: Artificial intelligence and statistics V 112:121–130

Chlebus BS, Nguyen SH (1998) On finding optimal discretizations for two attributes. In: International conference on rough sets and current trends in computing. Springer, pp 537–544

Dalal SR, Mallows CL (1988) When should one stop testing software? J Am Stat Assoc 83(403):872–879

David (2014) https://developers.slashdot.org/story/17/01/14/0222245/nodejss-npm-is-now-the-largest-package-registry-in-the-world

Dey T, Mockus A (2018a) Are software dependency supply chain metrics useful in predicting change of popularity of npm packages? In: Proceedings of the 14th international conference on predictive models and data analytics in software engineering. ACM, pp 66–69

Dey T, Mockus A (2018b) Modeling relationship between post-release faults and usage in mobile software. In: Proceedings of the 14th international conference on predictive models and data analytics in software engineering. ACM, pp 56–65

Dey T, Ma Y, Mockus A (2019) Patterns of effort contribution and demand and user classification based on participation patterns in npm ecosystem. In: Proceedings of the fifteenth international conference on predictive models and data analytics in software engineering. ACM, pp 36–45

Duc AN, Mockus A, Hackbarth R, Palframan J (2014) Forking and coordination in multi-platform development: a case study. In: ESEM, Torino, pp 59:1–59:10. http://dl.acm.org/authorize?N14215

Fenton N, Neil M (1999) A critique of software defect prediction models. IEEE Trans Softw Eng 25(5):675–689

Fenton N, Krause P, Neil M (2002) Software measurement: uncertainty and causal modeling. IEEE Softw 19(4):116–122

Fenton N, Neil M, Marsh W, Hearty P, Marquez D, Krause P, Mishra R (2007) Predicting software defects in varying development lifecycles using Bayesian nets. Inf Softw Technol 49(1):32–43

Fenton N, Neil M, Marquez D (2008) Using bayesian networks to predict software defects and reliability. Proceedings of the Institution of Mechanical Engineers Part O: Journal of Risk and Reliability 222(4):701–712

Friedman N, Goldszmidt M, Wyner A (1999) Data analysis with Bayesian networks: a bootstrap approach. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc, pp 196–205

Geron T (2012) Do ios apps crash more than android apps? A data dive. https://www.forbes.com/sites/tomiogeron/2012/02/02/does-ios-crash-more-than-android-a-data-dive

Hackbarth R, Mockus A, Palframan J, Sethi R (2016a) Customer quality improvement of software systems. IEEE Softw 33(4):40–45. papers/cqm2.pdf

Hackbarth R, Mockus A, Palframan J, Sethi R (2016b) Improving software quality as customers perceive it. IEEE Softw 33(4):40–45

Hahsler M, Chelluboina S, Hornik K, Buchta C (2011) The arules r-package ecosystem: analyzing interesting patterns from large transaction datasets. J Mach Learn Res 12:1977–1981. http://jmlr.csail.mit.edu/papers/v12/hahsler11a.html

Hartemink AJ (2001) Principled computational methods for the validation and discovery of genetic regulatory networks. Ph.D. thesis Massachusetts Institute of Technology

Herbsleb JD, Mockus A (2003) An empirical study of speed and communication in globally-distributed software development. IEEE Trans Softw Eng 29(6):481–494. papers/delay.pdf

Jones C (2011) Software quality in 2011: a survey of the state of the art. http://sqgne.org/presentations/2011-12/Jones-Sep-2011.pdf. President, Namcook Analytics LLC, www.Namcook.com Email: Capers.Jones3@GMAILcom

Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P (2012) Causal inference using graphical models with the R package pcalg. J Stat Softw 47(11):1–26. http://www.jstatsoft.org/v47/i11/

Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773. http://doi.ieeecomputersociety.org/10.1109/TSE.2012.70

Kan SH (2002) Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co. Inc

Kenny GQ (1993) Estimating defects in commercial software during operational use. IEEE Trans Reliab 42(1):107–115

Khomh F, Dhaliwal T, Zou Y, Adams B (2012) Do faster releases improve software quality?: An empirical case study of mozilla firefox. In: Proceedings of the 9th IEEE Working conference on mining software repositories. IEEE Press, pp 179–188

Kitchenham B, Pfleeger SL (1996) Software quality: the elusive target [special issues section]. IEEE Softw 13(1):12–21

Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT press

Kononenko O, Baysal O, Guerrouj L, Cao Y, Godfrey MW (2015) Investigating code review quality: do people and participation matter? In: 2015 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 111–120

Li PL, Kivett R, Zhan Z, Jeon Se, Nagappan N, Murphy B, Ko AJ (2011) Characterizing the differences between pre-and post-release versions of software. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 716–725

Scutari M (2010) Learning Bayesian networks with the bnlearn r package. J Stat Softw 35(3):1–22. http://www.jstatsoft.org/v35/i03/

McIntosh S, Kamei Y, Adams B, Hassan AE (2014) The impact of code review coverage and code review participation on software quality: a case study of the qt, vtk, and itk projects. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 192–201

McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189. https://doi.org/10.1007/s10664-015-9381-9

Mockus A (2007) Software support tools and experimental work. In: Basili V et al. (eds) Empirical software engineering issues: critical assessments and future directions, vol LNCS 4336. Springer, pp 91–99. papers/SSTaEW.pdf

Mockus A (2013) Law of minor release: more bugs implies better software quality. http://mockus.org/papers/IWPSE13.pdf. International Workshop on Principles of Software Evolution, St Petersburg, Russia, Aug 18-19 2013. Keynote

Mockus A (2014) Engineering big data solutions. In: ICSE’14 FOSE, pp 85–99. http://dl.acm.org/authorize?N14216

Mockus A, Weiss DM (2000) Predicting risk of software changes. Bell Labs Tech J 5(2):169–180. papers/bltj13.pdf

Mockus A, Weiss D (2008a) Interval quality: relating customer-perceived quality to process quality. In: 2008 International conference on software engineering. ACM Press, Leipzig, pp 733–740. http://dl.acm.org/authorize?063910

Mockus A, Weiss D (2008b) Interval quality: relating customer-perceived quality to process quality. In: Proceedings of the 30th international conference on software engineering. ACM, pp 723–732

Mockus A, Zhang P, Li P (2005) Drivers for customer perceived software quality. In: ICSE 2005. ACM Press, St Louis, pp 225–233. http://dl.acm.org/authorize?860140

Mockus A, Zhang P, Li PL (2005) Predictors of customer perceived software quality. In: 27th International conference on software engineering, 2005. ICSE 2005. Proceedings. IEEE, pp 225–233

Mockus A, Hackbarth R, Palframan J (2013) Risky files: an approach to focus quality improvement effort. In: 9th Joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, pp 691–694. http://dl.acm.org/authorize?6845890

Motulsky H When is r squared negative? Cross validated. https://stats.stackexchange.com/q/12991 (version: 2014-05-06)

Nagarajan R, Scutari M, Lèbre S (2013) Bayesian networks in r, vol 122. Springer, pp 125–127

Neil M, Fenton N (1996) Predicting software quality using Bayesian belief networks. In: Proceedings of the 21st annual software engineering workshop. NASA Goddard Space Flight Centre, pp 217–230

Okutan A, Yıldız OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181

Pai GJ, Dugan JB (2007) Empirical analysis of software fault content and fault proneness using Bayesian methods. IEEE Trans Softw Eng 33(10):675–686

Pearl J (2011) Bayesian networks. Department of Statistics UCLA

Pendharkar PC, Subramanian GH, Rodger JA (2005) A probabilistic model for predicting software development effort. IEEE Trans Softw Eng 31(7):615–624

Perez A, Larranaga P, Inza I (2006) Supervised classification with conditional gaussian networks: increasing the structure complexity from naive Bayes. Int J Approx Reason 43(1):1–25

R Core Team (2017) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Rigby PC, Bird C (2013) Convergent contemporary software peer review practices. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 202–212

Rotella P, Chulani S (2011) Implementing quality metrics and goals at the corporate level. In: Proceedings of the 8th working conference on mining software repositories. ACM, pp 113–122

Rubin J, Rinard M (2016) The challenges of staying together while moving fast: an exploratory study. In: Proceedings of the 38th international conference on software engineering. ACM, pp 982–993

Schulmeyer GG, McManus JI (1992) Handbook of software quality assurance. Van Nostrand Reinhold Co

Scutari M (2013) Learning Bayesian networks in r, an example in systems biology. http://www.bnlearn.com/about/slides/slides-useRconf13.pdf

Scutari M, Strimmer K (2010) Introduction to graphical modelling. arXiv:1005.1036

Shmueli G (2010) To explain or to predict? Stat Sci, 289–310

Sober E (2002) Instrumentalism, parsimony, and the akaike framework. Philos Sci 69(S3):S112–S123

Stamelos I, Angelis L, Dimou P, Sakellaris E (2003) On the use of Bayesian belief networks for the prediction of software productivity. Inf Softw Technol 45(1):51–60

Subramanyam R, Krishnan MS (2003) Empirical analysis of ck metrics for object-oriented design complexity: implications for software defects. IEEE Trans Softw Eng 29(4):297–310

Voss L (2014) Numeric precision matters: how npm download counts work. https://blog.npmjs.org/post/92574016600/numeric-precision-matters-how-npm-download-counts

Voss L (2018) The state of javascript frameworks, 2017. https://www.npmjs.com/npm/state-of-javascript-frameworks-2017-part-1

Wittern E, Suter P, Rajagopalan S (2016) A look at the dynamics of the javascript package ecosystem. In: 2016 IEEE/ACM 13th Working conference on mining software repositories (MSR). IEEE, pp 351–361

Yu P, Systa T, Muller H (2002) Predicting fault-proneness using oo metrics. an industrial case study. In: Sixth European conference on software maintenance and reengineering, 2002. Proceedings. IEEE, pp 99–107

Zerouali A, Constantinou E, Mens T, Robles G, González-Barahona J (2018) An empirical analysis of technical lag in npm package dependencies. In: International conference on software reuse. Springer, pp 95–110

Zhang F, Mockus A, Keivanloo I, Zou Y (2015) Towards building a universal defect prediction model with rank transformed predictors. Empir Softw Eng, 1–39

Zheng Q, Mockus A, Zhou M (2015) A method to identify and correct problematic software activity data: exploiting capacity constraints and data redundancies. In: ESEC/FSE’15. ACM, Bergamo, pp 637–648. http://dl.acm.org/authorize?N14200

Title: Deriving a usage-independent software quality metric
Written by: Tapajit Dey, Audris Mockus
Publication date: 19.02.2020
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 2/2020
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-019-09791-w
