Published in: Software Quality Journal 2/2018

07-11-2016

Coherence of comments and method implementations: a dataset and an empirical investigation

Authors: Anna Corazza, Valerio Maggio, Giuseppe Scanniello

Abstract

In this paper, we present the results of a manual assessment of the coherence between the comments and the implementations of 3636 methods in three open source software applications implemented in Java (for one of these applications, we considered two subsequent versions). The results of this assessment have been collected in a dataset that we made publicly available on the Web. The dataset was created following a protocol that is detailed in this paper. We present that protocol to let researchers evaluate the quality of our dataset and to ease its possible future extension. Another contribution of this paper is a preliminary investigation of the effectiveness of a Vector Space Model (VSM) with the tf-idf weighting scheme in discriminating between coherent and non-coherent methods. We observed that lexical similarity alone is not sufficient for this distinction, while encouraging results were obtained by applying a Support Vector Machine (SVM) classifier to the whole vector space.
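
To make the notion of lexical similarity in a tf-idf vector space concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes that each comment and each method body is treated as one document, that identifiers are split on camelCase boundaries, and that a smoothed idf formula is used. The two comment/method pairs, the function names, and the idf variant are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lower-case, and keep alphabetic tokens."""
    # "getUserName" -> "get User Name" before lower-casing.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z]+", spaced.lower())


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical (comment, method body) pairs, invented for illustration.
pairs = [
    ("Returns the total price of all items in the cart.",
     "double total = 0; for (Item item : cart.getItems()) "
     "total += item.getPrice(); return total;"),
    ("Closes the database connection.",
     "int width = image.getWidth(); return width * scale;"),
]

# Every comment and every body is one document in the shared vector space.
documents = [text for pair in pairs for text in pair]
vectors = tfidf_vectors(documents)
similarities = [
    cosine(vectors[2 * i], vectors[2 * i + 1]) for i in range(len(pairs))
]

for (comment, _), score in zip(pairs, similarities):
    print(f"{score:.3f}  {comment}")
```

In this toy example the first pair shares several terms and the second shares none, so the scores differ clearly. The abstract reports that on real methods such a similarity score alone does not separate the two classes, which is why the tf-idf vectors themselves were passed to an SVM classifier; that step is not shown here.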

Footnotes
1 The comment immediately before or after the definition of a method, a class, an abstract class, and so on.
3 In our case, an annotator is a person who annotates software by associating coherence information with methods.
10 This approach is usually referred to as macroaveraging (Manning et al. 2008).
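
As an illustration of macroaveraging in the sense of Manning et al. (2008), the sketch below computes precision, recall, and F1 separately for each class and then takes their unweighted mean, so that each class counts equally regardless of its size. The labels and predictions are invented for illustration and are not taken from the dataset; the function name `macro_average` is likewise hypothetical.

```python
def macro_average(gold, predicted):
    """Return (macro precision, macro recall, macro F1) and per-class scores."""
    labels = sorted(set(gold) | set(predicted))
    per_class = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = (precision, recall, f1)
    # Macroaverage: unweighted mean of the per-class scores.
    macro = tuple(
        sum(scores[i] for scores in per_class.values()) / len(labels)
        for i in range(3)
    )
    return macro, per_class


# Hypothetical annotations: four coherent and two non-coherent methods.
gold = ["coherent"] * 4 + ["non-coherent"] * 2
predicted = ["coherent", "coherent", "coherent", "non-coherent",
             "non-coherent", "coherent"]

macro, per_class = macro_average(gold, predicted)
print(macro)
```

With these invented labels, the larger class scores 0.75 and the smaller class 0.5 on every measure, so each macroaveraged value is 0.625.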
 
Literature
Antoniol, G., Canfora, G., Casazza, G., & De Lucia, A. (2000). Information retrieval models for recovering traceability links between code and documentation. In Proceedings of the international conference on software maintenance (pp. 40–51): IEEE Computer Society.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.
Binkley, D., Lawrie, D., Pollock, L., Hill, E., & Vijay-Shanker, K. (2013). A dataset for evaluating identifier splitters, IEEE Computer Society.
Bishop, C. M. (2006). Pattern recognition and machine learning (information science and statistics), Springer-Verlag New York, Inc., Secaucus.
Campbell, I., & Yiming, Y. (2011). Learning with support vector machines, Morgan and Claypool.
Caprile, B., & Tonella, P. (2000). Restructuring program identifier names. In Proceedings of international conference on software maintenance (pp. 97–107): IEEE Computer Society.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.
Corazza, A., Di Martino, S., & Maggio, V. (2012). LINSEN: an efficient approach to split identifiers and expand abbreviations. In Proceedings of international conference on software maintenance (pp. 233–242): IEEE Computer Society.
Corazza, A., Di Martino, S., Maggio, V., & Scanniello, G. (2011). Investigating the use of lexical information for software system clustering. In Proceedings of European conference on software maintenance and reengineering (pp. 35–44): IEEE Computer Society.
Corazza, A., Maggio, V., & Scanniello, G. (2015). On the coherence between comments and implementations in source code. In Proceedings of EUROMICRO conference on software engineering and advanced applications (pp. 76–83): IEEE Computer Society.
de Souza, S. C. B., Anquetil, N., & de Oliveira, K. M. (2005). A study of the documentation essential to software maintenance. In Proceedings of the international conference on design of communication: documenting & designing for pervasive information (pp. 68–75): ACM.
DeLine, R., Khella, A., Czerwinski, M., & Robertson, G. (2005). Towards understanding programs through wear-based filtering. In Proceedings of the 2005 ACM symposium on Software visualization, SoftVis ’05 (pp. 183–192): ACM.
Dit, B., Revelle, M., Gethers, M., & Poshyvanyk, D. (2013). Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process, 25(1), 53–95.
Fluri, B., Wursch, M., & Gall, H. (2007). Do code and comments co-evolve? on the relation between source code and comment changes. In Proceedings of the working conference on reverse engineering (pp. 70–79): IEEE Computer Society.
Fowler, M. (1999). Refactoring: improving the design of existing code. Boston: Addison-Wesley Longman Publishing Co., Inc.
Freund, R. J., & Wilson, W. J. (2003). Statistical methods, 2nd edn. Academic Press.
Jiang, Z. M., & Hassan, A. E. (2006). Examining the evolution of code comments in postgresql. In Diehl, S., Gall, H., & Hassan, A. E. (Eds.) Proceedings of mining software repositories (pp. 179–180): ACM.
Keyes, J. (2002). Software engineering handbook: Taylor & Francis.
Kuhn, A., Ducasse, S., & Gîrba, T. (2007). Semantic clustering identifying topics in source code. Information & Software Technology, 49(3), 230–243.
LaToza, T. D., Venolia, G., & DeLine, R. (2006). Maintaining mental models: a study of developer work habits. In Proceedings of the 28th international conference on software engineering, ICSE ’06 (pp. 492–501): ACM.
Lawrie, D., Binkley, D., & Morrell, C. (2010). Normalizing source code vocabulary. In Proceedings of working conference on reverse engineering (pp. 3–12): IEEE Computer Society.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York: Cambridge University Press.
McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., & Xie, Q. (2012). Exemplar: a source code search engine for finding highly relevant applications. IEEE Transactions on Software Engineering, 38(5), 1069–1087.
Robillard, M. P., Coelho, W., & Murphy, G. C. (2004). How effective developers investigate source code: an exploratory study. IEEE Transactions on Software Engineering, 30(12), 889–903.
Roehm, T., Tiarks, R., Koschke, R., & Maalej, W. (2012). How do professional developers comprehend software? In Proceedings of the 2012 international conference on software engineering, ICSE 2012 (pp. 255–265). Piscataway, NJ, USA: IEEE Press.
Salviulo, F., & Scanniello, G. (2014). Dealing with identifiers and comments in source code comprehension and maintenance: Results from an ethnographically-informed study with students and professionals. In Proceedings of International Conference on Evaluation and Assessment in Software Engineering (pp. 423–432): ACM Press.
Scanniello, G., Marcus, A., & Pascale, D. (2015). Link analysis algorithms for static concept location: an empirical assessment. Empirical Software Engineering, 20(6), 1666–1720.
Singer, J., Lethbridge, T., Vinson, N., & Anquetil, N. (1997). An examination of software engineering work practices. In Proceedings of the conference of the centre for advanced studies on collaborative research (p. 21): IBM Press.
Soloway, E., & Ehrlich, K. (1984). Empirical studies of programming knowledge. IEEE Transactions on Software Engineering, 10(5), 595–609.
Steidl, D., Hummel, B., & Jürgens, E. (2013). Quality analysis of source code comments. In Proceedings of international conference on program comprehension (pp. 83–92): IEEE Computer Society.
Tan, L., Yuan, D., Krishna, G., & Zhou, Y. (2007). iComment: Bugs or bad comments? ACM.
Tan, S. H., Marinov, D., Tan, L., & Leavens, G. T. (2012). @tcomment: Testing javadoc comments to detect comment-code inconsistencies. In Proceedings of international conference on software testing (pp. 260–269): IEEE Computer Society.
Van Der Maaten, L. (2014). Accelerating t-sne using tree-based algorithms. Journal of Machine Learning Research, 15(1), 3221–3245.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., & Wesslén, A. (2012). Experimentation in software engineering. Computer science: Springer.
Metadata
Title: Coherence of comments and method implementations: a dataset and an empirical investigation
Authors: Anna Corazza, Valerio Maggio, Giuseppe Scanniello
Publication date: 07-11-2016
Publisher: Springer US
Published in: Software Quality Journal, Issue 2/2018
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI: https://doi.org/10.1007/s11219-016-9347-1
