Skip to main content
Erschienen in: Software Quality Journal 2/2018

07.11.2016

Coherence of comments and method implementations: a dataset and an empirical investigation

verfasst von: Anna Corazza, Valerio Maggio, Giuseppe Scanniello

Erschienen in: Software Quality Journal | Ausgabe 2/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we present the results of a manual assessment on the coherence between the comments and the implementation of 3636 methods in three open source software applications (for one of these applications, we considered two different subsequent versions) implemented in Java. The results of this assessment have been collected in a dataset we made publicly available on the Web. The creation of this dataset is based on a protocol that is detailed in this paper. We present that protocol to let researchers evaluate the goodness of our dataset and to ease its future possible extensions. Another contribution of this paper consists in preliminarily investigating on the effectiveness of adopting a Vector Space Model (VSM) with the tf-idf schema to discriminate coherent and non-coherent methods. We observed that the lexical similarity alone is not sufficient for this distinction, while encouraging results have been obtained by applying an Support Vector Machine (SVM) classifier on the whole vector space.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
1 The comment right before/after of the definition of a method, a class, abstract class and so on.
 
3
3 In our case, an annotator is a person that produces annotations to software associating coherence information to methods.
 
10
10 Such approach is usually referred to as macroaveraging (Manning et al. 2008).
 
Literatur
Zurück zum Zitat Antoniol, G., Canfora, G., Casazza, G., & De Lucia, A. (2000). Information retrieval models for recovering traceability links between code and documentation. In Proceedings of the international conference on software maintenance (pp. 40–51): IEEE Computer Society. Antoniol, G., Canfora, G., Casazza, G., & De Lucia, A. (2000). Information retrieval models for recovering traceability links between code and documentation. In Proceedings of the international conference on software maintenance (pp. 40–51): IEEE Computer Society.
Zurück zum Zitat Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.MathSciNetMATH Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.MathSciNetMATH
Zurück zum Zitat Binkley, D., Lawrie, D., Pollock, L., Hill, E., & Vijay-Shanker, K. (2013). A dataset for evaluating identifier splitters, IEEE Computer Society. Binkley, D., Lawrie, D., Pollock, L., Hill, E., & Vijay-Shanker, K. (2013). A dataset for evaluating identifier splitters, IEEE Computer Society.
Zurück zum Zitat Bishop, C. M. (2006). Pattern recognition and machine learning (information science and statistics), Springer-Verlag New York, Inc., Secaucus. Bishop, C. M. (2006). Pattern recognition and machine learning (information science and statistics), Springer-Verlag New York, Inc., Secaucus.
Zurück zum Zitat Campbell, I., & Yiming, Y. (2011). Learning with support vector machines, Morgan and Claypool. Campbell, I., & Yiming, Y. (2011). Learning with support vector machines, Morgan and Claypool.
Zurück zum Zitat Caprile, B., & Tonella, P. (2000). Restructuring program identifier names. In Proceedings of international conference on software maintenance (pp. 97–107): IEEE Computer Society. Caprile, B., & Tonella, P. (2000). Restructuring program identifier names. In Proceedings of international conference on software maintenance (pp. 97–107): IEEE Computer Society.
Zurück zum Zitat Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.CrossRef Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.CrossRef
Zurück zum Zitat Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.CrossRef Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.CrossRef
Zurück zum Zitat Corazza, A., Di Martino, S., & Maggio, V. (2012). LINSEN: an efficient approach to split identifiers and expand abbreviations. In Proceedings of international conference on software maintenance (pp. 233–242): IEEE Computer Society. Corazza, A., Di Martino, S., & Maggio, V. (2012). LINSEN: an efficient approach to split identifiers and expand abbreviations. In Proceedings of international conference on software maintenance (pp. 233–242): IEEE Computer Society.
Zurück zum Zitat Corazza, A., Di Martino, S., Maggio, V., & Scanniello, G. (2011). Investigating the use of lexical information for software system clustering. In Proceedings of European conference on software maintenance and reengineering (pp. 35–44): IEEE Computer Society. Corazza, A., Di Martino, S., Maggio, V., & Scanniello, G. (2011). Investigating the use of lexical information for software system clustering. In Proceedings of European conference on software maintenance and reengineering (pp. 35–44): IEEE Computer Society.
Zurück zum Zitat Corazza, A., Maggio, V., & Scanniello, G. (2015). On the coherence between comments and implementations in source code. In Proceedings of EUROMICRO conference on software engineering and advanced applications (pp. 76–83): IEEE Computer Society. Corazza, A., Maggio, V., & Scanniello, G. (2015). On the coherence between comments and implementations in source code. In Proceedings of EUROMICRO conference on software engineering and advanced applications (pp. 76–83): IEEE Computer Society.
Zurück zum Zitat de Souza, S. C. B., Anquetil, N., & de Oliveira, K. M. (2005). A study of the documentation essential to software maintenance. In Proceedings of the international conference on design of communication: documenting & designing for pervasive information (pp. 68–75): ACM. de Souza, S. C. B., Anquetil, N., & de Oliveira, K. M. (2005). A study of the documentation essential to software maintenance. In Proceedings of the international conference on design of communication: documenting & designing for pervasive information (pp. 68–75): ACM.
Zurück zum Zitat DeLine, R., Khella, A., Czerwinski, M., & Robertson, G. (2005). Towards understanding programs through wear-based filtering. In Proceedings of the 2005 ACM symposium on Software visualization, SoftVis ’05 (pp. 183–192): ACM. DeLine, R., Khella, A., Czerwinski, M., & Robertson, G. (2005). Towards understanding programs through wear-based filtering. In Proceedings of the 2005 ACM symposium on Software visualization, SoftVis ’05 (pp. 183–192): ACM.
Zurück zum Zitat Dit, B., Revelle, M., Gethers, M., & Poshyvanyk, D. (2013). Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process, 25 (1), 53–95. Dit, B., Revelle, M., Gethers, M., & Poshyvanyk, D. (2013). Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process, 25 (1), 53–95.
Zurück zum Zitat Fluri, B., Wursch, M., & Gall, H. (2007). Do code and comments co-evolve? on the relation between source code and comment changes. In Proceedings of the working conference on reverse engineering (pp. 70–79): IEEE Computer Society. Fluri, B., Wursch, M., & Gall, H. (2007). Do code and comments co-evolve? on the relation between source code and comment changes. In Proceedings of the working conference on reverse engineering (pp. 70–79): IEEE Computer Society.
Zurück zum Zitat Fowler, M. (1999). Refactoring: improving the design of existing code. Boston: Addison-Wesley Longman Publishing Co., Inc.MATH Fowler, M. (1999). Refactoring: improving the design of existing code. Boston: Addison-Wesley Longman Publishing Co., Inc.MATH
Zurück zum Zitat Freund, R. J., & Wilson, W. J. (2003). Statistical methods, 2nd edn. Academic Press. Freund, R. J., & Wilson, W. J. (2003). Statistical methods, 2nd edn. Academic Press.
Zurück zum Zitat Jiang, Z. M., & Hassan, A. E. (2006). Examining the evolution of code comments in postgresql. In Diehl, S., Gall, H., & Hassan, A. E. (Eds.) Proceedings of mining software repositories (pp. 179–180. ACM). Jiang, Z. M., & Hassan, A. E. (2006). Examining the evolution of code comments in postgresql. In Diehl, S., Gall, H., & Hassan, A. E. (Eds.) Proceedings of mining software repositories (pp. 179–180. ACM).
Zurück zum Zitat Keyes, J. (2002). Software engineering handbook: Taylor & Francis. Keyes, J. (2002). Software engineering handbook: Taylor & Francis.
Zurück zum Zitat Kuhn, A., Ducasse, S., & Gîrba, T. (2007). Semantic clustering identifying topics in source code. Information & Software Technology, 49(3), 230–243.CrossRef Kuhn, A., Ducasse, S., & Gîrba, T. (2007). Semantic clustering identifying topics in source code. Information & Software Technology, 49(3), 230–243.CrossRef
Zurück zum Zitat LaToza, T. D., Venolia, G., & DeLine, R. (2006). Maintaining mental models: a study of developer work habits. In Proceedings of the 28th international conference on software engineering, ICSE ’06 (pp. 492–501): ACM. LaToza, T. D., Venolia, G., & DeLine, R. (2006). Maintaining mental models: a study of developer work habits. In Proceedings of the 28th international conference on software engineering, ICSE ’06 (pp. 492–501): ACM.
Zurück zum Zitat Lawrie, D., Binkley, D., & Morrell, C. (2010). Normalizing source code vocabulary. In Proceedings of working conference on reverse engineering (pp. 3–12): IEEE Computer Society. Lawrie, D., Binkley, D., & Morrell, C. (2010). Normalizing source code vocabulary. In Proceedings of working conference on reverse engineering (pp. 3–12): IEEE Computer Society.
Zurück zum Zitat Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York: Cambridge University Press.CrossRefMATH Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York: Cambridge University Press.CrossRefMATH
Zurück zum Zitat McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., & Xie, Q. (2012). Exemplar: a source code search engine for finding highly relevant applications. IEEE Transactions on Software Engineering, 38(5), 1069–1087.CrossRef McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., & Xie, Q. (2012). Exemplar: a source code search engine for finding highly relevant applications. IEEE Transactions on Software Engineering, 38(5), 1069–1087.CrossRef
Zurück zum Zitat Robillard, M. P., Coelho, W., & code, G. C. Murphy. (2004). How effective developers investigate source. An exploratory study. IEEE Transactions on Software Engineering, 30(12), 889–903.CrossRef Robillard, M. P., Coelho, W., & code, G. C. Murphy. (2004). How effective developers investigate source. An exploratory study. IEEE Transactions on Software Engineering, 30(12), 889–903.CrossRef
Zurück zum Zitat Roehm, T., Tiarks, R., Koschke, R., & Maalej, W. (2012). How do professional developers comprehend software?. In Proceedings of the 2012 international conference on software engineering, ICSE 2012 (pp. 255–265). Piscataway, NJ, USA: IEEE Press. Roehm, T., Tiarks, R., Koschke, R., & Maalej, W. (2012). How do professional developers comprehend software?. In Proceedings of the 2012 international conference on software engineering, ICSE 2012 (pp. 255–265). Piscataway, NJ, USA: IEEE Press.
Zurück zum Zitat Salviulo, F., & Scanniello, G. (2014). Dealing with identifiers and comments in source code comprehension and maintenance: Results from an ethnographically-informed study with students and professionals. In Proceedings of International Conference on Evaluation and Assessment in Software Engineering (pp. 423–432): ACM Press. Salviulo, F., & Scanniello, G. (2014). Dealing with identifiers and comments in source code comprehension and maintenance: Results from an ethnographically-informed study with students and professionals. In Proceedings of International Conference on Evaluation and Assessment in Software Engineering (pp. 423–432): ACM Press.
Zurück zum Zitat Scanniello, G., Marcus, A., & Pascale, D. (2015). Link analysis algorithms for static concept location: an empirical assessment. Empirical Software Engineering, 20 (6), 1666–1720.CrossRef Scanniello, G., Marcus, A., & Pascale, D. (2015). Link analysis algorithms for static concept location: an empirical assessment. Empirical Software Engineering, 20 (6), 1666–1720.CrossRef
Zurück zum Zitat Singer, J., Lethbridge, T., Vinson, N., & Anquetil, N. (1997). An examination of software engineering work practices. In Proceedings of the conference of the centre for advanced studies on collaborative research (p. 21): IBM Press. Singer, J., Lethbridge, T., Vinson, N., & Anquetil, N. (1997). An examination of software engineering work practices. In Proceedings of the conference of the centre for advanced studies on collaborative research (p. 21): IBM Press.
Zurück zum Zitat Soloway, E., & Ehrlich, K. (1984). Empirical studies of programming knowledge. IEEE Transactions on Software Engineering, 10(5), 595–609.CrossRef Soloway, E., & Ehrlich, K. (1984). Empirical studies of programming knowledge. IEEE Transactions on Software Engineering, 10(5), 595–609.CrossRef
Zurück zum Zitat Steidl, D., Hummel, B., & Jürgens, E. (2013). Quality analysis of source code comments. In Proceedings of international conference on program comprehension (pp. 83–92): IEEE Computer Society. Steidl, D., Hummel, B., & Jürgens, E. (2013). Quality analysis of source code comments. In Proceedings of international conference on program comprehension (pp. 83–92): IEEE Computer Society.
Zurück zum Zitat Tan, L., Yuan, D., Krishna, G., & Zhou, Y. (2007). iComment: Bugs or bad comments? ACM. Tan, L., Yuan, D., Krishna, G., & Zhou, Y. (2007). iComment: Bugs or bad comments? ACM.
Zurück zum Zitat Tan, S. H., Marinov, D., Tan, L., & Leavens, G. T. (2012). @tcomment: Testing javadoc comments to detect comment-code inconsistencies. In Proceedings of international conference on software testing (pp. 260–269): IEEE Computer Society. Tan, S. H., Marinov, D., Tan, L., & Leavens, G. T. (2012). @tcomment: Testing javadoc comments to detect comment-code inconsistencies. In Proceedings of international conference on software testing (pp. 260–269): IEEE Computer Society.
Zurück zum Zitat Van Der Maaten, L. (2014). Accelerating t-sne using tree-based algorithms. Journal of Machine Learning Research, 15(1), 3221–3245.MathSciNetMATH Van Der Maaten, L. (2014). Accelerating t-sne using tree-based algorithms. Journal of Machine Learning Research, 15(1), 3221–3245.MathSciNetMATH
Zurück zum Zitat Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., & Wesslén, A. (2012). Experimentation in software engineering. Computer science: Springer. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., & Wesslén, A. (2012). Experimentation in software engineering. Computer science: Springer.
Metadaten
Titel
Coherence of comments and method implementations: a dataset and an empirical investigation
verfasst von
Anna Corazza
Valerio Maggio
Giuseppe Scanniello
Publikationsdatum
07.11.2016
Verlag
Springer US
Erschienen in
Software Quality Journal / Ausgabe 2/2018
Print ISSN: 0963-9314
Elektronische ISSN: 1573-1367
DOI
https://doi.org/10.1007/s11219-016-9347-1

Weitere Artikel der Ausgabe 2/2018

Software Quality Journal 2/2018 Zur Ausgabe

OriginalPaper

In this issue

Premium Partner