Skip to main content
Top
Published in: Empirical Software Engineering 6/2013

01-12-2013

Automated topic naming

Supporting cross-project analysis of software maintenance activities

Authors: Abram Hindle, Neil A. Ernst, Michael W. Godfrey, John Mylopoulos

Published in: Empirical Software Engineering | Issue 6/2013

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Software repositories provide a deluge of software artifacts to analyze. Researchers have attempted to summarize, categorize, and relate these artifacts by using semi-unsupervised machine-learning algorithms, such as Latent Dirichlet Allocation (LDA). LDA is used for concept and topic analysis to suggest candidate word-lists or topics that describe and relate software artifacts. However, these word-lists and topics are difficult to interpret in the absence of meaningful summary labels. Current attempts to interpret topics assume manual labelling and do not use domain-specific knowledge to improve, contextualize, or describe results for the developers. We propose a solution: automated labelled topic extraction. Topics are extracted using LDA from commit-log comments recovered from source control systems. These topics are given labels from a generalizable cross-project taxonomy, consisting of non-functional requirements. Our approach was evaluated with experiments and case studies on three large-scale Relational Database Management System (RDBMS) projects: MySQL, PostgreSQL and MaxDB. The case studies show that labelled topic extraction can produce appropriate, context-sensitive labels that are relevant to these projects, and provide fresh insight into their evolving software development activities.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Footnotes
1
Generated using David A. Wheeler’s SLOCCount, http://​dwheeler.​com/​sloccount.
 
6
Since the MySQL and MaxDB data had poor records for developer ids, we focused on PostgreSQL.
 
Literature
go back to reference Baldi PF, Lopes CV, Linstead EJ, Bajracharya SK (2008) A theory of aspects as latent topics. In: Conference on object oriented programming systems languages and applications, pp 543–562. Nashville Baldi PF, Lopes CV, Linstead EJ, Bajracharya SK (2008) A theory of aspects as latent topics. In: Conference on object oriented programming systems languages and applications, pp  543–562. Nashville
go back to reference Boehm B, Brown JR, Lipow M (1976) Quantitative evaluation of software quality. In: International conference on software engineering, pp 592–605 Boehm B, Brown JR, Lipow M (1976) Quantitative evaluation of software quality. In: International conference on software engineering, pp 592–605
go back to reference Chung L, Nixon BA, Yu ES, Mylopoulos J (1999) Non-functional requirements in software engineering. In: International series in software engineering, vol 5. Kluwer Academic, Boston Chung L, Nixon BA, Yu ES, Mylopoulos J (1999) Non-functional requirements in software engineering. In: International series in software engineering, vol 5. Kluwer Academic, Boston
go back to reference Cleland-Huang J, Settimi R, Zou X, Solc P (2006) The detection and classification of non-functional requirements with application to early aspects. In: International requirements engineering conference, pp 39–48. Minneapolis, Minnesota. doi:10.1109/RE.2006.65 Cleland-Huang J, Settimi R, Zou X, Solc P (2006) The detection and classification of non-functional requirements with application to early aspects. In: International requirements engineering conference, pp 39–48. Minneapolis, Minnesota. doi:10.​1109/​RE.​2006.​65
go back to reference Ernst NA, Mylopoulos J (2010) On the perception of software quality requirements during the project lifecycle. In: International working conference on requirements engineering: foundation for software quality. Essen, Germany Ernst NA, Mylopoulos J (2010) On the perception of software quality requirements during the project lifecycle. In: International working conference on requirements engineering: foundation for software quality. Essen, Germany
go back to reference Fellbaum C (ed) (1998) WordNet: an electronic lexical database. MIT Press, CambridgeMATH Fellbaum C (ed) (1998) WordNet: an electronic lexical database. MIT Press, CambridgeMATH
go back to reference Hindle A, Godfrey MW, Holt RC (2007) Release pattern discovery via partitioning: methodology and case study. In: International workshop on mining software repositories at ICSE, pp 19–27. Minneapolis, MN. doi:10.1109/MSR.2007.28 Hindle A, Godfrey MW, Holt RC (2007) Release pattern discovery via partitioning: methodology and case study. In: International workshop on mining software repositories at ICSE, pp 19–27. Minneapolis, MN. doi:10.​1109/​MSR.​2007.​28
go back to reference Hindle A, German DM, Holt R (2008) What do large commits tell us?: a taxonomical study of large commits. In: MSR ’08: Proceedings of the 2008 international working conference on mining software repositories. ACM, New York, pp 99–108. doi:10.1145/1370750.1370773 CrossRef Hindle A, German DM, Holt R (2008) What do large commits tell us?: a taxonomical study of large commits. In: MSR ’08: Proceedings of the 2008 international working conference on mining software repositories. ACM, New York, pp 99–108. doi:10.​1145/​1370750.​1370773 CrossRef
go back to reference Hindle A, Godfrey MW, Holt RC (2009) What’s hot and what’s not: windowed developer topic analysis. In: International conference on software maintenance, pp 339–348. Edmonton, Alberta, Canada. doi:10.1109/ICSM.2009.5306310 Hindle A, Godfrey MW, Holt RC (2009) What’s hot and what’s not: windowed developer topic analysis. In: International conference on software maintenance, pp 339–348. Edmonton, Alberta, Canada. doi:10.​1109/​ICSM.​2009.​5306310
go back to reference Hindle A, Ernst NA, Godfrey MW, Mylopoulos J (2011) Automated topic naming to support cross-project analysis of software maintenance activities. In: International conference on mining software repositories Hindle A, Ernst NA, Godfrey MW, Mylopoulos J (2011) Automated topic naming to support cross-project analysis of software maintenance activities. In: International conference on mining software repositories
go back to reference ISO (2001) Software engineering—product quality—part 1: quality model. Tech. rep., International Standards Organization - JTC 1/SC 7 ISO (2001) Software engineering—product quality—part 1: quality model. Tech. rep., International Standards Organization - JTC 1/SC 7
go back to reference Kayed A, Hirzalla N, Samhan A, Alfayoumi M (2009) Towards an ontology for software product quality attributes. In: International conference on internet and web applications and services, pp 200–204. doi:10.1109/ICIW.2009.36 Kayed A, Hirzalla N, Samhan A, Alfayoumi M (2009) Towards an ontology for software product quality attributes. In: International conference on internet and web applications and services, pp 200–204. doi:10.​1109/​ICIW.​2009.​36
go back to reference Marcus A, Sergeyev A, Rajlich V, Maletic J (2004) An information retrieval approach to concept location in source code. In: 11th working conference on reverse engineering, pp 214–223. doi:10.1109/WCRE.2004.10 Marcus A, Sergeyev A, Rajlich V, Maletic J (2004) An information retrieval approach to concept location in source code. In: 11th working conference on reverse engineering, pp 214–223. doi:10.​1109/​WCRE.​2004.​10
go back to reference Massey B (2002) Where do open source requirements come from (and what should we do about it)? In: Workshop on Open source software engineering at ICSE. Orlando, FL, USA Massey B (2002) Where do open source requirements come from (and what should we do about it)? In: Workshop on Open source software engineering at ICSE. Orlando, FL, USA
go back to reference Mei Q, Shen X, Zhai C (2007) Automatic labeling of multinomial topic models. In: International conference on knowledge discovery and data mining, pp 490–499. San Jose, California. doi:10.1145/1281192.1281246 Mei Q, Shen X, Zhai C (2007) Automatic labeling of multinomial topic models. In: International conference on knowledge discovery and data mining, pp 490–499. San Jose, California. doi:10.​1145/​1281192.​1281246
go back to reference Scacchi W, Jensen C, Noll J, Elliott M (2005) Multi-modal modeling, analysis and validation of open source software requirements processes. In: International conference on open source systems, vol 1, pp 1–8. Genoa, Italy Scacchi W, Jensen C, Noll J, Elliott M (2005) Multi-modal modeling, analysis and validation of open source software requirements processes. In: International conference on open source systems, vol 1, pp 1–8. Genoa, Italy
go back to reference Treude C, Storey MA (2009) ConcernLines: a timeline view of co-occurring concerns. In: International conference on software engineering, pp 575–578. Vancouver Treude C, Storey MA (2009) ConcernLines: a timeline view of co-occurring concerns. In: International conference on software engineering, pp 575–578. Vancouver
go back to reference Tsoumakas G, Katakis I, Vlahavas I (2010) Mining multi-label data. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook, 2nd edn. Springer Tsoumakas G, Katakis I, Vlahavas I (2010) Mining multi-label data. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook, 2nd edn. Springer
Metadata
Title
Automated topic naming
Supporting cross-project analysis of software maintenance activities
Authors
Abram Hindle
Neil A. Ernst
Michael W. Godfrey
John Mylopoulos
Publication date
01-12-2013
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 6/2013
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-012-9209-9

Other articles of this Issue 6/2013

Empirical Software Engineering 6/2013 Go to the issue

Premium Partner