Published in: Empirical Software Engineering 6/2013

1 December 2013

Automated topic naming

Supporting cross-project analysis of software maintenance activities

Written by: Abram Hindle, Neil A. Ernst, Michael W. Godfrey, John Mylopoulos



Abstract

Software repositories provide a deluge of software artifacts to analyze. Researchers have attempted to summarize, categorize, and relate these artifacts by using semi-unsupervised machine-learning algorithms, such as Latent Dirichlet Allocation (LDA). LDA is used for concept and topic analysis to suggest candidate word-lists or topics that describe and relate software artifacts. However, these word-lists and topics are difficult to interpret in the absence of meaningful summary labels. Current attempts to interpret topics assume manual labelling and do not use domain-specific knowledge to improve, contextualize, or describe results for the developers. We propose a solution: automated labelled topic extraction. Topics are extracted using LDA from commit-log comments recovered from source control systems. These topics are given labels from a generalizable cross-project taxonomy, consisting of non-functional requirements. Our approach was evaluated with experiments and case studies on three large-scale Relational Database Management System (RDBMS) projects: MySQL, PostgreSQL and MaxDB. The case studies show that labelled topic extraction can produce appropriate, context-sensitive labels that are relevant to these projects, and provide fresh insight into their evolving software development activities.
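As a rough illustration of the labelling idea in the abstract, the sketch below assigns non-functional requirement labels to topics by word overlap. It is a minimal sketch under stated assumptions, not the authors' implementation: the topics are assumed to have already been extracted by LDA, and the word lists in `NFR_WORDS`, the example topics, and the `min_overlap` threshold are all hypothetical values chosen for illustration.

```python
# Hypothetical word lists for a few non-functional requirements (NFRs).
# These are illustrative only and are not taken from the paper.
NFR_WORDS = {
    "portability": {"port", "platform", "windows", "linux", "compile"},
    "reliability": {"crash", "bug", "fix", "error", "fail"},
    "efficiency": {"performance", "speed", "optimize", "slow", "memory"},
}


def label_topic(topic_words, nfr_words=NFR_WORDS, min_overlap=1):
    """Return the sorted NFR labels whose word list shares at least
    `min_overlap` words with the topic's word list."""
    words = {w.lower() for w in topic_words}
    return sorted(
        label
        for label, vocab in nfr_words.items()
        if len(words & vocab) >= min_overlap
    )


# Assumed output of a prior LDA run over commit-log comments:
# each topic is represented by its top words (made up here).
topics = [
    ["fix", "crash", "windows", "build"],
    ["index", "optimize", "query", "speed"],
    ["merge", "branch", "tree"],
]

labels = [label_topic(t) for t in topics]
for topic, topic_labels in zip(topics, labels):
    print(topic, "->", topic_labels or ["(unlabelled)"])
```

A topic can receive several labels or none at all. Raising `min_overlap` makes the labelling stricter, at the cost of leaving more topics unlabelled.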

Footnotes
1
Generated using David A. Wheeler’s SLOCCount, http://dwheeler.com/sloccount.
 
6
Since the MySQL and MaxDB data had poor records for developer ids, we focused on PostgreSQL.
 
Metadata
Title: Automated topic naming: Supporting cross-project analysis of software maintenance activities
Written by: Abram Hindle, Neil A. Ernst, Michael W. Godfrey, John Mylopoulos
Publication date: 1 December 2013
Publisher: Springer US
Published in: Empirical Software Engineering, issue 6/2013
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-012-9209-9
