Elsevier

Journal of Systems and Software

Volume 81, Issue 9, September 2008, Pages 1456-1469

Architectural knowledge discovery with latent semantic analysis: Constructing a reading guide for software product audits

https://doi.org/10.1016/j.jss.2007.12.815

Abstract

Architectural knowledge is reflected in various artifacts of a software product. In a software product audit this architectural knowledge needs to be uncovered and its effects assessed in order to evaluate the quality of the software product. A particular problem is to find and comprehend the architectural knowledge that resides in the software product documentation. In this article, we discuss how the use of a technique called Latent Semantic Analysis can guide auditors through the documentation to the architectural knowledge they need. We validate the use of Latent Semantic Analysis for discovering architectural knowledge by comparing the resulting vector-space model with the mental model of documentation that auditors possess.

Introduction

The architectural design of a software product and the architectural design decisions taken play a key role in software product audits. Architectural design decisions and their rationale provide, for instance, insight into the trade-offs that were considered, the forces that influenced the decisions, and the constraints that were in place. The architectural design that results from these decisions allows for comprehension of such matters as the structure of the software product, its interactions with external systems, and the enterprise environment in which the software product is to be deployed. Following a recent trend in software architecture research (e.g., Bosch, 2004; Jansen and Bosch, 2005; Kruchten et al., 2006; van der Ven et al., 2006), we refer to the collection of architectural design decisions and the resulting architectural design as ‘architectural knowledge’.

For a given software product there is no single source that contains or provides all relevant architectural knowledge. Instead, architectural knowledge is reflected in various artifacts such as source code, data models, and documentation. Distilling relevant architectural knowledge from software product documentation is complicated by the fact that there are often many different documents. Each document is tailored to specific stakeholders, so different documents can reflect architectural knowledge at different levels of abstraction. A high-level project management summary, for instance, will reflect architectural design decisions and their effects differently from a document describing detailed technical design.

The ISO/IEC 14598-1 international standard (ISO/IEC, 1999) defines a software product as ‘the set of computer programs, procedures, and possibly associated documentation and data’. Quality is defined as ‘the totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs’, while quality evaluation is ‘a systematic examination of the extent to which an entity is capable of fulfilling specified requirements’. Consequently, when we refer in this article to a software product quality audit – i.e., an audit in which the quality of a software product is evaluated – we refer to ‘the systematic examination of the extent to which a set of computer programs, procedures, and possibly associated documentation and data are capable of fulfilling specified requirements’.

We have conducted a study at a company that has broad experience in performing software product audits. This company conducts independent quality audits of software products. Its customers range from large private companies to governmental institutions. In this study we have investigated the use of architectural knowledge in software product audits. To this end we observed an audit that was being conducted for one of the company’s customers. We attended and observed the audit team meetings and had discussions with the audit team members on their use of architectural knowledge in the audit. In addition, we held more general interviews on this topic with five employees who had been involved in various audits, two of whom were also directly involved in the observed audit. The interviewed employees possess different levels of experience and have different focal points when conducting an audit. The problem of finding relevant architectural knowledge sketched above is one that all auditors perceive as difficult. In short, the auditors need a reading guide that leads them through the documentation.

In this article we outline the problem of discovering architectural knowledge in software product documentation and present a technique that can be used to alleviate this problem. This technique, Latent Semantic Analysis, applies a matrix factorization called Singular Value Decomposition to discover the semantic structure underlying a set of documents. We employ this latent semantic structure to guide the auditors through the documentation to the architectural knowledge needed. A comparison of the discovered semantic structure with the ideas auditors have of software product documentation shows that Latent Semantic Analysis produces a good approximation of the auditors’ mental models.
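As a rough illustration of the idea, the sketch below applies a truncated Singular Value Decomposition to a small term-by-document count matrix and compares documents in the reduced space. It is a minimal sketch, not the authors' implementation: the matrix, the term labels, the function names `lsa_document_vectors` and `cosine`, and the choice of `k = 2` are all assumptions made for the example, and it relies on NumPy.

```python
import numpy as np

def lsa_document_vectors(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    # term_doc has one row per term and one column per document.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; return one row per document.
    return (np.diag(s[:k]) @ vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical counts (assumption): rows are terms, columns are documents.
term_doc = np.array([
    [2, 1, 0, 0],  # "architecture"
    [1, 2, 0, 0],  # "decision"
    [0, 0, 1, 2],  # "database"
    [0, 0, 2, 1],  # "schema"
], dtype=float)

doc_vectors = lsa_document_vectors(term_doc, k=2)
same_topic = cosine(doc_vectors[0], doc_vectors[1])
other_topic = cosine(doc_vectors[0], doc_vectors[2])
```

In this toy corpus the first two documents share vocabulary and end up close together in the reduced space, while the first and third, which share no terms, stay apart. Real documentation sets would need preprocessing and a more careful choice of `k`.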

The remainder of this article is organized as follows. The next section discusses the use of architectural knowledge in software product audits based on our observations in the case study we conducted. Section 3 presents Latent Semantic Analysis (LSA) and its mathematical background. Section 4 discusses the application of LSA to a set of documents that contain software product documentation and shows how we can employ the semantic structure uncovered by LSA to guide the auditor to relevant architectural knowledge. In Section 5 we validate the LSA results through a comparison with auditors’ mental models of software product documentation. Section 6 contains a discussion on related work regarding the application of LSA to similar problems as well as related work in the area of research into architectural knowledge. Section 7 outlines research areas that are still open for further study. In Section 8 we sketch the use of Architectural Knowledge Discovery in a broader scope, and Section 9 contains concluding remarks on this article.

Architectural knowledge in a software product audit

In a software product audit, two types of architectural knowledge can be distinguished. On the one hand there is architectural knowledge pertaining to the current state of the software product; this knowledge reflects the architectural decisions made. On the other hand there is architectural knowledge pertaining to the desired state of the software product; this knowledge reflects the architectural decisions demanded (or expected). It is the auditor’s job to compare the current state with the

Latent semantic analysis

A method that can be used to capture the meaning of a collection of documents is the construction of a vector-space model. Vector-space models are based on the assumption that the meaning of a document can be derived from the terms that are used in that document. In a vector-space model, a document $d$ is represented as a vector of terms $d = (t_1, t_2, \ldots, t_n)$, with $t_i$ ($i = 1, 2, \ldots, n$) being the number of occurrences of term $i$ in document $d$ (Letsche and Berry, 1997).
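The term-count representation described above can be sketched in a few lines of Python. This is only an illustration: the function name `term_document_vectors`, the whitespace tokenization, and the three-document mini-corpus are assumptions for the example, not part of the article.

```python
from collections import Counter

def term_document_vectors(docs):
    """Return (vocabulary, vectors) with one term-count vector per document."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary fixes the meaning of each vector position.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Entry i is the number of occurrences of term i in the document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

# Hypothetical mini-corpus (assumption), standing in for product documentation.
docs = [
    "architecture decision rationale",
    "database schema decision",
    "architecture architecture overview",
]
vocabulary, vectors = term_document_vectors(docs)
for doc, vector in zip(docs, vectors):
    print(vector, doc)
```

Stacking these vectors as columns gives a term-by-document matrix of the kind a vector-space model works with.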

Fig. 1 depicts a matrix based on the

Constructing a reading guide: A case study

The LSA technique introduced in Section 3 forms the basis of a detailed case study in which we examine how the semantic structure discovered by LSA can be employed to guide the auditors through the documentation. This section presents the results of this case study.

Fig. 5 depicts the interactive process by which an auditor is guided through the documentation. Initially, auditors start with a set of unread documents. Although the content of these documents is still unknown, the auditors have a

Validation of the use of LSA

The previous section shows how the application of LSA delivers results that support auditors in finding a route through the documentation. The auditors indicate that the results show correspondence to their preferences for selecting and reading documents. In this section we empirically validate this correspondence.

The knowledge discovered by using LSA can only be regarded as valid if it fits the expectations of the auditor. In other words, the discovered semantic structure must conform to the

Related work

The application of Latent Semantic Analysis to architectural knowledge discovery discussed in this article bears some relation to other work, both within and outside of the software engineering research domain. The origin of LSA lies in information retrieval. LSA was presented in 1990 by Deerwester et al. as ‘a new method for automatic indexing and retrieval’ of documents (Deerwester et al., 1990). Later research also focused on the psycholinguistic significance of LSA. Landauer and Dumais, for

Future work

The work presented in this article gives rise to a number of issues that warrant further research. An overall issue that remains to be investigated is the scalability of our approach. LSA proved to be feasible for a corpus of 80 documents, but in practice software product documentation might comprise many more documents. Document sets of several hundred documents are not uncommon.

Furthermore, the selection of the right number of reduced dimensions is still difficult. In this area, a

Architectural knowledge discovery in a broader scope

This article considers Architectural Knowledge Discovery (AKD) as a means to construct a reading guide for software product audits. Although this application is undoubtedly valuable, we believe AKD has merit in a broader scope.

We envision AKD as one particular technique used in a broad range of architectural knowledge management tools and methods. The role of AKD would mainly be to refine existing (codified) architectural knowledge from such diverse sources as documents, email, meeting minutes,

Conclusion

Document inspection is a method used in software product audits to distill architectural knowledge from the software product documentation. Unfortunately, document inspection is often hard to perform. Auditors are in need of a reading guide that tells them where to start reading, how to progress reading, and which documents to consult for more detail on a particular topic.

We have demonstrated how auditors can be guided through the documentation in a case study in which we reconstructed the

Acknowledgement

This research has been partially sponsored by the Dutch Joint Academic and Commercial Quality Research & Development (Jacquard) program on Software Engineering Research via contract 638.001.406 GRIFFIN: a GRId For inFormatIoN about architectural knowledge. The authors would like to thank Eefje Cuppen for helpful discussions on the repertory grid technique

References (30)

  • T.A. Letsche et al., Large-Scale Information Retrieval with Latent Semantic Indexing, Information Sciences (1997)
  • G. Salton et al., Term-Weighting Approaches in Automatic Text Retrieval, Information Processing & Management (1988)
  • Ali Babar, M., de Boer, R.C., Dingsøyr, T., Farenhorst, R., 2007. Architectural Knowledge Management Strategies:...
  • Babu T., L., Seetha Ramaiah, M., Prabhakar, T., Rambabu, D., 2007. ArchVoc–Towards an Ontology for Software...
  • M.W. Berry et al., Matrices, Vector Spaces, and Information Retrieval, SIAM Review (1999)
  • Berry, M.W., Dumais, S.T., O’Brien, G.W., 1994. Using Linear Algebra for Intelligent Information Retrieval. Tech. Rep....
  • E. Bonnet et al., zt: A Software Tool for Simple and Partial Mantel Tests, Journal of Statistical Software (2002)
  • Booch, G., http://www.booch.com/architecture/. Handbook of Software...
  • Bosch, J., 2004. Software Architecture: The Next Step. In: Oquendo, F., Warboys, B., Morrison, R. (Eds.), Software...
  • de Boer, R.C., 2006. Architectural Knowledge Discovery: Why and How? In: First Workshop on SHAring and Reusing...
  • R.C. de Boer et al., Constructing a Reading Guide for Software Product Audits
  • S. Deerwester et al., Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science (JASIS) (1990)
  • F. Fransella et al., A Manual for Repertory Grid Technique (1977)
  • G.H. Golub et al., Matrix Computations (1996)
  • J.H. Hayes et al., Improving After-the-Fact Tracing and Mapping: Supporting Software Quality Predictions, IEEE Software (2005)

    Remco de Boer is a PhD researcher in Software Engineering at the VU University, Amsterdam, The Netherlands. He obtained his MSc in business informatics from the Erasmus University Rotterdam. His research interests include software architecture, knowledge management, and knowledge technologies. Prior to joining the Vrije Universiteit, he worked as a software developer and later as a researcher in knowledge technologies. He has been involved in various Dutch and EU research and development projects. His PhD research is part of the GRIFFIN project, in which two universities and four industrial partners collaborate to develop methods and tools for architectural knowledge management.

    Hans van Vliet is Professor in Software Engineering at the VU University, Amsterdam, The Netherlands. He received his PhD from the University of Amsterdam. His research interests include software architecture and empirical software engineering. Before joining the VU University, he worked as a researcher at the Centrum voor Wiskunde en Informatica (Amsterdam). He is the author of “Software Engineering: Principles and Practice”, published by Wiley (3rd Edition, 2008). He is the Editor in Chief of the Journal of Systems and Software.

    This article has been based on earlier work by the authors, presented at the 6th Working IEEE/IFIP Conference on Software Architecture in January 2007 in Mumbai, India (de Boer and van Vliet, 2007).
