An empirically-based characterization and quantification of information seeking through mailing lists during Open Source developers’ software evolution

https://doi.org/10.1016/j.infsof.2014.09.003

Abstract

Context

Several authors have proposed information seeking as an appropriate perspective for studying software evolution. Empirical evidence in this area suggests that substantial time delays can accrue, due to the unavailability of required information, particularly when this information must travel across geographically distributed sites.

Objective

As a first step in addressing the time delays that can occur in information seeking for distributed Open Source (OS) programmers during software evolution, this research characterizes the information seeking of OS developers through their mailing lists.

Method

A longitudinal study is performed, analysing a total of 17 years of developer mailing list activity across 6 different OS projects. A qualitative, grounded analysis of this data identifies the prevalent information types sought by developers. Quantitative analysis of the number of responses and the response time-lag is also performed.
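As a rough illustration of the two quantitative measures named above, the following Python sketch counts the responses to each thread-opening message and computes the time-lag to its first response. It is not the authors' tooling: the `Message` record, its field names, the sample data and the choice to count only direct replies are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical, simplified view of one mailing list message."""
    msg_id: str
    sent: datetime
    in_reply_to: Optional[str] = None  # None marks a thread-opening question


def response_stats(messages):
    """Map each thread-opening message id to
    (number of direct replies, time-lag to the earliest direct reply or None)."""
    by_id = {m.msg_id: m for m in messages}
    stats = {m.msg_id: (0, None) for m in messages if m.in_reply_to is None}
    for m in messages:
        # Assumption: only direct replies to the opening message are counted.
        if m.in_reply_to in stats:
            count, lag = stats[m.in_reply_to]
            new_lag = m.sent - by_id[m.in_reply_to].sent
            if lag is None or new_lag < lag:
                lag = new_lag
            stats[m.in_reply_to] = (count + 1, lag)
    return stats


# Invented sample data, for illustration only.
messages = [
    Message("q1", datetime(2010, 1, 1, 9, 0)),
    Message("r1", datetime(2010, 1, 1, 11, 30), in_reply_to="q1"),
    Message("r2", datetime(2010, 1, 2, 9, 0), in_reply_to="q1"),
    Message("q2", datetime(2010, 1, 3, 9, 0)),
]
stats = response_stats(messages)
```

Here `stats["q1"]` holds two responses with a first-response lag of two and a half hours, while `stats["q2"]` records an unanswered question as `(0, None)`. Grouping such tuples by question type would give per-type response rates of the kind the study reports.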

Results

The analysis shows that Open Source developers are particularly implementation centric and team focused in their use of mailing lists, mirroring similar findings that have been reported in the literature. However, novel findings include the suggestion that OS developers often require support regarding the technology they use during development, that they refer to documentation fairly frequently and that they seek implementation-oriented specifics based on system design principles that they anticipate in advance. In addition, response analysis suggests a large variability in the response rates for different types of questions, and particularly that participants have difficulty ascertaining information on other developers’ activities.

Conclusion

The findings provide insights for those interested in supporting the information needs of OS developer communities. They suggest that the tools and techniques developed in support of co-located developers should be largely mirrored for these communities: they should be implementation centric, and directed at illustrating “how” the system achieves its functional goals and states. Likewise, they should be directed at determining the reason for system bugs, a type of question frequently posed by OS developers but less frequently responded to.

Section snippets

Introduction and motivation

Software maintenance and evolution are large components of a successful software system’s lifecycle. The amount of software lifecycle effort consumed during this phase has been estimated to range between 60% and 80% of the entire lifecycle effort [1], [2], [3], [4]. While the empirical basis for such statements is dated and suggestions have been made that it should be revisited [4], the increasing scale and complexity of newer software systems [3], [5] implies that the effort invested in

OS governance

Information seeking in OS projects is contextualized by the governance model and working practices employed in those projects. The work presented in this paper focuses specifically on one aspect of governance, namely the use of information and tools [36]. This occurs in the context of two other aspects: the software development processes, and the community management that implicitly guides these processes.

The two main types of governance models are where a single individual governs a

Research objective

This research has two objectives. The first is to empirically derive a schema of information types sought by Open Source programmers through mailing lists, during post-deployment activities like maintenance and evolution. The second is to quantify the prevalent types of information sought through this medium and the response rates for those queries. This section discusses the empirical process used to derive the information type schema. For a fuller description of the derivation process and the

The information types schema

The information schema derived from the mailing lists is presented in Fig. 2 and Fig. 3. These Venn diagrams present a hierarchical representation of the schema, each diagram representing one of the two top-level categories revealed by axial coding: Information Focus and Information Aspect. Focus refers to the target-entity that information is sought about, and Aspect refers to the type of information sought about that target-entity, so each question has a focus and an aspect. For example, in the case

The schema

The schema derived from the mailing lists can be aligned partially with existing schemas, most notably with that of Erdem et al. [52]. Erdem proposed that all information seeking events could be classified into Topic, Question type and Relation type. The Topic was the entity referenced in the question, and the Question Type consisted of Why, What, Where, When, How and Verification type questions. Erdem identified 9 different Relation types: Topic, Behavior, Structure, Function, Use, Goal,

Conclusion

This paper reports on the derivation of an Information Seeking schema for OS developers through a grounded analysis of 6 OS developer mailing lists, spanning 17 years of mail activity. The resultant schema is largely congruent with the findings of [11], [19], [74], [75] and closely echoes the schema proposed by Erdem et al. [52]. However, several of the categories differ, particularly with respect to Contextual Technology, and Documentation.

The resultant schema was then applied to the dataset to

Acknowledgment

This work was supported, in part, by Science Foundation Ireland Grant 10/CE/I1855 to Lero – the Irish Software Engineering Research Centre (www.lero.ie).

References (92)

  • R.S. Pressman, Software Engineering: A Practitioner’s Approach, fifth ed. McGraw-Hill Publishing Company,...
  • I. Zayour, T.C. Lethbridge, Adoption of reverse engineering tools: a cognitive perspective and methodology, in: 9th...
  • I. Sommerville, Software Engineering (2004)
  • L. Prechelt, Re-evaluating inheritance depth on the maintainability of object-oriented software, Int. J. Empirical Softw. Eng. (1998)
  • T. Roehm, et al., How do professional developers comprehend software? in: 33rd ICSE 2011,...
  • A. De Lucia, A.R. Fasolino, M. Munro, Understanding function behaviors through program slicing, in: International...
  • B. Curtis et al., A field study of the software design process for large systems, Commun. ACM (1988)
  • C.B. Seaman, The information gathering strategies of software maintainers, in: International Conference on Software...
  • J. Singer, Work practices of software maintenance engineers, in: International Conference on Software Maintenance (ICSM...
  • J. Singer, T. Lethbridge, Studying work practices to assist tool design in software engineering, in: 6th International...
  • S.E. Sim, Supporting multiple program comprehension strategies during software maintenance, in: Department of Computer...
  • M.P. O’Brien, Evolving a Model of the Information-Seeking Behaviour of Industrial Programmers, University of Limerick,...
  • K.P. Kingrey, Concepts of information seeking and their presence in the practical library literature, Libr. Philos. Pract. (2002)
  • J. Starke, C. Luce, J. Sillito, Searching and skimming: an exploratory study, in: IEEE International Conference on...
  • T.D. LaToza, B.A. Myers, Developers ask reachability questions, in: ACM/IEEE 32nd International Conference on Software...
  • M.G. Bradac et al., Prototyping a process monitoring experiment, IEEE Trans. Softw. Eng. (1994)
  • A.J. Ko, R. DeLine, G. Venolia, Information needs in collocated software development teams, in: 29th International...
  • W. Liu, et al., A design for evidence-based software architecture research, in: Workshop on REBSE’2005,...
  • A. Mockus et al., Two case studies of Open Source software development: Apache and Mozilla, ACM Trans. Softw. Eng. Methodol. (2002)
  • C. Wilson, Network centric operations: background and oversight issues for congress, in: Congressional Research Service...
  • T. Gasperson, To Iraq and Back: Soldier uses Linux in War, 2006 30 September 2009....
  • E.S. Raymond, The cathedral and the bazaar, in: The Cathedral and the Bazaar: Musings on Linux and Open Source by an...
  • L. Torvalds et al., Just for Fun: The Story of an Accidental Revolutionary (2001)
  • T. Koponen, V. Hotti, Open source software maintenance process framework, in: Proceedings of the Fifth Workshop on Open...
  • J. Feller et al., Understanding Open Source Software Development (2002)
  • B. Fitzgerald, A critical look at open source, Computer (2004)
  • K.Y. Sharif, J. Buckley, Further observation of open source programmers’ information seeking, in: Psychology of...
  • C. Gutwin, R. Penner, K. Schneider, Group awareness in distributed software development, in: Proceedings of the 2004...
  • T.D. LaToza, et al., Program comprehension as fact finding, in: Proceedings of the 6th Joint Meeting of the European...
  • P.A. O’Shea, An Investigation of Views and Abstractions Employed by Software Engineers during Software Maintenance – An...
  • OpenOffice, Key Open Source “Best Practices” Supported on This Site....
  • K.Y. Sharif, J. Buckley, Observing open source programmers’ information seeking, in: The 20th Annual Psychology of...
  • S. Daniel, K. Stewart, D. Darcy, Patterns of evolution in open source projects: a categorization schema and...
  • M.L. Markus, The governance of free/open source software projects: monolithic, multidimensional, or configurational?, J. Manage. Governance (2007)
  • E.S. Raymond, Homesteading the Noosphere, First Monday, vol. 3(10),...
  • R. Viseur, Forks impacts and motivations in free and open source projects, Int. J. Adv. Comput. Sci. Appl. (IJACSA) (2012)