Abstract
This paper explores the potentially pivotal role of Empirical Methods in addressing existential questions about the nature of software. Building upon an earlier paper that asked the question “What is software?”, it suggests that a key way to gain such understanding is to ponder how to determine the size of a software entity. There have been a variety of indirect approaches to measuring software size, such as measuring the amount of time taken to produce the software or counting the number of lines of code in a software entity. But these approaches implicitly assume that such measures correlate positively with the inherent size of the software entity, broadly construed to include the entire panoply of code and non-code artifacts, and their interconnections, that comprise this entity. As in the original paper, this paper makes the case that entities such as recipes, laws, and processes are types of software, and that learning about their natures illuminates the nature of computer software, and conversely. The paper discusses possible approaches to measuring the size of these other types of artifacts, and uses observations about these approaches to suggest a possible approach to measuring the size of computer software entities. All of this is aimed at making progress toward understanding the nature of software, broadly construed.
Preface: This paper is an update of a paper previously published in Automated Software Engineering, entitled “What is Software?” [1]. That previous paper, written more than five years ago, made a case for the importance of understanding the essence of what “software” is, noting that computer software is one of a number of different kinds of intellectual products that can and should be considered closely related to each other. The paper noted that laws, processes, and recipes all seem to be closely related in fundamental ways to computer software, and suggested that all might be considered subtypes of a type of intellectual product that might be called “software”. That being the case, the earlier paper suggested that studying any of these might well produce results of interest and value to the others, and that studying the relations among these types of artifacts might ultimately provide insight into the fundamental nature of the common type of which all might be considered subtypes.
The main addition that this paper makes to the previous version is to note a potentially key contribution that Empirical Methods could make to this understanding. In this paper we argue that the understanding of an object (physical or non-physical) is greatly enhanced by the ability to measure that object. Indeed, Lord Kelvin suggested, over 100 years ago, that
… when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.
That being the case, Empirical Methods research should be viewed as essential to gaining knowledge about, and establishing a science of, the nature of software, in that it addresses issues of how to measure various aspects of software. As a case in point, this paper focuses on how to define one particular measure of software, namely its size. Size would seem to be a basic measure, and yet no satisfactory measure of software size seems to exist. Grappling with this and related questions has been a focus of the Empirical Methods community. The community’s success in understanding how to establish such measures of computer software is clearly important to making computer software engineering more effective, but might also have important ramifications for improving the engineering of other kinds of software, such as processes and laws. For that reason the ongoing efforts of the Empirical Methods research community should be viewed by the entire “software” community as being of fundamental importance.