4.1 Will PAM, TAA and OPS Yield Similar Results?
The plots drawn by the data gathered showed an unexpected and interesting result. Not only do the tools lack a correlation, but they do not even have a monotonic relationship when compared to each other for the agile practices covered, resulting in absence of convergent validity. This could indicate two things; the absence of monotonicity and the negative or very low correlations show that the questions used by the tools in order to cover an agile practice do it differently as well as that PAM, TAA and OPS measure the agility of software development teams in their own unique way.
Almost all groups had different responses to the same questions. With regards to the research question “Does convergent validity exist among the tools?”, we showed that convergent validity could not be established due to the low (if existing) correlations among the tools. Concerning the research question “Will the questions that are exactly the same among the tools yield the same results?”, we saw that a considerable amount of respondents’ answers were different.
The reasons for this somewhat unexpected results are explained in the following paragraphs.
Few or no questions for measuring a practice. A reason for not being able to calculate the correlation of the tools is that they cover slightly or even not at all some of the practices. An example of this is the Smaller and Frequent Product Releases practice. OPS includes four questions, while on the other hand, PAM and TAA have a single question each. Furthermore, Appropriate Distribution of Expertise is not covered at all by PAM. In case the single question gets a low score, this will affect how effectively the tool will measure an agile practice. On the contrary, multiple questions can better cover the practice by examining more factors that affect it.
The same practice is measured differently. Something interesting that came up during the data analysis was that although the tools cover the same practices, they do it in different ways, leading to different results. An example of this is the practice of Refactoring. PAM checks whether there are enough unit tests and automated system tests to allow the safe code refactoring. In case the course unit/system tests are not developed by a team, the respondents will give low scores to the question, as the team members in Company A did. Nevertheless, this does not mean that the team never refactors the software or does it with bad results. All teams in Company A choose to refactor when it adds value to the system, but the level of unit tests is very low and they exist only for specific teams. On the other hand, TAA and OPS check how often the teams refactor, among other aspects.
The same practice is measured in opposite questions. The
Continuous Integration practice has a unique paradox among TAA, PAM and OPS. The first two tools include a question about the members of the team having synchronized to the latest code, while OPS checks for the exact opposite. According to Soundararajan [
18], it is preferable for the teams not to share the same code in order to measure the practice.
Questions phrasing. Although the tools might cover the same areas for each practice, the results could differ because of how a question is structured. An example of this is the Test Driven Development practice. Both TAA and PAM ask about automated code coverage, while OPS just asks about the existence of code coverage. Furthermore, TAA focuses on 100 % automation, while PAM does not. Thus, if a team has code coverage but it is not automated, then the score of the respective question should be low. In case of TAA, if the code coverage is not fully automated, its score should be even lower. It is evident that the abstraction level of a question has a great impact. The more specific it is, the more a reply to it will differ, resulting in possible low scores.
Better understanding of agile concepts. In pre-post studies there is a possibility of the subjects becoming more aware of a problem in the second test due to the first test [
26]. Although the
testing threat, as it is called, does not directly apply here, the similar surveys on consecutive weeks could have enabled the respondents to take a deeper look into the agile concepts, resulting in better understanding of them, and consequently, providing different answers to the surveys’ questions.
How people perceive agility. Although the concept of agility is not new, people do not seem to fully understand it, as Conboy and Wang [
27] also mention. This is actually the reason behind the existence of so many tools in the field which are trying to measure how agile the teams are or the methodologies used. The teams implement agile methodologies differently and researchers create different measurement tools. There are numerous definitions of what agility is [
28‐
31], and each of the tool creators adopt or adapt the tools to match their needs. Their only common basis is the agile manifesto and its twelve principles [
32], which are (and should be considered as) a compass for the agile practitioners. Nevertheless, they are not enough and this resulted in the saturation of the field. Moreover, Conboy and Fitzgerald [
33] state that the agile manifesto principles do not provide practical understanding of the concept of agility. Consequently, all the reasons behind the current survey results are driven by the way in which tool creators and tool users perceive agility.
The questions in the surveys were all based on how their creators perceived the agile concept which is quite vague, as Tsourveloudis and Valavanis [
34] have pointed out. None of the Soundararajan [
18], So and Scholl [
16], Leffingwell [
17] claimed, of course, to have created the most complete measurement tool, but still, this leads to the oxymoron that the tools created by specialists to measure the agility of software development teams actually do it differently and without providing substantial solution to the problem. On the contrary, this leads to more confusion for the agile practitioners.
Considering that the researchers and specialists in the agile field perceive the concept of agility differently, it would be naive to say that the teams do not do the same. The answers to surveys are subjective and people reply to them depending on how they understand them. Ambler [
15] stated the following: “I suspect that developers and management have different criteria for what it means to be agile”. This is also corroborated by the fact that, although a team works in the same room and follows the same processes for weeks, it is rather unlikely that its members will have the same understanding of what a retrospection or a releasing planning meeting means to them, a statement which is also supported by Murphy et al. [
35].