1 Introduction
We build on MOGen (Gote et al. 2020), which captures non-Markovian patterns in paths in complex networks, i.e. patterns that require memory to be modelled. Our contributions are as follows:

- We mine a unique curated dataset comprising fine-grained time-stamped path data from two issue trackers and a code review platform, tracking the actions of a product team at genua over 20 years.
- To identify community smells, we consider five centrality measures that serve as proxies for the influence of specific nodes and node sequences in dynamical processes. For these centrality measures, we demonstrate that utilising a path model results in improved predictions of influential nodes in time-series data compared to a simpler network-based model, provided there is enough training data. However, this approach results in a significant generalisation error for smaller datasets.
- To address this problem, we define equivalent measures for MOGen, a higher-order generative model for paths in complex networks (Gote et al. 2020). We show that our MOGen-based centrality measures effectively mitigate the generalisation error, i.e. they balance between underfitting and overfitting the data.
- We apply our five MOGen-based centrality measures to identify team members consistently taking over tasks that no other team members perform, identifying two community smells in the dynamic development process that could not have been identified using static SNA.
- We validate our findings in semi-structured interviews with five developers from genua. The team is aware of one of these community smells and employs active measures against it. However, the team was not aware of the second community smell but could confirm it ex post. Thus, we show that our approach successfully uncovers community smells and can aid software teams in countering them.
We demonstrate the practical value of our MOGen-based centrality measures by applying them to the empirical software engineering domain (Sects. 3, 6, 7). In doing so, we show the capability of higher-order network methods in a real-world application.

2 Related work
2.1 Community smells
2.2 Social network analysis
3 Data
3.1 The development process at genua
3.2 Extracting paths capturing the development process
3.3 Characteristics of the software development team



4 Methods
We then introduce MOGen (Gote et al. 2020) in Sect. 4.3. Finally, we introduce the MOGen-based centrality measures predicting influential nodes and higher-order patterns to ultimately detect community smells in Sect. 4.4.

4.1 Paths on network topologies
4.2 Modelling higher-order patterns in path data




4.3 MOGen
We use MOGen, a multi-order generative model for paths (Gote et al. 2020) that combines information from multiple higher-order models. In addition, MOGen explicitly considers the start- and end-points of paths using the special initial and terminal states \(*\) and \(\dagger\). MOGen represents a path \(v_1 \rightarrow v_2 \rightarrow \dots \rightarrow v_{l}\) as \(* \rightarrow v_1 \rightarrow v_2 \rightarrow \dots \rightarrow v_{l} \rightarrow \dagger\).

A MOGen model is fully described by a multi-order transition matrix \({{\textbf {T}}}^{(K)}\) shown in Fig. 6. The entries \({{\textbf {T}}}^{(K)}_{ij}\) of \({{\textbf {T}}}^{(K)}\) capture the probability of a transition between two higher-order nodes. A MOGen model with \(K=1\) is equivalent to a network model except for the nodes \(*\) and \(\dagger\), which additionally capture the starts and ends of paths. In turn, a MOGen model with K matching the maximum path length observed in P is a lossless representation of the set of paths. Thus, MOGen allows us to find a balance between the network model—allowing all observed transitions in any order—and the observed set of paths—only allowing for transitions in the order in which they were observed.

4.3.1 MOGen: Fundamental matrix
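As a concrete starting point, the following sketch shows how a multi-order transition matrix \({{\textbf {T}}}^{(K)}\) could be assembled from a toy path set. All names and encodings here are illustrative assumptions, not the paper's implementation; `"*"` and `"+"` stand in for the initial state \(*\) and terminal state \(\dagger\).

```python
import numpy as np

# Toy path data (assumed example): each path is a sequence of first-order nodes.
paths = [("A", "B", "C"), ("A", "B", "C"), ("B", "C"), ("A", "C")]

K = 2  # maximum order of the MOGen model

def transition_counts(paths, K):
    """Count transitions between higher-order states with memory up to K-1."""
    counts = {}  # (from_state, to_state) -> count
    for path in paths:
        prev = ("*",)  # every path starts in the initial state
        for i in range(len(path)):
            # state = up to the last K-1 previously visited nodes plus the current node
            cur = tuple(path[max(0, i - K + 1): i + 1])
            counts[(prev, cur)] = counts.get((prev, cur), 0) + 1
            prev = cur
        # every path ends with a transition to the terminal state
        counts[(prev, ("+",))] = counts.get((prev, ("+",)), 0) + 1
    return counts

counts = transition_counts(paths, K)

# Normalise counts row-wise to obtain the multi-order transition matrix T(K).
states = sorted({s for edge in counts for s in edge})
idx = {s: i for i, s in enumerate(states)}
T = np.zeros((len(states), len(states)))
for (u, v), c in counts.items():
    T[idx[u], idx[v]] = c
T = T / np.maximum(T.sum(axis=1, keepdims=True), 1)  # terminal row stays all-zero
```

With these four toy paths, three of four start in A, so the entry from \(*\) to A is 0.75, and the row of each transient state sums to one.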
Consider a MOGen model with maximum-order K. We can interpret MOGen as an absorbing Markov chain, where the states \((v_1, \ldots , v_{n-1}, v_n)\) represent a path currently in node \(v_n\) having previously traversed nodes \(v_1, \ldots , v_{n-1}\). This interpretation allows us to split \({{\textbf {T}}}^{(K)}\) into a transient part \({{\textbf {Q}}}\) representing the transitions to different nodes on the paths and an absorbing part \({{\textbf {R}}}\) describing the transitions to the end state \(\dagger\). We can further extract the starting distribution \({{\textbf {S}}}\). All properties are represented in Fig. 6. Based on the fundamental matrix \({{\textbf {F}}} = ({{\textbf {I}}} - {{\textbf {Q}}})^{-1}\), we can compute properties of the MOGen model analytically.

4.4 Centrality measures
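The absorbing-chain decomposition from Sect. 4.3.1 underlies all of the following measures. As a minimal numpy sketch (the toy matrix, the state order, and the `"+"` stand-in for \(\dagger\) are assumptions of this illustration):

```python
import numpy as np

# Multi-order transition matrix T(K) for a toy MOGen model (assumed example).
# State order: '*' (initial), 'a', 'b', '+' (terminal, the paper's dagger).
T = np.array([
    [0.0, 0.8, 0.2, 0.0],   # '*': starting distribution S
    [0.0, 0.0, 0.5, 0.5],   # 'a': continues to 'b' or ends
    [0.0, 0.3, 0.0, 0.7],   # 'b': continues to 'a' or ends
    [0.0, 0.0, 0.0, 0.0],   # '+': absorbing terminal state
])

S = T[0, 1:3]                       # starting distribution over transient states
Q = T[1:3, 1:3]                     # transient part: transitions between path nodes
R = T[1:3, 3]                       # absorbing part: transitions to the end state
F = np.linalg.inv(np.eye(2) - Q)    # fundamental matrix F = (I - Q)^{-1}

visits = S @ F          # expected number of visits to each node on a path
reach = F.sum(axis=1)   # expected number of remaining transitions (path reach)
```

Here `visits` corresponds to multiplying \({{\textbf {F}}}\) with \({{\textbf {S}}}\) as used for betweenness below, and the row sums of \({{\textbf {F}}}\) give the path reach of Sect. 4.4.5.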
We now introduce the MOGen-based centrality measures that we use throughout the subsequent analyses. For all MOGen-based centrality measures, we also introduce the corresponding measures for a network and a path model.

4.4.1 Betweenness centrality
For MOGen, we can utilise the properties of the fundamental matrix \({{\textbf {F}}}\). Entries (v, w) of \({{\textbf {F}}}\) represent the number of times we expect to observe a node w on a path continuing from v before the path ends. Hence, by multiplying \({{\textbf {F}}}\) with the starting distribution \({{\textbf {S}}}\), we obtain a vector containing the expected number of visits to a node on any path. To match the notions of betweenness for networks and paths, we subtract the start and end probabilities of all nodes, yielding the betweenness centralities for the nodes of the MOGen model—i.e. higher-order nodes. The betweenness centrality of a first-order node v can be obtained as the sum over the higher-order nodes ending in v.

4.4.2 Closeness centrality (harmonic)
As different MOGen models contain different higher-order nodes, \({{\textbf {D}}}\) captures the distances between higher-order nodes based on the multi-order network topology, considering temporal correlations up to length K. While we aim to maintain the network constraints set by the multi-order topology, we are interested in computing the closeness centralities for first-order nodes. We can achieve this by projecting the distance matrix to its first-order form, containing the distances between any pair of first-order nodes but constrained by the multi-order topology. For example, for the distances \(d\{(A, B), (C, A)\} = 3\) and \(d\{(B, B), (C, A)\} = 2\), the distance between the first-order nodes B and A is 2. Hence, while for the network, the distances are computed based on the shortest path assumption, multi-order models with increasing maximum-order K allow us to capture the tendency of actual paths to deviate from these shortest paths. Based on the resulting distance matrix \({{\textbf {D}}}\), the closeness centrality is again computed following Eq. (6). Therefore, while for all representations, we compute the closeness centrality of a node using the same formula, the differences in the results originate from the constraints in the topologies considered when obtaining the distance matrix \({{\textbf {D}}}\).

4.4.3 Path end
For MOGen, all paths end with the state \(\dagger\). Therefore, \(e_v\) is obtained from the transition probabilities to \(\dagger\) of a single path starting in \(*\). This last transition can—and is likely to—be made from a higher-order node. We can obtain the path end probability for a first-order node by summing the path end probabilities of all corresponding higher-order nodes.

4.4.4 Path continuation
For MOGen, path continuation is given directly by summing the probabilities of all transitions in the row of \({{\textbf {T}}}^{(K)}\) corresponding to node v that do not lead to the terminal state \(\dagger\). As for other measures, for MOGen, the continuation probabilities are computed for higher-order nodes. We can obtain the continuation probability for a first-order node v as the weighted average of the continuation probabilities of the corresponding higher-order nodes, where weights are assigned based on the relative visitation probabilities of the higher-order nodes.

4.4.5 Path reach
For MOGen, we can again use the properties of the fundamental matrix \({{\textbf {F}}}\) and obtain the expected number of remaining transitions for any node v as the row sum of \({{\textbf {F}}}\).

5 Evaluating MOGen-based centralities in empirical path data
5.1 Experimental setup
We evaluate our MOGen-based centrality measures against network- and path-based measures in five empirical path datasets. We refer to Appendix A for further information and summary statistics of these datasets. For each path dataset, we compare three types of models: First, a network model containing all nodes and edges observed in the set of paths. Second, a path model which precisely captures the observed paths, i.e. the model is identical to the set of paths. Third, MOGen models with different maximum-orders K that capture all higher-order patterns up to a distance of K.

5.1.1 Train-test split
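A train-test split over whole paths might be sketched as follows. The 70/30 ratio, the seed, and the toy paths are assumptions of this sketch; the paper's actual split is not reproduced here.

```python
import numpy as np

# Assumed illustration: randomly split a set of observed paths into
# training and test data, keeping each path intact.
rng = np.random.default_rng(42)

paths = [("A", "B", "C"), ("B", "C"), ("A", "C"), ("A", "B"), ("C", "A", "B")]
perm = rng.permutation(len(paths))
cut = int(0.7 * len(paths))  # assumed 70/30 split

train = [paths[i] for i in perm[:cut]]
test = [paths[i] for i in perm[cut:]]
```

Splitting at the level of entire paths (rather than individual transitions) ensures that higher-order patterns in the test data are never partially seen during training.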
5.1.2 Ground truth ranking
5.1.3 Prediction of influential nodes and node sequences
A MOGen model with \(K=1\) resembles a network model with added states capturing the start- and end-points of paths. By setting \(K=l_{max}\), where \(l_{max}\) is the maximum path length in a given set of paths, we obtain a lossless representation of the path data. By varying K between 1 and \(l_{max}\), we can adjust the MOGen model’s restrictiveness between the levels of the network and the path model. We hypothesise that network and path models under- and overfit the higher-order patterns in the data, respectively, leading them to misidentify influential nodes and node sequences in out-of-sample data. Consequently, by computing node centralities based on the MOGen model, we can reduce this error.

To test this hypothesis, we fit MOGen models with \(1 \le K \le 5\) to our set of training paths. We then apply the centrality measures introduced in Sect. 4.4 to compute a ranking of nodes and node sequences according to each of the models. In a final step, we compare the computed rankings to the ground truth ranking that we computed for our test paths.

5.1.4 Comparison to ground truth
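A minimal sketch of such a comparison, using AUC as reported in Sect. 5.2: we label the top nodes of the ground-truth ranking as "influential" and score a model's predicted centralities against those labels. The 20% threshold and the toy score vectors are assumptions of this sketch.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    pos = scores[labels]
    neg = scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

ground_truth = np.array([5.0, 3.0, 2.5, 1.0, 0.5])   # test-set centralities
model_scores = np.array([3.0, 3.5, 1.0, 2.0, 0.2])   # a model's predictions

# label the top 20% of ground-truth nodes as influential (assumed threshold)
k = max(1, int(0.2 * len(ground_truth)))
threshold = np.sort(ground_truth)[-k]
labels = ground_truth >= threshold

print(auc(model_scores, labels))  # prints 0.75
```

An AUC of 1.0 would mean the model ranks every influential node above every non-influential one; 0.5 corresponds to random guessing.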
5.2 Comparison of the prediction quality
We now compare the ability of MOGen models to predict the influence of nodes and node sequences in out-of-sample data. For ease of discussion, we start our analysis by focusing on two datasets, BMS1 and HOSPITAL. Figure 8 shows the results for our five centrality measures. For betweenness and closeness, we do not require information on the start- and end-points of paths. Therefore, equivalent measures for the network model exist. In contrast, no equivalent measures for the network model can be computed for path end, path continuation, and path reach. For MOGen models (MK) with increasing K, the models become more restrictive until ending with the path model (P).

For both datasets, the MOGen models outperform both the network model and the path model. With less training data, the AUC scores of all models decrease. However, as expected, these decreases are larger for the network and path models. For the betweenness and closeness measures, this results in AUC curves that resemble “inverted U-shapes”. For the remaining measures, for which no equivalent network measures are available, we generally find that MOGen models with K between 1 and 3 perform best, and the prediction performance decreases for more restrictive models, such as the path model. Our results highlight the risk of underfitting for network models and overfitting for path models. We further show that this risk increases when less training data are available.
For the WORK and TUBE datasets, the difference between the MOGen and path models decreases, and for some measures, the path model even yields better performance. WORK and TUBE are those datasets for which we have the highest fraction of total observed paths compared to the number of unique paths in the datasets. As shown in Table 4, BMS1 contains 59,601 total paths, of which 18,473 are unique. This means that, on average, each unique path is observed 3.2 times. These counts increase to 4 for SCHOOL, 4.6 for HOSPITAL, 6.7 for WORK, and 132.9 for TUBE. The good performance of the path model for these datasets shows that the error we found with fewer observations is indeed due to overfitting. In other words, if we have a sufficient number of observations, we can compute the centralities on the path data directly. However, if the number of observations is insufficient, the path model overfits the patterns in the training data and consequently performs worse on out-of-sample data. How many observations are required to justify using the path model depends on the number of unique paths contained in the dataset. This is the advantage of MOGen models, which prevent underfitting by capturing higher-order patterns up to a distance of K while simultaneously preventing overfitting by ignoring patterns at larger distances.

6 Detecting community smells at genua
We now apply our MOGen-based centrality measures to identify community smells within the development process. We visualise our approach in Fig. 9. In addition, we provide a detailed sequence of steps in Table 2.

Fig. 9 (caption, in part): b We fit a MOGen model to the extracted paths and identify the most central team members according to the centrality measures introduced in Sect. 4.4. c We identify those team members that are most central to the team and track them over time. d We identify community smells by comparing the centralities of those members with the values obtained for the remaining team. e We validate the detected community smells in semi-structured interviews with team members from genua. f As our centrality measures are computationally efficient, they can be employed in real-time to provide actionable insights on existing and emerging community smells to software development teams.

Table 2:
1. Start with the development platforms storing the team’s time-stamped actions, labelled by issue.
2. Extract all actions from the databases.
3. Sort all actions according to their time-stamp.
4. Aggregate them at issue level to form sequences of actions, i.e. paths.
5. Assign paths to rolling time windows (one year long with three-month shifts, cf. Sect. 3).
6. Fit a MOGen model for each time window.
7. Compute path centralities for team members (cf. Sect. 4.4).
8. Identify members consistently deviating from the average centrality (cf. Sect. 6.2).
9. Examine these members’ centrality score time series.
10. Detect time-series anomalies.
11. Explain anomalies, forming community smell hypotheses.
12. Validate hypotheses through semi-structured interviews.
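Steps 1-5 of this pipeline can be sketched as follows. The toy records, field names, and 91-day approximation of a three-month shift are assumptions of this illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed toy records: (timestamp, issue id, team member performing the action).
actions = [
    (datetime(2020, 1, 5), "i1", "A"),
    (datetime(2020, 1, 7), "i1", "B"),
    (datetime(2020, 2, 1), "i2", "A"),
    (datetime(2020, 3, 1), "i1", "C"),
    (datetime(2021, 6, 1), "i3", "B"),
]

actions.sort(key=lambda a: a[0])                 # step 3: sort by time-stamp

by_issue = defaultdict(list)                     # step 4: aggregate per issue
for ts, issue, member in actions:
    by_issue[issue].append((ts, member))
paths = {i: tuple(m for _, m in acts) for i, acts in by_issue.items()}
start = {i: acts[0][0] for i, acts in by_issue.items()}

# step 5: rolling one-year windows with three-month shifts; here a path is
# assigned to every window containing its first action
first, last = min(start.values()), max(start.values())
windows = []
t = first
while t <= last:
    end = t + timedelta(days=365)
    windows.append([i for i, s in start.items() if t <= s < end])
    t += timedelta(days=91)  # roughly three months
```

Each window's paths would then be passed to a separate model fit (step 6).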
6.1 Higher-order interaction patterns
We select the optimal maximum order K using MOGen’s built-in model selection approach. To account for changes over time, we fit a separate MOGen model to all paths starting in a given one-year period. We then move the one-year window by three-month increments and repeat the process. MOGen accounts for the start- and end-points of paths. The results presented later will show that the consideration of these start- and end-points, neglected by static network models, is essential.

6.2 Identifying community smells in software development processes
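The deviation detection in steps 8-10 of Table 2 might look like the following sketch. The toy centrality series, the 1.3-standard-deviation threshold, and the majority-of-windows criterion are assumptions of this illustration, not the paper's exact procedure.

```python
import numpy as np

# Assumed toy data: per-window centrality scores for each team member.
centrality = {
    "A": np.array([0.30, 0.32, 0.31, 0.29]),
    "B": np.array([0.28, 0.27, 0.30, 0.28]),
    "E": np.array([0.80, 0.85, 0.90, 0.88]),  # consistently above the rest
}

scores = np.array(list(centrality.values()))
mean = scores.mean(axis=0)   # team average per time window
std = scores.std(axis=0)     # team spread per time window

# flag members whose score exceeds the team mean by more than 1.3 standard
# deviations in the majority of time windows (both thresholds assumed)
flagged = [m for m, s in centrality.items()
           if np.mean((s - mean) / std > 1.3) > 0.5]
print(flagged)  # prints ['E']
```

Members flagged this way are candidates for closer inspection of their centrality time series and, ultimately, for community smell hypotheses.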
6.2.1 Community smells in the issue tracking process
6.2.2 Community smells in the code review process
MOGen-based centralities allow us to quantify the importance of not only nodes but also edges in the network. In Appendix B, we examine the centralities of all edges where C is either the target or the source to determine the extent of C’s interactions with various team members. Through this analysis, we assess whether C’s interactions are limited to a small subset of the team or whether they involve a broader range of team members. This ultimately helps us understand the extent of knowledge sharing involving C. Our results show that while C holds significant project knowledge, C’s broad interactions with various team members, both as sender and recipient, indicate that C is not a lone wolf, and the team likely has active measures in place to promote knowledge exchange. Therefore, we do not consider C a potential community smell.

7 Validating our results in semi-structured interviews
7.1 Essential team members
A | B | C | D | E | other (H)
---|---|---|---|---|---
1 | 4 | 4 | 3 | 1 | 2
7.2 The quality assurance process
B confirmed that the development process itself did not change, arguing that:

“The process didn’t change much. The categories and priorities may have looked minimally different, but the process itself [...] there nothing has changed.” (statement by D)

When asking for the reason behind the transition, we learned that:

“[t]his [Bugzilla] was just a tool.” (statement by B)

Further, the team is not aware of any causal relationship between the transition and E joining the team:

“[t]he switch [from Bugzilla to Redmine] was made mainly because they wanted to track features that were promised to customers, which was not possible with Bugzilla.” (statement by E)

Thus, we conclude that the transition from Bugzilla to Redmine was not accompanied by any significant change in the development process.

“No, I think this was pure chance.” (statement by E)

In addition to the testing by the development team:

“A unique aspect of Aegis is that for each bugfix or feature that you develop, you have to write a test. You then need to execute this test twice—once with your changes and once without your changes. Without your changes, the test needs to fall on its nose and fail. With your changes, the feature is now there, and the test is successful. Through this process, we have already tested the code. Not only developed but simultaneously also tested.” (statement by D)

However, with both the product and the customer base growing significantly, the team eventually introduced an external quality assurance process:

“[t]he customer managers have a natural interest to test new features before installing them for a customer.” (statement by B)

Over the last ten years, E has developed a meticulous testing setup allowing the evaluation of new releases in environments similar to those used by genua’s clients. However, as we hypothesised:

“The BSI [Federal Office for Information Security] responsible for the certification of [the product] requested an additional quality assurance process conducted by an external person not part of the development team. I became this person.” (statement by E)

E confirmed this, stating:

“[n]obody apart from E knows how the testing environment works.” (statement by B)

So what happens if E can no longer perform the work?

“I am effectively the entire quality assurance department for [the product]. In moments where I don’t feel like doing it, go on holiday, or am busy with other things, it [new changes] remains without quality assurance.” (statement by E)

In other words, there would be severe consequences for the team:

“Certainly, nobody would know my test environment and my tests at a deeper level. One or two people have performed tests of patches—this means they have effectively tested a new version of the software with the existing set of tests. However, I am the only one who knows the setup in depth and knows how to properly create new tests or adapt tests for new features.” (statement by E)

Despite these consequences, only G, who has also worked in quality assurance, mentioned E as one of the most important team members. Also, no steps are taken to mitigate the potential consequences:

“If [E] is absent or unable to perform the work, we have a massive problem. Quality assurance is undoubtedly something where we have [E] who has done this for many years and is genuinely the only one.” (statement by G)

We argue this is particularly critical as the external quality assurance process, which E is responsible for, was explicitly requested by the Federal Office for Information Security and is thus required to obtain the product’s certification.

“Concerning me, if there are any steps taken to moderate the consequences if I was no longer there? I don’t know; I haven’t witnessed any.” (statement by E)
7.3 The integration of changes
This rich history also contains numerous design decisions that might look wrong initially but were made for good reasons.

“[The product] is now significantly older than 20 years, and it still exists, [...], but there were multiple generations of developers that have worked on it. Those are now gone again, which means that there is quite a lot of knowledge—especially undocumented knowledge—concentrated in just a few team members.” (statement by D)

This complexity makes it very difficult for newcomers to get started contributing to the project, and even experienced team members have a hard time understanding all aspects and use cases:

“With certainty, things that look peculiar at first glance [today] had to be done exactly like this for reasons that date back six, eight, or ten years. The longer you are part of the team, the more corners of the product you get to know, allowing you to make changes relatively efficiently without running the risk of breaking everything in another place.” (statement by E)

To cope with this complexity, the team has defined multiple roles corresponding to different tasks in the development process.

“I think the initial barrier of entry is very high for [the product]. This means that while experienced Unix, Perl, and C developers can collaborate on a small subarea, they lack the bigger picture of how the product works as a whole and how it is deployed with our clients.”

“I would personally argue that even a long-term developer of [the product] could not put the product in operation or configure the product for a client. Even those developers don’t know how the different components interact and how they, therefore, need to be configured as a whole.” (statements by F)

To become an integrator, team members indeed need multiple years of experience working on the product.

“There are three different roles: the role of the developer, the role of the reviewer, and the role of the integrator. All those who participate are, in any case, first developers. A few less are reviewers. For example, new trainees are initially not allowed to review.” (statement by D)

We note that commonly a single team member has multiple roles, with experienced team members taking over all three roles depending on the need.

“We only assign the integrator role to experienced developers who see the bigger picture.” (statement by B)

If there is a lack of integrators, the tasks are given to the next most experienced team member.

“It’s purely a practical problem as if one [integrator] is on holiday and another one is sick, nobody can integrate. Therefore, you need a third one that can integrate.” (statement by D)

However, the constraints of the review process do not explain why the team does not form subgroups that continuously review and integrate the changes developed by each other. Here B, who has been with the team for the full observation period, told us:

“You have to keep the process going. When you realise, ok, we have one resource that integrates and five developers waiting for someone to integrate, then you say, ok, who is the second most experienced now? If it still doesn’t work, you say, who is the third most experienced now? And then you give the integrator role to so many people until it works.” (statement by B)

Thus, by introducing Scrum, the team actively and consciously tries to spread knowledge about the product throughout the team.

“In the beginning, everyone had their own area. There was one who did the WWW relay, and the other one had to do the IP relay. When we wanted to do five features in one release, then everyone was assigned a different feature.”

“Then we restructured the whole thing by introducing the Scrum process where now the team as a whole is responsible for all things that are done. Thus we tried to break the whole thing up a bit.” (statements by B)

In addition, the review process also contributes to knowledge sharing within the team.

“One of the philosophies of Scrum is that everyone can do everything to address exactly the problems arising when the bus comes [referring to the truck factor, which is also known as the bus factor], or Google simply pays more. Thus we try to counteract exactly these problems in advance through XP [Extreme Programming] and pair programming [deliberate pairing of team members with different expertise].” (statement by D)

However, while, as we could show, these efforts lead to a strong mixing of interaction partners, there remain some areas in which team members specialise.

“Given the complexity of [the product], we only have relatively few developers. Whenever something is changed, someone has to look at it [review and integrate it], which means that there is inevitably a mix of what people see.” (statement by G)

Also, as D experienced in another project, even three experienced team members can leave in short succession, causing trouble for the remaining team.

“However, I think there are still comfort zones where people make initial changes and whom you let do it [make changes in a certain area of the codebase].” (statement by G)
“I had the plan to switch from [the analysed project] to [the project I’m in now] for two months to have a look at the other project. They have a different programming language, a different framework etc. We were in the process of scheduling when we would start when one of the colleagues from that project quit. Then we said, ok, let’s do that right away as then I’ll get some more information from him. Because he quit, we decided that I’d switch completely. Afterwards, it didn’t take long, and two more core developers from the team quit, and that was pretty hard. A lot of knowledge left very quickly, and it often happens that I read some code that I don’t know and that I don’t understand right away. Then it takes me a lot longer. Also, the other remaining core developers need to look these things up because it’s not their expertise. In that situation, I felt the bus factor very hard. There you notice that there was quite a bit of sand in the gears.” (statement by D)

Thus, based on our MOGen-based centrality measures, we correctly identified the integration process as a code-red situation. The centrality measures based on the MOGen model proposed in this manuscript are a powerful, versatile, and fine-granular way to capture and quantify who holds knowledge and performs certain aspects of the development process, and how this knowledge is shared within the team.

8 Threats to validity
9 Conclusion
Based on MOGen, we proposed measures to quantify the influence of both nodes and node sequences in path data according to five different notions of centrality. Our centrality measures range from simple concepts like betweenness to complex measures such as path reach. We demonstrate in a prediction experiment with five empirical datasets that utilising our MOGen-based centrality measures results in improved predictions of influential nodes in time-series data compared to both network and path models.

10 Archival and reproducibility
An implementation of the MOGen model is available at https://github.com/pathpy/pathpy3.