
Open Access 2023 | Original Paper | Book Chapter

6. Architecture of AISEMA System

Authors: Artem Kruglov, Giancarlo Succi, Xavier Vasquez

Published in: Developing Sustainable and Energy-Efficient Software Systems

Publisher: Springer International Publishing


Abstract

Over time, the data collection process has evolved from manual data collection to automated mechanisms where human intervention is minimal and therefore has minimal impact on work processes. In this section, we describe the main components that should be included in the design of an automated in-process software engineering measurement and analysis (AISEMA) system.
At the industrial level, data collection is an important task: from the collected data, it is possible to assess the current state of the processes, identify possible problems, and find mechanisms to optimize stages of the production process. Software production processes are no exception to this type of analysis.
Over time, the data collection process has evolved. It began with manual, systematic collection carried out by a person, which could even hinder or slow down the process under study; today we can incorporate automated data collection mechanisms, in which human intervention is minimal and therefore has minimal impact on the process being observed. In our particular case, the process under study is software development.
Based on the situation raised above, in this section we describe, from a high-level perspective, the main components that must be included in the design of an Automated In-process Software Engineering Measurement and Analysis (AISEMA) system. Broadly, it consists of one or more components responsible for collecting data in a noninvasive way, a component in charge of centralizing the collected data, and a third component in charge of presenting indicators and other information derived from the collected data.

6.1 Data Collectors

6.1.1 Quality Attributes

We will start by describing one of the key components of an AISEMA system, the data collection component, which must meet the following quality attributes:
  • Extensibility is necessary since, over time, the incorporation of new metrics may become a need; the design must therefore allow new modules to be included in a simple way.
  • Performance is one of the fundamental pillars of the design, since we want the data collector to have the least possible impact on the execution of daily activities.
  • Reliability is another point to highlight: to support an adequate analysis, the component must capture the data correctly and precisely at all times, and these data must be stored and transmitted as expected.
  • Security is a point that many could overlook; however, depending on the sensitivity of the captured data, it plays an important role in the design, since the collector will often capture information about individuals or work groups that we would not want to see compromised or subjected to unknown kinds of manipulation.

6.1.2 Features

With the quality attributes of the data collection components defined, let us look at the desired features in detail. For this particular case, we analyze the data collectors developed for the Innometrics project, which, from a high-level perspective, is responsible for collecting data on resource utilization during the software development process.
As mentioned above, the main feature of the data collector is to monitor the impact that each application running on the developer's computer has on the available resources. For example, we want to monitor CPU usage, GPU usage, RAM, and I/O operations, as this will later help us predict energy consumption or the amount of computational resources needed to run a certain project.
It is also required to monitor the application running in the foreground, so that it is possible to infer which activity the user is performing and, in turn, how much time the user spends actively on it (Fig. 6.1). Also, for reasons of transparency, the user must have access to the data that has been collected, either historically or in real time, so a minimal graphical interface is required to display this kind of information. In addition, since data collection is intended to be noninvasive and should not disturb the normal execution of the developers' activities, a control panel is needed with which both the data collection period and the periodicity with which the data is synchronized with the central database can be adjusted (Fig. 6.2).

6.1.3 Internal Design

Up to this point, we have laid the groundwork for the next step: describing the architectural design the system should have. We detail it starting from the highest-level components down to the detail of each of them. It should also be clarified that the design presented is a guide based on good practices and can be improved in various ways.
We can divide our data collection application into four different components:
  • Graphic interface
  • Data collectors
  • Persistence layer
  • API controller

6.1.3.1 Graphic Interface

This component is mainly responsible for presenting the collected data, giving access to the configuration section and to a summary view of the data monitored in real time.
It is also this component that manages the different execution threads used to track each of the metrics we want to monitor. In this way, each tracking process is isolated, so that its execution is neither limited nor influenced by any of the other threads.
An additional thread is included in the same way; it is responsible for synchronizing the data with the central database and for watching for notifications from the backend, either to inform the user or as a request to update the component.
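As a minimal sketch of this threading scheme (the names, intervals, and queue-based hand-off below are illustrative assumptions, not the Innometrics code), each metric can be tracked on its own thread while a separate thread periodically flushes the samples toward the central database:

```python
import queue
import threading
import time

def make_collector(name, interval, out_q, stop_event):
    """One thread per metric, so a slow or blocked collector
    cannot limit or influence the other tracking threads."""
    def run():
        while not stop_event.is_set():
            out_q.put((name, time.time()))  # placeholder for a real sample
            stop_event.wait(interval)
    return threading.Thread(target=run, name=name, daemon=True)

def make_sync_thread(interval, out_q, stop_event, send):
    """A dedicated thread that periodically drains collected samples and
    pushes them to the central database (here abstracted as `send`); in
    the real system it would also watch for backend notifications."""
    def run():
        while not stop_event.is_set():
            stop_event.wait(interval)
            batch = []
            while not out_q.empty():
                batch.append(out_q.get())
            if batch:
                send(batch)
    return threading.Thread(target=run, name="sync", daemon=True)
```

A shared stop event lets the graphical interface shut down all trackers cleanly, while the per-thread isolation matches the design goal stated above.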

6.1.3.2 Data Collectors

To provide extensibility, the data collection processes have been separated into an independent module that exposes the necessary interfaces so that more metrics can be incorporated later. Listing 6.1 shows a code snippet with the implementation, for Windows operating systems, of the function that generates the report on all the processes running in the system.
Listing 6.1
Data collector snippet
(The listing was published as two code images and is not reproduced here.)
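Since the published listing appears only as an image, the following stdlib-only Python sketch illustrates the idea behind such a process-report function. It is not the Innometrics implementation: shelling out to `tasklist`/`ps` is an assumption made for portability, and a production collector would query OS APIs directly.

```python
import platform
import subprocess
from dataclasses import dataclass

@dataclass
class ProcessSample:
    pid: int
    name: str
    cpu_percent: float
    mem_percent: float

def collect_process_report():
    """Snapshot every running process with basic resource figures.

    On Windows this parses `tasklist` CSV output; elsewhere it
    parses `ps`. CPU/memory figures are unavailable from `tasklist`
    without extra queries, so they are left at 0.0 there.
    """
    samples = []
    if platform.system() == "Windows":
        # CSV, no header: "name","pid","session","session#","mem usage"
        out = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            cols = [c.strip('"') for c in line.split('","')]
            if len(cols) >= 2 and cols[1].isdigit():
                samples.append(ProcessSample(int(cols[1]), cols[0], 0.0, 0.0))
    else:
        # pid, command, %CPU, %MEM; the `=` suffix suppresses headers
        out = subprocess.run(["ps", "-eo", "pid=,comm=,pcpu=,pmem="],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            tokens = line.split()
            if len(tokens) >= 4:
                samples.append(ProcessSample(int(tokens[0]),
                                             " ".join(tokens[1:-2]),
                                             float(tokens[-2]),
                                             float(tokens[-1])))
    return samples
```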
This component (Fig. 6.3) is able to:
  • Monitor equipment battery status information
  • Track the window on which the user actively performs their task
  • Collect metrics on each process in terms of resource utilization, such as:
    • GPU
    • CPU
    • RAM
    • Network usage
    • I/O operations

6.1.3.3 Persistence Layer and API Controller

Finally, we have two complementary modules: the persistence layer and the API controller. The persistence layer implements the operations needed to store and manage the data collected locally on the user's device, while the API controller defines and implements the interfaces used for communication with the backend. In this way, the data collector is decoupled both from the particular database in use and from the API used to store the information.
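This decoupling can be sketched with two abstract interfaces; the names below are illustrative, not taken from the actual code:

```python
from abc import ABC, abstractmethod

class PersistenceLayer(ABC):
    """Local storage abstraction: the collector never talks to a
    concrete database directly."""
    @abstractmethod
    def save(self, report: dict) -> None: ...
    @abstractmethod
    def pending(self) -> list: ...
    @abstractmethod
    def mark_synced(self, reports: list) -> None: ...

class ApiController(ABC):
    """Backend communication abstraction: any transport can be
    plugged in behind this interface."""
    @abstractmethod
    def send(self, reports: list) -> bool: ...

class InMemoryStore(PersistenceLayer):
    """Trivial stand-in for the on-device store (e.g., SQLite)."""
    def __init__(self):
        self._pending = []
    def save(self, report):
        self._pending.append(report)
    def pending(self):
        return list(self._pending)
    def mark_synced(self, reports):
        self._pending = [r for r in self._pending if r not in reports]

def sync(store: PersistenceLayer, api: ApiController) -> None:
    """Push locally buffered reports; keep them if the backend rejects."""
    batch = store.pending()
    if batch and api.send(batch):
        store.mark_synced(batch)
```

Because `sync` depends only on the interfaces, either side can be swapped (a different local store, a different backend API) without touching the collector logic.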

6.2 Backend System

We have already described the main characteristics that the data collection agents must have and introduced a high-level design for them; it is now necessary to do the same with the complementary part of the AISEMA scheme.
This is the component that consolidates the data already collected by those agents, which we will call the backend.

6.2.1 Quality Attributes

To support these operations, we need an ecosystem of services that meets at least the following quality attributes:
  • Compatibility: This is very important, since we may encounter different technologies, so it is essential to use a communication protocol that is widely supported across platforms. We also need control over the versions of the API and of the deployed data collectors, so that we can tell when a component is out of date and either ask the end user to perform an update or carry it out unattended.
  • Reliability: In addition to supporting connectivity with various platforms, it is important to guarantee that the backend keeps functioning even under unfavorable conditions, such as a high transactional load, tolerating possible failures either in communication or in the transmitted data.
  • Security: Given the sensitivity of the stored data, it is important to implement an adequate security protocol that helps prevent possible malicious use of the information.
  • Maintainability: Good modularity is important, since it helps us expand the frontiers of the initial implementation, add new functionality, or integrate with other systems in the future.
In this case, system performance can take a back seat: since the data collector keeps a local backup of the data, any transactions the backend cannot process immediately can be attended to later.

6.2.2 Features

We can detail the desired characteristics as follows:
  • An API whose main purpose is to receive the information collected through the data collectors.
  • A system for the authentication and authorization of system users, with different authorization levels; in our case, developers, managers, and system administrators.
  • A series of dedicated APIs for displaying information on the system's dashboard.
  • A component that allows integration with external systems, in order to collect data about the software development process that cannot be collected directly on the development team's devices.
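The three authorization levels could be modeled with a simple role-to-permission map, as in this minimal sketch (the action names are hypothetical):

```python
from enum import Enum

class Role(Enum):
    DEVELOPER = 1
    MANAGER = 2
    ADMIN = 3

# Hypothetical actions mapped to the roles allowed to perform them.
PERMISSIONS = {
    "submit_report": {Role.DEVELOPER, Role.MANAGER, Role.ADMIN},
    "view_team_dashboard": {Role.MANAGER, Role.ADMIN},
    "manage_users": {Role.ADMIN},
}

def authorize(role: Role, action: str) -> bool:
    """Deny by default: unknown actions are never authorized."""
    return role in PERMISSIONS.get(action, set())
```

In a real deployment this check would sit behind the authentication service, with the role carried in the authenticated session or token.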

6.2.3 Internal Design

6.2.3.1 Main Server

In the main server, as we can see in Fig. 6.4, we have a set of services that support us in order to provide the essential functionalities for data collectors and dashboards, as detailed below:
  • DataCollectorAPI: This service provides the methods consumed by the data collectors, to which reports with the collected data are periodically sent. It also exposes external authentication services, used by the dashboard and by the data collectors themselves, as well as the interfaces for adjusting the system configuration and managing users; these tasks are executed from the dashboard by administrator users.
  • Authorization service: For reasons of reuse and scaling, user authentication and authorization are implemented in an independent service, so that, if necessary, we can run multiple instances of it and any service in the ecosystem can make use of it.
  • Activity collector service: The service that bears the greatest transactional load is the activity collection service. It processes the reports sent by the data collectors and generates the data sets needed for the graphs shown in the dashboard; keeping it independent also makes the system easier to scale when required.
Note also that these services are deployed in Docker containers, which makes scaling straightforward, and that a service registry server is included, in this case a Eureka server, which provides automatic service discovery and a load-balancing mechanism.
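Purely as an illustration of such a deployment (the service names, images, and ports below are hypothetical assumptions, not the project's actual configuration), a Docker Compose sketch might look like:

```yaml
# Illustrative sketch only: names, images, and ports are hypothetical.
services:
  eureka:                         # service registry for automatic discovery
    image: my-eureka-server
    ports: ["8761:8761"]
  data-collector-api:
    build: ./data-collector-api
    environment:
      EUREKA_URL: http://eureka:8761/eureka
    depends_on: [eureka]
  activity-collector:             # bears the greatest transactional load
    build: ./activity-collector
    deploy:
      replicas: 2                 # scaled horizontally behind the registry
    depends_on: [eureka]
  postgres:
    image: postgres:14
```

Scaling then amounts to raising the replica count of the loaded service; the registry and its load-balancing mechanism route the traffic.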

6.2.3.2 Database

Regarding the database, we have a PostgreSQL implementation divided into two sections: (1) a database with transactional data and data about users and projects and (2) additional configuration of both the system and the integrations with external systems.

6.2.3.3 External Agents

In terms of external agents, there is an additional server hosting the services for direct integration with external systems, for example, consuming REST services to collect the necessary information about a specific project.
As an additional mechanism, it can also connect directly to the databases of external agents: in some cases the integration can be done directly via an API, while in others it is necessary to install plugins that maintain their own database.

6.3 Conclusion

Throughout this chapter, we have described two big components of the AISEMA system architecture. A good design is especially important for the data collectors, because they can hinder the activities of the development team if appropriate measures are not taken. It is also necessary to calibrate both the process-scanning period and the period for sending data to the backend, in order to achieve accurate data sampling without overloading the system. In our particular case, numerous experiments showed that collecting data every 2 minutes and synchronizing with the central database every 10 minutes provides an acceptable level of accuracy.
Regarding the backend, one of the main advantages of the system is its flexibility in the many types of integration it can support and the ease of scaling at transaction peaks, for which it is important to estimate in advance the number of simultaneous users that will be working in the system.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Metadata
Copyright year: 2023
DOI: https://doi.org/10.1007/978-3-031-11658-2_6
