
Open Access 2022 | OriginalPaper | Chapter

Self-healing Approach for IoT Architecture: AMI Platform

Authors : Bessam Abdulrazak, Josué Ayi Codjo, Suvrojoti Paul

Published in: Participative Urban Health and Healthy Aging in the Age of AI

Publisher: Springer International Publishing


Abstract

The fast growth of the IoT, together with the vast possibilities in applications and processing brought by the imminent arrival of 5G, is making IoT an active part of the activities of daily living. These massive architectures then become the target of security issues, broken services, broken hardware, and malfunctioning applications. Moreover, in a system of millions of connected devices, operating each device individually is impossible, which makes the platform unmaintainable. In this study, we present our attempt to achieve autonomy of the IoT infrastructure, along with some of the existing and recurrent issues undermining IoT architectures. We then review existing self-healing techniques that enable a system to be autonomous and to solve issues. We also describe our IoT platform, which targets the self-healing concern. Finally, we offer recommendations for building a reliable, resilient IoT system with cognitive entities for self-management.

1 Introduction

The extensive growth of the Internet of Things (IoT) in recent times has made it possible to gather a large amount of data from every aspect of life. This rapid growth in every sector raises several challenging operational issues. These issues can be dealt with for a limited number of IoT devices, but doing so becomes extremely difficult for a large number of devices. We argue that the self-healing approach can efficiently handle the operational issues that arise when various IoT devices are incorporated into an architecture.
The global IoT market size is forecast to grow around 1.6 trillion by 2025 [1]. These millions of connected devices create a massive amount of data and vast possibilities. Such pervasive connectivity allows IoT to bring its benefits to every aspect of life (e.g., in domains such as healthcare and industrialization). IoT-based systems have an important role in improving health-related quality of life. Still, they introduce various issues that require immediate attention from developers before a large-scale deployment can be considered. IoT solutions, e.g., those used in activity trackers and sleep trackers, aim at abstracting daily routines to offer better services. Although such solutions encourage people to improve their typical daily activities, they are recurrently exposed to security breaches because they expose acquired data to online services [2]. Security is currently the number one operational issue addressed in the IoT literature [3]. Researchers have proposed various self-protection solutions as countermeasures for reverting a system to a normal state [3]. Still, managing a large number of devices raises challenging operational issues, e.g., energy depletion (battery saving), broken communication (network connectivity), operating environment effects (overheating, cybercrime, storage management), and discontinuity in services. These issues need to be addressed for every single device within an IoT infrastructure. Self-healing is a well-suited approach for mitigating such issues, with the ultimate goal of building reliable systems. It is related to a set of concepts (e.g., resilient systems [4], cognitive systems [5], responsible and autonomous systems [6]). It is the property of a system to independently identify and diagnose breakdowns and to autonomously determine and implement proper mitigation strategies.
More specifically, self-healing confers system reliability through responsibility and awareness of the environment, issues, and expected countermeasures during unanticipated situations. Therefore, a self-healing IoT system should incorporate monitoring, awareness, and knowledge for detecting unforeseen states. Upon detecting a potential issue, the system initiates optimized planning for executing proper actions.
The contribution of this study is twofold. First, we highlight the self-healing concerns of IoT solutions. Second, we provide insight into how a self-healing system should behave. We strive to achieve a self-healing system capable of reasoning over security, software, hardware, and networking concerns and of reacting properly to keep the overall system on track without much human intervention. In our research, we consider the capability of the system to sense, be aware of its environment, and take proper actions to overcome a problem.
The rest of the paper is organized as follows: Sect. 2 introduces the IoT architecture and existing issues. Section 3 presents existing self-healing solutions. Section 4 describes an IoT-based case study. Section 5 describes the results of the experiments and evaluations conducted to validate the functionality of the system. Section 6 presents our discussion on building a reliable system. Finally, we outline the conclusion in Sect. 7.

2 IoT Architecture and Existing Issues

We introduce in this section an IoT architecture, based on our AMI-Architecture [7] (Fig. 1), to better illustrate the operational issues and the intervention required at the various levels.
  • The Edge (or Device) layer (i.e., the closest layer to end-users/devices) consists of several devices, e.g., sensors, actuators, and smart devices. These geographically distributed devices are responsible for sensing environmental information, and the edge layer is tasked with acquiring this data.
  • The fog layer (i.e., the middle layer, or the bridge between the cloud layer and the edge layer) encompasses a large number of fog nodes, e.g., routers and gateways. They oversee scheduling, storage, data transmission, and the management of distributed computation.
  • The cloud layer is responsible for permanent data storage and extensive computational analysis of data.
  • The business layer is the data analytics and visualization layer, which attends to user needs and specifications.
Various issues can be encountered at the different layers of the architecture, which in turn prevent IoT solutions from being fully operational. Issues are considered as faults, where a fault is an unintended defect that leads to an error [8]. An error, in turn, indicates an incorrect state of the system. Eventually, errors make IoT systems ineffective. To describe a well-suited self-healing system, we examine the variety and nature of these issues. Following the IoT architecture depicted in Fig. 1, issues can be categorized into five major tracks as follows:
1) Networking issues: The Edge layer consists of sensors, actuators, and gateways (Fig. 1). These devices communicate with each other through a medium. In most cases, sensors/actuators communicate with the gateway through a local sensor network protocol (e.g., Bluetooth, Zigbee, Wibree, Z-Wave) to send the acquired environmental data and receive control signals. The gateway communicates with the cloud nodes (i.e., cloud layer), through the fog layer, via Internet-supported protocols (e.g., LTE, 5G, Ethernet, Wi-Fi) to send the acquired data to the cloud database or receive control signals from the cloud. The communication links in the IoT system are frequently subject to breakage, malfunctioning (e.g., incorrect state), or unavailability of the medium [4], which reduces the effectiveness of the overall system.
 
2) Software issues: Software issues exist in all the layers (Fig. 1) and include sensor reading issues, changes in the operating environment, faults in source code, and service/application issues.
   a) Sensors and/or actuators record environmental data and relay it to gateways. An error can occur at reading time, leading to a sensor reading issue [8]. A sensor reading issue is an oscillation in the reading, or a flat signal that invariably repeats an arbitrary value.
   b) A change in the operating environment [9] is an issue that badly affects gateways and cloud nodes. It is mainly related to a setting/configuration of a node that is no longer suitable after an update. Real examples of changes in the operating environment include mistakes in libraries, the direction of changes, and version control [8]. Methods can change across version updates, and versions of components or libraries may no longer be compatible.
   c) Faults in source code [8] that went unchecked in prior patches often generate system/service/application failures.
   d) Services and applications are another kind of issue [10, 11]. The applications and services built upon the sensors can cause failures when they are in an incorrect state (e.g., on/off/crashed).
 
3) Hardware issues: Hardware issues are among the dominant problems affecting IoT systems. They group energy depletion, environmental conditions, and lack of memory and limited computational capacity.
   a) Energy depletion is one of the best-known issues encountered by sensors. Once a sensor's battery dies, the application related to this sensor and its functionality are affected, and major data loss follows [12]. Although considerable effort has been devoted to fixing this issue by implementing energy-aware communication concepts [4, 13] or by expanding the battery capacity to sustain longer activity, the issue remains.
   b) Environmental conditions are another aspect of the issue [14, 15]. Humidity, heat, vibrations, or cosmic rays have a high impact on how sensors or gateways behave [8]. These conditions can shorten a sensor's life or reduce its capacity, preventing it from reaching its optimal state.
   c) Lack of memory and processor overheating are serious issues in IoT systems [4, 8, 9]. Since data is collected through sensors, a lack of memory resources leads to data loss, and processor overload prevents the gateways or cloud servers from operating optimally and can even disable them.
 
4) Human interaction issues: There is a high probability that human intervention contributes to a system failure [16] through implementation defects and operational mistakes, co-programming, and formal design.
   a) Implementation defects and operational mistakes are faults caused by human errors [8]. They are related to the process of deployment or node installation.
   b) Co-programming is another form of human intervention issue [8]. This case involves a co-working environment where parts of the code are uploaded without a proper upload process that would avoid inconsistency and discontinuity.
   c) Formal design enables building a common data model for interacting with all the sensors despite their heterogeneity. Thus, a pattern that does not fit the sensor/actuator fields prevents the gateways/cloud from processing the obtained data.
 
5) Security issues: Security remains a major hurdle in IoT systems since data is growing fast and privacy is highly required. Security spans all the layers, making it one of the important aspects to be taken care of [3, 17, 18]. From a data point of view, security flaws can be regarded from two perspectives: data security and security of functioning.
   a) Data security groups data theft, data loss, and data privacy. The importance of self-awareness of issues compromising data has been documented [16]. Attackers are well versed in data attacks, using techniques such as man-in-the-middle attacks, phishing, and intrusion to get the information they need.
   b) Security of functioning concerns attacks that prevent nodes from working properly. Attackers can access nodes (i.e., Edge layer/cloud layer) through attack techniques (e.g., phishing, jamming, flooding) and thereby prevent the devices from working properly.
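The flat-signal form of the sensor reading issue (item 2a above) lends itself to a simple check. The following is a minimal sketch, not the authors' implementation: the function name `is_flat_signal`, the window size, and the tolerance are assumptions chosen for illustration.

```python
from typing import Sequence


def is_flat_signal(readings: Sequence[float], window: int = 5,
                   tolerance: float = 0.0) -> bool:
    """Return True when the last `window` readings are (nearly) identical.

    `window` and `tolerance` are illustrative parameters; suitable values
    depend on the sensor and its sampling rate.
    """
    if window < 2 or len(readings) < window:
        return False  # not enough samples to judge
    recent = readings[-window:]
    return max(recent) - min(recent) <= tolerance


if __name__ == "__main__":
    varying = [36.5, 36.6, 36.4, 36.7, 36.5]
    stuck = [36.5, 21.0, 21.0, 21.0, 21.0, 21.0]
    print(is_flat_signal(varying))  # readings still vary
    print(is_flat_signal(stuck))    # last five readings repeat one value
```

A legitimately stable quantity (e.g., room temperature) can also look flat, so in practice such a check would likely be combined with knowledge about the expected variability of each sensor.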
 
We present in the next section attempts that have been conducted to solve the aforementioned issues.

3 Existing Self-healing Solutions

Faults and failures are inevitable in computer-based systems, due to inhospitable environments, unattended deployment, or unforeseen situations. Researchers have therefore proposed numerous solutions to address these issues. Autonomic computing is the common approach used in IoT to detect and resolve issues autonomously. This model groups the capabilities of a self-healing system as follows: monitoring, analyzing, planning, execution, and building knowledge. Our review of IoT self-healing solutions follows:
  • Wallgren et al. [18] designed, implemented and evaluated a system called ‘SVELTE’, a lightweight intrusion detection system for the IoT. Its workflow is as follows. First, it acts as a listener to gather information about its surroundings. Then, an intrusion detection component analyzes the mapped data and detects intrusions. Finally, a firewall reduces node overload by filtering unwanted traffic.
  • Dai et al. [19] proposed a system based on acquisition, detection, and reaction. It is a self-protecting system based on feature recognition using virtual neurons. They proposed five self-protecting mechanisms, each equipped with an algorithm to sense, detect, and prevent possible attacks.
  • Mendonça et al. [3] proposed an architecture based on the MAPE-K [20] loop, which consists of a sensing phase present on all the nodes and a monitoring phase driven by an artificial neural network, a multi-layer perceptron (MLP). Information gathered by the latter phase is processed and analyzed to classify it as safe, danger, or inflammatory signals.
Apart from security, the hardware, software, and networking categories have also been studied. They include hardware failures, energy depletion, and communications disrupted by environmental conditions (e.g., wind, rain), as well as communication link failures between the gateway and the sensors.
  • G. Gupta et al. [4] proposed an architecture for the runtime recovery of sensors from clusters in which the gateway has experienced faults. It is a detection and recovery architecture centered around communicating gateways. It can detect when one of the gateways suffers a fault and reallocate the sensors to a new cluster where the gateway is functional.
  • Nguyen et al. [21] integrated MAPE-K into a self-healing system for cyber-physical systems in smart buildings and cities. They emphasize building services for monitoring and processing data and for planning and executing actions.
  • Angarita et al. [6, 22] proposed a self-healing system based on transactional web services. They worked on the recovery of web services upon failure, while Al-Dahoud et al. [15] emphasize fault detection, which is the first step toward achieving self-healing.
  • Another effective route to self-healing is IoT virtualization. It is easier to deal with services and software than with hardware, so the capacity of a virtualized system becomes easily manageable. For example, the concept of virtual objects has been invoked to overcome the battery consumption and high processing requirements that fall under the hardware and software categories [23, 24]. Virtual objects replace the physical objects of IoT and consequently provide many possibilities in terms of computation, storage, and recovery from a fault or a failure [25]. Indeed, services can be handled more flexibly than hardware and can be monitored in different ways, improving the capability of the system to recover from an unexpected state.
We present in the next section our AMI IoT case-based system and how the concept of self-healing of IoT is implemented.

4 Self-healing Approach for AMI IoT Architecture

The AMI-lab has been developing several IoT architectures over the past decade as infrastructure to provide adaptable services for aging in place [7]. Based on our experience in developing and deploying IoT solutions, we decided to incorporate a self-healing solution that is not limited to security issues but also targets hardware, software, human interaction, and networking issues.
The simplified presentation of the AMI-IoT platform, depicted in Fig. 2, includes four main layers (i.e., end-user environment, fog, network, and cloud), which build the chain of communication.
  • End-user environment: includes all the environmental components, such as sensors (e.g., physical nodes and sensors such as door, motion, oxygen, and bed-embedded sensors), actuators, and the nodes gathering the data. This component is the key point of the AMI-Platform, being the starting point of data gathering.
  • Fog layer: includes the gateways, which are tasked with modeling and preprocessing the data before it is transmitted to the cloud.
  • Network: includes all the components that link the end-user layer and the cloud layer and that are responsible for establishing secure communication.
  • Cloud layer: includes all the components that process and store data (i.e., cloud nodes, database, computational nodes). This layer embeds the virtualization concept by replacing IoT physical objects with virtual objects.
To implement self-healing in the AMI-Platform, we adopted the autonomic computing approach, which is based on five main phases (monitoring, analyzing, planning, execution, and knowledge). We hypothesized that three elements, Sensing Approach (SenS), Awareness of Issues (AwaR) and Responsibility & Actions (ReAct), are prerequisites to address the IoT self-healing concerns in the aforementioned layers. Additionally, the virtualization technique is used, and a set of self-healing agents/listeners is spread through the AMI-Platform (Fig. 3).

4.1 Sensing Approach (SenS)

The SenS element is the entry point of the architecture, and it is considered as the most important element. It ensures that the design and the goal of the architecture are both respected. If any unexpected event happens, it takes the responsibility to report the abnormality. The SenS can be regarded as the listening state of the AMI architecture. It is the foundation of the self-healing concept, enabling the power of monitoring and covering the behavior of the architecture.
The AMI-Platform is composed of a set of components, and SenS is responsible for monitoring the whole chain, from data detection (i.e., by sensors/actuators) to its migration from the environmental nodes (i.e., sensors to gateway) to the database (i.e., cloud nodes). To fully incorporate the SenS element, the cloud servers monitor all the components (i.e., from the end-user environment to the cloud architecture), but the components of the end-user environment should also monitor themselves. Knowledge about the components of the AMI-Platform must be gathered in order to sense them accurately and efficiently. This knowledge serves as a reference for the working state of these components. SenS operates in the AMI architecture as follows.
  • SenS in the end-user environment: It includes detection of battery level of all devices and sensors, data detection coming from nodes to the gateway, security level, network stability, connection to the cloud nodes, the gateway’s processor state, and the services’ state.
  • SenS in the middle components: SenS is directed at the servers establishing the connection between the peers from the user environment and the cloud environment, the level of security, the services, and the working state (on/off).
  • SenS in the cloud environment includes the security level, the services, data retrieval, data insertion in the database, the working state of the database, the network, the storage, and the connection to the cloud nodes.
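The listening role of SenS can be sketched as a set of probes that are polled to build a snapshot of component states. This is a minimal illustration under stated assumptions, not the AMI-Platform code: the function `sense` and the probe names (`battery_percent`, `mqtt_broker`) are hypothetical, and real probes would query the device, the services, or the network.

```python
import time
from typing import Callable, Dict


def sense(probes: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Run every probe once and return a timestamped snapshot.

    A probe that raises is recorded as None, so one broken probe
    does not hide the state of the other components.
    """
    snapshot: Dict[str, object] = {"timestamp": time.time()}
    for name, probe in probes.items():
        try:
            snapshot[name] = probe()
        except Exception:
            snapshot[name] = None
    return snapshot


if __name__ == "__main__":
    # Hypothetical probes returning fixed values for demonstration.
    demo_probes = {
        "battery_percent": lambda: 42,
        "mqtt_broker": lambda: "on",
    }
    print(sense(demo_probes))
```

Recording a failed probe as `None` rather than aborting keeps the snapshot usable; the missing value can itself be treated as a symptom by the awareness step.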

4.2 Awareness of Issues (AwaR)

The awareness element aims to make the system aware of abnormalities that can occur and put the overall system in a disabled state. A system cannot be aware of a failure state if the knowledge about the component is not accurate or well defined. Hence, while SenS is important for gathering information about all the components through monitoring, AwaR plays its role by providing a criterion (i.e., what constitutes an abnormality, based on the sensed information and the knowledge). AwaR relates to the layers of the AMI-Platform as follows.
  • AwaR in the End-user environment: The IoT infrastructure in the end-user environment is exposed to numerous issues that prevent the data from following its intended course. To face these challenges, AwaR relies on SenS to monitor the components and crosschecks the gathered information against the knowledge at hand. For example, battery level is one of the most prevalent issues in IoT. If the battery level falls below 50%, the battery is going to run out soon and there will be a discontinuity in the service. Consequently, a notification is sent indicating that the sampling rate (i.e., of the data acquired by the sensors) is too high or that the encoding consumes too much energy.
    Services used by the gateway are another kind of issue. A service state is either on or off. An “off-state” marks the service as broken and impacts the transmission to a degree that depends on the influence of the service. Listeners are therefore applied to the services that are vital to the system, and upon a failure, the system reports the issue, stating the disabled state of the affected component.
    Network state is another of the many issues affecting the IoT architecture. The network might be cut, or appear to be working while it is not, which results in discontinuity of the data (i.e., data stops being transferred from sensors to gateways or from gateways to cloud nodes). In this case, being aware means being able to report that something is wrong with the connectivity across the AMI-Platform.
    On the security level, a firewall is deployed on the end-user side (i.e., gateways). Listeners on these firewalls have been developed to check the availability of the rules, changes in those rules, intrusions into the system, communication with other peers, data anonymity, and data leakage. Additionally, human interaction is strictly limited to avoid any change in the operating environment.
  • AwaR in Cloud architecture: The cloud architecture represents the core of the AMI-Platform. It groups all the technologies and methods that provide a peer for each environmental node and store the data in the database. Thanks to the accumulated knowledge and the SenS component, listeners applied to services notify upon failure and point out the exact service/component that has an issue. Network issues are rare at this point, and battery issues are nearly nonexistent. On the security level, listeners are developed and applied to the firewall to notify at the slightest change in the firewall table and upon any intrusion.
    Regarding the security of the data in the database, any unauthorized access attempt is reported as soon as it is made. Since the peer (i.e., the cloud node getting the data from the gateway in the end-user environment) is virtualized, when the peer system encounters a high-level failure, the system should also report the situation promptly. Another critical point is the storage of the data. The AMI-Platform deploys listeners that notify when the storage capacity of the cloud servers reaches a certain point, so that actions can be taken before the servers are overloaded.
    Regarding human interaction, a pipeline has been deployed to notify all the programmers when code has been uploaded. The code is then validated to check its continuity with the overall code base. In this way, there are fewer co-programming issues and likewise fewer failures upon deployment.
  • AwaR in Network: Often called the weak link of the IoT architecture because of its public nature, the network can be subject to many of the issues mentioned in the previous subsection. Listeners have been improved and applied at the security level for the firewall table. Notifications are sent upon an addition, removal, or change in the table and upon intrusion detection in the system. The system also notifies when the link between the end-user node and its peer in the cloud is missing. Listeners are likewise applied to the services monitored by SenS, to make the system aware of the availability of the services.
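The crosscheck of sensed information against knowledge described in this subsection can be sketched as a small rule set. This is an illustrative sketch only: the 50% battery threshold comes from the example in the text, while the storage threshold, the service names, the snapshot keys, and the function `detect_issues` are assumptions.

```python
from typing import Dict, List

# Knowledge base. Only the 50% battery level comes from the text;
# the other values and names are hypothetical.
KNOWLEDGE = {
    "battery_min_percent": 50,
    "storage_max_percent": 80,
    "vital_services": ["data_collection", "mqtt_broker"],
}


def detect_issues(snapshot: Dict[str, object],
                  knowledge: Dict[str, object] = KNOWLEDGE) -> List[str]:
    """Crosscheck a sensed snapshot against the knowledge base."""
    issues: List[str] = []
    battery = snapshot.get("battery_percent")
    if battery is not None and battery < knowledge["battery_min_percent"]:
        issues.append("battery_low")
    services = snapshot.get("services", {})
    for name in knowledge["vital_services"]:
        if services.get(name) != "on":  # any state other than "on" is broken
            issues.append(f"service_down:{name}")
    if snapshot.get("network_up") is False:
        issues.append("network_down")
    storage = snapshot.get("storage_used_percent")
    if storage is not None and storage >= knowledge["storage_max_percent"]:
        issues.append("storage_high")
    return issues


if __name__ == "__main__":
    sensed = {
        "battery_percent": 40,
        "services": {"data_collection": "on", "mqtt_broker": "off"},
        "network_up": True,
        "storage_used_percent": 10,
    }
    print(detect_issues(sensed))
```

Keeping the thresholds in a separate knowledge structure mirrors the point made above: awareness is only as good as the knowledge defined for each component.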

4.3 Responsibility and Actions (ReAct)

The ReAct element confers the “responsibility” feature on the AMI-Platform, which enables autonomy in decision-making (see the work of Angarita and Kelaidonis et al. [6, 23, 24]). A responsible component is achievable only because the SenS and AwaR elements exist. The more elaborate the knowledge and the more accurate the data, the more efficient the method for dealing with an unexpected event. Hence, the key role of this part is to make the system able to take actions depending on the outcome of a situation. Since “responsibility” is the way a component autonomously manages itself, we aim to spread the concept through all the layers of the AMI-Platform. Concrete examples of the ReAct implementation in the AMI-Platform follow.
  • ReAct in the End-user environment: Concerning the battery level, actions are taken dynamically on the sampling ratio or the encoding format of the data. As a result, battery consumption drops and battery life is maintained for a longer period. Monitored services can be restarted upon failure. Regarding the network state, the link can be reestablished after an idle time, and a notification is sent upon resolution. As for security, the AMI-lab deployed a set of agents responsible for reconfiguring the firewall table when a change is detected, logging out a user when an intrusion is detected, and reconfiguring the whole gateway environment to ensure the same level of security is maintained. Other agents are deployed for data detection and transmission and for communication with the peer in the cloud.
  • ReAct in Cloud architecture: For this part, agents are used in the same way at the security level, acting upon change detection in the firewall table, intrusion detection, and service failure. Regarding storage management, the deployed agents clean up and free space, then notify about what happened and which course of action was taken to deal with the ongoing issue.
  • ReAct in Network: Agents have been deployed mainly at the security level.
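The reaction step can be sketched as a dispatcher that maps each detected issue to a healing action and records the outcome. This is a hedged sketch, not the AMI-Platform agents: the function `react`, the issue labels, and the action callables are hypothetical, and a real action would restart a service, lower the sampling ratio, or reconfigure the firewall.

```python
from typing import Callable, Dict, List, Tuple


def react(issues: List[str],
          actions: Dict[str, Callable[[str], bool]]
          ) -> Tuple[int, int, List[str]]:
    """Apply one healing action per issue.

    Issues are strings of the form "kind" or "kind:target".
    Returns (successful_actions, failed_actions, notifications).
    """
    succeeded, failed = 0, 0
    notifications: List[str] = []
    for issue in issues:
        kind, _, target = issue.partition(":")
        action = actions.get(kind)
        if action is None:
            notifications.append(f"{issue}: no known action, operator needed")
            continue
        try:
            fixed = bool(action(target))
        except Exception:
            fixed = False  # an action that crashes counts as failed
        if fixed:
            succeeded += 1
            notifications.append(f"{issue}: resolved")
        else:
            failed += 1
            notifications.append(f"{issue}: action failed")
    return succeeded, failed, notifications


if __name__ == "__main__":
    # Hypothetical actions that only report success or failure.
    demo_actions = {
        "service_down": lambda service: True,   # e.g. restart the service
        "battery_low": lambda _target: False,   # e.g. lower the sampling ratio
    }
    print(react(["service_down:mqtt_broker", "battery_low"], demo_actions))
```

The two counters correspond in spirit to the numbers of successful and failed actions (NAS and NAF) used in the evaluation section, and the notifications reflect the statement that a notification is sent upon resolution.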

5 Testbed Implementation

To evaluate the reliability of the proposed model, in this section we first focus on the implementation of each component used, located at different layers of the system, then we present and discuss the obtained results.

5.1 System Architecture

A bed-embedded sensor, a motion sensor, and a smart IoT gateway were used in the IoT end-user layer, whereas an IoT virtual gateway and servers were used in the IoT cloud layer.
Bed-embedded sensor:
The sensor used in our system, a fiber optic mattress, monitors and collects ambient temperature and vital features such as heart rate, wake-up time, sleep time, total time of sleep, bed interruptions, and out-of-bed time.
Motion sensor:
The sensor used in our system is an event sensor that reacts to movement in a specific area and collects data upon detection.
Both sensors send the collected information to an IoT smart gateway, which processes it. The motion sensor is connected to the gateway through the Z-Wave protocol, a low-power wireless protocol, whereas the bed-embedded sensor sends information through a wired serial connection.
Smart IoT Gateway:
We used a Raspberry Pi 3 Model B with a 1.2 GHz quad-core ARM Cortex processor and 1 GB of RAM, permanently connected to an electrical power supply and located in the person's room. To allow interoperability of the heterogeneous sensors, the Raspberry Pi is equipped with several communication modules: a Z-Wave controller was used to communicate with the motion sensor, while a serial connection was used to interact with the bed-embedded sensor. The overall information is then processed and sent over the MQTT protocol. An MQTT broker has been set up on the Raspberry Pi to enable lightweight communication between two entities, i.e., one publishing and the other subscribing.
IoT Virtual Gateway:
It is a virtual smart gateway based on the Python programming language and virtualization technology. Through an MQTT client (paho-mqtt), it subscribes to the information published by the MQTT broker. The received information is processed and then stored in a database (Elasticsearch).
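The subscribe, process, and store flow of the virtual gateway can be sketched with a message callback. The sketch below is an assumption-laden illustration, not the authors' code: it only defines a callback with the `(client, userdata, msg)` signature that paho-mqtt uses for `on_message`, assumes JSON payloads, and takes a generic `store` callable in place of the Elasticsearch indexing step. The topic name and the payload fields are hypothetical.

```python
import json
from typing import Any, Callable, Dict


def make_on_message(store: Callable[[Dict[str, Any]], None]):
    """Build a callback with the (client, userdata, msg) signature.

    `store` stands in for the database write; in a deployment it could
    wrap an Elasticsearch indexing call (not shown here).
    """

    def on_message(client, userdata, msg) -> None:
        try:
            reading = json.loads(msg.payload.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return  # drop malformed payloads instead of crashing the subscriber
        if not isinstance(reading, dict):
            return
        # Keep the topic so the origin of the reading is stored with it.
        store({"topic": msg.topic, **reading})

    return on_message


if __name__ == "__main__":
    class FakeMessage:
        """Stand-in for an MQTT message, for a dry run without a broker."""
        topic = "home/bed"                       # hypothetical topic
        payload = b'{"heart_rate": 62}'          # hypothetical payload

    stored = []
    callback = make_on_message(stored.append)
    callback(None, None, FakeMessage())
    print(stored)
```

With paho-mqtt, such a callback would typically be assigned to the client's `on_message` attribute before subscribing; client construction and connection are left out because they depend on the library version and the broker setup.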
Servers:
The servers are built on a virtualization platform (vSphere). For the database, we used Elasticsearch coupled with Kibana to enable full-text search and to process the information in different ways.

5.2 Results and Analysis

The experiment was conducted over 4 days. We compared an architecture without self-healing components against an architecture with self-healing components. The analysis revealed several components to be important for the purpose of the architecture, mainly:
  • Internet connectivity: Ensures that data is sent properly and not lost, and keeps track of all the downtimes and uptimes.
  • Connectivity between the gateway and the virtual gateway: Ensures the connection between the physical nodes and the virtual nodes in the cloud, to provide a reliable data communication channel.
  • Services: They are responsible for collecting the data, modeling it, and pre-processing it before publishing it through MQTT.
  • CPU load: Ensures the gateways are not overloaded.
  • Storage: The Kubernetes system produces a lot of data which can, once it reaches a threshold, hinder the proper working of the system.
The amount of data expected to be gathered by the end of the 4th day is the Awaited Data (AWD). The acquired data is denoted ACD, the number of actions that were taken and fixed the issue is denoted NAS, and the number of actions that failed is denoted NAF. The following metrics, Data Coverage (DC), Failed Self-Healing Rate (FSHR), Successful Self-Healing Rate (SSHR), and Self-Healing Model Accuracy (SHMA), are used to measure the performance of our system.
  • Data Coverage (DC): The Acquired Data (ACD) over the Awaited Data (AWD). This metric measures how much of the expected data was successfully acquired within a timeframe.
  • Failed Self-Healing Rate (FSHR): The number of failed actions (NAF) over the total number of actions taken (NTA). It represents the cases the system misunderstood.
  • Successful Self-Healing Rate (SSHR): The number of successful actions (NAS) over the total number of actions taken (NTA). It represents the system's understanding, i.e., how many of the actions taken actually fixed an issue.
  • Self-Healing Model Accuracy (SHMA): The measure of the overall self-healing component, which determines the accuracy of its design.
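The ratio metrics follow directly from their definitions: DC = ACD / AWD, FSHR = NAF / NTA, and SSHR = NAS / NTA. As a worked check, the short sketch below recomputes them from the counts reported in Tables 1 and 2. The helper `ratio` and the variable names are illustrative; SHMA is not recomputed because the text does not give a formula for it.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, rejecting an empty denominator."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


# Counts as reported in Tables 1 and 2.
AWD = 1_774_080          # awaited data (hits)
ACD_BEFORE = 73_920      # acquired data without self-healing
ACD_AFTER = 1_331_057    # acquired data with self-healing
NAF, NAS, NTA = 1, 91, 92

dc_before = ratio(ACD_BEFORE, AWD)
dc_after = ratio(ACD_AFTER, AWD)
fshr = ratio(NAF, NTA)
sshr = ratio(NAS, NTA)

if __name__ == "__main__":
    print(f"DC before: {dc_before:.3f}")  # 0.042
    print(f"DC after:  {dc_after:.3f}")   # 0.750
    print(f"FSHR:      {fshr:.3f}")       # 0.011
    print(f"SSHR:      {sshr:.3f}")       # 0.989
```

Since NAF + NAS = NTA in the reported counts, FSHR and SSHR sum to 1.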
Table 1. Metrics before self-healing.
Metrics   Hits
ACD       73920
AWD       1774080
DC        0.04 (4%)
Table 2. Metrics after self-healing.
Metrics   Hits
ACD       1331057
AWD       1774080
DC        0.75 (75%)
NAF       1
NTA       92
FSHR      0.010 (1%)
NAS       91
SSHR      0.98 (98%)
SHMA      0.99 (99%)
Several values were recorded over the four days. For the system without the self-healing components, the awaited data was 1774080 hits whereas the acquired data was 73920 hits, giving a DC of 4.2%. For the system coupled with the self-healing model, the awaited data was again 1774080 hits and the acquired data 1331057 hits, producing a DC of 75%. From these results, we observe that implementing a self-healing model makes the system more reliable. The results are summarized in Tables 1 and 2.

6 Discussion

Apart from the solutions described in Sect. 3, we argue that self-healing in IoT can benefit from virtualization and test coverage techniques. To start with, applications and firmware that are developed to incorporate unit tests and log functions (i.e., test coverage) make the system more observable and make the sensing phase more complete. Additionally, developers who feed IoT applications, from the start (at the development and deployment phases), with the issues that can arise and the way to solve them reinforce the self-healing concept. Hence, the system is already prepared to sense the environment (SenS), detect the identified issue (AwaR) and react accordingly (ReAct). Furthermore, more information and issues should be gathered from real-world IoT applications and experiments to make the system efficiently aware. Consequently, the more knowledge is gathered, the more efficient the self-healing system becomes.
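The sense, detect and react cycle described above can be sketched as a small loop over a knowledge base of known issues supplied by developers. This is only an illustrative sketch under stated assumptions, not AMI-Platform code: the names `KnownIssue`, `SelfHealingAgent` and `restart_service`, as well as the dictionary used to represent a device, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Readings = Dict[str, bool]


@dataclass
class KnownIssue:
    """One developer-supplied entry of the (hypothetical) knowledge base."""
    name: str
    detect: Callable[[Readings], bool]  # True if the issue is present
    heal: Callable[[Readings], bool]    # True if the action fixed the issue


class SelfHealingAgent:
    def __init__(self, knowledge: List[KnownIssue]) -> None:
        self.knowledge = list(knowledge)
        self.nas = 0  # successful actions
        self.naf = 0  # failed actions

    def sense(self, device: Readings) -> Readings:
        # SenS: take a snapshot of the device state.
        return dict(device)

    def aware(self, readings: Readings) -> List[KnownIssue]:
        # AwaR: match the snapshot against the known issues.
        return [issue for issue in self.knowledge if issue.detect(readings)]

    def react(self, device: Readings, issues: List[KnownIssue]) -> None:
        # ReAct: apply the registered action and count the outcome.
        for issue in issues:
            if issue.heal(device):
                self.nas += 1
            else:
                self.naf += 1

    def step(self, device: Readings) -> List[str]:
        issues = self.aware(self.sense(device))
        self.react(device, issues)
        return [issue.name for issue in issues]


def restart_service(device: Readings) -> bool:
    # Stand-in for a real recovery action.
    device["service_up"] = True
    return device["service_up"]


agent = SelfHealingAgent([
    KnownIssue("service_down", lambda r: not r["service_up"], restart_service),
])
device = {"service_up": False, "network_up": True}
detected = agent.step(device)
print(detected, device, agent.nas, agent.naf)
```

Counting outcomes inside `react` is a design choice made here so that the same loop yields the NAS and NAF counts used by the evaluation metrics. A real agent would also need issues that are not yet in the knowledge base to be reported rather than silently ignored.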
Bringing the virtualization concept into the process, the AMI-Platform emphasizes replacing physical objects with virtual objects. The platform thus handles the recovery process more flexibly and embeds the concept of microservices. Applications and services are made modular to enable significant adaptability and scalability of the platform, offering numerous possibilities (Kelaidonis et al. [26]).

7 Conclusion

Failures are bound to happen in today's IoT solutions. Thus, we focus in this paper on an approach to address these issues. We start by reviewing the progress of the research works related to IoT issues and classify them into five categories (i.e., Networking, Software, Hardware, Human interactions and Security issues). Moreover, we analyze the solutions proposed by researchers to mitigate such issues. We found that most of the efforts focus on only one issue; none takes a comprehensive approach that addresses all five.
We reflected in this paper on our approach, which covers most of these issues. It uses the concept of self-healing agents that act in listener, detector and healer modes. The listener mode allows the agents to sense the environment in the main layers of the IoT architecture (i.e., Device, Fog and Cloud layers). This mode enables the collection of data from vital components (e.g., network, security, services). Using the knowledge and the sensed information, the detector mode crosschecks the information to accurately detect an abnormality in the system (e.g., network down, services down, security lacking). Finally, the healer mode defines the proper course of action to revert the system to a normal state.
Moreover, instead of a centralized monitoring/self-healing system, we opted for an agent-based distributed self-healing system, which focuses on spreading the agents through all the layers. This system provides an autonomous way to recover from a failure right after an unexpected situation occurs.
We also discussed in this paper the prerequisites (i.e., knowledge, listeners, detectors, healers, virtualization and test coverage techniques) for deploying a scalable and resilient IoT infrastructure that ensures a self-healing mode. In our future work, we target a more efficient IoT system with Quality of Service (QoS) integration by developing advanced monitoring approaches and fault-tolerant models based on data analytics and reasoning algorithms over the large amount of data collected.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Literature
3. de Almeida, F.M., Ribeiro, A., Moreno, E.D.: An architecture for self-healing in internet of things. In: Ubicomm (2015)
4. Gupta, G., Younis, M.: Fault-tolerant clustering of wireless sensor networks. IEEE Wirel. Commun. Networking Conf. WCNC 3, 1579–1584 (2003)
7. Abdulrazak, B., Paul, S., Maraoui, S., Rezaei, A., Xiao, T.: IoT Architecture with Plug and Play for fast deployment and system reliability: AMI Platform. In: International Conference On Smart Living and Public Health (2022)
8. Raghunath, K.K.: Investigation of faults, errors and failures in wireless sensor network: a systematical survey. Int. J. Adv. Comput. Res. 3(12), 3 (2013)
9. Koopman, P.: Elements of the Self-Healing System Problem Space. System (2003)
11. Porter, B., Taïani, T., Coulson, G.: Generalised repair for overlay networks (2006)
12. Lazarescu, M.T.: Design of a WSN platform for long-term environmental monitoring for IoT applications. IEEE J. Emerg. Sel. Top. Circuits Syst. 3(1), 45–54 (2013)
14. Begum, R., Shaikh, Syed, A.: Sensor node failure detection in wireless sensor network: a survey (2018)
16. Sadek, I., Rehman, S.U., Codjo, J., Abdulrazak, B.: Privacy and security of IoT based healthcare systems: concerns, solutions, and recommendations. In: Pagán, J., Mokhtari, M., Aloulou, H., Abdulrazak, B., Cabrera, M.F. (eds.) ICOST 2019. LNCS, vol. 11862, pp. 3–17. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32785-9_1
22. Angarita, R., Rukoz, M., Cardinale, Y.: Modeling dynamic recovery strategy for composite web services execution. World Wide Web 19(1), 89–109 (2015)
26.
Metadata
Title
Self-healing Approach for IoT Architecture: AMI Platform
Authors
Bessam Abdulrazak
Josué Ayi Codjo
Suvrojoti Paul
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-09593-1_1
