Architecture for hybrid modelling and its application to diagnosis and prognosis with missing data
Introduction
Information and communication technologies (ICTs) have made considerable progress in sensing, data storage, data mining, simulation capabilities and online computing. When these ICTs are integrated with physical assets, the result is a cyber-physical system. The internet of things and big data have transformed the connectivity between the elements of an enterprise, leading to the fourth stage of industrialisation, known as Industry 4.0 [1], characterised by smart assets that dynamically interact with each other and with all levels of the business model.
The self-awareness and self-capabilities of cyber-physical systems have an impact on maintenance. Since its first application in the middle of the 20th century to the automotive, military and aerospace industries, condition based maintenance (CBM) has proved to be an efficient maintenance policy from economic and safety points of view. It has many advantages over the two traditional approaches, corrective and scheduled maintenance, including the ability to anticipate fatal failure thanks to updated information on the state of assets. This results in better maintenance planning: tasks are performed only when strictly necessary, the risk of reaching a faulty state is reduced, and the use of the asset is maximised. Thus, despite the need for an initial investment when first applying this maintenance approach, it reduces the cost of maintenance [2].
CBM, combined with proper decision support systems, leads to a maximisation of resources and increased productivity and, therefore, to business efficiency. Taking this to the next level of analysis and opening the door to 21st century realities, Lee et al. [3] propose a five-level structure (named 5C architecture) for the development of cyber-physical systems within a manufacturing environment. These five levels, shown in Fig. 1, have the following functions: acquiring and storing data using a sensor network; transforming the acquired data into valuable information; connecting the assets to get more knowledge about the individuals using the network information; developing proper human-machine interfaces; and allowing the assets themselves to make decisions about their operation. With this intelligent equipment, the maintenance processes can be automated in such a way as to optimise resources. These tasks require a distributed system framework, as centralising the information decreases the performance [4].
The application of CBM poses a number of challenges: defining the problem for which the maintenance policy is to be applied, identifying the application level, measuring performance, selecting the method to apply diagnosis and prognosis, defining the sensing strategy, defining the monitoring strategy, allocating time and resources to conduct experiments, selecting the solution, and, finally, analysing the costs and benefits [5]. This paper focuses on one of these challenges: selecting the methods for carrying out diagnosis and prognosis. The objective of the diagnosis process is to examine symptoms and syndromes to determine the nature of faults or failures (kind, situation, extent), whereas prognosis deals with the analysis of the symptoms of faults to predict future condition and residual life within design parameters [6].
A new trend in modelling combines two traditional methods: physics-based modelling and data-driven modelling [7]. Physics-based modelling uses first principles to construct a set of ordinary or partial differential equations representing the dynamics of the system in given conditions, even those difficult to achieve by testing. Simulations of different component faults can be done without any cost associated with damage seeding [8]. In contrast, data-driven modelling constructs a set of equations without any knowledge of the system, simply by relating the inputs to a set of outputs by means of a learning process using a large amount of data obtained from the monitored asset. The combination of these modelling approaches, known as hybrid modelling, exploits the advantages of each approach. Its ability to fuse data from different sources could be extremely helpful to maintenance decision making and production scheduling [9].
The new technologies mentioned at the beginning can be used to good effect in hybrid modelling. For one thing, the connectivity of cyber-physical assets allows them to share data and receive information about the required tasks. As a result, they become self-aware and can self-actuate. For another, the computational cost related to the use of physics-based modelling for virtual commissioning is reduced by the use of supercomputers in a cloud computing framework [10]. Moreover, big data capabilities lead to the proper management of the high volume of data stored over the life of the machines of an industry and to improved data mining. In short, hybrid modelling can be used to create smart assets, thereby facilitating and improving CBM.
Despite its obvious promise, little published work discusses hybrid modelling. Matei et al. [11] tackle the diagnosis process: they propose a hybrid framework for a railway switch, obtaining accurate results in detection but succeeding only partially in fault identification. Medjaher and Zerhouni [12] present a two-phase methodology for hybrid prognosis. The first phase develops a physics-based model in both healthy and damaged conditions; the second phase computes the residuals by comparing the measurements with the simulation results. These residuals are indicative of the state of the monitored asset; its remaining useful life (RUL) can be computed by comparing the residuals with a predefined performance level. Ghaboussi et al. [13] propose a framework called hybrid mathematical informational modelling (HMIM), in which a neural network is used to analyse the differences between the measurements and the mathematical model’s response. They apply the HMIM framework to a beam-to-column connection and conclude that the hybrid approach is capable of representing the issues the mathematical model cannot capture by itself. In contrast, Didona and Romano [14] propose a data fusion framework for measurements and synthetic data generated by a physics-based model and study its implementation in computer systems. Other authors present a data fusion strategy for the prognosis of rolling element bearings (REBs) in flight operating conditions [15] and the degradation of a battery [16]. Data fusion can also relate continuous data with categorical data; the latter type is very common in the real world. Working in this area, Otey et al. [4] suggest a model for outlier and anomaly detection. Although some authors propose frameworks for hybrid modelling [11], [12], [13], there are no clear architectures for this purpose in the research literature.
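The residual-based prognosis idea attributed above to Medjaher and Zerhouni [12] can be illustrated with a minimal Python sketch. It is not their implementation: the straight-line extrapolation of the residual trend, the function names (residuals, estimate_rul) and the threshold are assumptions introduced here purely for illustration.

```python
from typing import List, Optional, Sequence


def residuals(measured: Sequence[float], simulated: Sequence[float]) -> List[float]:
    """Absolute difference between measurements and the healthy-model response."""
    if len(measured) != len(simulated):
        raise ValueError("measured and simulated must have the same length")
    return [abs(m - s) for m, s in zip(measured, simulated)]


def estimate_rul(times: Sequence[float], res: Sequence[float],
                 threshold: float) -> Optional[float]:
    """Fit a least-squares line to the residuals (assumed linear growth) and
    return the time left until it reaches `threshold`.

    Returns None when the residuals show no upward trend.
    """
    n = len(times)
    if n < 2 or n != len(res):
        raise ValueError("need at least two (time, residual) pairs of equal length")
    t_mean = sum(times) / n
    r_mean = sum(res) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time stamps must not all be identical")
    sxy = sum((t - t_mean) * (r - r_mean) for t, r in zip(times, res))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no measurable degradation trend
    intercept = r_mean - slope * t_mean
    t_cross = (threshold - intercept) / slope
    # Remaining time measured from the last observation, never negative.
    return max(0.0, t_cross - times[-1])


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0]
    res = residuals([1.0, 1.1, 1.2, 1.3], [1.0, 1.0, 1.0, 1.0])
    print(estimate_rul(t, res, threshold=1.0))
```

In practice the degradation trend is rarely linear, so the line fit would be replaced by whatever degradation law suits the asset; the sketch only shows how a residual series and a threshold combine into an RUL figure.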
There are two important things to consider when implementing a CBM strategy for an asset: how to deal with missing data on its reliability evolution and the role of contextual information in its operation.
Maintenance data are formed by pieces of information very different in nature. Data can be acquired from sensors placed at different points of the asset while the operators produce information, either handwritten or digitalised, including work orders, maintenance reports, information about stocks and maintenance planning, among others. These latter records often have poor quality because of inappropriate reporting equipment, missing or lost information during data migration from paper documents to digital sources, unrecorded events, etc. [17].
Another common scenario is a lack of data when assets cannot be operated to their maintenance limit. This situation occurs in many industries, such as the transport, energy or chemistry sectors, in which safety is more important than other factors of efficiency and reliability. Only those elements of low criticality are operated until failure; all components and subsystems affecting the safety of the systems are replaced in early stages of degradation, even far from the maintenance limit, because of strong regulatory conditions. A lack of data also occurs with early replacements of components as a result of opportunistic maintenance [9]. Overprotection and excessive maintenance tasks lead to a situation in which maintainers have little historical information about the behaviour of the assets – a handicap when trying to estimate their future response.
Other causes of missing or incomplete data are sensor failure, communication failure and storage size restrictions. Thus, there is interest in preparing the models in advance to overcome these limitations. Some approaches based on data-driven modelling exist in the literature, in which authors use artificial neural networks to improve diagnosis in systems such as wind turbines or cutting machines [18], [19].
It should be highlighted that the aforementioned assets are considered to be systems of systems, characterised by nonlinear structures, with a large scale spatial scope, dynamic and responsive behaviour, and going beyond a single scientific discipline [20]. This implies complexity when implementing a maintenance approach as the interaction between components and subsystems means it is difficult to obtain a complete fault catalogue corresponding to all the individual systems and to the different combinations when they work together and produce new faults.
To summarise, there is a lack of data on the operation of these assets. The amount of data available for maintenance planning is schematically represented in Fig. 2. As the figure shows, few components can be operated until failure (i.e., minimum criticality) or have no degradation. Thus, in the majority of cases, data are obtained until intermediate points are reached between the operating start points and the maintenance threshold. These data points are called suspensions.
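To make the role of suspensions concrete, the sketch below shows one standard way of using them instead of discarding them: the maximum-likelihood estimate of a constant failure rate from right-censored records, which divides the number of observed failures by the total operating time of all units. The exponential lifetime assumption, the function name and the numbers are illustrative assumptions and are not taken from this paper.

```python
from typing import Sequence, Tuple


def exponential_rate_mle(records: Sequence[Tuple[float, bool]]) -> float:
    """Failure-rate estimate for exponentially distributed lifetimes.

    Each record is (operating_time, failed); failed=False marks a suspension,
    i.e. a unit removed before it reached failure.
    """
    total_time = sum(t for t, _ in records)
    failures = sum(1 for _, failed in records if failed)
    if total_time <= 0:
        raise ValueError("total operating time must be positive")
    if failures == 0:
        raise ValueError("at least one observed failure is required")
    # Suspensions add operating time to the denominator but no failure count.
    return failures / total_time


if __name__ == "__main__":
    # One unit run to failure, three replaced early (hypothetical hours).
    fleet = [(1000.0, True), (800.0, False), (1200.0, False), (500.0, False)]
    rate = exponential_rate_mle(fleet)
    print(f"estimated mean life: {1.0 / rate:.0f} h")
```

With these hypothetical figures the estimated mean life is 3500 h, whereas using only the single unit that actually failed would give 1000 h. This illustrates why the suspended units carry information even though they never reached the maintenance threshold.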
Given the lack of data, synthetic data generated by physics-based models describing the operation of the assets become essential. Common operating conditions can be simulated, as can those that are difficult to reproduce in real operation, such as extreme operating conditions or damage situations that cannot be seeded in the system because of safety, economic or environmental reasons. Fusing these synthetic data with data acquired from sensors placed in the assets, that is, combining physics-based modelling techniques with purely data-driven methods, results in a hybrid modelling approach.
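As a rough illustration of such a fusion, the Python sketch below pools labelled synthetic samples (including a damage state assumed never to have been observed in service) with measured samples, and uses the pooled labels to classify an unlabelled measurement. The Sample structure, the two-feature vectors and the nearest-centroid rule are assumptions chosen for brevity; they are not the method used in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Sample:
    features: Sequence[float]
    label: Optional[str]  # None marks an unlabelled measurement
    source: str           # "synthetic" or "measured" (provenance is kept)


def fuse(synthetic: Sequence[Sample], measured: Sequence[Sample]) -> List[Sample]:
    """Pool both data sources into one data set."""
    return list(synthetic) + list(measured)


def centroids(samples: Sequence[Sample]) -> Dict[str, List[float]]:
    """Mean feature vector per label, computed from labelled samples only."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for s in samples:
        if s.label is None:
            continue
        acc = sums.setdefault(s.label, [0.0] * len(s.features))
        for i, x in enumerate(s.features):
            acc[i] += x
        counts[s.label] = counts.get(s.label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def nearest_label(features: Sequence[float], cents: Dict[str, List[float]]) -> str:
    """Label of the centroid closest (squared Euclidean distance) to `features`."""
    return min(cents, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(features, cents[lab])))


if __name__ == "__main__":
    synthetic = [Sample([0.1, 0.2], "healthy", "synthetic"),
                 Sample([0.9, 1.0], "damaged", "synthetic"),
                 Sample([1.0, 0.9], "damaged", "synthetic")]
    measured = [Sample([0.2, 0.1], "healthy", "measured"),
                Sample([0.85, 0.95], None, "measured")]
    pooled = fuse(synthetic, measured)
    print(nearest_label(measured[1].features, centroids(pooled)))
```

Here the "damaged" class is known only through simulation, yet the unlabelled measurement can still be assigned to it; keeping the source field allows the two data origins to be weighted or audited separately later.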
When combining the modelling strategies and fusing data to improve maintenance performance, the concept of context is a key to efficient diagnosis and prognosis [21]. Context is defined as “any information that can be used to characterise the situation of entities (i.e., whether a person, place, or object) that are considered relevant to the interaction between a user and an application, including the user and the application themselves” [22]. This definition, applied to the field of maintenance, means having the appropriate information on the conditions of the operation of an asset. The context includes information such as working temperature, humidity, applied loads, operating speed and information about any other system with which the asset interacts, among others. The context has a great influence on the behaviour of physical assets and should not be omitted. Sensors must be added to obtain this information and provide context-driven services, as shown in Fig. 3.
This paper proposes a framework for hybrid modelling that combines the physics-based and data-driven modelling approaches and uses both data fusion and context awareness. It suggests a two-level architecture. The first level is responsible for analysing the condition monitoring (CM) data acquired from an asset for early failure detection. The second level carries out the virtual emulation of the behaviour of the asset and performs deeper analysis using data fusion. The architecture is validated by applying it to a rotating machine; more specifically, the response of a gearbox’s REBs is monitored.
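The division of labour between the two levels can be sketched in a few lines of Python. The sketch makes several assumptions that the text above does not state: the first level uses the RMS of a vibration signal as its indicator, a fixed limit raises the alarm, and the second level is invoked only after an alarm. The function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def rms(signal: Sequence[float]) -> float:
    """Root mean square of a signal, used here as a simple CM indicator."""
    if not signal:
        raise ValueError("signal must not be empty")
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def level_one_alarm(signal: Sequence[float], rms_limit: float) -> bool:
    """First level: lightweight check on CM data for early failure detection."""
    return rms(signal) > rms_limit


def monitor(signal: Sequence[float], rms_limit: float,
            level_two: Callable[[Sequence[float]], str]) -> str:
    """Escalate to the second level only when the first level raises an alarm."""
    if level_one_alarm(signal, rms_limit):
        # Second level: placeholder for virtual emulation and data fusion.
        return level_two(signal)
    return "no anomaly detected"


if __name__ == "__main__":
    def deeper_analysis(sig: Sequence[float]) -> str:
        return "deeper analysis requested"

    print(monitor([0.1, -0.1, 0.1, -0.1], 0.5, deeper_analysis))
    print(monitor([1.0, -1.0, 1.0, -1.0], 0.5, deeper_analysis))
```

Passing the second level in as a callable keeps the inexpensive first-level check independent of the heavier model-based analysis, which could then run elsewhere, for instance on remote computing resources.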
The paper has the following structure. Section 2 proposes the architecture for hybrid modelling for CBM; Section 3 validates the architecture by applying it to a rotating machine; finally, Section 4 offers concluding remarks.
Proposed architecture for hybrid modelling
The proposed hybrid modelling architecture seeks to perform diagnosis and prognosis to provide the information required to optimise operation and maintenance based on the RUL, assuring the reliability and safety of the monitored asset. The workflow of this architecture is depicted in Fig. 4. In this scenario, a physical asset is given the technology necessary for it to acquire smart capabilities. Meeting this goal requires the use of complementary processes and tools.
The architecture is based
Validation of the architecture
This section explains the validation of the proposed architecture, specifically, the hybrid methodology of data fusion and the combination of modelling techniques. The section begins by describing the physical asset used for the validation process. It goes on to explain the physics-based model developed to represent the behaviour of the asset. Next, it introduces the data-driven modelling and the data generation process. It concludes by presenting and discussing the results.
Conclusions
Optimising both maintenance resources and maintenance costs is a key concern of maintainers. In recent decades, CBM has proved to be a useful tool for achieving these goals. New technologies involving big data, cloud computing and virtual commissioning are now being used in Industry 4.0 to strengthen maintenance. Hybrid modelling is still in its infancy, but it has great potential to improve diagnosis and prognosis and, consequently, to optimise maintenance.
This paper proposes an
Acknowledgements
This study is partially funded by the Ministry of Economy and Competitiveness of the Spanish Government under the Retos-Colaboración Program (LEMA project, RTC-2014-1768-4). Any opinions, findings and conclusions expressed in this article are those of the authors and do not necessarily reflect the views of funding agencies. The authors would also like to thank Fundación de Centros Tecnológicos – Iñaki Goenaga.
References (51)
- et al. Industrial technologies and applications for the Internet of Things. Comput. Netw. (2016)
- et al. A cyber-physical systems architecture for industry 4.0-based manufacturing systems. Manufact. Lett. (2015)
- et al. The enhancement of fault detection and diagnosis in rolling element bearings using minimum entropy deconvolution combined with spectral kurtosis. Mech. Syst. Signal Process. (2007)
- et al. Virtual machine monitoring in cloud computing. Proc. Comput. Sci. (2016)
- et al. A hybrid framework combining data-driven and model-based methods for system remaining useful life prediction. Appl. Soft Comput. (2016)
- et al. Statistical inference for imperfect maintenance models with missing data. Reliab. Eng. Syst. Saf. (2016)
- System of systems dependability – theoretical models and applications examples. Reliab. Eng. Syst. Saf. (2016)
- et al. Context awareness for maintenance decision making: a diagnosis and prognosis approach. Measurement (2015)
- et al. A summary of fault modelling and predictive health monitoring of rolling element bearings. Mech. Syst. Signal Process. (2015)
- et al. Rolling element bearing diagnostics – a tutorial. Mech. Syst. Signal Process. (2011)
- Unsupervised noise cancellation for vibration signals: part II – a novel frequency-domain algorithm. Mech. Syst. Signal Process.
- Enhancement of autoregressive model based gear tooth fault detection technique by the use of minimum entropy deconvolution filter. Mech. Syst. Signal Process.
- The spectral kurtosis: a useful tool for characterising non-stationary signals. Mech. Syst. Signal Process.
- Fast computation of the kurtogram for the detection of transient faults. Mech. Syst. Signal Process.
- Semi-supervised feature selection based on local discriminative information. Neurocomputing
- Algorithms of fuzzy clustering with partial supervision. Pattern Recogn. Lett.
- Condition based maintenance: a survey. J. Quality Maintenance Eng.
- Fast distributed outlier detection in mixed-attribute data sets. Data Mining Knowl. Discovery
- A systematic approach for predictive maintenance service design: methodology and applications. Int. J. Internet Manufact. Services
- Condition Monitoring and Diagnostics of Machines – Vocabulary
- Remaining useful life estimation: review. Int. J. Syst. Assurance Eng. Manage.
- Modelización híbrida para el diagnóstico y pronóstico de fallos en el sector del transporte: Datos adquiridos y datos sintéticos. Dyna
- The case for a hybrid approach to diagnosis: a railway switch
- Framework for a hybrid prognostics. Chem. Eng. Trans.
- Hybrid modelling framework by using mathematics-based and information-based methods. IOP Conf. Ser.: Mater. Sci. Eng.