Architecture for hybrid modelling and its application to diagnosis and prognosis with missing data

doi:10.1016/j.measurement.2017.02.003

Measurement

Volume 108, October 2017, Pages 152-162

https://doi.org/10.1016/j.measurement.2017.02.003

Highlights

• An architecture for hybrid modelling is proposed oriented to diagnosis and prognosis.
• Physics-based and data-driven modelling are combined to deal with missing data.
• Context-driven services are identified as the key to increase the results’ accuracy.
• A test case is presented for rotating machinery, determining the state of a bearing.
• A multi-body model and a semi-supervised learning algorithm are used in this work.

Abstract

Advances in technologies such as the internet of things, cloud computing and big data open a new perspective on the calculation of reliability, maintainability, availability and safety by combining physics-based modelling with data-driven modelling. This paper proposes an architecture to implement hybrid modelling based on the fusion of real data and synthetic data obtained in simulations using a physics-based model. This architecture has two levels of analysis: an online process carried out locally and virtual commissioning performed in the cloud. The former performs failure detection to anticipate upcoming failures, whereas the latter supports both diagnosis and prognosis. The proposed hybrid modelling architecture is validated in the field of rotating machinery using time-domain and frequency-domain analysis. A multi-body model and a semi-supervised learning algorithm are used to perform the hybrid modelling. The state of a rolling element bearing is analysed, and accurate results for fault detection, localisation and quantification are obtained. Contextual information increases the accuracy of the results, and the results obtained by the model can help improve maintenance decision making and production scheduling. Future work includes a prescriptive analysis approach.

Introduction

Information and communication technologies (ICTs) have made considerable progress in sensing, data storage, data mining, simulation capabilities and online computing. When these ICTs are integrated with physical assets, the result is a cyber-physical system. The internet of things and big data have transformed the connectivity between the elements of an enterprise, leading to the fourth stage of industrialisation, known as Industry 4.0 [1], characterised by smart assets that dynamically interact with each other and with all levels of the business model.

The self-awareness and self-capabilities of cyber-physical systems have an impact on maintenance. Since its first application in the middle of the 20th century in the automotive, military and aerospace industries, condition based maintenance (CBM) has proved to be an efficient maintenance policy from both economic and safety points of view. It has many advantages over the two traditional approaches, corrective and scheduled maintenance, including the ability to anticipate catastrophic failures thanks to up-to-date information on the state of assets. This results in better maintenance planning: tasks are performed only when strictly necessary, the risk of reaching a faulty state is reduced, and the use of the asset is maximised. Thus, despite the initial investment required to apply this maintenance approach, it reduces the cost of maintenance [2].

CBM, combined with proper decision support systems, leads to better use of resources, increased productivity and, therefore, business efficiency. Taking this to the next level of analysis and opening the door to 21st century realities, Lee et al. [3] propose a five-level structure (named the 5C architecture) for the development of cyber-physical systems within a manufacturing environment. These five levels, shown in Fig. 1, have the following functions: acquiring and storing data using a sensor network; transforming the acquired data into valuable information; connecting the assets so that information shared over the network yields more knowledge about each individual asset; developing proper human-machine interfaces; and allowing the assets themselves to make decisions about their operation. With this intelligent equipment, maintenance processes can be automated in such a way as to optimise resources. These tasks require a distributed system framework, as centralising the information decreases performance [4].

The application of CBM poses a number of challenges: defining the problem to which the maintenance policy is to be applied, identifying the application level, measuring performance, selecting the method to apply diagnosis and prognosis, defining the sensing strategy, defining the monitoring strategy, allocating time and resources to conduct experiments, selecting the solution, and, finally, analysing the costs and benefits [5]. This paper focuses on the fourth of these challenges: selecting the methods for carrying out diagnosis and prognosis. The objective of diagnosis is to examine symptoms and syndromes to determine the nature of faults or failures (kind, situation, extent), whereas prognosis analyses the symptoms of faults to predict future condition and residual life within design parameters [6].

A new trend in modelling combines traditional physics-based modelling with data-driven modelling [7]. Physics-based modelling uses first principles to construct a set of ordinary or partial differential equations representing the dynamics of the system under given conditions, including conditions that are difficult to achieve by testing. Simulations of different component faults can be performed without any of the cost associated with damage seeding [8]. In contrast, data-driven modelling constructs a set of equations without any prior knowledge of the system's physics, simply by relating inputs to outputs through a learning process that uses a large amount of data obtained from the monitored asset. The combination of these approaches, known as hybrid modelling, exploits the advantages of each. Its ability to fuse data from different sources could be extremely helpful for maintenance decision making and production scheduling [9].
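To make the idea of first-principles simulation concrete, the following is a minimal sketch, not the authors' multi-body model: a hypothetical single-degree-of-freedom mass-spring-damper, m·x'' + c·x' + k·x = f(t), integrated with semi-implicit Euler. The parameter values, and the use of a stiffness reduction to stand in for damage, are illustrative assumptions; the point is only that a faulty condition can be simulated by changing a model parameter rather than by seeding damage in a real asset.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OscillatorParams:
    """Hypothetical 1-DOF model: m*x'' + c*x' + k*x = f(t)."""
    mass: float       # m, kg
    damping: float    # c, N*s/m
    stiffness: float  # k, N/m


def simulate(params: OscillatorParams, force: Callable[[float], float],
             dt: float, steps: int) -> list:
    """Integrate the equation of motion from rest with semi-implicit Euler.

    Returns the displacement history (m), one value per time step.
    """
    x, v = 0.0, 0.0
    history = []
    for i in range(steps):
        t = i * dt
        # Newton's second law gives the acceleration at the current state.
        a = (force(t) - params.damping * v - params.stiffness * x) / params.mass
        v += a * dt  # update velocity first...
        x += v * dt  # ...then position using the new velocity
        history.append(x)
    return history


def constant_load(t: float) -> float:
    """Hypothetical constant 10 N load."""
    return 10.0


# Healthy model versus a hypothetical "damaged" model with 20 % less stiffness.
healthy = simulate(OscillatorParams(1.0, 20.0, 1000.0), constant_load, 1e-4, 20_000)
damaged = simulate(OscillatorParams(1.0, 20.0, 800.0), constant_load, 1e-4, 20_000)
print(f"healthy: {healthy[-1]:.4f} m, damaged: {damaged[-1]:.4f} m")
```

Under the constant load both responses settle at the static deflection f/k (0.01 m and 0.0125 m), so the simulated "damage" appears as a measurable shift in the synthetic response.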

The technologies mentioned at the beginning of this section can be used to good effect in hybrid modelling. For one thing, the connectivity of cyber-physical assets allows them to share data and receive information about the required tasks; as a result, they become self-aware and can self-actuate. For another, the computational cost of using physics-based modelling for virtual commissioning is reduced by the use of supercomputers in a cloud computing framework [10]. Moreover, big data capabilities enable proper management of the large volume of data stored over the life of an industry's machines and improve data mining. In short, hybrid modelling can be used to create smart assets, thereby facilitating and improving CBM.

Despite its promise, little work has so far addressed hybrid modelling. Matei et al. [11] tackle the diagnosis process, proposing a hybrid framework for a railway switch that obtains accurate results in fault detection but succeeds only partially in fault identification. Medjaher and Zerhouni [12] present a two-phase methodology for hybrid prognosis. The first phase develops a physics-based model of the asset in both healthy and damaged conditions; the second phase computes the residuals between the measurements and the simulation results. These residuals indicate the state of the monitored asset, and its remaining useful life (RUL) can be computed by comparing the residuals with a predefined performance limit. Ghaboussi et al. [13] propose a framework called hybrid mathematical informational modelling (HMIM), in which a neural network analyses the differences between the measurements and the mathematical model’s response. They apply the HMIM framework to a beam-to-column connection and conclude that the hybrid approach can represent behaviour that the mathematical model cannot capture by itself. Didona and Romano [14], in contrast, propose a framework for fusing measurements with synthetic data generated by a physics-based model and study its implementation in computer systems. Other authors present data fusion strategies for the prognosis of rolling element bearings (REBs) in flight operating conditions [15] and of battery degradation [16]. Data fusion can also relate continuous data to categorical data, the latter being very common in the real world; working in this area, Otey et al. [4] propose a model for outlier and anomaly detection. Although some authors propose frameworks for hybrid modelling [11], [12], [13], the research literature offers no clear architecture for this purpose.
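The residual-based idea attributed above to Medjaher and Zerhouni [12] can be sketched as follows. This is an illustration under stated assumptions, not their method: the residual is taken as measurement minus simulation, its magnitude is assumed to grow linearly over time, and the RUL is the time until a least-squares trend line crosses a hypothetical threshold. The function names and numbers are invented for the example.

```python
import math


def residuals(measured, simulated):
    """Residual = measurement minus physics-based simulation, sample by sample."""
    if len(measured) != len(simulated):
        raise ValueError("measured and simulated series must have equal length")
    return [m - s for m, s in zip(measured, simulated)]


def estimate_rul(times, residual_values, threshold):
    """Extrapolate a least-squares line through |residual| to the threshold.

    Returns the time remaining after the last sample, math.inf if the
    residual shows no upward trend, or 0.0 if the line has already crossed.
    """
    n = len(times)
    if n < 2 or n != len(residual_values):
        raise ValueError("need at least two paired samples")
    mags = [abs(r) for r in residual_values]
    mean_t = sum(times) / n
    mean_r = sum(mags) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, mags))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_t
    if slope <= 0:
        return math.inf  # no degradation trend detected
    t_cross = (threshold - intercept) / slope
    return max(t_cross - times[-1], 0.0)


# Hypothetical example: the measurement drifts away from the healthy simulation.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
simulated = [1.0] * 5
measured = [1.0 + 0.1 * t for t in times]
rul = estimate_rul(times, residuals(measured, simulated), threshold=1.0)
print(f"estimated RUL: {rul:.2f} time units")
```

A linear trend is the simplest possible degradation assumption; in practice the extrapolation model would be chosen to match the failure mode.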

Two important issues must be considered when implementing a CBM strategy for an asset: how to deal with missing data on its reliability evolution, and the role of contextual information in its operation.

Maintenance data comprise pieces of information of very different nature. Data can be acquired from sensors placed at different points of the asset, while operators produce information, either handwritten or digitalised, including work orders, maintenance reports, and information about stocks and maintenance planning, among others. These latter records are often of poor quality because of inappropriate reporting equipment, information missing or lost during migration from paper documents to digital sources, unrecorded events, etc. [17].

Another common scenario is a lack of data when assets cannot be operated up to their maintenance limit. This situation occurs in many industries, such as the transport, energy and chemical sectors, in which safety takes precedence over efficiency and reliability. Only elements of low criticality are operated until failure; because of strict regulations, all components and subsystems affecting the safety of the system are replaced at early stages of degradation, even far from the maintenance limit. A lack of data also occurs when components are replaced early as a result of opportunistic maintenance [9]. Overprotection and excessive maintenance leave maintainers with little historical information about the behaviour of the assets, which is a handicap when trying to estimate their future response.

Missing or incomplete data can also result from sensor failure, communication failure and storage size restrictions. Models should therefore be prepared in advance to overcome these limitations. Some data-driven approaches in the literature address this problem; for example, artificial neural networks have been used to improve diagnosis in systems such as wind turbines and cutting machines [18], [19].

It should be highlighted that the aforementioned assets are considered systems of systems, characterised by nonlinear structures, a large spatial scale, dynamic and responsive behaviour, and a scope that goes beyond a single scientific discipline [20]. This makes implementing a maintenance approach complex: because of the interactions between components and subsystems, it is difficult to obtain a complete fault catalogue covering all the individual systems as well as the new faults that arise from the different combinations in which they work together.

To summarise, data on the operation of these assets are scarce. The amount of data available for maintenance planning is represented schematically in Fig. 2. As the figure shows, few components either can be operated until failure (i.e., have minimum criticality) or show no degradation. Thus, in most cases, data are available only up to intermediate points between the start of operation and the maintenance threshold. These data points are called suspensions.
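Suspensions are what reliability engineers call right-censored observations: the unit was still working when its record ended. One standard way of using them rather than discarding them is the Kaplan-Meier estimator, sketched below in plain Python. The choice of estimator and the operating times in the example are assumptions for illustration; the paper itself does not prescribe this method.

```python
def kaplan_meier(records):
    """Kaplan-Meier survival estimate from (time, failed) records.

    failed=True marks a failure; failed=False marks a suspension (the unit
    was removed or still running when its record ended). Returns a list of
    (failure_time, survival_probability) steps.
    """
    records = sorted(records)
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        failures = removed = 0
        # Group every record that ends at the same time t.
        while i < len(records) and records[i][0] == t:
            failures += records[i][1]
            removed += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        # Failures and suspensions both leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical operating hours: two failures and three suspensions.
history = [(100, True), (150, False), (200, False), (250, True), (300, False)]
print(kaplan_meier(history))
print(kaplan_meier([r for r in history if r[1]]))  # suspensions discarded
```

With the suspensions included, the estimated survival drops to 0.8 at 100 h and 0.4 at 250 h. Treating only the two failures as the whole population would instead drive the estimate to 0.5 and then 0, understating how long units actually survive.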

Given this lack of data, synthetic data generated by physics-based models describing the operation of the assets become essential. Scenarios involving common operating conditions can be simulated, as can those difficult to reproduce in real operation, such as extreme operating conditions or damage that cannot be seeded into the system for safety, economic or environmental reasons. Fusing synthetic data with data acquired from sensors placed on the assets, and thereby combining physics-based modelling techniques with purely data-driven methods, results in a hybrid modelling approach.
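One simple way to picture this fusion is semi-supervised clustering in which labelled synthetic feature vectors (one group per simulated condition) seed the clusters and unlabelled real measurements are then assigned to, and used to refine, those clusters. The sketch below is a seeded k-means style procedure written for illustration; it is not the semi-supervised algorithm used in the paper, and the condition labels and two-feature vectors (RMS, kurtosis) are hypothetical.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def nearest(x, centroids):
    """Label of the centroid closest to x (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def seeded_kmeans(labelled, unlabelled, iterations=10):
    """Seeded k-means: labelled synthetic data fix the clusters' identities.

    labelled:   dict mapping a condition label to synthetic feature vectors
    unlabelled: list of real feature vectors with unknown condition
    Returns (centroids, labels assigned to the unlabelled vectors).
    """
    centroids = {lab: centroid(pts) for lab, pts in labelled.items()}
    assignments = []
    for _ in range(iterations):
        # Assign each real measurement to its nearest condition centroid.
        assignments = [nearest(x, centroids) for x in unlabelled]
        # Recompute centroids from synthetic seeds plus assigned real data.
        groups = defaultdict(list)
        for lab, pts in labelled.items():
            groups[lab].extend(pts)
        for x, lab in zip(unlabelled, assignments):
            groups[lab].append(x)
        centroids = {lab: centroid(pts) for lab, pts in groups.items()}
    return centroids, assignments


# Hypothetical (RMS, kurtosis) features from simulated healthy and faulty runs.
synthetic = {
    "healthy": [(1.0, 3.0), (1.1, 3.1)],
    "outer_race_fault": [(2.0, 6.0), (2.1, 6.2)],
}
real = [(1.05, 3.2), (2.3, 6.5), (0.95, 2.9)]
_, labels = seeded_kmeans(synthetic, real)
print(labels)
```

Keeping the synthetic seeds in every centroid update is a design choice: it stops the real data from drifting a cluster away from the simulated condition it is supposed to represent.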

When combining modelling strategies and fusing data to improve maintenance performance, context is key to efficient diagnosis and prognosis [21]. Context is defined as “any information that can be used to characterise the situation of entities (i.e., whether a person, place, or object) that are considered relevant to the interaction between a user and an application, including the user and the application themselves” [22]. Applied to maintenance, this definition means having appropriate information on the operating conditions of an asset. The context includes information such as working temperature, humidity, applied loads, operating speed and information about any other system with which the asset interacts, among others. The context has a great influence on the behaviour of physical assets and should not be omitted. Sensors must be added to obtain this information and provide context-driven services, as shown in Fig. 3.
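Operating speed is a good example of why context matters in bearing diagnosis. The standard kinematic fault frequencies of a rolling element bearing scale with shaft speed, so a spectral peak can only be attributed to a fault if the speed at which it was recorded is known. The sketch below uses the textbook ball-pass frequency formulas, BPFO = (n/2)·f_r·(1 − (d/D)·cos φ) and BPFI = (n/2)·f_r·(1 + (d/D)·cos φ), which assume no slip; the bearing geometry values are hypothetical.

```python
import math


def bearing_fault_frequencies(shaft_hz, n_elements, element_diameter,
                              pitch_diameter, contact_angle_rad=0.0):
    """Textbook ball-pass frequencies (Hz) of a rolling element bearing.

    Assumes a stationary outer race and rolling without slip.
    Returns (BPFO, BPFI): outer-race and inner-race defect frequencies.
    """
    ratio = element_diameter / pitch_diameter * math.cos(contact_angle_rad)
    bpfo = n_elements / 2 * shaft_hz * (1 - ratio)
    bpfi = n_elements / 2 * shaft_hz * (1 + ratio)
    return bpfo, bpfi


# Hypothetical geometry: 8 rolling elements, d/D = 0.25, zero contact angle.
for speed_hz in (25.0, 50.0):  # operating speed is the contextual input
    bpfo, bpfi = bearing_fault_frequencies(speed_hz, 8, 10.0, 40.0)
    print(f"{speed_hz:.0f} Hz shaft: BPFO = {bpfo:.1f} Hz, BPFI = {bpfi:.1f} Hz")
```

At a 25 Hz shaft speed this geometry gives 75 Hz (BPFO) and 125 Hz (BPFI); at 50 Hz both double, to 150 Hz and 250 Hz, so the same spectral line means different things under different operating contexts.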

As mentioned, this paper proposes a framework for hybrid modelling that combines the physics-based and data-driven modelling approaches and uses both data fusion and context awareness. It suggests a two-level architecture. The first level analyses the condition monitoring (CM) data acquired from an asset for early failure detection. The second level carries out the virtual emulation of the behaviour of the asset and performs deeper analysis using data fusion. The architecture is validated by applying it to a rotating machine; more specifically, the response of a gearbox’s REBs is monitored.
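The first, local level only needs cheap indicators that can be computed online. A minimal sketch, assuming RMS and kurtosis of a vibration window are used as detection features (both are common time-domain indicators for REBs, but the paper's own choice of features is not described here) and that the escalation thresholds are hypothetical tuning parameters, could look like this. A window that trips either limit would be passed to the second, cloud-based level for diagnosis and prognosis.

```python
import math
import statistics


def rms(window):
    """Root mean square of a vibration window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def kurtosis(window):
    """Non-excess kurtosis (fourth standardised moment); 3 for Gaussian noise."""
    mu = statistics.fmean(window)
    m2 = sum((x - mu) ** 2 for x in window) / len(window)
    m4 = sum((x - mu) ** 4 for x in window) / len(window)
    return m4 / (m2 * m2)


def escalate_to_cloud(window, baseline_rms, rms_factor=2.0, kurtosis_limit=4.0):
    """Local, online check: True if the window should get cloud-level analysis.

    baseline_rms, rms_factor and kurtosis_limit are hypothetical tuning values.
    """
    return rms(window) > rms_factor * baseline_rms or kurtosis(window) > kurtosis_limit


# Synthetic windows: a clean 5-cycle sine, and the same sine with 10 impulses.
healthy = [math.sin(2 * math.pi * 5 * i / 1000) for i in range(1000)]
faulty = [5.0 if i % 100 == 0 else x for i, x in enumerate(healthy)]
baseline = rms(healthy)
print(escalate_to_cloud(healthy, baseline), escalate_to_cloud(faulty, baseline))
```

A pure sine has a kurtosis of 1.5, whereas adding a few impulses, as a localised bearing defect would, raises it well above the limit while the RMS rises only moderately. This is why an impulsiveness indicator is paired with an energy indicator here.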

The paper has the following structure. Section 2 proposes the architecture for hybrid modelling for CBM. Section 3 validates the architecture by applying it to a rotating machine; finally, Section 4 offers concluding remarks.


Proposed architecture for hybrid modelling

The proposed hybrid modelling architecture seeks to perform diagnosis and prognosis to provide the information required to optimise operation and maintenance based on the RUL, assuring the reliability and safety of the monitored asset. The workflow of this architecture is depicted in Fig. 4. In this scenario, a physical asset is given the technology necessary for it to acquire smart capabilities. Meeting this goal requires the use of complementary processes and tools.

The architecture is based

Validation of the architecture

This section explains the validation of the proposed architecture, specifically, the hybrid methodology of data fusion and the combination of modelling techniques. The section begins by describing the physical asset used for the validation process. It goes on to explain the physics-based model developed to represent the behaviour of the asset. Next, it introduces the data-driven modelling and the data generation process. It concludes by presenting and discussing the results.

Conclusions

Optimising both maintenance resources and maintenance costs is a key concern of maintainers. In recent decades, CBM has proved to be a useful tool for achieving these goals. New technologies involving big data, cloud computing and virtual commissioning are now being used in Industry 4.0 to strengthen maintenance. Hybrid modelling is still in its infancy, but it has great potential to improve diagnosis and prognosis and, consequently, to optimise maintenance.

This paper proposes an

Acknowledgements

This study is partially funded by the Ministry of Economy and Competitiveness of the Spanish Government under the Retos-Colaboración Program (LEMA project, RTC-2014-1768-4). Any opinions, findings and conclusions expressed in this article are those of the authors and do not necessarily reflect the views of funding agencies. The authors would also like to thank Fundación de Centros Tecnológicos – Iñaki Goenaga.

References (51)

D. Zhang et al.
Industrial technologies and applications for the Internet of Things
Comput. Netw.
(2016)

J. Lee et al.
A cyber-physical systems architecture for industry 4.0-based manufacturing systems
Manufact. Lett.
(2015)

N. Sawalhi et al.
The enhancement of fault detection and diagnosis in rolling element bearings using minimum entropy deconvolution combined with spectral kurtosis
Mech. Syst. Signal Process.
(2007)

N. Saswade et al.
Virtual machine monitoring in cloud computing
Proc. Comput. Sci.
(2016)

L. Liao et al.
A hybrid framework combining data-driven and model-based methods for system remaining useful life prediction
Appl. Soft Comput.
(2016)

Y. Dijoux et al.
Statistical inference for imperfect maintenance models with missing data
Reliab. Eng. Syst. Saf.
(2016)

L. Bukowski
System of systems dependability – theoretical models and applications examples
Reliab. Eng. Syst. Saf.
(2016)

D. Galar et al.
Context awareness for maintenance decision making: a diagnosis and prognosis approach
Measurement
(2015)

I. El-Thalji et al.
A summary of fault modelling and predictive health monitoring of rolling element bearings
Mech. Syst. Signal Process.
(2015)

R.B. Randall et al.
Rolling element bearing diagnostics – a tutorial
Mech. Syst. Signal Process.
(2011)

J. Antoni et al.
Unsupervised noise cancellation for vibration signals: part II – a novel frequency-domain algorithm
Mech. Syst. Signal Process.
(2004)

H. Endo et al.
Enhancement of autoregressive model based gear tooth fault detection technique by the use of minimum entropy deconvolution filter
Mech. Syst. Signal Process.
(2007)

J. Antoni
The spectral kurtosis: a useful tool for characterising non-stationary signals
Mech. Syst. Signal Process.
(2006)

J. Antoni
Fast computation of the kurtogram for the detection of transient faults
Mech. Syst. Signal Process.
(2007)

Z. Zeng et al.
Semi-supervised feature selection based on local discriminative information
Neurocomputing
(2016)

W. Pedrycz
Algorithms of fuzzy clustering with partial supervision
Pattern Recogn. Lett.
(1985)

A. Prajapati et al.
Condition based maintenance: a survey
J. Quality Maintenance Eng.
(2012)

M.E. Otey et al.
Fast distributed outlier detection in mixed-attribute data sets
Data Mining Knowl. Discovery
(2006)

J. Lee et al.
A systematic approach for predictive maintenance service design: methodology and applications
Int. J. Internet Manufact. Services
(2009)

ISO 13372:2012
Condition Monitoring and Diagnostics of Machines – Vocabulary
(2012)

F. Ahmadzadeh et al.
Remaining useful life estimation: review
Int. J. Syst. Assurance Eng. Manage.
(2014)

M. Mishra et al.
Modelización híbrida para el diagnóstico y pronóstico de fallos en el sector del transporte: Datos adquiridos y datos sintéticos [Hybrid modelling for fault diagnosis and prognosis in the transport sector: acquired data and synthetic data]
Dyna
(2015)

I. Matei et al.
The case for a hybrid approach to diagnosis: a railway switch

K. Medjaher et al.
Framework for a hybrid prognostics
Chem. Eng. Trans.
(2013)

J. Ghaboussi et al.
Hybrid modelling framework by using mathematics-based and information-based methods
IOP Conf. Ser.: Mater. Sci. Eng.
(2010)
