1 Introduction
2 Context of the study
2.1 Subject
-
cost (e.g., the number of maintenance visits per week),
-
time (e.g., the time needed to investigate and diagnose and then file a report relating to an operation or the time needed for the event management architecture to update data regarding equipment and mobile system health status),
-
quality (e.g., the quality of the diagnosis at fleet level, avoiding false alarms and breakdowns of critical equipment),
-
adaptability (e.g., the time needed to characterize, understand, or organize the monitoring process for a new or overhauled piece of equipment or system).
2.2 Issues identified in fleet EMADM
2.3 Literature review on fleet EMADM
2.3.1 Centralized fleet EMADM
2.3.2 Edge-centralized fleet EMADM
2.3.3 Decentralized fleet EMADM
2.3.4 Decentralized and cooperative fleet EMADM
2.4 Motivations of the work
3 EMH2
3.1 SurfEvent model
-
A unique identification (its name).
-
Two possible data types: quantitative (e.g., the average time for the pantograph of a train to connect or the average duration of a door access opening cycle on a train) or qualitative (e.g., the global health status of a train door system described as normal, degraded, critical, etc.).
-
Two possible statuses: The first is called “testing and development” when it is used by engineers during a teaching process for a specific holon, and the second is called “production” when it is generated by a mobile system during use.
-
Two possible origins: “calculated” (generated from other events) or “original” (obtained directly using sensors).
-
A unique source called “emitter.” An emitter is associated with the hierarchical structure of every mobile system.
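The five attributes above map naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the enum and field names, the `value` payload, and the slash-separated `emitter` path are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class DataType(Enum):
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"


class Status(Enum):
    TESTING_AND_DEVELOPMENT = "testing and development"
    PRODUCTION = "production"


class Origin(Enum):
    ORIGINAL = "original"      # obtained directly using sensors
    CALCULATED = "calculated"  # generated from other events


@dataclass(frozen=True)
class SurfEvent:
    name: str                 # unique identification
    data_type: DataType
    status: Status
    origin: Origin
    emitter: str              # assumed encoding of the position in the hierarchy
    value: Union[float, str]  # hypothetical payload field

    def __post_init__(self) -> None:
        # Reject payloads that contradict the declared data type.
        numeric = isinstance(self.value, (int, float)) and not isinstance(self.value, bool)
        if self.data_type is DataType.QUANTITATIVE and not numeric:
            raise ValueError("a quantitative SurfEvent needs a numeric value")
        if self.data_type is DataType.QUALITATIVE and not isinstance(self.value, str):
            raise ValueError("a qualitative SurfEvent needs a textual value")


# Illustrative instance: a door opening cycle duration measured by a sensor.
door_cycle = SurfEvent(
    name="door_opening_cycle_duration",
    data_type=DataType.QUANTITATIVE,
    status=Status.PRODUCTION,
    origin=Origin.ORIGINAL,
    emitter="train_12/car_3/door_2",
    value=3.4,
)
```

Making the record immutable (`frozen=True`) reflects the idea that an emitted event is a fact; a new observation would be a new SurfEvent rather than a mutation of an old one.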
3.2 Holonic architecture
-
A set of holons grouped by family of equipment.
-
An interface module for inter- and intra-connection of the different components of the holonic level.
-
A workflow management system [29] that listens for events from sensors and/or other holonic levels. When an event arrives, the system identifies its origin (the sending system) and triggers the corresponding holons in charge of supporting it. Figure 11 schematizes the layout of all the holonic levels in EMH2.
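The dispatching behavior of the workflow management system can be sketched as follows. This is only an illustration under stated assumptions: the `WorkflowManager` class, the event tuple shape, and the convention that the equipment family is the second segment of the emitter path are all hypothetical and not taken from EMH2.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: (emitter, SurfEvent name, value).
Event = Tuple[str, str, object]
Holon = Callable[[Event], str]


class WorkflowManager:
    """Routes incoming events to the holons registered for their equipment family."""

    def __init__(self) -> None:
        self._holons: Dict[str, List[Holon]] = defaultdict(list)

    def register(self, family: str, holon: Holon) -> None:
        self._holons[family].append(holon)

    @staticmethod
    def family_of(emitter: str) -> str:
        # Assumption: emitters look like "system/family/equipment".
        parts = emitter.split("/")
        return parts[1] if len(parts) > 1 else ""

    def dispatch(self, event: Event) -> List[str]:
        # Identify the origin of the event, then trigger the matching holons.
        family = self.family_of(event[0])
        return [holon(event) for holon in self._holons.get(family, [])]


wm = WorkflowManager()
wm.register("door", lambda e: f"door holon handled {e[1]}")
results = wm.dispatch(("train_12/door/door_2", "door_cycle_s", 5.2))
```

Events whose family has no registered holon simply yield an empty result list in this sketch; a real deployment would need to decide whether such events are dropped, logged, or forwarded to another holonic level.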
3.3 Holon design
-
The proposed holon knowledge base is based on a context-free grammar. The choice of this grammar [31] has been the subject of a comparative study not described in this paper. A type-2 grammar has been selected to allow us to generate the mathematical, logical, textual, and temporal expressions associated with the field (calculate) of a SurfEvent. As with a SurfEvent, knowledge can be in “test mode” (using a simulator, for example) or in “production mode” (validated and currently in use).
-
The proposed holon inference engine is based on a pushdown automaton [31] to allow context-free grammar recognition. The choice of this automaton has been the subject of a comparative study not described in this paper.
-
Cycle-based reasoning mode: A cycle is defined by its start and its end conditions. Each condition is a set of SurfEvents. Its calculation depends heavily on the physical characteristics of the target system. A cycle has the following characteristics:
-
All SurfEvents within the cycle are unique.
-
A cycle has a start date and an end date.
-
The duration of a cycle is variable.
-
State-based reasoning mode: In this mode, reasoning processes start when a change in the value of each SurfEvent is observed. For SurfEvents that are not observed, their last observed values are retained. The advantage of this mode is that no knowledge about the system to be monitored is necessary. However, continuous monitoring and a memory device are required. A typical example of a physical piece of equipment concerned by this mode is a lamp (switched on/off). A formal description of this reasoning is as follows:
$$ state\left( {s,t} \right) \leftrightarrow \forall SurfEvent \in s, \left( { SurfEvent_{t} = SurfEvent_{t - 1} } \right) \vee \left( { SurfEvent_{t} \ne SurfEvent_{t - 1} } \right),\;{t} \in \left[ {0,+\infty } \right), $$
where \( SurfEvent_{t} \) is the change in state of a SurfEvent at a given time t and \( state(s,t) \) represents the state of the system s at the time t.
Backward chaining is used for these two modes (cycle-based or state-based):
-
When a SurfEvent occurs, the expert system considers it as a fact. All the rules of the knowledge base where this SurfEvent is identified are selected.
-
The execution order of the selected rules is established according to their abstraction level. The signal level is of the highest priority, followed by the indicator level, and so on.
-
For a given abstraction level, an order of execution of the rules is established starting with simple rules and finishing with complex ones. A rule is said to be “simple” when its evaluation does not require any cooperation or exchange with other holons or any heterogeneous system; otherwise, it is called “complex.”
-
A rule is executed in either cycle- or state-based mode.
-
The result serves as a fact for the other rules.
-
If the expert system cannot explain a fact, the holon inference engine requests an explanation from a specific module in charge of learning, named adaptation module (not described in this paper).
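The rule selection and ordering steps listed above, combined with the change-triggered behavior of the state-based mode, can be sketched as follows. This is a simplified illustration, not the EMH2 inference engine: the `Rule` structure, the `LEVEL_PRIORITY` table (the text names only the signal and indicator levels), the example rules, and the assumption that rules do not form cycles are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Facts = Dict[str, object]

# Assumed priorities: the signal level runs first, then the indicator level.
LEVEL_PRIORITY = {"signal": 0, "indicator": 1}


@dataclass(frozen=True)
class Rule:
    name: str
    level: str
    inputs: FrozenSet[str]
    needs_cooperation: bool  # True means "complex", False means "simple"
    evaluate: Callable[[Facts], object]


def select_and_order(rules: List[Rule], fact_name: str) -> List[Rule]:
    # Keep the rules that mention the fact, then sort by abstraction level
    # and, within a level, simple rules before complex ones.
    selected = [r for r in rules if fact_name in r.inputs]
    return sorted(selected, key=lambda r: (LEVEL_PRIORITY[r.level], r.needs_cooperation))


def on_surf_event(rules: List[Rule], facts: Facts, name: str, value: object) -> List[str]:
    executed: List[str] = []
    pending = [(name, value)]
    while pending:
        fact_name, fact_value = pending.pop(0)
        if fact_name in facts and facts[fact_name] == fact_value:
            continue  # state-based mode: no change, so nothing to re-evaluate
        facts[fact_name] = fact_value  # unobserved facts keep their last value
        for rule in select_and_order(rules, fact_name):
            if not all(i in facts for i in rule.inputs):
                continue  # an input has never been observed
            executed.append(rule.name)
            # The result serves as a fact for the other rules.
            pending.append((rule.name, rule.evaluate(facts)))
    return executed


rules = [
    Rule("fleet_door_alert", "indicator", frozenset({"door_slow"}), True,
         lambda f: bool(f["door_slow"])),
    Rule("door_health", "indicator", frozenset({"door_slow"}), False,
         lambda f: "degraded" if f["door_slow"] else "normal"),
    Rule("door_slow", "signal", frozenset({"door_cycle_s"}), False,
         lambda f: f["door_cycle_s"] > 4.0),
]
facts: Facts = {}
order = on_surf_event(rules, facts, "door_cycle_s", 5.2)
```

In this sketch the `facts` dictionary plays the role of the memory device required by the state-based mode. The hand-off to the adaptation module for unexplained facts is not modeled.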
4 Deployment process and methodological aspects
-
For mobile systems: Iterative deployment of the different holonic levels for each layer (data acquisition, then data manipulation, then state detection, etc.) can be achieved progressively and specifically depending on the mobile systems chosen and the equipment to be monitored. In addition, within a single holonic level, the progressive implementation and localization of hardware can be defined for every holon according to the constraints of embedded system calculators (available memory space, computing power, communication bandwidth, etc.).
-
For intermediary edge computing (EC) nodes: Progressive deployment of EC nodes can limit the transmission of large volumes of data and events from mobile systems through the implementation of data acquisition, data manipulation, state detection, and health assessment holonic levels at intermediary nodes between the mobile system and the MC. These nodes can be deployed according to criteria expressed by fleet operators, for example by building facility, by region, or by country.
-
For the MC: Whatever the state of the progressive implementation, mobile systems that are not connected to EC nodes will be directly connected to the MC.
-
A public key, known by all the mobile systems and heterogeneous systems, used to encrypt and decrypt SurfEvents during exchanges between the mobile systems and the MC.
-
A private key, known only to the mobile system that holds it, used to encrypt, decrypt, and authenticate the signature of that mobile system.
5 A case study: a real application of the proposed method to a fleet of trains
-
KPI#1: Number of fleet maintenance visits (corrective, preventive, and unplanned) per week. This indicator is a measure of the maintenance costs generated using a given EMADM.
-
KPI#2: Time needed by a maintenance operator to investigate and diagnose, and then generate reports and follow-ups relating to a maintenance operation. This KPI reflects the quality of the diagnostic processes and the rapidity of the operation using the given EMADM.
-
KPI#3: Time needed by the given EMADM to update data regarding the health status and monitoring of a train. This indicator reflects the reactivity of the event management architecture when events occur.
KPI | #1 | #2 | #3 |
---|---|---|---|
 | Number of maintenance visits per week | Time needed to investigate and diagnose and then generate reports relating to an operation | Time needed for the event management architecture to update data regarding health status |
Current situation | > 9 visits per week | ≥ 45 min | 24 h |

KPI | #1 | #2 | #3 |
---|---|---|---|
Target | < 9 visits per week | < 30 min | ≤ 5 h |
5.1 Experimental study
5.2 Teaching the team in charge of maintenance operations
5.3 Assessment of the results obtained and comparison with targets
KPI | #1 | #2 | #3 |
---|---|---|---|
Results achieved | A maximum of eight visits per week | 20 min | 2 h |
-
KPI#1: The entire life cycle of a SurfEvent is effectively under control, from its generation to the final maintenance tasks. A detailed calculation of this KPI over several weeks (sliding mode) is provided in Fig. 16. The maximum number of visits (eight) was reached only once; in most weeks, only one visit was conducted.
-
KPI#2: Today, thanks to the troubleshooting assistance tool, all the tasks are automated. This explains the decrease in the average duration of interventions to less than 20 min, which is mainly due to the direct control mechanisms of the systems concerned and the sharing of experience feedback from the entire fleet.
-
KPI#3: Up to now, the reactivity cycle has been reduced to 2 h. This is due to the ability of the EMH2 architecture to control the reactivity of the entire maintenance chain, that is to say, the supervision, diagnosis, planning, and optimization phases of maintenance operations.
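As a quick arithmetic check, the reported results can be compared with the targets and with the initial situation. The sketch below is illustrative only: because the initial values for KPI#1 and KPI#2 are stated as bounds (> 9 visits, ≥ 45 min), the bound itself is used as the baseline, so the computed reductions for those two KPIs are lower bounds. For KPI#1, the worst observed week (eight visits) is used.

```python
# Baselines taken from the "current situation" row (bounds used as-is).
baseline = {"KPI#1": 9, "KPI#2": 45, "KPI#3": 24}   # visits/week, min, h
achieved = {"KPI#1": 8, "KPI#2": 20, "KPI#3": 2}    # worst week, min, h

# Targets: KPI#1 < 9 visits per week, KPI#2 < 30 min, KPI#3 <= 5 h.
target_met = {
    "KPI#1": achieved["KPI#1"] < 9,
    "KPI#2": achieved["KPI#2"] < 30,
    "KPI#3": achieved["KPI#3"] <= 5,
}


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return round((before - after) / before * 100, 1)


reductions = {k: reduction_percent(baseline[k], achieved[k]) for k in baseline}
# reductions == {"KPI#1": 11.1, "KPI#2": 55.6, "KPI#3": 91.7}
```

All three targets are met, and the largest relative gain is on KPI#3, where the update delay drops from 24 h to 2 h.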
5.4 Global discussion about the case study
-
First, during the experiment, the adaptation process proved inefficient, both in producing accurate new knowledge about a newly integrated piece of equipment and in integrating this knowledge into the event management architecture. For example, changing a door requires the knowledge of its monitoring holon to be updated. An adaptation module is under development to solve this kind of issue and generate the relevant knowledge automatically.
-
Second, although the KPIs improved in this case study, the complete validation and optimization of EMH2 remain to be done through simulations, for example, by integrating prognosis and dynamic maintenance holonic levels to assess the performance of a global and dynamic fleet-level maintenance strategy. Indeed, a single case study is insufficient to draw conclusions about the effectiveness of EMH2 for any kind of train fleet.