Sequential pattern mining algorithm for automotive warranty data

doi:10.1016/j.cie.2008.11.006

Computers & Industrial Engineering

Volume 57, Issue 1, August 2009, Pages 137-147

https://doi.org/10.1016/j.cie.2008.11.006

Abstract

This paper presents a sequential pattern mining algorithm that allows product and quality engineers to extract hidden knowledge from a large automotive warranty database. The algorithm uses the elementary set concept and database manipulation techniques to search for patterns or relationships among occurrences of warranty claims over time. These patterns are represented as IF–THEN sequential rules, where the IF portion of the rule includes one or more occurrences of warranty problems at one time and the THEN portion includes warranty problem(s) that occur at a later time. Once sequential patterns are generated, the algorithm uses rule strength parameters to filter out insignificant patterns, so that only important (significant) rules are reported. Significant patterns provide knowledge of one or more product failures that lead to future product fault(s). The effectiveness of the algorithm is illustrated with a warranty data mining application from the automotive industry. A discussion of the sequential patterns generated by the algorithm and their interpretation for the automotive example is also provided.

Introduction

Many industries, including the automotive industry, face the tasks of improving product quality and minimizing warranty costs. Product quality is a by-product of the effectiveness of product development processes and production systems. Thus, product quality can be improved through continuous improvements in product design and the development of robust manufacturing and assembly systems. However, no matter how well a product is designed and manufactured, it may fail in the usage environment, either by chance or through some assignable cause. A warranty is a manufacturer’s assurance to a buyer that a product that fails within a specified time period will be repaired at no cost to the customer. In a service environment where dealers are more likely to replace than to repair, the cost of a component failure during the warranty period can easily equal three to ten times the supplier’s unit price (Baird, 2000, Feng et al., 2001, Cali, 1993). Consequently, companies invest significant amounts of time and resources to monitor, document, and analyze product warranty data.

Product quality problems are monitored during the warranty period through the claims filed against the products. This process generates large volumes of warranty data records, such as product problems in the form of repair-related labor codes, problem descriptions, actions taken, repair dates, and repair costs (labor and parts). Sequential pattern analysis of these records may provide significant benefits to product manufacturers. A sequential pattern analysis searches for patterns or relationships between data objects in a database that occur over time. The analysis is of particular interest to automotive Original Equipment Manufacturers (OEMs) because it identifies important sequential relationships between various product faults. For example, sequential pattern analysis may reveal a fault pattern showing how earlier product failures may have led to other product fault(s) at a later time. This knowledge enables companies to predict or discover the root causes of failures that are caused by, or associated with, the earlier problems. This in turn helps in formulating an action plan to remedy the problems and improve product quality, which leads to significant savings in warranty costs and improved product goodwill.

In this paper, a sequential pattern mining algorithm for automotive warranty data is presented. The proposed algorithm is based on the elementary set concept and database manipulation techniques. The algorithm is constructed to search for significant sequential patterns in preprocessed data sets obtained from a large automotive warranty database. The sequential patterns are represented in the form of IF–THEN association rules, where the IF portion of the rule includes quality/warranty problems, represented as labor codes, that occurred at an earlier time, and the THEN portion includes labor codes that occurred at a later time. Once a set of unique sequential patterns is generated, the algorithm applies a set of thresholds to evaluate the significance of the rules, and the rules that pass these thresholds are reported in the solution. The major differences between the proposed approach and those reported in the literature are presented at the end of this section.
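The rule representation described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes each vehicle's history is a list of (day, labor_code) claims, restricts each rule to a single labor code on each side, and uses the conventional support and confidence measures (Agrawal and Srikant, 1994) as the rule strength parameters. The function name `mine_pairwise_sequential_rules` is hypothetical.

```python
from collections import Counter


def mine_pairwise_sequential_rules(histories, min_support, min_confidence):
    """Return {(x, y): (support, confidence)} for rules 'IF x THEN later y'.

    histories maps a vehicle ID to a list of (day, labor_code) claims.
    Each vehicle is counted at most once per code and once per rule.
    """
    n_vehicles = len(histories)
    code_count = Counter()  # vehicles in which code x occurs at all
    rule_count = Counter()  # vehicles in which x occurs strictly before y
    for claims in histories.values():
        first_day, last_day = {}, {}
        for day, code in claims:
            first_day[code] = min(day, first_day.get(code, day))
            last_day[code] = max(day, last_day.get(code, day))
        code_count.update(first_day.keys())
        for x in first_day:
            for y in last_day:
                # x precedes y if x's earliest claim is before y's latest claim;
                # x == y (a repeat repair) is skipped in this sketch.
                if x != y and first_day[x] < last_day[y]:
                    rule_count[(x, y)] += 1
    rules = {}
    for (x, y), count in rule_count.items():
        support = count / n_vehicles          # share of all vehicles
        confidence = count / code_count[x]    # share of vehicles having x
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)
    return rules
```

For example, if labor code A precedes B in two of four vehicles and A appears in three of them, the rule IF A THEN B has support 0.5 and confidence 2/3, so it survives thresholds of 0.5 and 0.6. Rules with several codes on either side would require enumerating code sets per claim date instead of single codes.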

Several association rule mining algorithms (Agrawal and Srikant, 1994, Agrawal and Shafer, 1996, Han and Kamber, 2006) and sequential pattern mining algorithms (Agrawal and Srikant, 1995, Thomas, 1998, Pei et al., 2004) have been reported in the literature. Agrawal and Srikant (1994) introduced the Apriori algorithm, which generates significant association rules between items in a database such that the support and confidence of the rules are greater than user-specified thresholds. However, the algorithm generates a large number of candidate itemsets, whose number grows exponentially with the size of the database. To overcome this problem, Agrawal and Srikant (1995) introduced three different Apriori algorithms that define the problem of sequential pattern mining as finding the maximal (longest) sequences of items that have a certain user-specified minimum support. These algorithms use a candidate generation technique to address the scalability-related shortcomings of their previous approach. Bayardo and Agrawal (1999) proposed metrics for ranking association rules and introduced an algorithm that uses rule support and confidence to extract the best rules from large datasets. Pei et al. (2004) proposed PrefixSpan, an efficient approach to sequential pattern mining. In PrefixSpan, the global database is projected into a set of smaller (local) databases, and sequential patterns are constructed by exploring the frequent items in each local (projected) database.
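To make the projection idea concrete, the following is a simplified Python sketch of PrefixSpan-style pattern growth, not a full implementation of Pei et al. (2004). It assumes each sequence element is a single item (elements containing several items at one time are not handled), `min_count` is an absolute support count, and the function name is hypothetical.

```python
from collections import Counter


def prefixspan(sequences, min_count):
    """Return {pattern: count} for every subsequence that occurs in at least
    min_count input sequences (single-item elements only)."""
    results = {}

    def grow(prefix, projected):
        # Count each item once per suffix in the projected (local) database.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, count in counts.items():
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project again: keep only what follows the first occurrence of item.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(pattern, next_projected)

    grow((), [tuple(s) for s in sequences])
    return results
```

With the sequences A B C, A C, and B C and a minimum count of 2, the sketch finds the single items A, B, C and the two-item patterns (A, C) and (B, C). Each recursive call scans only the suffixes that follow the current prefix, which is the source of PrefixSpan's efficiency compared with generating and testing candidates against the whole database.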

Many new efficient algorithms have been proposed for mining sequential patterns. They differ mostly in how they reduce computation time by imposing constraints on the mining process, or in subtle details of how they handle the sequential mining process. For example, Yun (2008) uses weight constraints to reduce the number of unimportant patterns; Chen, Cao, Li, and Qian (2008) incorporate user-defined constraints so that the discovered knowledge better meets user needs; Masseglia, Poncelet, and Teisseire (2008) introduce time constraints in the early stages of the data mining process; and Chen and Huang (2008) and Fiot et al. (2007) use fuzzy set techniques, and Kuo, Chao, and Liu (2009) combine fuzzy sequential pattern mining with the K-means algorithm, to achieve better computational efficiency.

Kum, Chang, and Wang (2006) proposed a sequential pattern mining method for multiple databases that is based on multiple alignment rather than the usual support-based approach. Each database is mined and summarized at the local level, and only the summarized patterns are used in the global mining process. Laur, Symphor, Nock, and Poncelet (2007) introduced statistical supports to maximize mining precision and improve the computational efficiency of the incremental mining process. Kum, Chang, and Wang (2007) benchmarked the effectiveness of sequential pattern mining methods by comparing a support-based sequential pattern model with an approximate pattern model based on sequence alignment. Chen and Hu (2007) introduced the concepts of recency (the ability to adapt quickly to changes in a database) and compactness (requiring discovered patterns to span a reasonable period of time), and proposed algorithms that use these concepts to adapt to the frequency of changes in the patterns discovered in the database. Lin, Chen, Hao, Chueh, and Chang (2008) introduced the notion of positive and negative sequential patterns, where positive patterns involve the presence of an itemset and negative patterns involve its absence. Ren, Sun, and Guo (2008) developed an incremental sequential pattern mining process that stores the results of previous mining runs and reuses them to mine the database efficiently when new data are added.

Typically, warranty data are strictly confidential for most companies because they relate to product quality and reliability and are therefore critical to consumers’ product goodwill. As a result, literature on warranty data analysis of real-life applications is limited to a few published reports (see Blischke and Murthy, 1994, Majeske and Herrin, 1995, Lu, 1998). Most models and algorithms developed in warranty analysis studies involve warranty cost analysis and can be divided into two categories: (1) one-dimensional studies, which model product failures and warranty costs as a function of the warranty period (see Blischke and Murthy, 1996, Sahin and Polatoglu, 1998), and (2) two-dimensional studies, which model failures and perform warranty analysis by considering both the warranty period and the length or frequency of usage (see Murthy et al., 1995, Singpurwalla and Wilson, 1998, Majeske, 2007). In most studies, warranty analysis concentrates on: (a) modeling failure patterns to estimate the number of occurrences (or recurrences) of failures (of components, subassemblies, or systems) over the warranty period, assuming that all usage conditions are statistically similar and that all warranty claims are reported without delay; (b) modeling the rectification costs incurred by failures; and (c) modeling the expected warranty costs (see Karim et al., 2001, Lawless, 1998, Polatoglu and Sahin, 1998, Suzuki et al., 2000, Suzuki et al., 2001, Majeske, 2007, Fredette and Lawless, 2007, Kulkarni and Resnick, 2008). Several studies developed empirical models based on the manufacturer’s field data (i.e., failures and costs over the warranty period) for warranty cost analysis (see Robinson and McDonald, 1991, Lawless and Kalbfleisch, 1992, Hu and Lawless, 1996). Others use probability distribution functions and statistical models to estimate warranty costs from incomplete data (see Karim et al., 2001, Wang and Suzuki, 2001).
More recent studies include Gutiérrez-Pulido, Aguirre-Torres, and Christen (2006), who used a utility-function-based method to determine the appropriate warranty length of a product (brake linings), and Jung and Bai (2007), who applied a bivariate reliability model to estimate the lifetime distribution of products. A comprehensive literature review on warranty data analysis can be found in Murthy and Djamaludin (2002).

Although a number of research studies have been reported on warranty analysis, most of them use statistical approaches for cost and/or reliability analysis (Majeske et al., 1997, Kalbfleisch et al., 1991, Hu and Lawless, 1996, Lawless, 1998), while very few have applied data mining techniques to warranty data (Hotz et al., 1999, Buddhakulsomsiri et al., 2006). Hotz et al. (1999) implemented a data mining support environment for planning warranty and goodwill costs in the automotive industry. Regression analysis and a back-propagation neural network were used to construct an automatic prediction tool based on historical warranty data and goodwill costs. Hotz et al. (2001) later developed statistical and machine learning methods for detecting deviations in warranty costs and for analyzing warranty and goodwill cost statements. Buddhakulsomsiri et al. (2006) implemented a data mining approach to explore the potential benefits of data mining in automotive warranty data analysis. Potential data mining tasks were identified based on the type of knowledge to be mined, and an association rule generation algorithm was developed for the important mining tasks. The algorithm was applied to automotive warranty data to illustrate its effectiveness.

In this paper, a new data mining algorithm is presented that combines the elementary set concept of rough set theory (Pawlak, 1997), with some important modifications, and database manipulation techniques to identify significant sequential patterns in a large automotive warranty database. Specifically, the algorithm considers all possible rules that may be generated from a data set rather than the rules determined from the upper and lower approximations of rough set theory. Furthermore, the proposed algorithm uses important database set operations to reduce the computation time of rule generation (Buddhakulsomsiri et al., 2006, Siradeghyan et al., 2008). In addition, sequential mining of warranty data has some characteristics not encountered in typical data mining problems. For example, the same product problem can occur more than once on a given day, which may produce a significant number of duplicate rules during rule generation. The proposed algorithm introduces a procedure (Step 2 of the proposed algorithm) that combines these duplicate rules and improves the algorithm’s computational efficiency. We demonstrate the effectiveness of this procedure by comparing the number of rules generated by the algorithm with and without it. Finally, this paper presents a unique, and perhaps the first, data mining application to automotive warranty problems that arise over time.
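Why same-day repeats inflate the rule count can be shown with a small Python sketch. It illustrates the problem and one way to collapse duplicates, but it is not the authors' Step 2: it assumes a single vehicle's claims are (day, labor_code) pairs, and both function names are hypothetical.

```python
from collections import defaultdict


def naive_rule_instances(claims):
    """Pair every claim record with every later claim record (duplicates kept)."""
    return [(c1, c2) for d1, c1 in claims for d2, c2 in claims if d1 < d2]


def combined_rule_instances(claims):
    """Collapse claims into one set of labor codes per day, then pair days."""
    by_day = defaultdict(set)
    for day, code in claims:
        by_day[day].add(code)  # repeated codes on the same day merge here
    days = sorted(by_day)
    pairs = set()
    for i, d1 in enumerate(days):
        for d2 in days[i + 1:]:
            for c1 in by_day[d1]:
                for c2 in by_day[d2]:
                    pairs.add((c1, c2))
    return pairs
```

For a vehicle with three claims for labor code A on day 1 and two claims for B on day 4, the naive enumeration produces six identical (A, B) instances, while grouping codes by day yields a single one. Grouping by day also matches the rule form in which the IF portion holds the problems observed at one time.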

The remainder of the paper is organized as follows. Section 2 discusses the sources and characteristics of automotive warranty data and the data preprocessing used to extract the attributes needed for sequential pattern mining. Section 3 presents the sequential pattern mining algorithm. Section 4 presents computational results of the algorithm applied to a large automotive warranty data set, with a detailed discussion of sequential pattern generation and interpretation. Conclusions and future research directions are provided in Section 5.


Source of automotive warranty data and data preprocessing

The automotive warranty database contains vehicle attributes and warranty problem related data. Typically, automotive warranty data are obtained from: (1) manufacturing and assembly plants (e.g., vehicle identification number (VIN), production date, product options (attributes), plant ID, supplier data, and so on); (2) automobile dealerships (e.g., VIN, sales date); and (3) repair shops (e.g., repair-related labor code, repair date, mileage-at-repair, labor and part costs, and so on). These…
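Because the dealership and repair-shop records share the VIN, per-vehicle claim histories can be assembled by joining on it. The Python sketch below is a hypothetical illustration of such a preprocessing step, not the authors' procedure: it assumes sales records map a VIN to a sales date, repair records are (VIN, repair date, labor code) tuples, and time is measured in days in service since the sale.

```python
from datetime import date


def build_claim_sequences(sales, repairs):
    """Group repair records into per-vehicle, time-ordered claim sequences.

    sales:   {vin: sales_date}  (assumed dealership data)
    repairs: [(vin, repair_date, labor_code), ...]  (assumed repair-shop data)
    Returns {vin: [(days_in_service, labor_code), ...]} sorted by time.
    """
    sequences = {}
    for vin, repair_date, labor_code in repairs:
        if vin not in sales:
            continue  # no sales record: time in service cannot be computed
        days_in_service = (repair_date - sales[vin]).days
        sequences.setdefault(vin, []).append((days_in_service, labor_code))
    for claims in sequences.values():
        claims.sort()  # order each vehicle's claims by days in service
    return sequences
```

A vehicle sold on 1 January 2005 with repairs on 11 January (code A) and 2 March 2005 (code B) becomes the sequence [(10, "A"), (60, "B")]. Mileage-at-repair could be carried along in the same way if two-dimensional (age and usage) analysis were needed.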

Sequential pattern mining algorithm

The goal of the sequential pattern mining algorithm is to determine associations between two sets of labor codes that occur sequentially and frequently. Such associations provide knowledge about the temporal relationships between diverse product quality problems. The algorithm developed in this study is an extension of the association rule generation algorithm reported in Buddhakulsomsiri et al. (2006). The algorithm includes three stages. Stage 1 uses the elementary set concept and…
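In rough set theory (Pawlak, 1997), an elementary set is an equivalence class of objects that cannot be distinguished by their values on a chosen set of attributes. The Python sketch below partitions records in this way. How Stage 1 actually applies the concept is not shown in this excerpt, so the record layout (hypothetical `if` and `then` attributes holding an earlier and a later labor code) is an assumption.

```python
from collections import defaultdict


def elementary_sets(records, attributes):
    """Partition record IDs into elementary sets (equivalence classes).

    Two records share an elementary set when their values agree on every
    attribute in `attributes` (the indiscernibility relation).
    """
    classes = defaultdict(set)
    for record_id, values in records.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(record_id)
    return dict(classes)
```

With records 1 and 2 holding (A, B) and record 3 holding (A, C), partitioning on both attributes gives the elementary sets {1, 2} and {3}, while partitioning on `if` alone gives {1, 2, 3}. The ratio of the set sizes, 2/3, is the confidence of IF A THEN B, which is one way elementary sets can supply the counts behind rule strength measures.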

Computational results

All three stages of the sequential pattern mining algorithm have been coded in the C#.NET programming environment, and an Oracle 9i database is used to organize and manipulate the automotive warranty data. The computational study presented in this section is conducted on actual automotive warranty data sets for one vehicle model, collected over a 27-month period. A personal computer with a 2.8 GHz Pentium 4 processor and 512 MB of RAM is used in the experiment. The warranty data are analyzed in three-month…
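The excerpt breaks off while describing the three-month analysis periods, so the exact windowing scheme is not shown here. Purely as an illustration of the arithmetic, the Python sketch below assumes windows are consecutive three-month blocks counted from the first month of the study period; the function `three_month_window` and the constants are hypothetical. Under that assumption, a 27-month period yields 27 / 3 = 9 windows.

```python
from datetime import date

STUDY_MONTHS = 27              # length of the data collection period
N_WINDOWS = STUDY_MONTHS // 3  # 9 three-month windows under this assumption


def three_month_window(claim_date, start):
    """Return the 0-based index of the assumed three-month window of a claim.

    Windows are taken as consecutive blocks of three calendar months,
    counted from the month of `start`.
    """
    months = (claim_date.year - start.year) * 12 + (claim_date.month - start.month)
    if months < 0:
        raise ValueError("claim precedes the start of the study period")
    return months // 3
```

With a study period starting in January 2004, a claim on 31 March 2004 falls in window 0, one on 1 April 2004 in window 1, and one on 15 March 2006 in window 8, the last of the nine windows.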

Conclusion

This paper presents a data mining algorithm for extracting significant sequential patterns from a large automotive warranty database. The algorithm used the elementary set concept and database manipulation techniques to search for patterns or relationships among occurrences of warranty claims over time. Significant patterns provided knowledge of one (or more) product failures that led to future product fault(s). These patterns were represented as IF–THEN sequential rules, where the IF portion…

References (56)

E. Chen et al. (2008). Efficient strategies for tough aggregate constraint-based sequential pattern mining. Information Sciences.

Y.L. Chen et al. (2006). Constraint-based sequential pattern mining: The consideration of recency and compactness. Decision Support Systems.

Y.L. Chen et al. (2008). A novel knowledge discovering model for mining fuzzy multi-level sequential patterns in sequence databases. Data and Knowledge Engineering.

M. Jung et al. (2007). Analysis of field data under two-dimensional warranty. Reliability Engineering and System Safety.

H.C. Kum et al. (2007). Benchmarking the effectiveness of sequential pattern mining methods. Data and Knowledge Engineering.

R.J. Kuo et al. (2009). Integration of K-means algorithm and AprioriSome algorithm for fuzzy sequential pattern mining. Applied Soft Computing Journal.

K.D. Majeske (2007). A non-homogeneous Poisson process predictive model for automobile warranty claims. Reliability Engineering and System Safety.

K.D. Majeske et al. (1997). Evaluating product and process design changes with warranty data. International Journal of Production Economics.

D.N.P. Murthy et al. (2002). New product warranty: A literature review. International Journal of Production Economics.

Z. Pawlak (1997). Rough set approach to knowledge-based decision support. European Journal of Operational Research.

H. Polatoglu et al. (1998). Probability distributions of cost, revenue and profit over a warranty cycle. European Journal of Operational Research.

K. Suzuki et al. Statistical analysis of reliability warranty data.

U. Yun (2008). A new framework for detecting weighted sequential patterns in large sequence databases. Knowledge-Based Systems.

R. Agrawal et al. (1996). Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering.

R. Agrawal & R. Srikant (1994). Fast algorithms for mining association rules. In Proceedings of the 20th...

R. Agrawal & R. Srikant (1995). Mining sequential patterns. In P.S. Yu & A.L.P. Chen, Proceedings of the...

P. Baird (2000). Robert Bosch Corporation failure modes effects analysis (FMEA).

R.J. Bayardo & R. Agrawal (1999). Mining the most interesting rules. In Proceedings of the 5th ACM SIGKDD...

W.R. Blischke et al. (1994). Warranty cost analysis.

W.R. Blischke et al. (1996). Product warranty handbook.

J. Buddhakulsomsiri et al. (2006). Association rule generation algorithm for mining automotive warranty data. The special issue on data mining applications in engineering design, manufacturing, and logistics engineering. International Journal of Production Research.

J. Cali (1993). TQM for purchasing management.

V. Dhar et al. (1993). Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering.

J. Feng et al. (2001). An optimization model for concurrent selection of tolerances and suppliers. Computers and Industrial Engineering.

C. Fiot et al. (2007). From crispness to fuzziness: Three algorithms for soft sequential pattern mining. IEEE Transactions on Fuzzy Systems.

M. Fredette et al. (2007). Finite-horizon prediction of recurrent events, with application to forecasts of warranty claims. Technometrics.

H. Garcia-Molina et al. (2001). Database systems: The complete book.

H. Gutiérrez-Pulido et al. (2006). A Bayesian approach for the determination of warranty length. Journal of Quality Technology.

Cited by (39)

Two-stage attention network for fault diagnosis and retrieval of fault logs
2024, Expert Systems with Applications
In industrial systems, textual failure records note the failure mechanisms, the parts involved, and the failure symptoms; these records guide fault analysis and repair. However, case retrieval and feature extraction require extensive prior knowledge and diagnostic expertise; this method is time-consuming and labor-intensive. In this article, we present a novel two-stage framework that automatically extracts and generates features from very large textual records. We use an improved weighted latent Dirichlet allocation model and the Word2vec method to extract topic category and semantic features from fault texts; this approach accelerates training convergence. Next, we build a topic-context attention model in which word-embedding semantic features interact with topic features. Finally, we use classification and similarity calculation models to diagnose faults and retrieve similar cases; this approach ensures feature generation. Our method is very granular in terms of case representation, which significantly improves diagnosis. The method robustly identifies similar cases by interrogating vehicle maintenance records.
Early detection of reliability related problems from two-dimensional warranty data considering labour code priority index
2022, Reliability Engineering and System Safety
Early detection of reliability related problems through the use of sensitive statistical methods, allowing early actions to mitigate potential reliability problems, could save a considerable amount of money and product goodwill for large manufacturing companies. For items covered under two-dimensional warranty, both age and usage are important, and apart from the number of failures, the cost of repair needs to be considered to adequately tackle the problem. In the present paper, we have addressed the above issues and developed an index, known as the Labour Code Priority Index (LCPI), based on the number of failures at different age-usage windows as well as the cost of repair of the failed items. We have explored various desirable and interesting properties of the index and borrowed the strength of the CUSUM technique, applied to the sequentially generated index values, to identify early reliability related issues for both simulated and real-life synthetic data.
Discovery of path-attribute dependency in manufacturing environments: A process mining approach
2021, Journal of Manufacturing Systems
Citation Excerpt :
The primary task of SPM is to discover frequent sequential patterns in sequence databases. That is widely applied to market-basket data analysis and weblog mining [20], as well as in product failure [9]. Table 1 shows an example of sequence database.
The more knowledge industrial practitioners have of their production processes, the more capable they are of performing process improvements. Nonetheless, there may exist process characteristics and dependencies that are not easily extractable from business models, such as routing dependent attributes. This paper introduces an algorithm-driven framework to establish whether process path decisions influence the attributes in non-direct sequences, e.g., deploying machine A instead of machine B affects the % of rejected parts on the process, 4 stages down the line. This problem is shown to bear similarities with sequential pattern mining problems. The basis of the solution framework relies on process mining and data mining techniques. The approach proposed is applied on a real industrial log, unveiling deficiencies in the system and providing further improvement recommendations.
Machine learning and data mining in manufacturing
2021, Expert Systems with Applications
Manufacturing organizations need to use different kinds of techniques and tools in order to fulfill their foundation goals. In this aspect, using machine learning (ML) and data mining (DM) techniques and tools could be very helpful for dealing with challenges in manufacturing. Therefore, in this paper, a comprehensive literature review is presented to provide an overview of how machine learning techniques can be applied to realize manufacturing mechanisms with intelligent actions. Furthermore, it points to several significant research questions that are unanswered in the recent literature having the same target. Our survey aims to provide researchers with a solid understanding of the main approaches and algorithms used to improve manufacturing processes over the past two decades. It presents the previous ML studies and recent advances in manufacturing by grouping them under four main subjects: scheduling, monitoring, quality, and failure. It comprehensively discusses existing solutions in manufacturing according to various aspects, including tasks (i.e., clustering, classification, regression), algorithms (i.e., support vector machine, neural network), learning types (i.e., ensemble learning, deep learning), and performance metrics (i.e., accuracy, mean absolute error). Furthermore, the main steps of knowledge discovery in databases (KDD) process to be followed in manufacturing applications are explained in detail. In addition, some statistics about the current state are also given from different perspectives. Besides, it explains the advantages of using machine learning techniques in manufacturing, expresses the ways to overcome certain challenges, and offers some possible further research directions.
Integrating social media and warranty data for fault identification in the cyber ecosystem: A cloud-based collaborative framework
2020, Strategy, Leadership, and AI in the Cyber Ecosystem: The Role of Digital Societies in Information Governance and Decision Making
Fault identification during warranty is quite complex because of sophisticated product design and distributed manufacturing. Various supply chain facilities located at diverse geographical locations are usually utilised to manufacture a particular product. If a fault occurs in one component of a product, it may be linked with other components which are procured and manufactured by other segments of the globally distributed supply chain. Hence, in this multifaceted scenario, the information systems have to be integrated and responsive enough to respond proactively in sharing data from heterogeneous systems across the supply chain in the cyber ecosystem. To achieve this goal, in this chapter, we integrate warranty data from multiple datasets. Initially, social media dataset is used. Consumers increasingly engage in information sharing on weblogs, forums, Facebook, and Twitter, among others. This valuable information is mostly untapped by the automotive manufacturers. To explore the large amount of hidden fault-related data, we used data analytics. Then, we develop a cloud-based collaborative framework to manage the warranty data from other supply chain information systems, namely, design, manufacture and service. The framework provides integration and access of warranty data from multiple datasets of supply chain. The proposed ‘autonomous smart agents’ interaction assists to establish real-time warranty data exchange across the supply chain. The combined data can then be used for detailed expert analysis by fault learning and rectification agent. The execution of the framework is demonstrated using an illustrative execution process. Our contributions are clearly detailed, and some important managerial insights are provided for warranty management in globally distributed supply chain.
Predicting the need for vehicle compressor repairs using maintenance records and logged vehicle data
2015, Engineering Applications of Artificial Intelligence
Citation Excerpt :
In a survey of artificial intelligence solutions in the automotive industry, Gusikhin et al. (2007) discuss fault prognostics, after-sales service and warranty claims. Two representative examples of work in this area are Buddhakulsomsiri and Zakarian (2009) and Rajpathak (2013). Buddhakulsomsiri and Zakarian (2009) present a data mining algorithm that extracts associative and sequential patterns from a large automotive warranty database, capturing relationships among occurrences of warranty claims over time.
Methods and results are presented for applying supervised machine learning techniques to the task of predicting the need for repairs of air compressors in commercial trucks and buses. Prediction models are derived from logged on-board data that are downloaded during workshop visits and have been collected over three years on a large number of vehicles. A number of issues are identified with the data sources, many of which originate from the fact that the data sources were not designed for data mining. Nevertheless, exploiting this available data is very important for the automotive industry as means to quickly introduce predictive maintenance solutions. It is shown on a large data set from heavy duty trucks in normal operation how this can be done and generate a profit.
Random forest is used as the classifier algorithm, together with two methods for feature selection whose results are compared to a human expert. The machine learning based features outperform the human expert features, which supports the idea to use data mining to improve maintenance operations in this domain.
