Published in: Soft Computing 15/2019

4 June 2018 | Methodologies and Application

An ensemble of shapelet-based classifiers on inter-class and intra-class imbalanced multivariate time series at the early stage

Authors: Guoliang He, Wen Zhao, Xuewen Xia, Rong Peng, Xiaoying Wu

Abstract

Classifying a time series early, before it has been fully observed, reduces accuracy to some degree. If the data are also imbalanced, accurately identifying minority class examples becomes challenging as well. So far, these two problems have been studied intensively but separately on univariate time series, and their combination has yet to be well studied. Multivariate time series (MTS) are more complex than univariate ones: they contain multiple variables, and the interconnections between those variables are hidden. Handling both problems together on MTS is therefore even more challenging. In this paper, we propose an adaptive classification ensemble method, called early prediction on imbalanced MTS, that addresses early classification on inter-class and intra-class imbalanced MTS data simultaneously. First, an adaptive ensemble framework is designed to learn an early classification model on imbalanced MTS data. A multiple under-sampling approach and a dynamic subspace generation method give the base classifiers diversity while ensuring that all majority class examples are fully utilized. Second, to deal with the implicit intra-class imbalance in the training data, a cluster-based shapelet selection method is introduced to obtain an optimal set of stable and robust shapelets. Finally, an associate-pattern mining approach is designed to learn base classifiers efficiently, which can enhance the interpretability of classification. Experimental results show that the proposed method achieves effective early prediction on inter-class and intra-class imbalanced MTS data.
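The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one common way to under-sample repeatedly while still using every majority class example: the majority class is shuffled and split into disjoint chunks of about the minority class size, and each chunk is paired with the full minority class to train one base classifier. The function name `balanced_subsets`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def balanced_subsets(
    majority: Sequence, minority: Sequence, seed: int = 0
) -> List[Tuple[list, list]]:
    """Pair disjoint majority-class chunks with the full minority class.

    Hypothetical helper (not the authors' algorithm): every majority
    example lands in exactly one chunk, so none is discarded.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)      # seeded for reproducibility
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = len(minority)           # chunk size matches the minority class
    chunks = [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
    return [(chunk, list(minority)) for chunk in chunks]


# Toy data: 10 majority examples and 3 minority examples.
subsets = balanced_subsets(majority=range(10), minority=["a", "b", "c"])
```

Each returned pair could serve as the training set of one base classifier in an ensemble. The last chunk may be smaller than the others when the majority class size is not a multiple of the minority class size.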


Metadata
Title
An ensemble of shapelet-based classifiers on inter-class and intra-class imbalanced multivariate time series at the early stage
Authors
Guoliang He
Wen Zhao
Xuewen Xia
Rong Peng
Xiaoying Wu
Publication date
4 June 2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 15/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3261-3
