Published in: Data Mining and Knowledge Discovery 2/2023

23-01-2023

Scalable classifier-agnostic channel selection for multivariate time series classification

Authors: Bhaskar Dhariyal, Thach Le Nguyen, Georgiana Ifrim



Abstract

Accuracy is a key focus of current work in time series classification. However, speed and data reduction are equally important in many applications, especially when the data scale and storage requirements rapidly increase. Current multivariate time series classification (MTSC) algorithms need hundreds of compute hours to complete training and prediction. This is due to the nature of multivariate time series data, which grows with the number of time series, their length and the number of channels. In many applications, not all the channels are useful for the classification task; hence, we require methods that can efficiently select useful channels and thus save computational resources. We propose and evaluate two methods for channel selection. Our techniques work by representing each class by a prototype time series and performing channel selection based on the prototype distance between classes. The main hypothesis is that useful channels enable better separation between classes; hence, channels with a larger distance between class prototypes are more useful. On the UEA MTSC benchmark, we show that these techniques achieve significant data reduction and classifier speedup for similar levels of classification accuracy. Channel selection is applied as a pre-processing step before training state-of-the-art MTSC algorithms and saves about 70% of computation time and data storage while preserving accuracy. Furthermore, our methods enable efficient classifiers, such as ROCKET, to achieve better accuracy than using no selection or greedy forward channel selection. To further study the impact of our techniques, we present experiments on classifying synthetic multivariate time series datasets with more than 100 channels, as well as a real-world case study on a dataset with 50 channels. In both cases, our channel selection methods result in significant data reduction with preserved or improved accuracy.
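
The prototype-distance idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how prototypes are built, which distance is used, or how the final channel subset is chosen. The sketch assumes a per-class mean series as the prototype, Euclidean distance between prototypes, a score summed over all class pairs, and a fixed top-k cut-off. The function names `rank_channels_by_prototype_distance` and `select_top_k_channels` are hypothetical.

```python
from itertools import combinations

import numpy as np


def rank_channels_by_prototype_distance(X, y):
    """Score each channel by how far apart the class prototypes are.

    X: array of shape (n_series, n_channels, length)
    y: array of shape (n_series,) with class labels
    Returns (channel indices sorted by descending score, raw scores).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # Assumption: the prototype of a class is the mean series per channel.
    # Resulting shape: (n_classes, n_channels, length)
    prototypes = np.stack([X[y == c].mean(axis=0) for c in classes])

    scores = np.zeros(X.shape[1])
    for i, j in combinations(range(len(classes)), 2):
        # Assumption: Euclidean distance along the time axis, computed
        # separately for every channel and summed over class pairs.
        scores += np.linalg.norm(prototypes[i] - prototypes[j], axis=1)

    order = np.argsort(scores)[::-1]
    return order, scores


def select_top_k_channels(X, y, k):
    """Assumption: keep the k highest-scoring channels (hypothetical rule)."""
    order, _ = rank_channels_by_prototype_distance(X, y)
    return np.sort(order[:k])


# Toy data: 40 series, 5 channels, length 30; only channels 2 and 4
# carry a class-dependent offset, the rest are pure noise.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 20)
X_demo = rng.normal(size=(40, 5, 30))
X_demo[y_demo == 1, 2, :] += 3.0
X_demo[y_demo == 1, 4, :] += 2.0

kept = select_top_k_channels(X_demo, y_demo, k=2)
X_reduced = X_demo[:, kept, :]  # pass this to any downstream classifier
print("kept channels:", kept.tolist())
```

Because the selection depends only on the training data and labels, the reduced array can be handed to any downstream classifier, which is the sense in which such a pre-processing step is classifier-agnostic.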

Metadata
Title
Scalable classifier-agnostic channel selection for multivariate time series classification
Authors
Bhaskar Dhariyal
Thach Le Nguyen
Georgiana Ifrim
Publication date
23-01-2023
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 2/2023
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-022-00909-1
