Explainable Multivariate Time Series Classification: A Deep Neural Network Which Learns to Attend to Important Variables As Well As Time Intervals

Authors:
Tsung-Yu Hsieh, Pennsylvania State University, University Park, PA, USA
Suhang Wang, Pennsylvania State University, University Park, PA, USA
Yiwei Sun, Pennsylvania State University, University Park, PA, USA
Vasant Honavar, Pennsylvania State University, University Park, PA, USA
WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, March 2021, Pages 607–615. https://doi.org/10.1145/3437963.3441815

Published: 08 March 2021

ABSTRACT

Many real-world applications, e.g., healthcare, present multivariate time series prediction problems. In such settings, in addition to the predictive accuracy of the models, model transparency and explainability are paramount. We consider the problem of building explainable classifiers from multivariate time series data. A key criterion for understanding such predictive models is elucidating and quantifying the contribution of time-varying input variables to the classification. Hence, we introduce a novel, modular, convolution-based feature extraction and attention mechanism that simultaneously identifies the variables and the time intervals that determine the classifier output. We present results of extensive experiments with several benchmark data sets showing that the proposed method outperforms state-of-the-art baseline methods on the multivariate time series classification task. The results of our case studies demonstrate that the variables and time intervals identified by the proposed method make sense relative to available domain knowledge.
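
The abstract does not spell out the architecture, so the following is only a rough sketch of the general idea of attending to both time intervals and variables. It is not the authors' model. It assumes a shared bank of 1-D convolution kernels applied to each variable, one softmax over time intervals within each variable, and a second softmax across variables. All names (`interval_features`, `attend`, `w_time`, `w_var`) and all shapes are hypothetical, and the weights are random rather than learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def interval_features(x, kernels):
    """Convolve each variable with a shared kernel bank (assumed design).

    x: (V, T) series with V variables and T time steps.
    kernels: (K, W) bank of K kernels of width W.
    Returns (V, L, K) with L = T - W + 1 sliding intervals.
    """
    width = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width, axis=1)  # (V, L, W)
    return np.tanh(windows @ kernels.T)

def attend(x, kernels, w_time, w_var):
    """Two-level attention: over intervals per variable, then over variables."""
    h = interval_features(x, kernels)            # (V, L, K)
    beta = softmax(h @ w_time, axis=1)           # (V, L) interval weights per variable
    g = (beta[..., None] * h).sum(axis=1)        # (V, K) per-variable summary
    alpha = softmax(g @ w_var, axis=0)           # (V,) variable weights
    context = (alpha[:, None] * g).sum(axis=0)   # (K,) vector a classifier could consume
    return context, alpha, beta

# Usage with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 20))                     # 3 variables, 20 time steps
kernels = rng.normal(size=(4, 5))                # 4 kernels of width 5
context, alpha, beta = attend(x, kernels, rng.normal(size=4), rng.normal(size=4))
print("variable weights:", alpha)
print("most attended interval per variable:", beta.argmax(axis=1))
```

In a sketch like this, `alpha` ranks variables and each row of `beta` ranks the time intervals of one variable, which is the kind of output an explanation would be read from. In a trained model the kernels and attention vectors would be learned jointly with the classifier.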

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection
    2. Machine learning approaches
      1. Neural networks

Published in
WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
March 2021
1192 pages
ISBN: 9781450382977
DOI: 10.1145/3437963
General Chairs: Liane Lewin-Eytan (Amazon, Israel), David Carmel (Amazon, Israel), Elad Yom-Tov (Microsoft, Israel)
Program Chairs: Eugene Agichtein (Emory University and Amazon, USA), Evgeniy Gabrilovich (Google Health, USA)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attentive convolution
explainability
multivariate time series
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 498 of 2,863 submissions, 17%