ABSTRACT
Many real-world applications, e.g., in healthcare, involve multivariate time series prediction problems. In such settings, model transparency and explainability are paramount in addition to predictive accuracy. We consider the problem of building explainable classifiers from multivariate time series data. A key requirement for understanding such predictive models is elucidating and quantifying the contribution of time-varying input variables to the classification. Hence, we introduce a novel, modular, convolution-based feature extraction and attention mechanism that simultaneously identifies the variables and the time intervals that determine the classifier output. We present results of extensive experiments with several benchmark data sets showing that the proposed method outperforms state-of-the-art baseline methods on the multivariate time series classification task. The results of our case studies demonstrate that the variables and time intervals identified by the proposed method make sense relative to available domain knowledge.
Index Terms
- Explainable Multivariate Time Series Classification: A Deep Neural Network Which Learns to Attend to Important Variables As Well As Time Intervals