Abstract
The Long Short-Term Memory (LSTM) network, a popular deep-learning model, is particularly effective for data with temporal correlation, such as text, sequences, or time series, thanks to its recurrent structure designed to capture such dependencies. In this article, we propose to generalize LSTM to generic machine-learning tasks in which the training data have no explicit temporal or sequential correlation. Our theme is to exploit feature correlation in the original data and convert each instance into a synthetic sentence format using a two-gram probabilistic language model. More specifically, for each instance represented in the original feature space, the conversion first horizontally aligns the original features into a sequentially correlated feature vector, resembling the letter coherence within a word. A vertical alignment is then carried out to create multiple time points that simulate the sequential order of words in a sentence (i.e., word correlation). The two-dimensional horizontal-and-vertical alignment not only ensures that feature correlations are maximally utilized but also preserves the original feature values in the new representation. As a result, an LSTM model can achieve good classification accuracy even when the underlying data have no temporal or sequential dependency. Experiments on 20 generic datasets show that applying LSTM to generic data improves classification accuracy compared to conventional machine-learning methods. This research opens a new opportunity for LSTM deep learning to be broadly applied to generic machine-learning tasks.
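The two-step conversion described above can be illustrated with a minimal sketch (our own illustration, not the authors' released code): features are reordered so that strongly correlated features sit next to each other (horizontal alignment), several such orderings are stacked as pseudo time steps (vertical alignment), and the resulting matrix is fed to a standard LSTM classifier. The greedy Pearson-correlation ordering, the choice of seed features, and the Keras model below are assumptions made for illustration only.

```python
import numpy as np
import tensorflow as tf

def horizontal_alignment(X, start=0):
    """Greedily order features so each feature is followed by the remaining
    feature it is most correlated with (one plausible reading of the paper's
    'horizontal alignment'; the exact criterion is an assumption here)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))      # |Pearson| between feature columns
    n = corr.shape[0]
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: corr[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

def to_synthetic_sentence(X, n_steps=5):
    """Vertical alignment: stack n_steps orderings (each seeded from a different
    feature) so every instance becomes an (n_steps, n_features) 'sentence'."""
    n_features = X.shape[1]
    orders = [horizontal_alignment(X, start=s % n_features) for s in range(n_steps)]
    return np.stack([X[:, o] for o in orders], axis=1)  # (n_samples, n_steps, n_features)

def build_lstm(n_steps, n_features, n_classes):
    """A minimal LSTM classifier over the converted data (illustrative only)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_steps, n_features)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16)).astype("float32")     # generic tabular data
    y = (X[:, 0] + X[:, 3] > 0).astype("int64")          # toy labels
    X_seq = to_synthetic_sentence(X, n_steps=5)
    model = build_lstm(n_steps=5, n_features=16, n_classes=2)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_seq, y, epochs=3, batch_size=32, verbose=0)
```

Note that the original feature values are preserved in every pseudo time step; only their order changes, which is what allows a recurrent model to exploit feature-to-feature correlation without altering the data themselves.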