
Generalizing Long Short-Term Memory Network for Deep Learning from Generic Data

Published: 09 February 2020

Abstract

The Long Short-Term Memory (LSTM) network, a popular deep-learning model, is particularly effective for data with temporal correlation, such as text, sequences, or time series, thanks to its recurrent structure designed to capture temporal dependencies. In this article, we propose to generalize LSTM to generic machine-learning tasks where the training data have no explicit temporal or sequential correlation. Our approach explores feature correlation in the original data and converts each instance into a synthetic sentence format by using a two-gram probabilistic language model. More specifically, for each instance represented in the original feature space, the conversion first horizontally aligns the original features into a sequentially correlated feature vector, resembling the letter coherence within a word. In addition, a vertical alignment creates multiple time points to simulate the sequential order of words in a sentence (i.e., word correlation). This two-dimensional horizontal-and-vertical alignment not only ensures that feature correlations are maximally utilized but also preserves the original feature values in the new representation. As a result, an LSTM model can achieve good classification accuracy even when the underlying data have no temporal or sequential dependency. Experiments on 20 generic datasets show that applying LSTM to generic data can improve classification accuracy compared to conventional machine-learning methods. This research opens a new opportunity for LSTM deep learning to be broadly applied to generic machine-learning tasks.
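The conversion described above can be sketched in code. The following is a minimal illustration, not the paper's actual method: it assumes a greedy correlation-based feature ordering for the horizontal alignment and simple circular shifts of the ordered vector for the vertical alignment, whereas the paper derives both alignments from a two-gram probabilistic language model. The function names `correlation_order` and `to_sequence` are hypothetical.

```python
import numpy as np

def correlation_order(X):
    """Horizontal alignment (simplified): greedily order features so that
    adjacent features in the new vector are highly correlated."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, -1.0)          # a feature cannot follow itself
    order = [0]
    remaining = set(range(1, corr.shape[0]))
    while remaining:
        # pick the remaining feature most correlated with the last one placed
        nxt = max(remaining, key=lambda j: corr[order[-1], j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

def to_sequence(X, n_steps=3):
    """Vertical alignment (simplified): stack circularly shifted copies of
    the ordered feature vector to simulate word order in a sentence.
    Feature values are preserved; only their positions change."""
    X_ord = X[:, correlation_order(X)]
    steps = [np.roll(X_ord, -t, axis=1) for t in range(n_steps)]
    # (n_samples, n_steps, n_features): a sequence input an LSTM can consume
    return np.stack(steps, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))             # 100 generic instances, 8 features
seq = to_sequence(X, n_steps=3)
print(seq.shape)                          # (100, 3, 8)
```

The resulting `(samples, time_steps, features)` tensor has the shape a standard LSTM layer expects, so an off-the-shelf recurrent classifier can then be trained on data that originally had no sequential structure.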



Published in: ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 2 (April 2020), 322 pages. ISSN: 1556-4681, EISSN: 1556-472X. DOI: 10.1145/3382774.

      Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 December 2018
• Revised: 1 August 2019
• Accepted: 1 October 2019
• Published: 9 February 2020

Published in TKDD Volume 14, Issue 2

Qualifiers: research-article, refereed
