Top

Published in:

2017 | OriginalPaper | Chapter

Denoising Autoencoder as an Effective Dimensionality Reduction and Clustering of Text Data

Authors : Milad Leyli-Abadi, Lazhar Labiod, Mohamed Nadif

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Deep learning methods are widely used in vision and face recognition, however there is a real lack of application of such methods in the field of text data. In this context, the data is often represented by a sparse high dimensional document-term matrix. Dealing with such data matrices, we present, in this paper, a new denoising auto-encoder for dimensionality reduction, where each document is not only affected by its own information, but also affected by the information from its neighbors according to the cosine similarity measure. It turns out that the proposed auto-encoder can discover the low dimensional embeddings, and as a result reveal the underlying effective manifold structure. The visual representation of these embeddings suggests the suitability of performing the clustering on the set of documents relying on the Expectation-Maximization algorithm for Gaussian mixture models. On real-world datasets, the relevance of the presented auto-encoder in the visualisation and document clustering field is shown by a comparison with five widely used unsupervised dimensionality reduction methods including the classic auto-encoder.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter A SAT-Based Framework for Overlapping Community Detection in Networks

next chapter Gradable Adjective Embedding for Commonsense Knowledge

http://dataexpertise.org/research.

Gittins, R.: Canonical Analysis - A Review with Applications in Ecology. Springer, Heidelberg (1985)CrossRefMATH

van der Maaten, L., Hinton, G.: Visualizing data using t-sne. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)MATH

van der Maaten, L.: Learning a parametric embedding by preserving local structure. RBM, 500:500 (2009)

Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. NIPS 14, 585–591 (2001)

Bengio, Y.: Learning deep architectures for ai. Found. Trends Mach. Learn. 2(1), 1–127 (2009)CrossRefMATH

Vincent, P.: A connection between score matching and denoising autoencoders. Neural Comput. 23(7), 1661–1674 (2011)MathSciNetCrossRefMATH

Dempster, A.P., Nan Laird, M., Rubin, D.B.: Maximum likelihood from incomplete data via the em algorithm. J. Roy. Stat. Soc. Ser. B (methodological) 39, 1–38 (1977)

Schwarz, G., et al.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)MathSciNetCrossRefMATH

LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 9–48. Springer, Heidelberg (2012). doi:10.1007/978-3-642-35289-8_3 CrossRef

10.

Jégou, H., Chum, O.: Negative evidences and co-occurences in image retrieval: the benefit of PCA and whitening. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, pp. 774–787. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33709-3_55 CrossRef

11.

Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)MathSciNetCrossRefMATH

12.

Wang, W., Huang, Y., Wang, Y., Wang, L.: Generalized autoencoder: a neural network framework for dimensionality reduction. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 490–497 (2014)

13.

Ng, A.: Sparse autoencoder. CS294A Lecture Notes, vol. 72, pp. 1–19 (2011)

14.

Strehl, A., Ghosh, J.: Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2003)MathSciNetMATH

15.

Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)CrossRefMATH

16.

Banfield, J.D., Raftery, A.E.: Model-based gaussian and non-gaussian clustering. Biometrics 49, 803–821 (1993)MathSciNetCrossRefMATH

17.

Fraley, C., Raftery, A.E.: Mclust version 3: an R package for normal mixture modeling and model-based clustering. Technical report (2006)

18.

Priam, R., Nadif, M.: Data visualization via latent variables and mixture models: a brief survey. Pattern Anal. Appl. 19(3), 807–819 (2016)MathSciNetCrossRef

19.

Allab, K., Labiod, L., Nadif, M.: A semi-NMF-PCA unified framework for data clustering. IEEE Trans. Knowl. Data Eng. 29(1), 2–16 (2017)CrossRef

Title: Denoising Autoencoder as an Effective Dimensionality Reduction and Clustering of Text Data
Authors: Milad Leyli-Abadi
Lazhar Labiod
Mohamed Nadif
Publisher: Springer International Publishing
Book: Advances in Knowledge Discovery and Data Mining
Print ISBN: 978-3-319-57528-5

Electronic ISBN: 978-3-319-57529-2

Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-57529-2_62

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner