
24.04.2018 | Focus

Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent

Authors: Jin Liu, Li Lin, Haoliang Ren, Minghao Gu, Jin Wang, Geumran Youn, Jeong-Uk Kim

Published in: Soft Computing | Issue 20/2018

Abstract

A traditional statistical language model is a probability distribution over sequences of words. It suffers from the curse of dimensionality, because the number of possible word sequences in the training text grows exponentially with sequence length. To address this issue, neural network language models have been proposed, which represent words in a distributed way. However, due to the computational cost of updating the gradients of a large number of word vectors, a neural network model needs considerable training time to converge. To alleviate this problem, in this paper we propose a gradient descent algorithm based on stochastic conjugate gradient to accelerate the convergence of the neural network's parameters. To further improve the performance of the neural language model, we also propose a negative sampling algorithm based on POS (part-of-speech) tagging, which optimizes the negative sampling process and improves the quality of the final language model. A novel evaluation model is used alongside perplexity to demonstrate the performance of the improved language model. Experimental results prove the effectiveness of these novel methods.
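The stochastic conjugate gradient idea named in the abstract can be sketched in a few lines. The full text is not reproduced on this page, so the authors' exact update rule is unknown; the snippet below is a minimal sketch assuming a standard Polak-Ribière+ conjugate direction computed from minibatch gradients. The names `scg_direction`, `train`, `theta`, and `minibatch_grad` are illustrative, not taken from the paper.

```python
import numpy as np

def scg_direction(grad, prev_grad, prev_dir):
    """Compute a conjugate search direction from minibatch gradients.

    Uses the Polak-Ribiere+ formula: beta measures how much of the
    previous direction to retain; clipping beta at zero restarts with
    plain steepest descent when successive gradients decorrelate.
    (An assumption; the paper's exact variant may differ.)
    """
    beta = grad @ (grad - prev_grad) / (prev_grad @ prev_grad + 1e-12)
    beta = max(0.0, beta)  # PR+ restart rule
    return -grad + beta * prev_dir

def train(theta, minibatch_grad, lr=0.05, steps=100):
    """Illustrative training loop: theta is the parameter vector (numpy
    array) and minibatch_grad(theta) returns a stochastic gradient."""
    grad = minibatch_grad(theta)
    direction = -grad  # first step is plain steepest descent
    for _ in range(steps):
        theta = theta + lr * direction
        new_grad = minibatch_grad(theta)
        direction = scg_direction(new_grad, grad, direction)
        grad = new_grad
    return theta
```

Compared with plain SGD, reusing the previous search direction can reduce the zig-zagging of noisy minibatch updates, which is consistent with the convergence speed-up the abstract claims.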
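Similarly, POS-based negative sampling can be read as a reweighting of the standard word2vec noise distribution (unigram counts raised to the 3/4 power). One plausible reading of the abstract is that candidate negatives sharing the target word's POS tag are up-weighted, so the model must discriminate between grammatically comparable words. The sketch below follows that reading; `pos_boost` and the helper names are assumptions for illustration, not the paper's exact formula.

```python
import random

def build_sampling_table(vocab_counts, pos_tags, target_pos,
                         pos_boost=2.0, power=0.75):
    """Build a negative-sampling distribution over the vocabulary.

    Starts from the standard word2vec recipe (counts ** 0.75) and
    up-weights words whose POS tag matches the target's POS tag.
    The boost factor is a hypothetical choice, not the paper's value.
    """
    weights = {}
    for word, count in vocab_counts.items():
        w = count ** power
        if pos_tags.get(word) == target_pos:
            w *= pos_boost  # POS-aware reweighting (our assumption)
        weights[word] = w
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

def sample_negatives(table, positive_word, k=5):
    """Draw k negative words from the table, excluding the positive."""
    words = list(table.keys())
    probs = [table[w] for w in words]
    negatives = []
    while len(negatives) < k:
        w = random.choices(words, weights=probs, k=1)[0]
        if w != positive_word:
            negatives.append(w)
    return negatives

# Toy usage with a hypothetical four-word vocabulary:
counts = {"cat": 10, "dog": 8, "run": 5, "walk": 4}
tags = {"cat": "NOUN", "dog": "NOUN", "run": "VERB", "walk": "VERB"}
table = build_sampling_table(counts, tags, target_pos="NOUN")
print(sample_negatives(table, positive_word="cat", k=2))
```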

Metadata
Title
Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent
Authors
Jin Liu
Li Lin
Haoliang Ren
Minghao Gu
Jin Wang
Geumran Youn
Jeong-Uk Kim
Publication date
24.04.2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 20/2018
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3181-2
