
24.04.2018 | Focus

Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent

Authors: Jin Liu, Li Lin, Haoliang Ren, Minghao Gu, Jin Wang, Geumran Youn, Jeong-Uk Kim

Published in: Soft Computing | Issue 20/2018

Abstract

A traditional statistical language model is a probability distribution over sequences of words. It suffers from the curse of dimensionality, because the number of possible word sequences in the training text grows exponentially with sequence length. To address this issue, neural network language models have been proposed, which represent words in a distributed way. However, due to the computational cost of updating the gradients of a large number of word vectors, a neural network model needs considerable training time to converge. To alleviate this problem, in this paper we propose a gradient descent algorithm based on stochastic conjugate gradient to accelerate the convergence of the neural network's parameters. To further improve the performance of the neural language model, we also propose a negative sampling algorithm based on POS (part-of-speech) tagging, which optimizes the negative sampling process and improves the quality of the final language model. A novel evaluation model is used alongside perplexity to demonstrate the performance of the improved language model. Experimental results prove the effectiveness of these novel methods.
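The stochastic conjugate gradient idea named in the abstract can be sketched in a few lines. The full text is not reproduced on this page, so the authors' exact update rule is unknown; the snippet below is a minimal sketch assuming a standard Polak-Ribière+ conjugate direction computed from minibatch gradients. The names `scg_direction`, `train`, `theta`, and `minibatch_grad` are illustrative, not taken from the paper.

```python
import numpy as np

def scg_direction(grad, prev_grad, prev_dir):
    """Compute a conjugate search direction from minibatch gradients.

    Uses the Polak-Ribiere+ formula: beta measures how much of the
    previous direction to retain; clipping beta at zero restarts with
    plain steepest descent when successive gradients decorrelate.
    (An assumption; the paper's exact variant may differ.)
    """
    beta = grad @ (grad - prev_grad) / (prev_grad @ prev_grad + 1e-12)
    beta = max(0.0, beta)  # PR+ restart rule
    return -grad + beta * prev_dir

def train(theta, minibatch_grad, lr=0.05, steps=100):
    """Illustrative training loop: theta is the parameter vector (numpy
    array) and minibatch_grad(theta) returns a stochastic gradient."""
    grad = minibatch_grad(theta)
    direction = -grad  # first step is plain steepest descent
    for _ in range(steps):
        theta = theta + lr * direction
        new_grad = minibatch_grad(theta)
        direction = scg_direction(new_grad, grad, direction)
        grad = new_grad
    return theta
```

Compared with plain SGD, reusing the previous search direction can reduce the zig-zagging of noisy minibatch updates, which is consistent with the convergence speed-up the abstract claims.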
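Similarly, POS-based negative sampling can be read as a reweighting of the standard word2vec noise distribution (unigram counts raised to the 3/4 power). One plausible reading of the abstract is that candidate negatives sharing the target word's POS tag are up-weighted, so the model must discriminate between grammatically comparable words. The sketch below follows that reading; `pos_boost` and the helper names are assumptions for illustration, not the paper's exact formula.

```python
import random

def build_sampling_table(vocab_counts, pos_tags, target_pos,
                         pos_boost=2.0, power=0.75):
    """Build a negative-sampling distribution over the vocabulary.

    Starts from the standard word2vec recipe (counts ** 0.75) and
    up-weights words whose POS tag matches the target's POS tag.
    The boost factor is a hypothetical choice, not the paper's value.
    """
    weights = {}
    for word, count in vocab_counts.items():
        w = count ** power
        if pos_tags.get(word) == target_pos:
            w *= pos_boost  # POS-aware reweighting (our assumption)
        weights[word] = w
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

def sample_negatives(table, positive_word, k=5):
    """Draw k negative words from the table, excluding the positive."""
    words = list(table.keys())
    probs = [table[w] for w in words]
    negatives = []
    while len(negatives) < k:
        w = random.choices(words, weights=probs, k=1)[0]
        if w != positive_word:
            negatives.append(w)
    return negatives

# Toy usage with a hypothetical four-word vocabulary:
counts = {"cat": 10, "dog": 8, "run": 5, "walk": 4}
tags = {"cat": "NOUN", "dog": "NOUN", "run": "VERB", "walk": "VERB"}
table = build_sampling_table(counts, tags, target_pos="NOUN")
print(sample_negatives(table, positive_word="cat", k=2))
```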

Metadata
Title
Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent
Authors
Jin Liu
Li Lin
Haoliang Ren
Minghao Gu
Jin Wang
Geumran Youn
Jeong-Uk Kim
Publication date
24.04.2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 20/2018
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3181-2
