Different neural network architectures have been investigated and applied to language model estimation by many researchers. Feed-forward neural networks [1] have been adapted to language model estimation [1]; feed-forward neural network language models simultaneously learn the probability function for word sequences and build distributed representations for individual words, but they have the drawback that only a fixed number of words can be considered as the context window for the current (target) word. To reduce the training time of conventional feed-forward neural network language models, researchers proposed continuous space language modeling (CSLM), a modular open-source toolkit of feed-forward neural network language models [13]; this toolkit introduces support for GPU cards, which enables building models from corpora containing more than five billion words in less than 24 hours, with about a 20% perplexity reduction [13]. Recurrent neural networks have also been applied to estimate language models. With this architecture, there is no need to specify a context window size, because feedback from the hidden layer to the input layer acts as a kind of network memory for the word context. Experimental results have shown that recurrent neural network language models outperform n-gram language models [
5, 6, 9, 14, 15]. An RNNLM toolkit was designed to estimate class-based language models using recurrent neural networks [
5, 6]. It also provides functions such as intrinsic model evaluation using perplexity, N-best rescoring, and model-based text generation. Training speed is the main drawback of RNNLM, especially with large vocabulary sizes and large hidden layers. The RWTHLM [
16] is another recurrent neural network-based toolkit, with a long short-term memory (LSTM) implementation; the RWTHLM toolkit uses the BLAS library to reduce training time and train networks efficiently. The CUED-RNNLM [
11] provides an implementation of the recurrent neural network-based model, and it has GPU support for faster training. Neither the basic feed-forward network nor the recurrent neural network-based language models include any type of word-level morphological feature, but some researchers have tried to add this type of word feature explicitly through input-layer factorization. Factored neural language models (FNLM) [
12] add word features explicitly to the input layer of the feed-forward neural network language model, while the factored recurrent neural network language model (fRNNLM) [10] adds word features to the recurrent neural network input layer; both achieve better results than the basic models. Their complexity is higher than that of the original models, since they add word features explicitly to the input layer. While adding these features improves network performance, it increases the cost of model estimation and hurts application performance, especially for large-vocabulary applications or languages with rich morphological features. Researchers have also tried to build RNNLM personalization models [
17] using datasets collected from social media networks. Model-based RNNLM personalization aims to capture the patterns in posts by a user and his or her related friends, while another, feature-based approach keeps the RNNLM parameters static across users. Recently, neural network-based language modeling was added as an extension to the Kaldi automatic speech recognition software (Kaldi-RNNLM) [
18]; this architecture combines subword features with one-hot encoding of high-frequency words to handle large vocabularies that contain infrequent words. The Kaldi-RNNLM architecture also modifies the cross-entropy objective function to train unnormalized probabilities. In addition to the feed-forward network and recurrent neural network-based language model architectures, convolutional neural networks (CNN) [
19] have been applied to estimate language models, with character-level inputs to the network and word-level output predictions.
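The key architectural contrast surveyed above, a fixed context window in feed-forward models versus an unbounded word history carried in the recurrent hidden state, can be sketched in a few lines. The following is a minimal toy illustration with untrained random weights, not the implementation of any of the toolkits discussed; all sizes and function names are hypothetical.

```python
import math
import random

random.seed(0)
V, D, H = 10, 8, 16            # vocabulary, embedding, and hidden sizes

def mat(rows, cols):           # random (untrained) weight matrix
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def vecmat(v, M):              # vector-matrix product
    return [sum(vi * row[j] for vi, row in zip(v, M)) for j in range(len(M[0]))]

def softmax(z):                # turn scores into a probability distribution
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

E = mat(V, D)                  # word embeddings shared by both sketches

# Feed-forward NNLM: the context is a FIXED window of the last n-1 words;
# anything earlier in the sentence cannot influence the prediction.
n = 3
W_ff = mat((n - 1) * D, V)

def ffnn_step(context_ids):
    x = [v for i in context_ids[-(n - 1):] for v in E[i]]  # concatenated window
    return softmax(vecmat(x, W_ff))

# Recurrent NNLM: no window size to specify; the hidden state is fed back
# at each step and acts as a memory of the entire preceding word history.
W_xh, W_hh, W_hy = mat(D, H), mat(H, H), mat(H, V)

def rnn_run(word_ids):
    h = [0.0] * H
    for i in word_ids:         # h summarizes ALL words seen so far
        h = [math.tanh(a + b) for a, b in zip(vecmat(E[i], W_xh),
                                              vecmat(h, W_hh))]
    return softmax(vecmat(h, W_hy))

p_ff = ffnn_step([1, 2, 3, 4])   # only ids 3 and 4 influence this
p_rnn = rnn_run([1, 2, 3, 4])    # all four ids influence this
```

Changing a word outside the window (e.g., replacing the first two ids) leaves the feed-forward prediction untouched but changes the recurrent one, which is exactly the limitation the RNN-based models above remove.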