
2019 | OriginalPaper | Book chapter

Text Classification Research Based on Improved Word2vec and CNN

Authors: Mengyuan Gao, Tinghui Li, Peifang Huang

Published in: Service-Oriented Computing – ICSOC 2018 Workshops

Publisher: Springer International Publishing


Abstract

Traditional classification algorithms often suffer from high feature dimensionality and data sparseness when applied to short texts. This paper proposes a text feature representation, in the form of a matrix model, that combines the neural network language model word2vec with the document topic model Latent Dirichlet Allocation (LDA). The matrix model not only represents the semantic features of words effectively but also conveys context features, which enhances the feature expression ability of the model. The feature matrix is fed into a convolutional neural network (CNN) for convolution and pooling, and text classification experiments are performed. The experimental results show that the proposed matrix model classifies better than the traditional text classification methods based on word2vec and CNN: the accuracy rate, recall rate, and F1 score increased by 8.4%, 8.9%, and 8.6%, respectively.
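
The abstract does not say how the word2vec and LDA features are combined or how the CNN is configured, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each token row is the concatenation of a word vector and a per-word topic weight vector, and it uses random toy values in place of trained word2vec and LDA models. The names `word_vecs`, `topic_weights`, `feature_matrix`, and `conv_max_pool` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): in practice these would come from a trained
# word2vec model and a fitted LDA model, not from random draws.
vocab = ["cheap", "flight", "deal", "goal", "match"]
embed_dim, n_topics = 4, 2
word_vecs = {w: rng.normal(size=embed_dim) for w in vocab}
# Hypothetical per-word topic weights; each vector sums to 1.
topic_weights = {w: rng.dirichlet(np.ones(n_topics)) for w in vocab}


def feature_matrix(tokens, max_len=6):
    """Stack [word vector ; topic weights] per known token, zero-padded to max_len rows."""
    rows = [
        np.concatenate([word_vecs[t], topic_weights[t]])
        for t in tokens
        if t in word_vecs
    ][:max_len]
    mat = np.zeros((max_len, embed_dim + n_topics))
    if rows:
        mat[: len(rows)] = np.stack(rows)
    return mat


def conv_max_pool(mat, filters):
    """Valid convolution over token windows, ReLU, then max-over-time pooling.

    filters has shape (n_filters, window, feature_dim).
    """
    n_filters, window, _ = filters.shape
    n_pos = mat.shape[0] - window + 1
    out = np.empty((n_filters, n_pos))
    for i in range(n_pos):
        # Each filter is multiplied elementwise with the window and summed.
        out[:, i] = np.tensordot(filters, mat[i : i + window], axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0).max(axis=1)


filters = rng.normal(size=(3, 2, embed_dim + n_topics))  # 3 filters, window of 2 tokens
doc = feature_matrix(["cheap", "flight", "deal"])
pooled = conv_max_pool(doc, filters)  # one pooled value per filter
```

In a full classifier, the pooled vector would typically be passed to a fully connected softmax layer, and the filters would be learned rather than drawn at random.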

Metadata
Title
Text Classification Research Based on Improved Word2vec and CNN
Authors
Mengyuan Gao
Tinghui Li
Peifang Huang
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-17642-6_11
