2018 | OriginalPaper | Book chapter

A Semantic Representation Enhancement Method for Chinese News Headline Classification

Written by: Zhongbo Yin, Jintao Tang, Chengsen Ru, Wei Luo, Zhunchen Luo, Xiaolei Ma

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing


Abstract

Research interest in short texts such as news headlines has grown recently. Because short text is inherently sparse, current text classification methods perform poorly on news headlines. To overcome this problem, this paper proposes a method that enhances the semantic representation of headlines. First, we expand the word features by adding keywords extracted from the most similar news. Second, we pre-train word embeddings on a news-domain corpus to enhance the word representation. Finally, we adopt the fastText classifier, which uses a linear method to classify text quickly and accurately, for news headline classification. On the Chinese news headline categorization task at NLPCC 2017, the proposed method achieved an F-measure of 83.1%, ranking first among 33 teams.
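
The keyword-expansion step can be illustrated with a minimal sketch. The abstract does not say how the most similar news is found or how keywords are extracted, so the sketch assumes bag-of-words cosine similarity and TF-IDF ranking as stand-ins. It also assumes the text is already segmented into tokens. The function names `cosine`, `top_keywords`, and `expand_headline`, the parameter `k`, and the toy data are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_keywords(doc, corpus, k):
    """Rank the distinct tokens of `doc` by TF-IDF against `corpus`; return the top k."""
    n_docs = len(corpus)
    tf = Counter(doc)

    def score(term):
        df = sum(1 for d in corpus if term in d)  # document frequency
        return tf[term] * math.log((1 + n_docs) / (1 + df))  # smoothed IDF

    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(tf, key=lambda t: (-score(t), t))[:k]


def expand_headline(headline, news_corpus, k=2):
    """Append up to k keywords from the most similar news text (assumed method)."""
    h_vec = Counter(headline)
    best = max(news_corpus, key=lambda d: cosine(h_vec, Counter(d)))
    if cosine(h_vec, Counter(best)) == 0.0:
        return list(headline)  # nothing similar: leave the headline unchanged
    ranked = top_keywords(best, news_corpus, len(best))
    extra = [t for t in ranked if t not in headline][:k]
    return list(headline) + extra


# Toy, pre-segmented data (English tokens for readability; purely illustrative).
news = [
    ["stock", "market", "falls", "investors", "sell", "shares"],
    ["team", "wins", "league", "final", "goal", "striker", "goal"],
    ["new", "phone", "launch", "camera", "battery"],
]
headline = ["team", "wins", "final"]
expanded = expand_headline(headline, news, k=2)
print(expanded)  # ['team', 'wins', 'final', 'goal', 'league']
```

The expanded token list would then be fed to the classifier in place of the original headline, so a sparse three-word headline gains extra words that point to its topic.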

Metadata
Title
A Semantic Representation Enhancement Method for Chinese News Headline Classification
Written by
Zhongbo Yin
Jintao Tang
Chengsen Ru
Wei Luo
Zhunchen Luo
Xiaolei Ma
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-73618-1_27