2018 | OriginalPaper | Book chapter

Addressing Domain Adaptation for Chinese Word Segmentation with Instances-Based Transfer Learning

Written by: Yanna Zhang, Jinan Xu, Guoyi Miao, Yufeng Chen, Yujie Zhang

Published in: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data

Publisher: Springer International Publishing


Abstract

Recent studies have shown that neural networks are effective for Chinese Word Segmentation (CWS). However, these models, constrained by the domain and size of the training corpus, do not work well in domain adaptation. In this paper, we propose a novel instance-transferring method, which uses valuable annotated target-domain instances to improve CWS on different domains. Specifically, we introduce semantic similarity computation based on character-based n-gram embeddings to select instances. Furthermore, training sentences similar to the instances are used to help annotate them. Experimental results show that our method can effectively boost cross-domain segmentation performance. We achieve state-of-the-art results on Internet literature datasets, and results competitive with the best reported on micro-blog datasets.
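The abstract does not spell out how the similarity is computed, so the following is only a minimal sketch of one common way to select instances by similarity over character n-gram embeddings. It assumes that a sentence is represented by the average of its character bigram vectors and that similarity is cosine similarity; the function names, the `threshold` parameter, and the direction of comparison (candidates scored against a set of reference sentences) are illustrative assumptions, not the authors' implementation.

```python
import math


def char_ngrams(sentence, n=2):
    """Return the overlapping character n-grams of a sentence."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]


def sentence_vector(sentence, embeddings, n=2):
    """Average the embeddings of the n-grams found in the lookup table.

    `embeddings` is a hypothetical dict mapping an n-gram to a list of floats;
    a real system would load vectors trained on a large unlabeled corpus.
    Returns None when no n-gram of the sentence is in the table.
    """
    vectors = [embeddings[g] for g in char_ngrams(sentence, n) if g in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_instances(candidates, reference, embeddings, threshold=0.5, n=2):
    """Keep candidates whose best similarity to any reference sentence
    reaches the (assumed) threshold, sorted by descending score."""
    ref_vecs = [sentence_vector(s, embeddings, n) for s in reference]
    ref_vecs = [v for v in ref_vecs if v is not None]
    selected = []
    for sent in candidates:
        vec = sentence_vector(sent, embeddings, n)
        if vec is None or not ref_vecs:
            continue  # nothing to compare, so the candidate is skipped
        score = max(cosine(vec, r) for r in ref_vecs)
        if score >= threshold:
            selected.append((sent, score))
    return sorted(selected, key=lambda pair: pair[1], reverse=True)
```

Averaging keeps the sketch short but discards n-gram order; the threshold would need tuning on held-out data for any real corpus.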

Metadata
Title
Addressing Domain Adaptation for Chinese Word Segmentation with Instances-Based Transfer Learning
Written by
Yanna Zhang
Jinan Xu
Guoyi Miao
Yufeng Chen
Yujie Zhang
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01716-3_3