
2018 | Original Paper | Book Chapter

Shortcut Sequence Tagging

Authors: Huijia Wu, Jiajun Zhang, Chengqing Zong

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing


Abstract

Deep stacked RNNs are usually hard to train. Recent studies have shown that shortcut connections across different RNN layers bring substantially faster convergence. However, shortcuts increase the computational complexity of the recurrent computations. To reduce this complexity, we propose the shortcut block, a refinement of the shortcut LSTM block. Our approach replaces the self-connected parts (\(c_t^l\)) of the internal states with shortcuts (\(h_t^{l-2}\)). We present extensive empirical experiments showing that this design performs better than the original shortcuts. We evaluate our method on the CCG supertagging task, obtaining an 8% relative improvement over current state-of-the-art results.
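The abstract's core idea — feeding the hidden state of layer \(l-2\) into the cell-state update of layer \(l\) in place of the self-connected \(c_{t-1}^l\) — can be illustrated with a minimal sketch. The exact gating equations are not given in the abstract, so the formulation below (a standard LSTM step whose forget gate multiplies the shortcut \(h_t^{l-2}\) rather than the previous cell state) is an assumed reading, and all names and dimensions are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shortcut_lstm_step(x, h_prev, h_skip, params):
    """One step of a hypothetical shortcut block at layer l.

    x      : input from layer l-1 at time t
    h_prev : hidden state of this layer at time t-1
    h_skip : hidden state of layer l-2 at time t (the shortcut h_t^{l-2})
    """
    Wi, Wf, Wo, Wg = params          # each maps concat([x, h_prev]) -> hidden dim
    z = np.concatenate([x, h_prev])
    i = sigmoid(Wi @ z)              # input gate
    f = sigmoid(Wf @ z)              # forget gate
    o = sigmoid(Wo @ z)              # output gate
    g = np.tanh(Wg @ z)              # candidate activation
    # Shortcut block (assumed formulation): the self-connected term
    # f * c_{t-1}^l of a plain LSTM is replaced by f * h_t^{l-2}.
    c = f * h_skip + i * g
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = [rng.standard_normal((d_h, d_in + d_h)) * 0.1 for _ in range(4)]
x = rng.standard_normal(d_in)
h_prev = np.zeros(d_h)
h_skip = rng.standard_normal(d_h)    # stands in for h_t^{l-2}
h, c = shortcut_lstm_step(x, h_prev, h_skip, params)
print(h.shape)  # (3,)
```

Compared with a plain stacked LSTM, the only change is the single line computing `c`; everything else is the standard gating, which is what makes the refinement cheap relative to full shortcut connections.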


Metadata
Title
Shortcut Sequence Tagging
Authors
Huijia Wu
Jiajun Zhang
Chengqing Zong
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-73618-1_17