2018 | OriginalPaper | Book chapter

Verbal Multi-Word Expressions in Yiddish

Written by: Chaya Liebeskind, Yaakov HaCohen-Kerner

Published in: Natural Language Processing and Information Systems

Publisher: Springer International Publishing

Abstract

Verbal Multi-Word Expressions (VMWEs) are very common in many languages. Their types include, among others, Verb-Particle Constructions (VPC) (e.g., get around), Light-Verb Constructions (LVC) (e.g., make a decision), and idioms (ID) (e.g., break a leg). In this paper, we present a new dataset for supervised learning of VMWEs written in Yiddish. The dataset was manually collected and annotated from a web resource. It contains a set of positive VMWE examples and a set of non-VMWE examples. The full dataset can be used to train supervised algorithms, while the positive examples alone can serve as seeds for unsupervised bootstrapping algorithms. Moreover, we analyze the lexical properties of VMWEs written in Yiddish by classifying them into six categories: VPC, LVC, ID, Inherently Pronominal Verb (IPronV), Inherently Prepositional Verb (IPrepV), and other (OTH). The analysis suggests some interesting features of VMWEs for exploration. This dataset is a first step towards automatic identification of VMWEs written in Yiddish, which is important for natural language understanding, generation and translation systems.
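
To make the dataset layout described in the abstract concrete, the following is a minimal sketch of how labeled entries and bootstrapping seeds might be represented. It is not the authors' format: the names `VMWECategory`, `Example`, and `seeds` are hypothetical, the positive entries are the English examples from the abstract rather than Yiddish data, and the negative entry is invented purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class VMWECategory(Enum):
    """The six categories named in the abstract (enum itself is hypothetical)."""
    VPC = "Verb-Particle Construction"
    LVC = "Light-Verb Construction"
    ID = "Idiom"
    IPRONV = "Inherently Pronominal Verb"
    IPREPV = "Inherently Prepositional Verb"
    OTH = "Other"


@dataclass(frozen=True)
class Example:
    """One annotated entry; the field names are assumptions, not the paper's schema."""
    text: str
    is_vmwe: bool
    category: Optional[VMWECategory] = None  # None for non-VMWE examples


def seeds(examples: List[Example]) -> List[str]:
    """Return only the positive examples, e.g. as seeds for bootstrapping."""
    return [ex.text for ex in examples if ex.is_vmwe]


# Toy entries: positives are the abstract's English examples;
# "read a book" is an invented non-VMWE, not taken from the dataset.
toy = [
    Example("get around", True, VMWECategory.VPC),
    Example("make a decision", True, VMWECategory.LVC),
    Example("break a leg", True, VMWECategory.ID),
    Example("read a book", False),
]

print(seeds(toy))
```

Keeping the category optional lets the same record type hold both positive and negative examples, so one list can feed a supervised classifier while `seeds` filters out the positives.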

Metadata
Title
Verbal Multi-Word Expressions in Yiddish
Written by
Chaya Liebeskind
Yaakov HaCohen-Kerner
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-91947-8_20