2015 | Original Paper | Book Chapter

A Three-Layered Collocation Extraction Tool and Its Application in China English Studies

Abstract

We design a three-layered collocation extraction tool that integrates syntactic and semantic knowledge, and we apply it in China English studies. The tool first extracts peripheral collocations in the frequency layer from dependency triples, then extracts semi-peripheral collocations in the syntactic layer using association measures, and finally extracts core collocations in the semantic layer using a similar-word thesaurus. The syntactic constraints filter out much of the noise from surface co-occurrences, and the semantic constraints are effective in identifying the "core" collocations. We apply the tool to a large corpus of China English that we compiled, automatically extracting collocations to explore how China English, as a variety of English, is nativized. We then analyze the similarities and differences among the typical China English collocations of a group of verbs. The tool and results can be applied in compiling language resources for Chinese-English translation and in corpus-based China studies.
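
The three layers described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the toy dependency triples, the thresholds, the choice of pointwise mutual information (PMI) as the association measure, the small hand-written thesaurus, and the substitution test used in the semantic layer (a candidate is kept as "core" only if replacing its dependent with a similar word yields an unattested or more weakly associated triple).

```python
import math
from collections import Counter

# Hypothetical toy data: (head, relation, dependent) triples standing in
# for dependency-parser output over a real corpus.
TRIPLES = (
    [("make", "dobj", "contribution")] * 6
    + [("make", "dobj", "progress")] * 5
    + [("make", "dobj", "advance")] * 5
    + [("give", "dobj", "donation")] * 4
    + [("eat", "dobj", "cake")] * 5
    + [("make", "dobj", "cake")] * 1
    + [("give", "dobj", "cake")] * 2
)

# Hypothetical similar-word thesaurus (assumption, not from the paper).
THESAURUS = {
    "contribution": ["donation"],
    "donation": ["contribution"],
    "progress": ["advance"],
    "advance": ["progress"],
    "cake": ["pie"],
}


def frequency_layer(counts, min_freq=2):
    """Layer 1: keep triples seen at least min_freq times (peripheral)."""
    return {t for t, c in counts.items() if c >= min_freq}


def pmi(triple, counts):
    """PMI of head and dependent, computed within the triple's relation."""
    head, rel, dep = triple
    same_rel = [(t, c) for t, c in counts.items() if t[1] == rel]
    total = sum(c for _, c in same_rel)
    head_total = sum(c for t, c in same_rel if t[0] == head)
    dep_total = sum(c for t, c in same_rel if t[2] == dep)
    return math.log2(counts[triple] * total / (head_total * dep_total))


def syntactic_layer(peripheral, counts, min_pmi=0.5):
    """Layer 2: keep peripheral triples whose PMI reaches min_pmi."""
    scored = {t: pmi(t, counts) for t in peripheral}
    return {t: s for t, s in scored.items() if s >= min_pmi}


def semantic_layer(semi, counts, thesaurus):
    """Layer 3: keep triples whose thesaurus substitutes are all
    unattested or strictly weaker in PMI (simplified substitution test)."""
    core = {}
    for (head, rel, dep), score in semi.items():
        rivals = [(head, rel, s) for s in thesaurus.get(dep, [])]
        if all(counts[r] == 0 or pmi(r, counts) < score for r in rivals):
            core[(head, rel, dep)] = score
    return core


counts = Counter(TRIPLES)
peripheral = frequency_layer(counts)
semi = syntactic_layer(peripheral, counts)
core = semantic_layer(semi, counts, THESAURUS)

for triple in sorted(core):
    print(triple, round(core[triple], 2))
```

On this toy data, the frequency layer drops the one-off "make cake", the PMI threshold drops the weakly associated "give cake", and the substitution test drops "make progress" and "make advance" because each can be swapped for the other with no loss of association. A real pipeline would take its triples from a dependency parser and its similar words from a corpus-derived thesaurus.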

Metadata
Title
A Three-Layered Collocation Extraction Tool and Its Application in China English Studies
Authors
Jingxiang Cao
Dan Li
Degen Huang
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-25816-4_4
