
2015 | OriginalPaper | Chapter

A Three-Layered Collocation Extraction Tool and Its Application in China English Studies


Abstract

We design a three-layered collocation extraction tool that integrates syntactic and semantic knowledge, and we apply it to China English studies. The tool first extracts peripheral collocations from dependency triples in the frequency layer, then extracts semi-peripheral collocations in the syntactic layer using association measures, and finally extracts core collocations in the semantic layer using a thesaurus of similar words. The syntactic constraints filter out much of the noise in surface co-occurrences, and the semantic constraints are effective in identifying the truly “core” collocations. We apply the tool to automatically extract collocations from a large China English corpus that we compiled, in order to explore how China English, as a variety of English, has been nativized. We then analyze the similarities and differences among the typical China English collocations of a group of verbs. The tool and the results can be used to compile language resources for Chinese-English translation and for corpus-based China studies.
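The abstract describes the three layers only at a high level. The following is a minimal sketch of how such a pipeline could be wired together, assuming that a dependency parser has already produced triples of the form (head, relation, dependent). Everything concrete here is an assumption rather than the authors' published method: the function and class names are hypothetical, the minimum frequency of 2 is arbitrary, Dunning's log-likelihood ratio stands in for "association measures", 3.84 is used only because it is the chi-square critical value at p = 0.05 with one degree of freedom, and the semantic-layer rule (a collocation counts as core if no thesaurus neighbour of its dependent forms a comparably strong combination with the same head) is a hypothetical criterion.

```python
import math
from collections import Counter

# Hypothetical representation: a dependency triple (head, relation, dependent),
# e.g. ("pay", "dobj", "attention").
Triple = tuple[str, str, str]


def frequency_layer(triples: list[Triple], min_freq: int = 2) -> Counter:
    """Layer 1 (frequency): keep triples seen at least min_freq times."""
    counts = Counter(triples)
    return Counter({t: c for t, c in counts.items() if c >= min_freq})


def log_likelihood(k11: int, k12: int, k21: int, k22: int) -> float:
    """Signed log-likelihood ratio G^2 (Dunning 1993) for a 2x2 table.

    The sign is negative when the pair co-occurs less often than chance predicts.
    """
    total = k11 + k12 + k21 + k22
    if total == 0:
        return 0.0
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for observed, r, c in ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1)):
        if observed > 0:  # cells with 0 observations contribute 0
            g2 += observed * math.log(observed * total / (rows[r] * cols[c]))
    g2 *= 2.0
    # Compare observed k11 with its expected value using exact integer arithmetic.
    return g2 if k11 * total >= rows[0] * cols[0] else -g2


class AssociationScorer:
    """Scores a triple against the marginal counts of its own relation."""

    def __init__(self, triples: list[Triple]):
        self.joint = Counter(triples)
        self.head = Counter((h, r) for h, r, _ in triples)
        self.dep = Counter((r, d) for _, r, d in triples)
        self.rel = Counter(r for _, r, _ in triples)

    def score(self, triple: Triple) -> float:
        h, r, d = triple
        k11 = self.joint[triple]
        k12 = self.head[(h, r)] - k11   # same head and relation, other dependent
        k21 = self.dep[(r, d)] - k11    # same dependent and relation, other head
        k22 = self.rel[r] - k11 - k12 - k21
        return log_likelihood(k11, k12, k21, k22)


def syntactic_layer(peripheral: Counter, triples: list[Triple],
                    min_llr: float = 3.84) -> dict[Triple, float]:
    """Layer 2 (syntactic): keep peripheral triples whose G^2 reaches min_llr."""
    scorer = AssociationScorer(triples)
    scores = {t: scorer.score(t) for t in peripheral}
    return {t: s for t, s in scores.items() if s >= min_llr}


def semantic_layer(semi: dict[Triple, float], triples: list[Triple],
                   thesaurus: dict[str, set[str]], ratio: float = 0.5) -> set[Triple]:
    """Layer 3 (semantic), hypothetical rule: a triple is core if its dependent
    has thesaurus neighbours and no neighbour, substituted under the same head
    and relation, is observed with a score of at least ratio * original score."""
    scorer = AssociationScorer(triples)
    core = set()
    for (h, r, d), s in semi.items():
        neighbours = thesaurus.get(d, set()) - {d}
        if not neighbours:
            continue  # no semantic evidence either way, so not promoted to core
        rivals = [scorer.score((h, r, n)) for n in neighbours
                  if scorer.joint[(h, r, n)] > 0]
        if all(rival < ratio * s for rival in rivals):
            core.add((h, r, d))
    return core


if __name__ == "__main__":
    # Toy data invented for illustration; not taken from the China English corpus.
    triples = ([("pay", "dobj", "attention")] * 8 + [("pay", "dobj", "money")]
               + [("read", "dobj", "book")] * 4 + [("read", "dobj", "paper")] * 4
               + [("write", "dobj", "book")] * 4 + [("write", "dobj", "paper")] * 4)
    thesaurus = {"attention": {"heed"}, "book": {"paper"}, "paper": {"book"}}
    peripheral = frequency_layer(triples)
    semi = syntactic_layer(peripheral, triples, min_llr=1.0)
    print(semantic_layer(semi, triples, thesaurus))  # {('pay', 'dobj', 'attention')}
```

In this toy run, the single occurrence of "pay money" is dropped by the frequency layer, and the loosened threshold lets "read book", "write paper", and similar pairs through the syntactic layer. The semantic layer then discards them because a thesaurus neighbour ("paper" for "book") combines just as strongly with the same verb. Scoring each triple against the marginals of its own dependency relation, rather than over a word window, is one way to realize the noise filtering on surface co-occurrences that the abstract attributes to the syntactic constraints.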


Metadata
Title: A Three-Layered Collocation Extraction Tool and Its Application in China English Studies
Authors: Jingxiang Cao, Dan Li, Degen Huang
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-25816-4_4
