2018 | Book

Chinese Lexical Semantics

18th Workshop, CLSW 2017, Leshan, China, May 18–20, 2017, Revised Selected Papers

About this book

This book constitutes the thoroughly refereed post-workshop proceedings of the 18th Chinese Lexical Semantics Workshop, CLSW 2017, held in Leshan, China, in May 2017.
The 48 full papers and 5 short papers included in this volume were carefully reviewed and selected from 176 submissions. They are organized in the following topical sections: lexical semantics; applications of natural language processing; lexical resources; and corpus linguistics.

Table of contents

Lexical Semantics

On le 2: Its Nature and Syntactic Status

Le 2 has special features in both syntax and semantics and should not be put on an equal footing with le 1. Based on the split-CP hypothesis, this paper discusses the syntactic status of le 2. It finds that le 2 is a Fin element, located outside the clause to mark the overall state of the clause. For this reason, its occurrence is not affected by the presence of a negative marker in the clause, whereas the occurrence of le 1 is.

Peicui Zhang, Jiyan Li, Ru’e Liang, Youlong Fu, Huibin Zhuang
A Study on the Counter-expectation and Semantic Construal Strategy of Implicit Negative Adverbs

This paper examines the meanings, expressive functions, and semantic construal strategy of the adverbs bai, gan, xia, kong, xu, tu, and wang in Mandarin. We begin by discussing the original meanings of the adverbs, through which their implicit negation is then described. Next, we explore the counter-expectative function of the implicit negative adverbs and match them with different verbs. The counter-expectation expressed by these adverbs is classified according to the speaker’s intention, and the semantic conditions and construal strategy of implicit negative adverbs are discussed. Finally, the logical construal formula and the construal process are analyzed. We conclude that bai, gan, xia, kong, xu, tu, and wang in Mandarin have implicit meanings that negate the felicity condition of an event. The implicit negative adverbs can express the speaker’s counter-expectation, indicating that the actual situation differs from what the speaker expected. Additionally, the semantic construal of the counter-expectation relates to the evaluative function of the event, which corresponds to the optimistic orientation of human beings. The semantic construal of implicit negation in Mandarin can help process negative information in both machine translation and the teaching of Chinese as a foreign language.

Jinghan Zeng, Yulin Yuan
Sentence Patterns of “You” in Semantic Dependence Graphs

With the development of natural language processing research, automatic semantic analysis has attracted increasing attention. After a thorough study of the semantic structure of Chinese sentences, this paper presents an architecture of semantic dependency graphs and builds a semantic dependency graph corpus containing 30,000 sentences. On the basis of this corpus, the paper focuses on “you” sentences and summarizes their sentence patterns and rules in order to provide rule support for automatic semantic analysis models.

Yanqiu Shao, Cuiting Hu, LiJuan Zheng
On the Quantification of Events in Doua Construction

In the existing literature, three aspects of the quantificational adverb doua (the objects of quantification, the properties of quantification, and the means by which doua quantifies) remain controversial. In this paper we therefore first define quantification and construct a quantification system. This system consists of three primary elements (quantified scope, quantifying unit, and quantity value), two basic ways of quantifying (cumulative sum, i.e. forward quantification, and distributive division, i.e. reverse quantification), and four main results (singular, plural, total quantity, and partial quantity). On this basis, we define the quantified object of doua as the eventualities expressed by the given sentence, and we redefine doua’s quantificational property as distributive universality. We then divide the quantification construction of doua into four parts: distributive domain, distributive index, distributive operator, and distributive share. By investigating the syntactic, semantic, and pragmatic features of these four parts, as well as the distributive dependency relations among them, we explain how doua quantifies eventualities.

Hua Zhong
Approximate Constructions Using duo ‘more’ in Chinese

The approximate constructions using duo ‘more’ in Chinese present some interesting puzzles for compositional semantics. This paper attempts to unveil the lexical semantics of duo by investigating its distribution and interpretation in approximate constructions. It is claimed that (i) the distribution of duo is governed by a monotonicity restriction, that is, its domain must be defined by some mereological part-whole relation; and (ii) duo is an additive particle whose semantics involves the arithmetic operations of addition (+) and multiplication (×). This novel analysis offers a better-motivated account of the seemingly complicated distributional patterns of duo approximate expressions.

Qiongpeng Luo
A Comparative Study on the Definition of Adverbs
– Taking the Dictionary of Contemporary Chinese (the 6th Edition) and the Dictionary of Mandarin (Revised Edition) as Examples

Adverbs play an important role in research on Chinese vocabulary and lexicography and are therefore a constant focus of language study; adverb definitions are likewise a key topic in research on dictionary definitions. By comparing and analyzing how common adverb senses are defined in the Dictionary of Contemporary Chinese (6th edition) and the Dictionary of Mandarin (revised edition), this paper summarizes the differences and similarities in definition mode across the Taiwan Strait, in the hope of contributing to the compilation of Chinese dictionaries and to natural language processing.

Xiaoyao Zhang, Dekuan Xu, Shumei Chen
Temporal Behavior of Temporal Modifiers and Its Implications: The Case of běnlái and yuánlái in Mandarin Chinese

In this paper, I argue, based on the discourse function of these two temporal modifiers, that contrast is an essential and indispensable part of the semantics of běnlái and yuánlái, in addition to the temporal semantics argued for in [1] and [2]. Given the close interaction between lexical semantics and discourse function, I propose that examining a lexical entry in discourse gives a more complete and satisfactory picture of its semantics.

Jiun-Shiung Wu
Distributive Quantifier měi in Mandarin Chinese

This paper aims to understand the function of the distributive quantifier měi in the [měi-Numeral-NP V Numeral-NP] construction. Měi is viewed as a marker of plurality over plural entities, seen as a set denoted by Numeral-NP, and marks the sorting key for a distributive dependency in the sense of Choe (1987). Without měi, the structure [Numeral-NP V Numeral-NP] can induce either a distributive dependency or a proportional relation between the two numeral NPs. We account for the implausibility of the sequence [*měi-yī-NP]. Měi involves the notion of unit, which is distinct from the notion of singular number expressed by the numeral yī ‘one’.

Hua-Hung Yuan
On the semantic functions and denotations of jīngcháng (经常) and chángcháng (常常)

Jīngcháng and chángcháng are two adverbs that have been classified in the same category of frequency adverbs in the Sinologist literature. While there are clear similarities between the two (e.g. both denote a multiplicity of ‘occasions’, can be treated as markers of verbal plurality operating at the occasion level in the sense of Cusic [2], and induce the pure frequency and relational readings in the sense of de Swart [6]), they differ in two respects: (i) their semantic functions (habituality vs. iterativity) and (ii) their denotations with respect to the temporal intervals between two occasions.

Daniel Kwang Guan Chan, Yuan Hua-Hung
On the Grammaticalization of Chinese Prefix Di

The origin and development of the Chinese prefix di, which marks ordinal numbers, are central to the Chinese ordinal number system. However, it remains controversial in current studies whether the prefix di originated from a nominal di “order” or a verb di “arrange the order”, and when it became grammaticalized. This paper provides a detailed analysis of the controversial data in Archaic Chinese from the perspective of generative syntax and semantics, and argues that the ordinal prefix di was grammaticalized from the nominal di “order/rank”, which was frequently used before numerals. Di was first used as a prefix in the Han period. The theory and analytical methods of generative syntax and semantics are very helpful for studies of the historical grammar of Chinese.

Mengbin Liu
The Degree Usage of Cai and Relevant Issues

This paper investigates the degree usage of cai (才) and some related issues. It is argued that in this usage cai and the sentence-final particle ne (呢) form a discontinuous constituent, which modifies the element it surrounds. ‘cai…ne’ requires the modified element to have a gradable property, so only qualified VPs, AdjPs, and NegPs can co-occur with ‘cai…ne’. Some characteristics of ‘cai…ne’ are shown by comparing it with other degree adverbs. Moreover, the syntactic structure and semantic interpretation of ‘cai…ne’ are given.

Lei Zhang
A study on the third interpretation of ‘V+ Duration Phrase’ in Chinese from the perspective of qualia structure

In Chinese, the Duration Phrase (e.g. shifenzhong ‘ten minutes’, santian ‘three days’) in “V + Duration Phrase” normally refers either to the duration of an action or to the time elapsed since the completion of an event. This article discusses a less-studied third interpretation of “V + Duration Phrase”, which refers to the duration of a span (e.g. the shelf life of a cake), with the action of the verb being discontinuous over the whole process. It is pointed out that the V in this third reading is not fully lexical and can be substituted by the semi-lexical verb yong ‘use’ or by a light verb USE. In a qualia structure analysis within the framework of Generative Lexicon theory, this semi-lexical/light verb is found to function as the Telic role of a noun involved in the sentence.

Wei Chin, Changsong Wang
Semantic Classification and Category Expansion of “Qing X” in Modern Chinese

Based on statistics from the headlines of People’s Daily Online over the past 10 years, we find that the category “Qing X” can be divided into five semantic classes, each of which starts from a prototype meaning and gradually expands into the category “Qing X” through comparison and generalization. The semantic extension model of the “Qing X” category is similar to the extension pattern of polysemous words, which extend on the basis of family resemblance. Moreover, the category “Qing X” is open, meaning that the number of its members keeps growing.

Xufeng Yang
Three Registers behind two Characters: an Analysis of the Words’ Formation and their Registers

Three Registers behind Two Characters, or Two-Character-Three-Register, is a special phenomenon in Chinese word formation: when a dissyllabic verb consists of two monosyllabic verbs that can each be used on their own, the three words belong to three separate registers. This paper exhaustively analyzes 81 groups of such words and finds that (1) the most important function of the Two-Character-Three-Register phenomenon is to distinguish registers; (2) dissyllabic verbs mainly belong to the formal register, whereas monosyllabic verbs belong to the informal and elegant registers; and (3) this can mostly be accounted for by the filling effect of register vacancy.

Jianfei Luo, Jing Xiang
The Meaning of Polysemous Adjective “Hao(Good)”

It is difficult to explain the meaning of polysemous adjectives. Previous scholars have done a good deal of research, but two problems remain unsolved. First, how should the meaning of an adjective in an “Adj + N” compound word be described? Second, how should the meaning of an adjective in an “Adj + N” contextual combination be described? Taking “Hao (Good)” as an example, this paper establishes a framework for the semantic analysis of adjectives on the basis of Qualia Structure (see Table 1). The research shows that: 1) the framework is suitable for analyzing the meaning of “hao (good)” not only in “hao (good) + N” compound words but also in “hao (good) + N” combinations; 2) “hao (good) + N” may have multiple meanings, but “to function well” is its basic meaning.

Enxu Wang, Yulin Yuan
The (Dis) appearance of Affected Role “lìnɡ/shǐ…” in the Causative Adjective-Noun Composition

The adjective-noun composition of “emotional value + emotional initiation” has a causal meaning, which can be indicated by adding the affected role “lìnɡ/shǐ…” before the adjective. However, “lìnɡ/shǐ…” cannot be added before some adjective-noun compositions. Drawing on previous studies, this paper discusses this issue and points out that the (dis)appearance of the affected role “lìnɡ/shǐ…” can be preliminarily explained by the semantic relation between the adjectives and the nouns referring to things or participants in events.

Qiang Li
The Study of Content Restriction of Mandarin yǒu Measure Construction

This paper investigates the Mandarin yǒu measure construction, which consists of the main verb yǒu, a Measure Subject, and a Measure Object. This construction has been studied by many researchers, but its content restriction has not been systematically summarized. Based on the Autonomous/Dependent Alignment Model in Cognitive Grammar, this paper aims to shed some light on this topic. It finds four rules governing the content restriction, which are motivated by the interaction of the Subject and the Measure Object at the cognitive level.

Shaoshuai Shen
Metaphors in Chinese Accompanying Gesture Modality and Cognition

Chinese accompanying gestures are a common phenomenon in daily life, and native speakers of Chinese use the gesture modality to express metaphors. This study analyzes metaphors in the Chinese accompanying-gesture modality from the perspective of cognitive linguistics and cognitive neuroscience rather than multimodal metaphor theory. Both the spoken-language and the accompanying-gesture modalities of metaphors are integrated into discourse, indicating that metaphor creation is based on general cognitive rather than linguistic features. The case study of Chinese accompanying gestures shows that conceptual metaphors can be cognitively activated even when they are not expressed in the oral modality. The close interaction between modality and language indicates that metaphors are a dynamic rather than a static attribute in the physical and interactive context. The study of metaphors in the accompanying-gesture modality provides a specific case for further studies on metaphors and offers support for creating a cognitive model of non-literal language.

Dengfeng Yao, Minghu Jiang, Abudoukelimu Abulizi, Renkui Hou, Lifei Shu
Competition and Differentiation of a Pair of Morpheme-inverted Words in Mandarin Chinese: Dòuzhēng and Zhēngdòu

A large number of morpheme-inverted words are unique elements of Chinese vocabulary, an important lexical phenomenon in the history of the development and evolution of Chinese vocabulary. As a corpus-driven study, this paper discusses a pair of morpheme-inverted words, dòuzhēng (斗争) and zhēngdòu (争斗), from a microscopic perspective. Based on statistics and analysis, it explores the origin, time of occurrence, evolution process, usage conditions, and causes of the semantic changes of dòuzhēng and zhēngdòu. The study aims to provide a reference for research on morpheme-inverted words from a microscopic perspective.

Li Wensi
The Polylexicalization and Grammaticalization of Zhiyu

Zhiyu can be used as a conjunction and as an adverb in Mandarin Chinese. This research focuses on the polylexicalization and grammaticalization of this word. The conjunction zhiyu derives from the “verb + preposition” construction “zhi + yu”, meaning “arrive at/in”, while the adverb zhiyu comes from the same construction meaning “result in”. Judging from the development of these homographs, the mechanism of polylexicalization is the differentiation of morpheme meanings; in addition, textual context provides conditions for the change in meaning. Once zhiyu had become two different words, the two underwent grammaticalization separately.

Xin Kou

Applications of Natural Language Processing

Research on the Recognition of Chinese Autonomous Verbs Based on Semantic Selection Restriction and Natural Annotation Information

Verbs have always been an important yet challenging topic in linguistic research, and the secondary classification of verbs is of great value for both language ontology research and applied language research. This paper investigates the computer recognition of autonomous verbs and constructs rules for recognizing them, based on the contextual characteristics of autonomous verbs and on natural annotation information (mainly punctuation and position information). A rule-based automatic recognition algorithm was thus devised, which achieves an F1 score of 86.3% on manually labeled test data.

Chengwen Wang, Endong Xun
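
The abstract reports its result as an F1 score but does not spell out the recognition rules, so they are not reproduced here. As a reminder of how such a score is usually computed, the following minimal Python sketch compares a hypothetical set of rule-predicted autonomous verbs with a hypothetical gold standard; the function name and the toy data are illustrative assumptions, not the authors' code or data.

```python
def precision_recall_f1(gold: set[str], predicted: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of a predicted item set against a gold set."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data, not from the paper: verbs annotated as autonomous
    # (gold) versus verbs flagged as autonomous by some rule set (predicted).
    gold = {"吃", "跑", "写", "看"}
    predicted = {"吃", "跑", "写", "病"}  # 病 'be ill' is a false positive here
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints P=0.75 R=0.75 F1=0.75
```

Computing the score over sets treats each verb type once; a token-level evaluation would instead count every labeled occurrence.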
Disambiguating Polysemous Word Senses Based on Semantic Types and Syntactic Collocations: A Case Study of “Zhongguo+N”

This paper deals with polysemous words in noun compounds and investigates the relationship between their senses and the semantic types of the co-occurring nouns. Taking “Zhongguo+N” as a case study, we first review the senses of Zhongguo defined in Chinese Wordnet, WordNet, and HowNet. We then propose a new way of distinguishing the senses that integrates the semantic-type-based method and the collocation-based method. Next, we focus on “Zhongguo+N” noun compounds and investigate the relationships between the semantic types of the collocating nouns and the different senses of Zhongguo. It turns out that the senses of the polysemous words are related to the semantic types of the co-occurring nouns. Specifically, the culture sense is closely related to artificial objects, the location sense has more collocating nouns, and the organization sense has the lowest frequency. These results show that collocation information provides solid evidence for the sense distribution.

Lulu Wang, Meng Wang
Extracting Opinion Targets and Opinion Words with CenterNP and syntactic-semantic features

Extracting opinion targets and opinion words from online reviews are two fundamental tasks in opinion mining. This paper proposes a novel approach that extracts them jointly using centerNPs and other syntactic and semantic features. Studying the syntactic and semantic features of opinion targets and opinion words through HNC theory, the paper finds that opinion targets are special noun phrases (NPs), which we call centerNPs; they are made up of the conceptual categories g, r, z, pw, ww, and v, and must occur in the syntactic position of chunk GBK. The paper therefore extracts centerNPs as opinion-target candidates and introduces further syntactic and semantic features, such as the syntactic position of opinion targets, their conceptual categories, and the contextual features of opinion targets and opinion words. The experimental results show that the approach achieves good performance.

Jing Wang
Acquiring Selectional Preferences for Knowledge Base Construction

Selectional preference (SP) is an important kind of lexical knowledge that can be applied to many natural language processing tasks, such as semantic error detection, metaphor detection, word sense disambiguation, syntactic parsing, semantic role labeling, and machine translation. This paper studies semantic-class-level SP acquisition for knowledge base construction. First, the noun taxonomy of SKCC, a Semantic Knowledge-base of Contemporary Chinese, is adjusted for SP acquisition. Second, an MDL-based tree cut model is implemented. Third, the SP information in SKCC is used as the source of a gold-standard test set to evaluate SP acquisition performance. Three kinds of predicate-argument relations are investigated in the experiments: verb-object, verb-subject, and adjective-noun. For the verb-object relation, the top-1 strict accuracy is 24.74%, while the top-3 relaxed accuracy reaches 75.26%.

Yuxiang Jia, Yuguang Li, Hongying Zan
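
The abstract names an MDL-based tree cut model without giving details. The sketch below follows the standard formulation usually attributed to Li and Abe (1998): a cut through the noun taxonomy is chosen to minimize the parameter description length, (k/2)·log2|S| for a cut of k classes and sample size |S|, plus the data description length, where each noun in a class receives an equal share of the class probability. Some formulations count k − 1 free parameters instead; that shifts both sides of every comparison equally and does not change the chosen cut. The taxonomy, the counts, and the names `Node`, `description_length`, and `find_mdl_cut` are hypothetical illustrations, not SKCC data or the authors' implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A class in a hypothetical noun taxonomy; leaves are individual nouns."""
    name: str
    children: list["Node"] = field(default_factory=list)

    def leaves(self) -> list[str]:
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def description_length(cut: list[Node], counts: dict[str, int], sample_size: int) -> float:
    """Parameter DL plus data DL (in bits) of a cut, over the nouns it covers.

    Assumes sample_size > 0.
    """
    parameter_dl = len(cut) / 2 * math.log2(sample_size)
    data_dl = 0.0
    for cls in cut:
        nouns = cls.leaves()
        freq = sum(counts.get(noun, 0) for noun in nouns)
        if freq == 0:
            continue  # unseen classes contribute no data cost
        # Each noun in the class gets an equal share of the class probability.
        prob_per_noun = freq / sample_size / len(nouns)
        data_dl -= freq * math.log2(prob_per_noun)
    return parameter_dl + data_dl


def find_mdl_cut(node: Node, counts: dict[str, int], sample_size: int) -> list[Node]:
    """Recursively keep a node as one class or replace it by its children's best cuts."""
    if not node.children:
        return [node]
    children_cut = [c for child in node.children for c in find_mdl_cut(child, counts, sample_size)]
    if description_length([node], counts, sample_size) < description_length(children_cut, counts, sample_size):
        return [node]
    return children_cut


if __name__ == "__main__":
    # Hypothetical toy taxonomy and object counts for a verb such as 吃 'eat'.
    taxonomy = Node("entity", [
        Node("food", [Node("apple"), Node("rice"), Node("noodle")]),
        Node("artifact", [Node("car"), Node("book")]),
    ])
    object_counts = {"apple": 3, "rice": 4, "noodle": 3}
    total = sum(object_counts.values())
    cut = find_mdl_cut(taxonomy, object_counts, total)
    print([cls.name for cls in cut])  # ['food', 'artifact']
```

With these counts the model generalizes the three food nouns to the class food rather than keeping them separate, which is the kind of class-level SP the abstract evaluates.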
Identifying Chinese Event Factuality with Convolutional Neural Networks

Event factuality describes the level of factuality of an event as expressed by the event narrator and is one of the deep semantic representations of natural language texts. This paper focuses on identifying Chinese event factuality and proposes an effective approach based on convolutional neural networks (CNNs). It extracts factuality-related information from event sentences and uses this information, together with its transformations, as features. The features are then converted into word vectors to construct a sentence-level word vector map, which is finally fed into the CNN model to identify event factuality. Experimental results show that the approach achieves higher performance by using factual features and a CNN model, and that it is especially good at handling imbalanced data distributions.

He Tianxiong, Li Peifeng, Zhu Qiaoming
Recognizing Textual Entailment Using Inference Phenomenon

Inference phenomena are inference relations between local fragments of two texts. Current research on inference phenomena focuses on building annotated data, while there is little research on how to identify these phenomena in texts, although doing so would help improve the performance of recognizing textual entailment. This paper proposes an approach that uses inference phenomena to recognize entailment between texts. The task of recognizing textual entailment is formalized as two problems, inference phenomenon identification and entailment judgment, and a joint model combines these two related subtasks, which helps avoid error propagation. Experimental results show that the approach is effective at identifying inference phenomena and recognizing entailment at the same time.

Han Ren, Xia Li, Wenhe Feng, Jing Wan
Deriving Probabilistic Semantic Frames from HowNet

Representing knowledge as frames has a long history in artificial intelligence and computational linguistics. However, constructing frame banks that support frame-based processing is quite time-consuming, leading to the unavailability of usable frame banks for many languages, including Chinese. This paper proposed a method for deriving probabilistic semantic frames from HowNet, which is a well-known common-sense knowledge base and has been successfully used in many NLP applications. Unlike most previous HowNet-related work which focused on using HowNet as a lexico-semantic bank, this work viewed the HowNet dictionary as a semantic-annotated corpus. According to the proposed method, governor-role-dependent triples are firstly extracted from the concept definitions of the HowNet dictionary. Then, they are organized into frames by the governors and the probabilities are estimated based on maximum likelihood estimation (MLE). Finally, the probabilistic frames form a frame bank. Moreover, in order to overcome the data sparseness problem, a smoothing method based on HowNet’s taxonomy was put forward. To verify the constructed frame bank, we applied it in a task to recognize relationships between Chinese word pairs, which are extracted from the Chinese Message Structure Database of HowNet. The experimental results showed that, even without using context information, the system based on the constructed frame bank achieved an accuracy of 83.74%, which indicates the soundness of the constructed frame bank.

Yidong Chen, Yu Wan, Xiaodong Shi, Suxia Xu
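
The abstract describes grouping governor-role-dependent triples by governor, estimating probabilities by maximum likelihood, and then smoothing over HowNet's taxonomy. The MLE step is sketched below as plain relative frequencies. The paper's smoothing method is not described in the abstract, so `smoothed_prob` uses a swapped-in linear interpolation with the parent concept's estimate; this is an assumption, not the authors' method, and the triples, the parent map, and the weight `lam` are hypothetical.

```python
from collections import Counter, defaultdict

Slot = tuple[str, str]  # (semantic role, dependent concept)


def mle_frames(triples: list[tuple[str, str, str]]) -> dict[str, dict[Slot, float]]:
    """Group (governor, role, dependent) triples by governor and estimate
    P(role, dependent | governor) by relative frequency, i.e. MLE."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for governor, role, dependent in triples:
        counts[governor][(role, dependent)] += 1
    frames = {}
    for governor, slot_counts in counts.items():
        total = sum(slot_counts.values())
        frames[governor] = {slot: c / total for slot, c in slot_counts.items()}
    return frames


def smoothed_prob(frames, parent_of, governor, slot, lam=0.7):
    """Hypothetical taxonomy back-off (not the paper's method): interpolate a
    governor's MLE estimate with the smoothed estimate of its parent concept."""
    own = frames.get(governor, {}).get(slot, 0.0)
    parent = parent_of.get(governor)
    if parent is None:
        return own  # the top of the chain keeps its own MLE estimate
    return lam * own + (1 - lam) * smoothed_prob(frames, parent_of, parent, slot, lam)


if __name__ == "__main__":
    # Hypothetical triples and taxonomy link; 'gobble' is never observed.
    triples = [
        ("eat", "patient", "food"),
        ("eat", "patient", "food"),
        ("eat", "agent", "human"),
    ]
    frames = mle_frames(triples)
    parent_of = {"gobble": "eat"}
    print(round(smoothed_prob(frames, parent_of, "gobble", ("patient", "food")), 3))  # 0.2
```

The back-off lets an unseen governor inherit part of its parent's frame; if an intermediate ancestor has no observed triples, some probability mass is lost, which a fuller smoothing scheme would redistribute.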
Semantic Relations Mining in Social Tags Based on a Modern Chinese Semantic Dictionary

At present, many scholars have studied the mining of semantic relations in social tags based on WordNet, an English semantic dictionary, and have made some progress, but few studies have combined a modern Chinese semantic dictionary with social tags. This paper first selects tag data from Dòubàn Reading, then uses the classification and coding system of A Thesaurus of Modern Chinese (TMC) to calculate the semantic similarity of the tags, and mines the semantic relations in the social tags with WordSimilarity, a lexical semantic similarity computing system. The results obtained with this method are broadly consistent with how we intuitively think of lexical semantic relations and achieve higher accuracy.

Jiangying Yu
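
The abstract relies on the hierarchical category codes of A Thesaurus of Modern Chinese but does not give WordSimilarity's formula. As an illustration of the general idea only, the sketch below scores two tags by how many levels of their hierarchical codes agree from the top and keeps pairs above a threshold. The code format, the example codes, the scoring rule, and the threshold are all hypothetical; they are not TMC's actual codes or WordSimilarity's algorithm.

```python
from itertools import combinations


def code_similarity(code_a: tuple[str, ...], code_b: tuple[str, ...]) -> float:
    """Share of hierarchy levels, counted from the top, on which two codes agree."""
    depth = max(len(code_a), len(code_b))
    shared = 0
    for level_a, level_b in zip(code_a, code_b):
        if level_a != level_b:
            break  # codes diverge; deeper levels cannot be shared
        shared += 1
    return shared / depth if depth else 0.0


def tag_similarity(codes_a, codes_b) -> float:
    """A polysemous tag may carry several codes; use its best-matching pair."""
    return max(code_similarity(a, b) for a in codes_a for b in codes_b)


def related_tag_pairs(tag_codes, threshold=0.75):
    """Return (tag, tag, score) for pairs reaching a hypothetical threshold."""
    pairs = []
    for tag_a, tag_b in combinations(sorted(tag_codes), 2):
        score = tag_similarity(tag_codes[tag_a], tag_codes[tag_b])
        if score >= threshold:
            pairs.append((tag_a, tag_b, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical four-level codes; real TMC codes would be looked up per tag.
    tag_codes = {
        "小说": [("D", "k", "03", "A")],  # novel
        "散文": [("D", "k", "03", "B")],  # prose essay
        "编程": [("B", "c", "11", "A")],  # programming
    }
    print(related_tag_pairs(tag_codes))  # [('小说', '散文', 0.75)]
```

Taking the maximum over code pairs reflects one common choice for polysemous entries; averaging over senses would be an alternative.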

Lexical Resources

The Study of Indian Domain Ontology Building Based on the Framework of HNC

The study applies the theory and method of the Hierarchical Network of Concepts (HNC theory) to building an Indian domain ontology, and the approach proves effective. Ontology building covers two phases: collection of ontology terms and top-level framework design of the ontology. First, the study uses HowNet, TYCCL, and Word2Vec to assist in collecting India-related terms based on word similarity calculation, and the HNC conceptual tree table is then applied to build the Indian ontology. The corpus for training the Word2Vec models mainly comes from People’s Daily, Wikipedia, and Sogou News; the trained models are then used to assist basic term collection. Following the HNC conceptual tree table, the Indian Ontological Knowledge Base (IOKB) covers seven general fields of India, such as politics, economy, military, and culture. Currently, IOKB has more than 4,350 concepts and instances, 51 object properties, and 207 data properties.

Jinzhu Huang, Keliang Zhang, Feng Li
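
Using Word2Vec to assist term collection amounts to retrieving, for each seed term, the words whose vectors are most similar to it. A minimal pure-Python sketch of that step follows; the toy three-dimensional vectors, the seed set, `top_k`, and `threshold` are hypothetical, and a real pipeline would read the vectors from a trained model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(seeds: set[str], vectors: dict[str, list[float]],
                 top_k: int = 2, threshold: float = 0.8) -> dict[str, float]:
    """Collect candidate terms among the top_k neighbours of any seed term."""
    candidates: dict[str, float] = {}
    for seed in seeds:
        if seed not in vectors:
            continue  # seed missing from the vocabulary
        scored = [(cosine(vectors[seed], vec), word)
                  for word, vec in vectors.items() if word not in seeds]
        scored.sort(reverse=True)  # most similar first
        for score, word in scored[:top_k]:
            if score >= threshold:
                candidates[word] = max(score, candidates.get(word, 0.0))
    return candidates


if __name__ == "__main__":
    # Hypothetical toy vectors; real ones would come from a trained Word2Vec model.
    toy_vectors = {
        "印度": [1.0, 0.0, 0.0],     # India
        "新德里": [0.9, 0.1, 0.0],   # New Delhi
        "孟买": [0.85, 0.15, 0.05],  # Mumbai
        "香蕉": [0.0, 0.1, 1.0],     # banana
    }
    print(expand_terms({"印度"}, toy_vectors))
```

With a model trained in gensim, the neighbour lookup itself is available as `KeyedVectors.most_similar` (for example `model.wv.most_similar("印度", topn=10)`); the threshold then acts as a filter before candidates are presumably reviewed for the ontology.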
Mandarin Relata: A Dataset of Word Relations and Their Semantic Types

For both the training and evaluation of semantic distributional models, language datasets are needed that are both elaborate in their word level descriptors and readily intuitive to human judgment. The current paper introduces a dataset for Mandarin Chinese constructed through the combination of word relation pairs from two distinct sources: corpus extraction, and human elicitation. Our results show that while more word relation pairs were gained through the corpus extraction process, human elicited semantic neighbors were almost twice as likely to show agreement with human raters. The current methods created 4091 word relation pairs that span hypernymy, hyponymy, synonymy, antonymy, and meronymy alongside semantic type information. To date, this is the largest collection of human-rated word relation pairs in Mandarin Chinese.

Hongchao Liu, Chu-Ren Huang, Ren-kui Hou
Quantitative Analysis of Synergetic Properties in Chinese Nouns

This paper analyzes more than 13,000 nouns in contemporary Chinese and investigates their synergetic properties on the basis of word polysemy. HowNet is used to measure word polysemy, which proves to be an effective approach. Statistical analysis of the data indicates that the polysemy of nouns follows the modified Zipf-Alekseev distribution. The results of function fitting with Altmann Fitter show that word polysemy is to some extent related to linguistic variables including word length, word frequency, and polytextuality, which means that the variables in the lexical subsystem are still synergetic and that Köhler’s lexical model proves effective for Chinese nouns.

Yujie Liu, Pengyuan Liu
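
The abstract names the modified Zipf-Alekseev distribution without stating it. For reference, it is commonly written as follows in the quantitative-linguistics literature: the first class x = 1 receives probability α directly, and the remaining classes follow a Zipf-Alekseev curve with parameters a and b, normalized by T over the n observed classes. Reading x as the number of senses a noun has in HowNet is an assumption about how the paper applies it.

```latex
P_1 = \alpha, \qquad
P_x = \frac{(1-\alpha)\, x^{-(a + b \ln x)}}{T}, \quad x = 2, 3, \ldots, n,
\qquad T = \sum_{j=2}^{n} j^{-(a + b \ln j)}
```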
Construction of an Online Lexicon of Chinese Loan Words and Phrases Translated from English

From Morrison to Pearl S. Buck, the Chinese language has been introducing new words from Western languages, with English as a typical source, for over a hundred years. Generally speaking, this new vocabulary is termed loan words, which can be traced to two major sources: 1. words introduced by Western missionaries who worked in China; 2. words introduced by Chinese intellectuals via Japanese “和制汉词 (hé zhì hàn cí)” that were originally translated from Western literature in the early 1900s. From the perspective of Chinese-English equivalence, these new words in Chinese stand in a one-to-one relation with their English source words, since they were directly or indirectly translated from English; even so, translators may render them as other expressions. Currently, dictionaries of Chinese loan words serve as the vehicles of this new type of vocabulary, but they exist only in print and have a limited number of entries, lagging behind the fast development of information technology and the growing need for instant access to knowledge. Compiling a new lexicon of Chinese loan words that have one-to-one correspondence with English will therefore help translators work with better quality and efficiency.

Lei Wang, Shiwen Yu, Houfeng Wang

Corpus Linguistics

Study on the Annotation Framework of Chinese Logic Complement Semantics

The meaning expressed by elements of negation, degree, tense and aspect, modality, and mood attached to the basic predicate-centered proposition of a sentence is called logic complement semantics; it is embodied as the semantic constraints that logic semantic operators impose on the predicate. Logic complement semantics is an effective supplement to the basic logical meaning and is important for a deep understanding of sentence semantics. This paper presents a preliminary Chinese logic complement semantics annotation framework aimed at deep semantic comprehension: it constructs a classification system covering negation, degree, tense and aspect, and mood on the basis of existing research, builds an operator dictionary, establishes annotation rules, and annotates the logic complement semantics operators of sentences already tagged with basic propositional arguments. Finally, statistics on the annotation results are presented and problems in the annotation process are analyzed.

Kunli Zhang, Yingjie Han, Yuxiang Jia, Lingling Mu, Zhifang Sui, Hongying Zan
Building A Parallel Corpus with Bilingual Discourse Alignment

This paper describes a discourse resource, namely a Chinese-English parallel corpus, built on the idea of bilingual discourse alignment. We introduce a bilingual collaborative annotation approach, which annotates English discourse units based on the Chinese ones and subsequently annotates Chinese discourse structure based on the English one. This approach can ensure full alignment of discourse structure between parallel texts and reduce the cost of annotating texts in two languages. Evaluation of the annotated parallel corpus confirms that the discourse alignment framework is appropriate for parallel texts.

Wenhe Feng, Han Ren, Xia Li, Haifang Guo
NTU-EA: A Graphic Program for Error Annotation

This paper reports the Nanyang Technological University Error Annotation Program (NTU-EA), a language error annotation program with graphic user interface. Compared with previous studies, the program is featured by being efficient, comprehensive, and flexible for error annotation of Chinese language. The results exported from the program can be readily used for various purposes including error analysis and construction of learner corpora. The NTU-EA program is free for download.

Jingxia Lin, Kang Kwong Kapathy Luke
A Comparable Corpus-based Genre Analysis of Research Article Introductions

Research on academic genre has attracted increasing attention from linguists in China and abroad, from a wide variety of perspectives, but the introduction section of research articles (RAs) remains under-researched. Based on our self-compiled large-scale comparable corpus, the Chinese-English Comparable Introduction Corpus (CECIC), this paper presents a multi-level comparative analysis of Chinese and English RA introductions, drawing on the strengths of large corpora: large text volume, multiple functions, and rigorous text selection. Studies based on comparable corpora avoid the “translationese” of parallel corpora. The findings reveal that Chinese RA introductions still differ significantly from international ones and that the irregular distribution of moves in their rhetorical structure remains prominent today, indicating that Chinese RA writers’ genre awareness still needs to be strengthened. By focusing on RA introductions, a weak point in academic research, this study seeks to contribute to genre analysis worldwide and to benefit academic diversity and theoretical innovation.

Guiling Niu
Figurative Language in Emotion Expressions

This paper examines the use of figurative language to express emotions in social media. Based on an analysis of 300 posts from Weibo.com, we argue that figurative language and emotion interact closely. We find that 27% of the posts contain figurative devices such as metaphor, simile, rhetorical questions, and irony. Among the five basic emotions, anger is the most likely to be expressed through figurative devices, followed by sadness, fear, surprise, and happiness. In addition, the data show that rhetorical questions are the figurative device most frequently used to evoke negative emotions, i.e., anger and sadness. We believe that a linguistic account of figurative language in emotion expressions will significantly enhance the effectiveness of existing automatic emotion classification systems.

Sophia Yat Mei Lee
From Linguistic Synaesthesia to Embodiment: Asymmetrical Representations of Taste and Smell in Mandarin Chinese

This paper applies the embodiment theory of metaphor to the study of linguistic synaesthesia. In particular, we account for the distribution of synaesthetic uses of Mandarin adjectives for taste and smell in terms of the degree of embodiment of different bodily experiences. We find that taste frequently serves as both the source domain and the target domain in linguistic synaesthesia of Mandarin adjectives, while smell is productive only as the target domain. In addition, synaesthetic transfer from taste to smell is more dominant than transfer in the reverse direction, i.e., from smell to taste. We therefore propose that a finer-grained theory of embodiment is sorely needed to account for the subtle differences between the synaesthetic patterns of taste and smell in Mandarin adjectives. That is, the degree of embodiment is relevant not only to the traditional dichotomy between bodily and non-bodily events in embodiment theory; it is also crucial for differentiating among physiologically based events, such as those involving sensory modalities, and should therefore be taken into account in the theory of embodiment.

Qingqing Zhao, Chu-Ren Huang, Yat-mei Sophia Lee
A Study on Chinese Vocabulary Learning Strategies of Second Language Learners

This study aims to provide insights into how second language learners of Chinese develop vocabulary learning strategies and to explore the differences among beginning, intermediate, and advanced learners. The results indicate that advanced learners use a wider range of strategies. We also found that learners from different cultures show specific preferences in their use of vocabulary learning strategies.

Geng Zhi, Lu Xiuchuan
A Study on the Distribution Differences of Sentence Group Coherence in Chinese Texts of Different Genres

Chinese sentence groups play an important role in text coherence analysis. Because of the complexity and diversity of the Chinese language, sentence groups in different genres often show different coherence distribution characteristics. This paper analyzes coherence in four corpora of different genres (news, application, prose, and encyclopedia) based on statistical features of the annotations produced by two independent annotators. We analyze the coherence distribution characteristics of sentence groups in the four corpora and compare in detail how sentence group coherence differs across genres. The study lays a foundation for future automatic segmentation of sentence group boundaries and automatic analysis of the relations between sentences.

Tianke Wei, Qiang Zhou, Xuejing Zhang, Xueqiang Lv
The Construction and Application of The Legal Corpus

With the development of Teaching Chinese as an International Language and the professionalization of Chinese learning, legal Chinese has become increasingly important. To support the teaching of legal Chinese and to provide Chinese learners, Chinese teachers, and other legal workers with authentic data, this paper constructs a legal corpus containing 35 legal texts from Mainland China. The texts are automatically segmented into words, and all segmentation results are checked manually. Using quantitative and qualitative methods, this paper analyzes the common vocabulary and the features of legal Chinese, compares the common vocabulary of legal Chinese with that of international Chinese teaching, and compares the common meanings of legal Chinese words with those of common words in the international Chinese vocabulary syllabus. Drawing on the classification of Chinese word levels in The Syllabus of Chinese Vocabulary and Characters Levels [18], this study also classifies the words in the legal corpus and explores applications of the corpus in international Chinese teaching. The study finds many differences between legal Chinese and general Chinese in both common vocabulary and the common meanings of words. Legal vocabulary therefore has particular characteristics that matter for teaching, and existing vocabulary teaching methods cannot be applied directly to legal Chinese vocabulary. Accordingly, this paper proposes several solutions to this problem.

Huiting Luo, Shan Wang
Research on the Lexicography Based on the Corpus of International Chinese Teaching Materials

With the development of computer science and corpus technology, corpora have become widely accepted in lexicography. However, their application has been greatly restricted because traditional corpora lack relevant information. By comparing research in China and abroad, this paper analyzes why Chinese corpora are inadequate for assisting lexicography. Through an analysis of data processing in the diagrammatic Chinese syntactic Treebank, constructed by Beijing Normal University from international Chinese teaching materials, it then shows how this Treebank can avoid the shortcomings of traditional Chinese corpora in assisting lexicography. In addition, following the HSK lexical syllabus and the Modern Chinese Dictionary, we have attempted to compile a dictionary of example sentences with the help of the diagrammatic Chinese syntactic Treebank. Finally, the problems encountered are illustrated, and the important role of corpora in lexicography is emphasized.

Yinbing Zhang, Jihua Song, Weiming Peng, Dongdong Guo, Tianbao Song
Research on Dynamic Words and Their Automatic Recognition in Chinese Information Processing

Many words in Chinese sentences are dynamically constructed “temporary words”. Dynamic words are sentence units that are generally not included in the lexicon and should not be further analyzed as phrase structures during syntactic analysis. The dynamic word problem is one of the key problems in Chinese information processing: resolving it helps unify the granularity of word segmentation results and provides an important basis for efficient and accurate automatic lexical and syntactic analysis. This paper reviews dynamic words in Chinese information processing, analyzes their structural modes, and builds a relatively rigorous and complete knowledge base of dynamic-word structural modes by annotating structural-mode information for dynamic words in a corpus of a certain size. Finally, the automatic recognition of dynamic words is explored in a preliminary way. This paper offers a new approach to lexical analysis in Chinese information processing.

Dongdong Guo, Jihua Song, Weiming Peng
Matching Pattern Acquisition Approach for Ancient Chinese Treebank Construction

A Matching Pattern (MP) is a sequence of words or part-of-speech (POS) tags sampled from clauses, and MP acquisition is an effective approach to ancient Chinese treebank construction. The approach exploits two typical characteristics of ancient Chinese, short clauses and strong patterns, and divides the syntactic annotation process of treebank construction into three stages: (1) obtaining weighted MPs with a syntactic skeleton; (2) applying these MPs to match clauses; and (3) generating the syntactic structures of the matched clauses according to the syntactic skeleton of the MP. In our experiments, the syntactic skeletons are constructed based on the Sentence-based Grammar. MP-based parsing is implemented on both clause and fragment units. Experiments on corpora extracted from Yili and Zuozhuan show that an integrated algorithm involving both clause and fragment units achieves coverage/precision of 99.07%/82.76% on Yili and 97.25%/77.77% on Zuozhuan.
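The three-stage procedure above can be illustrated with a small sketch. This is not the authors' implementation: the `MatchingPattern` class, the `<tag>` slot notation, the bracketed skeleton strings, the exact-length matching rule, and the definitions of coverage (share of clauses matched by some MP) and precision (share of matched clauses whose generated structure equals the gold structure) are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchingPattern:
    """Hypothetical MP: each slot is a literal word or a POS tag written as "<tag>"."""
    slots: tuple
    weight: float
    skeleton: str  # hypothetical bracketed template with {0}, {1}, ... for the words


def slot_matches(slot: str, word: str, pos: str) -> bool:
    # A "<tag>" slot matches on POS; any other slot must equal the word itself.
    if slot.startswith("<") and slot.endswith(">"):
        return slot[1:-1] == pos
    return slot == word


def match_clause(clause, patterns) -> Optional[MatchingPattern]:
    """Stage 2: return the highest-weight MP that matches the clause token by token."""
    candidates = [
        mp for mp in patterns
        if len(mp.slots) == len(clause)
        and all(slot_matches(s, w, p) for s, (w, p) in zip(mp.slots, clause))
    ]
    return max(candidates, key=lambda mp: mp.weight, default=None)


def build_structure(clause, mp: MatchingPattern) -> str:
    """Stage 3: fill the MP's syntactic skeleton with the clause's words."""
    return mp.skeleton.format(*(word for word, _ in clause))


def coverage_precision(results):
    """results: list of (predicted structure or None, gold structure) pairs."""
    matched = [(pred, gold) for pred, gold in results if pred is not None]
    coverage = len(matched) / len(results) if results else 0.0
    precision = sum(pred == gold for pred, gold in matched) / len(matched) if matched else 0.0
    return coverage, precision


if __name__ == "__main__":
    # Stage 1 (acquiring weighted MPs) is assumed done; these toy MPs are invented.
    patterns = [
        MatchingPattern(("<n>", "<v>"), 0.9, "[S [NP {0}] [VP {1}]]"),
        MatchingPattern(("<v>", "<n>"), 0.7, "[VP {0} [NP {1}]]"),
    ]
    clause = [("公", "n"), ("怒", "v")]
    mp = match_clause(clause, patterns)
    print(build_structure(clause, mp))  # [S [NP 公] [VP 怒]]
```

Choosing the highest-weight candidate is one simple way to use MP weights; the paper's actual selection and fragment-level fallback may differ.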

Jing He, Tianbao Song, Weiming Peng, Jihua Song
Annotation Guidelines of Semantic Roles for Semantic Dependency Graph Bank

While annotating the Semantic Dependency Graph corpus, we found that each semantic role contains finer-grained semantic distinctions that annotators can easily tag in different ways. We therefore developed a complete set of annotation guidelines to keep the annotation process objective and consistent. The guidelines are of three types: paradigmatic relations, syntagmatic relations, and semantic features. Starting from the guidelines for subject roles, object roles, and several groups of circumstantial roles, we gradually established a more systematic annotation scheme that reduces confusion in manual annotation. This allows us to build a high-quality corpus and helps computers understand natural language better.

Xinghui Cheng, Yanqiu Shao
Chinese Conjunctions in Second Language Learners’ Written Texts

In Chinese texts, cohesion refers to the grammatical or lexical relationships within sentences and texts. Through these relationships, a series of sentences is connected into a unified text that is intelligible and meaningful. Various studies have found that cohesion is a crucial factor in readability and reading comprehension, and have thus maintained that cohesion in a text influences comprehension [1]. However, most recent studies on Chinese readability [2–5] have overlooked the role of discourse connectives in reading comprehension. In this study, five types of Chinese conjunctions taken from a Chinese Written Corpus (CWC) were analyzed to determine their semantic features and structures and the reasons for learners’ usage errors. The results will contribute to the teaching of Chinese writing by using Chinese conjunctions to improve learners’ writing abilities.

Jia-Fei Hong
Study on Lexical Gap Phenomenon at the Primary Stage of Vocabulary Teaching in TCFL

Research on the lexical gap phenomenon has recently extended beyond whole-word gaps and hypernym-hyponym gaps to word-formation gaps, gaps in lexical development modes, and so on. However, vocabulary teaching in Teaching Chinese as a Foreign Language (TCFL) is still based on language comparison, which overlooks the systematic nature of this phenomenon. Using the common word “Apple” as an example, this study reveals the importance of the lexical gap phenomenon at the primary stage of vocabulary teaching in TCFL. Making rational use of the systematic nature of lexical gaps will help learners increase their morphological awareness.

Xie Jingyi
Study on Chinglish in Web Text for Natural Language Processing

Chinglish in web text is a new language phenomenon that poses problems for automatic analysis in natural language processing. This paper builds a small-scale open Chinglish corpus for NLP and analyzes the linguistic characteristics of Chinglish in web text in terms of vocabulary and grammar, as well as the Chinese-English translation of phrases and sentences. The study can support natural language processing tasks such as machine translation, sentiment analysis, and information extraction.

Bo Chen, Chen Lyu, Ziqing Ji
Construction of a Database of Parallel Phrases in Chinese and Arabic

Parallel corpora are a basic resource for research in bilingual and multilingual natural language processing (NLP), comparative linguistics, and translation studies. The phrase with fixed form and meaning is the basic semantic unit of natural language, and phrasal alignment is a very important application of a parallel corpus. This paper adopts an “analyze-analyze-match” strategy to select phrases for alignment from a parallel corpus of Chinese and Arabic. An aligned phrase database is built, and verbal phrases in Chinese and Arabic are compared. Arabic verbal phrases typically fall into three categories: Verb + Noun Phrase, Verb + Prepositional Phrase, and Particle + Verb. When aligned with Chinese phrases, each category of Arabic verbal phrase corresponds to a number of phrase categories, including not only verb phrases but also noun phrases and prepositional phrases. The translation examples and rules derived from the aligned Arabic and Chinese phrases can play an important supporting role in translation studies and comparative linguistics research.

Alaa Mamdouh Akef, Yingying Wang, Erhong Yang
A Study on the Discourse Connectives in Analects of the Sixth Chan Patriarch Huineng

By annotating the structure of the Analects of the Sixth Chan Patriarch Huineng, this paper studies its explicit and implicit connectives, their semantics, and their usage. Our main findings are as follows. 1) Implicit connectives (2,067; 84.9%) outnumber explicit connectives (369; 15.1%). Of the 17 discourse relations, only two, Hypothesis and Concession, use explicit connectives more often than implicit ones. 2) Synonymous connectives can represent the same relation in different ways. On the one hand, connectives are used most frequently in Continuity (14 times); on the other hand, Summary-elaboration and Background relations can be established without any connectives. 3) Of the 60 kinds of connectives, monosemous connectives outnumber polysemous ones. Polysemous connectives such as “ruo”, “ji”, and “yi” have at most four meanings, which are used in different ways in sentences. In addition, we present case studies of the synonymous connectives used for Hypothesis and of the polysemous connective “ji”.
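The percentages in finding 1) follow directly from the raw counts (2,067 + 369 = 2,436 connectives in total). A minimal check in Python; the helper name `share_percent` is ours, not from the paper:

```python
def share_percent(counts: dict) -> dict:
    """Return each category's share of the total, as a percentage rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


connectives = {"implicit": 2067, "explicit": 369}
print(sum(connectives.values()))   # 2436
print(share_percent(connectives))  # {'implicit': 84.9, 'explicit': 15.1}
```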

Haifang Guo, Tao Liu, Wenhe Feng, Yi Yang
Transitivity Variations in Mandarin VO Compounds—A Comparable Corpus-based Approach

This paper adopts a comparable corpus-based statistical approach to VO compound variations in two varieties of Mandarin Chinese and examines these variations from a transitivity perspective. In recent years, more and more VO compounds have been observed with transitive usages. Previous studies treat the transitivity of VO compounds as a dichotomy, whereas we argue that VO compounds actually differ in their degree of transitivity, especially when variation between varieties of Mandarin is taken into consideration. The degree of transitivity can be measured by both transitivity frequency and semantic/syntactic properties (following the theory of Hopper and Thompson [1]). We compare transitivity in Mainland and Taiwan Mandarin using a corpus-based statistical approach. For both transitivity frequency and semantic/syntactic properties, the results clearly show that Taiwan VO compounds have a higher degree of transitivity than their Mainland counterparts. We further argue that the higher degree of transitivity in Taiwan also illustrates the conservatism of Taiwan Mandarin. This observation is consistent with the earlier study of transitivity variations of light verbs (Jiang et al. [2]) and follows the established null hypothesis in language change that peripheral varieties tend to be more conservative.

Menghan Jiang, Chu-ren Huang
Entrenchment and Creativity in Chinese Quadrasyllabic Idiomatic Expressions

This paper explores a special type of idiomatic expression of even length in Chinese, the Quadrasyllabic Idiomatic Expression (QIE), and explains QIE variations with reference to, on the one hand, the semantic and structural constraints that the QIE construction imposes on its elements and, on the other hand, the interplay of the construction with individual semantic elements in semantic space during the comprehension of QIE variants. Results from both human ratings and a behavioral experiment show that semantic distance, in interaction with constructional entrenchment, affects the speed of comprehension. For QIEs with idiomaticity, semantic distance has no major effect. We show that Chinese QIEs provide an ideal testing ground for the empirical investigation of the functional linguistic notion of entrenchment in processing multi-morphemic strings.

Shu-Kai Hsieh, Chiung-Yu Chiang, Yu-Hsiang Tseng, Bo-Ya Wang, Tai-Li Chou, Chia-Lin Lee
A Study on Chinese Synonyms: From the Perspective of Collocations

Words are often considered synonyms when they share the same meaning. However, subtle differences between synonyms appear in actual language use. Compared with previous approaches, which were mostly based on invented examples, this study uses a corpus-based approach that investigates synonymy more comprehensively through collocations. The paper examines three pairs of synonyms expressing the concept of something not being “real”: wěi-jiǎ, the etymological root, and its derivatives wěi zhuāng-jiǎ zhuāng and xū wěi-xū jiǎ. Dictionaries often simply define the words in each pair in terms of each other without explaining their differences. Using Mutual Information computed from corpus data, this paper analyzes the relation between each word and its collocates in terms of the registers in which they appear, their semantic features, and their prosody.
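The abstract does not specify the exact Mutual Information formula or collocation window. As an assumption, the sketch below uses the common pointwise formulation MI(n, c) = log2(f(n, c) · N / (f(n) · f(c))), where N is the corpus size in tokens and f(n, c) counts occurrences of collocate c within a fixed window around node word n. It omits the window-size correction that some tools apply. The function name `collocate_mi` and the toy tokens are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def collocate_mi(tokens, node, window=2, min_freq=1):
    """Pointwise MI between `node` and each word within +/- `window` tokens of it.

    Assumed formula: MI = log2(f(node, c) * N / (f(node) * f(c))),
    with N the number of tokens; no window-size correction is applied.
    """
    n_tokens = len(tokens)
    freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token in the window around this occurrence of the node.
        for j in range(max(0, i - window), min(n_tokens, i + window + 1)):
            if j != i:
                pair_freq[tokens[j]] += 1
    return {
        c: math.log2(f * n_tokens / (freq[node] * freq[c]))
        for c, f in pair_freq.items()
        if f >= min_freq
    }


# Toy, pre-segmented tokens (invented for illustration, not corpus data).
tokens = ["他", "假装", "生气", "她", "假装", "睡觉", "虚假", "广告"]
for collocate, score in sorted(collocate_mi(tokens, "假装", window=1).items()):
    print(collocate, round(score, 2))
```

Raw MI is known to favor rare collocates, so a `min_freq` threshold (or a complementary measure) is usually applied on real corpora.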

Yeechin Gan
Backmatter
Metadata
Title
Chinese Lexical Semantics
Edited by
Yunfang Wu
Jia-Fei Hong
Qi Su
Copyright year
2018
Electronic ISBN
978-3-319-73573-3
Print ISBN
978-3-319-73572-6
DOI
https://doi.org/10.1007/978-3-319-73573-3
