2018 | Book

Chinese Lexical Semantics

19th Workshop, CLSW 2018, Chiayi, Taiwan, May 26–28, 2018, Revised Selected Papers

About this book

This book constitutes the thoroughly refereed post-workshop proceedings of the 19th Chinese Lexical Semantics Workshop, CLSW 2018, held in Chiayi, Taiwan, in May 2018.
The 50 full papers and 19 short papers included in this volume were carefully reviewed and selected from 150 submissions. They are organized in the following topical sections: Lexical Semantics; Applications of Natural Language Processing; Lexical Resources; Corpus Linguistics.

Table of Contents

Frontmatter

Lexical Semantics

Frontmatter
Noun-Verb Pairs in Taiwan Sign Language

Nouns and verbs that are semantically and formationally related are called noun-verb pairs. Noun-verb pairs are found both in spoken and signed languages. A debate has been raised as to whether the noun and the verb in the pairs are distinguished by syntactic environments or they have a morphological (derivational) relation. Based on the Taiwan Sign Language data we have collected, it was found that nouns and verbs are distinguished more systematically by syntactic environments. Modality effects and non-effects in word formation in spoken versus signed languages are also discussed with a special focus on the role of iconicity.

Jane S. Tsay
A Semantic Analysis of Sense Organs in Chinese Compound Words: Based on Embodied Cognition and Generative Lexicon Theory

This article aims to analyse the four major sense organs of human beings, viz., 眼 (yǎn, eyes), 耳 (ěr, ears), 口/嘴 (kǒu/zuǐ, mouth) and 鼻 (bí, nose), in Chinese compound words with the combination of Generative Lexicon Theory and Embodied Cognition. It was shown that Embodied Cognition gives us an idea of the locus of the source domain in figurative use of organ-related words. Meanwhile, qualia structure in Generative Lexicon Theory, in particular, can be used to examine which sense of the word is activated when combining with other morphemes in a compound word. Moreover, the study found that the involved qualia roles vary in different syntactic structures and metaphorization of the compound words, which further demonstrates different lexical compositionality and productivity of the four basic sense organ words.

Yin Zhong, Chu-Ren Huang
The Functions of 了 liǎo in Singapore Mandarin

This paper provides an account of the functions of 了 liǎo in Singapore Mandarin using spoken data. It is found that while liǎo can perform the grammatical roles of le, it presents its own constraints, particularly when used as a perfective aspect marker. Specifically, it occurs with verbal compounds in clause-final positions – an unacceptable construction in Mainland China Mandarin. Additionally, it is observed that it co-occurs with 了 le at the sentence-final position, resulting in a rather peculiar ‘double-了’ construction. Some explanations are given for these occurrences of liǎo in Singapore Mandarin; most notably, it is hypothesized that Singapore Mandarin has retained the older construction of liǎo and the retention might have been brought on by the language contact with non-Mandarin dialects like Minnanese. The implication of this study is two-fold: (a) it can first shed light on a linguistic variation found in Singapore Mandarin and (b) it can also potentially serve as a reference study for future research conducted on 了 in general.

Yong Kang Khoo
The Conventional Implicature of Dōub (Dōu2, Dōu3): On Semantics of Dōub from the Perspective of Discourse Analysis

In existing studies, dōub (dōu2, dōu3) is treated as a polarity marker and a universal quantifier that sometimes expresses 'having done or been' and has an emphatic function. This paper argues that polarity marking, universal quantification and the expression of 'having done or been' are all conversational implicatures or contextual meanings of dōub sentences. The emphatic function of the dōub construction is a generalized conversational implicature drawn from a plausible inference, but it is not the conventional meaning of dōub. The conventional implicature of dōub indicates that the speaker has made a judgment on the state of affairs described by a proposition, and he/she believes that the possibility of the state of affairs is inferior-to-expectation (or normal). As a kind of rule meaning, it is a non-truth-conditional, procedural and pragmatic meaning. It differs from an ordinary rule meaning, which is explicit, literal, objective and truth-conditional, because it is subjective, non-truth-conditional and implicit. It also differs from a conversational implicature because it is non-cancellable.

Hua Zhong
External Causation and Agentivity in Mandarin Chinese

This work addresses a contrast in the encoding pattern of two kinds of events of caused change in Mandarin Chinese. Caused change of state events are typically expressed with a resultative verb compound, while caused change of location and caused motion events may be expressed with a monomorphemic verb. I argue that this asymmetry arises from two factors. One is a requirement in Mandarin that monomorphemic verbs of causation be agentive, which reflects a prototypical association between causers and volitional agents. The second is an ontological distinction between change of state and change of location. Changes of state may arise spontaneously without an external cause for any kind of individual. In contrast, change of location for one kind of entity – inanimates – requires the mediation of an external agent.

Shiao Wei Tham
Somewhere in COLDNESS Lies Nibbāna: Lexical Manifestations of COLDNESS

This paper starts with an investigation of three coldness-related tactile words, viz. han2 ‘cold’, leng3 ‘cold’ and liang2 ‘cool’, in their synaesthetic and metaphorical uses in Modern Chinese. It is found that leng3 ‘cold’ is most versatile whereas liang2 ‘cool’ is most inert with regard to their synaesthetic and metaphorical mappings, with han2 ‘cold’ standing in the middle. Moreover, han2 ‘cold’ tends to be object-oriented, while liang2 ‘cool’ is likely to be subject-oriented, with leng3 ‘cold’ allowing both subject- and object-oriented readings. We further conduct a study on the uses of these three tactile words in Buddhist texts of Āgamas, finding that liang2 ‘cool’ was consistently employed to refer to the nibbānic status. Apart from this, two counts of leng3 ‘cold’ exhibit the nibbānic meaning. However, han2 ‘cold’ is never attested in this philosophical meaning. It is interesting to note that a kind of tactile feeling is associated with nibbāna, even though nibbānic experience is supposed to transcend sensory experience. This finding, together with some other findings with regard to the relation between sensory expressions and nibbāna, can shed light on the linguistic expressions of the inexpressible nibbāna.

Jiajuan Xiong, Chu-Ren Huang
Grammaticalization of Shuo and Jiang in Singapore Mandarin Chinese: A Spoken-Corpus-Based Study

This study investigates the extended grammatical uses of the speech act verbs shuō ‘say’ and jiǎng ‘say’ in Singapore Mandarin Chinese (SMC). With data from a contemporary spoken corpus, the study finds that while both shuō and jiǎng are major speech act verbs in SMC, one has been more grammatically expanded than the other, not only in SMC but also in comparison with its counterparts in other Mandarin varieties and Chinese dialects. The extension has been motivated by both language-external and language-internal factors. The findings of this study will contribute not only to the typology of speech act verbs and their grammaticalization in general, but also to the study of language variations and changes.

Jingxia Lin
Internal Structures and Constructional Meanings: ‘Da-X-da-Y’ and Its Related Constructions in Mandarin Chinese

Adopting the theoretical framework of Construction Grammar, the present paper aims to examine the internal structures and constructional meanings of six Mandarin idiomatic prefabs: da-X-da-Y ‘big-X-big-Y’, da-X-xiao-Y ‘big-X-small-Y’, xiao-X-xiao-Y ‘small-X-small-Y’, da-X-wu-Y ‘big-X-no-Y’ and wu-X-wu-Y ‘no-X-no-Y’. The analysis has not only identified five constructional meanings among them, but confirmed the weightiness of semantic integration between lexical and constructional senses. The semantic map approach is further applied to characterize these sense relations in illustration of any multiple inheritance nested among distinct constructional meanings. For instance, the sense of intensification or emphasis is prominent in parallel constructions: meanings inherited from the reduplication structure. On the contrary, non-parallel constructions may carry such senses as overallness, equivalence, contrast, or serve a subjunctive mood.

Chiarung Lu, I-Ni Tsai, I-Wen Su, Te-Hsin Liu
Quantifier měi and Two Types of Verbal Classifiers in Mandarin Chinese

This paper discusses the quantifier měi in Mandarin Chinese, especially its co-occurrence with two types of verbal classifiers (ClV), which individualize event-denoting expressions at phase level, such as xià, and at occasion level, such as cì, huí in terms of verbal plurality in the framework of Cusic [1]. The sequence měi-ClV can appear in two types of structures: (i) měi-V-Num(eral)-ClV and (ii) měi-(Num-)ClV, and both require the co-occurrence of an adverb, jiù or dōu in the sentence. It is claimed that according to the co-occurrence with the adverb, jiù or dōu, the měi-quantification over VPs involves different types of pairing relation between měi-ClV-N/VP and the VP after the adverb.

Hua-Hung Yuan
Research on Basic Vocabulary Extraction Based on Chinese Language Learners

The basic vocabulary of language teaching is an important resource for use in the classroom and for the compilation of teaching materials. In this paper, we hypothesize that every Chinese learner has a fuzzy set of basic Chinese vocabulary in mind. We surveyed 309 Chinese language learners and asked them to automatically output their own basic words. Based on these data, we put forward a model for extracting basic words. Through continuous improvement, we have achieved positive results and established a basic vocabulary knowledge base of Chinese language learners. This resource will provide strong support for vocabulary teaching and textbook compilation.

Zhimin Wang, Huizhou Zhao, Junping Zhang, Caihong Cao
Chinese Emotion Commonsense Knowledge Base Construction and Its Application

Commonsense knowledge is pervasive in ordinary human-to-human communication and is very helpful for most natural language processing tasks. However, Chinese commonsense knowledge, especially emotion commonsense knowledge, is still urgently needed. In this paper, we construct a Chinese emotion commonsense knowledge base that optimizes the existing structure of emotion commonsense knowledge bases. First, emotion commonsense knowledge is collected and extracted from a corpus; then HowNet and Tongyici Cilin are used to expand its scale; finally, the entries are manually labeled and the annotation quality is verified. Experimental results on the corpus and dataset show that the Chinese emotion commonsense knowledge base helps to improve text polarity and emotion classification to some degree, and that it can be used in other emotion analysis work.

Liang Yang, Fengqing Zhou, Hongfei Lin, Jian Wang, Shaowu Zhang
A Comparable Corpus-Based Study of Three DO Verbs in Varieties of Mandarin: gao

In this study, we adopt a comparable corpus-based approach to investigate variations of three DO verbs in Mandarin Chinese: zuo ‘do’, gao ‘do’ and congshi ‘be engaged in’. Mandarin Chinese is unique in having three light verbs with bare meaning. The interesting and challenging facts about these three DO verbs are that: first, their usages can be differentiated even though they share the bare minimal meaning of ‘to do’; and second, their ranges of usages vary in different varieties of Chinese. How can the complex differentiations of these three verbs within one variety and across different varieties be accounted for with the minimal shared meaning? We tackle this challenge applying functions from Chinese Word Sketch to effectively identify the subtle differences among near-synonyms and their usage variations among different varieties with explicit semantic cues. This study thus underlines the contribution of empirical approaches when there is very little intuition to rely on.

Menghan Jiang, Chu-Ren Huang
From Near Synonyms to Power Relation Variations in Communication: A Cross-Strait Comparison of “Guli” and “Mianli”

This paper proposes a new approach to the study of stance differences in different speaking communities based on a comparable corpus-based study of near synonyms. In particular, we study the differences in stance implications of the same pair of near synonyms in two varieties of Mandarin Chinese across the Strait: in Taiwan and Mainland China. We show that important communication frame differences such as interpersonal power relation can be encoded lexically, and that sharing the same lexical forms that express different power relations can lead to barriers in communication. More specifically, our study of the uses of near synonyms “guli” and “mianli” adopts both verbal semantic representation of MARVS theory and functional communication theory of Systemic Functional Linguistics. The stance differences in terms of implication of interpersonal relation variations in Taiwan and Mainland China are represented and accounted for in MARVS. Our study synergizes the verbal semantic representation of MARVS theory with the functional communication theory of Halliday’s Systemic Functional Linguistics, especially in terms of tenor and modality. Our results also suggest that comparable corpus-driven, lexical semantics based approaches can provide a strong foundation for stance detection and classification of different communities.

Xiaowen Wang, Chu-Ren Huang
Exploring and Analyzing the Contact-Induced Semantic Transferring Cases Based on a Sanskrit-Chinese Parallel Corpus

Semantic transferring is a special way of producing the new senses of words in the process of language contact. From the perspective of Sanskrit-Chinese language contact, a parallel corpus of Sanskrit and Chinese languages was established and two categories of semantic transferring cases were brought into discussion to analyze their respective motivation. Various cases were provided to illustrate the conditions, processes and results of semantic transferring, which is a cross-language phenomenon that can only be investigated by comparing two languages in detail. Semantic transferring, as a special linguistic phenomenon, can reflect the similarities and differences between the mechanisms of lexical representation across different languages to a certain extent.

Bing Qiu
The Adjective “dà (big)” and Grammatical Analysis of “dà+N” Structure

Adjectives can be semantically classified into intersective adjectives, which modify the whole object referred to by nouns, and non-intersective adjectives, which can also modify the characteristic function of semantic structure in nouns, so that ambiguous interpretations arise. The adjective “dà” (big) is consistent with the above-mentioned adjective classification. “dà” indicating large size is an intersective adjective, while “dà” indicating degree is a non-intersective adjective. Some significant differences in syntactic performance exist in “dà+N” structure with different semantic usages. In addition, nouns have descriptive and hierarchical semantic features, which cause the different semantic interpretations of “dà” in “dà zhuōzi” (big table) and “dà bèndàn” (big idiot). Moreover, the hierarchical property of nouns is consistent with the measurable concept of adjectives. Hence, the semantic structure of nouns can be described formally.

Qiang Li
Acquiring Unaccusative Verbs in a Second Language: An L1-Mandarin L2-English Learner Investigation

This study investigates English unaccusative verbs, definiteness, and word order in native Mandarin speakers whose second language is English. The goal of the paper is to see how L1 Mandarin influences speakers’ learning of the unaccusative structure in English. I propose two hypotheses. Hypothesis (a) proposes that participants judge raised internal arguments as more acceptable than in-situ internal arguments because both indefinite and definite internal arguments are always allowed to move to a subject position (i.e., raise) in Mandarin. Hypothesis (b) proposes that unaccusative constructions where a definite internal argument remains in situ are less acceptable than those where an indefinite one remains in situ because, in Mandarin, only an indefinite internal argument is allowed to remain in situ. The findings support hypothesis (a) but not (b).

Yu-Leng Lin
A Referendum Is a Forward-Moving Object or a Bundled Object?

This article analyzes how a referendum is represented through the use of conceptual metaphors in two major newspapers in Taiwan, the Liberty Times and the United Daily News. The analysis indicates that a general schema for the referendum, in which a causer causes an object to move or stop and which is further divided into forward-moving and stopping sub-schemas, can be retrieved from the metaphors used. Moreover, it shows that three expressions for the forward-moving sub-schema, as lexicalized in the legislation domain, are predominant in the representation of the referendum. Since use of the general schema is unavoidable, the two newspapers adopt strategies to elaborate or neutralize the effects embedded in the schema, so as to convey their respective political stances.

Ren-feng Duann, Kathleen Ahrens, Chu-Ren Huang
A Cognitive Study on Modern Chinese Construction “V-lai-V-qu”

As a commonly used construction in modern Chinese, X-lai (come)-X-qu (go) comes from classical Chinese and is still widely used in both oral and written Chinese today. In this paper we focus primarily on the construction where the two Xs are the same monosyllabic verb (V-lai-V-qu). Based on previous studies, we use theories of cognitive linguistics to explain the syntactic and semantic features of this construction, as well as the different frequencies of its variants. We hope this study can bring new insights to the exploration of cognitive mechanisms in research on similar Chinese verbal constructions.

Xiaolong Lu
A Study on the Type Coercion of the Causer of Chinese Causative Verb-Resultative Construction Based on the Generative Lexicon Theory

Based on the qualia structure and co-composition of Generative Lexicon Theory, as well as the Light Verb Theory, this paper analyzes the logical metonymy and the acceptability of an expression as a causer. We find that the NP expression of the causer of the event in the causative resultative construction undergoes an event coercion. Thus the NP does not represent an entity but an event, in order to adapt to the causative event schema. We also find that the qualia unification of the NP and the VP has a great influence on the acceptance of the causer as well. This paper puts forward a unified interpretation model for the diverse sources of the causer.

Yiqiao Xia, Daqin Li
Towards a Lexical Analysis on Chinese Middle Constructions

This paper aims to provide a formal analysis on Chinese middle constructions in a lexical approach. In analyzing the middle constructions, there are two prominent issues that are discussed: (1) how does the transitive complex predicate become intransitive in the middle constructions? (2) how to assign the semantic roles to the logical object and the implicit argument? To answer the first question, the study makes use of the argument composition technique to append the arguments together in HPSG (Head-driven Phrase Structure Grammar). Then this study makes use of a flat semantics analysis within the MRS (Minimal Recursive Semantics) framework to constrain the semantic relations. The analysis shows that a lexical approach can deal with the syntax-semantics interface issues of the middle constructions with complex predicates.

Lulu Wang
A Study of Color Words in Tang Poetry and Song Lyrics

Throughout history, litterateurs have always attached great importance to the use of color words in their works. Tang poetry and Song lyrics, the most brilliant parts of Chinese literary history, are perfect examples of using color words to create artistic conception and express thoughts and feelings. In this paper, high-frequency color words that appear in “All-Tang Poetry” (the collection of ancient Chinese poetry from 618–907 A.D.) and “All-Song Poetry” (the collection of ancient Chinese lyrics from 960–1279 A.D.) are chosen as research objects, and the optimal algorithm is employed to study the collocation of the color words in Tang poetry and Song lyrics by comparing the effects of PMI and Word2vec on collocation extraction. A detailed analysis of the works of twenty poets of the Tang and Song Dynasties is also carried out from both macro and micro perspectives, from the use of color words in “All-Tang Poetry” and “All-Song Poetry” to that in specific works. As a result, most uses of “white” in Tang poetry express frustration or sadness. “Red”, the most popular word used in Song lyrics, expresses femininity, indicating that Tang poetry expresses aspiration and ambition, while Song lyrics express personal emotions between lovers.

Ying Yang, Zhijun Zheng, Yanqiu Shao
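
The abstract above names PMI (pointwise mutual information) as one of the two collocation-extraction methods compared, without giving details. As a rough illustration only, the sketch below computes sentence-level PMI, log2(P(t, w) / (P(t) P(w))), for words co-occurring with a target word. The function name `pmi_scores`, the line-level co-occurrence window, and the toy English tokens are all assumptions made for this sketch; they are not taken from the paper, whose actual setup is not described here.

```python
import math
from collections import Counter


def pmi_scores(sentences, target):
    """Return {collocate: PMI} for words sharing a line with `target`.

    Probabilities are estimated as the fraction of lines containing a
    word (or a word pair); this windowing choice is an assumption.
    """
    word_counts = Counter()   # number of lines containing each word
    pair_counts = Counter()   # number of lines containing target and the word
    n = len(sentences)
    for tokens in sentences:
        types = set(tokens)   # count each word at most once per line
        word_counts.update(types)
        if target in types:
            pair_counts.update(types - {target})
    scores = {}
    p_target = word_counts[target] / n
    for word, joint in pair_counts.items():
        p_joint = joint / n
        p_word = word_counts[word] / n
        scores[word] = math.log2(p_joint / (p_target * p_word))
    return scores


# Made-up toy lines (hypothetical data, not from the corpora in the paper).
toy = [
    ["white", "hair", "sorrow"],
    ["white", "hair", "autumn"],
    ["red", "sleeve", "autumn"],
    ["white", "cloud", "autumn"],
]
scores = pmi_scores(toy, "white")
```

In this toy data "hair" occurs only alongside "white" and so scores higher than "autumn", which also appears in a line without "white". Real collocation work would typically add a frequency threshold, since PMI tends to overrate rare words.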
A SkE-Assisted Comparison of Three “Prestige” Near Synonyms in Chinese

The discrimination of near synonyms is one of the most important research areas in lexicology and lexical semantics. Traditional comparative studies of near synonyms are mostly introspection-based and corpus-based, both having disadvantages. Sketch Engine (SkE), a tool designed to automatically obtain grammatical and collocational relations of target words from huge amounts of data, helps to avoid subjectivity and solves the problem of utilizing massive amounts of data efficiently. By making use of various functions of Chinese Word Sketch (CWS), this paper distinguishes three Chinese synonymous words mingwang, shengwang and weiwang and finds that shengwang and weiwang are closer in meaning and more similar in grammatical features. Our comprehensive, detailed examination of similarities and differences between the three words through CWS will shed light on Chinese lexicography, near synonym discrimination as well as Chinese vocabulary teaching and learning.

Longxing Li, Chu-Ren Huang, Xuefeng Gao
“Zuì(the Most)+Noun” Structure from the Perspective of Construction Grammar

“最[Zuì](the most)+Noun” structure is widely used, and yet runs counter to traditional grammar rules. In order to explore the theoretical basis of this construction, the paper analyzes “最[Zuì](the most)+Noun” structure from the perspective of the construction coercion mechanism. The research shows that “最[Zuì](the most)+Noun” is a non-common construction structure in the intermediate state of schematic construction and entity construction. The fundamental motivation of construction coercion effect lies in interface conflict, including the contradiction and discord among the syntactic, semantic and pragmatic levels. Meanwhile, construction prototype, cognitive context and use frequency exert an influence on the process of construction coercion effect of “最[Zuì](the most)+Noun” structure.

Bin Yang
Buddhist Influence on Chinese Synesthetic Words—A Case Study of 味 Wèi in the Āgamas and Indigenous Chinese Literature

This study examines whether Chinese synesthetic words that are composed of 味 wèi ‘taste’ were influenced by the spread of Buddhism to China through the translation of Buddhist texts into Chinese. Many terms of 味 wèi found in the Chinese Āgamas go beyond the gustatory field, e.g., the taste of material form and the taste of feeling. To confirm that these terms are derived from their source language, the parallel Pāli texts of the Āgamas are used to identify the original terms of wèi. By doing so, this study reveals the essential meanings of wèi in the Āgamas, which are employed as a yardstick to determine whether they influenced the wèi terms found in Chinese literature across different periods of time. The results show that translated Buddhist texts influenced the Chinese language regarding synesthetic words, as exemplified by the case of wèi.

Jiandao Shi, Jianxun Shi
The Semantic Differences and Substitution Restrictions of -Zhe(着) and Zhengzai(正在)

The substitution of -zhe(着) and zhengzai(正在) is influenced by the temporal meaning, aspectual meaning and the verb situation. This paper analyzes the differences between -zhe and zhengzai in detail in these three areas, and identifies some of their meanings and usages that are rarely mentioned in the prior literature. The temporal and aspectual meanings of -zhe and zhengzai are quite different. It is proposed that -zhe represents aspectual meaning while zhengzai mainly represents temporal meaning. -Zhe can not only indicate progressive, continuous and frequentative aspect of a single event, but also indicate plural events and habitual events. The core meaning of zhengzai is concurrence, which means the process of change occurs simultaneously with the corresponding event. Only on two occasions can -zhe and zhengzai substitute for each other. One is when -zhe and zhengzai represent progressive aspect plus process verb. The other is when they represent frequentative aspect plus instantaneous verb. In all other situations, -zhe and zhengzai are not mutually replaceable.

Jie Fan, Chong-ming Ding
The Expressive Content of the Ad-Adjectival tai ‘too’ in Mandarin Chinese: Evidence from Large Online Corpora

There are two competing analyses for the semantics of the ad-adjectival modifier tai ‘too’ in Mandarin Chinese. On the homophonous account, there are two tais: one tai is a canonical degree adverb, while the other is a subjective intensifier expressing the speaker’s evaluation. On the unified account, tai is essentially a degree adverb, and the subjective evaluation of tai is attributed to pragmatic inferencing in the pragmatic domain. Both of these accounts face empirical challenges. An alternative analysis in the multidimensional, use-conditional framework is proposed. On the present account, the meanings of tai operate on two dimensions: in the descriptive (truth-conditional) dimension, tai contributes some degree-related semantics, while in the expressive (use-conditional) dimension, tai contributes some expressive meaning expressing the speaker’s subjective (mostly negative) evaluation towards x holding to a high degree. Data from large collections of online commodity reviews (more than 12,000 reviews; over 335,000 characters) provides quantitative support for this multidimensional analysis.

Qiongpeng Luo, Fan Liu
A Survey of the Compatibility of Laizhe with the MARVS Model of Event Types

Previous studies have shown that laizhe can be compatible with some types of verbs, and that it is related to a certain concept of past tense. However, there is no consensus about the exact conditions that allow laizhe to surface in a sentence. This study argues that simply examining verbal semantics is insufficient for identifying the use and functions of laizhe, and instead investigates them from the perspective of MARVS-theory event types. This new approach establishes that predicates compatible with laizhe often express an event from a prior time point that, while recent, is still different from the time of the conversation. Thus, two fundamental attributes are needed to account for the distribution of laizhe: that is, the timing of the expression of (1) temporal gaps, and/or (2) different stage-level properties associated with the subject of a sentence.

Tong Mu, Yu-Yin Hsu
The Comparative Construction and the Evolution of guò yú

Guò yú (excessively) is a typical adverb in modern Chinese, but in the early history of the Chinese language it was not a word but a cross-layer syntactic structure. In the course of the evolution of Chinese, guò yú thus developed from a non-word into a word. This article first studies the character of guò yú in different historical periods and depicts its word formation process. Second, it discusses the motivation and mechanism by which guò yú became a word. It then explores the word formation of guò yú as a comparison and as an adverb, respectively. The article finally claims that the comparative construction is the important syntactic soil for guò yú to become a word, and that in this process the semantic prosody of emotion emerged at the pragmatic level.

Qi Rao, Hui Li, Mengxiang Wang, Youjie Zheng
On the Condition of X in “fei X bu ke” Constructions: An Analysis Based on the BCC Corpus

Prosody plays an important role in Chinese studies. This can be observed not only in word building but also in the formation of constructions. An analysis of 1120 instances of the “fei X bu ke” construction retrieved from the BCC corpus shows that, in terms of both the amount of use and the average frequency of use, the quadrisyllabic “fei X bu ke” construction is dominant; that is, a monosyllabic X is preferred in forming the construction. To explain why, this paper analyzes the syllable preference of X in the “fei X bu ke” construction from the perspective of prosody.

Peicui Zhang, Lei Wang, Wentong Sun
A Corpus-Based Lexical Semantic Study of Mandarin Verbs: Guān and Bì

This is a corpus-based study of the near-synonymous pair Guān 關 and Bì 閉 in Mandarin. We adopt the Module-Attribute Representation of Verbal Semantics (MARVS) to analyze the differences between this pair of words. There are three shared senses: ‘to close the opening of a specific object’ (hereafter ‘close an opening’), ‘to stop a machine by using a switch’ (hereafter ‘stop a machine’), and ‘to close stores or agencies’ (hereafter ‘close a store’). The event of ‘close an opening’ in Guān 關 is a composite event of a dual process-state, while Bì 閉 is a composite event of a completive resultative. The event relating to ‘stop a machine’ was not found for Bì 閉. The events of ‘close a store’ for both Guān 關 and Bì 閉 are simplex events of an inchoative state. However, Guān 關 often appears in collocates such as Guānchǎng 關廠, while Bì 閉 has collocates such as Bìguǎn 閉館. It is also discovered that Guān 關 emphasizes not only the process of the activity but also the state after the activity. However, Bì 閉 emphasizes only the state after the activity.

Meng-Chieh Lin, Siaw-Fong Chung
Information-Seeking Questions and Rhetorical Questions in Emotion Expressions

This paper explores the interaction between emotions and two types of questions, namely information-seeking questions and rhetorical questions. Corpus data shows that rhetorical questions (60.3%) are more frequently used in social media than information-seeking questions (39.7%). Of the two types of questions, approximately 94% of rhetorical questions are used to express emotions, while only 23% of information-seeking questions contain emotions. Given that rhetorical questions do play an important role in emotion expressions, we examine the interaction between rhetorical questions and emotions in terms of question type. Various syntactic structures are proposed for the identification of different emotions. We believe that the linguistic account of different types of rhetorical questions in emotion expressions will paint a fuller picture of the nature of emotion.

Helena Yan Ping Lau, Sophia Yat Mei Lee
Verbal Plurality of Frequency Adverbs in Mandarin Chinese: The Case of Tōngcháng (通常)

This paper concerns the adverb tōngcháng, which can be placed either in the topic position at the beginning of a sentence or in the preverbal position. It is analyzed as a verbal plurality marker at the occasion level in the framework of Cusic [1]. We identify the adverb in these two positions as having different semantic properties, namely habituality and iterativity, as defined by Bertinetto and Lenci [2]. We also demonstrate the function of tōngcháng when co-occurring with NP subjects with individual or kind denotation.

Daniel Kwang Guan Chan, Hua-Hung Yuan
A Comparative Analysis of Level 1 of the ICCLE and Learners’ Basic Mental Lexicon

This study compares the level 1 word list of the ICCLE issued by Hanban with the learners’ Basic Mental Lexicon (BML). The results show that 80 words in the ICCLE are verified, meaning they are also found in the BML, accounting for 53.3% of the 150 words. In terms of parts of speech, nouns show the largest difference at 30.36%, while verbs, adverbs and phrases show the smallest at 3.57%. Classifiers, interjections, and auxiliaries are not found in the BML. With regard to topics, the two lists differ in their degree of attention and emphasis. The ICCLE emphasizes function words for achieving grammatical integrity, whereas students focus on learning words related to their daily life. Finally, the results are discussed and corresponding suggestions are put forward to guide the practice of the international teaching of Chinese.

Caihong Cao, Junping Zhang, Huizhou Zhao, Zhimin Wang
Toward a Unified Semantics for Ū in Ū + Situation in Taiwan Southern Min: A Modal-Aspectual Account

Ū in ū + situation in Taiwan Southern Min has been noted, in the literature, to express multiple readings. In this paper, we argue for a modal-aspectual account for ū. The semantics of ū includes an epistemic modal base, a modal force – necessity, and an affirmative ordering source, which orders possible worlds based on the number of propositions affirmed to be true. Furthermore, ū performs event realization, with complications concerning atelic situations.

Jiun-Shiung Wu, Zhi-Ren Zheng
Right Dislocation in Cantonese: An Emotion-Intensifying Device

This paper explores the emotive function of Cantonese right dislocation based on language samples in a large spoken Cantonese database. Right dislocation is found to be highly associated with emotion expressions. In particular, explicit emotion words always appear in the first part of right dislocation. We argue that right dislocation is used as a focus marking device for highlighting emotion information and intensifying emotions. Therefore, right dislocation can serve as one of the linguistic cues for identifying implicit emotions.

Sophia Yat Mei Lee, Christy Choi Ting Lai
The Constructional Form of Meaning and the Difficulty Level of Definition in Chinese Compound Words

The meanings of most Chinese compound words are opaque, and there is no good method for defining them. Using the theory of predication and undergraded predication, this paper analyzes the semantic construction of compound words and summarizes more than 20 semantic constructional forms. On this basis, we further analyze the definition difficulty of compound words and divide it into four levels: level 0 means the meaning is easiest to define; level 1 is easier to define; level 2 is difficult to define; and level 3 is the most difficult to define. We hope this paper can contribute to solving the definition problem of Chinese compound words.

Enxu Wang, Yulin Yuan
A Corpus-Based Study on the Semantic Prosody of Quasi-Affix “Zu”

Semantic prosody has been a hot issue in corpus linguistics since it was put forward in the early 1990s. Some theories suggest that some words in specific contexts are subject to semantic inflection. However, this paper argues that semantic prosody appears not only in collocations but also in words and their internal affixes. In this paper, we choose the most commonly used quasi-affix “Zu” as the research object. Our study explores the semantic prosody tendencies of “X_zu” and analyzes the cognitive mechanism of words constructed as “X_zu”. Based on a large-scale corpus, the analysis shows that “X_zu” words implying negative meanings appear more frequently than those with positive ones.

Yueming Du, Bihua Wang, Lijiao Yang
Placement Verbs in Chinese and English: Language-Specific Lexicalization Patterns

This study aims to show that language-specific distinctions of lexicalization patterns are crucial to verbal semantic studies by examining the differences of Placement verbs in English and Chinese. It argues that cross-linguistic transference of lexical knowledge should not be made without a detailed analysis of seemingly corresponding verbs in different languages. It also probes into the long-debated issue on how languages conceptualize a common event type with distinct lexical and grammatical realizations. By conducting a contrastive study of the lexicalization patterns of placement verbs in Chinese and English, it is proposed that, while a placing event is conceptually universal in taking the basic semantic components of Agent, Theme, Location, and Path, placement verbs in Chinese and English vary in their lexical origins, level of specificity and morpho-semantic subtypes. It is shown that placement verbs are lexicalized and categorized in language-specific ways that have typological implications. Ultimately, the study sheds new light on class-specific, cross-linguistic comparisons.

Meichun Liu, Jui-Ching Chang
A Study on Lexical Ambiguity in Mandarin Chinese

This study explores lexical ambiguity in Mandarin Chinese. Previous studies have concentrated on the lexical ambiguity resolution of nouns and verbs in the sentential context, while others presented different results for ambiguity resolution in English and Italian. Building on the study in [1], which focused on the cognitive processing of strong bias words in the sentential context, the current study proposes two modular and interactive hypotheses and designs two experiments, a production test and a YES-NO test, to distinguish lexical and metaphorical bias found in Mandarin Chinese. This article presents the results of the two experiments, followed by a discussion of the related senses of lexical ambiguity in Mandarin Chinese.

Jia-Fei Hong
The Dynamic Evolution of Common Address Terms in Chinese Based on Word Embedding

Common address terms as an important part of daily communication have been studied with qualitative methods. Inspired by previous methods, this paper proposes a novel method based on word embedding to study the dynamic evolution of common address terms in Chinese. In particular, we first obtained the relevant words of address terms by calculating the distance between word embeddings based on People’s Daily Corpus (1948–2017), then studied the laws of semantic changes of fictive kinship terms and non-kinship terms which belong to common address terms in Chinese through the related words. The results showed that there were significant differences between them.
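
The step of finding the words most related to an address term by embedding distance can be sketched as follows. This is only a minimal illustration, assuming cosine similarity as the distance measure (the abstract does not name one); the function names, the placeholder keys and the toy vectors are hypothetical and are not taken from the paper or from the People’s Daily Corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_words(target, embeddings, k=2):
    """Return the k words whose vectors are closest to the target word."""
    target_vec = embeddings[target]
    scored = [
        (word, cosine_similarity(target_vec, vec))
        for word, vec in embeddings.items()
        if word != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]


# Hypothetical toy vectors standing in for embeddings trained on one period.
toy_embeddings = {
    "term": [1.0, 0.0, 0.0],
    "near_a": [0.9, 0.1, 0.0],
    "near_b": [0.7, 0.7, 0.0],
    "far_c": [0.0, 0.0, 1.0],
}

print(nearest_words("term", toy_embeddings, k=2))
```

Running the same lookup on embeddings trained separately for each period, and comparing the resulting neighbour lists, would give one simple view of how the related words of an address term shift over time.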

Yingdi Jiang, Zhiying Liu, Lijiao Yang
Arguments of the Disposal Construction in Hainan Min

This paper studies the disposal construction in Hainan Min, focusing especially on its argument structure. The ɓue construction in Hainan Min corresponds to the ba construction in Mandarin. Unlike the multiple arguments of ba in Mandarin, the object and subject of ɓue are subject to some semantic restrictions. An examination of the thematic roles of the arguments of ɓue shows that the animacy of the object of ɓue affects the grammaticality of ɓue sentences. Furthermore, the thematic role of the object of ɓue is mostly Patient, and the subject of ɓue is Agent. Following Dowty’s (1991) proto-roles of Patient and Agent, this paper shows that the object of ɓue tends to match the properties of the Patient proto-role and the subject of ɓue tends to match the properties of the Agent proto-role. The arguments of the disposal construction in Hainan Min are strongly associated with the s-selection of the word ɓue.

Hui-chi Lee

Applications of Natural Language Processing

Frontmatter
A Study on Automatic Recognition of Chinese Sentence Pairs Relations Based on CNN

Sentence-pair relations in Chinese discourse play an important role in many natural language processing tasks. Automatic recognition of sentence-pair relations can effectively improve the performance of tasks such as automatic writing and text generation. Among sentence-pair relations, coordination, as the double-nucleus relation, is the most widely distributed. To identify double-nucleus relations automatically, this paper combines a convolutional neural network with word sequence features, takes both semantic and structural characteristics into account, and adds attention to mine the double-nucleus relations. Experiments show that this method can effectively identify double-nucleus relations and that the method is portable.

Xuejing Zhang, Xueqiang Lv, Qiang Zhou, Tianke Wei
A Classification Method for Chinese Word Semantic Relations Based on TF-IDF and CNN

The classification of semantic relations between words is an important part of semantic analysis in natural language research. Automating this classification is significant for the construction of knowledge graphs and for information retrieval. In the NLPCC2017 shared task on Chinese Word Semantic Relations Classification, semantic relations are classified into four categories: synonym, antonym, hyponym and meronym. This paper presents a classification method for Chinese word semantic relations based on TF-IDF and CNN, using both literal and semantic features of words. Four new literal features are proposed, including whether a word is part of another word and the ratio of their common substring. The extraction of semantic features is a four-step process: training a vector model of words on the BaiduBaike Corpus, selecting a set of words most related to a given word from BaiduBaike based on TF-IDF, constructing a vector matrix for the set of related words, and using CNN to get the semantic features of the given word from the vector matrix. The experiment on the NLPCC2017 dataset demonstrates an F1-score of up to 83.91%, which shows that the method is effective at reducing the influence of OOV words.
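
The two literal features named above can be illustrated with a short sketch. It assumes that the "ratio of their common substring" means the length of the longest common substring divided by the length of the longer word; the abstract does not give the exact definition, so this normalization, the function names and the example word pair are assumptions for illustration only.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def literal_features(word1, word2):
    """Two literal features for a word pair (assumed definitions)."""
    longer = max(len(word1), len(word2))
    ratio = longest_common_substring(word1, word2) / longer if longer else 0.0
    return {
        # Whether one word is contained in the other.
        "is_part_of": word1 in word2 or word2 in word1,
        # Assumed normalization: longest common substring / longer word length.
        "common_substring_ratio": ratio,
    }


# Illustrative pair: 汽车 (car) contains 车 (vehicle).
print(literal_features("汽车", "车"))
```

Features of this kind depend only on the characters of the two words, so they remain available for words that have no trained vector.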

Teng Mao, Yuanyuan Peng, Yuru Jiang, Yangsen Zhang
Research on Question Classification Based on Bi-LSTM

Question classification plays an important role in question answering (QA) systems, and its results directly affect the quality of QA. Traditional methods of question classification include rule-based methods and statistical machine learning methods, which require rules to be summarized or question features to be extracted manually. Rule definition and feature selection are subjective and one-sided, which is not conducive to fully understanding the semantic information of questions. To address this problem, this paper proposes a question classification model based on Bi-LSTM. This model combines words, part of speech (POS) and position information of words to generate embedded representations of words, and uses Bi-LSTM to classify questions. The method can efficiently extract the local features of questions and simplify feature engineering. The accuracy of coarse-grained classification on the question classification data set of Harbin Institute of Technology (HIT) reaches 92.38%.

Qian Zhang, Lingling Mu, Kunli Zhang, Hongying Zan, Yadi Li
Optimizing Relation Extraction Based on the Type Tag of Named Entity

Using the named entity’s type tag to construct a unique vector for a class of named entities can solve the problem that named entities are too scattered in the semantic space. In the relation extraction task, the relation of the specified entity pair in each sentence needs to be extracted. However, a general deep learning model cannot effectively reflect the usefulness of the entity pair and its type tag. To solve this problem, this paper studies the characteristics of the named entity’s type tag and proposes two relation extraction models based on the type tags of named entities: a word vector optimization model and a parallel structure optimization model. Experiments on COAE 2016 task 3 show that the parallel structure optimization model based on the named entity’s type tag effectively improves relation extraction performance.

Yixing Zhang, Yangsen Zhang, Gaijuan Huang, Zhengbin Guo
Answer Ranking Based on Language Phenomenon Recognition

Answer ranking is one of the core tasks in question answering, and the quality of question answering greatly depends on its performance. This paper introduces an approach to answer ranking based on language phenomenon identification, that is, identifying language phenomena between a question and its answer sentence candidates and then computing an entailment confidence score between the question and each candidate. Finally, the answers are ranked according to these scores. This paper also introduces a joint model for both language phenomenon identification and entailment recognition, in order to avoid error propagation to some extent and to let the two tasks learn from each other for better overall performance. Experimental results show that the joint learning of language phenomenon identification and entailment recognition is an effective way to rank answers.

Han Ren, Jing Wan, Yafeng Ren, Wenhe Feng
Text Rewriting Pattern Mining Based on Monolingual Alignment

Text rewriting pattern mining is important for stylistic change detection and machine (aided) writing. This paper combined monolingual sentence alignment and monolingual word alignment for text rewriting pattern mining. Edit distance was used to compute sentence similarity for sentence alignment, and a log-linear modification of IBM Model 2 was used for word alignment. We built a rewriting corpus of Jin Yong’s novels, on which quantitative and qualitative experiments were carried out. Rewriting patterns were extracted and classified, including function word usages and some content word usages, which reflected the stylistic shift of the author.
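
Sentence alignment by edit distance can be sketched as below. This is a minimal illustration, not the authors' procedure: it assumes character-level Levenshtein distance, a similarity defined as one minus the distance divided by the longer sentence length, and a greedy best-match rule with a hypothetical threshold of 0.5. The word alignment step with IBM Model 2 is not covered.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sentence_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer sentence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def align_sentences(old_sentences, new_sentences, threshold=0.5):
    """Pair each old sentence with its most similar new sentence, if similar enough."""
    pairs = []
    for old in old_sentences:
        best = max(new_sentences,
                   key=lambda new: sentence_similarity(old, new),
                   default=None)
        if best is not None and sentence_similarity(old, best) >= threshold:
            pairs.append((old, best))
    return pairs


print(align_sentences(["abcd"], ["abcf", "xyz"]))
```

Aligned pairs that are similar but not identical are the natural candidates for extracting rewriting patterns between two editions of a text.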

Yuxiang Jia, Lu Wang, Hongying Zan
Sense Group Segmentation for Chinese Second Language Reading Based on Conditional Random Fields

Second language reading is an important task for second language learners, and sense group reading training can quickly improve a learner’s reading speed. In this paper, we consider text containing syntactic information as experimental data. Sense group segmentation is converted to the problem of sequence annotation, and automatic sense group segmentation is completed based on conditional random fields (CRF). This method provides an auxiliary segmentation approach by which information technology can assist international Chinese teaching.
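
Converting sense group segmentation into a sequence annotation problem can be illustrated as follows. The sketch assumes a simple two-tag scheme in which B marks the first word of a sense group and I marks the remaining words; the tag set actually used in the paper is not stated in the abstract, and the example sentence and its segmentation are hypothetical. A CRF would then be trained to predict these tags from word and syntactic features, which is not shown here.

```python
def groups_to_labels(sense_groups):
    """Flatten sense groups into parallel lists of words and B/I tags."""
    tokens, labels = [], []
    for group in sense_groups:
        for position, word in enumerate(group):
            tokens.append(word)
            labels.append("B" if position == 0 else "I")
    return tokens, labels


def labels_to_groups(tokens, labels):
    """Rebuild sense groups from words and predicted B/I tags."""
    groups = []
    for word, label in zip(tokens, labels):
        if label == "B" or not groups:
            groups.append([word])
        else:
            groups[-1].append(word)
    return groups


# Hypothetical segmentation of a short sentence into two sense groups.
example = [["我", "喜欢"], ["读", "中文", "书"]]
tokens, labels = groups_to_labels(example)
print(list(zip(tokens, labels)))
```

Because the mapping is reversible, the segmentation of a new sentence can be recovered directly from the tag sequence that a sequence labeller outputs.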

Shuqin Zhu, Jihua Song, Weiming Peng, Jingbo Sun
Multi-perspective Embeddings for Chinese Chunking

Chunking is a crucial step in natural language processing (NLP), which aims to divide a text into syntactically correlated but non-overlapping chunks. The task is typically modeled as a sequence labeling problem. Various machine learning algorithms, such as Conditional Random Fields (CRFs) and Support Vector Machines (SVMs), have been successfully used for this task. However, these state-of-the-art chunking systems largely depend on appropriate hand-crafted features. In this paper, we present a recurrent neural network (RNN) framework based on multi-perspective embeddings for Chinese chunking. This framework takes character representations, part-of-speech (POS) embeddings and word embeddings as the input features of the RNN layer. On top of the RNN, we use a CRF layer to jointly decode labels for the whole sentence. Experimental results show that the various embeddings can improve the performance of the RNN model. Although our model uses these embeddings as the only features, it can be successfully used for Chinese chunking without any feature engineering effort.

Chen Lyu, Bo Chen, Donghong Ji
Applying Chinese Semantic Collocation Knowledge to Semantic Error Reasoning

The knowledge of Chinese semantic collocation plays an important role in Chinese semantic understanding. Based on the investigation of the existing semantic collocation knowledge base, this article proposes a method of constructing the Chinese semantic collocation knowledge base which combines dependency parsing with HowNet. Based on the analysis of the existing semantic collocation relationship, the method extracts the collocation relation in the corpus by dependency parsing, and then the semantic information of the collocation knowledge base is generalized based on the sememe information in HowNet. A three-layer semantic collocation knowledge base is constructed. At the same time, based on the semantic collocation knowledge base, a Chinese semantic error detection algorithm is designed and implemented, and its effectiveness is verified by experiments. The semantic collocation knowledge base based on dependency parsing and HowNet is less dependent on word distance and can better deal with the long-distance semantic collocation in text.

Yangsen Zhang, Wenjie Wei, Ruoyu Chen, Gaijuan Huang
Resolution of Personal Pronoun Anaphora in Chinese Micro-blog

Anaphora resolution plays an important role in Chinese micro-blog information mining. Based on the linguistic features of personal pronouns in Chinese micro-blog texts, this paper proposes a multi-strategy method for the resolution of personal pronoun anaphora. Firstly, according to part of speech tagging and named entity recognition, personal pronouns and their candidate antecedents are extracted from Chinese micro-blog texts, and the rules for judging the consistency between a personal pronoun and its antecedents in grammar, semantics, gender and singular-plural are established. The antecedents which are inconsistent with the personal pronoun in these four aspects are preliminarily filtered, and Candidate Set 1 of antecedents is obtained. Then, SVM is used to classify the antecedents in Candidate Set 1, and the antecedents which have certain anaphoric relations with the current personal pronoun are selected to construct Candidate Set 2 of antecedents. Finally, by combination of the four linguistic characteristics of grammatical role, co-occurrence relation, reference distance and appositive dependency, the best antecedent is found out from Candidate Set 2 through the priority selection policy. At the same time, a strategy of extending antecedent is provided to solve the problem that the antecedent of the pronoun can’t be found according to the above method. In this paper, the validity of the proposed method is verified by using NLPCC2013 micro-blog corpus as the experimental data set. The experimental results show that the F value of the proposed method is 91.7% in Chinese micro-blog texts.

Yuanyuan Peng, Yangsen Zhang, Shujing Huang, Ruoyu Chen, Jianqing You
Automatic Chinese Nominal Compound Interpretation Based on Deep Neural Networks Combined with Semantic Features

The present paper reports on the results of the automatic interpretation of Chinese nominal compounds using a CNN-Highway network model combined with semantic features. Chinese nominal compound interpretation aims to identify semantic relations between verbal nouns in compounds such as those meaning “data acquisition and processing” and “wastewater treatment”. The main idea is to define a set of semantic relations of verbal nouns and use a deep neural network classifier with semantic features to automatically assign semantic relations to nominal compounds. Experiments show that our model achieves an 84% F1-score on the test dataset. An architecture of a convolutional layer plus a highway network, combined with semantic features, can effectively solve the problem of Chinese nominal compound interpretation.

Huayong Li, Yanqiu Shao, Yimeng Li
Learning Term Weight with Long Short-Term Memory for Question Retrieval

Most previous methods for question retrieval treat all words as equally important. This paper employs a bidirectional long short-term memory network to predict the salience weight of each word in the question, which is hinted at by the word’s matching status in the answer. Our method is trained on a large corpus of natural question-answer pairs, so it requires no human annotation. We conduct experiments on question retrieval in a cQA dataset. The results show that our model outperforms traditional methods by a wide margin.

Xifeng Huang, Xiang Dai
Improved Implementation of Expectation Maximization Algorithm on Graphic Processing Unit

In our previous work, an efficient implementation of the Expectation-Maximization (EM) algorithm using CUDA was proposed for high-speed word alignment. On a modern graphics processing unit (GPU), that algorithm gains a 16.8-fold speedup over a multi-threaded algorithm and a 234.7-fold speedup over a sequential algorithm. In this paper, we try to improve the algorithm to achieve better performance. Through analysis of the previous algorithm, we find that two parts of the E-step (expectation calculation) are unreasonably designed. An improved CUDA implementation of the EM algorithm is proposed in this paper. Experimental results show that the new algorithm improves the speed of expectation calculation by 29.4%.

Si-Yuan Jing, Rui Sun, Chun-Ming Xie, Peng Jin, Yi Liu, Cai-Ming Liu
A Deep Learning Baseline for the Classification of Chinese Word Semantic Relations

The classification of Chinese word semantic relations is a significant research topic in the field of natural language processing. Compared with studies that identify the relation of word pairs in given texts, the task of context-free lexical relation classification is more challenging due to the lack of context. A common way of solving this problem is to use word embeddings and lexical features to train a classifier. In this paper, we design various combinations of deep learning models and features and propose a joint model based on a convolutional neural network and a highway network. The joint model reaches an F1 value of 0.58 and outperforms all the other deep learning models now available. Furthermore, we design extensive experiments to analyze how the size of the training data and the distribution of the data influence the model’s performance.

Yuning Deng, Mengyi Lu, Huayong Li, Pengyuan Liu
Attention-Based Bi-LSTM for Chinese Named Entity Recognition

As an integral part of deep learning, attention mechanism and bi-directional long short-term memory (Bi-LSTM) are widely used in the field of NLP (natural language processing) and their effectiveness has been well recognized. This paper adopts an attention-based Bi-LSTM approach to the question of Chinese NER (named entity recognition). With the use of word2vec, we compile vectorized dictionaries and employ Bi-LSTM models to train text vectors, with which the output eigenvectors of the attention model are multiplied. Finally, softmax is used to classify vectors in order to achieve Chinese NER. In four different configurations, our experiments describe the impact of the domain relevance of Chinese character vectors, phrase vectors, and vectorized datasets on the effectiveness of Chinese NER. The experimental results show that the standard precision (P), recall (R), and F1-score (F1) are 97.51%, 95.33%, and 96.41% respectively.
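
The three reported figures are mutually consistent under the standard definition of F1 as the harmonic mean of precision and recall, as the short check below shows. The function name is illustrative; only the two input values come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_f1 = f1_score(97.51, 95.33)
print(round(reported_f1, 2))  # 96.41
```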

Kai Zhang, Weiping Ren, Yangsen Zhang

Lexical Resources

Frontmatter
Emotional Knowledge Corpus Construction for Deep Understanding of Text

An emotional knowledge corpus provides data support for deep understanding of text. However, existing resources suffer from incomplete coverage and a lack of emotional skeleton at the semantic level, and the scarcity problem is even more serious from the perspective of pragmatics. To adapt to literary works, we expand the existing emotional lexicon and construct a sentimental phrase knowledge corpus and discourse-based sentimental collocation networks from the perspective of semantics. In addition, since rhetoric is an important component of pragmatics, an emotional knowledge corpus based on rhetoric is constructed. Finally, with the emotional knowledge corpus, a comprehensive and accurate answer to reading and appreciation questions is obtained.

Xin Chen, Yang Li, Suge Wang, Deyu Li, Wanqing Mu
Construction and Application of Chinese Generation Lexicon for Chinese Irregular Collocation Between Verbs and Nouns

With the expanding scope of current research, the processing of irregular collocations has begun to attract scholars’ attention. Based on the Generation Lexicon theory, we build a special word-description system. This knowledge representation system can clearly restore irregular collocations caused by omission or metaphor, and then interpret their internal grouping mechanism and general process. The whole process provides an effective way to deal with irregular collocation processing problems.

Mengxiang Wang, Qi Rao, Houfeng Wang
Construction of Word Sense Tagging Corpus

The key problem of supervised word sense disambiguation is the lack of a large-scale, high-quality word sense tagging corpus. Based on the Contemporary Chinese Semantic Dictionary, the Modern Chinese Dictionary (5th Edition) and the Chinese Lexical Semantic Knowledge Base, this paper analyzes the polysemous adjectives, nouns and verbs in these dictionaries and fuses them to construct the Zhengzhou University Contemporary Chinese Semantic Dictionary. The People’s Daily corpus is selected for annotation, and a word sense tagging corpus of 1.87 million words is constructed. It is expected to provide better data support for natural language processing tasks such as automatic semantic analysis and word sense disambiguation. This paper presents a detailed and rigorous specification of word sense tagging for the annotation process. In addition, the automatic annotation method achieved excellent performance on a new domain corpus, which can serve as a reference for subsequent work.

Hongying Zan, JunYi Chen, XiaoYu Cheng, Lingling Mu
Construction of Chinese Semantic Annotation Resource of Connective Structures

This paper aims to study Chinese connective structures for semantic parsing based on the semantic dependency graph we proposed. The resources we built include three parts: first, a Chinese connectives ontology that contains 1291 words; second, a large-scale Chinese connective structure semantic annotation resource with 20,000 sentences; third, a pattern set of Chinese connective structures that facilitates the next step, automatic analysis. We use manual annotation to ensure the accuracy of the resources. We also compiled a handbook of annotation criteria.

Bo Chen, Chen Lyu, Ziqing Ji
A Research on Construction of Knowledge Fusion Network in Chinese and English Languages

In this paper, we created a prototype of a Chinese-English lexical and encyclopedic knowledge fusion network, CELK-Net. We integrated the Chinese Concept Dictionary (CCD) and the Chinese-English knowledge graph XLore using an automatic mapping algorithm inspired by BabelNet. The result is a localized bilingual semantic network with large coverage of the Chinese language.

Odmaa Byambasuren, Zhifang Sui, Baobao Chang

Corpus Linguistics

Frontmatter
Research on Verb Reduplication Based on the Corpus of International Chinese Textbooks

Verb reduplication is one of the most common and most important ways of reduplication. It occupies a very important position in the field of international Chinese teaching. Using Chinese information processing technology to study vocabulary reduplication in international Chinese teaching, on the one hand, can improve the efficiency of research and make the research of reduplication more systematic and accurate; on the other hand, trying to transform the existing achievements of Chinese information processing into the field of international Chinese teaching is conducive to promoting deep integration in the two fields. This paper first constructs a knowledge base of verb reduplication structural mode by tagging the verb reduplication in the corpus of a certain scale of international Chinese textbooks. Then the characteristics of the verb reduplication in the field of international Chinese teaching are analyzed through the knowledge base. Finally, the automatic recognition of the verb reduplication in the corpus of international Chinese textbooks is studied.

Dongdong Guo, Jihua Song, Weiming Peng, Yinbing Zhang
The Annotation Scheme of Semantic Structure Relations Based on Semantic Dependency Graph Bank

The Semantic Dependency Graph is a kind of deep semantic analysis method. In order to annotate every component of a sentence, we proposed an annotation scheme including three different types of semantic relations: semantic roles, semantic structure relations and semantic marks. Specifically, semantic structure relations deal with situations in which a sentence contains more than one verbal concept. This paper focuses on the three sub-structures of semantic structure relations: Reverse Relations, Nested Relations and Event Relations.

Xinghui Cheng, Yanqiu Shao
Study on Chinese Discourse Semantic Annotation Based on Semantic Dependency Graph

Semantic annotation of discourse has always been one of the most important tasks in natural language processing (NLP). We proposed a semantic representation mechanism for Chinese complex sentences. The semantic relations between sentences and between words can be represented as a semantic dependency graph in the form of a recursive directed graph. We studied the basic definition, the types of relations, and the dependency direction of semantic dependency, and discussed the formal representation mechanism of semantic dependency at the phrase, sentence and discourse levels. The semantic dependency graph model is characterized by allowing multiple correlations as well as recursion and nesting, and its formal representations are shown as recursive directed graphs. The semantic dependency graph can represent the semantic relations between words and between the clauses in the discourse more comprehensively.

Bo Chen, Chen Lyu, Ziqing Ji
A Corpus-Based Approach to Studying Chinese Literal and Fictive Motion Sentences in Fiction

This study aims to determine the distribution of three senses of motion verbs in Mandarin Chinese. A fiction corpus was thus constructed to search for 32 motion verb phrases and collect sentences with these motion verbs. These were classified into three senses: (1) literal meanings; (2) fictive motion involving no actual movement in space; and (3) fictive motion involving metaphorical meanings. The corpus data showed that literal meanings had the highest frequency among the three sense categories, followed by metaphorical meanings and, finally, fictive motions. In addition, it was found that literal and fictive motions functioned as intransitive verbs, often expressing location information. However, metaphorical motions functioned as transitive verbs, often expressing themes. Finally, it was found that the preferential use for some verbs was metaphorical in meaning, while others appear to be used as fictive in meaning. The current study has implications for word sense disambiguation in dealing with multiple meanings of motion verbs in Mandarin Chinese.

Shu-Ping Gong, Zhao-Ying Huang
A Comparative Study on the Coordinate Relation of Chinese Official Documents in Mainland, Hong Kong and Macau

This paper investigates the coordinate relation in Mainland Chinese, Hong Kong Chinese and Macau Chinese, including explicit relations, implicit relations and their parallel items. To conduct the comparative study, a corpus of official documents from Mainland China, Hong Kong and Macau was built. The study shows that the explicit coordinate relation is much more frequent in Hong Kong (53.5%) and Macau texts (23.1%) than in Mainland China texts (1.5%), and that it occurs most frequently in Hong Kong texts. The study also shows that there are many more connective types in Hong Kong (59 types) and Macau texts (21 types) than in Mainland China texts (11 types). Such differences probably derive from the influence of English on Hong Kong and Macau Chinese, according to the comparative results of the coordinate relation in English and Chinese texts of Hong Kong and Macau.

Wenhe Feng, Haifang Guo, Dengxia Cao, Han Ren
Word Sense Comparison Between DCC and GKB

The Dictionary of Contemporary Chinese (DCC) is an authoritative, human-oriented dictionary that defines word senses in natural language. The Contemporary Chinese Grammatical Knowledge Base (GKB) uses detailed syntactic information to describe word senses. Combining DCC and GKB will be helpful for studying word sense distinction and word sense disambiguation (WSD). In this paper, we define the types of alignment between individual word senses and the types of word sense granularity correspondence. We also design a semi-automatic algorithm that uses the similarity of definitions and example sentences to compute word sense alignment. The algorithm turns definitions and example sentences into word sequences and then uses the sememes of HowNet to compute the similarity of the word sequences. The individual word sense alignment and the word sense granularity correspondence are constructed from this similarity. The algorithm was used to build the individual word sense alignment table and the word sense granularity correspondence relation for the words shared by the two dictionaries. Analysis of the granularity correspondence relation showed that the word sense alignment between DCC and GKB is quite complicated. Completely “equal” senses occur mainly in monosemous words. Generally speaking, the word sense granularity of GKB is larger than that of DCC. There are “no correspondence” word senses in both dictionaries. The semantic and syntactic information of the two dictionaries can be integrated on the basis of sense alignment, which will be helpful for solving natural language processing problems such as WSD.

Lingling Mu, Xiaoyu Cheng, Yingjie Han, Hongying Zan
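
The alignment procedure summarized in the DCC/GKB abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything named here is an assumption made for illustration: the toy sememe table `TOY_SEMEMES` stands in for HowNet, Jaccard overlap stands in for the unspecified sememe-based similarity, and the weight `w_def` and the cutoff `threshold` are hypothetical. Definitions and example sentences are assumed to be already segmented into word sequences.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy sememe table standing in for HowNet (not real HowNet data).
TOY_SEMEMES: Dict[str, Set[str]] = {
    "打": {"beat"},
    "击": {"beat"},
    "人": {"human"},
    "电话": {"tool", "communicate"},
    "拨": {"communicate"},
}

Sense = Dict[str, List[str]]  # {"definition": [...words], "examples": [...words]}


def sememes(words: List[str]) -> Set[str]:
    """Map a word sequence to the union of its sememes.
    Words missing from the table fall back to themselves."""
    out: Set[str] = set()
    for w in words:
        out |= TOY_SEMEMES.get(w, {w})
    return out


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; an assumed stand-in for the paper's similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sense_similarity(s1: Sense, s2: Sense, w_def: float = 0.5) -> float:
    """Weighted mix of definition similarity and example-sentence similarity."""
    sim_def = jaccard(sememes(s1["definition"]), sememes(s2["definition"]))
    sim_ex = jaccard(sememes(s1["examples"]), sememes(s2["examples"]))
    return w_def * sim_def + (1.0 - w_def) * sim_ex


def align(
    dcc: Dict[str, Sense], gkb: Dict[str, Sense], threshold: float = 0.3
) -> List[Tuple[str, Optional[str], float]]:
    """For each DCC sense, pick the most similar GKB sense.
    A best score below the threshold is reported as no correspondence (None)."""
    result = []
    for d_id, d_sense in dcc.items():
        best_id, best = max(
            ((g_id, sense_similarity(d_sense, g_sense)) for g_id, g_sense in gkb.items()),
            key=lambda pair: pair[1],
        )
        result.append((d_id, best_id if best >= threshold else None, best))
    return result


# Invented sample entries, for illustration only.
DCC = {
    "d1": {"definition": ["击", "人"], "examples": ["打", "人"]},
    "d2": {"definition": ["拨", "电话"], "examples": ["打", "电话"]},
    "d3": {"definition": ["跑"], "examples": ["跑"]},
}
GKB = {
    "g1": {"definition": ["击"], "examples": ["打", "人"]},
    "g2": {"definition": ["电话"], "examples": ["打", "电话"]},
}

alignment = align(DCC, GKB)
for row in alignment:
    print(row)
```

In this toy data, `d3` has no sufficiently similar GKB sense and is reported with `None`, which mirrors the “no correspondence” case mentioned in the abstract. Because the procedure is described as semi-automatic, candidate pairs like these would presumably still be checked by hand.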
The Study of the Homoatomic Quasi Fixed Phrase

The quasi fixed phrase is situated between the fixed phrase and the free phrase. Research on quasi fixed phrases has mainly focused on a common format of such phrases in order to explore its usage rules and historical evolution. However, there is little research on the categories of the quasi fixed phrase, so this paper draws on a large corpus to study the homoatomic quasi fixed phrase, which is a special category of the quasi fixed phrases. The homoatomic quasi fixed phrase is a phrase that has the same fixed components. Through this study, we found that the fixed components of the homoatomic quasi fixed phrases limit the POS of the replaceable components, and that the semantic relations between the two replaceable components tend to be similar or opposite. In addition, the frequency of the fixed components is related to the frequency of the phrase itself.

Chengyu Du, Pengyuan Liu
The Attention to Safety Issues from Mainland China and Taiwan

Language safety is an important part of national security and it concerns the development of countries and regions. With the increasingly close exchanges between Mainland China and Taiwan, there is more and more in-depth research on the integration and differences of Chinese between them. However, the existing research seldom systematically analyzes safety issues across the Taiwan Strait. Focusing on the global issue of “safety”, this paper uses Tagged Chinese Gigaword (second edition) to explore the safety issues that concern both sides and those that concern each side individually, and analyzes their similarities and differences based on the data of Xinhua News Agency (xin) and Central News Agency (cna) from the 1990s to the beginning of the 21st century. This study can help us better understand Chinese language use and social phenomena on both sides of the Taiwan Strait.

Shan Wang, Xinyan Wang
Backmatter
Metadata
Title
Chinese Lexical Semantics
Edited by
Jia-Fei Hong
Qi Su
Jiun-Shiung Wu
Copyright year
2018
Electronic ISBN
978-3-030-04015-4
Print ISBN
978-3-030-04014-7
DOI
https://doi.org/10.1007/978-3-030-04015-4