About this book

This book constitutes the thoroughly refereed post-workshop proceedings of the 17th Chinese Lexical Semantics Workshop, CLSW 2016, held in Singapore, Singapore, in May 2016.

The 70 regular papers included in this volume were carefully reviewed and selected from 182 submissions. They are organized in topical sections named: lexicon and morphology, the syntax-semantics interface, corpus and resource, natural language processing, case study of lexical semantics, extended study and application.

Table of Contents

Frontmatter

Lexicon and Morphology

Frontmatter

Emotion Lexicon and Its Application: A Study Based on Written Texts

Compared with other language forms, journalese, which is characterized as formal, serious, brief, and standard, in Chinese written texts, carries rich information using few words. Expressing textual meanings, especially personal emotions, by means of appropriate words is quite important in Chinese writing. In teaching Chinese writing, it is easier to compose appropriate texts if teachers and learners interpret emotion words through sense divisions, semantic features, related words, and collocations. This study aimed to build a Chinese emotion lexicon that distinguishes emotion expressions in contexts through the classification of semantic features, as well as provides information via related words and collocations. Moreover, this study applied the characteristics of Chinese emotion words to teach Chinese writing with the aim of improving learners’ writing.

Jia-Fei Hong

A Study on Semantic Word-Formation Rules of Chinese Nouns from the Perspective of Generative Lexicon Theory

——A Case Study of Undirected Disyllable Compounds

Taking undirected disyllabic compound nouns from the Chinese Semantic Word-formation Database as examples, this paper applies qualia structure theory to compound nouns whose lexical meanings cannot be inferred from their morpheme meanings. The paper identifies specific ways in which morpheme meanings can be integrated into lexical meanings. It is hoped that the results will be conducive to the further study of Chinese semantic word-formation, the compilation of dictionaries on motivation, lexicographical definition, and vocabulary teaching in Chinese as a foreign language.

Xiao Wang, Shiyong Kang, Baorong He, Dian Zhang

Polysemous Words Definition in the Dictionary and Word Sense Annotation

By employing the Modern Chinese Dictionary as the semantic system for word sense tagging, this paper analyzes the relationships between the dictionary meanings of polysemous words and their senses in a real corpus. The paper finds three types of relations. First, the dictionary meanings can cover all the word senses in the corpus, but some of them overlap or repeat one another. Second, the dictionary meanings fail to explain some word senses in the corpus because meanings are missing or too narrow. Third, the dictionary meanings exceed the word senses found in the corpus. These phenomena of overlap, narrow scope, absence, and redundancy in dictionary meanings make word sense tagging on a real corpus difficult. Considering both the dictionary and the corpus, the paper identifies the reasons behind these phenomena and proposes methods aimed at compiling better dictionaries and improving word sense annotation on corpora.

Jing Wang, Zhiying Liu, Honfei Jiang

The Motivated Topicalization and Referentiality of the Compound Noun in Chinese

This paper studies the formation process and motivation of the compound noun in Chinese. In contrast to the OV-inversion analysis of previous studies, we claim that the core of its formation process is topicalization. This involves object-fronting movement and relativization based on the downgraded predication structure. The first component stem of the compound is an object-topic, not an inverted object. This topicalization may be attributed to the topic-prominent feature of Chinese. Moreover, the motivation for the object-fronting movement is to obtain referentiality, which results in a process of decategorization.

Jinghan Zeng, Yulin Yuan

An Analysis of Definitions of Poly-meaning Sports’ Entries Based on Generative Lexicon Theory

Applying generative lexicon theory, the paper analyzes how currently popular Chinese dictionaries treat sports entries with multiple meanings, including the ordering of senses, their semantic features, and the explanation of the multiple meanings. It also assesses the advantages and disadvantages of these dictionaries and offers suggestions for improvement.

Haixia Feng, Wanjing Meng

A Finer-Grained Classification of the Morpheme ke in the “ke+X” Adjectives

This paper reanalyzes the morpheme ke in the “ke+X” adjectives and categorizes these adjectives into four classes according to their different degrees of lexicalization. This paper also argues from the perspective of grammaticalization that bound morphemes can also develop into affixes or quasi-affixes.

Fei Qi

Classifiers in Singapore Mandarin Chinese: A Corpus-based Study

While classifiers in Modern Standard Mandarin Chinese have been discussed extensively in the literature, there are key differences in classifiers between Singapore Mandarin Chinese and other varieties of Modern Standard Mandarin Chinese, such as Mainland China Mandarin Chinese. Yet classifiers in Singapore Mandarin Chinese have been minimally explored. With a corpus-based approach involving both written and spoken data sampled from Singapore Mandarin Chinese, this study aims to carry out a comprehensive and systematic investigation of the classifiers in Singapore Mandarin Chinese, and thereafter to compare the classifiers between (a) the written and spoken data of Singapore Mandarin Chinese and (b) Singapore Mandarin Chinese and Mainland China Mandarin Chinese. In addition, this study also looks into the “adjective+classifier” adjectival phrase structure in Singapore Mandarin Chinese. The findings will not only serve as an important reference for future studies of Singapore Mandarin Chinese classifiers, but also contribute to the theoretical discussion on classifiers in general and on language variation and change.

Xuelian Yuan, Jingxia Lin

The “v + n” compound nouns in Chinese: from the perspective of Generative Lexicon Theory

In Chinese, a verbal morpheme and a nominal morpheme can form a variety of compound words, such as compound verbs, compound nouns, and compound adjectives. Among them, the “v + n” compound nouns are not only abundant in quantity but also rich in meaning, and the verbal morphemes show a strong capacity for word-formation. In this paper, we select nearly 2800 disyllabic “v + n” compound nouns from the “Modern Chinese Dictionary (Sixth Edition)” as the research object and study their main semantic modes and semantic combination mechanisms. We find that the functions of the verbal morphemes are downgraded from predication to identification, and that their qualia structures are complex and heterogeneous. The referential transparency of this type of compound noun is very high, but the semantics is not always a simple additive relation, and the semantic transparency of some of these compound nouns is not high.

Yaxi Jin

A Study on the Referents of Chinese Imposters

This paper is concerned with the phenomenon in Chinese whereby some nominal phrases, as DPs with default third person, can refer to the speaker (first person) or the addressee (second person). In English, such nominal phrases are called imposters (Collins & Postal 2012); they refer to the speaker or the addressee while keeping third-person agreement with the verb when they occur in subject position. Unlike English, however, Chinese lacks morphological marking of subject-verb agreement and has no grammatical person forms, so imposters are more common in Chinese, especially in classical Chinese expressions. Drawing on many instances of such use, the paper seeks to explain, from both syntactic and semantic perspectives, why Chinese nominal phrases receive a non-third-person interpretation in some contexts.

Fengcun An, Lei Zhao, Gong Cheng

Quantitative Relation of the Length and the Count of Senses for Chinese Polysyllabic Words

Polysyllabic (i.e. having two or more syllables) words account for a major part in the modern Chinese vocabulary. In Chinese, polysyllabic words are more than a collection of syllables; they are a combination of meaningful morphemes and thus a profound manifestation of the phonetic, semantic and syntactic laws in Chinese language. This study focuses on the polysyllabic words in the Comprehensive Dictionary of Chinese Words and examines the quantitative relation between their number of senses and word length. The data indicate that when the word length increases, the number of senses decreases, and monosemous words are in majority. The negative correlation between number of senses and word length of Chinese polysyllabic words is due to the restriction of word meaning caused by involved morphemic meanings. This reveals a significant difference between Chinese and typical Western languages from a quantitative perspective.

Wei Huangfu, Bing Qiu

Polarity of Chinese Emotion Words: The Construction of a Polarity Database Based on Singapore Chinese Speakers

In this paper, we report a study of the polarity of Chinese emotion words. We conducted a large-scale polarity rating experiment with laymen speakers, and compiled a database of polarity ratings for Chinese emotion words based on these experimental results. The polarity ratings were also compared with previously reported polarity ratings, as well as related emotion word ratings such as emotion category and emotional intensity. The participants in the current study were all Singapore Chinese speakers, but the methodology and the current results will serve as an important reference for future research on sentiment analysis and emotion language in Chinese in a broader context.

Chin Loong Ng, Jingxia Lin, Yao Yao

A Structural and Prosodic Analysis of Trisyllabic New Words

The majority of Chinese new words are trisyllabic, which violates the disyllabic norm of Chinese vocabulary. This paper explores reasons underlying this phenomenon by studying the structure and prosody of the trisyllabic new words.

Jianfei Luo

On the Notion of “Syntactic Word”: its Origin and Evolution

This paper aims to review the previous studies on “syntactic word”, with the purpose of analyzing their strengths and weaknesses, and clarifying the advantages and significance of the diversified revisions, especially the one amended by Feng [5–10].

Huibin Zhuang, Baopeng Ma, Shaoshuai Shen, Peicui Zhang

The Syntax-Semantics Interface

Frontmatter

The formation of the “NP1exp+V+NP2” construction in Chinese

In this paper, I discuss the formation of the “NP1exp+V+NP2” construction from the perspective of historical development and formal syntax. I argue that the “NP1exp+V+NP2” construction comes from a morphological or lexical causative construction and has a competitive relationship with the causative constructions in the process of the historical development of Chinese. The “V+NPtheme” constructions are actually two-place unaccusative sentences with omitted causer or experiencer. Only the “NPtheme+V” construction is the real one-place unaccusative construction in which the NPtheme generates in the object position in deep structure and moves to the subject position in surface structure because of case requirement.

Mengbin Liu

Chinese Word Order Analysis Based on Binary Dependency Relationships

Based on binary dependency relationships between notional words, this paper presents a method for calculating word order, together with a representation scheme. Calculations on example sentences show that the binary dependency relationship product of SOV, OSV, etc. may be turned into the semantic structure of SVO. Therefore, SVO is not only the simplest but also the complete structure of Chinese sentences. In addition, the modifier-core structure prevents the formation of a sentence that is intended to express more content. From this perspective, SVO order and the disharmony between VO and PPV are the same law working in two opposite ways.

Yan He, Yang Liu

From Form to Meaning and to Concept

We propose a way of connecting the form of a sentence to its meaning and then to its concept. Meaning is the pure, authentic, and unadorned semantic interpretation of form, and concept is meaning adorned by social needs and cultural values in a particular speech community. To frame this view, we borrow the Interface Theory proposed by Chomsky. In that theory, a grammar has three major components: (1) The Computational System, (2) The Conceptual System, and (3) The Sensory-motor System. For a sentence x, the computational system creates a syntactic form f(x), gives it a semantic content, sem(x), then sends it to the conceptual system, to adorn it as a concept, con(x), which the sensory-motor system then processes as a phonetic form, p(x). With this procedure of interface, we can derive a language-specific sentence from its universal origin.

Hsin-I Hsieh

A Corpus-Based Analysis of Syntactic-Semantic Relations between Adjectival Objects and Nouns in Mandarin Chinese

In this study, we discuss the syntactic and semantic relations between the adjectival objects of VA verb-object constructions and specific nouns in Mandarin Chinese. First, we show that VA verb-object constructions can function as predicates and modifiers in sentences. We also demonstrate that, because adjectives represent attributes of nouns, the adjectival objects of VA verb-object constructions stand in attribute-entity semantic relations with specific nouns. On the basis of these syntactic functions and attribute-entity semantic relations, we point out that there are two main kinds of syntactic relations between adjectival objects and specific nouns: subject-predicate and modifier-head relations. Moreover, we treat adjectival objects as metonymic expressions. Finally, we argue that adjectives functioning as objects have nominal meanings semantically, while they are not nominalized syntactically.

Lin Li, Pengyuan Liu

A Corpus-based Study on the Structure of V in “进行[jin4xing2]+V” in Modern Chinese

Based on an analysis of previous research, this paper argues that the structure of “V” in “进行[jin4xing2]+V” needs study, and gives a full description and analysis of it based on a large-scale corpus. Among verbs with predicate-object structure, only those that semantically take an external object can enter the “进行[jin4xing2]+V” structure. In terms of syllables, only disyllabic or trisyllabic verbs can be the verbal object of “进行[jin4xing2]”, and the trisyllabic verbs are mainly formed by affixation, with affixes such as “化[hua4]”, “热[re4]”, and “再[zai4]”.

Yujie Liu, Pengyuan Liu

A Study on Causative meaning of Verb-Resultative Construction in Mandarin Chinese Based on the Generative Lexicon Theory

Based on the Generative Lexicon Theory, this paper analyses the co-composition and the qualia projections of the predicate verb denoting an action and the complement verb describing the result in the Verb-Resultative Construction. The paper reveals that the co-composition of the qualia structures results in a derived causative sense of the VP, where the AGENTIVE role of the action verb matches that of the complement verb, and the FORMAL role of the action verb matches that of the complement verb. In consequence, under the qualia unification (QS α (β) = QS α ∩ QS β), the FORMAL role of the complement verb is shared with that of the VP, and the AGENTIVE role of the action verb is shared with that of the VP, resulting in a derived causative and aspectually telic interpretation.

Yiqiao Xia, Daqin Li

Lexical Semantic Constraints on the Syntactic Realization of Semantic Role Possessor

When mapped to syntactic elements, semantic roles are constrained by lexical semantic categories of words that assume the semantic roles. Based on a large-scale annotated corpus, this paper takes the semantic role “Possessor” as an example and analyzes its syntactic-semantic pattern, the influence of lexical semantic category upon its mapping to syntactic elements, the collocations of the semantic categories of noun-core structures and verb-core structures. The research initially reveals the constraints of lexical semantic categories upon the mapping of the semantic role “Possessor” to syntactic elements.

Shiyong Kang, Minghai Zhou, Qianqian Zhang

Research on Collocation Extraction Based on Syntactic and Semantic Dependency Analysis

In this paper we present a collocation extraction method based on automatic semantic analysis. On the basis of semantic dependency analysis, we use co-occurrence frequency and mutual information to extract collocations. We compare it with collocation extraction based on syntactic dependency parsing; the method based on semantic dependency analysis achieves 77.7% accuracy.

Shijun Liu, Yanqiu Shao, Lijuan Zheng, Yu Ding
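The abstract above names co-occurrence frequency and mutual information as the scoring signals for collocation candidates. As a rough illustration only, and not the authors' implementation, the following Python sketch ranks dependency pairs by pointwise mutual information. The function name `pmi_collocations`, the `min_count` threshold, and the toy pairs are all assumptions made for this example; in practice the pairs would come from a dependency parser.

```python
import math
from collections import Counter

def pmi_collocations(pairs, min_count=2):
    """Rank (head, dependent) pairs by pointwise mutual information.

    pairs: list of (head, dependent) tuples, one per observed dependency.
    min_count: hypothetical frequency cut-off to drop rare pairs.
    """
    pair_counts = Counter(pairs)
    head_counts = Counter(h for h, _ in pairs)
    dep_counts = Counter(d for _, d in pairs)
    total = len(pairs)
    scored = []
    for (h, d), n in pair_counts.items():
        if n < min_count:
            continue  # co-occurrence frequency filter
        # PMI = log2( P(h, d) / (P(h) * P(d)) )
        pmi = math.log2(n * total / (head_counts[h] * dep_counts[d]))
        scored.append(((h, d), n, pmi))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored

# Toy dependency pairs (made-up data for illustration).
pairs = (
    [("drink", "tea")] * 3
    + [("drink", "water")] * 2
    + [("read", "book")] * 3
    + [("read", "tea")]
)
ranked = pmi_collocations(pairs)
for pair, count, score in ranked:
    print(pair, count, round(score, 3))
```

The frequency filter removes the single ("read", "tea") pair before scoring, which reflects the usual reason for combining the two signals: PMI alone tends to overrate rare pairs.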

Comitative Relational Nouns and Relativization of Comitative Case in Mandarin

This paper discusses unmarked comitative relative clauses activated by comitative relational nouns. Different from canonical relative clauses, these clauses are restricted syntactically and semantically. This paper points out that comitative relative clauses are directly generated by the conceptual structure of comitative relational nouns rather than by transformation, since the semantic structures of these nouns contain comitative prepositions and their arguments. Therefore, comitative relative clauses are presentations of the downgraded predications of comitative relational nouns at the syntactic level. Besides, a questionnaire survey is conducted in this research to investigate the grammaticality of comitative relative clauses, through which, we find that these constructions, which are triggered by comitative relational nouns, are less grammatical than canonical relative clauses, which are generated by syntactic transformation.

Xin Kou

A Corpus-Based Study on Pseudo-ditransitive Verbs in Mandarin Chinese

Based on the search in the Sinica Corpus, in this paper I present a novel syntactic observation for pseudo-ditransitive verbs in Mandarin Chinese. That is, the internal argument order of certain pseudo-ditransitive verbs in the previous proposal is not complete. The internal arguments of certain pseudo-ditransitive verbs in fact can have two different orders, which is reminiscent of locative alternation observed in English. Consequently, in addition to the proposed lexical categorization in the literature, it is suggested that Mandarin pseudo-ditransitive verbs should also be categorized based on their genuine syntactic behaviors in order to get a full picture of these verbs in Mandarin verbal categorization.

Pei-Jung Kuo

Corpus and Resource

Frontmatter

Construction of the Dynamic Word Structural Mode Knowledge Base for the International Chinese Teaching

Because “word” is defined differently in international Chinese teaching and in Chinese information processing, many achievements in Chinese information processing cannot directly serve international Chinese teaching. In this paper, by studying the dynamic words that frequently appear in international Chinese teaching materials, we design a set of symbols that describe the structural relationships between the internal components of dynamic words and put forward a method for describing the dynamic word structural mode. Finally, the dynamic word structural mode knowledge base is created in parallel with the construction of the international Chinese teaching materials Treebank. The knowledge base provides a resource for research and information processing in the field of international Chinese teaching.

Dongdong Guo, Shuqin Zhu, Weiming Peng, Jihua Song, Yinbing Zhang

EVALution-MAN 2.0: Expand the Evaluation Dataset for Vector Space Models

We introduce EVALution 2.0, a simplified Mandarin dataset for the evaluation of Vector Space Models. We adopt a psycholinguistics-based methodology using a verbal association task, which differs from previous datasets that use corpora and ontologies to construct word relation pairs. Semantic neighbors were created for 100 target words, for which participants produced a surprising 1129 word relation pairs. In a separate agreement-rating task, only 62 pairs were rejected. The methodology has proven to be a way to expand existing resources quickly while maintaining a high level of quality.

Hongchao Liu, Chu-Ren Huang

Constructing the Verb Sub-library in the Electronic Dictionary of NIGEN DABHVR ASAR

The Mongolian classic NIGEN DABHVR ASAR is one of the main representative works of the famous Mongolian writer Yinzhannashi. The Electronic Dictionary of NIGEN DABHVR ASAR is of great use to Mongolian researchers and learners. In this paper, 33634 verbs in the text of NIGEN DABHVR ASAR are analyzed and processed from the perspective of Mongolian information processing, and the compilation methods and content of the Electronic Dictionary of NIGEN DABHVR ASAR are introduced. The paper describes verb entry selection, phonetic notation, part-of-speech information, syllable structure, interpretation, frequency rank, morphological changes, and examples. It then covers the significance of constructing the verb section of the dictionary and briefly discusses the subsequent tasks related to the dictionary’s construction.

Dabhurbayar, Xiaojuan

The Construction Scheme of a Graded Spoken Interaction Corpus for Mandarin Chinese

This paper introduces the construction scheme of a graded spoken interaction corpus for Mandarin Chinese. Material selection and collection principles, corpus annotation, and the development of supporting software are explained. The paper also points out the important and difficult issues in the construction process. The proposed corpus consists of 1 million words (transcribed from 1.5 TB of data), and it is graded and tagged with interaction annotation. The corpus provides researchers with naturally occurring interactions, together with transcriptions and annotations, which make quantitative analysis of spoken interaction (SI) possible. In addition, the corpus provides exemplars graded according to Conversation Analysis (CA) for the reference of other researchers.

Yuelong Wang

Study on Modality Annotation Framework of Modern Chinese

Modality is the speaker’s subjective attitude toward the objective content expressed by a sentence. Modal meaning is important for a deep understanding of sentence semantics. In this paper, a Chinese modality annotation framework aimed at deep semantic understanding is put into preliminary practice: we construct a modal meaning classification system on the basis of existing research, build a modal operator dictionary, establish annotation rules, and annotate the modal operators of sentences that have already been tagged with basic proposition arguments.

Kunli Zhang, Lingling Mu, Hongying Zan, Yingjie Han, Zhifang Sui

The Construction of Sentence-Based Diagrammatic Treebank

The Sentence-based Diagrammatic Treebank is built under the annotation scheme of diagrammatic Chinese syntactic analysis. This scheme takes Sentence Component Analysis (SCA) as its main idea and highlights the importance of sentence pattern structure. This paper first reviews research on diagrammatic Chinese syntactic analysis, then illustrates the typical characteristics of this Treebank with respect to sentence pattern structure, and finally introduces the engineering implementation of the Treebank from a practical perspective.

Tianbao Song, Weiming Peng, Jihua Song, Dongdong Guo, Jing He

Dictionary as Corpus: A Study Case to Reveal the Statistical Trend of Polysyllablization of Chinese Vocabulary

From a macro-level viewpoint, a dictionary is indeed a lexical corpus. 漢語大詞典 (Han Yu Da Ci Dian, literally the Comprehensive Dictionary of Chinese Words), as a summary of ancient and modern vocabulary, provides abundant information through the construction of its entries. Based on a classification of the year in which each entry emerged, information on newly created words in different periods is obtained in order to analyze the development of the Chinese vocabulary system. The quantitative trend of polysyllablization in different periods is thus revealed, which, as a case study, demonstrates a novel perspective on Chinese historical lexicology. The lexical evidence for the periodization of the historical Chinese language is also discussed.

Bing Qiu

Yet Another Resource to Sketch Word Behavior in Chinese Variation

Most corpus-based lexical studies require considerable efforts in manually annotating grammatical relations in order to find the collocations of the target word in corpus data. In this paper, we claim that the current technique of natural language processing can facilitate lexical research by automating the annotation of these relations. We exploit the above technique and report an online open-resource for the comparison of lexical behaviors in cross-strait Chinese variations. The proposed resource is evaluated by juxtaposing the results with previous lexical research based on the same corpus data. The results show that our resource may provide more comprehensive and fine-grained grammatical collocation candidates in the case study.

Meng-Hsien Shih, Shu-Kai Hsieh

Natural Language Processing

Frontmatter

Integrating Character Representations into Chinese Word Embedding

In this paper we propose a novel word representation for Chinese based on a state-of-the-art word embedding approach. Our main contribution is to integrate distributional representations of Chinese characters into the word embedding. Recent related work on European languages has demonstrated that information from inflectional morphology can reduce the problem of sparse data and improve word representations. Chinese has very little inflectional morphology, but there is potential for incorporating character-level information. Chinese characters are drawn from a fixed set – with just under four thousand in common usage – but a major problem with using characters is their ambiguity. In order to address this problem, we disambiguate the characters according to groupings in a semantic hierarchy. Coupling our character embeddings with word embeddings, we observe improved performance on the tasks of finding synonyms and rating word similarity compared to a model using word embeddings alone, especially for low frequency words.

Xingyuan Chen, Peng Jin, Diana McCarthy, John Carroll

Parallel Corpus-based Bilingual Co-training for Relation Classification

This paper proposes a bilingual co-training paradigm for relation classification based on an instance-level parallel corpus aligned between Chinese and English on entity and relation level. Given a small-scale seed set and a large-scale unlabeled corpus, reliable instances induced from the Chinese classifier are iteratively augmented to the English classifier, and vice versa, in order to enhance both classifiers. Experimental results on the Chinese and English parallel corpus show that bilingual co-training can improve relation classification in both languages, especially in English. Moreover, as the size of the seed set and of the iteration batch increases, bilingual co-training can always make consistent improvements, demonstrating its better robustness.

Haotian Hui, Yanqun Li, Longhua Qian, Guodong Zhou

Named Entity Recognition for Chinese Novels in the Ming-Qing Dynasties

This paper presents a Named Entity Recognition (NER) system for Chinese classic novels in the Ming and Qing dynasties using the Conditional Random Fields (CRFs) method. An annotated corpus of four influential vernacular novels produced during this period is used as both training and testing data. In the experiment, three novels are used as training data and one novel is used as the testing data. Three sets of features are proposed for the CRFs model: (1) baseline feature set, that is, word/POS and bigram for different window sizes, (2) dependency head and dependency relationship, and (3) Wikipedia categories. The F-measures for these four books range from 67% to 80%. Experiments show that using the dependency head and relationship as well as Wikipedia categories can improve the performance of the NER system. Compared with the second feature set, the third one can produce greater improvement.

Yunfei Long, Dan Xiong, Qin Lu, Minglei Li, Chu-Ren Huang
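The abstract above describes a baseline CRF feature set built from word/POS values and bigrams within a context window. The Python sketch below shows one plausible way such window features could be assembled for a single token. It is an assumption-laden illustration: the function name `token_features`, the feature key format, the padding symbol, and the POS tags in the example are invented here, and the dependency and Wikipedia features mentioned in the abstract are not covered.

```python
def token_features(tokens, i, window=1):
    """Build a feature dict for token i.

    tokens: list of (word, pos) pairs for one sentence.
    window: how many tokens to look at on each side (hypothetical default).
    """
    feats = {}
    # Word and POS features for every position in the window.
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(tokens):
            word, pos = tokens[j]
        else:
            word, pos = "<PAD>", "<PAD>"  # outside the sentence
        feats[f"w[{off}]"] = word
        feats[f"pos[{off}]"] = pos
    # Word bigram features over adjacent positions in the window.
    for off in range(-window, window):
        left = feats[f"w[{off}]"]
        right = feats[f"w[{off + 1}]"]
        feats[f"w[{off}]|w[{off + 1}]"] = left + "|" + right
    return feats

# Made-up example sentence with illustrative POS tags.
sentence = [("贾宝玉", "NR"), ("笑", "VV"), ("道", "VV")]
features = token_features(sentence, 0)
for key, value in features.items():
    print(key, value)
```

Feature dictionaries of this shape, one per token, are a common input format for CRF toolkits; widening `window` corresponds to the "different window sizes" the abstract mentions.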

Chinese Text Proofreading Model of Integration of Error Detection and Error Correction

In text proofreading, error detection and error correction are inverse processes of each other. In this paper, treating them from the same angle, we put forward the idea of “scattered string concentration” and combine it with bidirectional Pinyin knowledge bases to improve the accuracy of error localization. We close the gap between error detection and error correction, so as to integrate the two effectively. The error detection model for contemporary text is optimized, and the ranking algorithm for error correction is discussed. The experimental results show that the recall of this method is 95.37% and the accuracy is 83%, and that the method has good application prospects.

Yizhuo Sun, Yangsen Zhang, Yanhua Zhang

A Sentence Segmentation Method for Ancient Chinese Texts Based on NNLM

Most ancient Chinese texts have no punctuation or sentence segmentation. Recent research on automatic sentence segmentation of ancient Chinese has usually resorted to sequence labelling models and small data sets. In this paper, we propose a sentence segmentation method for ancient Chinese texts based on neural network language models. Experiments on large-scale corpora indicate that our method is effective and achieves results comparable to the traditional CRF model. Applying a sentence length penalty, using larger Simplified Chinese corpora, or dividing corpora by period can further improve the performance of our model.

Boli Wang, Xiaodong Shi, Zhixing Tan, Yidong Chen, Weili Wang

A Stack LSTM Transition-Based Dependency Parser with Context Enhancement and K-best Decoding

Transition-based parsing is useful for many NLP tasks. To improve parsing accuracy, this paper proposes two enhancements to a transition-based dependency parser with stack long short-term memory: using the context of a word in a sentence, and applying K-best decoding to expand the search space. The experimental results show that the unlabeled and labeled attachment accuracies of our parser are 0.70% and 0.87% higher, respectively, than those of the baseline parser for English, and 0.82% and 0.86% higher, respectively, than those of the baseline parser for Chinese.

Fuxiang Wu, Minghui Dong, Zhengchen Zhang, Fugen Zhou
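The abstract above mentions K-best decoding as a way to expand the search space of a transition-based parser. The sketch below is a generic beam search in Python, offered only to illustrate why keeping several hypotheses can beat greedy decoding. It is not the authors' parser: there is no stack LSTM, and the scoring table `TOY`, the helper names, and the two-step "transition system" are all made up for the example.

```python
import heapq

def k_best_decode(initial, expand, is_final, k=2, max_steps=50):
    """Keep the k highest-scoring hypotheses after every transition.

    expand(state) returns a list of (next_state, log_score_increment).
    The initial state is assumed not to be final.
    """
    beam = [(0.0, initial)]
    finished = []
    steps = 0
    while beam and steps < max_steps:
        candidates = []
        for score, state in beam:
            for nxt, delta in expand(state):
                candidates.append((score + delta, nxt))
        top = heapq.nlargest(k, candidates, key=lambda c: c[0])
        finished.extend(c for c in top if is_final(c[1]))
        beam = [c for c in top if not is_final(c[1])]
        steps += 1
    return sorted(finished, key=lambda c: c[0], reverse=True)[:k]

# Hypothetical log-scores: action "a" looks best at first but leads to poor
# continuations, while "b" is slightly worse at first but ends better.
TOY = {
    (): [("a", -0.5), ("b", -0.6)],
    ("a",): [("x", -2.0), ("y", -2.0)],
    ("b",): [("x", -0.1), ("y", -3.0)],
}

def expand(state):
    return [(state + (act,), delta) for act, delta in TOY.get(state, [])]

def is_final(state):
    return len(state) == 2

greedy = k_best_decode((), expand, is_final, k=1)
wider = k_best_decode((), expand, is_final, k=2)
print(greedy[0], wider[0])
```

With `k=1` the search commits to action "a" and cannot recover, whereas `k=2` keeps "b" alive long enough to find the higher-scoring sequence. That trade of extra computation for a wider search is the general idea behind K-best decoding.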

An Analysis of the Relation Between Similarity Positions and Attributes of Concepts by Distance Geometry

In this paper, we discuss the relation between the attributes of a concept and its similarity position relative to other concepts. We construct a function that maps the similarity position of a concept to its coordinates in a geometric space. The coordinates can be further mapped to the attributes through another function. We construct these functions using distance geometry methods and prove that such functions exist under certain conditions. This work will benefit attribute retrieval tasks.

Hui Liu, Jianyong Duan

Computation of Word Similarity Based on the Information Content of Sememes and PageRank Algorithm

Based on the sememe structure of HowNet and the PageRank algorithm, this article proposes a method to compute word similarity. Using the depth information of HowNet as the information content of sememes and taking sememe hyponymy into account, the method builds a transfer matrix and computes sememe vectors with the PageRank algorithm to obtain sememe similarity. Word similarity can then be calculated from sememe similarity. The method is tested on several groups of typical Chinese words and on the word sense classification of nouns in the Contemporary Chinese Semantic Dictionary (CSD). The results show that the word similarity computed in this way conforms well to the facts. It also yields a more accurate result in the word sense classification of nouns in the CSD, reaching 71.9% consistency with human judgment.

Hao Li, Lingling Mu, Hongying Zan
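
As a rough illustration of the PageRank step mentioned in the abstract above, the following Python sketch runs power iteration on a tiny, made-up hyponymy graph. The sememe names, the edges and the damping factor of 0.85 are hypothetical and do not come from HowNet or from the paper; how the authors build their transfer matrix and weight it by information content is not described in the abstract, so this shows only the generic algorithm.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same teleport share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy hierarchy: each hyponym links to its hypernym.
toy_links = {
    "dog": ["animal"],
    "cat": ["animal"],
    "animal": ["entity"],
    "tool": ["entity"],
    "entity": [],
}
scores = pagerank(toy_links)
```

In this toy graph the scores sum to one, and nodes higher in the hierarchy receive higher scores because rank flows from hyponyms to hypernyms.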

Joint Event Co-reference Resolution and Temporal Relation Identification

Event co-reference and event temporal relations are two important types of event relations that are widely used in many NLP applications, such as information extraction, text summarization, and question answering. Event temporal relations provide much useful semantic and discourse information for more accurate co-reference resolution. However, traditional event co-reference resolution neglects these temporal relations, leading to resolutions that are inconsistent in temporal logic. This paper proposes a joint model for event co-reference resolution and event temporal relation identification on a Chinese event corpus. The experimental results on the ACE corpus show that our model improves performance on both tasks.

Jiayue Teng, Peifeng Li, Qiaoming Zhu, Weiyi Ge

Automatic Classification of Classical Chinese Lyrics and Songs

In this paper, we propose a text classification model for classical Chinese lyrics and songs. 596 Song lyrics and Yuan songs are represented as vectors with the Vector Space Model. The classifiers are based on the Naive Bayes and Support Vector Machine algorithms, both of which performed well in the experiment (with SVM, the F-measure reaches 92.6%). In addition, we examined the performance of the classifiers in sorting atypical texts, with lyrics and songs from the Ming dynasty as the test set. Although the F-measure drops to 79.2%, the result still demonstrates stylistic changes in lyrics in the Ming dynasty.

Yuchen Zhu, Tianqi Qi, Xinning Dong
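
To make the Naive Bayes side of the abstract above more concrete, here is a minimal Python sketch of a multinomial Naive Bayes classifier over bag-of-token counts with add-one smoothing. The training tokens and the labels `lyric` and `song` are invented placeholders, not the authors' data or features, and the smoothing choice is an assumption.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (tokens, label) pairs. Returns the model statistics."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, token_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, token_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(token_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # ignore tokens never seen in training
                score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical placeholder tokens; real input would be words or characters.
training = [
    (["moon", "river", "sorrow", "wine"], "lyric"),
    (["moon", "willow", "sorrow"], "lyric"),
    (["market", "laugh", "drum", "wine"], "song"),
    (["drum", "laugh", "road"], "song"),
]
model = train_naive_bayes(training)
label = predict(model, ["sorrow", "willow", "moon"])
```

A real experiment would replace the placeholder tokens with segmented words or characters from the texts and evaluate on held-out data.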

Clustering of News Topics Integrating the Relationship among News Elements

To make full use of news document structure and the relations among different news documents, we propose a news topic clustering method that uses the relations among document elements. First, word feature weights are calculated by the TF-IDF method, based on word frequency statistics, to generate document space vectors, and news document similarity is calculated with a text similarity measure to obtain the initial news document similarity matrix. Then, the initial similarity matrix is modified using the relations among different news elements as semi-supervised constraint information, the news documents are clustered with the Affinity Propagation algorithm, and news topics are extracted from the news clusters, which completes the construction of the news topic model. Finally, comparative experiments are performed on a manually annotated news corpus. The results show that Affinity Propagation clustering that integrates the relations among document elements achieves better results than clustering without constraint information.

Jiaying Hou, Zhengtao Yu, Xudong Hong, Feng Li
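
The first two steps in the abstract above (TF-IDF vectors, then a similarity matrix adjusted by element relations) can be sketched in Python as follows. The adjustment rule used here, adding a fixed `bonus` when two documents share a news element, is an assumption made for illustration; the abstract does not say how the authors modify the matrix. The documents and element names are made up, and the Affinity Propagation step is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """documents: list of token lists. Returns one {token: weight} dict each."""
    n_docs = len(documents)
    doc_freq = Counter(token for doc in documents for token in set(doc))
    vectors = []
    for doc in documents:
        term_freq = Counter(doc)
        vectors.append({
            token: (count / len(doc)) * math.log(n_docs / doc_freq[token])
            for token, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(documents, elements, bonus=0.2):
    """Cosine similarities, raised by `bonus` when two documents share an element."""
    vectors = tfidf_vectors(documents)
    n = len(documents)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sim = cosine(vectors[i], vectors[j])
            if i != j and elements[i] & elements[j]:
                sim = min(1.0, sim + bonus)  # assumed constraint rule
            matrix[i][j] = sim
    return matrix


docs = [
    ["flood", "river", "rescue", "city"],
    ["flood", "rescue", "team", "city"],
    ["match", "team", "goal", "final"],
]
# Hypothetical news elements (for example place names) per document.
news_elements = [{"PlaceA"}, {"PlaceA"}, {"PlaceB"}]
sim = similarity_matrix(docs, news_elements)
```

The resulting matrix could then be passed to any clustering routine that accepts precomputed similarities.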

Case Study of Lexical Semantics

Frontmatter

Manifestation of Certainty in Semantics:

The Case of Yídìng, Kěndìng and Dǔdìng in Mandarin Chinese

This paper examines three modal adverbials in Mandarin Chinese: yídìng, kěndìng and dǔdìng. These three lexical entries can all express strong epistemic necessity or intensification. However, denoting intensification, kěndìng and dǔdìng have an additional semantic requirement: they both require that there be at least one alternative to the proposition they present. They are different in that the speaker uses kěndìng to ascertain the truth of a proposition it takes, although all the alternatives are potentially true, while dǔdìng is used to assert the speaker’s certainty that only the proposition dǔdìng takes is true. Concerning certainty, two cases are demonstrated here. For yídìng, certainty is expressed implicitly, because certainty manifests itself through the speaker’s attitude. However, for kěndìng and dǔdìng, certainty is revealed explicitly, since (part of) the semantics of these two lexical items is certainty.

Jiun-Shiung Wu

On the Intersubjectivity of Sentence Final Particle-Ne

–A Case Study between Degree Modifiers and SFP-Ne

This research aims to identify the intersubjectivity information encoded in SFP-ne in Mandarin. Building on the theoretical framework of subjectivity and intersubjectivity, we find, based on large-scale corpus data, that the degree adverbs ke, zhen and tai are in complementary distribution when co-occurring with SFPs. We summarize the semantic properties and syntactic characteristics of the combination of ke and SFP-ne by analyzing its stylistic function and its semantic relevance to the lead and answer sentences. We ultimately propose that the interactional functions of SFP-ne include reminding, informing, correcting and refuting.

Yifan He

When Degree Meets Evaluativity: A Multidimensional Semantics for the Ad-adjectival Modifier hăo ‘well’ in Mandarin Chinese

This study provides a semantic account of the Mandarin ad-adjectival modifier hăo ‘(lit.) well’, which is a member of a family of adverbs that yield intensification, display subjectivity and parallel speaker-oriented intensifiers in resisting nonveridical contexts. We argue that hăo is a mixed expressive item. On the one hand, it is like canonical degree adverbs, meaning that some individual x holds to a high degree with respect to some gradable property. On the other, it is a conventional implicature (CI) trigger that contributes an expressive content, expressing the speaker’s strong emotion (surprise, approval, etc.) towards x holding to a high degree. We propose a formal analysis of hăo by incorporating degree semantics into multidimensional logic.

Qiongpeng Luo, Yuan Wang

Verbal Plurality of Frequency Adverbs in Mandarin Chinese: The case of cháng and chángcháng

This paper looks at the similarities and differences between two frequency adverbs, cháng and chángcháng, which have often been confused with each other in the Sinological literature. Since both denote a multiplicity of ‘occasions’, they can be treated as markers of verbal plurality operating at the occasion-level in the sense of Cusic (1981). They differ in two respects: (i) the sentence types that they can occur in (respectively, characterizing vs. particular sentences) and (ii) their semantic functions (respectively, the expression of habituality vs. iterativity).

Hua-Hung Yuan, Daniel Kwang Guan Chan

On the distributive nature of adverbial quan in Mandarin Chinese

The Chinese adverbial quan is analysed as an event predicate modifier that can force a distributive reading on a sentence by targeting a nominal that expresses a plural participant in the event, and encapsulating the distributive function in the θ-role associated with such a participant. This solution enables us to model the speakers’ intuition of an ‘overall evaluation’ associated with quan.

Lucia M. Tovena, Yan Li

The Use of “Ná” and “Hold” Verbs by Bilingual PreSchool Children in Singapore

Verbs form a major category of the lexicon in any language. As our daily actions consist mostly of hand actions, it is important for bilingual speakers to be able to use different specific hand action verbs to describe the corresponding actions. This study focuses on one particular type of hand action, namely holding actions, to examine Singaporean bilingual preschoolers’ competencies in both English and Mandarin through their usage of holding verbs. Thirty bilingual children between the ages of 3 and 6 were recruited for the study. In the experiment, we used the standard PPVT-IV pictures and self-selected pictures of different holding actions as stimuli and asked the children to describe the actions in English and Chinese. The results show that most of the children used the Mandarin word “ná” (拿) and the English word “hold” for almost all the holding scenarios in the experiment.

Ying Tong Yap, Helena Hong Gao

The Polysemy of the Chinese Action Verb “Dǎ” and Its Implications in Child Language Acquisition

The Chinese verb “dǎ” is a polysemous and frequently used verb. Studies have shown that it is one of the earliest verbs acquired by monolingual children; by the age of five, they can use most of its commonly used senses in daily life. But whether it is an easy task for bilingual children to acquire and use the verb in different contexts is unknown. Our study investigated the usage pattern of “dǎ” by 30 Chinese-English bilingual preschool children in Singapore. Visual stimuli depicting “dǎ” actions were used to elicit descriptions from the participants. The results reveal that the meaning representations of “dǎ” in semantic domains such as “social interaction” and “physical punishment” are most commonly used by the children, while those in semantic domains such as “fastening” and “possession” are the least used. This paper discusses the factors that affect the children’s use of the polysemous verb.

Hui Er Sak, Helena Hong Gao

A Metaphorical and Cognitive Study on Idioms with “Ru”

This paper explores the structural characteristics of idioms with “如(Ru)”, focuses on the similarities and differences between the formats “1+Ru+2” and “2+Ru+1”, and summarizes the selection restrictions and metaphorical mapping between tenor and vehicle through an analysis of the word “Ru” in different positions. The study shows that the word “Ru” can occur in any of the four positions of an idiom, with the third position being the most frequent. The “Ru” idioms have different mapping regularities: the tenor is typically not an abstract or unfamiliar thing but the human body or its parts. Concepts of the five elements, such as “gold, wood, water, fire, earth”, are usually selected as sources in the format “2+Ru+1”, as are living things familiar to the ancients.

Zhimin Wang, Lei Wang, Shiwen Yu

Semantic Development: From Hand Act to Speech Act — Metaphors and Different Image Schemas

For the speech act verbs “fureru” in Japanese and “ti2” in Chinese, the function of speaking is derived from the meaning of hand act under the action of metaphor. However, this semantic development is based on different image schemas. “Fureru” is based on “far to near” schema, while “ti2” is based on “low to high” schema, container schema, and focus schema. There also exist similar semantic development paths in English. This study is designed to provide an example for the research of semantic development based on the same cognitive mechanism and different image schemas, thus deepening the understanding of semantic development rules about the verbs of hand act.

Minli Zhou

Computer Semantic Identification of the Adverb “Dou”

Semantic identification of adverbs in Modern Chinese is regarded as an exploratory attempt at improving sentence-level semantic analysis. This paper focuses on two problems: first, how to make a computer identify the meanings of the adverb “Dou”; second, how to make a computer identify the semantic orientation of the adverb “Dou”. Based on a large-scale real-world corpus, the authors investigate the meanings of the adverb “Dou”, which are restricted by various factors and formal features, summarize the formal rules in a systematic way, and build a flow chart that computers are capable of following. We hope that this research will contribute to computer understanding and generation of sentences that contain the adverb “Dou”.

Yong Lu, Pengyuan Liu

“净” (jìng) and “都” (doū), which one has exclusiveness?

This paper argues that “jìng” has exclusiveness, while the basic exclusive meaning does not exist in “doū”. The exclusiveness of “doū” is deduced from its collective meaning. Thus, we can say that “doū” triggers the exclusive meaning or that a “doū” sentence has exclusiveness. In some contexts, the exclusive meaning of a “doū” sentence is defeasible. Moreover, this paper points out that besides the quantifying direction of “jìng” and “doū”, contrastive focus and stress also play an important role in generating the exclusive meaning.

Qiang Li

Constructional Coercion in the Evolution of Contemporary Chinese Morphemes—Taking “Fen”(粉) as an Example

In contemporary Chinese, morphemes are changing with the development of vocabulary. During the evolution of morphemes, constructional coercion has played an important role. It is a mechanism for resolving semantic conflict and type mismatch, and it sometimes changes the meaning of a component of the construction. The Chinese morpheme “fen” (Chinese character “粉”) has evolved from a meaningless transliterated syllable to a nominal morpheme and then to a verb expressing events, which is a chain of semantic evolution from specific to abstract, and from special indication to general reference. Constructional coercion has played a vital role in the process. This paper takes the new Chinese morpheme “fen(粉)” as an example in order to analyze and explain the reasons for its evolution using the theory of constructional coercion, on the basis of a clear trace of the semantic evolution of “fen(粉)”.

Xiaoping Zhang

Applying Chinese Word Sketch Engine to Distinguish Commonly Confused Words

With the worldwide popularization of Chinese language learning and the rapid expansion of Confucius Institutes, teaching Chinese as an international language (TCIL) has recently been developing rapidly all over the world. As a result, teaching commonly confused words from the perspective of non-native Chinese learners has become a necessary concern. Traditional discrimination of commonly confused words relies largely on the sense definitions in authoritative Chinese dictionaries. However, some scholars in and outside China now promote a new corpus-based technology for collocation extraction in Chinese. The latest approach to discriminating commonly confused words uses Chinese Word Sketch (CWS), a powerful tool for extracting meaningful grammatical relations and presenting non-native Chinese learners with an in-depth analysis of synonymous expressions. This method can not only give learners a better understanding of the differences between synonyms or commonly confused words, but also build up their ability to choose a suitable word in different Chinese contexts. This study takes as an example a pair of words that non-native Chinese learners often confuse, 接收 jiēshōu ‘receive’ and 接受 jiēshòu ‘accept’. Based on the Chinese Gigaword Corpus and using CWS, it explores the differences between the two words, showing how to adopt CWS to distinguish commonly confused words and apply the results in error analysis and vocabulary learning.

Yang Wu, Shan Wang

Domain Restrictor Chúfēi and Relevant Constructions

This paper argues that chúfēi is a domain restrictor, which restricts the domain of quantification. More precisely, it has two functions: it can serve as either a marker of the only condition or an exceptive operator. As an only-condition marker, chúfēi marks the condition denoted by its associate element to be the only condition which is quantified by a universal/negated existential quantifier. As an exceptive operator, chúfēi subtracts the condition denoted by its associate element from the domain of a quantifier. In the case that chúfēi occurs in the first clause of a complex sentence, chúfēi usually requires some element such as cái or fŏuzé to co-occur with it because as a unary operator, chúfēi can only take its interacting element to be its argument. Since it fails to take both the subordinate clause and the main clause to be its arguments, the two clauses cannot be related semantically by chúfēi, and a co-occurring element in the main clause is thus needed.

Lei Zhang

A Corpus-Based Study on Near-Synonymous Manner Adverbs: “yiran” and “duanran”

Manner adverbs are one of the major groups of adverbs, describing how an action is carried out. Although a number of near-synonymous manner adverbs are found in Chinese, most dictionaries use rather general or even circular definitions for these items, which persistently confuse second language learners. The subtle yet important differences between near-synonymous manner adverbs are invisible in definitions but observable in collocational behaviors. Thus, this work aims to examine the semantic differences between a pair of near-synonymous manner adverbs, 毅然 and 斷然 (resolutely). We propose that near-synonymous manner adverbs can be differentiated in terms of the event structures constructed by the collocated verbs, conjunctions, and nouns.

Helena Yan Ping Lau, Sophia Yat Mei Lee

The Lexicalization of the Word “Jinqing” (尽情) in Chinese

This paper studies the lexicalization and grammaticalization of the Chinese word “Jinqing” (尽情), with examples from Old Chinese, Middle Chinese, and Modern Chinese. The word has undergone three phases: V+N phrase, verb, and adverb. Reasons for the grammaticalization are proposed and discussed. In addition, the lexicalization of the “Jin+X” structure is also examined.

Xiujuan Chen, Qing Wang, Gaowu Wang

On the Case of Zheyangzi in Mandarin Words of Taiwan and Mainland China

As a Mandarin word, Zheyangzi is widely used in Taiwan, where its usages and part of its meanings are the same as in mainland China, but a comparison between the two areas reveals some disparities in its features. This paper illustrates these disparities through corpus analysis and explains the reasons behind the features shared by this word.

Xiaolong LU

Dou in Kaifeng Dialect

Compared with dou in Mandarin, dou in the Kaifeng dialect shows a higher frequency of use and a richer diversity of usages. The investigation finds that dou in the Kaifeng dialect has two pronunciations, i.e. douk1 [təu33] and douk2 [təu214]. Among the usages of douk1, four correspond to those of dou in Mandarin, one is unique, and the rest, together with those of douk2, largely correspond to those of jiu in Mandarin. This new collection of data can shed light on the study of dou in Chinese.

Shaoshuai Shen, Bing Lu, Huibin Zhuang

About “好说歹说[haoshuodaishuo]”

This paper mainly discusses the structural, semantic and pragmatic features of “好说歹说[haoshuodaishuo]”. First, it can be viewed as a parallel structure of “好说[haoshuo]” and “歹说[daishuo]”, whose main syntactic function is to serve as part of the predicate. Second, its meaning, “repeatedly requesting or persuading by various arguments or means”, is formed through the mechanism of metaphor. In addition, “haoshuodaishuo” is usually used in a pragmatic situation of asking for a favor or persuading another person. In particular, owing to the universal principle of optimism, “haoshuodaishuo” usually implicates an anticipated positive result.

Bin Liu

On the Semantic Patterns and Features of “Shao” Serving as Adverbial in Imperative Sentences

Ambivalent words are seen as a difficult point in both machine translation and Chinese language teaching. Although some research has been done on them, none has been carried out with respect to machine translation and Chinese language teaching. Building on previous research, this article studies the ambivalent word “shao”, including its syntactic and semantic features and the conditions and formal markers under which each meaning appears, exploring new ways of studying ambivalent words for machine translation and Chinese language teaching.

Pu Li

Extended Study and Application

Frontmatter

The MFCC Vowel Space of [ɤ] in Grammatical and Lexical Word in Standard Chinese

This study investigated the phonetic quality of [ɤ] in the light syllables of grammatical words and the non-light syllables of content words. Using the Mandarin Speech Test Materials as the main linguistic materials, audio data from 10 native speakers of Standard Chinese were collected. The Mel Frequency Cepstral Coefficients (MFCCs) of [ɤ] in both conditions were then extracted as the acoustic feature, and the dimension of the MFCCs was reduced through Laplacian eigenmaps in order to construct a 3D acoustic space. An independent t-test was also applied to test the differences in the three principal dimensions between the light and non-light [ɤ]. The results of the 3D vowel space and the independent t-test jointly show that the quality of [ɤ] in light syllables differs from that in non-light syllables. The results therefore suggest that the traditional term light-tone is not an accurate term for describing the phenomenon of light syllables in Mandarin. Furthermore, considering possible variations of vowel quality in different lexical contexts may improve the output of natural speech processing of Standard Chinese.

Yu Chen, Ju Zhang, Huixia Wang, Jin Zhang, Yanting Chen, Hua Lin, Jianwu Dang
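
The independent t-test named in the abstract above can be sketched in Python with the standard library alone. This version computes Student's two-sample t statistic with pooled variance; whether the authors used the pooled or the Welch form is not stated, so that choice is an assumption, and the numbers below are invented coordinates rather than measured data.

```python
import math
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical coordinates of vowel tokens along one reduced dimension.
light = [0.12, 0.15, 0.11, 0.14, 0.13]
non_light = [0.21, 0.24, 0.20, 0.23, 0.22]
t_stat, dof = independent_t(light, non_light)
```

The statistic would be compared against a t distribution with `dof` degrees of freedom, once per reduced dimension.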

On the Lexical Characteristics of Deception

The use of language reflects the way people think. Some research indicates that people show significant differences in their cognitive state when they are lying, and that these differences are further reflected in their external language behavior, which suggests the possibility of distinguishing truth from deception through verbal cues. Extending previous studies, which mainly focus on the analysis of fabricated events or facts, we construct a Chinese opinion-oriented deception corpus. Based on the observation and statistics of different word usages in deceptive and non-deceptive speech, this paper examines the word categories that may serve as cues to distinguish between genuine and deceptive reviews, e.g. second-person pronouns, direct speech and so on.

Qi Su

An Empirical Study of Gender-Related Differences in Topics Selection Based on We-Media Corpus

Our thesis aims to find out the gender-related differences in topic selection in Chinese We-Media through a large classified blog corpus. According to the corpus, there are statistically significant differences in topic selection between men and women. Based on the empirical study, we question previous studies which suggest that men are inclined to choose politics- and economics-related topics. We believe that men tend to change topics according to the communicative context, whereas women remain the same.

Yubo Wang

The Interaction of Semantic and Orthographic Processing During Chinese Sinograms Recognition: An ERP Study

The present study investigated the interaction of semantic and orthographic processing during compound sinogram recognition, using event-related potentials (ERPs) and a picture-word matching task. The behavioral results showed that participants generally needed more time to respond and were more prone to make mistakes when the paired mismatch sinogram was orthographically similar or semantically related to the picture’s matching name. The N400 results indicated a main effect of semantics and a significant interaction of semantics by orthography. Moreover, only under the semantically related condition (S+) was the mean amplitude of the N400 more negative-going in the orthographically similar condition (O+) than in the orthographically dissimilar one (O-), while there was no significant difference under the semantically unrelated condition (S-). Consequently, sub-lexical orthographic information plays an important role in discriminating sinograms that share related semantics.

Hao Zhang, Fei Chen, Nan Yan, Lan Wang, I-Fan Su, Manwa L. Ng

Study on the Effectiveness of the Regulations for Chinese Words with Variant Forms Based on a Long-Span Diachronic Corpus

Chinese words with variant forms are synonymous words with different written forms. They are important objects of language planning. In this article, the diachronic use of Chinese words with variant forms involved in the Consolidated Table of the First Batch of Chinese Words with Variant Forms and the Consolidated Table of the First Batch of Chinese Words with Variant Forms (Draft) is studied using a long-span diachronic corpus and diachronic retrieval systems, and such words are categorized by their diachronic trends. Based on this method, the effectiveness of artificial regulations on Chinese words with variant forms exhibiting different trends in usage is analyzed. The analytical data show that the consolidation and standardization of Chinese words with variant forms were effectively implemented in the language situation of 2002 and 2003 and had a positive significance.

Gaoqi Rao, Meng Dai, Endong Xun

Motion Constructions in Singapore Mandarin Chinese: A Typological Perspective

Research on Singapore Mandarin Chinese has shown that it is influenced, to a certain degree, by dialects such as Min (e.g. Hokkien) and Cantonese. This has resulted in many differences between Mainland China Mandarin Chinese and Singapore Mandarin Chinese. This paper examines one such difference: the expression of self-agentive motion constructions. This study finds that Singapore Mandarin Chinese lies somewhere in between dialects and modern Mandarin Chinese with respect to lexicalization of motion events. The findings suggest that rather than the categorical patterns that have been proposed in many previous studies, the lexicalization patterns in different languages may form a continuum.

Yong Kang Khoo, Jingxia Lin

The Source of Implicit Negation in Mandarin Chinese Yes-no Questions

How is it possible that negation can be expressed without the explicit use of negation markers? This paper answers this question by looking at the cases of implicit negation in different types of yes-no questions in Mandarin Chinese. After arguing that Mandarin Chinese has at least three distinct classes of yes-no questions, and that the three distinct classes form a continuum of semantic features, we consider the sentence-final particle ma (吗), based on its etymology, as a source of the negative meaning in affirmative yes-no questions, or inversion polarity in negative yes-no questions. Finally, we propose that negative yes-no questions can be analysed like English tag questions.

Daniel Kwang Guan Chan

Study in Words and Characters Use and Contents of the Opening Addresses by Chinese University Presidents

This paper selects 213 opening addresses delivered by Chinese university presidents between 2003 and 2015 and studies, both synchronically and diachronically, the usage of words and characters in those addresses as well as their contents as a whole. The location and time at which the speeches were delivered are also taken into consideration. By comparing the words and characters used in those addresses with the Lexicon of Common Words in Contemporary Chinese, this paper shows that the vocabulary used in the addresses does not change dramatically over time, whereas the themes and focuses of the speeches tend to vary from time to time. This paper classifies the speeches into types according to their frequently and individually used words and analyzes the features of their contents. The conclusion is that universities have put more emphasis on cultivating spirit, quality, ability and virtue.

Xinyuan Zhang, Pengyuan Liu

Backmatter
