About this book

This book constitutes the refereed proceedings of the Second International Conference on Statistical Language and Speech Processing, SLSP 2014, held in Grenoble, France, in October 2014. The 18 full papers presented together with three invited talks were carefully reviewed and selected from 53 submissions. The papers are organized in topical sections on machine translation, speech and speaker recognition, machine learning methods, text extraction and categorization, and mining text.

Table of Contents


Invited Talks


Syntax and Data-to-Text Generation

With the development of the web of data, recent statistical, data-to-text generation approaches have focused on mapping data (e.g., database records or knowledge-base (KB) triples) to natural language. In contrast to previous grammar-based approaches, this more recent work systematically eschews syntax and learns a direct mapping between meaning representations and natural language. By contrast, I argue that an explicit model of syntax can help support NLG in several ways. Based on case studies drawn from KB-to-text generation, I show that syntax can be used to support supervised training with little training data; to ensure domain portability; and to improve statistical hypertagging.
Claire Gardent

Spoken Language Processing: Time to Look Outside?

Over the past thirty years, the field of spoken language processing has made impressive progress from simple laboratory demonstrations to mainstream consumer products. However, commercial applications such as Siri highlight the fact that there is still some way to go in creating Autonomous Social Agents that are truly capable of conversing effectively with their human counterparts in real-world situations. This paper suggests that it may be time for the spoken language processing community to take an interest in the potentially important developments that are occurring in related fields such as cognitive neuroscience, intelligent systems and developmental robotics. It then gives an insight into how such ideas might be integrated into a novel Mutual Beliefs Desires Intentions Actions and Consequences (MBDIAC) framework that places a focus on generative models of communicative behaviour which are recruited for interpreting the behaviour of others.
Roger K. Moore

Phonetics and Machine Learning: Hierarchical Modelling of Prosody in Statistical Speech Synthesis

Text-to-speech synthesis is a task that solves many real-world problems such as providing speaking and reading ability to people who lack those capabilities. It is thus viewed mainly as an engineering problem rather than a purely scientific one. Therefore many of the solutions in speech synthesis are purely practical. However, from the point of view of phonetics, the process of producing speech from text artificially is also a scientific one. Here I argue – using an example from speech prosody, namely speech melody – that phonetics is the key discipline in helping to solve what is arguably one of the most interesting problems in machine learning.
Martti Vainio

Machine Translation


A Hybrid Approach to Compiling Bilingual Dictionaries of Medical Terms from Parallel Corpora

Existing bilingual dictionaries of technical terms suffer from limited coverage and are only available for a small number of language pairs. In response to these problems, we present a method for automatically constructing and updating bilingual dictionaries of medical terms by exploiting parallel corpora. We focus on the extraction of multi-word terms, which constitute a challenging problem for term alignment algorithms. We apply our method to two low-resourced language pairs, namely English-Greek and English-Romanian, for which such resources did not previously exist in the medical domain. Our approach combines two term alignment models to improve the accuracy of the extracted medical term translations. Evaluation results show that the precision of our method is 86% and 81% for English-Greek and English-Romanian respectively, considering only the highest ranked candidate translation.
Georgios Kontonatsios, Claudiu Mihăilă, Ioannis Korkontzelos, Paul Thompson, Sophia Ananiadou

Experiments with a PPM Compression-Based Method for English-Chinese Bilingual Sentence Alignment

Alignment of parallel corpora is a crucial step prior to training statistical language models for machine translation. This paper investigates compression-based methods for aligning sentences in an English-Chinese parallel corpus. Four metrics for matching sentences, required for measuring alignment at the sentence level, are compared: the standard sentence length ratio (SLR), and three new metrics, absolute sentence length difference (SLD), compression code length ratio (CR), and absolute compression code length difference (CD). Initial experiments with CR show that the Prediction by Partial Matching (PPM) compression scheme, a method that also performs well on many language modeling tasks, significantly outperforms the standard compression algorithms Gzip and Bzip2. The paper then shows that for sentence alignment of a parallel corpus with ground truth judgments, the compression code length ratio using PPM always performs better than the sentence length ratio, and the difference measurements also work better than the ratio measurements.
Wei Liu, Zhipeng Chang, William J. Teahan
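The four matching metrics can be sketched in a few lines of Python. Here zlib is used only as a convenient stand-in for the PPM compressor the paper evaluates (the paper finds PPM superior to Gzip/Bzip2), and the sentence pair is invented; this illustrates the metric definitions, not the authors' implementation:

```python
import zlib

def code_length(text: str) -> int:
    """Compressed size in bytes; zlib stands in for a PPM coder here."""
    return len(zlib.compress(text.encode("utf-8")))

def matching_metrics(src: str, tgt: str) -> dict:
    """The four sentence-matching metrics compared in the paper."""
    c_src, c_tgt = code_length(src), code_length(tgt)
    return {
        "SLR": len(src) / max(len(tgt), 1),  # sentence length ratio
        "SLD": abs(len(src) - len(tgt)),     # absolute sentence length difference
        "CR": c_src / max(c_tgt, 1),         # compression code length ratio
        "CD": abs(c_src - c_tgt),            # absolute code length difference
    }

print(matching_metrics("The cat sat on the mat.", "猫坐在垫子上。"))
```

In an aligner, candidate sentence pairs whose ratio metrics are close to the corpus-typical value (or whose difference metrics are small) are preferred.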

BiMEANT: Integrating Cross-Lingual and Monolingual Semantic Frame Similarities in the MEANT Semantic MT Evaluation Metric

We present experimental results showing that integrating cross-lingual semantic frame similarity into the semantic frame based automatic MT evaluation metric MEANT improves its correlation with human judgment on evaluating translation adequacy. Recent work shows that MEANT more accurately reflects translation adequacy than other automatic MT evaluation metrics such as BLEU or TER, and that moreover, optimizing SMT systems against MEANT robustly improves translation quality across different output languages. However, in some cases the human reference translation employs different scoping strategies from the input sentence and thus standard monolingual MEANT, which only assesses translation quality via the semantic frame similarity between the reference and machine translations, fails to fairly and accurately reward the adequacy of the machine translation. To address this issue we propose a new bilingual metric, BiMEANT, that correlates with human judgment more closely than MEANT by incorporating new cross-lingual semantic frame similarity assessments into MEANT.
Chi-kiu Lo, Dekai Wu

Speech and Speaker Recognition


Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space

In the last few years, the use of i-vectors along with a generative back-end has become the new standard in speaker recognition. An i-vector is a compact representation of a speaker utterance extracted from a low-dimensional total variability subspace. Although current speaker recognition systems achieve very good results in clean training and test conditions, performance degrades considerably in noisy environments. Compensating for the effect of noise is currently a research subject of major importance. As far as we know, there has been no serious attempt to treat the noise problem directly in the i-vector space without relying on data distributions computed in a prior domain. This paper proposes a full-covariance Gaussian modeling of the clean i-vector and noise distributions in the i-vector space, then introduces a technique to estimate a clean i-vector given the noisy version and the noise density function using a MAP approach. Based on NIST data, we show that it is possible to improve the baseline system performance by up to 60%. A noise-adding tool is used to simulate a real-world noisy environment at different signal-to-noise ratio levels.
Waad Ben Kheder, Driss Matrouf, Pierre-Michel Bousquet, Jean-François Bonastre, Moez Ajili
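Assuming the additive model y = x + n with full-covariance Gaussian distributions x ~ N(μ_x, Σ_x) for the clean i-vector and n ~ N(μ_n, Σ_n) for the noise, the MAP estimate of x has the closed form x̂ = (Σ_x⁻¹ + Σ_n⁻¹)⁻¹ (Σ_x⁻¹ μ_x + Σ_n⁻¹ (y − μ_n)). A minimal NumPy sketch under these assumptions (the notation and variable names are ours; the paper may parameterize the model differently):

```python
import numpy as np

def map_clean_ivector(y, mu_x, cov_x, mu_n, cov_n):
    """MAP estimate of the clean i-vector x from the noisy observation
    y = x + n, with x ~ N(mu_x, cov_x) and n ~ N(mu_n, cov_n).
    Closed form: (Sx^-1 + Sn^-1)^-1 (Sx^-1 mu_x + Sn^-1 (y - mu_n))."""
    p_x = np.linalg.inv(cov_x)  # precision of the clean-i-vector prior
    p_n = np.linalg.inv(cov_n)  # precision of the noise model
    post_cov = np.linalg.inv(p_x + p_n)
    return post_cov @ (p_x @ mu_x + p_n @ (y - mu_n))
```

As a sanity check, when the noise covariance is tiny the estimate approaches y − μ_n, and when it is huge the estimate falls back to the prior mean μ_x.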

Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech

Speaker variability is a well-known problem for state-of-the-art Automatic Speech Recognition (ASR) systems. In particular, handling children's speech is challenging because of substantial differences in the pronunciation of speech units between adult and child speakers. To build accurate ASR systems for all types of speakers, Hidden Markov Models with Gaussian mixture densities have been used intensively in combination with model adaptation techniques.
This paper compares different ways to improve the recognition of children's speech and describes a novel approach relying on a Class-Structured Gaussian Mixture Model (GMM).
A common solution for reducing speaker variability relies on gender and age adaptation. We first propose to replace gender and age classes with unsupervised clustering, and use the resulting speaker classes for adaptation of the conventional HMM. Second, the speaker classes are used to initialize a structured GMM, in which the components of the Gaussian densities are structured with respect to the speaker classes. In a first approach, the mixture weights of the structured GMM are made dependent on the speaker class. In a second approach, the mixture weights are replaced by explicit dependencies between Gaussian components of the mixture densities (as in stranded GMMs, but here the GMMs are class-structured).
The different approaches are evaluated and compared on the TIDIGITS task. The best improvement is achieved when the structured GMM is combined with feature adaptation.
Arseniy Gorin, Denis Jouvet

Physiological and Cognitive Status Monitoring on the Base of Acoustic-Phonetic Speech Parameters

In this paper, the development of an online monitoring system is described whose goal is to track the physiological and cognitive condition of crew members of the Concordia Research Station in Antarctica, with specific regard to depression. Follow-up studies were carried out on recorded speech material: segmental and supra-segmental speech parameters were measured weekly for the individual researchers, and the changes in these parameters were tracked over time. Two kinds of speech were recorded weekly by crew members in their mother tongue: a diary and a tale (“The North Wind and the Sun”). An automatic, language-independent program was used to segment the recordings at the phoneme level for the measurements. In this way, the Concordia Speech Databases were constructed. The acoustic-phonetic parameters followed at Concordia were those that had been statistically selected in earlier research based on the analysis of Seasonal Affective Disorder databases gathered separately in Europe.
Gábor Kiss, Klára Vicsi

Automatic Phonetic Transcription in Two Steps: Forced Alignment and Burst Detection

In the last decade, there has been growing interest in conversational speech in the fields of human and automatic speech recognition. Whereas for the varieties spoken in Germany both resources and tools are numerous, for Austrian German the first corpus of read and conversational speech was collected only recently. In the current paper, we present automatic methods to phonetically transcribe and segment (read and) conversational Austrian German. For this purpose, we developed a two-step transcription procedure: in the first step, broad phonetic transcriptions are created by means of a forced alignment and a lexicon with multiple pronunciation variants per word. In the second step, plosives are annotated at the sub-phonemic level: a burst detector automatically determines whether a burst exists and where it is located. Our preliminary results show that the forced-alignment-based approach reaches accuracies in the range of what has been reported for inter-transcriber agreement on conversational speech. Furthermore, our burst detector outperforms previous tools, with accuracies between 98% and 74% for the different conditions in read speech, and between 82% and 52% for conversational speech.
Barbara Schuppler, Sebastian Grill, André Menrath, Juan A. Morales-Cordovilla

Machine Learning Methods


Supervised Classification Using Balanced Training

We examine supervised learning for multi-class, multi-label text classification. We are interested in exploring classification in a real-world setting, where the distribution of labels may change dynamically over time. First, we compare the performance of an array of binary classifiers trained on the label distribution found in the original corpus against classifiers trained on balanced data, where we make the label distribution as nearly uniform as possible. We discuss the performance trade-offs between balanced and unbalanced training, and highlight the advantages of balancing the training set. Second, we compare the performance of two classifiers, Naive Bayes and SVM, with several feature-selection methods, using balanced training. We combine a Named-Entity-based rote classifier with the statistical classifiers to obtain better performance than either method alone.
Mian Du, Matthew Pierce, Lidia Pivovarova, Roman Yangarber
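One simple way to make a label distribution nearly uniform is to downsample every class to the size of the rarest one. The sketch below is our illustration of that idea, not necessarily the authors' exact balancing procedure (and it shows the single-label case, whereas the paper addresses multi-label data):

```python
import random
from collections import defaultdict

def balance(examples, seed=0):
    """Downsample (text, label) pairs so every label is equally frequent."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    n = min(len(items) for items in by_label.values())  # size of rarest class
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n))
    return balanced
```

The trade-off discussed in the paper follows directly: balancing discards data from frequent classes, but prevents the classifier from being dominated by the majority labels.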

Exploring Multidimensional Continuous Feature Space to Extract Relevant Words

With growing amounts of text data, descriptive metadata become more crucial for processing it efficiently. One kind of such metadata is keywords, which we encounter, for example, in everyday browsing of webpages. Such metadata can be of benefit in various scenarios, such as web search or content-based recommendation. We study the keyword extraction problem from the perspective of vector spaces and present a novel method to extract relevant words from an article, in which we represent each word and phrase of the article as a vector of its latent features. We evaluate our method on a text categorisation problem using the well-known 20-newsgroups dataset and achieve state-of-the-art results.
Márius Šajgalík, Michal Barla, Mária Bieliková

Lazy and Eager Relational Learning Using Graph-Kernels

Machine learning systems can be distinguished along two dimensions. The first is concerned with whether they use a feature-based (propositional) or a relational representation; the second with the use of eager or lazy learning techniques. The advantage of relational learning is that it can capture structural information. We compare several machine learning techniques along these two dimensions on a binary sentence classification task (hedge cue detection). In particular, we use SVMs for eager learning and kNN for lazy learning. Furthermore, we employ kLog, a kernel-based statistical relational learning framework, as the relational framework. Within this framework we also contribute a novel lazy relational learning system. Our experiments show that relational learners are particularly good at handling long sentences, because of long-distance dependencies.
Mathias Verbeke, Vincent Van Asch, Walter Daelemans, Luc De Raedt

Linear Co-occurrence Rate Networks (L-CRNs) for Sequence Labeling

Sequence labeling has wide applications in natural language processing and speech processing. Popular sequence labeling models suffer from some known problems: Hidden Markov models (HMMs) are generative models and cannot encode transition features; conditional Markov models (CMMs) suffer from the label bias problem; and training conditional random fields (CRFs) can be expensive. In this paper, we propose Linear Co-occurrence Rate Networks (L-CRNs) for sequence labeling, which avoid these problems. The factors of L-CRNs can be locally normalized and trained separately, which leads to a simple and efficient training method. Experimental results on real-world natural language processing data sets show that L-CRNs reduce training time by orders of magnitude while achieving results very competitive with CRFs.
Zhemin Zhu, Djoerd Hiemstra, Peter Apers

Text Extraction and Categorization


A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling for Handwriting Recognition

Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) are the current state-of-the-art in handwriting recognition. In speech recognition, Deep Multi-Layer Perceptrons (DeepMLPs) have become the standard acoustic model for Hidden Markov Models (HMMs). Although handwriting and speech recognition systems tend to include similar components and techniques, DeepMLPs are not used as optical models in unconstrained large-vocabulary handwriting recognition. In this paper, we compare bidirectional LSTM-RNNs with DeepMLPs for this task. We carried out experiments on two public databases of multi-line handwritten documents: Rimes and IAM. We show that the proposed hybrid systems yield performance comparable to the state-of-the-art, regardless of the type of features (hand-crafted or pixel values) and the neural network optical model (DeepMLP or RNN).
Théodore Bluche, Hermann Ney, Christopher Kermorvant

Probabilistic Anomaly Detection Method for Authorship Verification

Authorship verification is the task of determining whether a given text was written by a candidate author or not. In this paper, we present a first study on using an anomaly detection method for the authorship verification task. We consider a weakly supervised probabilistic model based on a multivariate Gaussian distribution. To evaluate the effectiveness of the proposed method, we conducted experiments on a classic French corpus. Our preliminary results show that the probabilistic method can achieve high verification performance, reaching an F1 score of 85%. This method can thus be very valuable for authorship verification.
Mohamed Amine Boukhaled, Jean-Gabriel Ganascia
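A Gaussian anomaly detector of this kind can be sketched as follows: fit a multivariate Gaussian to feature vectors of texts known to be by the candidate author, then flag a new text as "different author" when its log-density falls below a threshold. This is our generic illustration of the technique, not the paper's model; the stylometric features and the threshold value are hypothetical placeholders:

```python
import numpy as np

def fit_gaussian(X):
    """Fit a multivariate Gaussian to an (n, d) matrix of feature vectors."""
    mu = X.mean(axis=0)
    # small ridge keeps the covariance invertible for few/correlated features
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, cov

def log_density(x, mu, cov):
    """Log of the multivariate normal density at x."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

def same_author(x, mu, cov, threshold):
    """Accept x as the candidate author's if it is dense enough under the model."""
    return log_density(x, mu, cov) >= threshold
```

The threshold would in practice be tuned on held-out texts, trading precision against recall.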

Measuring Global Similarity Between Texts

We propose a new similarity measure between texts which, contrary to current state-of-the-art approaches, takes a global view of the texts to be compared. We have implemented a tool to compute our textual distance and conducted experiments on several corpora of texts. The experiments show that our method can reliably identify different global types of texts.
Uli Fahrenberg, Fabrizio Biondi, Kevin Corre, Cyrille Jegourel, Simon Kongshøj, Axel Legay

Identifying and Clustering Relevant Terms in Clinical Records Using Unsupervised Methods

The automatic processing of clinical documents created in clinical settings has become a focus of research in natural language processing. However, standard tools developed for general texts are not applicable to, or perform poorly on, this type of document. Moreover, several crucial tasks require lexical resources and relational thesauri or ontologies to identify relevant concepts and their connections. For less-resourced languages, such as Hungarian, no such lexicons are available, and the construction and organization of annotated data require human expert work. In this paper we show how applying statistical methods can produce a preprocessed, semi-structured transformation of the raw documents that can be used to aid human work. The modules detect and resolve abbreviations, identify multiword terms, and derive their similarity, all based on the corpus itself.
Borbála Siklósi, Attila Novák

Mining Text


Predicting Medical Roles in Online Health Fora

Online health fora are increasingly visited by patients seeking help and information related to their health. However, these fora are not limited to patients: a significant number of health professionals actively participate in many discussions. As experts, the information they post is particularly valuable, since they are able to explain problems and symptoms clearly, correct false claims, and give useful advice. For someone interested in trustworthy medical information, retrieving only these kinds of posts can be very useful and informative. Unfortunately, extracting such knowledge requires navigating the fora in order to evaluate the information, and manual navigation and selection are time-consuming, tedious, difficult, and error-prone. It is thus important to propose a new method for automatically categorizing information posted by non-experts as well as by professionals in online health fora. In this paper, we propose a supervised approach to evaluate which components of a post are most representative, considering vocabulary, uncertainty markers, emotions, misspellings, and interrogative forms, in order to perform this categorization efficiently. Experiments conducted on two real fora show that our approach effectively identifies posts written by professionals.
Amine Abdaoui, Jérôme Azé, Sandra Bringay, Natalia Grabar, Pascal Poncelet

Informal Mathematical Discourse Parsing with Conditional Random Fields

Discourse parsing for Informal Mathematical Discourse (IMD) has been a difficult task because of the lack of data sets, and because Natural Language Processing (NLP) techniques must be adapted to the informality of IMD. In this paper, we present an end-to-end discourse parser for Spanish that acts as a sequential classifier of informal deductive argumentations (IDAs). We design the discourse parser using sequence labeling based on Conditional Random Fields (CRFs), applied to lexical, syntactic, and semantic features extracted from a discursive corpus (MD-TreeBank: Mathematical Discourse TreeBank). We describe a Penn Discourse TreeBank (PDTB)-style end-to-end discourse parser in the context of Controlled Natural Languages (CNLs). Discourse parsing is approached from a low-level perspective, in which we identify IDA connectives while avoiding complex linguistic phenomena. Our discourse parser performs parsing as a connective-level sequence labeling task and classifies several types of informal deductive argumentations within the mathematical proof.
Raúl Ernesto Gutierrez de Piñerez Reyes, Juan Francisco Díaz-Frías

Corpus-Based Information Extraction and Opinion Mining for the Restaurant Recommendation System

In this paper, a corpus-based information extraction and opinion mining method is proposed. Our domain is restaurant reviews, and our information extraction and opinion mining module is part of a Russian knowledge-based recommendation system.
Our method is based on thorough corpus analysis and automatic selection of machine learning models and feature sets. We also pay special attention to the verification of statistical significance.
According to the results of the research, Naive Bayes models perform well at classifying sentiment with respect to a restaurant aspect, while Logistic Regression is good at deciding on the relevance of a user’s review.
The proposed approach can be used in similar domains, for example hotel reviews, where the data are colloquial, non-structured texts (in contrast with the domain of technical products, books, etc.), and for other languages with rich morphology and free word order.
Ekaterina Pronoza, Elena Yagunova, Svetlana Volskaya

