
2020 | Book

Information Retrieval Technology

15th Asia Information Retrieval Societies Conference, AIRS 2019, Hong Kong, China, November 7–9, 2019, Proceedings

Editors: Fu Lee Wang, Haoran Xie, Wai Lam, Aixin Sun, Lun-Wei Ku, Tianyong Hao, Wei Chen, Tak-Lam Wong, Xiaohui Tao

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 15th Asia Information Retrieval Societies Conference, AIRS 2019, held in Hong Kong, China, in November 2019. The 14 full papers presented together with 3 short papers were carefully reviewed and selected from 27 submissions. The scope of the conference covers applications, systems, technologies and theory aspects of information retrieval in text, audio, image, video and multimedia data.

Table of Contents

Frontmatter

Question Answering

Frontmatter
Towards Automatic Evaluation of Reused Answers in Community Question Answering
Abstract
We consider the problem of reused answer retrieval for community question answering (CQA): given a question q, retrieve answers \(a_{i}^{j}\) posted in response to other questions \(q_{i} (\ne q)\), where \(a_{i}^{j}\) serves as an answer to q. While previous work evaluated this task by manually annotating the relationship between q and \(a_{i}^{j}\), this approach does not scale for large-scale CQA sites. We therefore explore an automatic evaluation method for reused answer retrieval, which computes nDCG by defining the gain value of each retrieved answer as a ROUGE score that treats the original answers to q as gold summaries. Our answer retrieval experiment suggests that effective reused answer retrieval systems may not be the same as effective gold answer retrieval systems. We provide case studies to discuss the benefits and limitations of our approach.
Hsin-Wen Liu, Sumio Fujita, Tetsuya Sakai
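The evaluation scheme described above can be sketched in a few lines. This is a minimal illustration only: the log2 discount form and the toy ROUGE gain values are assumptions, since the abstract does not specify the exact ROUGE variant or cutoff used.

```python
import math

def ndcg_at_k(gains, k):
    """nDCG@k where each gain is a ROUGE score in [0, 1] of a
    retrieved (reused) answer against the gold answers to q."""
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sorted(gains, reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ROUGE scores of a ranked list of reused answers:
ranked_rouge = [0.8, 0.2, 0.6, 0.0]
score = ndcg_at_k(ranked_rouge, k=4)
```

Because the gain of each answer is computed automatically from the original answers, no manual annotation of the q-to-answer relationship is needed.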
Unsupervised Answer Retrieval with Data Fusion for Community Question Answering
Abstract
Community question answering (cQA) systems have enjoyed the benefits of advances in neural information retrieval, some models of which need annotated documents as supervised data. However, in contrast with the amount of supervised data for cQA systems, user-generated data in cQA sites have been increasing greatly with time. Thus, focusing on unsupervised models, we tackle a task of retrieving relevant answers for new questions from existing cQA data and propose two frameworks to exploit a Question Retrieval (QR) model for Answer Retrieval (AR). The first framework ranks answers according to the combined scores of QR and AR models and the second framework ranks answers using the scores of a QR model and best answer flags. In our experiments, we applied the combination of our proposed frameworks and a classical fusion technique to AR models with a Japanese cQA data set containing approximately 9.4M question-answer pairs. When best answer flags in the cQA data cannot be utilized, our combination of AR and QR scores with data fusion outperforms a base AR model on average. When best answer flags can be utilized, the retrieval performance can be improved further. While our results lack statistical significance, we discuss effect sizes as well as future sample sizes to attain sufficient statistical power.
Sosuke Kato, Toru Shimizu, Sumio Fujita, Tetsuya Sakai
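The first framework above combines QR and AR scores; a classic fusion baseline in this spirit is CombSUM over min-max-normalised score lists. The function names and toy scores below are illustrative assumptions, not the paper's exact formulation.

```python
def minmax(scores):
    """Min-max normalise a dict of answer -> score."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {a: 0.0 for a in scores}
    return {a: (s - lo) / (hi - lo) for a, s in scores.items()}

def combsum(qr_scores, ar_scores):
    """Fuse answer-retrieval scores with the question-retrieval score
    of each answer's parent question (CombSUM after normalisation)."""
    qr, ar = minmax(qr_scores), minmax(ar_scores)
    answers = set(qr) | set(ar)
    return sorted(answers,
                  key=lambda a: qr.get(a, 0.0) + ar.get(a, 0.0),
                  reverse=True)

# Hypothetical scores: a2 is boosted because its parent question matches well.
qr = {"a1": 0.2, "a2": 0.9, "a3": 0.4}
ar = {"a1": 0.8, "a2": 0.5, "a3": 0.3}
ranking = combsum(qr, ar)
```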
A Semantic Expansion-Based Joint Model for Answer Ranking in Chinese Question Answering Systems
Abstract
Answer ranking is one of essential steps in open domain question answering systems. The ranking of the retrieved answers directly affects user satisfaction. This paper proposes a new joint model for answer ranking by leveraging context semantic features, which balances both question-answer similarities and answer ranking scores. A publicly available dataset containing 40,000 Chinese questions and 369,919 corresponding answer passages from Sogou Lab is used for experiments. Evaluation on the joint model shows a Precison@1 of 72.6%, which outperforms the state-of-the-art baseline methods.
Wenxiu Xie, Leung-Pun Wong, Lap-Kei Lee, Oliver Au, Tianyong Hao
Arc Loss: Softmax with Additive Angular Margin for Answer Retrieval
Abstract
Answer retrieval is a crucial step in question answering. To determine the best Q–A pair in a candidate pool, traditional approaches adopt triplet loss (i.e., pairwise ranking loss) to learn a meaningful distributed representation. Triplet loss is widely used to push a negative answer away from a given question in the feature space, leading to a better understanding of the relationship between questions and answers. However, triplet loss is inefficient because it requires two steps: triplet generation and negative sampling. In this study, we propose an alternative loss function, namely arc loss, for more efficient and effective learning than with triplet loss. We evaluate the proposed approach on a commonly used QA dataset and demonstrate that it significantly outperforms the triplet loss baseline.
Rikiya Suzuki, Sumio Fujita, Tetsuya Sakai
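A softmax loss with an additive angular margin, as named in the title, can be sketched as follows. The margin and scale values are illustrative assumptions; the paper's exact hyperparameters and training setup are not given in this abstract.

```python
import math

def arc_margin_loss(cosines, target, margin=0.3, scale=16.0):
    """Softmax cross-entropy where the target's cosine similarity is
    penalised by an additive angular margin before scaling: the model
    must keep the correct answer close even after the margin is added."""
    logits = []
    for i, c in enumerate(cosines):
        theta = math.acos(max(-1.0, min(1.0, c)))
        if i == target:
            theta = min(theta + margin, math.pi)  # widen the target angle
        logits.append(scale * math.cos(theta))
    m = max(logits)  # stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

# Cosine similarities between a question and each candidate answer:
loss = arc_margin_loss([0.9, 0.1, -0.2], target=0)
```

No triplet generation or negative sampling is needed: all non-target candidates in the batch act as negatives at once.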

Context-Awareness

Frontmatter
Context-Aware Collaborative Ranking
Abstract
Recommender systems (RS) are being used in a broad range of applications, from online shopping websites to music streaming platforms, which aim to provide users with high-quality personalized services. Collaborative filtering (CF) is a promising technique for ensuring the accuracy of a recommender system, and it can be divided into specific tasks such as rating prediction and item ranking. However, there is a larger volume of published work studying rating prediction than item ranking, even though the latter is recognized to be more appropriate for the final recommendation in a real application. On the other hand, many studies on item ranking that leverage implicit feedback are limited in their performance improvements due to the uniformity of implicit feedback. Hence, in this paper, we focus on item ranking with informative explicit feedback, which is also called collaborative ranking. In particular, we propose a novel recommendation model termed context-aware collaborative ranking (CCR), which adopts a logistic loss function to measure the predicted ranking error and exploits the inherent preference context derived from the explicit feedback. Moreover, we design an elegant strategy to distinguish between positive and negative samples used in the process of model training. Empirical studies on four real-world datasets clearly demonstrate that our CCR outperforms state-of-the-art methods in terms of various ranking-oriented evaluation metrics.
Wei Dai, Weike Pan, Zhong Ming
Context-Aware Helpfulness Prediction for Online Product Reviews
Abstract
Modeling and prediction of review helpfulness has become more prominent due to the proliferation of e-commerce websites and online shops. Since the functionality of a product cannot be tested before buying, people often rely on different kinds of user reviews to decide whether or not to buy a product. However, quality reviews might be buried deep in the heap of a large amount of reviews. Therefore, recommending reviews to customers based on review quality is of the essence. Since there is no direct indication of review quality, most approaches use the information that "X out of Y" users found the review helpful as a proxy for review quality. However, this approach undermines helpfulness prediction because not all reviews have statistically abundant votes. In this paper, we propose a deep neural model that predicts the helpfulness score of a review. The model is based on a convolutional neural network (CNN) and a context-aware encoding mechanism which can directly capture relationships between words irrespective of their distance in a long sequence. We validated our model on a human-annotated dataset, and the results show that it significantly outperforms existing models for helpfulness prediction.
Iyiola E. Olatunji, Xin Li, Wai Lam
LGLMF: Local Geographical Based Logistic Matrix Factorization Model for POI Recommendation
Abstract
With the rapid growth of Location-Based Social Networks, personalized Point-of-Interest (POI) recommendation has become a critical task to help users explore their surroundings. Due to the scarcity of check-in data, the availability of geographical information offers an opportunity to improve the accuracy of POI recommendation. Moreover, matrix factorization methods provide effective models which can be used in POI recommendation. However, two main challenges should be addressed to improve the performance of POI recommendation methods. First, leveraging geographical information to capture both the user's personal geographic profile and a location's geographic popularity. Second, incorporating the geographical model into matrix factorization approaches. To address these problems, a POI recommendation method is proposed in this paper based on a Local Geographical Model, which considers both the user's and the location's points of view. To this end, an effective geographical model is proposed by considering the user's main region of activity and the relevance of each location within that region. Then, the proposed local geographical model is fused into Logistic Matrix Factorization to improve the accuracy of POI recommendation. Experimental results on two well-known datasets demonstrate that the proposed approach outperforms other state-of-the-art POI recommendation methods.
Hossein A. Rahmani, Mohammad Aliannejadi, Sajad Ahmadian, Mitra Baratchi, Mohsen Afsharchi, Fabio Crestani

IR Models

Frontmatter
On the Pluses and Minuses of Risk
Abstract
Evaluating the effectiveness of retrieval models has been a mainstay in the IR community since its inception. Generally speaking, the goal is to provide a rigorous framework to compare the quality of two or more models and determine which of them is "better". However, defining "better" or "best" in this context is not a simple task. Computing the average effectiveness over many queries is the most common approach used in Cranfield-style evaluations. But averages can hide subtle trade-offs in retrieval models – a percentage of the queries may well perform worse than in a previous iteration of the model as a result of an optimization that improves some other subset. A growing body of work, referred to as risk-sensitive evaluation, seeks to incorporate these effects. We scrutinize current approaches to risk-sensitive evaluation, and consider how risk and reward might be recast to better account for human expectations of result quality on a query-by-query basis.
Rodger Benham, Alistair Moffat, J. Shane Culpepper
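A common risk-sensitive measure in this line of work weights per-query losses against a baseline more heavily than wins (often called URisk). The sketch below is a generic illustration under that definition; the per-query nDCG values and alpha are assumptions, not data from the paper.

```python
def urisk(run, baseline, alpha=1.0):
    """Risk-sensitive difference: mean of per-query gains minus
    (1 + alpha)-weighted losses, relative to a baseline run."""
    total = 0.0
    for r, b in zip(run, baseline):
        delta = r - b
        total += delta if delta >= 0 else (1 + alpha) * delta
    return total / len(run)

# Hypothetical per-query nDCG scores for a new run vs. a baseline:
run      = [0.5, 0.7, 0.2, 0.9]
baseline = [0.4, 0.6, 0.5, 0.8]
risk_score = urisk(run, baseline, alpha=1.0)
```

With alpha = 0 this reduces to the plain mean difference; larger alpha penalises the queries that got worse, which is exactly the averaging blind spot the paper discusses.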
Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations
Abstract
The present study concerns depth-k pooling for building IR test collections. At TREC, pooled documents are traditionally presented in random order to the assessors to avoid judgement bias. In contrast, an approach that has been used widely at NTCIR is to prioritise the pooled documents based on “pseudorelevance,” in the hope of letting assessors quickly form an idea as to what constitutes a relevant document and thereby judge more efficiently and reliably. While the recent TREC 2017 Common Core Track went beyond depth-k pooling and adopted a method for selecting documents to judge dynamically, even this task let the assessors process the usual depth-10 pools first: the idea was to give the assessors a “burn-in” period, which actually appears to echo the view of the NTCIR approach. Our research questions are: (1) Which depth-k ordering strategy enables more efficient assessments? Randomisation, or prioritisation by pseudorelevance? (2) Similarly, which of the two strategies enables higher inter-assessor agreements? Our experiments based on two English web search test collections with multiple sets of graded relevance assessments suggest that randomisation outperforms prioritisation in both respects on average, although the results are statistically inconclusive. We then discuss a plan for a much larger experiment with sufficient statistical power to obtain the final verdict.
Tetsuya Sakai, Peng Xiao
Cross-Level Matching Model for Information Retrieval
Abstract
Recently, many neural retrieval models have been proposed and shown competitive results. In particular, interaction-based models have shown superior performance to traditional models in a number of studies. However, the interactions used as the basic matching signals are between single terms or their embeddings. In reality, a term can often match a phrase or even longer segment of text. This paper proposes a Cross-Level Matching Model which enhances the basic matching signals by allowing terms to match hidden representation states within a sentence. A gating mechanism aggregates the learned matching patterns of different matching channels and outputs a global matching score. Our model provides a simple and effective way for word-phrase matching.
Yifan Nie, Jian-Yun Nie
Understanding and Improving Neural Ranking Models from a Term Dependence View
Abstract
Recently, neural information retrieval (NeuIR) has attracted a lot of interest, and a variety of neural models have been proposed for the core ranking problem. Beyond the continuous refresh of state-of-the-art neural ranking performance, the community calls for more analysis and understanding of the emerging neural ranking models. In this paper, we attempt to analyze these new models from a traditional view, namely term dependence. Without loss of generality, most existing neural ranking models can be categorized into three categories with respect to their underlying assumption on query term dependence, i.e., independent models, dependent models, and hybrid models. We conduct rigorous empirical experiments over several representative models from these three categories on a benchmark dataset and a large click-through dataset. Interestingly, we find that no single type of model achieves a consistent win over the others across different search queries. An oracle model which can select the right model for each query obtains a significant performance improvement. Based on this analysis, we introduce an adaptive strategy for neural ranking models. We hypothesize that the term dependence in a query can be measured through the divergence between its independent and dependent representations. We thus propose a dependence gate based on this divergence to softly select a neural ranking model for each query accordingly. Experimental results verify the effectiveness of the adaptive strategy.
Yixing Fan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng
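The gating idea can be sketched as a soft mixture of the two model scores, driven by the divergence between the query's independent and dependent representations. Everything below (KL as the divergence, a sigmoid gate, the toy distributions) is an illustrative assumption; the paper's gate is a learned neural component.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL divergence between two term distributions of the same query."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def gated_score(indep_score, dep_score, divergence, temperature=1.0):
    """Softly select between an independent and a dependent ranking model:
    higher divergence pushes the gate toward the dependent model."""
    gate = 1.0 / (1.0 + math.exp(-divergence / temperature))
    return gate * dep_score + (1.0 - gate) * indep_score

# Hypothetical: near-identical representations -> low divergence -> near-even mix.
div = kl_divergence([0.5, 0.5], [0.4, 0.6])
final = gated_score(indep_score=0.7, dep_score=0.3, divergence=div)
```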

Training

Frontmatter
Generating Short Product Descriptors Based on Very Little Training Data
Abstract
We propose a pipeline model for summarising a short textual product description for inclusion in an online advertisement banner. While a standard approach is to truncate the advertiser's original product description so that the text will fit the small banner, this simplistic approach often removes crucial information or attractive expressions from the original description. Our objective is to shorten the original description more intelligently, so that users' click-through rate (CTR) will improve. One major difficulty in this task, however, is the lack of large training data: machine learning methods that rely on thousands of pairs of the original and shortened texts would not be practical. Hence, our proposed method first employs a semi-supervised sequence tagging method called TagLM to convert the original description into a sequence of entities, and then a BiLSTM entity ranker which determines which entities should be preserved: the main idea is to tackle the data sparsity problem by leveraging sequences of entities rather than sequences of words. In our offline experiments with Korean data from travel and fashion domains, our sequence tagger outperforms an LSTM-CRF baseline, and our entity ranker outperforms LambdaMART and RandomForest baselines. More importantly, in our online A/B testing where the proposed method was compared to the simple truncation approach, the CTR improved by 34.1% in the desktop PC environment.
Peng Xiao, Joo-Young Lee, Sijie Tao, Young-Sook Hwang, Tetsuya Sakai
Experiments with Cross-Language Speech Retrieval for Lower-Resource Languages
Abstract
Cross-language speech retrieval systems face a cascade of errors due to transcription and translation ambiguity. Using 1-best speech recognition and 1-best translation in such a scenario could adversely affect recall if those 1-best system guesses are not correct. Accurately representing transcription and translation probabilities could therefore improve recall, although possibly at some cost in precision. The difficulty of the task is exacerbated when working with languages for which limited resources are available, since both recognition and translation probabilities may be less accurate in such cases. This paper explores the combination of expected term counts from recognition with expected term counts from translation to perform cross-language speech retrieval in which the queries are in English and the spoken content to be retrieved is in Tagalog or Swahili. Experiments were conducted using two query types, one focused on term presence and the other focused on topical retrieval. Overall, the results show that significant improvements in ranking quality result from modeling transcription and recognition ambiguity, even in lower-resource settings, and that adapting the ranking model to specific query types can yield further improvements.
Suraj Nair, Anton Ragni, Ondrej Klejch, Petra Galuščáková, Douglas Oard
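The core quantity above, an expected term count that marginalises over recognition and translation ambiguity, can be sketched directly. The dictionary-based interfaces and the toy Tagalog posteriors below are illustrative assumptions, not the paper's actual lattices or translation tables.

```python
def expected_term_count(query_term, asr_posteriors, translation_probs):
    """Expected count of an English query term in a foreign-language
    spoken segment: sum over recognised terms of the recognition
    posterior times the probability of translating to the query term."""
    total = 0.0
    for foreign_term, posterior in asr_posteriors.items():
        total += posterior * translation_probs.get(foreign_term, {}).get(query_term, 0.0)
    return total

# Hypothetical ASR posteriors for one spoken segment (Tagalog terms):
asr = {"bahay": 0.7, "bahaw": 0.2}
# Hypothetical translation table P(english | tagalog):
trans = {"bahay": {"house": 0.9}, "bahaw": {"stale": 0.6}}
count = expected_term_count("house", asr, trans)
```

Unlike 1-best cascading, low-confidence recognition or translation hypotheses still contribute fractional counts, which is what protects recall in lower-resource settings.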
Weighted N-grams CNN for Text Classification
Abstract
Text categorization can solve the problem of information clutter to a large extent, and it also provides a more efficient search strategy and more effective search results for information retrieval. In recent years, Convolutional Neural Networks have been widely applied to this task. However, most existing CNN models have difficulty extracting longer n-gram features for the following reason: the parameters of the standard CNN model increase with the length of the n-gram features, because it extracts n-gram features through convolution filters of fixed window size. Meanwhile, term weighting schemes, which assign reasonable weight values to words, have exhibited excellent performance in traditional bag-of-words models. Intuitively, considering the weight value of each word in an n-gram feature may be beneficial for text classification. In this paper, we propose a model called the weighted n-grams CNN model. It is a variant of CNN that introduces a weighted n-grams layer, whose parameters are initialized by term weighting schemes. By adding only a fixed number of parameters, the model can generate weighted n-gram features of any length. We compare our proposed model with other popular and recent CNN models on five datasets for text classification. The experimental results show that our proposed model exhibits comparable or even superior performance.
Zequan Zeng, Yi Cai, Fu Lee Wang, Haoran Xie, Junying Chen
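The weighting idea can be illustrated without any neural machinery: compute a classic term weight (IDF here, as one example of a term weighting scheme) and attach the summed word weights to each n-gram. This is a toy sketch of the initialization idea only, not the CNN layer itself.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Smoothed inverse document frequency for each word, a classic
    term weighting scheme from bag-of-words models."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.split()))
    return {w: math.log((n + 1) / (c + 1)) + 1 for w, c in df.items()}

def weighted_ngrams(text, n, weights):
    """Attach to each n-gram the summed weight of its words, mimicking
    a weighted n-grams layer initialized from a term weighting scheme."""
    words = text.split()
    return [(" ".join(words[i:i + n]),
             sum(weights.get(w, 1.0) for w in words[i:i + n]))
            for i in range(len(words) - n + 1)]

docs = ["the cat sat", "the dog ran", "a cat ran"]
w = idf_weights(docs)
feats = weighted_ngrams("the cat sat", 2, w)
```

Note that extending `n` adds no new parameters here: the same per-word weights are reused for any n-gram length, which is the efficiency argument in the abstract.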

Semantics

Frontmatter
Detecting Emerging Rumors by Embedding Propagation Graphs
Abstract
In this paper, we propose a propagation-driven approach to discover newly emerging rumors spreading on social media. First, posts and their responsive posts (i.e., comments and shares) are modeled as graphs. These graphs are embedded using their structure and node attributes. We then train a classifier to predict rumor labels from these graph embedding vectors. In addition, we propose an incremental training method to learn embedding vectors of out-of-vocabulary (OOV) words, because newly emerging rumors regularly contain new terminology. To demonstrate the actual performance, we conduct an experiment using a real-world dataset collected from Twitter. The results show that our approach outperforms the state-of-the-art method by a large margin.
Dang-Thinh Vu, Jason J. Jung
Improving Arabic Microblog Retrieval with Distributed Representations
Abstract
Query expansion (QE) using pseudo relevance feedback (PRF) is one of the approaches that has been shown to be effective for improving microblog retrieval. In this paper, we investigate the performance of three different embedding-based methods on Arabic microblog retrieval: embedding-based QE, embedding-based PRF, and PRF incorporated with embedding-based reranking. Our experimental results over three variants of the EveTAR test collection show a consistent improvement of the reranking method over the traditional PRF baseline using both MAP and P@10 evaluation measures. The improvement is statistically significant in some cases. However, while the embedding-based QE fails to improve over the traditional PRF, the embedding-based PRF successfully outperforms the baseline in several cases, with a statistically significant improvement in MAP over two variants of the test collection.
Shahad Alshalan, Raghad Alshalan, Hend Al-Khalifa, Reem Suwaileh, Tamer Elsayed
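One simple form of embedding-based expansion is to rank candidate feedback terms by cosine similarity to the centroid of the query term embeddings. The toy 2-d vectors and function names below are illustrative assumptions; real systems would use trained word vectors over the collection.

```python
def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return num / den if den else 0.0

def embedding_prf_expand(query_terms, feedback_terms, embeddings, k=2):
    """Pick the k terms from the pseudo-relevant feedback documents
    closest (by embedding cosine) to the query centroid."""
    dims = len(next(iter(embeddings.values())))
    centroid = [sum(embeddings[t][i] for t in query_terms) / len(query_terms)
                for i in range(dims)]
    candidates = [t for t in feedback_terms
                  if t not in query_terms and t in embeddings]
    ranked = sorted(candidates,
                    key=lambda t: cosine(embeddings[t], centroid),
                    reverse=True)
    return query_terms + ranked[:k]

# Toy 2-d embeddings for illustration:
emb = {"storm": [1.0, 0.1], "rain": [0.9, 0.2],
       "wind": [0.8, 0.3], "car": [0.0, 1.0]}
expanded = embedding_prf_expand(["storm"], ["rain", "wind", "car"], emb, k=2)
```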
LODeDeC: A Framework for Integration of Entity Relations from Knowledge Graphs
Abstract
Large knowledge graphs (KGs), which are part of Linked Open Data (LOD), serve as the primary source for retrieving structured data in many Semantic Web applications. In order for machines to efficiently process the data for different data mining, entity linking and information retrieval tasks, it is always beneficial to have as many reliable facts from KGs as possible. But none of the KGs is complete on its own with respect to the number of relations describing an entity. Moreover, large KGs like DBpedia, YAGO and Wikidata appear similar in nature, but do not fully overlap in terms of the relations of entities from different domains. The complementary nature of different KGs can be utilized to expand the coverage of the relations of an entity. To achieve this, a framework for integrating entity information from different KGs using LOD, semantic similarity approaches and RDF reification is proposed in this paper.
Sini Govindapillai, Lay-Ki Soon, Su-Cheng Haw
Backmatter
Metadata
Title
Information Retrieval Technology
Editors
Fu Lee Wang
Haoran Xie
Wai Lam
Aixin Sun
Lun-Wei Ku
Tianyong Hao
Wei Chen
Tak-Lam Wong
Xiaohui Tao
Copyright Year
2020
Electronic ISBN
978-3-030-42835-8
Print ISBN
978-3-030-42834-1
DOI
https://doi.org/10.1007/978-3-030-42835-8