2018 | Book

Information Retrieval Technology

14th Asia Information Retrieval Societies Conference, AIRS 2018, Taipei, Taiwan, November 28-30, 2018, Proceedings

Editors: Yuen-Hsien Tseng, Dr. Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of the 14th Asia Information Retrieval Societies Conference, AIRS 2018, held in Taipei, Taiwan, in November 2018.
The 8 full papers presented together with 9 short papers and 3 session papers were carefully reviewed and selected from 41 submissions. The scope of the conference covers applications, systems, technologies and theory aspects of information retrieval in text, audio, image, video and multimedia data.

Table of Contents

Frontmatter

Social

An Ensemble Neural Network Model for Benefiting Pregnancy Health Stats from Mining Social Media
Abstract
The extensive use of social media for communication has made it a valuable resource for human-behavior-intensive tasks such as tracking product popularity and public polls and, more recently, for public health surveillance tasks such as monitoring lifestyle-associated diseases and mental health. In this paper, we exploit Twitter data for detecting pregnancy cases and use tweets about pregnancy to study trigger terms associated with maternal physical and mental health. Such systems can enable clinicians to offer more comprehensive health care in real time. Using a Twitter-based corpus, we have developed an ensemble Long Short-Term Memory (LSTM) recurrent neural network (RNN) and convolutional neural network (CNN) representation model to learn legitimate pregnancy cases discussed online. An SVM classifier trained on these ensemble representations achieves an F1-score of 95% in predicting pregnancy accounts discussed in tweets. We further investigate the words most commonly associated with the physical disease symptom class ‘Distress’ and the negative emotion class ‘Annoyed’. Results from our sentiment analysis study are quite encouraging, identifying more accurate triggers for pregnancy sentiment classes.
Neha Warikoo, Yung-Chun Chang, Hong-Jie Dai, Wen-Lian Hsu
Chinese Governmental Named Entity Recognition
Abstract
Named entity recognition (NER) is a fundamental task in natural language processing, and there is a lot of interest in vertical NER such as medical NER and short-text NER. In this paper, we study the problem of Chinese governmental NER (CGNER). CGNER serves as the basis for automatic governmental text analysis, which can greatly benefit the public. Considering the characteristics of governmental text, we first formulate the task of CGNER, adding one new entity type, i.e., policy (POL), in addition to generic types such as person (PER), location (LOC), organization (ORG) and title (TIT). We then construct a dataset called GOV for CGNER. We empirically evaluate the performance of mainstream NER tools and the state-of-the-art BiLSTM-CRF method on the GOV dataset. We find that performance declines compared to applying these methods to generic NER datasets. Further studies show that compound entities account for a non-negligible proportion and that the classical BIO (Begin-Inside-Outside) annotation cannot encode the entity type combination effectively. To alleviate the problem, we propose to use compound tagging and BiLSTM-CRF for CGNER. Experiments show that our proposed methods can significantly improve CGNER performance, especially for the LOC, ORG and TIT entity types.
Qi Liu, Dong Wang, Meilin Zhou, Peng Li, Baoyuan Qi, Bin Wang
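The BIO limitation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one reading of "compound entity", namely an entity of one type nested inside an entity of another type; the function name `spans_to_bio`, the English example tokens and the span format are hypothetical and are not taken from the paper, and the paper's own compound tagging scheme is not reproduced here.

```python
def spans_to_bio(tokens, spans):
    """Encode entity spans as classical BIO tags, one tag per token.

    Each span is (start, end, label) with `end` exclusive. Because BIO
    assigns exactly one tag to each token, overlapping spans cannot be
    represented, so this sketch raises ValueError when two spans overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}, {label}) overlaps another span")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical example sentence (English stand-in for governmental text).
tokens = ["Beijing", "Municipal", "Government", "issued", "a", "notice"]

# A flat ORG span encodes without trouble.
flat_tags = spans_to_bio(tokens, [(0, 3, "ORG")])
print(flat_tags)

# A LOC nested inside the ORG has no representation in plain BIO.
try:
    spans_to_bio(tokens, [(0, 3, "ORG"), (0, 1, "LOC")])
except ValueError as err:
    print("cannot encode:", err)
```

Under this reading, a richer tag set is needed whenever a single token must carry more than one entity type.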
StanceComp: Aggregating Stances from Multiple Sources for Rumor Detection
Abstract
With the rapid development of the Internet, social media has become a major information dissemination platform where any user can post and share information. Although this facilitates the sharing of breaking news, it also provides fertile ground for the spread of malicious rumors. In contrast, online news media might lag behind in reporting breaking news, but their articles are more reliable since journalists often verify the information before they report it. Intuitively, when users try to decide whether to trust a claim they saw on social media, they would want to check the stances on the same claim in social media and news media. More specifically, they want to know the opinions of other users, i.e., whether they support or oppose the claim. To facilitate such a process, we develop StanceComp (https://stancecomp.herokuapp.com), which aggregates the relevant information about a claim and compares the stances on the claim in both social media and news media. The developed system aims to provide a summary of the stances on the claim so that users can have a more comprehensive understanding of the information and detect potential rumors.
Hao Xu, Hui Fang
Personalized Social Search Based on Agglomerative Hierarchical Graph Clustering
Abstract
This paper describes a personalized social search algorithm for retrieving multimedia content on a consumer generated media (CGM) site that has a social network service (SNS). The proposed algorithm generates cluster information on users in the social network by using agglomerative hierarchical graph clustering, and stores it in a content database (DB). Retrieved content is sorted by scores calculated according to the similarity of cluster information between a searcher and the authors of the content. This paper also describes evaluation experiments that confirm the effectiveness of the proposed algorithm by implementing it in a video retrieval system of a CGM site.
Kenkichi Ishizuka

Search

Improving Session Search Performance with a Multi-MDP Model
Abstract
To fulfill some sophisticated information needs in Web search, users may submit multiple queries in a search session. Session search aims to provide an optimized document ranking by utilizing the query log as well as user interaction behaviors within a search session. Although a number of solutions have been proposed for the session search problem, most of these efforts simply assume that users’ search intents stay unchanged during the search process. However, most complicated search tasks involve exploratory processes where users’ intents evolve while interacting with search results. This evolution makes the static search intent assumption unreasonable and hurts the performance of document ranking. To address this problem, we propose a system with multiple agents which adjusts its framework by a self-adaptation mechanism. In the framework, each agent models document ranking as a Markov Decision Process (MDP) and updates its parameters by Reinforcement Learning algorithms. Experimental results on TREC Session Track datasets (2013 & 2014) show the effectiveness of the proposed framework.
Jia Chen, Yiqun Liu, Cheng Luo, Jiaxin Mao, Min Zhang, Shaoping Ma
Analysis of Relevant Text Fragments for Different Search Task Types
Abstract
This paper investigates trends in relevant text fragments by task type. Fine-grained information retrieval systems return text fragments rather than documents as search results. We hypothesize that the properties of relevant text fragments depend on the task type. To reveal these properties, we evaluate each relevant text fragment to judge (1) its granularity (e.g., word, phrase, or sentence) and (2) its structural complexity. Our analysis shows that a task type based on more complex information needs has a larger granularity of relevant text fragments. On the other hand, the complexity of a task type’s information needs does not necessarily correlate with the structural complexity of the relevant text fragments.
Atsushi Keyaki, Jun Miyazaki
A FAQ Search Training Method Based on Automatically Generated Questions
Abstract
We propose an FAQ search method that uses questions automatically generated by a question generator built from community Q&As. In our method, a search model is trained with automatically generated questions and their corresponding FAQs. We conducted experiments on a Japanese Q&A dataset created from a user support service on Twitter. The proposed method showed better Mean Reciprocal Rank and Recall@1 than an FAQ ranking model trained with the same community Q&As.
Takuya Makino, Tomoya Noro, Hiyori Yoshikawa, Tomoya Iwakura, Satoshi Sekine, Kentaro Inui
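The two evaluation measures named in the abstract above, Mean Reciprocal Rank and Recall@1, can be sketched as follows. This is a minimal illustration using their standard definitions, not the authors' evaluation code; the function names, the FAQ identifiers and the toy rankings are hypothetical, and Recall@1 is computed here as the fraction of queries whose top-ranked FAQ is relevant, which assumes one correct FAQ per query.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item_id in enumerate(ranked_ids, start=1):
        if item_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def recall_at_1(runs):
    """Fraction of queries whose top-ranked item is relevant."""
    hits = sum(1 for ranked, rel in runs if ranked and ranked[0] in rel)
    return hits / len(runs)


# Hypothetical toy data: one (ranked FAQ ids, relevant FAQ ids) pair per query.
runs = [
    (["faq3", "faq1", "faq7"], {"faq1"}),  # first relevant at rank 2
    (["faq2", "faq5"], {"faq2"}),          # first relevant at rank 1
    (["faq9", "faq4"], {"faq8"}),          # relevant FAQ not retrieved
]

print(mean_reciprocal_rank(runs))
print(recall_at_1(runs))
```

With these three toy queries the reciprocal ranks are 1/2, 1 and 0, so the mean is 0.5, and one query out of three has a relevant FAQ at rank 1.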

Embedding

A Neural Labeled Network Embedding Approach to Product Adopter Prediction
Abstract
On e-commerce websites, it is common for a user to purchase products for others. The person who actually uses the product is called the adopter. Product adopter information is important for learning user interests and understanding purchase behaviors. However, effective acquisition or prediction of product adopter information has not been well studied. Existing methods mainly rely on explicit extraction patterns, and can only identify exact occurrences of adopter mentions in review data. In this paper, we propose a novel Neural Labeled Network Embedding approach (NLNE) to inferring product adopter information from purchase records. Compared with previous studies, our method does not require any review text data, but tries to learn an effective prediction model using only purchase records, which are easier to obtain than review data. Specifically, we first propose an Adopter-labeled User-Product Network Embedding (APUNE) method to learn effective representations for users, products and adopter labels. Then, we further propose a neural prediction approach for inferring product adopter information based on the embeddings learned by APUNE. Our NLNE approach not only retains the expressive capacity of labeled network embedding, but is also endowed with the predictive capacity of neural networks. Extensive experiments on two real-world datasets (i.e., JingDong and Amazon) demonstrate the effectiveness of our model for the studied task.
Qi Gu, Ting Bai, Wayne Xin Zhao, Ji-Rong Wen
RI-Match: Integrating Both Representations and Interactions for Deep Semantic Matching
Abstract
Existing deep matching methods can be mainly categorized into two kinds, i.e., representation-focused methods and interaction-focused methods. Representation-focused methods usually focus on learning the representation of each sentence, while interaction-focused methods typically aim to obtain the representations of different interaction signals. However, both sentence-level representations and interaction signals are important for complex semantic matching tasks. Therefore, in this paper, we propose a new deep learning architecture to combine the merits of both deep matching approaches. Firstly, two kinds of word-level matching matrices are constructed based on word identities and word embeddings, to capture both exact and semantic matching signals. Secondly, a sentence-level matching matrix is constructed, in which each element stands for the interaction between two sentence representations at corresponding positions, generated by a bidirectional long short-term memory (Bi-LSTM). In this way, sentence-level representations are well captured in the matching process. The above matrices are then fed into a spatial recurrent neural network (RNN) to generate the high-level interaction representations. Finally, the matching score is produced by k-max pooling and a multilayer perceptron (MLP). Experiments on paraphrase identification show that our model significantly outperforms traditional state-of-the-art baselines.
Lijuan Chen, Yanyan Lan, Liang Pang, Jiafeng Guo, Jun Xu, Xueqi Cheng
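The k-max pooling step mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration of one common definition, which keeps the k largest values of a sequence in their original order; the abstract does not say whether the paper preserves order or sorts the selected values, so that choice, the function name `k_max_pooling` and the sample scores are assumptions.

```python
def k_max_pooling(values, k):
    """Keep the k largest values of a sequence, preserving their original order.

    Assumption: order-preserving variant. Ties are broken in favor of the
    earlier position because Python's sort is stable.
    """
    if k <= 0:
        return []
    # Indices of the k largest values, largest first.
    top_indices = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # Restore positional order before reading the values back out.
    return [values[i] for i in sorted(top_indices)]


# Hypothetical interaction scores along one row of a matching matrix.
scores = [0.2, 0.5, 0.1, 0.9, 0.7]
pooled = k_max_pooling(scores, 3)
print(pooled)
```

For the sample scores the three largest values are 0.9, 0.7 and 0.5, which come back in positional order as 0.5, 0.9, 0.7. The fixed-length output is what allows a downstream MLP to consume inputs of varying length.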
Modeling Relations Between Profiles and Texts
Abstract
We propose a method to model Twitter texts and user profiles simultaneously by considering the relations between the texts and profiles to obtain the distributed representations of the words in both.
Minoru Yoshida, Kazuyuki Matsumoto, Kenji Kita

Recommendation and Classification

Missing Data Modeling with User Activity and Item Popularity in Recommendation
Abstract
User feedback, such as movie watching history, ratings and product consumption, is valuable for improving the performance of recommender systems. However, only a few interactions between users and items can be observed in implicit data. A user-item entry may be missing for one of two reasons: the user didn’t see the item (in most cases), or the user saw but disliked it. Separating these two cases leads to modeling missing interactions at a finer granularity, which is helpful in understanding users’ preferences more accurately. However, the former case has not been well studied in previous work. Most existing studies resort to assigning a uniform weight to the missing data, while such a uniform assumption is invalid in real-world settings. In this paper, we propose a novel approach to weight the missing data based on user activity and item popularity, which is more effective and flexible than the uniform-weight assumption. Experimental results based on two real-world datasets (Movielens, Flixster) show that our approach outperforms three state-of-the-art models: BPR, WMF, and ExpoMF.
Chong Chen, Min Zhang, Yiqun Liu, Shaoping Ma
Influence of Data-Derived Individualities on Persuasive Recommendation
Abstract
In this study, we compare two machine-learning-based approaches that can add personal communication traits to a conversational recommender system. The first approach involves the creation of generative models for reactive tokens such as backchannels. The second approach involves a method for rewriting the conversational text by applying machine translation. Both approaches can impart personal communication traits to systems that incorporate a dialogue corpus. The two methods were implemented for a persuasive recommender system, and their positive or negative effects based on an individual’s personality were experimentally analyzed through a restaurant ranking task. The results suggest that the addition of personal communication traits decreases objective persuasiveness while improving the individual’s impression of the recommender system.
Masashi Inoue, Hiroshi Ueno
Guiding Approximate Text Classification Rules via Context Information
Abstract
Human experts can often easily write a set of approximate rules based on their domain knowledge for supporting automatic text classification. While such approximate rules are able to conduct classification at a general level, they are not effective for handling diverse and specific situations for a particular category. Given a set of approximate rules and a moderate amount of labeled data, existing incremental text classification learning models can be employed to tackle this problem by continuous rule refinement. However, these models do not consider context information, which inherently exists in data. We propose a framework comprising rule embeddings and context embeddings derived from data to enhance the adaptability of approximate rules by considering the context information. We conduct extensive experiments, and the results demonstrate that our proposed framework performs better than existing models on some benchmark datasets, indicating that learning the context of rules is constructive for improving text classification performance.
Wai Chung Wong, Sunny Lai, Wai Lam, Kwong Sak Leung

Medical and Multimedia

Key Terms Guided Expansion for Verbose Queries in Medical Domain
Abstract
Due to the complex nature of medical concepts and information needs, queries tend to be verbose in the medical domain. Verbose queries lead to sub-optimal performance since current search engines promote results covering every query term, rather than the truly important ones. Key term extraction has been studied to solve this problem, but another problem, i.e., the vocabulary gap between query and documents, still needs to be addressed. Although various query expansion techniques have been well studied for the vocabulary gap problem, existing methods suffer from various drawbacks such as inefficiency and expansion term mismatch. In this work, we propose to solve this problem by following the intuition that the surrounding contexts of the important terms in the original query should also be essential for retrieval. Specifically, we first identify the key terms from the verbose query and then locate the contexts of these key terms in the original document collection. The terms in the contexts are weighted and aggregated to select the expansion terms. We conduct experiments with five TREC data collections using the proposed methods. The results show that the improvement in retrieval performance of the proposed method is statistically significant compared with the baseline methods.
Yue Wang, Hui Fang
Ad-hoc Video Search Improved by the Word Sense Filtering of Query Terms
Abstract
The performance of an ad-hoc video search (AVS) task can only be improved when the video processing for analyzing video contents and the linguistic processing for interpreting natural language queries are well combined. Among the several issues associated with this challenging task, this paper particularly focuses on the word sense disambiguation/filtering (WSD/WSF) of the terms contained in a search query. We propose WSD/WSF methods which employ distributed sense representations, and discuss their efficacy in improving the performance of an AVS system which makes full use of a large bank of visual concept classifiers. The application of a WSD/WSF method is crucial, as each visual concept classifier is linked with the lexical concept denoted by a word sense. The results are generally promising, outperforming not only a baseline query processing method that only considers the polysemy of a query term but also a strong WSD baseline method.
Koji Hirakawa, Kotaro Kikuchi, Kazuya Ueki, Tetsunori Kobayashi, Yoshihiko Hayashi
Considering Conversation Scenes in Movie Summarization
Abstract
Given that manual video summarization is time consuming and calls for a high level of expertise, an effective automatic video summarization method is required. Although existing video summarization methods are usable for some videos, when they are applied to story-oriented videos such as movies, it sometimes becomes difficult to understand the stories from the generated summaries because they often lack continuity. In this paper, we propose a method for summarizing videos that can convey the story beyond the sequence of extracted shots so that they can fit user perception patterns. In particular, we examine the impact of conversation scenes in movie storytelling. The evaluation of summarized videos is another challenge because existing evaluation methods for text summarization cannot be directly applied to video summarization. Therefore, we propose a method for comparing summarized movies that maintain the integrity of conversation scenes with those that do not. We demonstrate how preserving conversational aspects influences the quality of summarized videos.
Masashi Inoue, Ryu Yasuhara

Best Paper Session

Hierarchical Attention Network for Context-Aware Query Suggestion
Abstract
Query suggestion helps search users to efficiently express their information needs and has attracted many studies. Among the different kinds of factors that help improve query suggestion performance, user behavior information is commonly used because users’ information needs are implicitly expressed in their behavior logs. However, most existing approaches focus on the exploration of previously issued queries without taking the content of clicked documents into consideration. Since many search queries are short, vague and sometimes ambiguous, these existing solutions suffer from user intent mismatch. To articulate users’ complex information needs behind the queries, we propose a hierarchical attention network which models users’ entire search interaction process for query suggestion. We find that by incorporating the content of clicked documents, our model can suggest better queries which satisfy users’ information needs. Moreover, attention mechanisms are adopted at two levels, the word level and the session level, which enable the model to attend to important content when inferring user information needs. Experimental results based on a large-scale query log from a commercial search engine demonstrate the effectiveness of the proposed framework. In addition, the visualization of the attention layers illustrates that informative words and important queries can be captured.
Xiangsheng Li, Yiqun Liu, Xin Li, Cheng Luo, Jian-Yun Nie, Min Zhang, Shaoping Ma

Short Papers from AIRS 2017

Assigning NDLSH Headings to People on the Web
Abstract
We investigate a method that assigns National Diet Library Subject Headings (NDLSH) to the results of web people searches to help users select and understand people on the web. NDLSH is a controlled subject vocabulary list compiled and maintained by the National Diet Library (NDL) as a subject access tool. By assigning NDLSH headings to people, well-formed keywords can be assigned, and exploratory searches using related terms are possible. We examined the following combination of factors: (a) web-page rank (the number of pages), (b) position inside the HTML, (c) synonyms, and (d) document frequency. We report our experimental results for 405 combination patterns (\(5 \times 9 \times 3 \times 3\)) using our 80-person dataset. Overall, under our experimental settings, the best combination was (a) the top ten pages, (b) 100 characters before and after a person’s name (i.e., 200 characters), (c) half weight for synonyms, and (d) document frequency divided by number of web pages.
Masayuki Shimokura, Harumi Murakami
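The count of 405 combination patterns in the abstract above follows directly from the four factor sizes. A minimal sketch enumerates the grid, assuming the counts 5, 9, 3 and 3 are listed in the same order as factors (a) to (d); the dictionary keys are hypothetical labels, and the actual settings within each factor are not given in the abstract, so integer indices stand in for them.

```python
from itertools import product

# Number of settings per factor, assumed to follow the order (a) to (d).
levels = {
    "web_page_rank": 5,       # (a)
    "html_position": 9,       # (b)
    "synonyms": 3,            # (c)
    "document_frequency": 3,  # (d)
}

# Each combination is one tuple of setting indices, one index per factor.
grid = list(product(*(range(n) for n in levels.values())))

print(len(grid))
print(grid[0], grid[-1])
```

Enumerating the Cartesian product this way gives every one of the 5 x 9 x 3 x 3 settings exactly once, from (0, 0, 0, 0) to (4, 8, 2, 2).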
MKDS: A Medical Knowledge Discovery System Learned from Electronic Medical Records (Demonstration)
Abstract
This paper presents a medical knowledge discovery system (MKDS) that learns the medical knowledge from electronic medical records (EMRs). The distributed word representations model the relations among medical concepts such as diseases and medicines. Four tasks, including spell checking, clinical trait extraction, analogical reasoning, and computer-aided diagnosis, are demonstrated in our system.
Hen-Hsen Huang, An-Zi Yen, Hsin-Hsi Chen
Predicting Next Visited Country of Twitter Users
Abstract
We develop a classification model to predict which country Twitter users will visit next. Our model incorporates a range of spatial and temporal attributes and additionally uses language as a novel attribute for predicting user movement. We found that these attributes can be used to obtain a consistent classification model.
Muhammad Syafiq Mohd Pozi, Yuanyuan Wang, Panote Siriaraya, Yukiko Kawai, Adam Jatowt
Backmatter
Metadata
Copyright Year
2018
Electronic ISBN
978-3-030-03520-4
Print ISBN
978-3-030-03519-8
DOI
https://doi.org/10.1007/978-3-030-03520-4