2016 | Book

Information Retrieval

9th Russian Summer School, RuSSIR 2015, Saint Petersburg, Russia, August 24-28, 2015, Revised Selected Papers

Edited by: Pavel Braslavski, Ilya Markov, Panos Pardalos, Yana Volkovich, Dmitry I. Ignatov, Sergei Koltsov, Olessia Koltsova

Publisher: Springer International Publishing

Book series: Communications in Computer and Information Science


About this book

This book constitutes the thoroughly refereed proceedings of the 9th Russian Summer School on Information Retrieval, RuSSIR 2015, held in Saint Petersburg, Russia, in August 2015.

The volume includes 5 tutorial papers, summarizing lectures given at the event, and 6 revised papers from the school participants. The papers focus on various aspects of information retrieval.

Table of Contents

Frontmatter

Tutorial Papers

Frontmatter
Contextual Search and Exploration
Abstract
Personalized (mobile) devices are radically changing information access tools, with rich context allowing for far more powerful, personalized search. Rather than retrieving a “document” on the topic of a “query,” rich contextual information allows for tailored search and recommendation, and helps solve users’ complex tasks by taking into account complex constraints, exploring options, and combining individual answers into a coherent whole. This paper reports on a RuSSIR 2015 course covering the challenges of contextual search and recommendation, with a concrete focus on the venue recommendation task run as part of TREC 2012–2015. The course consisted of both lectures and hands-on “hackathon” sessions with data derived from the TREC task.
Julia Kiseleva, Jaap Kamps, Charles L. A. Clarke
A Tutorial on Leveraging Knowledge Graphs for Web Search
Abstract
Knowledge Graphs are large repositories of structured information about entities like persons, locations, and organizations and their relations. Modern Web search engines leverage such background Knowledge Graphs to create rich search engine result pages for entity-centric search queries.
In this document, we provide an introduction to Knowledge Graphs and their application to search-related problems. We present techniques to search for entities instead of documents as answers to a search query. Finally, we present human computation techniques to build hybrid human-machine systems that solve entity-oriented search tasks by making use of Knowledge Graphs.
Gianluca Demartini
A Short Survey on Online and Offline Methods for Search Quality Evaluation
Abstract
Evaluation has always been the cornerstone of scientific development. Scientists come up with hypotheses (models) to explain physical phenomena, and validate these models by comparing their output to observations in nature. A scientific field then consists merely of a collection of hypotheses that have not (yet) been disproved when compared to nature. Evaluation plays exactly the same key role in the field of information retrieval. Researchers and practitioners develop models to explain the relation between an information need expressed by a person and information contained in available resources, and test these models by comparing their outcomes to collections of observations.
This article is a short survey on methods, measures, and designs used in the field of Information Retrieval to evaluate the quality of search algorithms (that is, implementations of a model) against collections of observations. The phrase “search quality” has more than one interpretation; here, however, I will discuss only one of them: the effectiveness of a search algorithm in finding the information requested by a user. There are two types of collections of observations used for the purpose of evaluation: (a) relevance annotations, and (b) observable user behaviour. I will call the evaluation framework based on the former collection-based evaluation, and the one based on the latter in-situ evaluation.
This survey is far from complete; it only presents my personal viewpoint on the recent developments in the field.
Evangelos Kanoulas
Data Science for Massive Networks
Abstract
In this chapter, we briefly describe the history of massive networks and their place in modern life, and discuss open problems related to them. We start by giving a historical overview that indicates the most influential milestones in the development of networks. We then consider how real-life massive datasets can be represented in terms of networks, describing some examples and summarizing the properties of such networks. We also discuss cases of modeling real-life massive networks. In addition, we give some examples of optimization in massive networks and of the areas in which these techniques can be applied. We conclude by discussing open problems of massive networks.
Anton Kocheturov, Panos M. Pardalos
Models of Random Graphs and Their Applications to the Web-Graph Analysis
Abstract
This course provides an overview of various models for random graphs and their applications to the Web graph. We start with the classical Erdős-Rényi model, then proceed with the most recent models describing the topology and growth of the Internet, social networks, economic networks, and biological networks, and finally present several applications of these models to the problems of search and crawling.
Andrei Raigorodskii
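As a brief illustration of the classical Erdős-Rényi model named in the abstract above, here is a minimal sketch of sampling a graph from G(n, p), where each possible edge is included independently with probability p. The function name `erdos_renyi` and the parameter values are assumptions chosen for illustration and are not taken from the course itself.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample an undirected graph from G(n, p).

    Each of the n*(n-1)/2 possible edges is included independently
    with probability p. Returns a list of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


# Illustrative parameters (assumed, not from the source).
edges = erdos_renyi(n=200, p=0.05, seed=0)
expected_edges = 0.05 * 200 * 199 / 2  # p * C(n, 2)
print(len(edges), "edges sampled; expectation is", expected_edges)
```

The sampled edge count fluctuates around p·n(n−1)/2; the pairwise loop takes O(n²) time, so it is only suitable for small illustrative graphs.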

Young Scientist Conference Papers

Frontmatter
Who Are My Ancestors? Retrieving Family Relationships from Historical Texts
Abstract
This paper presents an approach for automatically retrieving family relationships from a real-world collection of Dutch historical notary acts. We aim to retrieve relationships such as husband - wife, parent - child, widow of, etc. Our approach includes person name extraction, reference disambiguation, candidate generation and family relationship prediction. Since we have a limited amount of training data, we evaluate different feature configurations based on n-gram analysis. The best results were obtained by using a combination of bi-grams and tri-grams of words together with the distance in words between two names. We evaluate our results for each type of relationship in terms of precision, recall and F-score.
Julia Efremova, Alejandro Montes García, Alfredo Bolt Iriondo, Toon Calders
Exploiting Semantic Annotation of Content with Linked Open Data (LoD) to Improve Searching Performance in Web Repositories of Multi-disciplinary Research Data
Abstract
Searching for relevant information in multi-disciplinary repositories of scientific research data is becoming a challenge for research communities such as the Social Sciences. Researchers use the available keyword-based online search, which often falls short of delivering the desired search results given the known issues of content heterogeneity, volume of data and terminological obsolescence. This leads to a number of problems, including insufficient content exposure, unsatisfied researchers and, above all, dwindling confidence in such repositories of invaluable knowledge. In this paper, we explore the appropriateness of alternative searching based on Linked Open Data (LoD)-based semantic annotation and indexing in online repositories such as the ReStore repository (an online service hosting and maintaining web resources containing data about multidisciplinary research in the Social Sciences, available at http://www.restore.ac.uk). We explore website content annotation using LoD to generate contemporary semantic annotations. We investigate whether we can improve the accuracy and relevance of search results affected by concept and term obsolescence in repositories of scientific content.
Arshad Khan, Thanassis Tiropanis, David Martin
Construction of a Russian Paraphrase Corpus: Unsupervised Paraphrase Extraction
Abstract
This paper presents a crowdsourcing project on the creation of a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from the headlines using an unsupervised matrix similarity metric. We provide a user-friendly online interface for crowdsourced annotation. There are 5181 annotated sentence pairs at the moment, with 4758 of them included in the corpus. The annotation process is ongoing, and the current version of the corpus is freely available at http://paraphraser.ru.
Ekaterina Pronoza, Elena Yagunova, Anton Pronoza
Using Levenshtein Distance for Typical User Actions and Search Engine Switching Detection
Abstract
This paper presents a new approach to the automatic grouping of user search sessions. We use the K-medoids clustering algorithm and the Levenshtein distance function to group search sessions. We show that the groups obtained are meaningful and can be used to estimate the probability of a user switching to another search engine. The proposed method was tested on real data provided by Yandex for the 2012 Yandex Switching Detection Challenge and achieved a high AUC value (0.82 on internal tests). A further advantage of the presented approach is the possibility of visualizing typical sequences of user actions to simplify analysis of the data set.
Alexey Raskin, Petr Rudakov
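To make the two ingredients named in the abstract above concrete, the following is a minimal sketch of the Levenshtein (edit) distance together with the medoid selection step used inside K-medoids. Encoding a session as a string of one-letter action codes (`Q` for query, `C` for click) is an assumption made for illustration; the paper's actual encoding and full clustering procedure are not described here.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def medoid(items):
    """Return the item with the smallest total distance to all other items."""
    return min(items, key=lambda s: sum(levenshtein(s, t) for t in items))


# Hypothetical sessions encoded as action strings (assumed encoding).
sessions = ["QC", "QCC", "QCCC"]
print(levenshtein("kitten", "sitting"))
print(medoid(sessions))
```

A full K-medoids run would alternate between assigning each session to its nearest medoid and recomputing each group's medoid as above; only the recomputation step is sketched here.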
Detecting Opinion Polarisation on Twitter by Constructing Pseudo-Bimodal Networks of Mentions and Retweets
Abstract
We present a novel approach to analyzing and visualizing opinion polarisation on Twitter based on graph features of communication networks extracted from tweets. We show that opinion polarisation can be legibly observed on unimodal projections of artificially created bimodal networks, where the most popular users in retweet and mention networks are considered nodes of the second mode. For this purpose, we select a subset of top users based on their PageRank values and assign them to be the second mode in our networks, which we therefore call pseudo-bimodal. After projecting them onto the set of “bottom” users and vice versa, we get unimodal networks with more distinct clusters and visually coherent community separation. We developed our approach on a dataset gathered during the Russian protest meetings on 24 December 2011 and tested it on another dataset by Conover [13] used to analyze political polarisation, showing that our approach not only works well on our data but also improves the results from previous research on that phenomenon.
Igor Zakhlebin, Aleksandr Semenov, Alexander Tolmach, Sergey Nikolenko
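The projection step described in the abstract above can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the set of top users is assumed to be given (for example, selected by PageRank elsewhere), edges that do not run from a “bottom” user to a top user are simply dropped, and two bottom users are linked with a weight equal to the number of top users they both mention or retweet. The function name `project_onto_bottom` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_bottom(edges, top_users):
    """Project a pseudo-bimodal network onto its "bottom" users.

    edges: iterable of (source, target) pairs, e.g. "source retweeted target".
    top_users: users assigned to the second mode.
    Returns {(u, v): weight}, where weight counts the top users shared by
    bottom users u and v (u < v).
    """
    top_users = set(top_users)
    audience = defaultdict(set)  # top user -> bottom users linking to it
    for src, dst in edges:
        # Assumption: keep only bottom -> top edges.
        if dst in top_users and src not in top_users:
            audience[dst].add(src)

    weights = defaultdict(int)
    for bottoms in audience.values():
        for u, v in combinations(sorted(bottoms), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical retweet edges (assumed data).
edges = [("a", "T1"), ("b", "T1"), ("b", "T2"), ("c", "T2"), ("a", "b")]
print(project_onto_bottom(edges, top_users={"T1", "T2"}))
```

The opposite projection (onto top users) follows the same pattern with the roles of the two modes swapped.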
Languages of Russia: Using Social Networks to Collect Texts
Abstract
In this paper, we outline a method for finding texts in minority languages of Russia in social networks, using VKontakte as an example. We find language-specific markers: special tokens that contain letter combinations unique to a certain language and that are highly frequent in texts in this language. We use Yandex.XML to generate lists of web pages that contain texts in these languages. We then download data from web pages in the https://vk.com domain through the VKontakte API.
Irina Krylova, Boris Orekhov, Ekaterina Stepanova, Lyudmila Zaydelman
Backmatter
Metadata
Title
Information Retrieval
Edited by
Pavel Braslavski
Ilya Markov
Panos Pardalos
Yana Volkovich
Dmitry I. Ignatov
Sergei Koltsov
Olessia Koltsova
Copyright year
2016
Electronic ISBN
978-3-319-41718-9
Print ISBN
978-3-319-41717-2
DOI
https://doi.org/10.1007/978-3-319-41718-9
