About this book

This book constitutes the thoroughly refereed proceedings of the 9th Russian Summer School on Information Retrieval, RuSSIR 2015, held in Saint Petersburg, Russia, in August 2015.

The volume includes 5 tutorial papers, summarizing lectures given at the event, and 6 revised papers from the school participants. The papers focus on various aspects of information retrieval.

Table of contents

Frontmatter

Tutorial Papers

Frontmatter

Contextual Search and Exploration

Personalized (mobile) devices are radically changing information access tools, with rich context allowing for far more powerful, personalized search. Rather than retrieving a “document” on the topic of a “query,” the rich contextual information allows for tailored search and recommendation, and helps solve users’ complex tasks by taking into account complex constraints, exploring options, and combining individual answers into a coherent whole. This paper reports on a RuSSIR 2015 course covering the challenges of contextual search and recommendation, with a concrete focus on the venue recommendation task run as part of TREC 2012–2015. The course consisted of both lectures and hands-on “hackathon” sessions with data derived from the TREC task.
Julia Kiseleva, Jaap Kamps, Charles L. A. Clarke

A Tutorial on Leveraging Knowledge Graphs for Web Search

Knowledge Graphs are large repositories of structured information about entities like persons, locations, and organizations and their relations. Modern Web search engines leverage such background Knowledge Graphs to create rich search engine result pages for entity-centric search queries.
In this document we provide an introduction to Knowledge Graphs and their application to search-related problems. We present techniques to search for entities instead of documents as answer to a search query. Finally we present human computation techniques to build hybrid human-machine systems to solve entity-oriented search tasks making use of Knowledge Graphs.
Gianluca Demartini

A Short Survey on Online and Offline Methods for Search Quality Evaluation

Evaluation has always been the cornerstone of scientific development. Scientists come up with hypotheses (models) to explain physical phenomena, and validate these models by comparing their output to observations in nature. A scientific field then consists merely of a collection of hypotheses that have not (yet) been disproved when compared to nature. Evaluation plays exactly the same key role in the field of information retrieval. Researchers and practitioners develop models to explain the relation between an information need expressed by a person and information contained in available resources, and test these models by comparing their outcomes to collections of observations.
This article is a short survey on methods, measures, and designs used in the field of Information Retrieval to evaluate the quality of search algorithms (aka the implementation of a model) against collections of observations. The phrase “search quality” has more than one interpretation; here, however, I will discuss only one of them: the effectiveness of a search algorithm in finding the information requested by a user. There are two types of collections of observations used for the purpose of evaluation: (a) relevance annotations, and (b) observable user behaviour. I will call the evaluation framework based on the former a collection-based evaluation, and the one based on the latter an in-situ evaluation.
This survey is far from complete; it only presents my personal viewpoint on the recent developments in the field.
Evangelos Kanoulas

Data Science for Massive Networks

In this chapter we briefly describe the history of massive networks and their place in modern life, and discuss open problems related to them. We start with a historical overview of the most influential milestones in the development of networks. Then we consider how real-life massive datasets can be represented in terms of networks, describing some examples and summarizing the properties of such networks. We also discuss cases of modeling real-life massive networks. In addition, we give some examples of optimization in massive networks and of the areas in which these techniques can be applied. We conclude by discussing open problems of massive networks.
Anton Kocheturov, Panos M. Pardalos

Models of Random Graphs and Their Applications to the Web-Graph Analysis

This course provides an overview of various models for random graphs and their applications to the Web graph. We start with the classical Erdős-Rényi model, then proceed to the most recent models describing the topology and growth of the Internet, social networks, economic networks, and biological networks, and finally present several applications of these models to the problems of search and crawling.
Andrei Raigorodskii
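
As an illustration of the classical Erdős-Rényi model mentioned in the abstract above, the following is a minimal sketch of its G(n, p) variant, in which each of the n(n-1)/2 possible edges is included independently with probability p. The function name `erdos_renyi` and the parameter values are illustrative assumptions and are not taken from the course material.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample an undirected G(n, p) graph as a list of edges (u, v) with u < v.

    Hypothetical helper for illustration: every unordered pair of distinct
    vertices becomes an edge independently with probability p.
    """
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


# Illustrative parameters (assumptions, not from the source).
n, p = 200, 0.05
edges = erdos_renyi(n, p, seed=42)

# The expected number of edges is p * n * (n - 1) / 2; a single sample
# should land close to this value but will rarely match it exactly.
print("sampled edges: ", len(edges))
print("expected edges:", p * n * (n - 1) / 2)
```

Testing every pair costs quadratic time in n, which is acceptable for a small demonstration but not for Web-scale graphs.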

Young Scientist Conference Papers

Frontmatter

Who Are My Ancestors? Retrieving Family Relationships from Historical Texts

This paper presents an approach for automatically retrieving family relationships from a real-world collection of Dutch historical notary acts. We aim to retrieve relationships like husband - wife, parent - child, widow of, etc. Our approach includes person name extraction, reference disambiguation, candidate generation and family relationship prediction. Since we have a limited amount of training data, we evaluate different feature configurations based on n-gram analysis. The best results were obtained by using a combination of bi-grams and tri-grams of words together with the distance in words between two names. We evaluate our results for each type of relationship in terms of precision, recall and F-score.
Julia Efremova, Alejandro Montes García, Alfredo Bolt Iriondo, Toon Calders
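
The feature combination described in the abstract above (word bi-grams and tri-grams plus the distance in words between two names) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which context the n-grams are drawn from, so the sketch uses the words between the two names, and the function names, the span format and the example sentence are all hypothetical.

```python
def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pair_features(tokens, first_span, second_span):
    """Build a feature dictionary for one candidate pair of person names.

    Spans are (start, end) token indices with end exclusive, and the first
    name is assumed to precede the second. Taking n-grams from the words
    between the names is an assumption made for this sketch.
    """
    between = [t.lower() for t in tokens[first_span[1]:second_span[0]]]
    features = {"distance": len(between)}
    for n in (2, 3):
        for gram in word_ngrams(between, n):
            features[f"{n}gram={gram}"] = 1
    return features


# Invented English example sentence; the original data are Dutch notary acts.
tokens = "Jan Jansen , widower of Maria de Vries , sells a house".split()
print(pair_features(tokens, first_span=(0, 2), second_span=(5, 8)))
```

A dictionary of this shape can be passed to any classifier that accepts sparse feature mappings.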

Exploiting Semantic Annotation of Content with Linked Open Data (LoD) to Improve Searching Performance in Web Repositories of Multi-disciplinary Research Data

Searching for relevant information in multi-disciplinary repositories of scientific research data is becoming a challenge for research communities such as the Social Sciences. Researchers use the available keyword-based online search, which often falls short of delivering the desired results given the known issues of content heterogeneity, volume of data and terminological obsolescence. This leads to a number of problems, including insufficient content exposure, unsatisfied researchers and, above all, dwindling confidence in such repositories of invaluable knowledge. In this paper, we explore the appropriateness of alternative searching based on Linked Open Data (LoD)-based semantic annotation and indexing in online repositories such as the ReStore repository (an online service hosting and maintaining web resources containing data about multidisciplinary research in the Social Sciences, available at http://www.restore.ac.uk). We explore website content annotation using LoD to generate contemporary semantic annotations. We investigate whether we can improve the accuracy and relevance of search results affected by the obsolescence of concepts and terms in repositories of scientific content.
Arshad Khan, Thanassis Tiropanis, David Martin

Construction of a Russian Paraphrase Corpus: Unsupervised Paraphrase Extraction

This paper presents a crowdsourcing project on the creation of a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from the headlines using an unsupervised matrix similarity metric. We provide a user-friendly online interface for crowdsourced annotation, which is available at paraphraser.ru. There are 5181 annotated sentence pairs at the moment, with 4758 of them included in the corpus. The annotation process is ongoing, and the current version of the corpus is freely available at http://paraphraser.ru.
Ekaterina Pronoza, Elena Yagunova, Anton Pronoza

Using Levenshtein Distance for Typical User Actions and Search Engine Switching Detection

This paper presents a new approach to the automatic grouping of user search sessions. The K-medoids clustering algorithm and the Levenshtein distance function were used to group search sessions. We show that the groups obtained are meaningful and can be used to estimate the probability of a user switching to another search engine. The proposed method was tested on real data provided by Yandex for the 2012 Yandex Switching Detection Challenge and achieved a high AUC value (0.82 on internal tests). A further advantage of the presented approach is the possibility of visualizing typical sequences of user actions to simplify analysis of the data set.
Alexey Raskin, Petr Rudakov
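
To make the combination of Levenshtein distance and K-medoids clustering named in the abstract above more concrete, here is a minimal sketch. It is not the authors' implementation. The session encoding (one character per action: Q for query, C for click, S for switch), the deterministic initialisation with the first k items and all function names are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(items, k, max_iter=100):
    """Simple alternating K-medoids over a precomputed distance matrix.

    Initialising with the first k items is an assumption that keeps the
    sketch deterministic; practical implementations usually randomise.
    """
    n = len(items)
    dist = [[levenshtein(items[i], items[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][o] for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical sessions: Q = query, C = click, S = switch to another engine.
sessions = ["QCQC", "QCQCC", "QCQ", "QQQS", "QQS", "QQQQS"]
medoids, labels = k_medoids(sessions, k=2)
for session, label in zip(sessions, labels):
    print(label, session)
print("medoid sessions:", [sessions[m] for m in medoids])
```

Because a medoid is always one of the original sessions, each cluster comes with a concrete representative action sequence, which fits the visualisation use mentioned in the abstract.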

Detecting Opinion Polarisation on Twitter by Constructing Pseudo-Bimodal Networks of Mentions and Retweets

We present a novel approach to analyzing and visualizing opinion polarisation on Twitter based on graph features of communication networks extracted from tweets. We show that opinion polarisation can be clearly observed on unimodal projections of artificially created bimodal networks, where the most popular users in retweet and mention networks are considered nodes of the second mode. For this purpose, we select a subset of top users based on their PageRank values and assign them to be the second mode in our networks, which we therefore call pseudo-bimodal. After projecting them onto the set of “bottom” users and vice versa, we get unimodal networks with more distinct clusters and visually coherent community separation. We developed our approach on a dataset gathered during the Russian protest meetings on 24 December 2011 and tested it on another dataset by Conover [13] used to analyze political polarisation, showing that our approach not only works well on our data but also improves the results of previous research on that phenomenon.
Igor Zakhlebin, Aleksandr Semenov, Alexander Tolmach, Sergey Nikolenko
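
The projection step described in the abstract above might be sketched as follows. The sketch covers only the projection onto "bottom" users and takes the set of top users as given; the abstract says they are selected by PageRank, which is not reproduced here. The function name, the edge-list format and the weighting by number of shared top users are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_bottom(edges, top_users):
    """Project a pseudo-bimodal network onto its "bottom" users.

    edges: iterable of (user, user) pairs from mentions or retweets.
    top_users: users treated as the second mode (assumed to be given).
    Returns {(u, v): weight}, where weight is the number of top users that
    bottom users u and v both interact with. Edges between two bottom
    users or between two top users are ignored.
    """
    top_users = set(top_users)
    neighbours = defaultdict(set)  # bottom user -> top users it touches
    for src, dst in edges:
        if src in top_users and dst not in top_users:
            neighbours[dst].add(src)
        elif dst in top_users and src not in top_users:
            neighbours[src].add(dst)
    weights = {}
    for u, v in combinations(sorted(neighbours), 2):
        shared = len(neighbours[u] & neighbours[v])
        if shared:
            weights[(u, v)] = shared
    return weights


# Toy data (invented): lowercase names are bottom users, X/Y/Z are top users.
edges = [("a", "X"), ("b", "X"), ("b", "Y"), ("c", "Y"), ("d", "Z"), ("a", "b")]
projection = project_onto_bottom(edges, top_users={"X", "Y", "Z"})
print(projection)
```

In the toy data, users a and b are linked through X and users b and c through Y, while d shares no top user with anyone and stays isolated in the projection.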

Languages of Russia: Using Social Networks to Collect Texts

In this paper we outline a method of finding texts in minor languages of Russia in social networks, using VKontakte as an example. We find language-specific markers, that is, special tokens that contain letter combinations unique to a certain language and are highly frequent in texts in this language. We use Yandex.XML to generate lists of web pages that contain texts in these languages. We then download data from web pages in the https://vk.com domain through the VKontakte API.
Irina Krylova, Boris Orekhov, Ekaterina Stepanova, Lyudmila Zaydelman

Backmatter
