
About this Book

This book constitutes the thoroughly refereed post-conference proceedings of the 10th International Conference on Adaptive Multimedia Retrieval, AMR 2012, held in Copenhagen, Denmark, in October 2012.

The 17 revised full papers presented were carefully reviewed and selected from numerous submissions. The papers cover state-of-the-art contributions, features and classification, location context, language and semantics, music retrieval, and adaptation and HCI.



State-of-the-Art Contributions


Defining and Applying a Language for Discovery

In order to design better search experiences, we need to understand the complexities of human information-seeking behaviour. In this paper, we propose a model of information behaviour based on the needs of users across a range of search and discovery scenarios. The model consists of a set of modes that users employ to satisfy their information goals.
We discuss how these modes relate to existing models of human information seeking behaviour, and identify areas where they differ. We then examine how they can be applied in the design of interactive systems, and present examples where individual modes have been implemented in interesting or novel ways. Finally, we consider the ways in which modes combine to form distinct chains or patterns of behaviour, and explore the use of such patterns both as an analytical tool for understanding information behaviour and as a generative tool for designing search and discovery experiences.
Tony Russell-Rose, Joe Lamantia, Stephann Makri

A Survey of Evaluation in Music Genre Recognition

Much work is focused upon music genre recognition (MGR) from audio recordings, symbolic data, and other modalities. While reviews of some of this work have been written before, no survey has been made of how approaches to MGR are evaluated. This paper compiles a bibliography of work in MGR, and analyzes three aspects of evaluation: experimental designs, datasets, and figures of merit.
Bob L. Sturm

The Reason Why: A Survey of Explanations for Recommender Systems

Recommender systems are applications that offer content or items to users based on their previous activity. These systems are broadly used in many fields and applications, and it is common for a user to interact with several recommender systems during daily activities. However, most of these systems are black boxes whose inner workings users do not understand. This lack of transparency often causes users to distrust them. A suitable solution is to offer users explanations of why the system makes particular recommendations. This work deals with the problem of retrieving and evaluating explanations based on hybrid recommenders. These explanations are meant to improve the perceived recommendation quality from the user’s perspective. Along with recommended items, explanations are presented to the user to underline the quality of the recommendation. Hybrid recommenders should express relevance by providing reasons speaking for a recommended item. In this work we present an attribute explanation retrieval approach to provide these reasons and show how to evaluate such approaches. To this end, we set up an online user study in which users were asked to provide movie feedback. For each rated movie we additionally collected feedback about the reasons the movie was liked or disliked. With this data, explanation retrieval can be studied in general, and the data can also be used to evaluate such explanations.
Christian Scheel, Angel Castellanos, Thebin Lee, Ernesto William De Luca

Features and Classification


Cross-Dataset Learning of Visual Concepts

Visual content classification has become a keystone when opening up digital image archives to semantic search. Content-based explicit metadata is often only sparsely available, and automated analysis of the depicted content therefore provides an important source of additional information. While visual content classification has proven beneficial, a major concern is the dependency on the large-scale training data required to train robust classifiers. In this paper, we analyze the use of cross-dataset training samples to increase classification performance. We investigate the performance of standardized manually annotated training sets as well as datasets mined automatically from potentially unreliable web resources such as Flickr and Google Images. Besides brute-force learning on this potentially noisy ground-truth data, we apply semantic post-processing for data cleansing and topic disambiguation. We evaluate our results on standardized datasets by comparing our classification performance with proper ground-truth-based classification results.
Christian Hentschel, Harald Sack, Nadine Steinmetz

Optimized SIFT Feature Matching for Image Retrieval

Applying SIFT features for retrieval of visual data not only requires proper settings for the descriptor extraction but also needs well-selected parameters for comparing these descriptors. Most researchers simply apply the standard parameter values without an adequate analysis of the parameters themselves. In this paper, we question the standard parameter settings and investigate the influence of the important comparison parameters. Based on an analysis of diverse data sets using different interest point detectors, we present an optimized combination of matching parameters which outperforms the standard values. We observe that the two major parameters, dist_max and ratio_max, seem to have similar effects on different datasets of diverse nature for the application of scene retrieval. Thus, this paper shows that there is an almost global setting for these two parameters for local feature matching. The outcomes of this work can also be applied to other tasks such as video analysis and object retrieval.
Christian Schulze, Marcus Liwicki
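The two comparison parameters named in the abstract, dist_max (an absolute distance cutoff) and ratio_max (the nearest-to-second-nearest ratio test), can be illustrated with a minimal matching sketch. The function name and default values here are illustrative assumptions, not the paper's optimized settings.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(query, reference, dist_max=0.5, ratio_max=0.8):
    """Match each query descriptor to its nearest reference descriptor,
    accepting the match only if the absolute distance stays below dist_max
    and the ratio to the second-nearest distance stays below ratio_max.
    Assumes at least two reference descriptors."""
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, r), ri) for ri, r in enumerate(reference))
        (d1, ri), (d2, _) = dists[0], dists[1]
        if d1 <= dist_max and (d2 == 0 or d1 / d2 <= ratio_max):
            matches.append((qi, ri))
    return matches
```

Tightening either parameter trades recall for precision; the paper's point is that one near-global setting works across datasets.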

Representativeness and Diversity in Photos via Crowd-Sourced Media Analysis

In this paper we address the problem of user-adapted image retrieval. First, we provide a survey of the performance of the existing social media retrieval platforms and highlight their limitations. In this context, we propose a hybrid, two-step, machine and human media analysis approach. It aims to improve retrieval relevance by selecting a small number of representative and diverse images from a noisy set of candidate images (e.g. the case of Internet media). In the machine analysis step, to ensure representativeness, images are re-ranked according to their similarity to the “most common” image in the set. Further, to also ensure the diversity of the results, images are clustered and the best-ranked images among the most representative in each cluster are retained. The human analysis step aims to further bridge the inherent semantic gap of the descriptors: the retained images are refined via crowd-sourcing, which adapts the results to human judgment. The method was validated in the context of retrieving images of monuments, using a data set of more than 25,000 images collected from various social image search platforms.
Anca-Livia Radu, Julian Stöttinger, Bogdan Ionescu, María Menéndez, Fausto Giunchiglia
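The machine-analysis step described above, ranking by similarity to the "most common" image and then keeping the best-ranked image per cluster, might be sketched as follows. This is a toy k-means on small feature tuples; all names and parameters are assumptions, not the authors' implementation.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=10):
    """Naive k-means: first k points as initial centers, fixed iterations."""
    centers = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: dist(p, centers[c]))].append(p)
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

def representative_diverse(features, k):
    """Rank images by closeness to the overall centroid (representativeness),
    then keep the best-ranked image of each cluster (diversity)."""
    mean = centroid(features)
    rank = {f: dist(f, mean) for f in features}
    return [min(c, key=rank.get) for c in kmeans(features, k) if c]
```

The crowd-sourcing refinement would then operate on the short list this function returns.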

Location Context


Exploring Geospatial Music Listening Patterns in Microblog Data

Microblogs are a steadily growing, valuable, albeit noisy, source of information on interests, preferences, and activities. As music plays an important role in many human lives we aim to leverage microblogs for music listening-related information. Based on this information we present approaches to estimate artist similarity, popularity, and local trends, as well as approaches to cluster artists with respect to additional tag information. Furthermore, we elaborate a novel geo-aware interaction approach that integrates these diverse pieces of information mined from music-related tweets. Including geospatial information at the level of tweets, we also present a web-based user interface to browse the “world of music” as seen by the “Twittersphere”.
David Hauger, Markus Schedl
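One simple way to mine artist popularity and similarity from music-related tweets, in the spirit of the abstract (the actual estimation approaches are not specified here, so this is purely illustrative), is to count mentions and co-mentions:

```python
from collections import Counter
from itertools import combinations

def artist_stats(tweets):
    """tweets: a list of sets of artist names mentioned together.
    Returns mention counts (popularity) and a Jaccard-style similarity
    based on how often two artists are mentioned in the same tweet."""
    popularity = Counter(a for t in tweets for a in t)
    co = Counter(frozenset(p) for t in tweets for p in combinations(sorted(t), 2))

    def similarity(a, b):
        joint = co[frozenset((a, b))]
        return joint / (popularity[a] + popularity[b] - joint)

    return popularity, similarity
```

Attaching the geotag of each tweet to these counts would yield the local trends and geospatial browsing the paper describes.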

A Study into Annotation Ranking Metrics in Community Contributed Image Corpora

Community contributed datasets are becoming increasingly common in automated image annotation systems. One important issue with community image data is that there is no guarantee that the associated metadata is relevant. A method is required that can accurately rank the semantic relevance of community annotations. This should enable the extraction of relevant subsets from potentially noisy collections of these annotations. Having relevant, non-heterogeneous tags assigned to images should improve community image retrieval systems, such as Flickr, which are based on text retrieval methods. In the literature, the current state-of-the-art approach to ranking the semantic relevance of Flickr tags is based on the widely used tf-idf metric. In the case of datasets containing landmark images, however, this metric is inefficient and can be improved upon. In this paper, we present a landmark recognition framework that provides end-to-end automated recognition and annotation. In our study into automated annotation, we evaluate five alternative approaches to tf-idf for ranking tag relevance in community contributed landmark image corpora. We carry out a thorough evaluation of each of these ranking metrics, and the results demonstrate that four of the proposed techniques outperform the commonly used tf-idf approach for this task. Our best performing approach achieves a significant F-measure increase of 0.19 over tf-idf.
Mark Hughes, Gareth J. F. Jones, Noel E. O’Connor
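The tf-idf baseline the paper compares against can be sketched as follows; this is a generic formulation (term frequency within one image's tag set times inverse document frequency across the corpus), and the paper's exact weighting may differ.

```python
import math
from collections import Counter

def tfidf_rank(image_tags, corpus):
    """Rank the tags of one image by tf-idf over a corpus of tag lists.
    corpus: list of tag lists, one per image in the collection."""
    n = len(corpus)
    tf = Counter(image_tags)

    def idf(tag):
        df = sum(tag in tags for tags in corpus)  # document frequency
        return math.log(n / (1 + df))             # smoothed idf

    scores = {t: (tf[t] / len(image_tags)) * idf(t) for t in tf}
    return sorted(scores, key=scores.get, reverse=True)
```

Frequent generic tags ("paris", "sky") receive low idf, so distinctive landmark tags rise to the top; the paper's alternative metrics target exactly the cases where this heuristic fails.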

Language and Semantics


Semantic Indexing for Efficient Retrieval of Multimedia Data

We present a novel approach, called SemI, to semantic indexing of annotated multimedia objects for their efficient retrieval. The generation of multimedia indices with SemI relies on the semantic annotation of these objects with references to concepts formally defined in standard OWL2 and semantic services described in OWL-S. For scoring the annotated multimedia data in these indices, an appropriate semantic similarity measure makes use of approximated logical concept abduction in order to alleviate strict logical false negatives. Efficient query answering over SemI indices is performed with the use of Fagin’s threshold algorithm. The results of our comparative experimental evaluation reveal that SemI-enabled multimedia retrieval can significantly outperform representative approaches of LSA- and RDF-based semantic retrieval in this domain in terms of precision at recall, average precision and discounted cumulative gain.
Xiaoqi Cao, Matthias Klusch
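Fagin's threshold algorithm, which the paper uses for efficient query answering over SemI indices, operates on per-criterion ranked lists with random access to the scores. A minimal sum-aggregation version (how SemI aggregates its similarity scores is not specified here, so sum is an assumption):

```python
def threshold_algorithm(lists, k):
    """lists: per-criterion rankings of (score, object) pairs sorted by
    descending score. Aggregate scores by sum; stop as soon as the k-th
    best total seen so far reaches the row threshold."""
    # random-access score tables, one per ranked list
    tables = [dict((obj, s) for s, obj in lst) for lst in lists]
    seen, top = set(), []
    for depth in range(max(len(lst) for lst in lists)):
        # threshold: best total any unseen object could still achieve
        threshold = sum(lst[depth][0] for lst in lists if depth < len(lst))
        for lst in lists:
            if depth >= len(lst):
                continue
            _, obj = lst[depth]
            if obj not in seen:
                seen.add(obj)
                total = sum(t.get(obj, 0.0) for t in tables)
                top.append((total, obj))
        top = sorted(top, reverse=True)[:k]
        if len(top) == k and top[-1][0] >= threshold:
            break
    return top
```

The early stop is what makes the algorithm attractive for retrieval: the bulk of each index is never touched when the top answers emerge early.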

A Proof-of-Concept for Orthographic Named Entity Correction in Spanish Voice Queries

Automatic speech recognition (ASR) systems are not able to recognize entities that are not present in their vocabulary. The problem considered in this paper is the misrecognition of named entities in Spanish voice queries. We introduce a proof-of-concept for named entity correction that provides alternatives to incorrectly recognized or misrecognized entities by retrieving phonetically similar entities from a dictionary. The system is domain-dependent, using sports news, especially football news, but works regardless of the automatic speech recognition system used. The correction process exploits the query structure and its semantic information to detect where a named entity appears. The system finds the most suitable alternative entity in a dictionary previously generated from the existing named entities.
Julián Moreno Schneider, José Luis Martínez Fernández, Paloma Martínez
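A bare-bones version of dictionary-based entity correction might look like this. A real system would compare phonetic transcriptions (e.g. of Spanish pronunciations) rather than surface strings, so plain edit distance here is only a stand-in for the phonetic similarity the paper uses.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def correct_entity(recognized, dictionary):
    """Return the dictionary entity closest to the misrecognized string."""
    return min(dictionary, key=lambda e: edit_distance(recognized.lower(), e.lower()))
```

In the paper's pipeline this lookup would only fire on the query slots where the semantic structure says a named entity should appear.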

Music Retrieval


From Improved Auto-Taggers to Improved Music Similarity Measures

This paper focuses on the relation between automatic tag prediction and music similarity. Intuitively, music similarity measures based on auto-tags should profit from improvements in the quality of the underlying audio tag predictors. We present classification experiments that verify this claim. Our results suggest a straightforward way to further improve content-based music similarity measures by improving the underlying auto-taggers.
Klaus Seyerlehner, Markus Schedl, Reinhard Sonnleitner, David Hauger, Bogdan Ionescu
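A music similarity measure built on auto-tags, as discussed above, can be as simple as a cosine between predicted tag-probability vectors. This is an illustrative construction, not the authors' measure:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def tag_similarity(track_a, track_b, tags):
    """Similarity of two tracks as the cosine between their auto-tag
    probability vectors (dicts mapping tag -> predicted probability)."""
    return cosine([track_a.get(t, 0.0) for t in tags],
                  [track_b.get(t, 0.0) for t in tags])
```

Under such a construction, any reduction in tag-prediction error propagates directly into the similarity estimates, which is the intuition the paper's experiments verify.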

Ambiguity in Automatic Chord Transcription: Recognizing Major and Minor Chords

Automatic chord transcription is the process of transforming the harmonic content of a music signal into chord symbols. We use difficult chord transcription cases in the Beatles material to compare human performance to computer performance. Surprisingly, in many cases musically oriented participants are unable to determine whether the chord is major or minor. We further analyze ambiguous chords and find that there are often no clear rules for chord interpretation. This suggests that the standard evaluation method in automatic chord transcription, based on a single ground truth, is inadequate.
Antti Laaksonen
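The major/minor decision at the heart of this ambiguity is commonly made by matching a 12-bin chroma vector against rotated triad templates; a minimal sketch of that common approach (not the evaluation method of the paper):

```python
# 12-bin pitch-class templates with the root at index 0
MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # root, major third, fifth
MINOR = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]  # root, minor third, fifth

def classify_chord(chroma):
    """Match a 12-bin chroma vector against all 24 rotated major/minor
    templates; return (root_pitch_class, quality) of the best dot product."""
    best = None
    for root in range(12):
        for quality, tpl in (("maj", MAJOR), ("min", MINOR)):
            score = sum(chroma[(root + i) % 12] * t for i, t in enumerate(tpl))
            if best is None or score > best[0]:
                best = (score, root, quality)
    return best[1], best[2]
```

When the third is weak or absent in the signal, the major and minor templates score almost identically, which is precisely the ambiguity the paper documents in human listeners as well.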

Capturing the Temporal Domain in Echonest Features for Improved Classification Effectiveness

This paper proposes Temporal Echonest Features to harness the information available from the beat-aligned vector sequences of the features provided by The Echo Nest. Rather than aggregating them via simple averaging approaches, the statistics of temporal variations are analyzed and used to represent the audio content. We evaluate the performance on four traditional music genre classification test collections and compare them to state-of-the-art audio descriptors. Experiments reveal that exploiting temporal variability from beat-aligned vector sequences and combining different descriptors leads to an improvement in classification accuracy. Compared to established conventional audio descriptors used as benchmarks, these approaches perform well, often significantly outperforming their predecessors, and can be effectively used for large-scale music genre classification.
Alexander Schindler, Andreas Rauber
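Replacing simple averaging with temporal statistics over beat-aligned feature sequences can be sketched as follows; the specific statistics chosen here (mean, variance, min, max per dimension) are illustrative, and the paper evaluates its own set.

```python
def temporal_stats(frames):
    """Summarise a beat-aligned sequence of feature vectors by per-dimension
    statistics, concatenated into one fixed-length song-level vector
    instead of a plain per-dimension average."""
    dims = list(zip(*frames))  # transpose: one tuple of values per dimension
    feats = []
    for d in dims:
        n = len(d)
        mean = sum(d) / n
        var = sum((x - mean) ** 2 for x in d) / n
        feats += [mean, var, min(d), max(d)]
    return feats
```

The variance terms are what capture the temporal variability that plain averaging discards: two songs with the same mean timbre but different dynamics now map to different vectors.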

Adaptation and HCI


SKETCHify – An Adaptive Prominent Edge Detection Algorithm for Optimized Query-by-Sketch Image Retrieval

Query-by-Sketch image retrieval, unlike content-based image retrieval following a Query-by-Example approach, uses human-drawn binary sketches as query objects, thereby eliminating the need for an initial query image close enough to the users’ information need. This is particularly important when the user is looking for a known image, i.e., an image that has been seen before. So far, Query-by-Sketch has suffered from two main limiting factors. First, users tend to focus on the objects’ main contours when drawing binary sketches, while ignoring any texture or edges inside the object(s) and in the background. Second, users have only a limited ability to sketch the known item being searched for in the correct position, scale and/or orientation. Thus, effective Query-by-Sketch systems need to allow users to concentrate on the main contours of the main object(s) they are searching for and, at the same time, tolerate such inaccuracies. In this paper, we present SKETCHify, an adaptive algorithm that is able to identify and isolate the prominent objects within an image. This is achieved by applying heuristics to detect the best edge-map thresholds for each image by monitoring the intensity, spatial distribution and sudden spike increase of edges, with the intention of generating edge maps that are as close as possible to human-drawn sketches. We have integrated SKETCHify into QbS, our system for Query-by-Sketch image retrieval, and the results show a significant improvement in both retrieval rank and retrieval time when exploiting the prominent edges for retrieval, compared to Query-by-Sketch relying on normal edge maps. Depending on the quality of the query sketch, SKETCHify even makes it possible to provide invariance with regard to position, scale and rotation in the retrieval process. For the evaluation, we have used images from the MIRFLICKR-25K dataset and a free clip art collection of similar size.
Ihab Al Kabary, Heiko Schuldt
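The core idea of adapting edge-map thresholds per image by monitoring edge density can be illustrated with a crude heuristic. The actual SKETCHify heuristics also consider the spatial distribution and sudden spike increases of edges; the function name and target density here are assumptions.

```python
def pick_edge_threshold(grad_mag, target_density=0.05):
    """Choose a gradient-magnitude threshold so that roughly target_density
    of the pixels survive as edge pixels -- a simple stand-in for adaptive
    per-image threshold selection. grad_mag: 2-D list of gradient magnitudes."""
    values = sorted((v for row in grad_mag for v in row), reverse=True)
    k = max(1, int(len(values) * target_density))
    return values[k - 1]  # keep pixels with magnitude >= this value
```

Thresholding each image to a sketch-like edge density is what lets the machine-generated edge maps resemble the sparse contours a human actually draws.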

Adaptive Temporal Modeling of Audio Features in the Context of Music Structure Segmentation

This paper describes a method for automatically adapting the length of the temporal modeling applied to audio features in the context of music structure segmentation. By detecting regions of homogeneous acoustical content and abrupt changes in the audio feature sequence, we show that we can consequently adapt temporal modeling to capture both fast- and slow- varying structural information in the audio signal. Evaluation of the method shows that temporal modeling is consistently adapted to different musical contexts, allowing for robust music structure segmentation while gaining independence regarding parameter tuning.
Florian Kaiser, Geoffroy Peeters
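Detecting abrupt changes that delimit homogeneous regions, whose lengths then drive the adaptive temporal modeling, can be sketched with a sliding-window novelty test. This is an illustrative stand-in for the paper's method; the window length and threshold are assumptions.

```python
import math

def novelty_boundaries(frames, window=4, threshold=1.0):
    """Flag frame indices where the mean feature vector of the preceding
    window differs strongly from that of the following window -- abrupt
    changes delimiting homogeneous regions in the audio feature sequence."""
    def mean(seg):
        return [sum(col) / len(seg) for col in zip(*seg)]

    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    bounds = []
    for t in range(window, len(frames) - window):
        if dist(mean(frames[t - window:t]), mean(frames[t:t + window])) > threshold:
            bounds.append(t)
    return bounds
```

A real segmenter would keep only local maxima of this novelty score; the spacing between the resulting boundaries is what the paper uses to adapt the temporal modeling length to fast- or slow-varying structure.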

Detector Performance Prediction Using Set Annotations

Content-based video search engines often use the output of concept detectors to answer queries. Improving detectors requires computational power and human labor. It is therefore important to predict detector performance economically and improve detectors adaptively. Detector performance prediction, however, has not received much research attention so far. In this paper, we propose a prediction approach that uses human annotators. The annotators estimate the number of images in a grid in which a concept is present, a task that can be performed efficiently. Using these estimations, we define a model for the posterior probability that a concept is present given its confidence score. We then use the model to predict the average precision of a detector. We evaluate our approach on a TRECVid collection of Internet archive videos, comparing it to an approach that labels individual images. Our approach requires fewer resources while achieving good prediction quality.
Robin Aly, Martha Larson
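Given a model of the posterior probability that the concept is present at each ranked item, the average precision of a detector can be predicted as an expectation over the ranked list. The following uses a standard independence approximation and is not necessarily the paper's exact estimator:

```python
def expected_average_precision(posteriors):
    """Predict average precision from the posterior probability of
    relevance of each item in a ranked list, assuming independence:
    E[AP] = (1/R) * sum_i p_i * (1 + sum_{j<i} p_j) / i, with R = sum p_i."""
    total = sum(posteriors)
    if total == 0:
        return 0.0
    cum, ap = 0.0, 0.0
    for i, p in enumerate(posteriors, 1):
        ap += p * (cum + 1) / i  # item i counts itself as relevant
        cum += p
    return ap / total
```

With posteriors of exactly 0 and 1 this reduces to ordinary average precision, so the formula degrades gracefully as the posterior model becomes more certain.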

Personas – The Missing Link Between User Simulations and User-Centered Design?

Linking the Persona-Based Design of Adaptive Multimedia Retrieval Systems with User Simulations
In order to establish a reproducible evaluation setup in interactive information retrieval (IIR), user simulations have been suggested. Unlike the inclusion of “real” users in the evaluation loop, user simulations scale well, are not affected by learning or tiring effects of the participants, and can be conducted at low cost.
Unfortunately, the evaluation utilizing user simulations often takes place after the IIR system has been fully implemented. As such, it cannot give valuable feedback during the design phase of the system. In this paper, we propose a methodology for linking the persona-based approach from user interaction design with the field of IIR evaluation to address this problem.
To illustrate its utility, a user-centered multimedia retrieval scenario – the ImageCLEF 2012 pilot task on personal photo retrieval – is used to exemplify the proposed evaluation methodology. To conclude, we discuss the current limitations of the approach and address open issues such as the incorporation of multiple search strategies into user simulations.
David Zellhöfer

