Supporting exploratory video retrieval tasks with grouping and recommendation
Introduction
As a result of the improving capabilities and declining prices of current hardware, it is increasingly easy to store and manipulate video in digital format. People now build their own digital libraries from material captured with digital cameras and camcorders, and use a number of systems to publish this material on the web. However, the systems that currently exist to organise and retrieve these videos are insufficient for dealing with such large and growing volumes of video. In particular, there is a growing need for tools and techniques that assist users in the complex task of searching for video; this is especially true online, given the continued growth of online video search systems.
Current state of the art video retrieval systems rely on textual descriptions or on methods that use low-level features (e.g., visual features such as colour, shape, or texture; audio features such as the Fourier transform or pitch; and additional features such as automatic speech recognition (ASR) or optical character recognition (OCR)) to find relevant videos within a large collection. Neither of these methods is sufficient to overcome the problems associated with video search. On the one hand, query by text relies on the availability of sufficient textual descriptions of the video and its content, making the system heavily dependent on users providing relevant text descriptions and annotations. The main drawback of this approach is that users can have very different perceptions of the same video and annotate it differently (Guy & Tonkin, 2006), which makes it difficult for different users to retrieve the same video. It has also been found that users are reluctant to provide many annotations unless there is some benefit to them (Halvey & Keane, 2007), resulting in a lack of available textual annotations. On the other hand, the difference between the low-level data representation of videos and the higher-level concepts users associate with video, commonly known as the semantic gap (Smeulders, Worring, Santini, Gupta, & Jain, 2002), makes it difficult to exploit these low-level features. Consequently, while low-level features are used in some state of the art systems, most online video retrieval systems (e.g. YouTube or Blinkx) rely only on query by text.
In order to alleviate some of the problems associated with video search we have developed ViGOR, a video retrieval system that allows users to create semantic groups of results to help conceptualise and organise their results for complex video search tasks. This interactive grouping is a flexible means for users to express multi-faceted information needs, i.e. tasks that can be decomposed into multiple specific sub-tasks, or equivalently tasks that admit multiple solutions. A specific information need corresponds to a short-term need, where the user focuses on one particular aspect of their search task. The grouping facilities thus allow the user to concentrate on specific (short-term) information needs while still working towards the overall multi-faceted (long-term) information need embodied by their search task. We believe that the semantic gap is narrowed by this abstraction to high-level semantic groupings, which reflect an individual's task-specific mental model of the data and support a more flexible interaction with the video collection; the user can therefore concentrate more on the data and less on the mechanics of their search. We also believe that the use of this system can result in a number of desirable outcomes for users: improved performance in terms of task completion and task exploration, and increased satisfaction with their search and their search results.
In addition, the interactions available in ViGOR make it an ideal system with which to integrate some recommendation techniques. We believe that many of the problems associated with searching large collections of video can be alleviated through the use of recommendation techniques. Recommendation techniques can offer a work around for the problems associated with the semantic gap and the unreliability of textual descriptions, as they utilise additional information about user interaction that is already available in many systems. However, it is also imperative that the recommendations relate to as many aspects of a user task as possible so as to ensure that the recommendations present the user with a diverse set of results that encompass as many interpretations of the user actions as possible. To that end, we have developed a recommendation approach that utilises the implicit actions involved in previous user searches to create a predictive model that can provide multi-faceted and diverse recommendations to assist users in completing their difficult search tasks. Our recommendations encompass many interpretations of user actions and numerous videos that users may not have seen using normal query methods. Providing these recommendations is not trivial, as due to the complex and difficult search process for video, implicit feedback from video search is quite noisy (Smeulders et al., 2002). However, we believe that this problem can be overcome by utilising collaborative recommendation techniques. In particular we believe that our approach of modelling many aspects of user needs via implicit user interactions can result in improved user performance in terms of task completion and reduce the user effort involved in finding relevant videos.
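The idea of building a predictive model from logged implicit actions can be illustrated with a minimal sketch. The action names and weights below are illustrative assumptions for this sketch, not the authors' actual values:

```python
# Sketch: aggregating implicit evidence about video relevance from a session
# log. Stronger actions (e.g. adding a video to a group) are assumed to carry
# more evidence than weaker ones (e.g. hovering for a tooltip).
from collections import defaultdict

# Hypothetical relevance weights per implicit action type.
ACTION_WEIGHTS = {"tooltip": 0.5, "play": 1.0, "group": 2.0}

def score_videos(session_log):
    """Sum weighted implicit evidence per video from (video_id, action) events."""
    scores = defaultdict(float)
    for video_id, action in session_log:
        scores[video_id] += ACTION_WEIGHTS.get(action, 0.0)
    return dict(scores)

log = [("v1", "play"), ("v1", "group"), ("v2", "tooltip"), ("v3", "play")]
print(score_videos(log))  # {'v1': 3.0, 'v2': 0.5, 'v3': 1.0}
```

Scores accumulated in this way across many past users are the kind of signal a collaborative approach can pool, which is one way the noisiness of any single user's implicit feedback can be averaged out.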
Before proceeding, it should be noted that ViGOR has been developed as an interface that can sit on top of any video retrieval system. The recommendation and grouping facilities provided as part of ViGOR can be created from the log files stored by almost any system or website. As such, while the recommendation and interface components are coupled in this system, they can be viewed as two distinct parts, each applicable to any video retrieval system; in this case we attempt to leverage the benefits of both. Thus, as well as addressing problems surrounding exploratory video search tasks, we have developed a scalable solution that can be deployed on top of almost any existing video retrieval system (this has been demonstrated by conducting evaluations using ViGOR in conjunction with YouTube, the largest online video storage and retrieval system; see Sections 6 and 7).
In order to test and validate the potential benefits of ViGOR in assisting with video search, we conducted two user studies in which we tested two systems: the first was ViGOR without recommendations, and the second a system based on ViGOR that provides recommendations derived from a model of implicit actions. These systems were evaluated to determine whether any benefit to users is achieved. The remainder of this paper is organised as follows: in the following section we provide a rationale for this work. Section 3 describes the two systems that were used in our study. Subsequently, in Section 4 we describe our approach for using implicit feedback to provide multi-faceted recommendations. In Section 5 we describe our experimental methodology, including our hypotheses, which is followed by the results of our experiments, presented in Section 6 (ViGOR interaction results) and Section 7 (ViGOR recommendation results). In Section 8 we provide a discussion of our work, and Section 9 provides some final conclusions and a discussion of future directions for this work.
Section snippets
Interactive video retrieval
Interactive video retrieval consists of users formulating queries and carrying out video searches, and then reformulating those queries, and thus the current results, based on previously retrieved results. As video is in essence multimodal, i.e. it consists of a variety of content types, there are a variety of methods that can be used to query a video retrieval system. One approach commonly adopted is to use the low-level features that are available in images and videos, i.e. colour, texture, and shape.
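As a concrete illustration of querying by one such low-level feature, the sketch below ranks keyframes by colour-histogram similarity to an example image. The three-bin histograms and shot names are toy data for illustration; real systems combine many richer features:

```python
# Sketch of query-by-example over a single low-level feature: normalised
# colour histograms compared with histogram intersection.

def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank_by_example(query_hist, collection):
    """Rank keyframes in `collection` (name -> histogram) by similarity."""
    return sorted(collection.items(),
                  key=lambda kv: histogram_intersection(query_hist, kv[1]),
                  reverse=True)

frames = {"shot_a": [0.5, 0.3, 0.2], "shot_b": [0.1, 0.1, 0.8]}
query = [0.6, 0.3, 0.1]
print([name for name, _ in rank_by_example(query, frames)])  # ['shot_a', 'shot_b']
```

The semantic gap discussed in the introduction is visible even here: two shots with similar colour distributions may depict entirely different concepts.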
ViGOR: a video grouping and organization interface for video retrieval
The main goal of ViGOR is to provide grouping functionalities for interactive video retrieval tasks. ViGOR (see Fig. 1) comprises a search panel (A), results display area (B), workspace (C) and playback panel (D). These facilities enable the user to both search and organise results effectively. The users enter a text-based query in the search panel to begin their search session. The result panel is where users can view the search results (a). Additional information about each video shot can
A multi-faceted graph based recommendation approach
In this section we introduce the multi-faceted recommendation approach integrated into the ViGOR system. We present two recommendation techniques. The first is a global recommendation technique, which is an extension of a previous recommendation approach (Hopfgartner et al., 2008) that takes into consideration the new interactions provided by ViGOR, and incorporates the concept of soft-links for multi-faceted and diverse recommendations. The second approach is a novel local recommendation technique.
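The general flavour of graph-based collaborative recommendation from implicit logs can be sketched as follows. The co-occurrence weighting here is an illustrative assumption, not the authors' exact model: videos acted upon in the same past sessions are linked, and candidates are ranked by their summed link weight to the videos the current user has selected:

```python
# Sketch: a weighted video co-occurrence graph built from past session logs,
# used to rank unseen videos for the current user's selection.
from collections import defaultdict
from itertools import combinations

def build_graph(sessions):
    """Edge weight = number of past sessions in which both videos appeared."""
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in combinations(set(session), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def recommend(graph, selected, k=3):
    """Rank videos outside `selected` by total link weight to the selection."""
    scores = defaultdict(int)
    for video in selected:
        for neighbour, weight in graph[video].items():
            if neighbour not in selected:
                scores[neighbour] += weight
    return sorted(scores, key=scores.get, reverse=True)[:k]

past_sessions = [["v1", "v2", "v3"], ["v1", "v2"], ["v2", "v4"]]
g = build_graph(past_sessions)
print(recommend(g, {"v1"}))  # ['v2', 'v3']
```

Because each video in the user's selection contributes its own neighbourhood, recommendations drawn from a multi-group selection naturally cover several facets of the task at once.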
Hypothesis
In order to measure the effectiveness of our proposed approach we conducted two user-centred evaluations. The two user evaluations conducted were both between-subjects evaluations that involved users carrying out broad video search tasks on YouTube. This provided us with a large and dynamic data collection, and facilitated the analysis of ViGOR in an online situation. The first evaluation compared a baseline system, which mimicked YouTube’s functionalities, with our own system ViGOR, without recommendations.
ViGOR interaction results
The first evaluation compared the performance of the ViGOR system (see Section 3.1) with a baseline system that mimicked YouTube’s functionality, which we will refer to as the YouTube Interface (YI). ViGOR offers three expansion options for each group (see Fig. 1(c)): (1) related videos; (2) videos from the same user and (3) text expansion, which is the result of a new search using text extracted from the selected videos. All of the videos returned by these expansion options are retrieved using the
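The text expansion option, for instance, can be sketched as extracting recurring terms from the metadata of the selected videos to form a new query. The stop-word list, the use of titles as the metadata source, and the term limit below are all illustrative assumptions:

```python
# Sketch: building a text-expansion query from the titles of the videos a
# user has placed in a group, favouring terms that recur across titles.
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "and"}  # assumed minimal stop list

def expansion_query(titles, max_terms=3):
    """Return the most frequent non-stop-word terms across selected titles."""
    terms = Counter()
    for title in titles:
        for word in title.lower().split():
            if word not in STOP_WORDS:
                terms[word] += 1
    return " ".join(word for word, _ in terms.most_common(max_terms))

titles = ["Tour of the Eiffel Tower",
          "Eiffel Tower at night",
          "Paris Eiffel Tower timelapse"]
print(expansion_query(titles))  # recurring terms "eiffel tower" lead the query
```

The resulting query string can then be submitted to the underlying search system like any user-typed query, which is what makes this expansion option portable across back-ends.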
ViGOR recommendation results
The second evaluation compared the ViGOR system without recommendations used in the first evaluation with a ViGOR system extended with recommendations. As explained in Section 3.2, ViGOR with recommendations shows a global panel of video recommendations, above the search result panel (see Fig. 2(E)), which proactively changed with every interaction of the user with the system, using the global recommendation approach introduced in Section 4.4. The extended version of ViGOR with recommendations
Discussion
In this section we provide a summary of our results, as well as discussion of some of the wider implications of our findings.
Conclusions and future directions
In this paper we have introduced the ViGOR system, a video search and retrieval system that allows users to create groups of video search results to help conceptualise and organise their results for complex video search tasks. It was hoped that grouping search results on the workspace would motivate the user to organise results for their search/work task. This should enable the users to break up their overall search task into a small set of individual search tasks. Although the concept of
References (31)
- et al. (1995). Task complexity affects information seeking and use. Information Processing and Management.
- et al. (2003). ImageGrouper: A group-oriented user interface for content-based image retrieval and digital image arrangement. Journal of Visual Languages & Computing.
- Bauer, T., & Leake, D. B. (2001). Real time user context modeling for information retrieval agents. In Proceedings of...
- (2003). The IIR evaluation model: A framework for evaluation of interactive information retrieval systems. Information Research.
- (2000). Interactive evaluation of the ostensive model using a new test collection of images with multiple relevance assessments. Information Retrieval.
- Christel, M., & Conescu, R. (2006). Mining novice user activity with TRECVID interactive retrieval tasks. In TRECVID...
- Christel, M. G. (2007). Establishing the utility of non-text search for news video retrieval with real world users. In...
- Craswell, N., & Szummer, M. (2007). Random walks on the click graph. In Proceedings of the 30th annual international...
- Dou, Z., Song, R., & Wen, J. (2007). A large-scale evaluation and analysis of personalized search strategies. In...
- Fass, A. M., Bier, E. A., & Adar, E. (2000). PicturePiper: Using a re-configurable pipeline to find images on the web...
- Guy, M., & Tonkin, E. (2006). Folksonomies: Tidying up tags? D-Lib Magazine.