Supporting exploratory video retrieval tasks with grouping and recommendation
Introduction
As a result of the improving capabilities and declining prices of current hardware, it is increasingly easy to store and manipulate video in digital format. People now build their own digital libraries from material captured with digital cameras and camcorders, and use a number of systems to publish this material on the web. However, the systems that currently exist to organise and retrieve these videos are insufficient for dealing with such large and growing volumes of video. In particular, there is a growing need for tools and techniques that assist users in the complex task of searching for video; this is especially true online, given the continued growth of online video search systems.
Current state of the art video retrieval systems rely on textual descriptions or on methods that use low-level features (e.g., visual features such as colour, shape, or texture; audio features such as the Fourier transform or pitch; and additional features such as automatic speech recognition (ASR) or optical character recognition (OCR)) to find relevant videos within a large collection. Neither of these methods is sufficient to overcome the problems associated with video search. On the one hand, query by text relies on the availability of sufficient textual descriptions of the video and its content, making the system heavily dependent on users providing relevant text descriptions and annotations. The main drawback of this approach is that users can have very different perceptions of the same video and annotate it differently (Guy & Tonkin, 2006), which makes it difficult for different users to retrieve the same video. It has also been found that users are reluctant to provide many annotations unless there is some benefit to them (Halvey & Keane, 2007), resulting in a lack of available textual annotations. On the other hand, the difference between the low-level data representation of videos and the higher-level concepts users associate with video, commonly known as the semantic gap (Smeulders, Worring, Santini, Gupta, & Jain, 2002), makes it difficult to exploit these low-level features. Consequently, while low-level features are used in some state of the art systems, most online video retrieval systems (e.g. YouTube or Blinkx) rely only on query by text.
In order to alleviate some of the problems associated with video search we have developed ViGOR, a video retrieval system that allows users to create semantic groups of results to help conceptualise and organise their results for complex video search tasks. This interactive grouping is a flexible means for users to express multi-faceted information needs, i.e. tasks that can be decomposed into multiple specific sub-tasks, or equivalently tasks that admit multiple solutions. A specific information need corresponds to a short-term need, where the user focuses on one particular aspect of their search task. The grouping facilities thus allow the user to concentrate on specific (short-term) information needs while still working towards the overall multi-faceted (long-term) information need embodied by their search task. We believe that the semantic gap is narrowed by this abstraction to high-level semantic groupings, which reflect an individual's task-specific mental model of the data and support a more flexible interaction with the video collection; the user can therefore concentrate more on the data and less on the mechanics of their search. We also believe that the use of this system can result in a number of desirable outcomes for users: improved performance in terms of task completion and task exploration, and increased satisfaction with their search and their search results.
In addition, the interactions available in ViGOR make it an ideal system with which to integrate some recommendation techniques. We believe that many of the problems associated with searching large collections of video can be alleviated through the use of recommendation techniques. Recommendation techniques can offer a work around for the problems associated with the semantic gap and the unreliability of textual descriptions, as they utilise additional information about user interaction that is already available in many systems. However, it is also imperative that the recommendations relate to as many aspects of a user task as possible so as to ensure that the recommendations present the user with a diverse set of results that encompass as many interpretations of the user actions as possible. To that end, we have developed a recommendation approach that utilises the implicit actions involved in previous user searches to create a predictive model that can provide multi-faceted and diverse recommendations to assist users in completing their difficult search tasks. Our recommendations encompass many interpretations of user actions and numerous videos that users may not have seen using normal query methods. Providing these recommendations is not trivial, as due to the complex and difficult search process for video, implicit feedback from video search is quite noisy (Smeulders et al., 2002). However, we believe that this problem can be overcome by utilising collaborative recommendation techniques. In particular we believe that our approach of modelling many aspects of user needs via implicit user interactions can result in improved user performance in terms of task completion and reduce the user effort involved in finding relevant videos.
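The idea of building a predictive model from logged implicit actions can be illustrated with a minimal sketch. The action names and weights below are illustrative assumptions for this sketch, not the authors' actual values:

```python
# Sketch: aggregating implicit evidence about video relevance from a session
# log. Stronger actions (e.g. adding a video to a group) are assumed to carry
# more evidence than weaker ones (e.g. hovering for a tooltip).
from collections import defaultdict

# Hypothetical relevance weights per implicit action type.
ACTION_WEIGHTS = {"tooltip": 0.5, "play": 1.0, "group": 2.0}

def score_videos(session_log):
    """Sum weighted implicit evidence per video from (video_id, action) events."""
    scores = defaultdict(float)
    for video_id, action in session_log:
        scores[video_id] += ACTION_WEIGHTS.get(action, 0.0)
    return dict(scores)

log = [("v1", "play"), ("v1", "group"), ("v2", "tooltip"), ("v3", "play")]
print(score_videos(log))  # {'v1': 3.0, 'v2': 0.5, 'v3': 1.0}
```

Scores accumulated in this way across many past users are the kind of signal a collaborative approach can pool, which is one way the noisiness of any single user's implicit feedback can be averaged out.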
Before proceeding, it should be noted that ViGOR has been developed as an interface that can sit on top of any video retrieval system. The recommendation and grouping facilities provided as part of ViGOR can be created from the log files stored by almost any system or website. As such, while the recommendation and interface components are coupled in this system, they can be viewed as two distinct parts, each applicable to any video retrieval system; in this case we attempt to leverage the benefits of both. Thus, as well as addressing problems surrounding exploratory video search tasks, we have developed a scalable solution that can be deployed on top of almost any existing video retrieval system (this has been demonstrated by conducting evaluations using ViGOR in conjunction with YouTube, the largest online video storage and retrieval system; see Sections 6 and 7).
In order to test and validate the potential benefits of ViGOR in assisting with video search, we conducted two user studies in which we tested two systems: the first was ViGOR without recommendations, and the second a system based on ViGOR that provides recommendations derived from a model of implicit actions. These systems were evaluated to determine whether any benefit to users is achieved. The remainder of this paper is organised as follows: in the following section we provide a rationale for this work. Section 3 describes the two systems that were used in our study. Subsequently, in Section 4 we describe our approach for using implicit feedback to provide multi-faceted recommendations. In Section 5 we describe our experimental methodology, including our hypotheses, which is followed by the results of our experiments, presented in Section 6 (ViGOR interaction results) and Section 7 (ViGOR recommendation results). In Section 8 we provide a discussion of our work, and Section 9 provides some final conclusions and a discussion of future directions for this work.
Section snippets
Interactive video retrieval
Interactive video retrieval consists of users formulating queries and carrying out video searches, and then reformulating those queries, and thus the current results, based on previously retrieved results. As video is in essence multimodal, i.e. it consists of a variety of content types, there are a variety of methods that can be used to query a video retrieval system. One approach commonly adopted is to use the low-level features that are available in images and videos, i.e. colour, texture, and shape.
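As a concrete illustration of querying by one such low-level feature, the sketch below ranks keyframes by colour-histogram similarity to an example image. The three-bin histograms and shot names are toy data for illustration; real systems combine many richer features:

```python
# Sketch of query-by-example over a single low-level feature: normalised
# colour histograms compared with histogram intersection.

def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank_by_example(query_hist, collection):
    """Rank keyframes in `collection` (name -> histogram) by similarity."""
    return sorted(collection.items(),
                  key=lambda kv: histogram_intersection(query_hist, kv[1]),
                  reverse=True)

frames = {"shot_a": [0.5, 0.3, 0.2], "shot_b": [0.1, 0.1, 0.8]}
query = [0.6, 0.3, 0.1]
print([name for name, _ in rank_by_example(query, frames)])  # ['shot_a', 'shot_b']
```

The semantic gap discussed in the introduction is visible even here: two shots with similar colour distributions may depict entirely different concepts.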
ViGOR: a video grouping and organization interface for video retrieval
The main goal of ViGOR is to provide grouping functionalities for interactive video retrieval tasks. ViGOR (see Fig. 1) comprises a search panel (A), results display area (B), workspace (C) and playback panel (D). These facilities enable the user to both search and organise results effectively. The users enter a text-based query in the search panel to begin their search session. The result panel is where users can view the search results (a). Additional information about each video shot can
A multi-faceted graph based recommendation approach
In this section we introduce the multi-faceted recommendation approach integrated into the ViGOR system. We present two recommendation techniques. The first is a global recommendation technique, which is an extension of a previous recommendation approach (Hopfgartner et al., 2008) that takes into consideration the new interactions provided by ViGOR, and incorporates the concept of soft-links for multi-faceted and diverse recommendations. The second approach is a novel local recommendation technique.
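The general flavour of graph-based collaborative recommendation from implicit logs can be sketched as follows. The co-occurrence weighting here is an illustrative assumption, not the authors' exact model: videos acted upon in the same past sessions are linked, and candidates are ranked by their summed link weight to the videos the current user has selected:

```python
# Sketch: a weighted video co-occurrence graph built from past session logs,
# used to rank unseen videos for the current user's selection.
from collections import defaultdict
from itertools import combinations

def build_graph(sessions):
    """Edge weight = number of past sessions in which both videos appeared."""
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in combinations(set(session), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def recommend(graph, selected, k=3):
    """Rank videos outside `selected` by total link weight to the selection."""
    scores = defaultdict(int)
    for video in selected:
        for neighbour, weight in graph[video].items():
            if neighbour not in selected:
                scores[neighbour] += weight
    return sorted(scores, key=scores.get, reverse=True)[:k]

past_sessions = [["v1", "v2", "v3"], ["v1", "v2"], ["v2", "v4"]]
g = build_graph(past_sessions)
print(recommend(g, {"v1"}))  # ['v2', 'v3']
```

Because each video in the user's selection contributes its own neighbourhood, recommendations drawn from a multi-group selection naturally cover several facets of the task at once.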
Hypothesis
In order to measure the effectiveness of our proposed approach we conducted two user-centred evaluations. The two user evaluations conducted were both between-subjects evaluations that involved users carrying out broad video search tasks on YouTube. This provided us with a large and dynamic data collection, and facilitated the analysis of ViGOR in an online situation. The first evaluation compared a baseline system, which mimicked YouTube’s functionalities, with our own system ViGOR, without recommendations.
ViGOR interaction results
The first evaluation compared the performance of the ViGOR system (see Section 3.1) with a baseline system that mimicked YouTube’s functionality, which we will refer to as the YouTube Interface (YI). ViGOR offers three expansion options for each group (see Fig. 1(c)): (1) related videos; (2) videos from the same user and (3) text expansion, which is the result of a new search using text extracted from the selected videos. All of the videos returned by these expansion options are retrieved using the
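The text expansion option, for instance, can be sketched as extracting recurring terms from the metadata of the selected videos to form a new query. The stop-word list, the use of titles as the metadata source, and the term limit below are all illustrative assumptions:

```python
# Sketch: building a text-expansion query from the titles of the videos a
# user has placed in a group, favouring terms that recur across titles.
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "and"}  # assumed minimal stop list

def expansion_query(titles, max_terms=3):
    """Return the most frequent non-stop-word terms across selected titles."""
    terms = Counter()
    for title in titles:
        for word in title.lower().split():
            if word not in STOP_WORDS:
                terms[word] += 1
    return " ".join(word for word, _ in terms.most_common(max_terms))

titles = ["Tour of the Eiffel Tower",
          "Eiffel Tower at night",
          "Paris Eiffel Tower timelapse"]
print(expansion_query(titles))  # recurring terms "eiffel tower" lead the query
```

The resulting query string can then be submitted to the underlying search system like any user-typed query, which is what makes this expansion option portable across back-ends.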
ViGOR recommendation results
The second evaluation compared the ViGOR system without recommendations used in the first evaluation with a ViGOR system extended with recommendations. As explained in Section 3.2, ViGOR with recommendations shows a global panel of video recommendations, above the search result panel (see Fig. 2(E)), which proactively changed with every interaction of the user with the system, using the global recommendation approach introduced in Section 4.4. The extended version of ViGOR with recommendations
Discussion
In this section we provide a summary of our results, as well as discussion of some of the wider implications of our findings.
Conclusions and future directions
In this paper we have introduced the ViGOR system, a video search and retrieval system that allows users to create groups of video search results to help conceptualise and organise their results for complex video search tasks. It was hoped that grouping search results on the workspace would motivate the user to organise results for their search/work task. This should enable the users to break up their overall search task into a small set of individual search tasks. Although the concept of
References (31)
- et al. (1995). Task complexity affects information seeking and use. Information Processing and Management.
- et al. (2003). ImageGrouper: A group-oriented user interface for content-based image retrieval and digital image arrangement. Journal of Visual Languages & Computing.
- Bauer, T., & Leake, D. B. (2001). Real time user context modeling for information retrieval agents. In Proceedings of...
- (2003). The IIR evaluation model: A framework for evaluation of interactive information retrieval systems. Information Research.
- (2000). Interactive evaluation of the ostensive model using a new test collection of images with multiple relevance assessments. Information Retrieval.
- Christel, M., & Conescu, R. (2006). Mining novice user activity with TRECVID interactive retrieval tasks. In TRECVID...
- Christel, M. G. (2007). Establishing the utility of non-text search for news video retrieval with real world users. In...
- Craswell, N., & Szummer, M. (2007). Random walks on the click graph. In Proceedings of the 30th annual international...
- Dou, Z., Song, R., & Wen, J. (2007). A large-scale evaluation and analysis of personalized search strategies. In...
- Fass, A. M., Bier, E. A., & Adar, E. (2000). PicturePiper: Using a re-configurable pipeline to find images on the web...
- Guy, M., & Tonkin, E. (2006). Folksonomies: Tidying up tags? D-Lib Magazine.