
About this book

Multimedia '99 covers technological and scientific areas of media production, processing and delivery. 24 contributions from research laboratories and universities worldwide give a broad perspective on multimedia research with a special focus on media convergence. The topics treated in this volume: image and sound content analysis and processing, paradigms and metaphors for multimedia authoring and display, applications such as education or entertainment, and multimedia content authentication and security.



Keynote Address

Shaping Creative Research in New Media (Keynote Address)

Improvisation - the drive to make something out of something else - is fundamental to the human experience. All traditional objects of expression - writing, filmmaking, musical expression, theater, and even human conversation - present some trace of the process through which they came into existence. The very vitality of the expression is intimately connected with its passage; inspiration touched by structure, transformed and realized in a palpable entity. This trace reveals the human bricoleur who struggles or plays, attempts control until resigned, accepts the fragile state and relinquishes the entity out to the world. This talk presents research case studies to explore improvisation and bricolage as forces which can help grow context while driving researchers to unexpected but powerful new explorations.
Glorianna Davenport

Technical Papers

Content Analysis and Processing

Replay Detection in Sports Video Sequences

In many sports, the majority of highlights are confined to relatively short durations of intense action. In some sense these segments capture the essence of a game and summarize the moments of important action. Automatic detection of these highlights could provide an important browsing mechanism in a video library of sports games. In this paper, we present efficient algorithms, operating in the MPEG domain, for detecting two kinds of replay from sports video sequences. These replays often correspond to highlights in a game and can be used as indices of a sports video. The first algorithm detects exact replays while the second algorithm detects slow motion replays. Both algorithms operate directly on MPEG-1/MPEG-2 video data and thus they are very efficient because the expensive decoding operation is unnecessary. Experimental results on several video sequences show that the proposed algorithms are effective in detecting replays in most sports video sequences.
Lifang Gu, Don Bone, Graham Reynolds

Segmentation of Video Sequences using Volumetric Image Processing

This paper introduces an algorithm to detect cuts in a video sequence. The algorithm is based on volumetric processing of the video sequence, and uses techniques from differential geometry to classify the cut frames. The algorithm is very robust with respect to the presence of noise, and avoids the detection of “false cuts” in a video shot. We also describe an implementation of the algorithm that computes the video cut segmentation in real time.
Romildo José da Silva, Jonas Gomes, Luiz Velho

Visual Speech Analysis for Spoken Chinese Training of Oral Deaf Children

This paper presents STODE, a novel vision-based speech analysis system used in spoken Chinese training of oral deaf children. Its design goal is to help oral deaf children overcome two major difficulties in speech learning: the confusion of intonations for spoken Chinese characters and timing errors within different words and characters. It integrates real-time lip tracking and feature extraction, multi-state lip modeling, and a Time-delay Neural Network (TDNN) for visual speech analysis. A desk-mounted camera tracks users in real time. At each frame, a region of interest is identified and key information is extracted. The preprocessed acoustic and visual information is then fed into a modular TDNN and combined for visual speech analysis. Confusion of intonations for spoken Chinese characters can be easily identified, and timing errors within words and characters can be detected using a DTW (Dynamic Time Warping) algorithm. For visual feedback we have created an artificial talking head, cloned directly from the user’s own images, that shows both correct and wrong ways of pronunciation. This system has been successfully used for spoken Chinese training of oral deaf children in cooperation with Nanjing Oral School under grants from the National Natural Science Foundation of China.
Xiaodong Jiang, Qianghua Qiang, Zhisong Zhou, Yunlai Wang
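
The DTW step mentioned in the STODE abstract above can be illustrated with a minimal textbook sketch. It is not the authors' implementation: the function name `dtw_distance`, the use of one scalar feature per frame, and the absolute-difference local cost are all assumptions made here for illustration, since the abstract does not specify the features or the DTW variant.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    Assumption: each element is a single scalar feature per frame and the
    local cost is the absolute difference. A real system would likely use
    feature vectors and a different local distance.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

A sequence that is merely stretched in time aligns at zero cost, so a non-zero result points to a difference in shape rather than in tempo; recovering where the timing differs would additionally require backtracking through the cost table, which this sketch omits.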

Content Based Retrieval and Security

Extracting Bimodal Representations for Language-Based Image Retrieval

This paper explores two approaches to multimedia indexing that might contribute to the advancement of text-based conceptual search for pictorial information. Insights from relatively mature retrieval areas (spoken document retrieval and cross-language retrieval) are taken as a starting point for an investigation of the usefulness of the concept of bimodal dictionaries and of clustering features from multi-modal documents into one semantic space. One of the advantages of the presented techniques is that they are domain independent.
Thijs Westerveld, Djoerd Hiemstra, Franciska de Jong

A Dissimilarity Measure for Query by Example Retrieval

In this paper we present a method for computing the dissimilarity between a digital image and a sketch produced with a simple paint tool. Images are roughly segmented and region information (size, position, color and shape) is extracted. Multiple segmentation results are then obtained by iteratively merging pairs of regions. The dissimilarity is computed according to the region information. Experiments show that this dissimilarity measure can be used to perform similarity searches.
Folco Banfi, Rolf Ingold

Content-based Video Retrieval by Example Clip on WWW

The similarity measure of two video clips is a key issue in video retrieval. In the development of our WWW-oriented video retrieval system, we propose a new model of video similarity. Compared to existing algorithms, it introduces many influencing factors, such as an order factor, a speed factor and a disturbance factor, on the basis of human subjectivity in visual judgement. The algorithm thus captures the degree of similarity completely and systematically. It also adapts to resolution, because it can be applied to every level of the video structure. In the retrieval system, it is used to process video queries by example clip on the World Wide Web. This paper introduces the algorithm in detail and presents experimental results.
Xiaoming Liu, Yueting Zhuang, Yunhe Pan

Digital Sound Watermarks Based on Improved Sinusoidal Analysis/Synthesis Model

In this paper, a simple and efficient algorithm for designing digital sound watermarks is presented. The main idea is to hide the secret information in the amplitude of a certain frequency under 20 Hz. This approach improves the sinusoidal analysis/synthesis model. It can improve sound watermark detection even after the signal has been compressed with MPEG Layer-3. The experimental results illustrate the efficiency of our algorithm.
Zhe Song, Zhigeng Pan, Jiaoying Shi
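
The core idea in the watermarking abstract above, carrying information in the amplitude of a tone below 20 Hz, can be shown with a toy sketch. This is not the authors' improved sinusoidal analysis/synthesis model and says nothing about robustness to MPEG Layer-3 compression. The sample rate, the 15 Hz carrier, the one-second frame length, the two amplitude levels and the function names `embed`, `tone_amplitude` and `extract` are all assumptions chosen for illustration.

```python
import math

RATE = 8000                 # samples per second (assumed)
FRAME = RATE                # one-second frames: 15 Hz completes whole cycles
F_MARK = 15.0               # carrier frequency below 20 Hz (assumed value)
AMP = {0: 0.01, 1: 0.03}    # carrier amplitude used for each bit (assumed)


def embed(host, bits):
    """Add a low-frequency tone to each frame; its amplitude encodes one bit."""
    out = list(host)
    for k, bit in enumerate(bits):
        for n in range(FRAME):
            out[k * FRAME + n] += AMP[bit] * math.sin(
                2 * math.pi * F_MARK * n / RATE)
    return out


def tone_amplitude(frame):
    """Estimate the amplitude at F_MARK by projecting onto cos and sin."""
    re = sum(x * math.cos(2 * math.pi * F_MARK * n / RATE)
             for n, x in enumerate(frame))
    im = sum(x * math.sin(2 * math.pi * F_MARK * n / RATE)
             for n, x in enumerate(frame))
    return 2.0 * math.hypot(re, im) / len(frame)


def extract(signal, n_bits):
    """Recover bits by thresholding the estimated carrier amplitude per frame."""
    threshold = (AMP[0] + AMP[1]) / 2
    return [
        1 if tone_amplitude(signal[k * FRAME:(k + 1) * FRAME]) > threshold else 0
        for k in range(n_bits)
    ]
```

The one-second frame is chosen so that the carrier completes a whole number of cycles per frame, which keeps the projection in `tone_amplitude` free of leakage from host content at other whole-hertz frequencies. Real audio already contains some energy near the carrier, so a practical detector would need a more careful decision rule than a fixed threshold.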

Distributed Multimedia Systems

Video on Demand Servers: Storage Architecture with Disk Arrays

The storage organisation of Video on Demand (VOD) servers is an important issue in the design of VOD systems. VOD servers are responsible for storing videos on disks and feeding them to users on request. The strategy used for storing movies on multiple disks affects the overall server performance, especially when a large number of concurrent streams are supported. The existing phase based striping method used in VOD servers tends to leave areas of the disks unutilised. This spare area becomes more significant as the number of disks in the server grows. In this paper we propose the Spare Phase Based Striping Method to utilise the spare area and further reduce the waiting time. In this method, we group together the spare disk space of each individual disk into one large single spare area and use it to duplicate the most popular movies. We also present the Half Reverse Phase Based Method for use as a data layout in the spare area. This method guarantees a maximum waiting time for popular movies, reducing RAM dependency in comparison with the Replication Scheme and improving the average waiting time. We evaluate our scheme by carrying out a simulation study to compare it with other schemes, particularly with regard to the usage of RAM and the average waiting time.
Putra Sumari, Madjid Merabti, Rubem Pereira

Object Carousel Simulator for Broadcast Applications

This paper describes an object carousel simulator developed for the testing of broadcast applications. It gives a short overview of the specific characteristics of a broadcast environment and describes tools developed to simulate them. Results and experiences from using a sample application, originally developed for end-user trials in the ACTS project IMMP, in the simulated broadcast environment conclude the paper.
Christian Fuhrhop, Andreas Kraft, Ralf Kubis

MMFramework - Distributed Environment for Control and Management of Multimedia Data

This paper presents an open multimedia framework, called MMFramework. It may be used for dynamic creation of various multimedia applications. The proposed framework architecture consists of six layers. Its definition results from decomposition of the system into components with well-defined interfaces and internal implementation dedicated to the given hardware usage. Each layer represents a collection of components which are characterized by similar functionality. The system has been designed with support of existing CORBA Services and OMG specifications related to multimedia.
Krzysztof Zieliński, Lukasz Czekierda, Piotr Nawrocki, Mariusz Michalski

Object Models and Multimedia

Manipulating 3D character animation media objects

Multimedia authoring tools enable the production of presentations that integrate text, video, audio, and images. To facilitate the creation of character animations that include other media such as speech and sound, a new authoring system is defined that is able to generate a fully animated scene from a high level description supported by an extended version of the score paradigm. Introducing 3D animated data as a new kind of media object requires changes to both the timeline paradigm and to the traditional system architecture. Issues related to editing and to navigating inside the presentation are discussed.
Daniela Giorgetti, Patrizia Palamidese

Realization of an Extensible Multimedia Document Model

Most multimedia systems or standards provide one fixed concept for the temporal and spatial specification of documents. But different application areas - like computer based training or multimedia games - require different (application specific) concepts to support and simplify authoring. It is a very complex or even impossible task to specify documents for different application areas based on the same document model. This paper presents a document model, an approach to simplify authoring of documents in different application areas and a framework that supports the realization of the introduced concepts. The framework uses an event based approach and a scheduling graph to realize the semantics of an application concept.
Jürgen Hauser

Object Coherence in Distributed Interaction

Realtime distributed groupware (RDG) systems use computers and networks to support geographically separated users in their work on a common task at the same time. An RDG session typically consists of two types of interaction — that which involves one or more shared objects, and the use of audio-visual conferencing. It is important for users of such systems that the responsiveness of their interaction with shared objects has the same immediacy as their audio-visual communications. At the same time shared objects must have their integrity guarded in the face of multiple concurrent readers and writers. These coherence and responsiveness requirements often pull implementations in opposing directions. The concept of interactive atoms is introduced to address this problem. The successful construction of a distributed shared multi-user spreadsheet using multicast-based interactive atoms and multicast-based audio-visual channels is described.
Colin Allison, Feng Huang, Mike Livesey

Authoring Paradigms and Media Integration

Integration of structured video in a multimedia authoring system

In this paper, we integrate some results on automatic structuration of video in a multimedia authoring system. For that purpose, we identify the specific video structuration requirements of this kind of application and we propose an XML syntax for the description of the structure, the temporal and spatial organization of video. Finally we describe a prototype application that takes advantage of this structured video format.
Cécile Roisin, Tien Tran_Thuong, Lionel Villard

Head-Tracked Light Field Movies: A New Multimedia Data Type

This paper presents a new multimedia data type based on time-varying light fields. Such a light field movie may be precomputed using computer graphics, or photographically captured from real objects as they change over time. Such movies allow a user to experience motion parallax in response to head motion in addition to motion cues within the scene as well as stereo vision cues. Important elements to make such media practical include light field movie capture devices, effective compression schemes with real-time playback, and accurate head tracking. This paper addresses these areas and presents results from the current system as well as directions for future work.
Gavin S. P. Miller, Steven M. Rubin, Philip M. Hubbard, Jenny Dana, John Woodfill

Hypermedia and Web

Visually Critiquing Web Pages

This paper explores how visual information is organized in a web page. A cognitive framework is outlined for how web pages are processed into a visual hierarchy. The guidelines derived are implemented as rules in a critiquing system and embedded in a web editing tool, allowing the user to spot potential design problems. The paper concludes with a critique of two well known web pages.
Pete Faraday

ANTS: An Automatic Navigability Testing Tool for Hypermedia

Multiple navigational graphs can be obtained as the result of the design stage of a hypermedia-based artifact. The only way to know which one best fits the user's navigational metaphor is by means of usability testing. This technique is expensive in terms of the human resources needed to perform it, and it cannot record spontaneous user behavior. Automatic testing tools are an interesting and cheap alternative that avoids these problems. We tested our own automatic navigability testing system (ANTS) by conducting an experiment to determine where users expect to find the navigational bar of a web site.
Martín González Rodríguez

Education and Entertainment

Educational Applications of WWW-Based Asynchronous Video

In this paper we briefly inventory some of the traditional educational applications of video, and indicate how the integration of asynchronous video within WWW environments makes those applications more efficient and flexible. Second, we indicate a variety of ways in which the integration of asynchronous video and WWW environments introduces new forms of learning possibilities not available to either medium (video or WWW environments) in isolation. Third, we illustrate these new forms of learning possibilities via the example of a course recently (March-May 1999) completed at the University of Twente.
Betty Collis, Oscar Peters

EcoVasco, an Ecological Multimedia Adventure

The development of a multimedia computer game for young people is a complex task. It should take into consideration technical and human factors, applied to the development team and to the target audience. Topics such as the development platform, the background and experience of the development team, and outsourced material to be integrated are important elements to consider. Also to be considered are the end platform and the content and technical quality expectations of the target audience. The challenge is harder when the target audience varies widely in age, implying different computer game interests. The paper describes the approaches we followed to achieve the original objectives.
Pedro Faria Lopes, Maria Vasconcelos Moreira, Miguel Lapa Duarte

Involving the Audience in Distributed, Interactive Entertainment Systems

One of the main criteria for judging the effectiveness of Web-based, interactive entertainment forms such as interactive stories or games is their ability to involve large numbers of participants during their performance. This paper describes a novel, distributed interaction framework for audience involvement in interactive entertainment systems. This framework allows multiple users on the Web to participate in the performance of the event either as players or spectators. The method supports a centralized communication scheme in which a server module is responsible for relaying all the messages generated by the participants. In addition, the server monitors the game/story action along with the audience response and sends messages of its own in an effort to emphasize the dramatic nature of the performance and stimulate audience participation. This method has been applied in the creation of MISSION, a multi-player game on the Web.
Nikitas M. Sgouros

Multi-Media Access and Presentation in a Theatre Information Environment

This paper discusses a virtual world for presenting multimedia information and for natural interactions with the environment to get access to this information. Apart from mouse and keyboard input, interactions take place using speech and language. It is shown how this virtual environment can be considered as an interest community and it is shown what further research and development is required to obtain an environment where visitors can retrieve information about artists, authors and performances, can discuss performances with others and can be provided with information and contacts in accordance with their preferences.
Anton Nijholt


Hypermedia and Web

Representing contemporary architecture using hypermedia “Eight Italian architectures in the postwar period”

The Mac and Windows hybrid CD-ROM “Eight Italian architectures in the postwar period” took its cue from a fortuitous event: in 1995, the Solomon R. Guggenheim Museum donated 8 scale models developed for the Architecture section of the exhibition «The Italian Metamorphosis 1943–1968», held in New York in 1993, to the MusArc - Museo Nazionale dell’architettura located in Ferrara’s ancient Rossetti Mansion. The donation of the models appeared to be the ideal subject matter to shape a virtual environment that, while supplying an appropriate framework for the donation, could fully partake in the overall cultural project around which MusArc is being structured. Prepared with the involvement of some 20 experts, the hypertext is the first large effort in Europe to represent and analyze contemporary architecture using digital methods and is intended to be more than a scientific research document. It aims to be readily understood by the non-expert user, eliminating the difference between works for popular dissemination, historical reviews and critical analyses that is a constant feature of hardcopy publishing. It is mainly conceived with two aims: to be an accompaniment to material on display in the Museum and to be available to the public. That is why the technical solution adopted was a multiplatform CD-ROM.
Marco Gaiani, Corrado Loschi, Marco Luitprandi, Stefano Zagnoni, Michele Zannoni

ETNIAS: Multimedia Performance Involving Music, Dance and Images

ETNIAS is a multimedia performance describing a metaphoric travel through a set of sound-images. Based on the concept of soundscapes, this paper elucidates how a compositional process was developed to create this artwork.
Jônatas Manzolli

