2002 | OriginalPaper | Chapter
Scene Determination Using Auditive Segmentation Models of Edited Video
Authors : Silvia Pfeiffer, Uma Srinivasan
Published in: Media Computing
Publisher: Springer US
Included in: Professional Book Archive
This chapter describes different approaches that use audio features to determine scenes in edited video. It focuses on analyzing the sound track of videos to extract higher-level video structure. We define a scene in a video as a temporal interval that is semantically coherent. The semantic coherence of a scene is often constructed during cinematic editing of a video; an example is the use of music to concatenate several shots into a scene depicting a lengthy passage of time, such as the journey of a character. Some semantic coherence is also inherent in the unedited video material, such as the sound ambience of a specific setting or the change pattern of speakers in a dialog. Another kind of semantic coherence is constructed from the textual content of the sound track, revealing, for example, the different stories contained in a news broadcast or documentary. This chapter explains the types of scenes that can be constructed via audio cues from a film-art perspective, discusses the feasibility of automatically extracting these scene types, and finally presents a survey of existing approaches.
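To make the ambience-based coherence idea concrete, the following is a minimal sketch, not taken from the chapter, of how adjacent shots might be merged into scenes by comparing simple audio features. The feature choice (RMS energy and zero-crossing rate), the cosine-similarity test, and the threshold value are all illustrative assumptions; published systems typically use richer descriptors such as MFCCs.

```python
import numpy as np

def audio_features(samples: np.ndarray) -> np.ndarray:
    """Crude ambience descriptor: RMS energy and zero-crossing rate.
    (Illustrative only; real systems use richer features, e.g. MFCCs.)"""
    rms = np.sqrt(np.mean(samples ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(samples)))) / 2.0
    return np.array([rms, zcr])

def merge_shots_into_scenes(shot_audio: list[np.ndarray],
                            threshold: float = 0.9) -> list[list[int]]:
    """Merge consecutive shots whose audio ambience is similar,
    i.e. cosine similarity of feature vectors is above `threshold`."""
    feats = [audio_features(a) for a in shot_audio]
    scenes: list[list[int]] = [[0]]
    for i in range(1, len(feats)):
        a, b = feats[i - 1], feats[i]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
        if sim >= threshold:
            scenes[-1].append(i)   # similar ambience: extend current scene
        else:
            scenes.append([i])     # ambience change: start a new scene
    return scenes

# Toy usage: three "quiet hum" shots followed by two "noisy crowd" shots.
rng = np.random.default_rng(0)
quiet = [0.01 * rng.standard_normal(16000) for _ in range(3)]
noisy = [0.5 * rng.standard_normal(16000) for _ in range(2)]
print(merge_shots_into_scenes(quiet + noisy))  # e.g. [[0, 1, 2], [3, 4]]
```

The design point the sketch illustrates is that ambience-based scene grouping needs only a per-shot audio summary and a local similarity test between neighbors; speaker-change and music-based cues would plug in as additional features or separate merging rules.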