Abstract
We present an approach that takes multiple videos captured by social cameras---cameras that are carried or worn by members of the group involved in an activity---and produces a coherent "cut" video of the activity. Footage from social cameras contains an intimate, personalized view that reflects the part of an event that was of importance to the camera operator (or wearer). We leverage the insight that social cameras share the focus of attention of the people carrying them. We use this insight to determine where the important "content" in a scene is taking place, and use it in conjunction with cinematographic guidelines to select which cameras to cut to and to determine the timing of those cuts. A trellis graph representation is used to optimize an objective function that maximizes coverage of the important content in the scene, while respecting cinematographic guidelines such as the 180-degree rule and avoiding jump cuts. We demonstrate cuts of the videos in various styles and lengths for a number of scenarios, including sports games, street performances, family activities, and social get-togethers. We evaluate our results through an in-depth analysis of the cuts in the resulting videos and through comparison with videos produced by a professional editor and existing commercial solutions.
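The trellis-graph optimization described above can be sketched as a dynamic program over (time, camera) nodes. This is an illustrative sketch, not the paper's implementation: `content` is a hypothetical per-frame, per-camera score for how well each camera covers the group's joint focus of attention, and `cut_cost` is a stand-in for the cinematographic penalties (jump cuts, 180-degree violations).

```python
# Sketch (assumed interface, not the paper's code): choose the camera to
# show at each time step as the best path through a trellis whose nodes
# are (time, camera) pairs. Node rewards come from `content`; edges that
# switch cameras pay `cut_cost`.

def best_cut(content, cut_cost):
    """content: list over time of per-camera coverage scores.
    cut_cost(a, b): penalty for cutting from camera a to camera b.
    Returns the camera index to show at each time step."""
    n_cams = len(content[0])
    score = list(content[0])   # score[c]: best objective ending at camera c
    back = []                  # back-pointers for traceback
    for frame in content[1:]:
        prev, step, score = score[:], [], []
        for c in range(n_cams):
            # staying on the same camera is free; switching pays cut_cost
            best_p = max(range(n_cams),
                         key=lambda p: prev[p] - (cut_cost(p, c) if p != c else 0.0))
            score.append(prev[best_p]
                         - (cut_cost(best_p, c) if best_p != c else 0.0)
                         + frame[c])
            step.append(best_p)
        back.append(step)
    # trace back the optimal camera sequence from the best final node
    c = max(range(n_cams), key=lambda i: score[i])
    path = [c]
    for step in reversed(back):
        c = step[c]
        path.append(c)
    return path[::-1]
```

With a large `cut_cost` the path stays on one camera (a longer, steadier cut); with a small one it tracks the highest-scoring camera frame by frame, which is one way the trade-off between coverage and cut frequency can be tuned.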
Index Terms
- Automatic editing of footage from multiple social cameras