Abstract
We present an approach that takes multiple videos captured by social cameras---cameras that are carried or worn by members of the group involved in an activity---and produces a coherent "cut" video of the activity. Footage from social cameras contains an intimate, personalized view that reflects the part of an event that was of importance to the camera operator (or wearer). We leverage the insight that social cameras share the focus of attention of the people carrying them. We use this insight to determine where the important "content" in a scene is taking place, and use it in conjunction with cinematographic guidelines to select which cameras to cut to and to determine the timing of those cuts. A trellis graph representation is used to optimize an objective function that maximizes coverage of the important content in the scene, while respecting cinematographic guidelines such as the 180-degree rule and avoiding jump cuts. We demonstrate cuts of the videos in various styles and lengths for a number of scenarios, including sports games, street performances, family activities, and social get-togethers. We evaluate our results through an in-depth analysis of the cuts in the resulting videos and through comparison with videos produced by a professional editor and existing commercial solutions.
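The trellis-graph optimization described above can be sketched as a dynamic program over (time, camera) nodes. This is an illustrative sketch, not the paper's implementation: `content` is a hypothetical per-frame, per-camera score for how well each camera covers the group's joint focus of attention, and `cut_cost` is a stand-in for the cinematographic penalties (jump cuts, 180-degree violations).

```python
# Sketch (assumed interface, not the paper's code): choose the camera to
# show at each time step as the best path through a trellis whose nodes
# are (time, camera) pairs. Node rewards come from `content`; edges that
# switch cameras pay `cut_cost`.

def best_cut(content, cut_cost):
    """content: list over time of per-camera coverage scores.
    cut_cost(a, b): penalty for cutting from camera a to camera b.
    Returns the camera index to show at each time step."""
    n_cams = len(content[0])
    score = list(content[0])   # score[c]: best objective ending at camera c
    back = []                  # back-pointers for traceback
    for frame in content[1:]:
        prev, step, score = score[:], [], []
        for c in range(n_cams):
            # staying on the same camera is free; switching pays cut_cost
            best_p = max(range(n_cams),
                         key=lambda p: prev[p] - (cut_cost(p, c) if p != c else 0.0))
            score.append(prev[best_p]
                         - (cut_cost(best_p, c) if best_p != c else 0.0)
                         + frame[c])
            step.append(best_p)
        back.append(step)
    # trace back the optimal camera sequence from the best final node
    c = max(range(n_cams), key=lambda i: score[i])
    path = [c]
    for step in reversed(back):
        c = step[c]
        path.append(c)
    return path[::-1]
```

With a large `cut_cost` the path stays on one camera (a longer, steadier cut); with a small one it tracks the highest-scoring camera frame by frame, which is one way the trade-off between coverage and cut frequency can be tuned.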
Index Terms
- Automatic editing of footage from multiple social cameras