ABSTRACT
Videos are an increasingly utilized part of the experience of the billions of people that use Facebook. These videos must be uploaded and processed before they can be shared and downloaded. Uploading and processing videos at our scale, and across our many applications, brings three key requirements: low latency to support interactive applications; a flexible programming model for application developers that is simple to program, enables efficient processing, and improves reliability; and robustness to faults and overload. This paper describes the evolution from our initial monolithic encoding script (MES) system to our current Streaming Video Engine (SVE) that overcomes each of the challenges. SVE has been in production since the fall of 2015, provides lower latency than MES, supports many diverse video applications, and has proven to be reliable despite faults and overload.
Supplemental Material
- Anne Aaron and David Ronca. 2015. High Quality Video Encoding at Scale. http://techblog.netflix.com/2015/12/high-quality-video-encoding-at-scale.html. (2015).Google Scholar
- Daniel J Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, et al. 2005. The Design of the Borealis Stream Processing Engine. In Proceedings of the 2005 Conference on Innovative Data Systems Research.Google Scholar
- Daniel J Abadi, Don Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. 2003. Aurora: a new model and architecture for data stream management. The VLDB Journal--The International Journal on Very Large Data Bases 12, 2 (2003), 120--139. Google ScholarDigital Library
- Tyler Akidau, Alex Balikov, Kaya Bekiroğlu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle. 2013. MillWheel: Fault-Tolerant Stream Processing at Internet Scale. Proceedings of the VLDB Endowment 6, 11 (2013), 1033--1044. Google ScholarDigital Library
- Apache Storm 2017. http://storm.apache.org/. (2017).Google Scholar
- Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. 2010. Finding a Needle in Haystack: Facebook's Photo Storage. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, and Venkat Venkataramani. 2013. TAO: Facebook's Distributed Data Store for the Social Graph. In Proceedings of the 2013 USENIX Annual Technical Conference. USENIX. Google ScholarDigital Library
- Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J Franklin, Joseph M Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel R Madden, Fred Reiss, and Mehul A Shah. 2003. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM. Google ScholarDigital Library
- Tyson Condie, Neil Conway, Peter Alvaro, Joseph M Hellerstein, Khaled Elmeleegy, and Russell Sears. 2010. MapReduce Online. In Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation. USENIX. Google ScholarDigital Library
- HipHop Virtual Machine 2017. http://hhvm.com/. (2017).Google Scholar
- Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C. Li. 2013. An Analysis of Facebook Photo Caching. In Proceedings of the 24th ACM Symposium on Operating System Principles. ACM. Google ScholarDigital Library
- Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In Proceedings of the 2nd ACM SIGOPS European Conference on Computer Systems. ACM. Google ScholarDigital Library
- Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthikeyan Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM. Google ScholarDigital Library
- Wei Lin, Zhengping Qian, Junwei Xu, Sen Yang, Jingren Zhou, and Lidong Zhou. 2016. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams. In Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Yao-Chung Lin, Hugh Denman, and Anil Kokaram. 2015. Multipass Encoding for Reducing Pulsing Artifacts in Cloud Based Video Transcoding. In Proceedings of the 2015 IEEE International Conference on Image Processing. IEEE.Google ScholarCross Ref
- Rajeev Motwani, Jennifer Widom, Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Gurmeet Manku, Chris Olston, Justin Rosenstein, and Rohit Varma. 2003. Query Processing, Resource Management, and Approximation in a Data Stream Management System. In Proceedings of the 2003 Conference on Innovative Data Systems Research.Google Scholar
- Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, et al. 2014. f4: Facebook's Warm BLOB Storage System. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Derek Gordon Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. 2013. Naiad: A Timely Dataflow System. In Proceedings of the 24th ACM Symposium on Operating System Principles. ACM. Google ScholarDigital Library
- Derek G Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy, and Steven Hand. 2011. CIEL: a universal execution engine for distributed data-flow computing. In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Leonardo Neumeyer, Bruce Robbins, Anish Nair, and Anand Kesari. 2010. S4: Distributed Stream Computing Platform. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops. IEEE. Google ScholarDigital Library
- Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn. 2006. Rethink the Sync. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Edmund B Nightingale, Kaushik Veeraraghavan, Peter M Chen, and Jason Flinn. 2008. Rethink the Sync. ACM Transactions on Computer Systems (TOCS) 26, 3 (2008), 6. Google ScholarDigital Library
- Russell Power and Jinyang Li. 2010. Piccolo: Building Fast, Distributed Programs with Partitioned Tables. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Ariel Rabkin, Matvey Arye, Siddhartha Sen, Vivek S Pai, and Michael J Freedman. 2014. Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Vijay Rao and Edwin Smith. 2016. Facebook's new front-end server design delivers on performance without sucking up power. https://code.facebook.com/posts/1711485769063510. (2016).Google Scholar
- Alyson Shontell. 2015. Facebook is now generating 8 billion video views per day from just 500 million people âĂŤ here's how that's possible. https://tinyurl.com/yc3jhxuu. (2015).Google Scholar
- Linpeng Tang, Qi Huang, Wyatt Lloyd, Sanjeev Kumar, and Kai Li. 2015. RIPQ: Advanced Photo Caching on Flash for Facebook. In Proceedings of the 13th USENIX Conference on File and Storage Technologies. USENIX. Google ScholarDigital Library
- Linpeng Tang, Qi Huang, Amit Puntambekar, Ymir Vigfusson, Wyatt Lloyd, and Kai Li. 2017. Popularity Prediction of Facebook Videos for Higher Quality Streaming. In Proceedings of the 2017 USENIX Annual Technical Conference. USENIX. Google ScholarDigital Library
- The Hack Programming Language 2017. http://hacklang.org/. (2017).Google Scholar
- Kaushik Veeraraghavan, Justin Meza, David Chou, Wonho Kim, Sonia Margulis, Scott Michelson, Rajesh Nishtalaand Daniel Obenshain, Dmitri Perelman, and Yee Jiun Song. 2016. Kraken: Leveraging Live Traffic Tests to Identify and Resolve Resource Utilization Bottlenecks in Large Scale Web Services. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Rick Wong, Zhan Chen, Anne Aaron, Megha Manohara, and Darrell Denlinger. 2016. Chelsea: Encoding in the Fast Lane. http://techblog.netflix.com/2016/07/chelsea-encoding-in-fast-lane.html. (2016).Google Scholar
- Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. 2008. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation. USENIX. Google ScholarDigital Library
- Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized Streams: Fault-Tolerant Streaming Computation at Scale. In Proceedings of the 24th ACM Symposium on Operating System Principles. ACM. Google ScholarDigital Library
- Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance.. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation. USENIX. Google ScholarDigital Library
Recommendations
An Efficient Hybrid Peer-to-Peer System for Distributed Data Sharing
Peer-to-peer overlay networks are widely used in distributed systems. Based on whether a regular topology is maintained among peers, peer-to-peer networks can be divided into two categories: structured peer-to-peer networks in which peers are connected ...
Performance analysis of structured peer-to-peer overlays for mobile networks
Distributed Hash Table DHT based Peer-to-Peer P2P overlays have been widely researched and deployed in many applications such as file sharing, IP telephony, content distribution and media streaming applications. However, their deployment has largely ...
Comments