research-article

Characterizing Web-Based Video Sharing Workloads

Published: 01 May 2011

Abstract

Video sharing services that allow ordinary Web users to upload video clips of their choice and watch video clips uploaded by others have recently become very popular. This article identifies invariants in video sharing workloads, through comparison of the workload characteristics of four popular video sharing services. Our traces contain metadata on approximately 1.8 million videos which together have been viewed approximately 6 billion times. Using these traces, we study the similarities and differences in use of several Web 2.0 features such as ratings, comments, favorites, and propensity of uploading content. In general, we find that active contribution, such as video uploading and rating of videos, is much less prevalent than passive use. While uploaders in general are skewed with respect to the number of videos they upload, the fraction of multi-time uploaders is found to differ by a factor of two between two of the sites. The distributions of lifetime measures of video popularity are found to have heavy-tailed forms that are similar across the four sites. Finally, we consider implications for system design of the identified invariants. To gain further insight into caching in video sharing systems, and the relevance to caching of lifetime popularity measures, we gathered an additional dataset tracking views to a set of approximately 1.3 million videos from one of the services, over a twelve-week period. We find that lifetime popularity measures have some relevance for large cache (hot set) sizes (i.e., a hot set defined according to one of these measures is indeed relatively “hot”), but that this relevance substantially decreases as cache size decreases, owing to churn in video popularity.
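The caching observation can be illustrated with a small sketch. It assumes two hypothetical per-video view counts, `lifetime_views` and `period_views`; these names and the toy numbers are illustrative and do not come from the article or its datasets. The hot set is taken to be the `cache_size` videos with the most lifetime views, and the function reports the fraction of views in a later period that fall on that set.

```python
def hot_set_hit_ratio(lifetime_views, period_views, cache_size):
    """Fraction of period views that go to the cache_size videos
    with the highest lifetime view counts (the assumed hot set)."""
    if cache_size < 0:
        raise ValueError("cache_size must be non-negative")
    # Rank video ids by lifetime popularity, most viewed first.
    ranked = sorted(lifetime_views, key=lifetime_views.get, reverse=True)
    hot_set = set(ranked[:cache_size])
    total = sum(period_views.values())
    if total == 0:
        return 0.0
    # Count only the period views that land on videos in the hot set.
    hits = sum(count for vid, count in period_views.items() if vid in hot_set)
    return hits / total


# Made-up toy data: video "a" was popular over its lifetime but is
# rarely viewed in the current period (popularity churn).
lifetime_views = {"a": 1000, "b": 800, "c": 50, "d": 10}
period_views = {"a": 5, "b": 40, "c": 45, "d": 10}

for size in (1, 2, 4):
    print(size, hot_set_hit_ratio(lifetime_views, period_views, size))
```

In this toy case a hot set of one video captures few of the period's views, while larger hot sets capture most of them. The numbers are chosen only to mirror the qualitative trend described in the abstract.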



          Reviews

          Amos O Olagunju

          Web-based video distribution services are powerful tools that enterprises use to attract and retain customers, and to advertise merchandise over the Internet. Consequently, providers of these services can benefit from understanding the characteristics and quality of shared video workloads. How can service providers implement efficient storage techniques and improve access to the voluminous worldwide video content generated by users? This paper investigates the workload characteristics of video sharing on the Web in an effort to recommend effective system designs for providers of Web-based video distribution services. The authors selected four providers, Dailymotion, Metacafe, Yahoo, and Veoh, and analyzed traces of metadata from nearly two million videos with over six billion views. They examine the cumulative distributions of ratings, comments, and uploads of videos by users. Video popularity is ascertained from the cumulative distributions of total views and ratings. Zipf's law [1] and power laws [2] are used to illuminate the hyperbolic behaviors of the video-sharing services, while Pareto distributions [3] are used to examine their invariant workload characteristics. A comparison of the video workloads of the four services reveals that most users do not rate videos unless the videos are popular; do not comment on or bookmark favorite videos; do not upload videos, except when advertising house goods, music, or news, or promoting healthy habits; and do not upload longer videos unless they are privileged users. The authors observe that the popularity of videos, as measured by user views and ratings, follows Pareto distributions. Consequently, they used these factors to define lifetime measures of the popularity of individual videos.

          The most popular 20 percent of videos in the study accounted for at least 85 percent of all video views and at least 80 percent of the video ratings. Unfortunately, the authors could not establish a generic parametric power law model to explain the distributions of video popularity across the services. However, they identify two important invariant characteristics of Web-based video distribution services: social interaction tools are used infrequently to rate and comment on videos, and far fewer users upload videos than watch them. They creatively use graphs to illuminate the relationships between memory caching efficiency and the popularity of video content, without deriving any prediction equations. Prediction equations would be useful for allocating memory resources and scheduling frequently accessed videos during peak periods. The research results in this paper should encourage interesting debates among Web-based video distribution services and academics. Information storage and retrieval experts and operating systems researchers should participate in this discussion.

          Online Computing Reviews Service
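The concentration figure quoted in the review (a small fraction of videos attracting most of the views) can be computed from a list of per-video view counts. The following is a minimal sketch, not the authors' method; the function name `top_fraction_share` and the sample counts are hypothetical and are not taken from the paper's traces.

```python
import math


def top_fraction_share(view_counts, fraction):
    """Share of all views captured by the most-viewed `fraction` of videos."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    # Sort view counts from most to least viewed.
    ranked = sorted(view_counts, reverse=True)
    # Number of videos in the top fraction, rounded up.
    k = math.ceil(fraction * len(ranked))
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total


# Made-up, heavily skewed view counts for ten videos.
views = [500, 300, 100, 40, 20, 15, 10, 8, 5, 2]
print(top_fraction_share(views, 0.2))
```

With these invented counts, the top two of ten videos hold 800 of the 1,000 views. Real traces would need the same calculation run over the full set of per-video counts for each service.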


          • Published in

            ACM Transactions on the Web  Volume 5, Issue 2
            May 2011
            190 pages
            ISSN:1559-1131
            EISSN:1559-114X
            DOI:10.1145/1961659

            Copyright © 2011 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 May 2011
            • Accepted: 1 July 2010
            • Revised: 1 June 2010
            • Received: 1 February 2009


            Qualifiers

            • research-article
            • Research
            • Refereed
