The content and access dynamics of a busy Web site: findings and implications

Published: 28 August 2000

Abstract

In this paper, we study the dynamics of the MSNBC news site, one of the busiest Web sites on the Internet today. Unlike many other efforts that have analyzed client accesses as seen by proxies, we focus on the server end. We analyze the dynamics of both the server content and the client accesses made to the server. The former considers the content creation and modification process, while the latter considers page popularity and locality in client accesses. Some of our key results are: (a) files tend to change little when they are modified, (b) a small set of files tends to get modified repeatedly, (c) file popularity follows a Zipf-like distribution with a parameter α that is much larger than reported in previous, proxy-based studies, and (d) there is significant temporal stability in file popularity but not much stability in the domains from which clients access the popular content. We discuss the implications of these findings for techniques such as Web caching (including cache consistency algorithms), and prefetching or server-based "push" of Web content.
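As an illustration of result (c): in a Zipf-like distribution, the request frequency of the i-th most popular file is proportional to 1/i^α, so a larger α means requests are concentrated on fewer files. The abstract does not say how α was estimated. The sketch below assumes one common approach, a least-squares line fitted to log(request count) against log(rank). The function name `zipf_alpha` and the synthetic counts are hypothetical and are not taken from the MSNBC traces.

```python
import math


def zipf_alpha(counts):
    """Estimate the Zipf-like exponent from per-file request counts.

    Assumption: alpha is taken as the negated slope of a least-squares
    line fitted to log(count) versus log(rank). This is a common method,
    not necessarily the one used in the paper.
    """
    # Rank files from most to least requested, ignoring unrequested files.
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two files with nonzero requests")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is -alpha, so negate it.
    return -sxy / sxx


# Synthetic (made-up) counts that follow an exact power law with exponent 1.5.
synthetic_counts = [1000.0 / rank ** 1.5 for rank in range(1, 201)]
estimated_alpha = zipf_alpha(synthetic_counts)
print(f"estimated alpha: {estimated_alpha:.3f}")
```

On real server logs the fit is usually noisier than on this synthetic input, and the estimate can depend on how many of the least popular files are included in the regression.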

Published in

ACM SIGCOMM Computer Communication Review, Volume 30, Issue 4
October 2000
319 pages
ISSN: 0146-4833
DOI: 10.1145/347057

SIGCOMM '00: Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
August 2000
348 pages
ISBN: 1581132239
DOI: 10.1145/347059

Copyright © 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States
