Abstract
In this paper, we study the dynamics of the MSNBC news site, one of the busiest Web sites on the Internet today. Unlike many other efforts that have analyzed client accesses as seen by proxies, we focus on the server end. We analyze the dynamics of both the server content and the client accesses made to the server. The former concerns the content creation and modification process, while the latter concerns page popularity and locality in client accesses. Some of our key results are: (a) files tend to change little when they are modified, (b) a small set of files tends to get modified repeatedly, (c) file popularity follows a Zipf-like distribution with a parameter α that is much larger than reported in previous, proxy-based studies, and (d) there is significant temporal stability in file popularity but not much stability in the domains from which clients access the popular content. We discuss the implications of these findings for techniques such as Web caching (including cache consistency algorithms) and prefetching or server-based "push" of Web content.
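Finding (c) refers to a Zipf-like distribution, in which the i-th most popular file receives a share of requests proportional to 1/i^α. The abstract does not say how α was estimated. A common approach, assumed here, is an ordinary least-squares line fit of log(request count) against log(rank), with α taken as the negated slope. The sketch below is illustrative only. The function names are hypothetical, and the input is assumed to be an iterable of requested URLs, for example parsed from a server access log.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def zipf_alpha_from_counts(counts: Sequence[float]) -> float:
    """Estimate the Zipf-like exponent alpha from per-file request counts.

    Hypothetical helper: fits log(count) = log(C) - alpha * log(rank)
    by ordinary least squares and returns alpha (the negated slope).
    """
    # Rank files from most to least popular; zero counts have no logarithm.
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need request counts for at least two files")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope = covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def zipf_alpha_from_requests(urls: Iterable[str]) -> float:
    """Count requests per URL, then fit alpha (hypothetical helper)."""
    return zipf_alpha_from_counts(list(Counter(urls).values()))


if __name__ == "__main__":
    # Synthetic counts that follow 1/rank**1.6 exactly, so the fit recovers 1.6.
    synthetic = [1_000_000 / rank ** 1.6 for rank in range(1, 201)]
    print(round(zipf_alpha_from_counts(synthetic), 6))  # prints 1.6
```

A fit over every rank is strongly influenced by the long tail of rarely requested files. In practice, one might restrict the fit to the most popular files. That cutoff is an assumption of this sketch and can noticeably shift the estimate, which matters when comparing α values across studies.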
The content and access dynamics of a busy Web site: findings and implications. SIGCOMM '00: Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication.