DOI: 10.1145/2467696.2467718
research-article

Evaluating sliding and sticky target policies by measuring temporal drift in acyclic walks through a web archive

Published: 22 July 2013

ABSTRACT

When a user views an archived page using the archive's user interface (UI), the user selects a datetime to view from a list. The archived web page, if available, is then displayed. From this display, the web archive UI attempts to simulate the web browsing experience by smoothly transitioning between archived pages. During this process, the target datetime changes with each link followed, drifting away from the datetime originally selected. When browsing sparsely archived pages, this nearly silent drift can amount to many years in just a few clicks. We conducted 200,000 acyclic walks of archived pages, following up to 50 links per walk, comparing the results of two target datetime policies. The Sliding Target policy allows the target datetime to change as it does in archive UIs such as the Internet Archive's Wayback Machine. The Sticky Target policy, represented by the Memento API, keeps the target datetime the same throughout the walk. We found that drift under the Sliding Target policy increases with the number of walk steps, the number of domains visited, and choice (the number of links available). However, the Sticky Target policy controls temporal drift, holding it to less than 30 days on average regardless of walk length or number of domains visited. Drift under the Sticky Target policy shows some increase as choice increases, but this may be caused by other factors. We conclude that, based on walk length, the Sticky Target policy generally produces at least 30 days less drift than the Sliding Target policy.
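The difference between the two policies can be illustrated with a small simulation. This is a minimal sketch, not the authors' code, and it rests on several assumptions: the archive returns the capture nearest to the requested datetime; drift is taken as the absolute gap between each displayed memento's datetime and the datetime the user originally selected; and real link-following is replaced by a fixed sequence of hypothetical URIs (`a.example/` and so on) with made-up capture dates. The function names `closest_memento` and `walk_drift` are illustrative only.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def closest_memento(mementos: list[datetime], target: datetime) -> datetime:
    """Return the capture datetime nearest to target; ties go to the earlier one.

    Assumes mementos is a non-empty list sorted in ascending order.
    """
    i = bisect_left(mementos, target)
    if i == 0:
        return mementos[0]
    if i == len(mementos):
        return mementos[-1]
    before, after = mementos[i - 1], mementos[i]
    return before if target - before <= after - target else after


def walk_drift(walk: list[str], archive: dict[str, list[datetime]],
               start: datetime, policy: str) -> list[timedelta]:
    """Return the temporal drift observed at each step of an acyclic walk.

    Drift is assumed here to be |memento datetime - start|, where start is
    the datetime the user originally selected.
    """
    if len(set(walk)) != len(walk):
        raise ValueError("walk must be acyclic (no URI visited twice)")
    if policy not in ("sliding", "sticky"):
        raise ValueError(f"unknown policy: {policy}")
    target = start
    drifts = []
    for uri in walk:
        memento = closest_memento(archive[uri], target)
        drifts.append(abs(memento - start))
        if policy == "sliding":
            # Sliding Target: the next request targets the page just viewed.
            target = memento
        # Sticky Target: target stays at start for the whole walk.
    return drifts


# Hypothetical archive: capture datetimes for three made-up URIs.
archive = {
    "a.example/": [datetime(2005, 1, 1), datetime(2010, 1, 1)],
    "b.example/": [datetime(2007, 1, 1), datetime(2012, 6, 1)],
    "c.example/": [datetime(2004, 1, 1), datetime(2008, 1, 1)],
}
walk = ["a.example/", "b.example/", "c.example/"]
start = datetime(2005, 1, 1)
for policy in ("sliding", "sticky"):
    print(policy, [d.days for d in walk_drift(walk, archive, start, policy)])
```

Running it prints `sliding [0, 730, 1095]` and `sticky [0, 730, 366]`. After the second step the Sliding Target has moved to 2007, so the third page resolves to its 2008 capture; the Sticky Target still requests 2005 and gets the 2004 capture, which is closer to what the user originally chose.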


Published in

ACM Conferences
JCDL '13: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
July 2013
480 pages
ISBN: 9781450320771
DOI: 10.1145/2467696

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

JCDL '13 paper acceptance rate: 28 of 95 submissions (29%). Overall acceptance rate: 415 of 1,482 submissions (28%).
