Published in: International Journal on Digital Libraries 2/2015

01-06-2015

Evaluating sliding and sticky target policies by measuring temporal drift in acyclic walks through a web archive

Authors: Scott G. Ainsworth, Michael L. Nelson


Abstract

When viewing an archived page using the archive’s user interface (UI), the user selects a datetime to view from a list. The archived web page, if available, is then displayed. From this display, the web archive UI attempts to simulate the web browsing experience by smoothly transitioning between archived pages. During this process, the target datetime changes with each link followed, potentially drifting away from the datetime originally selected. For sparsely archived resources, this almost transparent drift can be many years in just a few clicks. We conducted 200,000 acyclic walks of archived pages, following up to 50 links per walk, comparing the results of two target datetime policies. The Sliding Target policy allows the target datetime to change as it does in archive UIs such as the Internet Archive’s Wayback Machine. The Sticky Target policy, represented by the Memento API, keeps the target datetime the same throughout the walk. We found that the Sliding Target policy drift increases with the number of walk steps, number of domains visited, and choice (number of links available). However, the Sticky Target policy controls temporal drift, holding it to less than 30 days on average regardless of walk length or number of domains visited. The Sticky Target policy shows some increase as choice increases, but this may be caused by other factors. We conclude that based on walk length, the Sticky Target policy generally produces at least 30 days less drift than the Sliding Target policy.
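The difference between the two policies can be illustrated with a small simulation. The sketch below is not the authors' experimental code: the toy archive, the function names `closest_capture` and `walk_drift`, and the definition of drift as the absolute distance in days between the displayed capture and the originally selected datetime are all assumptions made for illustration. It also assumes that an archive always serves the capture nearest to the current target datetime.

```python
from bisect import bisect_left
from datetime import datetime


def closest_capture(captures, target):
    """Return the capture datetime nearest to target.

    Assumes captures is a non-empty list sorted in ascending order.
    """
    i = bisect_left(captures, target)
    candidates = captures[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda c: abs(c - target))


def walk_drift(path, archive, start, policy):
    """Follow path (a list of URIs) and return the drift in days at each step.

    Drift is measured here against the originally selected datetime (start).
    """
    target = start
    drifts = []
    for uri in path:
        capture = closest_capture(archive[uri], target)
        drifts.append(abs(capture - start).days)
        if policy == "sliding":
            # The next lookup is relative to the capture just displayed.
            target = capture
        # Under "sticky", target stays at the originally selected datetime.
    return drifts


# Hypothetical archive: URI -> sorted capture datetimes. "b" is sparsely archived.
archive = {
    "a": [datetime(2005, 1, 1), datetime(2006, 6, 1)],
    "b": [datetime(2007, 3, 1)],
    "c": [datetime(2005, 2, 1), datetime(2008, 1, 1)],
}
start = datetime(2005, 1, 15)
path = ["a", "b", "c"]

sliding = walk_drift(path, archive, start, "sliding")
sticky = walk_drift(path, archive, start, "sticky")
print("sliding:", sliding)
print("sticky: ", sticky)
```

In this toy walk, both policies are forced far from the selected datetime at the sparsely archived page "b". The sticky walk returns to a nearby capture at "c", while the sliding walk keeps the displaced target and ends further away.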


Metadata
Title
Evaluating sliding and sticky target policies by measuring temporal drift in acyclic walks through a web archive
Authors
Scott G. Ainsworth
Michael L. Nelson
Publication date
01-06-2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 2/2015
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-014-0120-4
