Published in: International Journal on Digital Libraries, Issue 2/2016

1 June 2016

The impact of JavaScript on archivability

Authors: Justin F. Brunelle, Mat Kelly, Michele C. Weigle, Michael L. Nelson

Abstract

As web technologies evolve, web archivists work to adapt so that digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts (Ajax) that, for example, load data without a change in the top-level Uniform Resource Identifier (URI) or require user interaction (e.g., content loaded via Ajax when the page is scrolled). These advances have made it more difficult to automate the capture of web pages. In an effort to understand why mementos (archived versions of live resources) in today’s archives vary in completeness and sometimes pull content from the live web, we present a study of web resources and archival tools. In our investigation, we used a collection of URIs shared over Twitter and a collection of URIs curated by Archive-It. We created local archived versions of the URIs from the Twitter and Archive-It sets using WebCite, wget, and the Heritrix crawler. We found that only 4.2% of the Twitter collection is perfectly archived by all of these tools, while 34.2% of the Archive-It collection is perfectly archived. After studying the quality of these mementos, we identified the practice of loading resources via JavaScript (Ajax) as the source of archival difficulty. Further, we show that resources are increasing their use of JavaScript to load embedded resources. By 2012, over half (54.5%) of pages used JavaScript to load embedded resources. The number of embedded resources loaded via JavaScript increased by 12.0% from 2005 to 2012. We also show that JavaScript is responsible for 33.2% more missing resources in 2012 than in 2005, meaning that JavaScript accounts for an increasing proportion of the embedded resources that mementos fail to load. JavaScript is also responsible for 52.7% of all missing embedded resources in our study.
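
To make the archival difficulty described in the abstract concrete, the following is a minimal sketch of why a tool that only parses markup can miss resources loaded via JavaScript. It is an illustration under stated assumptions, not the authors' method or the behavior of any specific crawler: the sample page, the class `StaticResourceExtractor`, and the function `extract_static_uris` are hypothetical names introduced here, and real crawlers are more elaborate.

```python
from html.parser import HTMLParser

# Hypothetical page: one image is declared in markup, while a second
# resource is requested by a script at run time, with its URI assembled
# from string fragments.
SAMPLE_PAGE = """
<html>
  <body>
    <img src="/static/logo.png">
    <div id="feed"></div>
    <script>
      var base = "/api/";
      var xhr = new XMLHttpRequest();
      xhr.open("GET", base + "feed" + ".json");
      xhr.send();
    </script>
  </body>
</html>
"""


class StaticResourceExtractor(HTMLParser):
    """Collect URIs found in src/href attributes, without running scripts."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        # Only attributes present in the markup are visible here; script
        # bodies are passed through as opaque text and never executed.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.uris.append(value)


def extract_static_uris(html):
    """Return the URIs a markup-only parser would discover in `html`."""
    parser = StaticResourceExtractor()
    parser.feed(html)
    parser.close()
    return parser.uris


if __name__ == "__main__":
    print(extract_static_uris(SAMPLE_PAGE))
```

In this sketch the markup-only pass discovers the image URI but not the feed requested by the script, because that URI exists only after the script runs. A memento built from such a pass would either lack the feed or, on replay, request it from the live web.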

Metadata
Title
The impact of JavaScript on archivability
Authors
Justin F. Brunelle
Mat Kelly
Michele C. Weigle
Michael L. Nelson
Publication date
1 June 2016
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 2/2016
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-015-0140-8
