Published in: International Journal on Digital Libraries 3/2017

09.05.2016

The evolution of web archiving

Authors: Miguel Costa, Daniel Gomes, Mário J. Silva

Abstract

Web archives preserve information published on the web or digitized from printed publications. Much of this information is unique and historically valuable. However, the lack of knowledge about the global status of web archiving initiatives hampers their improvement and collaboration. To overcome this problem, we conducted two surveys, in 2010 and 2014, which provide a comprehensive characterization of web archiving initiatives and their evolution. We identified several patterns and trends that highlight challenges and opportunities. We discuss these patterns and trends, which help to define strategies, estimate resources and provide guidelines for research and development of better technology. Our results show that in recent years there was significant growth in the number of initiatives and of countries hosting them, in the volume of data, and in the number of contents preserved. While this indicates that the web archiving community is dedicating a growing effort to preserving digital information, other results presented throughout the paper raise concerns, such as the small amount of archived data in comparison with the amount of data being published online.

Metadata
Title
The evolution of web archiving
Authors
Miguel Costa
Daniel Gomes
Mário J. Silva
Publication date
09.05.2016
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 3/2017
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-016-0171-9
