Published in: International Journal on Digital Libraries 3-4/2015

01-09-2015

Bridging the gap between real world repositories and scalable preservation environments

Authors: Bolette Ammitzbøll Jurik, Asger Askov Blekinge, Rune Bruun Ferneke-Nielsen, Per Møldrup-Dalum


Abstract

Integrating large-scale processing environments, such as Hadoop, with traditional repository systems, such as Fedora Commons 3, has long proved to be a daunting task. In this paper, we show how this integration can be achieved using software developed in the Scalable Preservation Environments (SCAPE) project, and how it can also be achieved using a more direct local implementation at the Danish State and University Library, inspired by the SCAPE project. Both approaches allow full use of the Hadoop system for massively distributed processing without causing excessive load on the repository. We present a proof-of-concept SCAPE integration and an in-production local integration, both based on repository systems at the Danish State and University Library and the Hadoop execution environment. Both use data from the Newspaper Digitisation Project, a collection that will grow to more than 32 million JP2 images. The use case for the SCAPE integration is feature extraction and validation of the JP2 images. The validation is done against an institutional preservation policy expressed in the machine-readable SCAPE Control Policy vocabulary. The feature extraction is done using the Jpylyzer tool. We perform an experiment with sets of JP2 images of various sizes to test the scalability and correctness of the solution. The first use case considered for the local Danish State and University Library integration is also feature extraction and validation of the JP2 images, this time using Jpylyzer and Schematron requirements translated by hand from the project specification. We further look at two other use cases: generation of histograms of the tonal distributions of the images, and generation of dissemination copies. We discuss the challenges and benefits of the two integration approaches when performing preservation actions on massive collections stored in traditional digital repositories.

Metadata
Title: Bridging the gap between real world repositories and scalable preservation environments
Authors: Bolette Ammitzbøll Jurik, Asger Askov Blekinge, Rune Bruun Ferneke-Nielsen, Per Møldrup-Dalum
Publication date: 01-09-2015
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Digital Libraries, Issue 3-4/2015
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI: https://doi.org/10.1007/s00799-015-0152-4
