2016 | Book

Digital Preservation Metadata for Practitioners

Implementing PREMIS

Edited by: Angela Dappert, Rebecca Squire Guenther, Sébastien Peyrard

Publisher: Springer International Publishing

About this book

This book begins with an introduction to fundamental issues related to digital preservation metadata before proceeding to in-depth coverage of issues concerning its practical use and implementation. It helps readers to understand which options need to be considered in specifying a digital preservation metadata profile to ensure it matches their individual content types, technical infrastructure, and organizational needs. Further, it provides practical guidance and examples, and raises important questions. It does not provide full-fledged implementation solutions, as such solutions can, by definition, only be specific to a given preservation context. As such, the book effectively bridges the gap between the formal specifications provided in a standard, such as the PREMIS Data Dictionary – a de-facto standard that defines the core metadata required by most preservation repositories – and specific implementations.

Anybody who needs to manage digital assets in any form with the intent of preserving them for an indefinite period of time will find this book a valuable resource. The PREMIS Data Dictionary provides a data model consisting of basic entities (objects, agents, events and rights) and basic properties (called “semantic units”) that describe them. The key challenge addressed is that of determining which information one needs to keep, together with one’s digital assets, so that they can be understood and used in the long term – in other words, exactly which metadata one needs.

The book will greatly benefit beginners and current practitioners alike. It is aimed equally at digital preservation repository managers and metadata analysts who are responsible for digital preservation metadata and at students in Library, Information and Archival Science degree programs or related fields. Further, it can be used at the conception stage of a digital preservation system or for self-auditing an existing system.

Table of contents

Frontmatter
Chapter 1. An Introduction to Implementing Digital Preservation Metadata
Abstract
This chapter gives an end-to-end overview of the steps involved in determining what information one needs to keep, together with one’s digital assets, so that they can be understood and used in the long term. In other words, what digital preservation metadata is required, and how does one decide this? This includes risk and functional analysis to define the context-specific metadata requirements; applying best-practice frameworks, such as OAIS, PREMIS, or SPOT, to choose and structure the required metadata; deriving a data model for a variety of content types, such as web archives, audio-visual materials, or e-books; determining the associated event, agent, rights, and computing environment information; choosing the best serialization method; combining multiple metadata standards; taking advantage of existing tools; and applying conformance considerations. The narrative links to the chapters in the book Digital Preservation Metadata for Practitioners: Implementing PREMIS.
Angela Dappert, Sébastien Peyrard, Rebecca Squire Guenther
Chapter 2. How to Develop a Digital Preservation Metadata Profile: Risk and Requirements Analysis
Abstract
There is no off-the-shelf solution when implementing preservation metadata. Standards such as the OAIS information model offer general guidance by listing the main information families that need to be expressed; closer to implementation, the PREMIS Data Dictionary provides core information elements that can accommodate a wide range of contexts, along with general implementation guidance. As such, these guidelines need to be tailored to specific needs so that the implemented preservation metadata supports all relevant requirements, making the most appropriate decisions in a constrained context. This chapter proposes important questions that help to break down the task into more manageable subtasks. Risk-oriented frameworks, such as the SPOT model, are efficient tools for starting a requirements analysis for digital preservation metadata.
Sébastien Peyrard, Angela Dappert, Rebecca Squire Guenther
Chapter 3. An Introduction to the PREMIS Data Dictionary for Digital Preservation Metadata
Abstract
The PREMIS Data Dictionary for Preservation Metadata provides a comprehensive and widely implemented specification that is revised based on concrete experience and changing technological environments. In addition, it gives the preservation community a common data model for organizing and thinking about the information needed to preserve digital objects. It has become the de facto standard for preservation metadata and is built into many preservation repository systems, both open-source and commercial, such that essential preservation activities can be accomplished. This chapter reviews the development of PREMIS, now in version 3.0, its supporting maintenance activity, its goals, principles and scope, and its relationship to OAIS, and introduces the features of the Data Dictionary. As a shared community standard, the PREMIS Data Dictionary is flexible, extensible, and provides for interoperability among repositories of digital objects, systems that support the preservation process, and data that are exchanged and reused.
Rebecca Squire Guenther, Angela Dappert, Sébastien Peyrard
Chapter 4. How to Develop a Digital Preservation Metadata Profile: Data Modeling
Abstract
Digital preservation metadata profiles vary because of different content types held in the repository, different functions performed on them, different organizational mandates and processes, different policies, different technical platforms, and other reasons. Because of this, one important step in their development is the definition of a logical data model. The logical data model declares the key context-specific entities for which metadata needs to be created, the relationships between them, and the specific metadata properties that should be captured for them. This chapter describes the principles of how to create a logical data model. Chapters 5 through 12 go on to present a number of case studies that illustrate how specific data model issues have been decided for different entity types, for different content types, such as web archives, audiovisual or e-book materials, and for different organization types.
Angela Dappert
Chapter 5. Digital Preservation Metadata Practice for Audio-Visual Materials
Abstract
Moving image objects can present unique preservation documentation needs due to the nature of their composition, creation, and renderability. PREMIS is well suited to support these needs in a variety of organizational and informational contexts. This chapter describes the structure and characteristics of digital moving image objects. It describes several use cases for preservation metadata, along with a description of approaches for expressing them using PREMIS. Finally, it provides a discussion of the implementation of PREMIS in different organizational environments.
Kara Van Malssen
Chapter 6. Digital Preservation Metadata Practice for Web Archives
Abstract
Twenty years after the pioneering experiments performed by the Internet Archive and a few national libraries, web archiving has become a common activity of many scientific, cultural, and heritage institutions. They use a set of tools, generally open source, to identify, harvest, store, index, make available to end users, and preserve internet content over the long term. Institutions seeking to preserve web archives nevertheless face major challenges: not only the huge amount of collected data, but also the lack of fully reliable metadata, which are crucial for understanding the web archives and informing future preservation actions on them. Web archives are generally stored in container formats, notably the ARC file format and its successor, the WARC format, which is an ISO standard. Context and provenance information, generated prior to or as part of the harvesting process, is stored in these container formats, but other metadata, especially information on the formats of the collected files, may be generated afterwards. To store and archive these assets in digital repositories, it is necessary to record and manage their metadata. Therefore, institutions need to make data and metadata modeling choices, which should be consistent not only with the design of their own repository and the kind and amount of data they have to preserve, but also with their conceptual view of the nature of web archives. This paper presents the choices and achievements of the National Library of France, whose approach is called “container modeling”. It then compares this approach to those of other members of the International Internet Preservation Consortium and to the projects of the New York Art Resources Consortium. It underlines how the different solutions are implemented with PREMIS and concludes with the use of format identification tools and metadata vocabularies for emulation strategies.
Clément Oury, Karl-Rainer Blumenthal, Sébastien Peyrard
Chapter 7. Digital Preservation Metadata Practice for E-Journals and E-Books
Abstract
There is no universally correct or uniform amount or type of preservation metadata to capture. E-books and e-journals present unique scalability challenges in the large variety of input formats a preservation agency will receive, the vast amount of published content, the need to manage a huge archive of billions of files, and the reality that sometimes articles and books are updated. What is more, e-books and e-journals have a complex content model that requires special attention. This chapter presents such challenges and illustrates them with the choices made in the Portico preservation service.
Amy Kirchhoff, Sheila M. Morrissey
Chapter 8. Digital Preservation Metadata Practice for Disk Image Access
Abstract
Many libraries, archives, and museums are now regularly acquiring, processing, and analyzing born-digital materials. Materials exist on a variety of source media, including flash drives, hard drives, floppy disks, and optical media. Extracting disk images (i.e., sector-by-sector copies of digital media) is an increasingly common practice. It can be essential to ensuring provenance, original order, and chain of custody. Disk images allow users to explore and interact with the original data without risk of permanent alteration. These replicas help institutions to safeguard against modifications to underlying data that can occur when a file system contained on a storage medium is mounted, or a bootable medium is powered up. Retention of disk images can substantially reduce preservation risks. Digital storage media become progressively difficult (or impossible) to read over time, due to “bit rot,” obsolescence of media, and reduced availability of devices to read them. Simply copying the allocated files off a disk and discarding the storage carrier, however, can be problematic. The ability to access and render the content of files can depend upon the presence of other data that resided on the disk. These dependencies are often not obvious upon first inspection and may only be discovered after the original medium is no longer readable or available. Disk images also enable a wide range of potential access approaches, including dynamic browsing of disk images (Misra S, Lee CA, Woods K (2014) A Web Service for File-Level Access to Disk Images. Code4Lib Journal, 25 [3]) and emulation of earlier computing platforms. Disk images often contain residual data, which may consist of previously hidden or deleted files (Redwine G, et al. in Born digital: guidance for donors, dealers, and archival repositories. Council on Library and Information Resources, Washington, 2013 [4]). Residual data can be valuable for scholars interested in learning about the context of creation. 
Traces of activities undertaken in the original environment—for example, identifying removable media connected to a host machine or finding contents of browser caches—can provide additional sources of information for researchers and facilitate the preservation of materials (Woods K, et al. in Proceedings of the 11th annual international ACM/IEEE joint conference on digital libraries, pp. 57–66, 2011 [5]). Digital forensic tools can be used to create disk images in a wide range of formats. These include raw files (such as those produced by the Unix tool dd). Quantifying successes and failures for many tools can require judgment calls by qualified digital curation professionals. Verifying a checksum for a file is a simple case; the checksums either match or are different. In the events described in the previous sections, however, the conditions for success are fuzzier. For example, fiwalk will often “successfully” complete whether or not it is able to extract a meaningful record of the contents of file system(s) on a disk image. Likewise, bulk_extractor will simply report items of interest it has discovered. Knowing whether this output is useful (and whether it has changed between separate executions of a given tool) depends on comparison of the output between the two runs, information not currently recorded in the PREMIS document. In the BitCurator implementation, events are often recorded as having completed, rather than as having succeeded, to avoid ambiguity. Future iterations of the implementation may include more nuanced descriptions of event outcomes.
Alexandra Chassanoff, Kam Woods, Christopher A. Lee
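The Chapter 8 abstract above calls checksum verification the simple case, because the checksums either match or differ. A minimal sketch of that binary check follows, assuming SHA-256 as the digest algorithm; the function names `file_sha256` and `fixity_outcome` and the outcome strings are hypothetical and do not come from the book or from BitCurator.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large disk images do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_outcome(path, expected_hex):
    """Compare the computed digest with a recorded one.

    The outcome labels are illustrative assumptions; a repository
    would use its own controlled vocabulary for event outcomes.
    """
    return "match" if file_sha256(path) == expected_hex.lower() else "mismatch"
```

Unlike the fuzzier tool outcomes the abstract describes, this check has exactly two results, so it maps cleanly onto a recorded success or failure.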
Chapter 9. Digital Preservation Metadata Practice for Archives
Abstract
In archives, not all digital objects are born-digital and received from an electronic records management system; some are instead digitized from analog originals. Creating images may be a way of preserving the analog object, since it reduces the need for researchers to access the original analog artifacts. These newly created digital objects used for viewing are saved in the archive together with metadata about the digitization actions; thus, such digital objects become part of a digital preservation plan in which the analog artifact is the resource to fall back on if a preservation action corrupts the digital copy. The digitized copy is not considered the new original, and the analog artifact is used as evidence. These different kinds of digital objects found in archives, and the special characteristics of the information required as evidence, affect the need for and use of digital preservation metadata.
Karin Bredenberg
Chapter 10. Digital Preservation Metadata Practice for Computing Environments
Abstract
A digital object does not stand alone. We require a computing environment in order to render, interact with, or understand it. Over the long term, the computing environments that we use change dramatically so that the software, hardware, and formats that we once used are no longer widely available or even understood. Therefore, if we want to ensure the long-term usability of digital objects, it is necessary to either preserve their computing environments or at least bring together enough information so that the environment can be reconstructed or adapted to a changed world. Information that describes the components of the digital object’s computing environment is a key part of its preservation metadata. The need becomes even more acute as we strive to archive audiovisual files, web pages with JavaScript and Flash, office documents and spreadsheets that embed complex calculations, or research outputs with data and software. Fortunately, widespread use of emulators and virtual machines and improved focus on managing software dependencies give us options that we have not had in the past. Prompted by this growing demand, PREMIS version 3.0 (PREMIS Editorial Committee (2015) [1]) has changed the way computing environment information is recorded. The new approach greatly improves expressiveness and consistency. This chapter describes the basic concepts.
Angela Dappert, Adam Farquhar
Chapter 11. Implementing Event and Agent Metadata for Digital Preservation
Abstract
Event metadata is structured (human and machine readable) information that documents actions or activities that have happened and which relate to one or more objects that an organization is tasked with preserving. It is crucial to enabling the people, processes, and technologies involved in preserving digital objects to successfully preserve those objects. Event metadata is the glue that joins metadata about objects that are managed by an organization, to metadata about the people, systems, or software that interact with those objects while they are being managed. Events are defined in PREMIS as follows:
[An event is] an action that involves or impacts at least one Object or Agent associated with or known by the preservation repository. [1]
Event metadata is necessary for ensuring that there is evidence of interactions between digital objects and agents within digital preservation systems. This evidence can be used for many purposes including ensuring success in security and trustworthiness audits and in proving the provenance and authenticity of digital content preserved by an organization. Decisions about where and how to store event metadata are often dependent on the environment in which the preservation is being undertaken. While it can be important to store at least one copy of any event metadata alongside the data it pertains to, this can be avoided if the metadata storage systems have equally rigorous bit-preservation processes governing them. Tough decisions often need to be made by organizations implementing the capture of event metadata in order to ensure they don’t succumb to costly “metadata bloat.” Therefore, organizations need to consider what metadata is important to the long-term preservation of their content before they begin capturing and preserving unnecessary additional metadata.
Euan Cochrane
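The Chapter 11 abstract above describes event metadata as the glue between object metadata and agent metadata. The sketch below illustrates that linking idea with plain Python data classes. The field names loosely echo PREMIS semantic unit names, but the `Event` class, the `agents_for_object` helper, and all identifier values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded action, linked by identifier to objects and agents."""
    event_id: str
    event_type: str
    date_time: str
    outcome: str
    linking_object_ids: tuple = ()
    linking_agent_ids: tuple = ()


def agents_for_object(events, object_id):
    """Return the sorted IDs of all agents that touched an object,
    found by following the links stored on each event."""
    return sorted({
        agent_id
        for event in events
        if object_id in event.linking_object_ids
        for agent_id in event.linking_agent_ids
    })


# Hypothetical sample data: two events on one object, one on another.
events = [
    Event("e1", "ingestion", "2016-01-05T10:00:00Z", "completed",
          ("obj-1",), ("agent-ingest-tool",)),
    Event("e2", "fixity check", "2016-02-01T09:30:00Z", "completed",
          ("obj-1",), ("agent-checker", "agent-operator")),
    Event("e3", "ingestion", "2016-02-02T11:00:00Z", "completed",
          ("obj-2",), ("agent-ingest-tool",)),
]
```

Calling `agents_for_object(events, "obj-1")` follows the links on the first two events only, which is the kind of provenance question the abstract says event metadata is meant to answer.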
Chapter 12. Implementing Rights Metadata for Digital Preservation
Abstract
When repositories acquire and preserve digital objects, certain types of metadata are automatically generated by systems and software while others are added by users. Technical information about file formats or standard outputs resulting from preservation actions is typically machine-generated; descriptive or cataloguing information, information about archival processes such as accessioning and appraisal, and information about intellectual property and other types of rights must usually be created at some point by the user and entered into software tools via data entry templates or other means. Repositories use different types of metadata for different purposes: for example, file format metadata can be used to assess format obsolescence risk and select preservation plans; descriptive information can be exposed in online access systems for discovery and citation purposes; and information about rights can be used as the basis for understanding the range of actions that can be taken by repositories with respect to the digital objects they have acquired. Rights can be a complex area for preservation repositories. Copyright and other statute-based restrictions, restrictions imposed by licenses or donors, and restrictions derived from institutional policies can sometimes overlap and compete with one another. Metadata standards for capturing rights information need to be flexible enough for preservation repositories to record rights data in ways that best meet their needs, and software tools implementing those standards need to support this type of flexibility. However, the data still need to be standardized enough to support a common and consistent understanding of what the information means. Moreover, rights information should be understandable to both human readers and software systems, which may be called upon to automate certain processes based on restrictions or permissions associated with digital objects. 
This chapter provides an overview of the PREMIS rights entity and how it is implemented in Archivematica, a digital preservation software system, in a way that attempts to allow repositories to address these complex requirements.
Evelyn McLellan
Chapter 13. Serialization of PREMIS
Abstract
This chapter is primarily concerned with the technical options for serializing data conformant to the PREMIS Data Dictionary. Serialization is the process of mapping a data model into formatted bits; it is generally required to facilitate transmission, storage, or computation on the data. The chapter begins with an introduction to the concept of serialization. It then describes several general factors that should be considered when implementing a PREMIS-compliant preservation system, weighing the benefits of the different approaches and offering suggestions for specific use cases. The next three subsections delve in some detail into three common implementation options: XML, RDF, and relational databases. Each of these subsections describes the option in general and then examines specific issues to consider with each approach. Each subsection offers suggestions for specific implementation scenarios such as partitioning and linking between PREMIS entities, linking to file content, dealing with specific data elements such as identifiers, using custom or non-standard elements, and storing the PREMIS data. Although the chapter was written for PREMIS version 2.2, much of the content is equally applicable to PREMIS 3.0; when there are differences, footnotes describing the differences have been provided.
Thomas Habing
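The Chapter 13 abstract above names XML and relational databases among the serialization options. As a rough illustration of mapping one record into two of those forms, the sketch below writes a single event to XML and to a SQLite table. The element names are taken from PREMIS semantic unit names and the namespace URI is assumed to be the PREMIS 3 one; neither has been validated against the official schema. The table layout, the dictionary keys, and the function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed PREMIS 3 namespace; verify against the official schema before use.
PREMIS_NS = "http://www.loc.gov/premis/v3"


def event_to_xml(event):
    """Serialize an event dictionary as a namespaced XML string."""
    ET.register_namespace("premis", PREMIS_NS)

    def tag(name):
        return f"{{{PREMIS_NS}}}{name}"

    root = ET.Element(tag("event"))
    identifier = ET.SubElement(root, tag("eventIdentifier"))
    ET.SubElement(identifier, tag("eventIdentifierType")).text = event["identifier_type"]
    ET.SubElement(identifier, tag("eventIdentifierValue")).text = event["identifier_value"]
    ET.SubElement(root, tag("eventType")).text = event["type"]
    ET.SubElement(root, tag("eventDateTime")).text = event["date_time"]
    return ET.tostring(root, encoding="unicode")


def event_to_row(conn, event):
    """Store the same event as one row in a hypothetical 'event' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event ("
        "identifier_type TEXT, identifier_value TEXT PRIMARY KEY, "
        "type TEXT, date_time TEXT)"
    )
    conn.execute(
        "INSERT INTO event VALUES "
        "(:identifier_type, :identifier_value, :type, :date_time)",
        event,
    )
    conn.commit()


# Hypothetical sample record.
sample_event = {
    "identifier_type": "local",
    "identifier_value": "e1",
    "type": "ingestion",
    "date_time": "2016-01-05T10:00:00Z",
}
```

The XML form keeps the nested identifier structure, while the relational form flattens it into columns; trade-offs of that kind are what the chapter says it weighs for each option.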
Chapter 14. Digital Preservation Metadata in a Metadata Ecosystem
Abstract
This chapter discusses how to make different metadata frameworks work together in an institutional ecosystem. In most contexts, PREMIS is not the only metadata standard implemented in an institution. Different standards aim at covering complementary functionality and can be combined in a modular fashion. For example, core digital preservation metadata need to be complemented with file-format-specific technical metadata standards, or with well-established standards for descriptive metadata that are used for discovery of and access to digital content. Additionally, standards exist for metadata containers which help in combining metadata from various sources in a standardized way and for associating metadata with the digital content. This chapter discusses available options, how to cope with mismatch or overlaps, and how to decide on implementation details.
Eld Zierau, Sébastien Peyrard
Chapter 15. Tools for Working with PREMIS
Abstract
The PREMIS Tools chapter provides a snapshot in time of many of the tools that could be used to output values for PREMIS semantic units. The tools included in the chapter are diverse in focus and functionality: some, such as the PREMIS Event Service, were specifically designed to support PREMIS; others, such as MediaInfo, were not designed to address PREMIS specifically but can be used to output values that could be transformed into PREMIS values; still others, such as Fedora, are large repository platforms that provide some support for PREMIS along with their many other functions. For many of these tools, specific examples of how they could be run on a command-line interface are shown.
Carol Chou, Andrea Goethals, Julie Seifert
Chapter 16. PREMIS in Open-Source Software: Islandora and Archivematica
Abstract
Open-source software is software whose source code is made freely available for use, modification and redistribution. Although there are many different models for developing and sustaining open-source tools, the tools are often developed in a collaborative and open environment, with development, technical, and user documentation made available online and community adoption supported by public discussion lists and user groups. The last few years have seen a considerable increase in the number of open-source software tools that have been or are being developed for use by archives and libraries. These tools provide a broad range of functionalities required for digital preservation, management of digital objects within a repository, cataloguing or archival description, and provision of online access. Many of these tools are now implementing all or part of the PREMIS Data Dictionary to record detailed technical, preservation, provenance, and rights information about digital holdings. This chapter describes implementations by two software tools, Islandora and Archivematica, providing practical examples of how PREMIS can be used to support the ability of archives and libraries to preserve digital holdings and make them accessible over time.
Mark Jordan, Evelyn McLellan
Chapter 17. Case Study: Implementing an Open-Source and In-House Developed PREMIS Events and Agents System
Abstract
One approach for implementing digital preservation metadata involves building the service alongside an existing platform. This provides a modular design option supplementing existing infrastructure with additional preservation functionality. By loosely coupling a PREMIS implementation to the primary repository, each system is able to change over time independently of the other as long as they maintain the linkage between the systems, often by using unique identifiers. Implementing PREMIS services may be approached in this manner and can supplement existing infrastructure where preservation metadata is either not adequately documented, or where local requirements surpass what is present in available repository infrastructure. This article illustrates this approach with a case study on implementing PREMIS for the particular purpose of managing preservation events.
Mark Edward Phillips, Daniel Gelaw Alemneh
Chapter 18. Conformance with PREMIS
Abstract
The full range of benefits of implementing PREMIS only comes when the implementation is well considered and well executed. These benefits pertain both to the implementing organization and the digital preservation community. In order to support organizations, the PREMIS Conformance Statement defines what can be considered a well-executed implementation. This chapter focusses on the 2015 conformance statement, exploring the value of conformance and how best to achieve it. Finally, the chapter explores how conformance could be linked to assertions of best practice and certification.
Peter McKinney
Backmatter
Metadata
Title
Digital Preservation Metadata for Practitioners
Edited by
Angela Dappert
Rebecca Squire Guenther
Sébastien Peyrard
Copyright year
2016
Electronic ISBN
978-3-319-43763-7
Print ISBN
978-3-319-43761-3
DOI
https://doi.org/10.1007/978-3-319-43763-7