NFSA/nfsa-digital-packaging

NFSA Digital Packaging

Overview

The NFSA Digital Packaging project provides specifications and tooling for the creation and management of digital preservation packages. This implementation draws on leading international frameworks including OAIS (Open Archival Information System), PREMIS (Preservation Metadata: Implementation Strategies), and METS (Metadata Encoding and Transmission Standard).

This project forms part of the NFSA’s broader digital preservation strategy and research and development goals, supporting sustainable packaging, storage, and metadata management for audiovisual materials within NFSA’s digital repository ecosystem.

It supports the NFSA’s strategic goal to preserve, sustain, and make discoverable the nation’s audiovisual memory through well-structured, standards-aligned, and verifiable digital packages.

"Digital packaging" here refers to the creation of structured information packages (SIPs, AIPs, and DIPs) informed by the OAIS reference model, using METS for structure and PREMIS for preservation metadata. The codebase provides NFSA-specific tooling to build, validate, and manage these packages for long-term preservation. A key inspiration for this project is the eArchiving initiative, which provides clear and actionable steps to implement OAIS.

This implementation is written in Python, with uv as the dependency manager.

Context and Purpose

The NFSA manages a growing volume of complex digital materials, ranging from born-digital works and video games to film scans comprising millions of frames. To ensure these materials remain authentic, usable, and meaningful over time, the NFSA’s digital preservation team is researching and developing a standards-based digital packaging approach that:

  • Enables consistent management of digital objects and their metadata across systems.
  • Captures preservation actions, rights, and software dependencies.
  • Supports long-term interoperability and reuse across NFSA’s digital collection.

Motivation

The NFSA has historically packaged digital objects, either to retain original file structures (e.g. games) or to manage representations containing many thousands of discrete files (e.g. film scans). However, we have encountered issues with package standardisation and the consistent management of metadata, which METS explicitly addresses. We also wish to better capture preservation events and the software environments needed for reproduction, both of which are supported by PREMIS.

We have elected to build this implementation from the ground up, both to properly understand these specifications and to develop a solution which meets our specific needs. Self-implementation allows us to customise solutions to our collection and system requirements, maintain self-reliance, and avoid vendor lock-in or ongoing licensing costs.

Scope

The current implementation can be considered minimum viable, with significant potential for future extension. Areas we are seeking to support in the coming months:

  • Multi-part AIPs. Discrete representations of AV media can be larger (up to 20TB) than our mandated maximum size for individual TARs (4TB). This means supporting multi-part AIPs, or possibly dispensing with TAR files altogether, depending on our broader digital collection ecosystem.
  • Package types. Defining package types would allow us to explicitly manage expectations around file types, file derivations, and metadata files, so that packages meet distinct profiles.
  • Richer PREMIS metadata. PREMIS offers rich opportunities for capturing preservation events, managing rights, and retaining reproduction or emulation environment information.
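To make the size constraint concrete: a 20TB representation under a 4TB limit needs at least five parts. The sketch below shows one way the files of a large representation might be planned into size-limited parts. It is illustrative only and not part of the codebase: the names `plan_parts`, `Part` and `MAX_PART_BYTES` are hypothetical, and it assumes decimal terabytes and a simple in-order (next-fit) assignment.

```python
from dataclasses import dataclass, field

TB = 10**12  # assumption: decimal terabytes; the limit may be defined differently
MAX_PART_BYTES = 4 * TB  # the 4TB per-TAR limit described above


@dataclass
class Part:
    """One planned part of a multi-part package (hypothetical structure)."""
    files: list[str] = field(default_factory=list)
    size: int = 0


def plan_parts(file_sizes: dict[str, int], limit: int = MAX_PART_BYTES) -> list[Part]:
    """Assign files to parts in their given order, never exceeding `limit`.

    This is next-fit: a new part is opened as soon as the next file would
    not fit in the current one. It keeps files in sequence (useful for
    frame-ordered scans) but does not minimise the number of parts.
    """
    parts: list[Part] = []
    for name, size in file_sizes.items():
        if size > limit:
            raise ValueError(f"{name} ({size} bytes) exceeds the part limit on its own")
        if not parts or parts[-1].size + size > limit:
            parts.append(Part())
        parts[-1].files.append(name)
        parts[-1].size += size
    return parts
```

For example, calling `plan_parts` on a mapping of frame filenames to byte sizes returns a list of parts whose `files` could each be written to a separate TAR. The sketch ignores TAR header overhead and the size of metadata files, which a real implementation would need to budget for.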

Deployment

A detailed guide to working with the provided examples can be found here: EXAMPLE.md.

Broadly, the implementation comprises six scripts: one for creation and one for validation of each of the SIP, AIP and DIP package types. The process requires a source JSON file containing information about the related intellectual entity, linking identifiers, and paths to the representations for packaging. An example JSON can be found here: recipe.json.
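As a rough illustration of how a source JSON file might be checked before packaging, the sketch below loads a recipe and reports problems early. The key names (`intellectual_entity`, `identifiers`, `representations`) and both function names are assumptions made for this example; consult recipe.json and EXAMPLE.md for the actual structure.

```python
import json
from pathlib import Path

# Hypothetical key names; the real recipe.json may use different ones.
REQUIRED_KEYS = ("intellectual_entity", "identifiers", "representations")


def load_recipe(recipe_path: Path) -> dict:
    """Read a recipe file and confirm the assumed top-level keys are present."""
    recipe = json.loads(recipe_path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in recipe]
    if missing:
        raise ValueError(f"recipe is missing keys: {', '.join(missing)}")
    return recipe


def missing_representations(recipe: dict) -> list[str]:
    """Return the representation paths in the recipe that do not exist on disk.

    Assumes `representations` is a list of path strings.
    """
    return [path for path in recipe["representations"] if not Path(path).exists()]
```

Checking that representation paths exist before packaging starts avoids discovering a bad path partway through a long-running job on a large representation.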

Dependencies

This codebase expects Python 3.13+, and has been tested using the uv Python dependency manager.

The codebase also requires DROID to be installed. A guide to running it on a Linux machine can be found here: DROID.md.

Deliverables

The resulting AIP packages can be archived to tape as preservation components.

Metadata stored inside the packages should also be surfaced, or synced, outside the package, so that it can be analysed and interrogated without restoring and unpacking the full package. One consideration: all METS and PREMIS files are natively named mets.xml or premis.xml, so surfaced copies should be renamed to include the explicit package or representation UUID (e.g. mets-7f6a7f70-3705-4b30-b77d-110f9d6852a3.xml) to prevent filename collisions.
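The renaming described above could be sketched as follows. This is a minimal example rather than the project's own tooling: `surfaced_name` and `surface_metadata` are hypothetical names, and it assumes metadata is copied (not moved) into a flat destination directory outside the package.

```python
import shutil
import uuid
from pathlib import Path


def surfaced_name(metadata_file: Path, package_uuid: str) -> str:
    """Build a collision-safe filename, e.g. 'mets.xml' -> 'mets-<uuid>.xml'."""
    # uuid.UUID raises ValueError if the string is not a well-formed UUID,
    # and str() normalises it to lowercase hyphenated form.
    canonical = str(uuid.UUID(package_uuid))
    return f"{metadata_file.stem}-{canonical}{metadata_file.suffix}"


def surface_metadata(metadata_file: Path, package_uuid: str, dest_dir: Path) -> Path:
    """Copy a package's metadata file to dest_dir under its UUID-qualified name."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / surfaced_name(metadata_file, package_uuid)
    if target.exists():
        # Refuse to overwrite silently: an existing file suggests a duplicate UUID.
        raise FileExistsError(target)
    shutil.copy2(metadata_file, target)  # copy2 also preserves timestamps
    return target
```

Copying leaves the mets.xml inside the package untouched, so the package itself remains unchanged while the external copy is free to be indexed.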

Definitions

So many acronyms! A partial guide to these, and other technical terms, can be found here: DEFINITIONS.md.

Attribution and License

Developed by the NFSA Digital Preservation Team, 2025

CC BY-NC-SA 4.0
