Tools for digital preservation

When researching and developing our own approach to digital preservation, the National Archives made several key decisions about the software it required. The software would:

  • operate under an open source license (the GNU General Public License or GPL) for transparency, to encourage its use by anyone needing to access digital records over a long period, and to capitalise on contributions from the community of its users
  • function across various platforms – hence the choice of the programming language Java
  • convert proprietary file formats into open, fully-specified, standards-based formats – most of which are XML-based – because records in open formats have a greater potential lifespan in the digital future

Software developed in-house by the National Archives for digital preservation controls workflow and manages audit, conversion, storage, retrieval and access.

  • Workflow control and audit are carried out by Digital Preservation Recorder (DPR), which supports data integrity, authenticity and reliability by recording preservation metadata about the processed records.
  • Conversion of records into preservation formats is performed by Xena, which also enables preserved records to be displayed for access.
  • Storage in and retrieval from the digital archive are again handled by DPR, which manages and ultimately provides access to records stored in the archive.

XENA – XML Electronic Normalising of Archives

The core software application used in the digital preservation process is Xena. Although Xena can be run as a standalone application, within the Archives we use it through its Application Programming Interface (API) and through DPR.

Xena produces two versions of each record for archival preservation by the National Archives: bitstream and normalised. The normalised version converts the document from its original format into a selected open, fully documented format. The resulting data objects are referred to as Archival Information Packages (AIPs).

Bitstream

The bitstream version that Xena creates is a metadata-wrapped bitstream, which is considered a secure original copy of the record. This version contains all of the information from the original, but can only be read with the original hardware, operating system and software.
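To make the idea of a metadata-wrapped bitstream more concrete, here is a minimal Java sketch. It assumes the original bytes are stored unchanged, Base64-encoded, inside an XML element next to a few metadata fields and a SHA-256 checksum. The class `BitstreamWrapper` and its element names are hypothetical and are not Xena's actual output format. Decoding the wrapper gives back an exact copy of the original bytes, but that copy still needs software that understands the original format before it can be read.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HexFormat;

/** Hypothetical sketch: wraps an original file's bytes, unchanged, in simple XML metadata. */
public class BitstreamWrapper {

    /** Lowercase hex SHA-256 digest of the given bytes. */
    static String sha256(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Escapes XML special characters so metadata values keep the document well-formed. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds the wrapper: metadata fields plus the original bytes encoded as Base64. */
    static String wrap(String fileName, byte[] original) {
        return "<record>\n"
                + "  <meta>\n"
                + "    <fileName>" + escape(fileName) + "</fileName>\n"
                + "    <sizeBytes>" + original.length + "</sizeBytes>\n"
                + "    <sha256>" + sha256(original) + "</sha256>\n"
                + "  </meta>\n"
                + "  <bitstream encoding=\"base64\">"
                + Base64.getEncoder().encodeToString(original)
                + "</bitstream>\n"
                + "</record>\n";
    }

    /** Recovers the exact original bytes from a wrapper produced by wrap(). */
    static byte[] unwrap(String xml) {
        int start = xml.indexOf('>', xml.indexOf("<bitstream")) + 1;
        int end = xml.indexOf("</bitstream>");
        return Base64.getDecoder().decode(xml.substring(start, end));
    }

    public static void main(String[] args) {
        byte[] original = "Minutes of meeting".getBytes(StandardCharsets.UTF_8);
        String xml = wrap("minutes.txt", original);
        System.out.println(xml);
        System.out.println(Arrays.equals(original, unwrap(xml))); // true
    }
}
```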

Normalised

The normalised version that Xena creates is also wrapped in metadata. The process of normalising converts the record from its original format into an open, standards-based format. The normalised version is not considered to be an original copy of the record, as some information may be lost during the normalisation process. However, the performance of the normalised object is the closest to the original that is currently possible. Xena is continually being improved, so over time the performance of normalised versions is expected to replicate the original more closely.
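As a simple illustration of normalisation, the sketch below uses the standard Java `javax.imageio` API to convert a raster image such as a BMP into PNG, an open, fully specified format. It is an assumption made for illustration only and does not describe how Xena's own image handling works. Only the decoded pixels are carried across. Format-specific details of the source file, such as its header fields, are not, which is the kind of information loss normalisation can involve.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical sketch: normalises a raster image (e.g. BMP, GIF) to PNG. */
public class ImageNormaliser {

    /** Decodes any image format ImageIO can read and re-encodes the pixels as PNG. */
    static byte[] toPng(byte[] original) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(original));
        if (image == null) {
            // ImageIO returns null when no installed reader recognises the data.
            throw new IOException("Unrecognised image format");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a small BMP in memory to stand in for a record in its original format.
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        img.setRGB(1, 1, 0xFF0000);
        ByteArrayOutputStream bmp = new ByteArrayOutputStream();
        ImageIO.write(img, "bmp", bmp);

        byte[] png = toPng(bmp.toByteArray());
        System.out.println("PNG size: " + png.length + " bytes");
    }
}
```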

A useful tool

Although the National Archives of Australia has developed Xena as a digital preservation software application for its internal use, it believes that Xena may be a useful tool for many other individuals and organisations as well:

  • Other archival institutions may find Xena useful in developing their own in-house digital preservation programs.
  • Government agencies and other organisations may find it useful to integrate Xena into their own records management systems so that they can normalise digital records at the point of capture and/or batch-convert existing corporate records repositories for long-term accessibility and preservation.
  • Individuals may find Xena useful if they need to store digital information for periods that exceed the life of software used to create such information. For example, documents, images or financial information that needs to be available beyond the life of present computing systems may be converted by Xena to formats that will be accessible via future computing systems.

In short, Xena is a tool that allows archival institutions, organisations, and individuals to place their digital documents and digital records in open, documented, and accessible formats where important business information can be accessed from a wide range of applications on a wide range of computing platforms.

Currently, the Archives converts (normalises) office documents, emails, images and some other files into open file formats, but there are many more digital formats, and more will evolve in the future. Xena’s plug-in architecture enables the software to be readily enhanced to meet this challenge.
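This page does not describe how Xena's plug-in architecture is built, but the general pattern can be sketched in Java as an interface that each format handler implements, plus a registry that selects the first plug-in able to handle a given file. The names `FormatPlugin`, `PluginRegistry` and `PlainTextPlugin` are hypothetical. This illustrates the pattern and is not Xena's actual plug-in API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical plug-in contract: one implementation per family of source formats. */
interface FormatPlugin {
    /** Returns true if this plug-in can normalise a file with the given name. */
    boolean canHandle(String fileName);

    /** Converts the original bytes into an open target format. */
    byte[] normalise(byte[] original);
}

/** Hypothetical registry: new formats are supported by registering new plug-ins. */
class PluginRegistry {
    private final List<FormatPlugin> plugins = new ArrayList<>();

    void register(FormatPlugin plugin) {
        plugins.add(plugin);
    }

    /** Finds the first registered plug-in that accepts the file, if any. */
    Optional<FormatPlugin> find(String fileName) {
        return plugins.stream().filter(p -> p.canHandle(fileName)).findFirst();
    }
}

/** Example plug-in: reads .txt files as UTF-8 and wraps the escaped text in an XML element. */
class PlainTextPlugin implements FormatPlugin {
    @Override
    public boolean canHandle(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".txt");
    }

    @Override
    public byte[] normalise(byte[] original) {
        String text = new String(original, StandardCharsets.UTF_8)
                .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return ("<text>" + text + "</text>").getBytes(StandardCharsets.UTF_8);
    }
}

public class PluginDemo {
    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.register(new PlainTextPlugin());

        byte[] input = "Profit & loss".getBytes(StandardCharsets.UTF_8);
        // prints <text>Profit &amp; loss</text>
        registry.find("REPORT.TXT").ifPresentOrElse(
                p -> System.out.println(new String(p.normalise(input), StandardCharsets.UTF_8)),
                () -> System.out.println("No plug-in for this format"));
    }
}
```

With this design, supporting a new format means writing and registering one more plug-in, without changing the registry or any existing plug-in.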

List of supported formats:

  • AIFF
  • BMP
  • CSS
  • CSV
  • CUR
  • DOC
  • FLAC
  • GIF
  • gzip
  • HTML
  • jar
  • JPEG
  • MP3
  • MacBinary
  • MPP
  • ODP
  • ODS
  • ODT
  • PCX
  • PDF
  • PNG
  • PPS
  • PPT
  • PSD
  • RTF
  • sql
  • SVG
  • SXC
  • SXI
  • SXW
  • SYLK
  • tar
  • tar.gz
  • TIFF
  • TSV
  • TXT
  • war
  • WAV
  • WPD
  • WRI
  • XBM
  • XHTML
  • XLS
  • XML
  • XSLT
  • zip
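Before a file can be normalised, its format has to be identified, and file extensions are not always reliable. One common approach is to check the file's leading "magic" bytes. The sketch below recognises a handful of the formats listed, using their published signatures. It is an illustrative assumption and does not describe how Xena identifies formats. Several listed formats share a container signature: jar, war and the OpenDocument and StarOffice files are ZIP archives, and DOC, XLS and PPT are OLE2 compound files. A container match therefore still needs a closer look, and formats such as CSV or TXT have no signature at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: identifies a few formats by their leading signature bytes. */
public class SignatureIdentifier {

    // LinkedHashMap keeps the lookup order predictable.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("PDF", bytes(0x25, 0x50, 0x44, 0x46));                  // "%PDF"
        SIGNATURES.put("PNG", bytes(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A));
        SIGNATURES.put("GIF", bytes(0x47, 0x49, 0x46, 0x38));                  // "GIF8"
        SIGNATURES.put("JPEG", bytes(0xFF, 0xD8, 0xFF));
        SIGNATURES.put("TIFF (little-endian)", bytes(0x49, 0x49, 0x2A, 0x00)); // "II*\0"
        SIGNATURES.put("TIFF (big-endian)", bytes(0x4D, 0x4D, 0x00, 0x2A));    // "MM\0*"
        SIGNATURES.put("gzip", bytes(0x1F, 0x8B));
        // ZIP container: also the basis of jar, war and OpenDocument/StarOffice files.
        SIGNATURES.put("ZIP container", bytes(0x50, 0x4B, 0x03, 0x04));
        // OLE2 compound file: the container used by legacy DOC, XLS and PPT files.
        SIGNATURES.put("OLE2 container", bytes(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1));
        SIGNATURES.put("FLAC", bytes(0x66, 0x4C, 0x61, 0x43));                 // "fLaC"
    }

    private static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    /** Returns the name of the first matching signature, or "unknown". */
    static String identify(byte[] header) {
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            byte[] sig = e.getValue();
            if (header.length >= sig.length
                    && Arrays.equals(header, 0, sig.length, sig, 0, sig.length)) {
                return e.getKey();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(identify("%PDF-1.4".getBytes(StandardCharsets.US_ASCII)));   // PDF
        System.out.println(identify(new byte[] {0x1F, (byte) 0x8B, 8}));                // gzip
        System.out.println(identify("plain text".getBytes(StandardCharsets.US_ASCII))); // unknown
    }
}
```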

DPR – Digital Preservation Recorder

Digital Preservation Recorder (DPR) is the workflow tool that guides our digital preservation process. DPR calls on Xena to perform file conversions into preservation formats. DPR is also an audit tool that records metadata associated with every step of the preservation process. By capturing this preservation metadata we can ensure the authenticity of the digital records.
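Recording preservation metadata at every step can be pictured as an append-only log in which each event notes the processing step and the checksums of the data before and after it. The sketch below is a minimal illustration under that assumption. The `AuditEvent` record, its fields and the `AuditLog` class are hypothetical and do not reflect DPR's actual data model. Comparing a stored checksum with a freshly computed one is one way such metadata can support claims about a record's integrity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HexFormat;
import java.util.List;

/** Hypothetical sketch of one preservation event in an audit trail. */
record AuditEvent(String recordId, String step, String checksumBefore,
                  String checksumAfter, Instant timestamp) {}

/** Hypothetical append-only audit log: events can be added and read, never edited. */
class AuditLog {
    private final List<AuditEvent> events = new ArrayList<>();

    static String sha256(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Appends a processing step together with checksums of its input and output. */
    void append(String recordId, String step, byte[] before, byte[] after) {
        events.add(new AuditEvent(recordId, step, sha256(before), sha256(after), Instant.now()));
    }

    /** Checks that the data now held matches the output of the latest step for the record. */
    boolean verify(String recordId, byte[] current) {
        for (int i = events.size() - 1; i >= 0; i--) {
            AuditEvent e = events.get(i);
            if (e.recordId().equals(recordId)) {
                return e.checksumAfter().equals(sha256(current));
            }
        }
        return false; // no history means nothing to verify against
    }

    List<AuditEvent> events() {
        return Collections.unmodifiableList(events);
    }
}

public class AuditDemo {
    public static void main(String[] args) {
        AuditLog log = new AuditLog();
        byte[] original = "original bytes".getBytes(StandardCharsets.UTF_8);
        byte[] normalised = "<text>original bytes</text>".getBytes(StandardCharsets.UTF_8);

        log.append("R-001", "normalise", original, normalised);
        System.out.println(log.verify("R-001", normalised)); // true
        System.out.println(log.verify("R-001", original));   // false
    }
}
```

Keeping the log append-only, and relying on checksums rather than file names or dates, is what lets a later stage detect that data has changed since it was last processed.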

DPR is backed by a Postgres database on each of the quarantine, preservation and digital archive facilities described on the page about getting records into the digital archive. There is no data connection of any kind between these facilities, or from them to anywhere else. Because the facilities are physically isolated, a separate instance of DPR must run on each of the three. At each stage of the process, the content of the DPR database is exported as XML and travels with the data on a carrying device to the next facility. Ultimately, the instance of DPR held by the digital archive contains all of the preservation metadata for the preceding stages of the process.
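Because the facilities have no network connection, metadata has to travel as a file. A minimal sketch of that hand-off, assuming each audit entry is a flat set of fields, is to serialise the entries to XML with the JDK's built-in DOM and `javax.xml.transform` APIs at one facility and parse them back at the next. The `dprExport` and `event` element names, the `ExportEntry` record and the sample step names are hypothetical. DPR's real export format is not described here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical flat audit entry carried between facilities. */
record ExportEntry(String recordId, String step, String checksum) {}

/** Hypothetical export/import of audit metadata as XML for transfer on a carrying device. */
public class MetadataTransfer {

    /** Serialises entries to an XML document encoded as UTF-8 bytes. */
    static byte[] export(List<ExportEntry> entries) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("dprExport");
        doc.appendChild(root);
        for (ExportEntry e : entries) {
            Element event = doc.createElement("event");
            event.setAttribute("recordId", e.recordId());
            event.setAttribute("step", e.step());
            event.setAttribute("checksum", e.checksum());
            root.appendChild(event);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    /** Parses an export produced by export() back into entries at the next facility. */
    static List<ExportEntry> importEntries(byte[] xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations: a simple guard against XXE in imported files.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        NodeList nodes = doc.getElementsByTagName("event");
        List<ExportEntry> entries = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element el = (Element) nodes.item(i);
            entries.add(new ExportEntry(el.getAttribute("recordId"),
                    el.getAttribute("step"), el.getAttribute("checksum")));
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        List<ExportEntry> quarantine = List.of(
                new ExportEntry("R-001", "quarantine", "ab12"),
                new ExportEntry("R-001", "normalise", "cd34"));
        byte[] xml = export(quarantine);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(importEntries(xml).equals(quarantine)); // true
    }
}
```

The import side refuses DOCTYPE declarations, a common precaution when parsing XML that arrives from outside the system, even when it arrives on a physically carried device.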

Access to the software

Xena is available for download from SourceForge at http://xena.sourceforge.net

DPR is available for download from SourceForge at sourceforge.net/projects/dpr