Long-term File Formats

Purpose

This Guideline identifies file formats that the Archives has reasonable confidence will continue to be accessible over time. The Guideline is intended to have the following uses:

  • restrict the number and types of file formats transferred by government agencies and personal records donors to those formats the Archives is confident can be preserved and made accessible over time;
  • provide a resource to help government agencies make risk-based decisions about file format selection for activities like digitisation projects or other digital projects;
  • indicate preferred formats for digitisation projects undertaken by the Archives, both internally and outsourced to a third party service provider;
  • provide information about the Archives' approach to digital preservation for digital records that have been transferred into our custody.

This Guideline is not intended to be exhaustive; it identifies the common types of formats the Archives expects to receive from agencies and indicates their long-term sustainability.
Agencies can contact the Archives if they have identified other file formats they would like investigated and added to the table.

Characteristics of preservation formats

The Archives distinguishes between 'preferred', 'acceptable' and 'at-risk' preservation formats. 'Preferred' preservation formats are formats that the Archives has determined are at very low risk of becoming obsolete in the long term. The Archives normalises 'at-risk' formats into a 'preferred' preservation format.

'Acceptable' preservation formats are formats that the Archives has determined are at low risk of becoming obsolete in the long term. Records in 'acceptable' formats are not normalised but are stored as they are and are monitored over time to confirm their continued accessibility.

'At-risk' formats are those the Archives has determined are at significant risk of becoming inaccessible. The Archives will normalise these digital records into 'preferred' preservation formats. If there is no 'preferred' preservation format for the material, it will be retained in its native format until a preferred format is identified, at which time the records will be normalised to it.
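
As an illustration only, the handling rules above can be sketched in Python. The names `Rating` and `preservation_action` are hypothetical, and the 'store as received' wording for preferred formats is an assumption, since this Guideline only spells out the handling for acceptable and at-risk formats.

```python
from enum import Enum
from typing import Optional


class Rating(Enum):
    PREFERRED = "preferred"
    ACCEPTABLE = "acceptable"
    AT_RISK = "at-risk"


def preservation_action(rating: Rating, preferred_target: Optional[str]) -> str:
    """Return the handling for a record, given its format rating.

    preferred_target is the preferred format for this kind of material,
    or None when no preferred format has been identified yet.
    """
    if rating is Rating.PREFERRED:
        # Assumption: preferred formats need no conversion.
        return "store as received"
    if rating is Rating.ACCEPTABLE:
        return "store as received and monitor"
    # At-risk: normalise when a target exists, otherwise keep the native file.
    if preferred_target is not None:
        return f"normalise to {preferred_target}"
    return "retain native format until a preferred format is identified"
```

For example, `preservation_action(Rating.AT_RISK, "ODS")` describes a legacy spreadsheet that would be normalised to ODS.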

Transfer requirements

When transferring digital records to the Archives, the transferring agency will identify the category and format of the records in the transfer documentation. While the Archives always prefers to receive a preferred file format over an acceptable file format, there may be sound business reasons why the agency has selected an acceptable file format.

The transferring agency will need to take further actions to ensure that digital records are acceptable for transfer:

  • Provide metadata in a digital file as set out in the Archives transfer requirements
  • Deactivate any file level encryption
  • Deactivate any digital rights management technologies
  • Provide Representation Information, which is information or documentation required to access the record or to provide additional technical information, for example data dictionaries and manuals.
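
One of the checks above, file-level encryption, can be partly automated for ZIP-based files. The sketch below is a minimal example and not an Archives procedure: it only covers ZIP containers, it relies on the encryption bit (bit 0) of the ZIP general purpose flag, and the function names are hypothetical.

```python
import zipfile
from typing import List

ENCRYPTED_FLAG = 0x1  # bit 0 of the ZIP general purpose bit flag


def is_encrypted(info: zipfile.ZipInfo) -> bool:
    """True if the ZIP entry is marked as encrypted."""
    return bool(info.flag_bits & ENCRYPTED_FLAG)


def encrypted_members(source) -> List[str]:
    """List entry names in a ZIP (path or file object) that are marked encrypted."""
    with zipfile.ZipFile(source) as archive:
        return [i.filename for i in archive.infolist() if is_encrypted(i)]
```

An empty result suggests no entry carries the encryption flag; other kinds of encryption and digital rights management need separate checks.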

Note that this Guideline is not a procedure for transferring digital records to the Archives. Transfer procedures can be discussed with the Archives' officer responsible for the transfer.

Table of file formats

In the attached form, digital formats are divided into categories relating to content and record type. Formats are listed by name and include a link to the format specification that defines appropriate encoding methods. Given the dynamic nature and availability of data formats, the ratings given in the table will change over time.

Computer aided design

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Autodesk Drawing Interchange File Format/Data eXchange Format (DXF)

Binary and ASCII

DXF General File Structure (ASCII):
autodesk.com General File structure

Binary DXF File Format:
autodesk.com Binary DXF File

Heavily used but problematic format because files do not move reliably between software versions.

Binary and ASCII versions are partially specified. Use the ASCII version of this format, not the binary. Use it for 2D models, not 3D models.
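
A quick way to tell the two DXF variants apart is to look at the start of the file: binary DXF files begin with a fixed 22-byte sentinel. The sketch below assumes that sentinel and uses a hypothetical function name; it does not validate the rest of the file.

```python
# Binary DXF files start with this 22-byte sentinel:
# "AutoCAD Binary DXF" followed by CR, LF, SUB and NUL.
BINARY_DXF_SENTINEL = b"AutoCAD Binary DXF\r\n\x1a\x00"


def is_binary_dxf(data: bytes) -> bool:
    """True if the data starts with the binary DXF sentinel."""
    return data.startswith(BINARY_DXF_SENTINEL)
```

A file that fails this test is not necessarily valid ASCII DXF; it simply does not carry the binary marker.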

Autodesk Drawing File (DWG)

Design Specification for .dwg files

Most widely used format for CAD designs. It is a proprietary format owned by Autodesk, though other software readers can be used with varying results. While Autodesk's proprietary DWG format is undocumented, the Open Design Alliance publishes a full, open source specification for DWG.

Standard for the Exchange of Product Model Data (STEP)

ISO 10303-21:2002

ISO 10303-28:2007

STEP can represent 3D objects in CAD and related information. It is an ISO standard that is extremely large and covers many industrial aspects. The standard is copyrighted and not freely available.

Portable Document Format/Engineering (PDF/E)

ISO 24517-1:2008

PDF/E is used for the creation of documents used in geospatial, construction and manufacturing including interactive media, animation and 3D.

It is an ISO standard with specifications fully available.

At-risk formats (formats that may be normalised when received by NAA)

Format | Version/Codec and Specification | Comment
Visio (VSD) (VDX)

Not available

Widely used format, but earlier versions are no longer supported directly by MS. May have rendering issues due to incompatible software.

Format is proprietary and no specifications have been published.

Format to be assessed

Format | Version/Codec and Specification | Comment
Universal 3D (U3D)

Standard ECMA-363

U3D is a compressed file format standard for 3D computer graphics data that can be used on a variety of systems. It is an ECMA standard and specifications are fully available.

Product Representation Compact (PRC)

Acrobat 9 HTMLHelp

Highly compressed format that facilitates the storage of representations of 3D models in a PDF file.

Specifications are fully available.

Extensible 3D (X3D)

ISO/IEC 19775-1:2008

Royalty-free ISO standard for representing 3D computer graphics. Though it is not a widely accepted format at present, it works on a variety of common platforms.

The specifications are available but may include some proprietary elements.

Data sets

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Plain Text

ASCII, EBCDIC, Unicode-based encodings (UTF-8, UTF-16 etc)

American Standard Code for Information Interchange Text (ASCII Text):
ISO/IEC 646:1991

Unicode Text: RFC 3629: UTF-8, a transformation format of ISO 10646:
tools.ietf.org rfc3629

RFC 2781: UTF-16: An Encoding of ISO 10646:
http://www.ietf.org/rfc/rfc2781.txt

Simplest format supported by a wide range of applications including text editors, word processors, web browsers etc. Specifications fully available.
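
When documenting plain text for transfer, it can help to record which of the listed encodings a file appears to use. The following is a rough sketch with a hypothetical function name. It checks byte order marks and then strict decoding; it cannot recognise EBCDIC, and a file that decodes cleanly is not guaranteed to have been written in that encoding.

```python
def classify_text_encoding(data: bytes) -> str:
    """Best-effort guess at the encoding of plain-text bytes."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 (with BOM)"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "UTF-16 (BOM present)"
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Could be EBCDIC, a legacy code page, or not text at all.
        return "unknown"
```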

Comma Separated Value (CSV)

n/a

tools.ietf.org rfc4180

Very widely used and accepted plain text format, but one that has many variations and no formal definitive specification.
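
Because CSV has so many variations, one option is to rewrite a file into the conventions described in RFC 4180 (comma delimiter, double-quote quoting, CRLF line endings) before transfer. The sketch below uses Python's standard `csv` module; the function name and the semicolon default for the input delimiter are illustrative assumptions.

```python
import csv
import io


def to_rfc4180(text: str, delimiter: str = ";") -> str:
    """Rewrite delimited text using comma, double-quote and CRLF conventions."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(
        out,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\r\n",
    )
    writer.writerows(reader)
    return out.getvalue()
```

Character encoding and header conventions are outside this sketch and would still need to be described in the transfer documentation.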
Open Document Format Spreadsheet (ODS)

1.1, 1.2

oasis-open.org OpenDocument for Office Applications

Identical specifications are published at:

ISO/IEC 26300-1:2015, Information technology –
Open Document Format for Office Applications (OpenDocument)
v1.2 Part 1: OpenDocument Schema:
www.iso.org ISO/IEC 26300-1:2015

ISO/IEC 26300-2:2015, Information technology –
Open Document Format for Office Applications (OpenDocument)
v1.2 Part 2: Recalculated Formula (OpenFormula) Format:
iso.org ISO/IEC 26300-2:2015

ISO/IEC 26300-3:2015, Information technology –
Open Document Format for Office Applications (OpenDocument)
v1.2 Part 3: Packages: iso.org ISO/IEC 26300-3:2015

Part of the Open Document Format family, ODS is an XML-based format for editable spreadsheet documents.

Specifications are fully available. NAA normalises at-risk spreadsheet formats to this format.
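
OpenDocument files are ZIP packages that carry their media type in an entry named `mimetype`. As a minimal sketch (the function name is hypothetical), a file claiming to be ODS can be sanity-checked by reading that entry; this does not validate the XML inside the package.

```python
import zipfile
from typing import Optional

ODS_MIMETYPE = "application/vnd.oasis.opendocument.spreadsheet"


def odf_mimetype(source) -> Optional[str]:
    """Return the media type declared inside an OpenDocument package, if any."""
    try:
        with zipfile.ZipFile(source) as package:
            if "mimetype" not in package.namelist():
                return None
            return package.read("mimetype").decode("ascii").strip()
    except zipfile.BadZipFile:
        # Not a ZIP container, so not an OpenDocument package.
        return None
```

A spreadsheet would be expected to return the value held in `ODS_MIMETYPE`.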

Extensible Markup Language (XML)

1.1

w3.org Extensible Markup Language (XML)

Extensively used data format that underpins many internet standards. Suitable both for long term storage and access. Specifications are fully available.
Microsoft Excel Office Open XML (XLSX)

Version 2007, OOXML Workbook Excel 2007-2010 XLSX

ecma-international.org Standard ECMA-376

Very widely used and accepted international standard format developed by Microsoft.
Microsoft and others have published all specifications.

Microsoft Excel Binary Document Format (XLS)

Version 8

msdn.microsoft.com Open Specifications Dev Center

Older but still widely used format with many suitable readers. Is no longer directly supported by MS but plenty of help is available from others.

Specifications are fully available.

SIARD 2 (.siard)

2.0

github.com SIARD 2

Open format for archiving relational databases.

MS Access

2000, 2002

github.com HACKING

Widely used but no longer directly supported by MS.

No specification published by MS but reverse engineered versions are available. May be suitable for long term retention, but alternatives may be sought for access, for example export as CSV or PDF reports.

JavaScript Object Notation (JSON)

ecma-international.org Standard ECMA-404

Widely used and open text syntax for data interchange between all programming languages.

Specifications are fully available.

Format to be assessed

Format | Version/Codec and Specification | Comment
dBASE Table File Format (DBF)

Data File Header Structure for the dBASE Version 7 Table Files:
dbase.com Data File Header Structure

Originally popular due to a simple structure and support for data types appropriate for business use on PCs.

The specification of the current format is fully available, but only unofficial specifications exist for earlier versions.

Digital audio

Formats preferred by NAA

Format | Version/Codec and Specification | Comment
Broadcast Wave (BWF)

0, 1, 2
Linear Pulse Code Modulated Audio (LPCM)

Version 0: tactilemedia.com MCI Control Information

Version 1: web.archive.org EBU Broadcast Wave Format

Version 2: tech.ebu.ch Specification of the Broadcast Wave Format

BWF is simple in structure. It is widely used and a wide range of players are available. NAA prefers this format for internal digitisation projects.

Specifications are fully available.
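
Structurally, a Broadcast Wave file is a RIFF/WAVE file that also carries a broadcast extension chunk with the identifier `bext`. The following sketch, with hypothetical function names, lists the top-level chunks of a file held in memory and checks for that chunk. It assumes a standard RIFF header and does not handle the RF64 variant used for very large files.

```python
import struct
from typing import List


def riff_chunk_ids(data: bytes) -> List[str]:
    """Return the identifiers of the top-level chunks in a RIFF/WAVE file."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    ids = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        ids.append(chunk_id.decode("ascii", "replace"))
        # Chunk data is padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    return ids


def looks_like_bwf(data: bytes) -> bool:
    """True if the WAVE file carries a broadcast extension chunk."""
    return "bext" in riff_chunk_ids(data)
```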

Formats acceptable to NAA

Format | Version/Codec and Specification | Comment
Free Lossless Audio Codec (FLAC)

1.2.1

xiph.org FLAC format

FLAC is a widely used, non-proprietary format, but has limited playback support so is not preferred by AV Preservation at the NAA.

Specifications are fully available.

Audio Interchange File Format (AIFF)

1.3
Linear Pulse Code Modulated Audio (LPCM)

mmsp.ece.mcgill.ca Audio File Format

Uncompressed format accepted by several overseas archives but somewhat limited in use.

Specifications are fully available.

Moving Picture Experts Group Layer 3
MPEG-1 Layer 3
MPEG-2 Layer 3

MP3enc, Lame
MPEG-1 Audio Layer 3
MPEG-2 Audio Layer 3

MPEG-1: ISO/IEC 11172-3:1993 Information technology –
Coding of moving pictures and associated audio for
digital storage media up to about 1,5 Mbit/s -
Part 3: Audio iso.org ISO/IEC 11172-3:1993

MPEG-2: ISO/IEC 13818-3:1998 Information technology –
Generic coding of moving pictures and associated audio information -
Part 3: Audio iso.org ISO/IEC 13818-3:1998

MP3 is a widely used format for transmitting audio streams and a wide range of players is available.

MP3 is fully specified.

Moving Picture Experts Group
MPEG-4
Advanced Audio Coding (AAC)

AAC

ISO/IEC 14496-3:2009 Information Technology –
Coding of audio-visual objects -

Part 3: Audio: iso.org ISO/IEC 14496-3:2009

MPEG-4 is a very common, highly compressible format widely used with a variety of players.

Specifications are fully available.

WAVeform Audio (WAV)

Linear Pulse Code Modulated Audio (LPCM)

mmsp.ece.mcgill.ca Multimedia Programming Interface

Typically uncompressed audio used generally on PCs. Developed by MS and IBM and full specifications are available.

Digital scanned motion picture film

Formats preferred by NAA

Format | Version/Codec and Specification | Comment
Digital Moving Picture Exchange Bitmap (DPX)

1 and 2
Uncompressed

http://standards.smpte.org/

Very widely used and accepted format. DPX provides great flexibility in storing colour information, colour spaces and colour planes.

Specifications are fully available.

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Digital Cinema Distribution Master (DCDM)

n/a

smpte.org Digital Library

DCDM is a post-production format used to encode a Digital Cinema Package (DCP) for theatrical release and is therefore limited in use. Specifications are fully available.

Digital video

Formats preferred by NAA

Format | Version/Codec and Specification | Comment
Motion JPEG 2000 (MJP2 or MJ2)

sis.se ISO/IEC 15444-3

Widely used and well-documented format. NAA prefers this format for internal digitisation projects, and it is the preferred format for digitisation projects undertaken under GRA 31.

Specifications are fully available.

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Audio Video Interleaved Format (AVI)

Uncompressed 4:2:2

Open DML AVI File Format Extensions

Widely used multimedia container format. Can have problems with aspect ratio and compression on replay. Specification fully available.
Material Exchange Format (MXF)

JPEG 2000 losslessly compressed

smpte.org Digital Library

ISO/IEC 15444-1:2004 Information technology - JPEG 2000 image coding system: Core coding system: iso.org ISO/IEC 15444-1:2004

A container format that is widely used by professional users but less so by desktop applications. Specifications are freely available.
Quicktime (MOV)

Uncompressed 4:2:2

developer.apple.com QuickTime File Specification

A container format that is no longer supported by Apple, though it is used on many devices. Specifications fully available.
MPEG-2 Video (MPEG2)

H.262

ISO/IEC 13818-2:2000 Information technology - Generic coding of moving pictures and associated audio information: Video
iso.org ISO/IEC 13818-2:2000

Widely used as DVD and digital television standard. Specification fully available though some questions remain re patent and royalty payments.
MPEG-4

H.264

Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding:
iso.org ISO/IEC 14496-14:2003

H.264 (MPEG-4 Part 10, Advanced Video Coding) is a commonly used video compression format for the recording, compression, and distribution of video content. Generally accepted preservation format for video.
Windows Media Video 9 File Format (WMV)

9 and VC-1

WMV (Windows Media Video) File Format

msdn.microsoft.com Windows Media Video 9 Encoder

Somewhat specialised, WMV version 9 was adopted for physical-delivery formats such as HD DVD and Blu-ray Discs. Format can have major problems on playback and is not supported well by Microsoft. An acceptable format for video, but not broadcast content.

Specifications are fully available.

Email (Aggregates)

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Microsoft Personal Folders Format (PST)

Outlook Personal Folders (.pst) File Format

Stores multiple emails, folders and calendar items using Microsoft Outlook. Widely used though now superseded, PST files are used to store items and to maintain off-line availability of them. PST is no longer supported directly by MS and is not recommended for access.

Specifications are fully available.

MBOX Email Format

MBOX Email Format:
tools.ietf.org rfc4155

And MIME:
tools.ietf.org rfc2045
tools.ietf.org rfc2046
tools.ietf.org rfc2047
tools.ietf.org rfc4288
tools.ietf.org rfc4289
tools.ietf.org rfc2049

A family of related file formats used for storing collections of electronic mail messages. A single file with the extension .mbox or .mbx contains the contents of an entire folder, with MIME content stored directly in the file. Reasonably well documented and in effect a de facto standard. Attachments are stored in the MIME format, meaning migration will need to occur to ensure accessibility.
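
Since an MBOX file is a single file holding a whole folder of messages, a simple inventory of its contents can be useful for transfer documentation. The sketch below uses Python's standard `mailbox` module; the function name is hypothetical, and because MBOX has several variants the result should be treated as indicative only.

```python
import mailbox
from typing import List


def list_subjects(path: str) -> List[str]:
    """Return the Subject header of every message in an mbox file."""
    box = mailbox.mbox(path, create=False)  # do not create the file if missing
    try:
        return [message.get("Subject", "") for message in box]
    finally:
        box.close()
```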

Email (Individual messages)

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Internet Message Format (EML)

ietf.org rfc2822

Widely used format. A number of libraries, archives and digital preservation systems have identified EML as a preferred preservation format.

Specifications are fully available.

Microsoft Outlook Item Message Format (MSG)

[MS-OXMSG]: Outlook Item (.msg) File Format

Very widely used but earlier iterations are not directly supported by MS. There can be problems rendering messages without MS software, so it is not suitable for access purposes.

Specifications are fully available.

Encapsulation

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
ZIP

6.3.3

pkware.cachefly.net .ZIP File Format Specification

ISO/IEC CD 21320-1 - Information technology – Document Container File –
Part 1: Core: iso.org ISO/IEC 21320-1:2015

Very widely used format designed for cross-platform data exchange and data storage for a set of related files.

For archival purposes the ZIP container is generally discarded after the files have been extracted.

Format is proprietary but the specifications are fully published.

TAR

github.com FreeBSD File

Format supported by most modern file archiving systems.

Has widespread use but many of its features are considered dated.

Specifications are fully available.

BagIt

digitalpreservation.gov BagIt File

Widely used, BagIt is a file packaging format designed to support storage and transfer of digital content.

A 'bag' has just enough structure for descriptive 'tags' and a 'payload' but does not require knowledge of the payload's internal semantics.

Specifications are fully available.
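
The 'bag' structure described above can be illustrated with a short sketch. It assumes the layout of BagIt version 1.0 (a `bagit.txt` declaration, a `data/` payload directory and a `manifest-sha256.txt` file of checksums), handles only flat payload file names, and uses hypothetical function names; it is not a complete implementation of the specification.

```python
import hashlib
from pathlib import Path
from typing import Dict


def make_bag(bag_dir: Path, payload: Dict[str, bytes]) -> None:
    """Write a minimal bag: declaration, payload files and a SHA-256 manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest} data/{name}\n")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(manifest_lines), encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Recompute payload checksums and compare them with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True
```

The manifest is what lets the receiver confirm that the payload arrived unchanged without needing to understand the payload itself.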

Geospatial

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Geography Markup Language (GML)

2.0 - 3.2

ISO 19136:2007 - Version 3.2, OpenGIS Geography Markup Language (GML) Encoding Standard 07-036:
iso.org ISO 19136:2007

Commonly used with preferred status in some overseas Libraries and Archives. It is an ISO standard and specifications are fully available.

Geospatial Tagged Image File Format (GeoTIFF)

1.8.2

geotiff.maptools.org Specification Contents

GeoTIFF allows georeferencing information to be embedded and opened within a TIFF file.

Specifications are fully available with some portions available under permissive X licence.

Format to be assessed

Format | Version/Codec and Specification | Comment
Band Interleaved by Line (BIL)

webhelp.esri.com ArcGIS Desktop 9.3 Help

Versatile binary raster format.

Specifications are fully available.

Band Interleaved by Pixel (BIP)

webhelp.esri.com ArcGIS Desktop 9.3 Help

Versatile binary raster format accepted by several overseas institutions.

Specifications are fully available.

Band Interleaved by Sequence (BIS/BSQ)

webhelp.esri.com ArcGIS Desktop 9.3 Help

Versatile binary raster format accepted by several overseas institutions.

Specifications are fully available.

Environmental Systems Research Institute (ESRI) Shapefile (SHP)

1997 - Current version

ESRI Shapefile Technical Description

A popular and simple geospatial vector data format for geographic information system (GIS) software.

'Most' specifications for this format are available.

Keyhole Markup Language (KML)

2.2

Open Geospatial Consortium Inc. OGC KML 07-147r2:

KML expresses and displays geographic features within Internet-based 2D maps and 3D Earth browsers, and is an international standard. KMZ is the zipped version.

Specifications are fully available.

Environmental Systems Research Institute (ESRI) Export Format (E00)

Reverse engineered specification, Arc/Info Export (E00) Format Analysis

Increasingly seldom used format, superseded by others with superior interoperability.

Format is proprietary to ESRI and specifications only partially released.

Geospatial PDF

ISO 32000-1:2008, Document management – Portable document format – Part 1: PDF 1.7

Popular, well supported and simple system of saving and reading geospatial information based on PDF 1.7, which has been accepted as an ISO standard.

Specifications are fully available.

TerraGo GeoPDF

Open Geospatial Consortium Inc. OGC 08-139r2:
OGC Implementation Specification

GeoPDF uses geospatial PDF as a container for maps, imagery and other data, and is readable on a variety of platforms, especially Adobe. It is used extensively, especially in the US.

Specifications are fully available.

Vector Product Standard (Format) (VPF)

Interface Standard for Vector Product Format

Developed by US military for geospatial data based on a georelational data model. It is widely used in the US and acceptable to NARA but other use has not been investigated.

VPF and some table names have been trademarked but the specification is fully available.

ESRI ARC/INFO Interchange File Format

Arc/Info Export (E00) Format Analysis

Versatile binary raster format accepted by several overseas institutions. It is not considered acceptable for long term storage, though it can be used for access.

Specifications are fully available.

Images (Raster)

Formats preferred by NAA

Format | Version/Codec and Specification | Comment
Portable Network Graphics (PNG)

1.2

ISO/IEC 15948:2004 Information technology - Computer graphics and image processing - Portable Network Graphics (PNG):
iso.org ISO/IEC 15948:2004

Widely used as a long-term preservation format in the heritage sector.

PNG has a fully open specification. The Archives normalises at risk formats to this format.

Tagged Image File Format (TIFF)

4, 5, 6

loc.gov TIFF, Revision 6.0

TIFF is widely used and there are numerous implementations of the format. Though proprietary to Adobe, the specifications are freely available. The Archives prefers this format for digitisation projects. TIFF is the Archives' preferred transfer format for scanned text and digital photographs.
Portable Document Format/Archival (PDF/A-1)

PDF/A-1

adobe.com PDF Reference

PDF/A is a form of Adobe PDF version 1.4 intended to be suitable for long-term preservation of page-oriented documents and is widely used. Full specifications have been provided by Adobe.

Portable Document Format/Archival (PDF/A-2)

PDF/A-2

adobe.com PDF Reference

PDF/A-2 is a form of PDF version 1.7 intended to be suitable for long-term preservation of page-oriented documents and is widely used. Full specifications have been provided by Adobe.

JPEG 2000 (JP2), lossless

Part 1 (JP2)

ISO/IEC 15444-1:2004 Information technology - JPEG 2000 image coding system: Core coding system:
iso.org ISO/IEC 15444-1:2004

Limited use as it requires large computational power and is not generally supported in web browsers. While portions of the specification are fully available, other parts may be covered by active patents.

JPEG File Interchange Format (JPEG/JFIF) with JPEG compression

1.02

ISO/IEC 10918-5:2013 Information technology - Digital compression and coding of continuous-tone still images: JPEG File Interchange Format: iso.org ISO/IEC 10918-5:2013

ISO/IEC 10918-1:1994 Information technology - Digital compression and coding of continuous-tone still images: Requirements and guidelines: iso.org ISO/IEC 10918-1:1994

Most common container for exchanging JPEG encoded files. Uses JPEG compression techniques. Very widely used format and ISO standard.

Specifications are fully available.

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Exchangeable Image File Format (Exif)

2.0 - 2.3

cipa.jp CIPA DC-008-Translation-2012

Widely implemented and fully documented format. Uses existing file formats such as JPEG for compressed and TIFF for uncompressed image files. Specifications are fully available.

Graphics Interchange Format (GIF)

87a and 89a

Graphics Interchange Format version 89a: w3.org GIF89a Specification

Very widely used, simple, losslessly compressed image format. GIF was subject to patent concerns in the 1990s and early 2000s due to patents covering LZW compression; however, the patents expired in 2003/2004. Specifications are fully available.
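
Several of the raster formats listed above can be recognised from the first few bytes of a file, which is useful when a file extension is missing or wrong. The sketch below uses a small, deliberately incomplete signature table and a hypothetical function name. TIFF-based formats such as Exif and DNG share the TIFF signature, so a match is only a first indication, not a validation.

```python
from typing import Optional

# (leading bytes, format name); a small illustrative subset only.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def sniff_raster_format(header: bytes) -> Optional[str]:
    """Guess a raster format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None
```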

At-risk formats (formats that may be normalised when received by NAA)

Format | Version/Codec and Specification | Comment
Encapsulated Postscript (EPS, EPSF)

www-cdf.fnal.gov PostScript File.pdf

Older format not widely used due to limited players and portability. Has been superseded by later technologies.

EPS is a proprietary but publicly documented format.

Photoshop (PSD, PSB)

adobe.com Photoshop File Formats Specification

Widely used but not preferred by most archives and libraries for long term storage.

Specifications are fully published but some portions may be covered by active patents.

Bitmap (BMP)

dragonwins.com BMP File

Simple and widely used format over a variety of platforms. No formal specification has been issued but Microsoft has published detailed descriptions of the different structures in use.

Format to be assessed

Format | Version/Codec and Specification | Comment
Digital Negative (DNG)

www.loc.gov Digital Formats

An increasingly popular Camera Raw format using TIFF 6. Created by Adobe which has published all specifications.

Images (Vector)

Formats preferred by NAA

Format | Version/Codec and Specification | Comment
Scalable Vector Graphics (SVG)

w3.org SCALABLE VECTOR GRAPHICS (SVG)

Commonly used format using the description of an image as an application of XML.

Specifications are fully available. The Archives normalises at-risk formats to this format.
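
Because SVG is an application of XML, a basic check is to parse the file and confirm that the root element is `svg` in the SVG namespace. The sketch below uses Python's standard XML parser with a hypothetical function name. It is not hardened against maliciously constructed XML and does not validate the drawing itself.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def is_svg(data: bytes) -> bool:
    """True if the data is well-formed XML whose root element is an SVG root."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # ElementTree writes namespaced tags as "{namespace}localname".
    return root.tag == f"{{{SVG_NS}}}svg"
```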

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Open Document Graphics (ODG)

oasis-open.org OASIS Standards

OpenDocument is a widely used and supported XML-based file format for various office applications.

Specifications are fully available.

At-risk formats (formats that may be normalised when received by NAA)

Format | Version/Codec and Specification | Comment
Adobe Illustrator (AI)

No specification available

Widely used but not for preservation purposes.

Proprietary and no specifications have been published.

Encapsulated Postscript (EPS, EPSF)

www-cdf.fnal.gov Encapsulated PostScript

Older format not widely used due to limited players and portability. Has been superseded by later technologies.

EPS is a proprietary but publicly documented format.

Presentations

Formats preferred by NAA

Format | Version/Codec and Specification | Comment
OpenDocument Presentation Format (ODP)

1.0, 1.1, 1.2

oasis-open.org OASIS Open Document Format for Office Applications

Identical specifications are published at:

ISO/IEC 26300-1:2015, Information technology –

Open Document Format for Office Applications (OpenDocument) v1.2

Part 1: OpenDocument Schema: iso.org ISO/IEC 26300-1:2015

ISO/IEC 26300-2:2015, Information technology –

Open Document Format for Office Applications (OpenDocument) v1.2

Part 2: Recalculated Formula (OpenFormula) Format: iso.org ISO/IEC 26300-2:2015

ISO/IEC 26300-3:2015, Information technology –

Open Document Format for Office Applications (OpenDocument) v1.2

Part 3: Packages: iso.org ISO/IEC 26300-3:2015

Part of the Open Document Format family, a non-proprietary format for editable documents that are presentations based on sequences of 'slides'. The Archives normalises at-risk formats to this format.

Portable Document Format Archival (PDF/A-1)

PDF/A-1

adobe.com PDF Reference

PDF/A is a form of Adobe PDF version 1.4 intended to be suitable for long-term preservation of page-oriented documents and is widely used.

Full specifications have been provided by Adobe.

Portable Document Format Archival (PDF/A-2)

PDF/A-2

adobe.com PDF Reference

PDF/A-2 is a form of PDF version 1.7 intended to be suitable for long-term preservation of page-oriented documents and is widely used.

Full specifications have been provided by Adobe.

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Microsoft PowerPoint 1997-2007 Binary Format (PPT)

8.0

msdn.microsoft.com Open Specifications Dev Center

One of the dominant presentation formats but has been superseded and is no longer supported directly by Microsoft.

Microsoft has released the full specification.

Microsoft PowerPoint Office XML Format (PPTX)

2007+

ecma-international.org Standard ECMA-376

Another dominant presentation format that is no longer directly supported by MS.

Specifications are fully available.

Project Management

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Microsoft Project (MPP) 2000-2009

No specification available

Presently the dominant desktop project management application, though earlier versions are progressively losing direct support from MS.

MS Project is proprietary and MS has not published the specification to date.

Raw images

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Digital Negative (DNG)

1.4.0.0

Digital Negative (DNG) Specification

A popular Camera Raw format using TIFF 6. Created by Adobe which has published all specifications.

Text

Formats preferred by NAA

Format | Version/Codec and Specification | Comment
OpenDocument Text Format (ODT)

1.1, 1.2

oasis-open.org OASIS OpenDocument

Identical specifications are published at:

ISO/IEC 26300-1:2015, Information technology – Open Document Format for Office Applications (OpenDocument) v1.2 Part 1: OpenDocument Schema: iso.org ISO/IEC 26300-1:2015

ISO/IEC 26300-2:2015, Information technology – Open Document Format for Office Applications (OpenDocument) v1.2 Part 2: Recalculated Formula (OpenFormula) Format: iso.org ISO/IEC 26300-2:2015

ISO/IEC 26300-3:2015, Information technology – Open Document Format for Office Applications (OpenDocument) v1.2 Part 3: Packages: iso.org ISO/IEC 26300-3:2015

Part of the Open Document Format family, Open Document Text Format is a non-proprietary format for editable textual documents. Specifications are fully available. The Archives normalises at-risk formats to this format.

Plain Text (TXT)

ASCII, EBCDIC, Unicode-based encodings (UTF-8, UTF-16 etc)

American Standard Code for Information Interchange Text (ASCII Text): iso.org ISO/IEC 646:1991

Unicode Text (UTF-8): RFC 3629: UTF-8 A Transformation Format of ISO 10646: ietf.org rfc3629

Unicode Text (UTF-16): RFC 2781: UTF-16: An Encoding of ISO 10646: ietf.org rfc2781

Simplest format supported by a wide range of applications including text editors, word processors, web browsers etc. Specifications fully available.

Portable Document Format/Archival (PDF/A-1)

PDF/A-1

ISO 19005-1:2005 Document management - Electronic document file format for long-term preservation -
Part 1: Use of PDF 1.4 (PDF/A-1): iso.org ISO 19005-1:2005

Widely used 'archival' PDF format that prohibits the use of certain features of PDF that may make it difficult to render in the future. Well supported worldwide with many implementations, including official and third party. Complete specifications available. Based on PDF version 1.4.

Portable Document Format/Archival (PDF/A-2)

PDF/A-2

ISO 19005-2:2011 Document management - Electronic document file format for long-term preservation -
Part 2: Use of ISO 32000-1 (PDF/A-2): iso.org ISO 19005-2:2011

Widely used 'archival' PDF format that prohibits the use of certain features of PDF that may make it difficult to render in the future. Well supported worldwide with many implementations, including official and third party. Complete specifications available. Based on PDF version 1.7.
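
A PDF/A file declares its part (1 or 2 in the entries above) in its embedded XMP metadata. As a rough first check, the declaration can be searched for directly. The sketch below assumes the conventional `pdfaid` namespace prefix and an uncompressed metadata packet, and uses a hypothetical function name. It only reports what the file claims; confirming conformance requires a dedicated PDF/A validator.

```python
import re
from typing import Optional

# Matches both the attribute form  pdfaid:part="1"
# and the element form             <pdfaid:part>1</pdfaid:part>
_PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")


def declared_pdfa_part(data: bytes) -> Optional[int]:
    """Return the PDF/A part number declared in the XMP metadata, if found."""
    match = _PDFA_PART.search(data)
    return int(match.group(1)) if match else None
```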

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Microsoft Word 97 Binary Document Format (DOC)

1997 - 2003

msdn.microsoft.com Open Specifications Dev Center

Old but widely used format. Has support but not directly from Microsoft. Microsoft has published comprehensive documentation relating to MS Word 97-2003.

Microsoft Word Office Open XML (DOCX)

2007 - 2010

ecma-international.org Standard ECMA 376

Widely used and accepted international standard format developed by Microsoft. Microsoft and others have published all specifications. Preferred to DOC.

Portable Document Format (PDF)

PDF 1.1 - 1.7

ISO 32000-1:2008 Document management - Portable document format -
Part 1: PDF 1.7: iso.org ISO 32000-1:2008

Widely used and accepted international standard format, originally developed by Adobe. Specifications are fully available.

EPUB, Electronic Publication, version 3

3

EPUB 3.0

As yet, not a widely used format, however it is well-documented and satisfies the requirements for a long-term preservation format.

At-risk formats (formats that may be normalised when received by NAA)

Format | Version/Codec and Specification | Comment
Microsoft Word 95 and earlier versions of MS Word

nationalarchives.gov.uk Microsoft Word Document 6.0/95

Widely used but not well supported due to age and lack of definitive specification. Not considered suitable for long term storage or access.

Rich Text Format (RTF)

fileformats.archiveteam.org RTF

Format developed by Microsoft to encode formatted text and graphics. It is closely associated with Microsoft Word. Has been overtaken by other more flexible formats.

Microsoft Publisher 2002

Not available

This format is part of the Office suite and is therefore widely used but is no longer directly supported by Microsoft.

Specifications are not available.

Web formats

Formats preferred by NAA

Format | Version/Codec and Specification | Comment
Web ARChive (WARC)

0.18

WARC ISO 28500

Widely used international standard format used by organisations here and overseas.

Specifications are fully available.
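
A WARC file is a sequence of records, each beginning with a version line such as `WARC/1.0` followed by named header fields and a blank line. The sketch below reads the header of the first record from an uncompressed file; the function name is hypothetical, and compressed `.warc.gz` files would need to be opened with a gzip reader first.

```python
from typing import BinaryIO, Dict, Tuple


def read_warc_header(stream: BinaryIO) -> Tuple[str, Dict[str, str]]:
    """Read the version line and named fields of the next WARC record."""
    version = stream.readline().decode("utf-8").strip()
    if not version.startswith("WARC/"):
        raise ValueError("stream does not start with a WARC record")
    fields: Dict[str, str] = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # a blank line ends the header block
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.decode("utf-8").partition(":")
        fields[name.strip()] = value.strip()
    return version, fields
```

Fields such as `WARC-Type` and `Content-Length` can then be recorded in transfer documentation or used to skip to the next record.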

Extensible Hypertext Markup Language (XHTML)

w3.org XHTML 1.0

Created to be a more extensible and interoperable form of HTML.

Specifications are fully available, but the format has several versions.

Acceptable formats to the NAA

Format | Version/Codec and Specification | Comment
Internet Archive ARC file format (ARC-1A)

1.0

archive.org Arc File Format

A popular early lossless data compression format now rarely used except for archiving purposes here and overseas.

Specifications are fully available.

Hypertext Markup Language (HTML)

CSS 2.1: Cascading Style Sheets Specification

Simple and widely used format and technical support is readily available. HTML specifications are available but format is constantly changing. All versions acceptable, with a preference for later versions.

Format to be assessed

Format | Version/Codec and Specification | Comment
Active Server Pages (ASP, ASPX)

No specification available

Used extensively with Windows and a number of other players because of the simplicity of using HTML pages, but does not appear to be preferred by any library or archive. Proprietary software.

Copyright National Archives of Australia 2019