Your agency will have business systems that create, keep and manage digital information and records, including data and datasets. Data and datasets retained in business systems, like other information created and received in connection with Australian Government business, are Commonwealth records and must be managed in accordance with the Archives Act 1983.
This advice provides guidance on how to manage data and datasets.
What is the difference between data and a dataset?
There are various useful definitions of data, datasets and databases. The following is adapted from the USGS definition:
- Data may be numeric, spatial, statistical, structured or unstructured information (unprocessed or processed) represented in a formalised manner, such as by text, numbers or multimedia. A logical, identifiable unit of data that forms the basic organisational component in a database is known as a data element.
- A dataset is a structured collection of data generally associated with a unique body of work, a particular subject, or created for a specific purpose.
- A database is an organised collection of data, stored as one or more datasets, that is generally stored and accessed electronically from a computer system that allows the data to be easily accessed, manipulated and updated. A business system may comprise one or more databases.
Some examples of datasets include:
- research datasets, such as data collected in longitudinal studies or surveys
- case management system datasets
- scientific datasets, such as climate data collected by the Bureau of Meteorology
- e-commerce system datasets, supporting e-business and online transactions
- geospatial datasets
- content management system datasets, such as website content
- register datasets, such as the Pharmaceutical Benefits Scheme
Identifying data and datasets as digital records created by business systems
Business systems create and manage digital records as data and datasets. Identifying digital records created by business systems is the essential first step in determining and applying appropriate information management strategies for managing those records. Successful identification and management of these digital records will depend on factors such as the nature and purpose of the business system, the type of data created, the system’s information management functionality, and the ability to export representations of system content.
Part 3 of ISO 16175 provides detailed advice on identifying records within business systems and integrating information management functionality. Business systems that manage related data elements as discrete digital records (as per scenarios 1 and 2 below) should be compliant with the functional requirements set out in ISO 16175.
Australian Government agencies should evaluate the information management functionality of their business systems against the Archives' Business Systems Assessment Framework (BSAF). The results of this evaluation will help you to select the most appropriate option for managing the data and datasets as digital records.
The BSAF covers four scenarios that describe how data and datasets can be identified and managed within business systems:
1. Business systems with built-in information management functionality - where the business system has sufficient built-in information management functionality, it is possible to identify and manage related data elements as discrete digital records within the system.
2. Integration with other systems - where related data elements are identified as discrete digital records within a business system and then exported to one or more integrated external systems that are capable of providing adequate information management functionality for the records (usually a specialised records and information management system). Integration should enable data to flow seamlessly between systems.
3. External (export) - where business systems are incapable of providing information management functionality through built-in or integrated functionality (i.e. neither scenario 1 nor scenario 2 applies), it may be necessary to periodically export data for management in a separate external system. This may involve exporting related data elements identified as individual digital records within the system (e.g. each case file in a case management system is exported as a separate digital record); or may involve exporting an entire dataset from the business system which can then be managed as a single digital record (e.g. exporting the dataset as a .csv or .xlsx file). Exporting entire datasets in this way may be more appropriate for smaller, less complex databases.
4. External (governance) - where the system does not have the required built-in information management functionality and is incapable of integration or exporting data (i.e. none of scenarios 1, 2 or 3 applies), it may be necessary to apply governance arrangements (e.g. policy controls) to the business system to manage the software and data in situ for as long as required. This approach is not preferred and is usually only suitable for business systems that manage temporary value information.
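The way the four scenarios fall back on one another can be sketched in a few lines of Python. This is only an illustration, not part of the BSAF: the `SystemCapabilities` fields and the `select_scenario` function are hypothetical names, and treating scenario 1 as preferable to scenario 2 is an assumption (the text above only states that scenarios 3 and 4 apply when the earlier ones are not possible).

```python
from dataclasses import dataclass


@dataclass
class SystemCapabilities:
    """Hypothetical summary of an assessment outcome for one business system."""
    builtin_im_functionality: bool  # records can be managed adequately in the system
    can_integrate: bool             # records can flow to an integrated IM system
    can_export: bool                # records or whole datasets can be exported


def select_scenario(caps: SystemCapabilities) -> str:
    """Return the first scenario, in listed order, that the system can support."""
    if caps.builtin_im_functionality:
        return "1: built-in information management functionality"
    if caps.can_integrate:
        return "2: integration with other systems"
    if caps.can_export:
        return "3: external (export)"
    # Nothing else is possible: manage the software and data in situ.
    return "4: external (governance)"


# Example: a legacy system that can only export its data.
legacy = SystemCapabilities(
    builtin_im_functionality=False, can_integrate=False, can_export=True
)
print(select_scenario(legacy))
```

A real assessment weighs risk and the value of the information as well as capability, so a lookup like this could at most record the outcome of an evaluation, not replace it.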
Managing data and datasets – key points
Data and datasets need to be actively managed in order to maintain their authenticity, reliability, integrity and usability, to ensure that they can be found, accessed and understood. Managing data and datasets is necessary for ensuring business continuity, accountability and evidence of decision-making. This involves managing not only the data itself, but also the associated metadata and documentation required to understand and use the dataset. Knowing the value and retention requirements of the data and datasets created through your agency’s business will help you make decisions about managing them.
Good data governance practices ensure that responsibility for managing and maintaining datasets is clearly assigned, and that responsibilities and operational practices are reflected in your agency’s relevant strategies and policies, such as a data strategy, information management policy and information governance framework.
The following factors should be considered when managing agency data and datasets.
When designing or purchasing business systems that create data and datasets it is necessary to identify and implement appropriate descriptive standards that satisfy your agency business needs and adequately and accurately describe your dataset’s content and format. The Australian Government Recordkeeping Metadata Standard is such a standard and includes a minimum metadata set to support the management, interoperability, accessibility and transfer of datasets.
When storing data and datasets over time, ensure that the storage methods, devices and facilities used protect the integrity, accessibility and usability of the data and datasets for as long as they are needed. If your agency is considering outsourcing this activity, such as to data centres, digital repositories or cloud services, understand the potential risks and considerations associated with outsourcing storage of your agency’s datasets. Consideration should also be given to adopting appropriate business continuity and disaster planning strategies, such as maintaining multiple copies of datasets stored off-site and in separate systems.
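Multiple copies protect a dataset only if each copy can be shown to be unchanged. One widely used way to check this is to record a checksum when the dataset is stored and compare every copy against it later. The sketch below assumes SHA-256 and uses hypothetical file names and helper functions (`sha256_of`, `verify_copies`); it illustrates the idea and is not a prescribed procedure.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(recorded_checksum: str, copies: list[Path]) -> dict[str, bool]:
    """Map each copy to whether it exists and still matches the recorded checksum."""
    return {
        str(path): path.is_file() and sha256_of(path) == recorded_checksum
        for path in copies
    }


# Demonstration with two temporary copies, one of which is altered afterwards.
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp) / "primary.csv"
    offsite = Path(tmp) / "offsite.csv"
    primary.write_bytes(b"id,value\n1,42\n")
    offsite.write_bytes(b"id,value\n1,42\n")
    recorded = sha256_of(primary)              # checksum captured when first stored
    offsite.write_bytes(b"id,value\n1,43\n")   # simulate undetected corruption
    results = verify_copies(recorded, [primary, offsite])

for path, ok in results.items():
    print(path, "OK" if ok else "MISMATCH")
```

Keeping the recorded checksum with the dataset's metadata, separate from the copies themselves, means a damaged copy can be identified and replaced from one that still verifies.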
Obsolescence of storage media, hardware and software poses a significant risk to the long-term accessibility of data and datasets, particularly when the records are required to be retained beyond the life of the creating system. Your agency’s digital preservation strategy should establish a proactive program to identify risks and stipulate appropriate digital preservation techniques to be implemented to ensure that agency data and datasets remain accessible and usable over time.
- Data migration may be an appropriate digital preservation technique for your agency. If so, plan ahead for the migration of data and datasets to new systems before existing ones become obsolete, in order to reduce the risk of information loss. Agencies should assess and mitigate the risks of data loss and ensure that migrated data and datasets remain accessible for as long as required.
- Normalisation, which involves converting data and datasets from their original format to a standardised, long-term preservation format, may also be an appropriate preservation measure for your agency’s data. If so, ensure that appropriate descriptive, technical, administrative and preservation metadata is maintained throughout the lifecycle of the record.
Ideally, digital preservation should be considered before new systems are acquired and implemented. ISO 16175 can assist with evaluating the records management capability of proposed customised or commercial off-the-shelf business system software.
Your agency may employ a combination of systems and processes to ensure that data is not tampered with, misplaced, or inappropriately accessed, altered or deleted. Access controls and authentication mechanisms may be applied to data and datasets to prevent access by unauthorised users, and may include the definition of user access groups and ad hoc lists of individual named users – where the systems that retain the data and datasets support such functionality. A risk assessment can inform business decisions about how rigorous the access controls need to be.
Some systems may support the application of security measures such as encryption and digital signatures to ensure the authenticity and reliability of data and datasets. These security measures may present risks to the ongoing usability of datasets, as decryption keys and public keys for digital signatures may expire before the dataset is eligible for disposal.
If you need to apply security measures to protect your data and datasets, carefully document the procedures and strategies used, including encryption and decryption processes that apply to your datasets. Take care to ensure that security mechanisms do not inadvertently make data and datasets inaccessible in the long term. This is particularly important for records of archival value.
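The risk that a key or certificate expires before a dataset can be disposed of is straightforward to monitor if both dates are recorded. The following sketch assumes a simple register of protected datasets; the `ProtectedDataset` fields, dataset names and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProtectedDataset:
    name: str
    key_expires: date        # expiry of the decryption key or signing certificate
    earliest_disposal: date  # date the minimum retention period ends


def at_risk(datasets: list[ProtectedDataset]) -> list[str]:
    """Names of datasets whose key expires before they are eligible for disposal."""
    return [d.name for d in datasets if d.key_expires < d.earliest_disposal]


holdings = [
    ProtectedDataset("case-files-2019", date(2027, 6, 30), date(2030, 1, 1)),
    ProtectedDataset("web-logs-2023", date(2028, 1, 1), date(2026, 1, 1)),
]
print(at_risk(holdings))  # ['case-files-2019']
```

Datasets of archival value have no disposal date at all, so under this approach any expiring key that protects them would always need to be treated as at risk.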
Data and datasets should be managed to ensure that they are disposed of in accordance with their minimum retention period in an approved records authority. Disposal may involve arranging secure destruction of the data and datasets or, where applicable, transferring custody and/or ownership of the data and datasets to another entity through machinery of government changes, or to the National Archives.
Where datasets are to be transferred you should ensure that they meet the requirements of the receiving agency or repository, and that your business system is capable of exporting datasets in a format that supports long-term preservation activities. Datasets transferred to the National Archives should be in an acceptable long-term file format. Specific advice should be sought via the Agency Service Centre in cases where datasets are required to be exported and transferred to the National Archives as periodic snapshots of business system content.
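As an illustration of a whole-dataset export, such as the .csv export mentioned earlier, the sketch below writes one database table to CSV text with a header row. It uses an in-memory SQLite database as a stand-in for a business system; the table, its columns and the `export_table_to_csv` helper are hypothetical. Whether CSV is an acceptable format for a particular transfer should be confirmed with the receiving agency or repository.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn: sqlite3.Connection, table: str) -> str:
    """Return the full contents of one table as CSV text with a header row."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the database schema before building the query.
    known = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor)                                       # data rows
    return buffer.getvalue()


# Stand-in for a business system database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_id INTEGER, outcome TEXT)")
conn.executemany("INSERT INTO cases VALUES (?, ?)", [(1, "approved"), (2, "refused")])

snapshot = export_table_to_csv(conn, "cases")
print(snapshot)
```

An export like this carries only the data. The supporting documentation needed to interpret it, such as a data dictionary, would have to accompany the file.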
Determining how long to keep data and datasets
Under Section 24 of the Archives Act 1983 disposal of Commonwealth records, including data and datasets, can take place if it is:
- approved by the National Archives (e.g. through an approved records authority);
- required by another law; or
- a normal administrative practice that is not disapproved of by the Archives.
The retention period for agency data and datasets must be set out in the agency’s records authority or an approved general records authority. Your agency must retain its data and datasets in accessible formats until the minimum retention period set out in the relevant records authority has been reached.
In determining the retention and disposal requirements for data and datasets to be included in your agency's records authority, you need to assess the business needs, accountability requirements and community expectations that are likely to apply. Other factors to consider include:
- how frequently to create, capture and manage data and datasets
- retaining supporting or associated information that is required to provide meaning and context to the datasets (e.g. data dictionaries and manuals)
- applying appropriate digital preservation strategies to ensure datasets remain accessible and usable for as long as they are required to be retained.
Certain categories of data and datasets have enduring value beyond their original business purpose, so it is important to know their value and manage them accordingly. Datasets of enduring value (such as those identified as ‘Retain as national archives’) will be identified in a records authority approved by the National Archives.
For assistance in determining appropriate retention periods for data and datasets and supporting records, contact the Agency Service Centre.
Sentencing data and datasets against a records authority
Sentencing is when you match data and datasets to a specific disposal class in your agency's records authority or an approved general records authority. The identified disposal class establishes when and how your agency's data and datasets can be disposed of. Generally, the disposal class will provide a minimum retention period for the data and datasets, or in some cases may identify them for transfer to the National Archives.
The appropriate disposal class may not always refer specifically to data and datasets and will often be determined by the manner in which data and datasets are identified and managed as records within the business system.
Where related data elements are identified and managed as discrete digital records in a business system, the appropriate disposal class will generally refer to the records according to their content and purpose – rather than mention data specifically. For example, related data elements comprising a case record in a case management system may be referred to in the appropriate records authority as case records and their retention determined based on the outcome of the case.
In contrast, where business systems manage datasets as digital records in their own right, the records authority may refer to the content of the dataset or may specifically refer to ‘datasets’. For example, the content of business systems supporting major research activities may be referred to in the appropriate records authority as ‘research datasets and associated information’.
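Once a dataset has been sentenced, the earliest date it may be considered for destruction follows from the disposal class and a trigger date. The sketch below is illustrative only: the class names, the seven-year period and the choice of 'Retain as national archives' for research datasets are invented for the example and are not taken from any real records authority.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DisposalClass:
    description: str
    retain_years: Optional[int]  # None stands for 'Retain as national archives'


def earliest_disposal(cls: DisposalClass, trigger: date) -> Optional[date]:
    """Earliest date destruction may be considered, or None if never destroyed."""
    if cls.retain_years is None:
        return None
    target_year = trigger.year + cls.retain_years
    try:
        return trigger.replace(year=target_year)
    except ValueError:
        # The trigger was 29 February and the target year is not a leap year.
        return trigger.replace(year=target_year, day=28)


# Hypothetical disposal classes for illustration.
AUTHORITY = {
    "case records": DisposalClass("Case records", 7),
    "research datasets": DisposalClass(
        "Research datasets and associated information", None
    ),
}

case_closed = date(2020, 2, 29)
print(earliest_disposal(AUTHORITY["case records"], case_closed))       # 2027-02-28
print(earliest_disposal(AUTHORITY["research datasets"], case_closed))  # None
```

The result is a minimum, not a deadline: a dataset can be kept longer where there is a continuing business need.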
Datasets created in the course of an agency conducting its core business may often have reuse value, not only within the originating agency but as a resource for use by other agencies or within the broader community.
A key benefit of well managed data is the increased potential for interoperability and data sharing that results in social and economic benefits.
You can use the Data Interoperability Maturity Model to build your sharing capability and help you meet your retention and disposal requirements.
The best practice guide to applying data sharing principles is available to assist agencies holding data (data custodians) to safely and effectively share Australian Government data and datasets. Formal data sharing arrangements that detail the conditions under which data is shared and used need to cover responsibilities and obligations for the authorised disposal of shared data.
Datasets submitted to data.gov.au
In accordance with the Australian Government’s Declaration of Open Government, agencies are encouraged to publish a variety of public datasets via the data.gov.au website. The site encourages public access to and reuse of public data.
Agencies must submit only copies of datasets to data.gov.au for publication. They remain responsible for the appropriate management and disposal of the original datasets, which must be preserved and ultimately disposed of in accordance with an approved records authority.
- The Business Systems Assessment Framework provides a streamlined, risk-based approach to the assessment of information management functionality in business systems.
- Part 3 of ISO 16175:2011 Principles and Functional Requirements for Records in Electronic Office Environments provides internationally agreed principles and functional requirements for software used to create and manage digital information in office environments.
- The Office of the National Data Commissioner has released The Foundational Four: Starting an ongoing data improvement journey, which provides a basis for building data culture and capability within an agency.