Metadata for interoperability
Metadata is structured information that defines and describes data. It plays an important role in ensuring users and systems understand the meaning of exchanged information and data.
Metadata exists in many sources inside and outside your agency, eg documents, spreadsheets, databases, data models, legacy systems, social media and big data platforms.
You participate in creating metadata by using your agency's applications and tools, eg:
- electronic document and records management system (EDRMS)
- content management system (CMS)
- customer relationship management system (CRM)
- extract, transform and load (ETL) tools
Metadata standards ensure metadata is consistent, useful and understood over time. Establishing a minimum metadata set in your agency will also help you understand your user and business needs. This will help you prioritise metadata for remediation and ensure that compliance with metadata standards is not an afterthought when developing new systems and processes.
Assessing metadata to see if it meets the standards for a specific process is a common task in building interoperability and may be included in your data quality assessment.
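As a minimal sketch of such an assessment, the Python below checks one metadata record against a minimum metadata set. The element names in MINIMUM_SET, the function name and the sample record are illustrative assumptions, not a prescribed standard.

```python
# Hypothetical minimum metadata set; substitute the elements your agency has agreed on.
MINIMUM_SET = ("title", "custodian", "identifier", "date_created")


def missing_elements(record, required=MINIMUM_SET):
    """Return the required elements that are absent or blank in a record."""
    missing = []
    for element in required:
        value = record.get(element)
        if value is None or str(value).strip() == "":
            missing.append(element)
    return missing


# Illustrative record, not real data.
record = {"title": "Road closures", "custodian": "", "identifier": "ds-001"}
print(missing_elements(record))  # ['custodian', 'date_created']
```

Running a check like this across a whole repository gives a simple count of gaps per element, which can feed into a data quality assessment.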
A metadata strategy assists in improving metadata governance across your agency. It documents current and future-state practices and how your agency manages its metadata.
Your strategy can also establish metadata standards that facilitate interoperability within your agency, between agencies and across jurisdictions. For example, the ANZLIC Metadata Profile is used across jurisdictions to describe geographic information and services. This provides a consistent way to communicate information about resources, objects and assets.
The metadata strategy links to your broader data and information governance environment, including your information governance framework.
Your agency should aim to create and manage metadata in a way that allows it to be integrated. Keeping metadata relevant and up to date facilitates integration. Your metadata strategy can support this by implementing:
- roles and responsibilities for the creation and maintenance of metadata
- shared data dictionaries
- accurate tagging and identifiers
- change management processes
Metadata harvesting tools and protocols can assist agencies in indexing large batches of metadata records. Metadata harvesting uses automated tools to collect metadata descriptions from diverse sources such as catalogues, websites and other repositories. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an example of a protocol, or application programming interface (API), for harvesting metadata. It supports aggregating metadata from multiple sources into one collection.
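The sketch below shows roughly what an OAI-PMH harvest involves: building a ListRecords request and reading identifiers and titles from a Dublin Core (oai_dc) response. The endpoint URL, the function names and the sample response are assumptions made for illustration. No network call is made, and the sketch ignores resumption tokens, which a real harvester would need to follow for large result sets.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract the identifier and title of each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    harvested = []
    for rec in root.iterfind(".//oai:record", NS):
        harvested.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
        })
    return harvested


# https://example.org/oai is a placeholder endpoint, not a real service.
print(list_records_url("https://example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```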
A metadata repository is a data store for metadata. It is the aggregation of a wide range of metadata from across your agency.
Bringing your individual repositories together to develop a central repository of metadata can be beneficial. It allows consumers to look across all of your agency's available information from one point.
Tip: technologies such as ETL tools have their own metadata repositories. Remember to integrate these when creating a central repository.
A metadata statement is a detailed technical description of a dataset and can include information such as the:
- data custodian
- keywords (related subjects)
- unique identifier
- extent (the geographical area covered by the data)
- data quality
- limitations on how it is used.
Metadata statements help users understand a dataset and decide how to use it, which speeds up and facilitates the use, publication and exchange of data.
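The elements listed above can be captured as a simple structured record. The following is a sketch only: the class name, field names and sample values are assumptions for illustration and do not follow any particular metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetadataStatement:
    """Hypothetical structure holding the elements of a metadata statement."""
    identifier: str                                # unique identifier
    custodian: str                                 # data custodian
    keywords: list = field(default_factory=list)   # related subjects
    extent: str = ""                               # geographical area covered
    data_quality: str = ""                         # data quality notes
    limitations: str = ""                          # limitations on use


# Illustrative values, not real data.
statement = MetadataStatement(
    identifier="ds-001",
    custodian="Example Agency",
    keywords=["roads", "transport"],
    extent="Example region",
    data_quality="Validated quarterly",
    limitations="Not for navigation",
)

# Serialise to JSON so the statement can be published or exchanged.
print(json.dumps(asdict(statement), indent=2))
```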
Metadata for publication and exchange
Your agency can enhance metadata management to support publication and exchange of data by:
- implementing centralised metadata repositories
- updating metadata files to align with standards used at other agencies.
Many approaches to designing and implementing metadata exchange across your agency are available. Metadata architecture can be planned so that it facilitates this exchange. Consult with your internal specialists to determine which solutions work best for your agency.
Common architectural approaches include:
- Centralised metadata architecture
This involves copying metadata from other applications and replicating it in a centralised repository. Users can perform global searches through a single application.
Advantages:
- metadata is accessed from one point
- metadata quality can be improved by aggregating and transforming metadata sources into one standard
- prompt query retrieval
- manual metadata entry is possible.
Disadvantages:
- complex maintenance and version control
- rapid replication or synchronisation of metadata can be challenging
- custom code may be required to integrate metadata into the centralised repository's schema.
- Distributed metadata architecture
This consists of an application that retrieves metadata from source repositories in real time when a user requests information. There is no centralised repository. The intermediary application uses source catalogues to determine which repository to request information from.
Advantages:
- no maintenance or version control of replicated metadata is required, as the metadata comes directly from the source
- processing is reduced as there is no metadata replication and queries are distributed among different sources.
Disadvantages:
- metadata sources may not adhere to the same standards, so custom code may be required to retrieve the different metadata structures
- capturing additional metadata from external repositories can be difficult.
- Hybrid metadata architecture
This uses a combination of centralised and distributed metadata architecture. It provides real-time access and allows manual entry of metadata.
Advantages:
- metadata is accessed from one point in real time
- metadata can be added to the repository
- manual metadata entry is possible
- version control can be implemented for the metadata you add
- metadata quality can be improved by users.
Disadvantages:
- dependent on source metadata repositories being available.
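The difference between the three architectural approaches above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the two in-memory source repositories, their field names and all function names are hypothetical, and a real implementation would query actual systems rather than lists.

```python
# Two hypothetical source repositories that use different field names.
SOURCE_A = [{"id": "a-1", "title": "Road closures"}]
SOURCE_B = [{"ref": "b-7", "name": "Road maintenance schedule"}]


def normalise(record, source):
    """Map source-specific field names onto one shared schema (the 'custom code')."""
    if source == "A":
        return {"identifier": record["id"], "title": record["title"], "source": "A"}
    return {"identifier": record["ref"], "title": record["name"], "source": "B"}


def matches(record, term):
    """Case-insensitive match of a search term against a record title."""
    return term.lower() in record["title"].lower()


def build_central_repository():
    """Centralised: copy and transform all source metadata into one store."""
    central = [normalise(r, "A") for r in SOURCE_A]
    central += [normalise(r, "B") for r in SOURCE_B]
    return central


def search_central(central, term):
    """Centralised: search the replicated copy only."""
    return [r for r in central if matches(r, term)]


def search_distributed(term):
    """Distributed: query each source at request time; nothing is replicated."""
    results = []
    for source, records in (("A", SOURCE_A), ("B", SOURCE_B)):
        for record in records:
            candidate = normalise(record, source)
            if matches(candidate, term):
                results.append(candidate)
    return results


def search_hybrid(term, manual_entries):
    """Hybrid: real-time source results plus manually entered metadata."""
    return search_distributed(term) + [r for r in manual_entries if matches(r, term)]


central = build_central_repository()
manual_entries = [{"identifier": "m-1", "title": "Road naming policy", "source": "manual"}]
print(len(search_central(central, "road")))        # 2
print(len(search_distributed("road")))             # 2
print(len(search_hybrid("road", manual_entries)))  # 3
```

In the centralised case, a record added to a source after build_central_repository() has run is not found until the copy is refreshed, which is the synchronisation cost noted above. The distributed and hybrid searches always see current source metadata but depend on every source being reachable at query time.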