Data integration

Data integration is the process of combining data from different sources into usable and trusted information.

 

To engage technical experts in interoperability projects, you need to understand the technical aspects and language of data integration, including:

  • the various approaches to integrating data
  • how web data services and APIs can help exploit data sharing
  • how various architecture models influence the tools and techniques you choose
  • how Extract, Transform, Load (ETL) tools may help you.

There are several different ways of integrating data – whether it is within your agency, between agencies or with third parties.

Data integration approaches include:

Data integration approach

Description

Data Consolidation

Involves physically bringing data together from various sources to create one single data store. ETL is commonly used to consolidate data.

Data Propagation

Involves copying of data from one location to another and can often be a two-way data exchange between the source and target data store.

Methods include:

Large scale database data integrations often use the Enterprise Data Replication (EDR) method to copy and collect data from different source database systems and move that data to a broad range of target locations.

Data Virtualisation

Involves providing a single view of data by retrieving and interpreting data from different sources. This means that data can be viewed from one location but may be stored in other locations.

Data Federation

Involves using virtual databases to provide a single virtual view of data from various sources.

Data federation provides users with a view to a collection of data sources regardless of their structure.

Enterprise Information Integration (EII) is a technology that supports data federation.
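
The difference between consolidation and federation can be shown with a minimal sketch. It assumes two hypothetical source systems held as in-memory dictionaries; the names HR_SYSTEM, PAYROLL_SYSTEM, consolidate and FederatedView are invented for illustration and do not refer to any particular product.

```python
# Two hypothetical source systems, each holding part of the picture.
HR_SYSTEM = {"E100": {"name": "Employee One"}, "E200": {"name": "Employee Two"}}
PAYROLL_SYSTEM = {"E100": {"salary": 82000}, "E200": {"salary": 91000}}


def consolidate(*sources):
    """Data consolidation: physically copy every record into one new store."""
    store = {}
    for source in sources:
        for key, fields in source.items():
            store.setdefault(key, {}).update(fields)
    return store


class FederatedView:
    """Data federation: a single virtual view that copies no data.

    Each lookup reads the source systems at request time.
    """

    def __init__(self, *sources):
        self._sources = sources

    def get(self, key):
        record = {}
        for source in self._sources:
            record.update(source.get(key, {}))
        return record


warehouse = consolidate(HR_SYSTEM, PAYROLL_SYSTEM)  # snapshot copy
view = FederatedView(HR_SYSTEM, PAYROLL_SYSTEM)     # live view

# A later change in one source system...
PAYROLL_SYSTEM["E100"]["salary"] = 85000

# ...is visible through the federated view but not in the consolidated copy,
# which would need another load to pick it up.
stale_salary = warehouse["E100"]["salary"]
live_salary = view.get("E100")["salary"]
```

The trade-off in this sketch is that the consolidated store answers queries without touching the sources, while the federated view is always current but depends on every source being available at query time.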

Web data services

Web data services, or web services, help improve interoperability by enabling machine-to-machine interaction, allowing data to be shared and reused over a network (web) internal or external to your agency. Interoperability is achieved through XML-based standards that provide a common way to define, publish and use web services.

Your agency may use web data services with other agencies or third parties eg, to make performance reporting processes more efficient. Their data can be transmitted to your agency using system-to-system transfers that comply with a web service exchange specification.
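
As a rough illustration of a system-to-system transfer, the sketch below builds an XML message on the sending side and reads it on the receiving side using Python's standard library. The element names (PerformanceReport, Agency, Period, RecordsTransferred) are assumptions made up for this example; a real exchange specification would define its own message layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical message layout, for illustration only.
MESSAGE_TYPE = "PerformanceReport"


def build_report(agency, period, records_transferred):
    """Sending system: serialise a report as an XML string."""
    root = ET.Element(MESSAGE_TYPE)
    ET.SubElement(root, "Agency").text = agency
    ET.SubElement(root, "Period").text = period
    ET.SubElement(root, "RecordsTransferred").text = str(records_transferred)
    return ET.tostring(root, encoding="unicode")


def read_report(xml_text):
    """Receiving system: parse the message and reject unexpected types."""
    root = ET.fromstring(xml_text)
    if root.tag != MESSAGE_TYPE:
        raise ValueError(f"unexpected message type: {root.tag}")
    return {
        "agency": root.findtext("Agency"),
        "period": root.findtext("Period"),
        "records_transferred": int(root.findtext("RecordsTransferred")),
    }


message = build_report("Example Agency", "2019-Q1", 1250)
report = read_report(message)
```

Because both sides agree on the message layout in advance, neither needs to know how the other stores its data internally.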

TIPS: when you design or implement new data services, you should always check there are no existing services that can be used or reused to meet the service's requirements.

Data integration should be based on standards so that services can easily be interchanged by your agency without the end user having to make any amendments on their systems.

Application Programming Interfaces (APIs)

An API is a set of rules and specifications that a software program follows to communicate with other applications. Like web services, an API's end goal is to facilitate the integration and sharing of information and data. An API is often described as the broker between two applications, controlling what data can be shared in response to requests from other parties. Representational State Transfer (REST) and Simple Object Access Protocol (SOAP) are two common approaches to building APIs.

RESTful APIs are increasingly used in government settings and are also an efficient way for you to package your data for the public and third parties to unpack and consume.
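
The request and response pattern behind a RESTful API can be sketched without any network code. The example below is a simplified illustration, not a production design: the DATASETS content, the /datasets path and the handle_request function are all hypothetical, and in a real service this logic would sit behind an HTTP server or web framework.

```python
import json

# Hypothetical published data, keyed by resource name.
DATASETS = {"population": {"year": 2019, "unit": "persons"}}


def handle_request(method, path):
    """Return (status_code, json_body) for a REST-style request.

    GET /datasets         lists the available resources
    GET /datasets/<name>  returns one resource
    """
    parts = path.strip("/").split("/")
    if parts[0] != "datasets" or len(parts) > 2:
        return 404, json.dumps({"error": "not found"})
    if method != "GET":
        # This sketch is read-only, so other HTTP methods are refused.
        return 405, json.dumps({"error": "method not allowed"})
    if len(parts) == 1:
        return 200, json.dumps({"datasets": sorted(DATASETS)})
    if parts[1] not in DATASETS:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(DATASETS[parts[1]])


status, body = handle_request("GET", "/datasets/population")
population = json.loads(body)
```

Each resource has its own address and the response is plain JSON, which is what lets third parties consume the data with whatever tools they already use.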

APIs can be used by your agency to:

  • streamline business processes across business lines, agencies and third parties eg, the Australian Taxation Office uses APIs with third parties to streamline the collection of data for tax related activities
  • share your information and services with the public and third parties eg, APIs allow the Australian Bureau of Statistics to transfer statistical data from the ABS to users' machines or applications
  • help collaborate with other agencies to build solutions on top of your current systems.

API management

API management handles user access to your APIs, for example by issuing and checking API access keys or tokens. Security provisions can be built into an API management layer to control data access.
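
A minimal sketch of key-based access control follows, assuming a hypothetical ApiKeyStore class that issues keys scoped to named datasets. It is illustrative only: a real API management product would also store keys securely (for example as hashes), expire them and log their use.

```python
import hmac
import secrets


class ApiKeyStore:
    """Hypothetical in-memory registry of API keys and their permitted datasets."""

    def __init__(self):
        self._keys = {}  # key -> set of dataset names the key may read

    def issue_key(self, allowed_datasets):
        key = secrets.token_urlsafe(32)  # unguessable random key
        self._keys[key] = set(allowed_datasets)
        return key

    def revoke_key(self, key):
        self._keys.pop(key, None)

    def is_allowed(self, presented_key, dataset):
        for key, datasets in self._keys.items():
            # Constant-time comparison avoids leaking key contents through timing.
            if hmac.compare_digest(key.encode(), presented_key.encode()):
                return dataset in datasets
        return False


store = ApiKeyStore()
partner_key = store.issue_key(["population"])

can_read_population = store.is_allowed(partner_key, "population")
can_read_payroll = store.is_allowed(partner_key, "payroll")
```

Scoping each key to specific datasets means access can be granted or withdrawn for one consumer without affecting the others.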

TIP: when designing APIs, consider their scalability, availability and reuse potential within and external to your agency. It is also important to ensure they are well defined with information and guidance made available on the options and parameters to use the APIs.

Useful guidance on developing APIs includes:

Architecture models

Architecture models are the building blocks and influence the way you design your web data services and APIs. Popular architecture models include:

Service oriented architecture (SOA) is a software architecture where distinct components of the application provide services to other components via a communications protocol over a network. SOA has historically been used in larger, complex enterprise environments that require many different types of integrations.

Microservices architecture is a software architecture that is made up of small, independent processes which communicate with each other using APIs. Each service is created to serve only one specific business function and is completely independent of other services. It is commonly used in smaller application environments that integrate with other APIs such as developing a mobile or web application.

Extract, Transform, Load (ETL) technologies

ETL is traditionally used for moving data within and across applications and agencies. ETL tools can be used in the creation of web data services to create workflows to transform and deliver the data in the required format, taxonomy and structure.

ETL technologies can be used to:

  • collect data from various sources with potentially different taxonomies, formats and structures
  • check for new updates to data and process it to an agency's master database – while performing automated validation checks on the data to ensure it has been received in the correct format and quality
  • transform the data according to business rules eg, into structured and validated datasets
  • load the data into the target destination database or application.
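
The steps above can be sketched as a small pipeline using only Python's standard library. This is an assumption-laden illustration rather than a recommended tool: the SOURCE_CSV extract, the column names and the transfers table are invented, and an in-memory SQLite database stands in for the agency's target database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with inconsistent formatting and one invalid row.
SOURCE_CSV = """agency,records,received
 agency a ,120,2019-03-01
agency b,not-a-number,2019-03-02
Agency C,45,2019-03-02
"""


def extract(text):
    """Extract: read rows from the source format."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply business rules and set aside rows that fail validation."""
    valid, rejected = [], []
    for row in rows:
        try:
            valid.append(
                (row["agency"].strip().title(), int(row["records"]), row["received"])
            )
        except ValueError:
            rejected.append(row)
    return valid, rejected


def load(connection, rows):
    """Load: write the validated rows to the target database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS transfers "
        "(agency TEXT, records INTEGER, received TEXT)"
    )
    connection.executemany("INSERT INTO transfers VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
valid_rows, rejected_rows = transform(extract(SOURCE_CSV))
load(connection, valid_rows)
total = connection.execute("SELECT SUM(records) FROM transfers").fetchone()[0]
```

Keeping the rejected rows separate, rather than silently dropping them, lets the source agency be told what failed validation and why.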

TIP: ETL may be a slower integration method – consider other methods that suit your integration project.

Copyright National Archives of Australia 2019