Data migration is the process of transferring data from one application or format to another. It is often required when a new application is implemented, for example when data must be moved from an incompatible proprietary format to a future-proof format that can be integrated with new applications.
Data migration considerations and activities
Understand your data and its quality
A data migration can reveal whether data from a legacy application is of sufficient quality for a new application. Before you start a migration activity, refer to your data catalogue or information asset register. A recent data holdings audit or information review may also be needed to understand the data to be migrated; automated indexing and discovery tools can assist with this.
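As a rough illustration of automated indexing, the Python sketch below walks a directory tree and records each file's relative path, extension, size and SHA-256 hash, then counts files per extension. It assumes the legacy holdings are files under a single root directory; `build_inventory`, `summarise` and the recorded fields are hypothetical choices, not a standard register format or the behaviour of any particular discovery tool.

```python
import hashlib
import os
from collections import Counter
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    path: str        # path relative to the audited root
    extension: str   # lower-cased file extension, used to group formats
    size_bytes: int
    sha256: str      # content hash, useful for spotting duplicate copies


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: str) -> list[InventoryEntry]:
    """Walk a directory tree and record basic facts about every file in it."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            entries.append(InventoryEntry(
                path=os.path.relpath(full, root),
                extension=os.path.splitext(name)[1].lower() or "(none)",
                size_bytes=os.path.getsize(full),
                sha256=sha256_of(full),
            ))
    return entries


def summarise(entries: list[InventoryEntry]) -> Counter:
    """Count files per extension: a quick view of which formats need migrating."""
    return Counter(e.extension for e in entries)
```

Entries that share a hash are byte-for-byte copies, which can help when working out which copy is the authoritative source.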
TIP: it is critical to understand where the master data or authoritative source is held prior to the data migration to ensure the latest version of the data is being migrated.
Following a data holdings audit, a data quality assessment can be undertaken to measure the quality of the data and determine the rules and actions required.
Data profiling can help you determine the quality of data across dimensions including relevance, format, consistency, validity, complexity, completeness, accuracy, accessibility, compliance and structure. Automated data profiling tools can streamline this process, especially when there is a large quantity of data to profile.
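A minimal profiling sketch, assuming the data has already been loaded as a list of dictionaries (for example with Python's `csv.DictReader`). It measures three of the dimensions above: completeness, validity and a distinct-value count. The rules in `RULES`, including the four-digit postcode check, are hypothetical examples; real rules would come from the data quality assessment.

```python
import re

# Hypothetical validity rules; real rules come from the data quality assessment.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
RULES = {
    "dob": lambda v: ISO_DATE.fullmatch(v) is not None,
    "postcode": lambda v: v.isdigit() and len(v) == 4,
}


def profile(rows, rules):
    """Return completeness, validity and distinct-value counts for each column."""
    columns = sorted({col for row in rows for col in row})
    report = {}
    for col in columns:
        # Treat missing keys, None and empty strings as absent values.
        present = [row[col] for row in rows if row.get(col) not in (None, "")]
        check = rules.get(col)
        valid = [v for v in present if check is None or check(v)]
        report[col] = {
            "completeness": len(present) / len(rows),
            # Validity is measured only over the values that are present.
            "validity": len(valid) / len(present) if present else None,
            "distinct": len(set(present)),
        }
    return report


rows = [
    {"id": "1", "dob": "1980-02-11", "postcode": "2600"},
    {"id": "2", "dob": "11/02/1980", "postcode": ""},
    {"id": "3", "dob": "", "postcode": "26OO"},
]
report = profile(rows, RULES)
```

Here the second `dob` value fails the ISO date rule and the third postcode contains letters, so both columns score 0.5 for validity while `id` is complete and valid.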
Identify stakeholders
Before any data is transferred, ensure that the necessary planning and profiling has been done and that the migration plans have been approved by the business. Data migration can only succeed if the business is engaged throughout, with stakeholders ranging from the senior leadership team to data analysts.
You can start by identifying team members that need to be involved in the migration, testing, auditing, review and sign-off stages.
TIP: ensure that end users are also involved in business rule validation and testing throughout the migration process.
Data extraction
This stage involves copying or moving the data from legacy stores to a secure location where it can be prepared for migration. Data aggregation may be required during this process to bring several datasets together so the data is more meaningful and fit for purpose for the new application. This may involve creating a new combined master dataset that maps the data linkages between the datasets being aggregated.
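One way to picture a combined master dataset is a crosswalk that links each legacy record to a master identifier. In the sketch below, the `crosswalk` table, the dataset names and the "first non-empty value wins" precedence rule are all assumptions for illustration; records with no linkage are set aside for review rather than guessed.

```python
def build_master(sources, crosswalk):
    """Merge records from several legacy datasets into one record per entity.

    sources:   {dataset_name: {legacy_id: record}}
    crosswalk: {(dataset_name, legacy_id): master_id}, the documented linkages
    Returns the master dataset and the records that had no linkage.
    """
    master, unlinked = {}, []
    for dataset, records in sources.items():
        for legacy_id, record in records.items():
            master_id = crosswalk.get((dataset, legacy_id))
            if master_id is None:
                unlinked.append((dataset, legacy_id))  # set aside for manual review
                continue
            merged = master.setdefault(master_id, {"master_id": master_id, "lineage": []})
            for field, value in record.items():
                # Assumed precedence: the first non-empty value wins, in the
                # order the datasets are listed. Real projects agree these rules.
                if value not in (None, "") and field not in merged:
                    merged[field] = value
            # Record where each master record came from, for traceability.
            merged["lineage"].append(f"{dataset}:{legacy_id}")
    return master, unlinked


sources = {
    "crm": {"C-17": {"name": "Jo Smith", "email": ""}},
    "billing": {
        "8812": {"name": "J. Smith", "email": "jo@example.org"},
        "9001": {"name": "A. Lee"},
    },
}
crosswalk = {("crm", "C-17"): "M-0001", ("billing", "8812"): "M-0001"}
master, unlinked = build_master(sources, crosswalk)
```

The `lineage` list keeps the link back to every contributing legacy record, which supports the auditability needed later in the migration.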
TIP: ensure the necessary backups are in place so the data can be restored in the event of corruption.
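A sketch of a backup that is checked as it is taken, assuming file-based legacy stores and a backup location outside the source directory. Each copy's SHA-256 hash is compared with the source, and the resulting manifest lets the backup be re-checked later for corruption. `backup_with_manifest` and `find_corrupted` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_manifest(source: Path, backup_dir: Path) -> dict[str, str]:
    """Copy every file under source into backup_dir and check each copy.

    backup_dir is assumed to sit outside source. Returns a manifest mapping
    relative paths to SHA-256 hashes so the backup can be re-checked later.
    """
    manifest = {}
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dest = backup_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also tries to preserve metadata such as timestamps
        expected = file_sha256(src)
        if file_sha256(dest) != expected:
            raise OSError(f"backup copy of {rel} does not match the source")
        manifest[rel.as_posix()] = expected
    return manifest


def find_corrupted(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries whose backup copy is missing or has changed."""
    return [
        rel for rel, expected in manifest.items()
        if not (backup_dir / rel).is_file() or file_sha256(backup_dir / rel) != expected
    ]
```

Storing the manifest separately from the backup means a later check can detect both missing and altered files.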
Data remediation – improving the quality of data
A data migration provides an opportunity to improve the quality of the data. Once data quality has been measured, actions can be applied to remediate the data. Data remediation can occur before, during or after the transfer of data to the new application. Extract Transform Load (ETL) tools are often used to categorise data and improve its quality.
Examples of data quality actions that can be taken before, during or after a data transfer:

| Data quality action | Description |
|---|---|
| No action | Data issues are minor and will not cause problems after migration. |
| Remediate during the Extract Transform Load (ETL) processes | Data issues are remediated while the data is transformed from the legacy application to the new application. |
| Remediate in source databases | Data issues are remediated in the source database before the data is extracted. |
| Remediate in target application | Data issues are remediated after the data has been loaded into the new application. This can cause complications, as the data may fail target validation or produce errors under the new business rules. |
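To illustrate the "Remediate during the ETL processes" action, the sketch below applies two hypothetical rules in the transform step: it normalises dates to ISO 8601 and maps legacy status codes to a target vocabulary. The date formats and `STATUS_MAP` are assumptions. Values the rules cannot fix are returned rather than dropped, so they can be routed to one of the other actions in the table.

```python
from datetime import datetime

# Assumed legacy date formats; real ones would come from data profiling.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")
# Hypothetical mapping from legacy status codes to the target vocabulary.
STATUS_MAP = {"A": "active", "ACT": "active", "I": "inactive"}


def to_iso_date(value):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def remediate(record):
    """Apply remediation rules during the transform step.

    Returns the cleaned record and a list of (field, value) pairs the rules
    could not fix, so they can be handled by another action.
    """
    # Trim stray whitespace from every text value.
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    unresolved = []
    dob = cleaned.get("dob")
    if isinstance(dob, str) and dob:
        iso = to_iso_date(dob)
        if iso is None:
            unresolved.append(("dob", dob))
        else:
            cleaned["dob"] = iso
    status = cleaned.get("status")
    if isinstance(status, str) and status:
        mapped = STATUS_MAP.get(status.upper())
        if mapped is None:
            unresolved.append(("status", status))
        else:
            cleaned["status"] = mapped
    return cleaned, unresolved
```

For example, `remediate({"dob": "31/02/1980", "status": "X"})` leaves both values unchanged and reports them as unresolved, because 31 February is not a valid date and `X` has no mapping.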
Data migration techniques
A wide range of techniques can be used to perform a data migration. The level of automation possible will depend on the maturity of your data. For example, physical documents may first need to be scanned before Optical Character Recognition (OCR) software can convert the scanned copies into structured digital data.
Once data is in a digital format, data migration technologies could include:
- Optical Character Recognition (OCR) software for character recognition and digitisation
- Extract Transform Load (ETL) software for data transformations such as format conversions
- ogr2ogr (a command-line utility from the GDAL library) for spatial data migrations
- Machine Learning (ML) software for automated mapping of data structures for migration
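A minimal extract, transform and load sketch for a format conversion: legacy CSV rows are read, renamed to a target schema and written as JSON Lines. `FIELD_MAP` and the column names are hypothetical. Unmapped legacy columns are reported so that no data is dropped silently; the mapping itself is the kind of artefact an ML mapping tool might draft for a person to review.

```python
import csv
import io
import json

# Hypothetical mapping from legacy column names to target field names.
FIELD_MAP = {"CUST_NO": "customer_id", "SURNAME": "family_name", "GIVEN": "given_name"}


def extract(csv_text):
    """Extract: read legacy CSV rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def map_fields(rows, field_map):
    """Transform: rename columns to the target schema.

    Returns the mapped records and any legacy columns with no mapping, so
    that dropped data is reported rather than lost silently.
    """
    legacy_columns = {col for row in rows for col in row}
    unmapped = sorted(legacy_columns - set(field_map))
    mapped = [{target: row.get(source) for source, target in field_map.items()} for row in rows]
    return mapped, unmapped


def to_json_lines(records):
    """Load: serialise records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


legacy_csv = "CUST_NO,SURNAME,GIVEN,FAX\n17,Smith,Jo,\n18,Lee,Ann,02 5550 1234\n"
records, unmapped = map_fields(extract(legacy_csv), FIELD_MAP)
output = to_json_lines(records)
```

In this example `unmapped` is `["FAX"]`, which prompts a decision on whether fax numbers need a home in the new application.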
Data can be transferred all at once or in stages. Migrating in stages helps maintain quality and allows the work to be completed in smaller agile sprints.
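A sketch of a staged migration, assuming the records can be read in a stable order and that `load_batch` is a hypothetical callable that writes one batch to the new application and returns the number of records it loaded. Each batch is checked before the next starts, and the set of completed batch numbers lets a rerun resume where it stopped.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of up to `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def migrate_in_stages(records, load_batch, batch_size=500, completed=None):
    """Load records batch by batch, verifying each batch before moving on.

    `completed` holds batch numbers already migrated in an earlier run; those
    batches are skipped, which assumes `records` arrives in the same order.
    """
    completed = set() if completed is None else completed
    for number, batch in enumerate(batched(records, batch_size)):
        if number in completed:
            continue
        loaded = load_batch(batch)
        if loaded != len(batch):
            # Stop rather than carry a short batch into later stages.
            raise RuntimeError(f"Batch {number}: loaded {loaded} of {len(batch)} records")
        completed.add(number)
    return completed
```

Persisting `completed` between runs (for example in a small state file) is what would let each sprint pick up from the last verified batch.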
Data validation, auditing and verification
Testing and quality checkpoints should occur throughout the data migration, with all results recorded for auditing. Unit, application, system and volume tests should be undertaken as early as possible so there is time to update any code or business rules.
TIP: document all new processes and data transformations to ensure the data remains traceable and auditable. The migrated data must be demonstrably authentic so that it can be relied on as evidence.
Consider automated validation to check the volume and quality of the data being migrated, to confirm that no data has been lost and that the data is fit for purpose. Ensure any related metadata, lineage and data quality statements are updated to reflect the migration.
TIP: consider running the legacy and new systems concurrently until you are confident in the new process and data. Once stakeholders are confident in the new application, the legacy system can be decommissioned.
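A minimal automated reconciliation sketch: it compares record counts and per-record SHA-256 fingerprints between source and target, keyed on a shared identifier. It assumes the `fields` passed in are ones expected to be unchanged by the migration; transformed fields would instead be compared against their expected transformed values. The returned summary could be stored with the other test results as an audit record.

```python
import hashlib
import json


def fingerprint(record, fields):
    """Hash the chosen fields in a canonical form so key order does not matter."""
    canonical = json.dumps({f: record.get(f) for f in fields}, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source, target, key, fields):
    """Compare source and target records by count, key and content."""
    src = {r[key]: fingerprint(r, fields) for r in source}
    tgt = {r[key]: fingerprint(r, fields) for r in target}
    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
        # Records whose key repeats would otherwise be hidden by the dict lookups.
        "duplicate_keys": (len(source) - len(src)) + (len(target) - len(tgt)),
    }


source = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": "C"}]
target = [{"id": 1, "name": "A"}, {"id": 2, "name": "b"}, {"id": 4, "name": "D"}]
result = reconcile(source, target, "id", ["name"])
```

Note that the counts alone match here (three records each side); only the key and content comparison reveals the missing, unexpected and changed records.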