Ensuring data quality should be seen as an ongoing improvement program that is managed throughout the data lifecycle.
Data Management Association (DAMA) defines common characteristics (dimensions) of data quality as:
Data quality management is a continuous process that involves managing data from its initial creation to its potential destruction. The quality of your agency's data should always be fit for purpose. You can support this by establishing a data quality strategy that facilitates proactive monitoring and management of data quality. For example, you can embed data quality assessments in data migration activities.
A data quality strategy should link to your broader data and information governance environment, including your information governance framework.
Data quality assessment
A good data quality strategy defines appropriate standards, requirements and specifications for data quality controls. This includes developing data dimensions relevant to your business needs to monitor, measure, and report on quality levels of your data.
Data quality assessment tells you how effectively data meets your stakeholders' requirements and helps you prioritise remediation of high-value datasets.
Data quality is assessed by measuring specific dimensions of your data.
These dimensions provide a:
- vocabulary for defining data requirements
- way to determine data quality assessment results
- metric for ongoing measurement and improvement. (DAMA, 2017)
There are different dimensions that can be used to assess data quality, eg:
- common dimensions of data quality from DAMA's Body of Knowledge
- guidance from the Australian Bureau of Statistics (ABS) on assessing the quality of statistical data against ABS dimensions
- ISO 8000 – a global standard for data quality and enterprise master data. You can use this to inform your agency's data quality standards.
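Measuring dimensions can be sketched in a few lines of code. The example below is a minimal illustration only: it assumes three commonly used dimensions (completeness, uniqueness and validity), and the sample records, field names and the simple email rule are hypothetical rather than taken from any of the frameworks above.

```python
# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": "A1", "email": "kim@example.org"},
    {"id": "A2", "email": ""},
    {"id": "A2", "email": "lee@example.org"},
    {"id": "A4", "email": "not-an-email"},
]

def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def uniqueness(rows, field):
    """Ratio of distinct values to rows (1.0 means no duplicates)."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)

def validity(rows, field, is_valid):
    """Share of non-empty values that pass the supplied rule."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)

def simple_email_rule(value):
    # Deliberately simple rule for illustration, not a full email validator.
    return value.count("@") == 1 and "." in value.split("@")[1]

scores = {
    "completeness(email)": completeness(records, "email"),
    "uniqueness(id)": uniqueness(records, "id"),
    "validity(email)": validity(records, "email", simple_email_rule),
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Scores like these only become meaningful once your agency agrees on the threshold each dimension must meet for a given dataset to be fit for purpose.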
Data quality tools
Tools can be used as a guide to understand the different dimensions of data quality and generate data quality statements. An example is the NSW Government data quality reporting tool, which can generate data quality statements in various document formats.
Tools that automate data profiling and cleansing are also available and can help your agency enhance large amounts of data.
These tools can:
- profile, clean and monitor data quality over time
- assist in the validation of data
- provide statistics on an agency's data
- help to identify patterns and provide direction on future data remediation.
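As a rough illustration of what profiling involves, the sketch below computes basic statistics and value patterns for one column. It does not represent any particular tool; the rows, column names and the `shape` and `profile` helpers are hypothetical.

```python
import re
from collections import Counter

# Hypothetical rows; the column names are illustrative only.
rows = [
    {"postcode": "2000", "suburb": "Sydney"},
    {"postcode": "2OOO", "suburb": "Sydney"},  # letter O typed instead of zero
    {"postcode": "", "suburb": "Parramatta"},
    {"postcode": "2150", "suburb": "parramatta"},
]

def shape(value):
    """Reduce a value to a pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile(rows, column):
    """Return basic statistics for one column."""
    values = [r.get(column, "") for r in rows]
    present = [v for v in values if v != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "patterns": Counter(shape(v) for v in present),
    }

postcode_profile = profile(rows, "postcode")
print(postcode_profile)
```

A pattern that appears only rarely, such as a postcode containing letters, is a candidate for remediation. Running the same profile on a schedule gives a simple way to monitor quality over time.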
Poor data quality
Common culprits for poor data quality:
- Incorrect data entry validation: invalid data is entered into the database.
- Change in business rules: new rules are not correctly propagated throughout existing data.
- Changes to the source data structure: third parties implement changes without notifying downstream users, or business rules are not updated on systems following notification of changes.
- Requirement for uniqueness of instances: incorrect identifiers are created.
- Incorrect business rules being applied to data: loss of data.
- Incorrect temporal information: difficulty identifying the latest version of information and data, resulting in duplication.
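Two of these culprits, weak data entry validation and incorrect identifiers, can be reduced by checking records before they are stored. The sketch below is one possible approach under stated assumptions: the `Rule` type, the rules themselves and the record layout are all hypothetical.

```python
from typing import Callable, NamedTuple

class Rule(NamedTuple):
    field: str
    message: str
    check: Callable[[str], bool]

# Hypothetical business rules for an illustrative record layout.
RULES = [
    Rule("id", "id is required", lambda v: v != ""),
    Rule("postcode", "postcode must be 4 digits",
         lambda v: len(v) == 4 and v.isdigit()),
]

def validate(record, seen_ids):
    """Return a list of problems; an empty list means the record can be stored."""
    problems = [r.message for r in RULES if not r.check(record.get(r.field, ""))]
    if record.get("id") in seen_ids:
        problems.append("id already exists")
    return problems

seen_ids = set()
accepted, rejected = [], []
incoming = [
    {"id": "A1", "postcode": "2000"},
    {"id": "A1", "postcode": "2150"},   # duplicate identifier
    {"id": "A3", "postcode": "21500"},  # fails the postcode rule
]
for record in incoming:
    problems = validate(record, seen_ids)
    if problems:
        rejected.append((record, problems))
    else:
        seen_ids.add(record["id"])
        accepted.append(record)

print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping the rules in one list means a change in business rules is made in a single place, which makes it easier to re-check existing data against the new rules.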
Data quality and metadata
Good metadata is essential in understanding and assessing the quality of your data. Data quality assessments determine if your data meets the expectations of its consumers, and metadata plays a key role in clarifying those expectations. For example, you can look at a record's metadata to see if it meets format requirements or if it has been updated according to business rules.
Metadata can also be used to record data quality assessments; this means metadata repositories can be used for storing and sharing data quality assessment results across your organisation.
Your metadata and data quality teams can work closely together to develop these processes. Their combined expertise can ensure that business rules, measurements or issues related to data quality are documented, developed and managed as per your agency's data strategy.
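Recording an assessment result as metadata can be as simple as storing a small structured entry alongside the dataset description. The sketch below is illustrative only: the field names, the dataset name and the threshold are assumptions, and a real metadata repository would define its own schema.

```python
import json
from datetime import date

def assessment_record(dataset, dimension, score, threshold, assessed_on):
    """Build a metadata entry describing one data quality assessment."""
    return {
        "dataset": dataset,
        "dimension": dimension,
        "score": score,
        "threshold": threshold,
        "meets_threshold": score >= threshold,
        "assessed_on": assessed_on.isoformat(),
    }

# Hypothetical dataset name and values, used only to show the shape of an entry.
entry = assessment_record(
    "customer_register", "completeness", 0.92, 0.95, date(2024, 6, 30)
)
print(json.dumps(entry, indent=2))
```

Because the entry is plain JSON, it can be stored in a metadata repository and shared across your organisation without depending on a specific tool.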