The primary purpose of data governance is to ensure compliance with a company's data policies. These policies can serve many objectives and rely on guidelines for data protection and validation. Data management and governance leaders must engage business users in order to clearly articulate data quality requirements, define metrics, and measure compliance with data policies.
The challenge, however, is bridging the gap between defining these data governance policies and actually implementing them. Policies aim to ensure data quality control across all workflows, yet the managers who are assigned responsibility for data quality management often lack the training or the tools to do the job.
This is where data traceability (data lineage) tools come into play. Data lineage documents how data flows through the enterprise and helps simplify two key data governance procedures: root cause analysis and impact analysis.
Traceability and data governance
If there is no way to pinpoint where errors occur, data managers (the so-called data stewards) will struggle to identify and correct data quality issues. As these errors continue to spread, the company may face inconsistent reports and analyses – and therefore bad decisions.
Data lineage tools can simplify root cause analysis by mapping the processes through which data passes. Data quality can then be examined at each point in the processing flow, allowing IT to locate the source of errors.
Working back from the first observed error, the data steward can insert controls at each step to check whether the data was as expected or whether the error was already present. The step where the data was valid on input but defective on output is where the error was introduced. The data steward can then focus on eliminating the root cause instead of merely fixing the bad data.
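The input/output check described above can be sketched in a few lines. This is a minimal illustration, not a real lineage product: the pipeline, step names, and the validity rule (a positive `amount` field) are all hypothetical.

```python
# Sketch: localize the step where bad data is introduced by comparing
# each step's input and output against an expectation.
# All step and field names here are hypothetical.

def is_valid(records):
    """Expectation for this sketch: every record has a positive 'amount'."""
    return all(r.get("amount", 0) > 0 for r in records)

def find_faulty_step(initial_data, steps):
    """Run the pipeline; return the name of the first step whose
    input was valid but whose output was not, or None if no step fails."""
    data = initial_data
    for name, step in steps:
        input_ok = is_valid(data)
        data = step(data)
        if input_ok and not is_valid(data):
            return name  # the error was introduced at this step
    return None

# Toy pipeline: the 'convert' step accidentally negates amounts.
steps = [
    ("ingest",  lambda rs: [dict(r) for r in rs]),
    ("convert", lambda rs: [{**r, "amount": -r["amount"]} for r in rs]),
    ("enrich",  lambda rs: [{**r, "tag": "ok"} for r in rs]),
]

print(find_faulty_step([{"amount": 10}], steps))  # → convert
```

Because the check runs at every step, the steward sees exactly where valid input turned into invalid output, rather than discovering the defect only in the final report.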
Tracing data history can also help the data steward spot unexpected changes in data format and structure – today's environments are far more dynamic than in the past. When data sources change, there can be unintended consequences downstream.
From a data element's point of origin, the data steward can also trace its dependencies and determine which processing steps are affected by a change.
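This kind of impact analysis amounts to a downstream traversal of a lineage graph. A minimal sketch, assuming the lineage is available as a simple adjacency map (the dataset names below are invented for illustration):

```python
# Sketch: impact analysis over a lineage graph. Each key is a dataset
# or processing step; edges point to downstream consumers.
# All dataset names are hypothetical.
from collections import deque

lineage = {
    "crm_export":    ["staging_table"],
    "staging_table": ["sales_report", "churn_model"],
    "churn_model":   ["retention_dashboard"],
}

def impacted(node, graph):
    """Return every downstream node reachable from `node` (BFS)."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(impacted("crm_export", lineage)))
# → ['churn_model', 'retention_dashboard', 'sales_report', 'staging_table']
```

A change to `crm_export` therefore flags every report and model that consumes it, directly or transitively, before the change is rolled out.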
What to look for in data traceability tools
Manually collecting metadata and documenting data lineage requires a large investment in resources, and the results remain prone to error – a real risk for companies that rely on reports and analyses for decision making.
It is therefore better to look for products that can:
Natively access a wide range of data sources,
Group metadata into a centralized repository,
Present metadata in simplified views tailored to different users, and encourage collaboration to help validate it,
Document how data flows through processing streams,
Provide a visual presentation of data lineage,
Provide APIs that let developers query lineage information,
Create an inverted index that maps data elements to their uses,
Provide search capabilities to quickly trace data from its point of origin to all of its downstream targets,
Browse the data feeds.
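Of the capabilities above, the inverted index is the most mechanical: it reverses the "consumer → fields" relationship so that "where is this field used?" becomes a single lookup. A small sketch, with invented report and field names:

```python
# Sketch: an inverted index from data elements to the reports and
# processes that consume them. All names are hypothetical.

usages = {
    "sales_report":  ["customer_id", "amount"],
    "churn_model":   ["customer_id", "last_login"],
    "audit_extract": ["amount"],
}

def build_inverted_index(usages):
    """Map each data element to the set of its consumers."""
    index = {}
    for consumer, elements in usages.items():
        for element in elements:
            index.setdefault(element, set()).add(consumer)
    return index

index = build_inverted_index(usages)
print(sorted(index["customer_id"]))  # → ['churn_model', 'sales_report']
```

With this index in place, answering "which reports break if `customer_id` changes?" no longer requires scanning every consumer's definition.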