Steps to build a data catalog

Aggregating data is a first step; making it accessible to the right users is another, equally important one. This is where the data catalog comes into play. Building one has become a major challenge for companies that want to treat data as a real asset. But beware: it is also a collaborative process. Companies should not undertake such a project without the contribution of their business partners or business departments.

A data catalog is a reference application that enables business and IT users to explore data sources, understand their content through metadata, connect that data to its source, and access it independently, on a self-service basis. A data catalog therefore explores databases and BI systems. It also provides a single point of reference for enterprise metadata management that is faster and more efficient than legacy systems.

The main steps in creating a data catalog are:

Design a data model that will serve as the basis for the catalog architecture. An effective data catalog must match how the business lines use data, and not be a mere technical implementation. A Subject Area Model (SAM) must define each topic and its associated concepts. It shows business users where their data is located without reference to applications, files or databases. The data catalog must be built on this SAM.
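As an illustration, a Subject Area Model can be pictured as a small set of topics, each grouping business concepts, with no mention of applications or databases. The sketch below is only one possible shape: the class names, the subject areas and the `find_subject_area` helper are assumptions made for the example, not part of any standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A business concept, described without naming any system."""
    name: str
    definition: str


@dataclass
class SubjectArea:
    """One topic of the Subject Area Model and the concepts it groups."""
    name: str
    concepts: list = field(default_factory=list)


# Hypothetical subject areas; a real model comes out of business workshops.
sam = [
    SubjectArea("Customer", [
        Concept("Customer", "A person or organisation that buys from us"),
        Concept("Contact", "A way of reaching a customer"),
    ]),
    SubjectArea("Sales", [
        Concept("Order", "A confirmed request to purchase"),
    ]),
]


def find_subject_area(model: list, concept_name: str) -> Optional[str]:
    """Return the name of the subject area that owns a concept, if any."""
    for area in model:
        if any(c.name == concept_name for c in area.concepts):
            return area.name
    return None


print(find_subject_area(sam, "Order"))  # Sales
```

Keeping the model free of system names is the point: the mapping from concepts to physical tables and files comes later, in the metadata dictionary.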

Rely on Data Stewards and IT Managers to discover and access the metadata of all databases and files. A data catalog uses metadata to identify tables, files, and databases. To do this, it scans the company's databases and loads the metadata (not the actual data) into its repository. Before the catalog is created, the metadata sources must be identified and recorded. This is a major step that requires a strong governance program. Data Stewards are important here because they provide an overview of the data sources to use.
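To make "loading the metadata and not the actual data" concrete, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for a company database. The `harvest_metadata` function and the two demo tables are hypothetical; a real catalog would use the metadata views of each source system.

```python
import sqlite3


def harvest_metadata(conn: sqlite3.Connection) -> list:
    """Collect table and column metadata without reading any row data."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    entries = []
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append({"table": table, "column": column, "type": col_type})
    return entries


# Hypothetical source database, created in memory for the demonstration.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

for entry in harvest_metadata(source):
    print(entry)
```

Only names and types are read here; no customer or order row leaves the source.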

Build a metadata dictionary (not a business glossary). This dictionary contains the description and mapping of every table or file and all of their metadata. It becomes the foundation of the data catalog. Again, business Data Stewards are essential because they identify the metadata to use in the catalog, by source, concept, and domain.
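One way to picture such a dictionary is as a list of entries, each mapping a physical column to a business concept and domain. The field names and sample entries below are assumptions chosen for the example; the `entries_for` helper is hypothetical too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    source: str       # system that holds the data
    table: str        # table or file name
    column: str
    data_type: str
    concept: str      # business concept the column maps to
    domain: str       # subject area the concept belongs to
    description: str


# Hypothetical entries, as a Data Steward might validate them.
dictionary = [
    DictionaryEntry("crm", "contacts", "email", "TEXT",
                    "Contact", "Customer", "Primary email address"),
    DictionaryEntry("billing", "invoice_parties", "party_name", "TEXT",
                    "Customer", "Customer", "Legal name on the invoice"),
    DictionaryEntry("erp", "orders", "order_date", "DATE",
                    "Order", "Sales", "Date the order was confirmed"),
]


def entries_for(entries, **criteria):
    """Filter entries on any combination of fields, e.g. source= or domain=."""
    return [
        e for e in entries
        if all(getattr(e, key) == value for key, value in criteria.items())
    ]


for e in entries_for(dictionary, domain="Customer"):
    print(e.source, e.table, e.column, "->", e.concept)
```

The same helper answers the three questions named above: `entries_for(dictionary, source="erp")`, `entries_for(dictionary, concept="Order")`, or `entries_for(dictionary, domain="Sales")`.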

Profile the data to provide statistics to users. These profiles are informative summaries that explain the metadata and help users understand it. For example, the profile of a database often includes the number of tables, the number of files, and the number of rows.
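A database profile of this kind can be sketched in a few lines. The example again uses an in-memory SQLite database as a stand-in; `profile_database` and the demo tables are hypothetical names introduced for illustration.

```python
import sqlite3


def profile_database(conn: sqlite3.Connection) -> dict:
    """Summarise a database: number of tables and row count per table."""
    tables = [
        name for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    row_counts = {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in tables
    }
    return {"table_count": len(tables), "row_counts": row_counts}


# Hypothetical source database with a few rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
db.executemany(
    "INSERT INTO customer (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("c@example.com",)],
)
db.execute("INSERT INTO orders (customer_id) VALUES (1)")

profile = profile_database(db)
print(profile)
```

The catalog stores only the resulting summary, not the rows that were counted.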

Identify the relationships between the sources. The goal is to discover related data spread across several databases. For example, an analyst may need consolidated information about a customer. The data catalog can reveal that five files on five different systems contain customer data.
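The customer example can be sketched as a simple index from business concept to the sources that hold it. The systems, file names and the `sources_by_concept` function below are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical catalog entries: (system, file or table, business concept).
entries = [
    ("crm", "contacts.csv", "Customer"),
    ("billing", "invoice_parties", "Customer"),
    ("webshop", "users", "Customer"),
    ("webshop", "carts", "Order"),
    ("support", "ticket_requesters", "Customer"),
    ("erp", "orders", "Order"),
    ("marketing", "newsletter_list.csv", "Customer"),
]


def sources_by_concept(rows):
    """Group sources by the business concept they contain."""
    index = defaultdict(list)
    for system, source, concept in rows:
        index[concept].append((system, source))
    return dict(index)


index = sources_by_concept(entries)
for system, source in index["Customer"]:
    print(f"{system}: {source}")
```

With these sample entries, the lookup lists five sources on five different systems, which is exactly the kind of overview an analyst needs before consolidating customer data.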

Develop data lineage (traceability). ETL (Extract, Transform, Load) tools extract data from source databases, transform and clean it, and load it into a target database. Recording these movements in the catalog shows where each dataset comes from, which is useful when looking for errors in an analysis.
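Lineage can be represented as a graph in which each dataset points to the datasets it was loaded from. The sketch below walks that graph upstream; the dataset names and the `upstream_sources` function are assumptions for the example, and a real catalog would populate the graph from ETL job definitions.

```python
# Hypothetical lineage edges: target dataset -> datasets it is loaded from.
lineage = {
    "report.customer_revenue": ["dwh.customer", "dwh.orders"],
    "dwh.customer": ["crm.contacts", "billing.invoice_parties"],
    "dwh.orders": ["erp.orders"],
}


def upstream_sources(dataset, edges):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen = set()
    stack = list(edges.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        stack.extend(edges.get(current, []))
    return sorted(seen)


for name in upstream_sources("report.customer_revenue", lineage):
    print(name)
```

If a figure in the report looks wrong, this list gives the analyst the full set of datasets to check, from the warehouse tables back to the operational systems.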

Structure the catalog for people (according to the SAM). Most files and databases are designed to be used by technology tools. Data catalogs must be designed both for those who consume the data and for those who build the technology. Another key element: a data catalog must remain searchable from a computer, a tablet, and mobile applications.
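Searchability can be illustrated with a very small keyword search over catalog entries organised by subject area. This is a deliberately naive sketch: the entry fields and the `search` function are hypothetical, and a production catalog would rely on a proper search index.

```python
# Hypothetical catalog entries, each tagged with its SAM subject area.
catalog = [
    {"name": "crm.contacts", "subject_area": "Customer",
     "description": "Customer contact details"},
    {"name": "erp.orders", "subject_area": "Sales",
     "description": "Confirmed customer orders"},
    {"name": "hr.employees", "subject_area": "People",
     "description": "Employee master data"},
]


def search(entries, query):
    """Case-insensitive search: every query term must appear in the entry."""
    terms = query.lower().split()
    hits = []
    for item in entries:
        text = " ".join(
            [item["name"], item["subject_area"], item["description"]]
        ).lower()
        if all(term in text for term in terms):
            hits.append(item["name"])
    return hits


print(search(catalog, "customer"))         # ['crm.contacts', 'erp.orders']
print(search(catalog, "customer orders"))  # ['erp.orders']
```

Because the search matches on subject area and description as well as on technical names, a business user can find data without knowing which system stores it.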
