Get to Know Data Catalog

Get to Know Data Catalog

When we create a database, we are thinking of data modeling. Tables, columns, primary keys, foreign keys, and how to join them. When we know how to join them, we start considering how to increase the performance. Index here and there and maximize the queries using the execution plan.  It’s all good, usual stuff to do and of course necessary. Until someone new comes  in and asks, “can I see your data catalog?”

As the database administrator in your organization, you don’t want to look dumb and perhaps tell her that you would send her the information later, to buy some time. But mostly you would think to yourself, what is a data catalog, why do we need it, and how to do it?

What is a data catalog?

Until a few years ago, I hadn’t heard of it myself. As someone who was responsible for anything with data in the company, I could only guess what it was. And my guess was right. As the name suggests, a data catalog is a complete information about data. Just like a product catalog, where you can find all information of a certain product (size, what it does, description, delivery time, etc), a data catalog provides a detailed information as follows, but not limited to (it also depends on your need):

  • the data (each column) definition in business language, not technical
  • datatypes
  • the sources, where the data come from
  • users, who use the data
  • reports, where the data are available or being consumed
  • if it is a person-related data, particularly  important in Europe (GDPR)

Basically, data catalog contains the metadata (data about the data). The more detailed the information is, the better quality is the data catalog. Of course, the data catalog must be as dynamic as the business. It has to be updated from time to time, accordingly.

Why do we need it?

It sounds like an extra work for the database administrator or developer or anyone who is assigned to do the task. Well, it is, unfortunately. But it is important, as it serves just like a dictionary. You don’t need it everyday, but when you need it, you will be thankful that it is there. Data catalog is where you go, when you need to know the definition of each data coming into your database system. A database column named Start Date could mean anything without understanding the context.

When you have such definition (as detailed as possible and in business language), you will less likely have misunderstanding with colleagues in interpreting the data. It’s all there, in written. There is no around it.

Another reason is for compliance and audit. Just in case your company got audited, you need to able to show where you get the data from, what you do with the data (particularly the person-related data), and who has access to these data.

How do we do it?

There is no standard format for how a data catalog should look like. The easiest way is to use a spreadsheet or a document. Everyone can do it, and a mixed team of business and IT would be ideal. As this is a continuous task, it is a good idea to start small, for example start with the most important tables and the really necessary information. Regularly, such as once a week, the catalog must be updated and reviewed, and it must be accessible for everyone (or at least for everyone that uses its data). Once the catalog is “mature”, the update frequency could be set up less.

There is a fancy and automated tool to create a data catalog, such as this one from Red Gate (I’ve got no affiliate to them at the time of writing). It costs something, but if it’s worth the time, I don’t see why not.

***

Having a data catalog is not a must, but it sure could make life easier, especially the life of the database and report developer, report users, compliance officer, and a new colleague, to say the least. Consider it, if you don’t have one yet.

For a more thorough article regarding data catalog, Red Gate published this wonderful article.


Photo by Aleksi Tappura on Unsplash

Leave a Reply