A lot of time is wasted in organizations due to the lack of metadata management. Think about going to the library to borrow the latest novel you are eager to read. But imagine all the books are in random order – without information about genres or authors. In other words, the library has no metadata. Good luck finding what you are looking for – you won’t. The same applies to an organization’s data: without metadata – data that describes data – you do not know what data you have or where to find it. As data volumes and complexity grow exponentially, easily available metadata becomes ever more critical.
Active metadata management increases the value organizations can extract from their data. It reduces time spent searching for data, improves data quality, speeds up development, improves communication between business and IT, and enables collaboration between all data users – not to mention improving operations and increasing cost-efficiency. It also supports legal and regulatory compliance by design (e.g. GDPR, or BCBS 239 in banking).
It is essential to document any critical data. Data and its documentation are intended to be shared and reused; therefore, it is important to foster common ways of understanding, finding, using and treating data across the organization, instead of creating silos.
Let’s start with how to understand what data means from a business perspective. The key here is setting up business definitions for data and documenting them in a business glossary, serving as an organization’s common language. Documentation work can be top-down or bottom-up – or even tackled with a hybrid approach. The best approach depends on the organization and what makes sense in their unique situation.
Managing the information that makes data meaningful is at the core of data governance, which aims for data to be seen and managed as a strategic asset, as my colleague Ritva Aula detailed in her recent blog post. As a result, the metadata – the common business knowledge such as definitions, classifications, rules and roles – is transparent to everyone, which helps to streamline reporting, analytics, data protection and all other data initiatives.
A business glossary is a way to understand the business meaning of data, but it alone does not help in finding the data for consumption. The technical documentation details must also be captured and linked to the business glossary to explain what a specific piece of data means and where it is physically located.
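To make the link between business and technical metadata concrete, here is a minimal sketch in Python. It is not how any particular catalogue tool stores its data; the class names, the `find_locations` helper and the sample term are all hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalColumn:
    """Technical metadata: where a piece of data physically lives."""
    database: str
    table: str
    column: str


@dataclass
class GlossaryTerm:
    """Business metadata: what a piece of data means and who owns it."""
    name: str
    definition: str
    owner: str
    columns: list[PhysicalColumn] = field(default_factory=list)


def find_locations(glossary: dict[str, GlossaryTerm], term_name: str) -> list[str]:
    """Return the fully qualified locations linked to a business term."""
    term = glossary.get(term_name.lower())
    if term is None:
        return []
    return [f"{c.database}.{c.table}.{c.column}" for c in term.columns]


# Hypothetical example data, not taken from any real system.
customer_id = GlossaryTerm(
    name="Customer identifier",
    definition="Unique identifier assigned to a customer at onboarding.",
    owner="Customer data steward",
    columns=[
        PhysicalColumn("crm", "customers", "cust_id"),
        PhysicalColumn("dwh", "dim_customer", "customer_key"),
    ],
)
glossary = {customer_id.name.lower(): customer_id}

print(find_locations(glossary, "Customer identifier"))
```

The point of the sketch is that one business term can map to several physical columns with different names, which is exactly why the glossary alone does not tell you where to find the data.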
Documenting each piece of data from thousands of databases, let alone maintaining them all, is impossible to accomplish manually. Data cataloguing tools provide capabilities that scan through data content to discover and collect metadata automatically and enable active metadata management. Sample data and machine learning are utilized to match the data with the right business meaning. Data flows, transformations and authoritative sources are part of metadata documentation, ensuring that the right data is used and re-used, from a single source – rather than creating overlapping datasets and data silos in which the same data is stored numerous times across different locations for different units, teams and so on.
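The idea of matching sample data with a business meaning can be illustrated with a small Python sketch. Note the assumptions: real cataloguing tools use machine learning and far richer signals, whereas this example swaps in simple regular expressions as a stand-in. The patterns, the term names, the 0.8 threshold and the `suggest_term` function are all hypothetical.

```python
import re

# Hypothetical patterns standing in for a trained classifier.
PATTERNS = {
    "Email address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "Date (ISO 8601)": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "Phone number": re.compile(r"\+\d[\d\s]{6,}\d"),
}


def suggest_term(samples, threshold=0.8):
    """Suggest a business term for a column based on sampled values.

    Returns the term whose pattern matches the largest share of non-empty
    samples, or None if no pattern reaches the threshold.
    """
    values = [s for s in samples if s]
    if not values:
        return None
    best_term, best_share = None, 0.0
    for term, pattern in PATTERNS.items():
        share = sum(1 for v in values if pattern.fullmatch(v)) / len(values)
        if share > best_share:
            best_term, best_share = term, share
    return best_term if best_share >= threshold else None


# Hypothetical sampled column values.
sampled_columns = {
    "crm.customers.contact": [
        "anna@example.com", "ben@example.org", "carl@example.net",
        "dina@example.com", "eva@example.org", "n/a",
    ],
    "dwh.orders.created": ["2023-01-15", "2023-02-01", "2023-02-17"],
    "dwh.orders.note": ["call back", "2023-02-01", "urgent"],
}

for column, samples in sampled_columns.items():
    print(column, "->", suggest_term(samples))
```

A suggestion like this is only a starting point. A column with no confident match, such as the free-text note above, is left for a person to classify.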
Information about horizontal lineage, i.e. the data flows from source to target, supports impact analysis for regression testing (following the flow downstream from a change) and, in the opposite direction, root cause analysis to address issues (tracing upstream from a problem). Documentation of vertical lineage, i.e. the linkages from business meaning to user interface language to database names, makes it possible to understand the data and the requirements related to it – whether your approach is top-down to create new data or bottom-up to understand existing data.
Even the best data cataloguing tool cannot replace people. However, a catalogue can be used as a shared workspace for everyone managing or using data, which prevents silos and working in isolation. With a collaboration platform, the automatically collected metadata can be enriched with the knowledge that people have gained through experience and that otherwise resides only in their minds, ensuring that the information is not lost, even when people leave the organization. Sharing data and metadata helps to answer data users’ questions and to ensure not only the efficient use but also the re-use of data, eliminating the need for different data stakeholders to repeat the same searches, analyses, or reporting, unaware of each other’s work.
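Horizontal lineage is essentially a directed graph, and the two analyses are walks through it in opposite directions. The Python sketch below illustrates this under stated assumptions: the dataset names, the `EDGES` list and the helper functions are hypothetical, and a real catalogue would collect the lineage automatically rather than from a hand-written list.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source dataset, target dataset).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.invoices", "staging.invoices"),
    ("staging.customers", "dwh.dim_customer"),
    ("staging.invoices", "dwh.fact_sales"),
    ("dwh.dim_customer", "report.sales_by_customer"),
    ("dwh.fact_sales", "report.sales_by_customer"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True flips every edge."""
    graph = defaultdict(set)
    for source, target in edges:
        if reverse:
            source, target = target, source
        graph[source].add(target)
    return graph


def reachable(graph, start):
    """Breadth-first search: every dataset reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


def impact_of(dataset):
    """Impact analysis: what is downstream of a change to this dataset?"""
    return sorted(reachable(build_graph(EDGES), dataset))


def possible_root_causes(dataset):
    """Root cause analysis: what is upstream of an issue in this dataset?"""
    return sorted(reachable(build_graph(EDGES, reverse=True), dataset))


print(impact_of("crm.customers"))
print(possible_root_causes("report.sales_by_customer"))
```

With this example data, a change to `crm.customers` affects the staging table, the customer dimension and the report, so those are the candidates for regression testing. An issue in the report, in turn, can originate in any of the six upstream datasets.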
The maturity of metadata management is still low in the Nordics. Apart from banks and a few other exceptions, businesses do not fully understand the potential impacts of metadata on cost-effectiveness, speed and agility in their operations. At the same time, metadata management and data cataloguing tools are rapidly developing to make the job easier – presenting opportunities for companies to step up their game to create competitive advantages with data.
Whatever the trigger for your organization is – regulatory requirements, data quality issues, analytics (AI/ML) – use it as an accelerator to get started, prioritize, and aim for a robust data catalogue, based on enterprise metadata management, that allows you to realize the full value of your data.
Hanna works as a senior business consultant in the data advisory team, focusing on metadata management and data cataloguing. She has experience in different data management roles, such as a metadata management product owner in banking and data process designer in finance data warehousing. In her role she supports customers in data-driven projects to ensure that data is defined and documented to gain the most value out of it.