A data glossary will help you to speak the language of data.
Content updated in June 2022.
To help you out, we put together a data glossary related to the many hot topics in the data domain. Here you will also find links to other content produced by our experts to dive deeper into the fascinating world of data.
An application programming interface (API) allows parties to exchange data or initiate transactions in a system or service. An API can also be seen as a contract between the API provider and API consumers: consumers receive the service (data or transaction) as promised and documented. APIs enable faster development of new business services, improve efficiency, reduce integration costs, and bring more agility and flexibility to the IT landscape.
API management platforms provide centralized, full-lifecycle control and security for APIs.
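The contract idea above can be illustrated with a tiny sketch. The service, function names, and response fields below are all hypothetical; the point is only that the consumer depends on the documented response shape, not on the provider's internals.

```python
import json

def customer_api(request: str) -> str:
    """Provider side of a hypothetical API: accepts a JSON request,
    returns a JSON response in the documented shape."""
    payload = json.loads(request)
    # A real service would query a backend system here.
    return json.dumps({"id": payload["id"], "name": "Acme Oy", "status": "active"})

def fetch_customer_name(customer_id: int) -> str:
    """Consumer side: relies only on the documented contract."""
    response = json.loads(customer_api(json.dumps({"id": customer_id})))
    return response["name"]

print(fetch_customer_name(42))  # prints "Acme Oy"
```

As long as the provider honors the documented shape, either side can change its internals without breaking the other.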
Why are APIs important to your business?
Citizen data science means that a knowledge worker without formal expertise in statistics and analytics uses data and analytics independently to solve business problems. Citizen data science is enabled by modern analytics tools that can automate data preparation and modeling tasks and minimize coding in data science work.
Self-service analytics is an approach to business analytics / business intelligence that enables business users to access data and build reports and analysis without heavy involvement from central analytics or IT team.
By training users to understand data, establishing security controls and auditing measures to ensure appropriate access, and providing curated datasets and intuitive, easy-to-use tools, organizations can democratize the use of data across different end-user roles and skill levels.
Corporate performance management (CPM), also called enterprise performance management (EPM) and business performance management, helps monitor and manage the business performance of an organization. CPM/EPM solutions are designed to streamline financial processes and provide an integrated approach to activities such as budgeting, planning, and forecasting; consolidation and financial close; and performance analysis.
A modern, cloud-based data and analytics platform combines traditional reporting with modern analytics and data science services. It provides a platform, for example, for data-based applications that use artificial intelligence.
The platform can use all forms of data from within the organization, from partners, and from external parties. The data can be processed almost in real time into various data products.
The platform consists of public cloud services, typically native components from cloud providers and third-party tools available via cloud marketplaces. The services' license models and technical features are flexible, so costs are calculated according to usage. Moreover, the services can be sliced for different user groups, which can be very useful: for example, running a heavy analytical process does not interfere with normal operational data processes, and an annual process run doesn't require reserving resources for the whole year.
Data architecture is a part of the overall enterprise architecture and can refer to several perspectives. It often relates to the artifacts of data architecture on multiple abstraction levels, such as data models, definitions, and descriptions of information flows and metadata.
With the artifacts, a system project's data processing can be designed and implemented to support data reuse, quality, data security, and privacy, as well as to meet business requirements across functional silos.
A data asset is an entity comprised of data that companies use to generate value, such as revenue. It can be, for example, a system, an application output file, a database, a document, or a web page.
A data catalog is an organized inventory of available data assets, combined with data management and search tools. Data catalogs use metadata to describe data assets and are designed to help data users find the data they need for analytics or other business purposes.
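The core mechanism, metadata records searched by keyword, can be sketched in a few lines. The asset names, owners, and tags below are invented for illustration; a real catalog would add lineage, quality scores, and access controls.

```python
# A toy data catalog: metadata records describing data assets,
# plus a keyword search over that metadata.
catalog = [
    {"name": "sales_orders", "owner": "finance", "tags": ["sales", "orders", "daily"]},
    {"name": "customer_master", "owner": "crm", "tags": ["customer", "master data"]},
    {"name": "web_clickstream", "owner": "marketing", "tags": ["web", "events"]},
]

def search(keyword: str) -> list[str]:
    """Return names of assets whose metadata mentions the keyword."""
    keyword = keyword.lower()
    return [a["name"] for a in catalog
            if keyword in a["name"] or any(keyword in t for t in a["tags"])]

print(search("customer"))  # → ['customer_master']
```

The catalog never stores the data itself, only metadata about where the data lives and what it contains.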
A data ecosystem is an open or closed network with an interest in exchanging data between the actors of the network, following common rules such as shared interfaces and data models. The members of an ecosystem have one thing in common: they all benefit from the data so much that it's worth sharing their data with the network. The exchange can also rely on monetary compensation.
A data ecosystem shares a vision of enabling more diverse data and solutions than a single actor could achieve alone. An ecosystem can have an owner, in which case it is about the dominance and benefit of one actor. Alternatively, ecosystem ownership can be decentralized to the members, making all ecosystem actors equal.
In some cases, a data ecosystem has a separate operator that handles coordination between actors and data transfers without utilizing the data in its own operations.
Being involved in a suitable data ecosystem or owning one can, at best, be a significant competitive advantage.
Basically, data governance is about data ownership. Just as the owners of a company's business units, equipment, and properties manage the use of those assets and strive to maximize their business benefits, the same should apply to company-owned data sets.
The owner of a data set is responsible for ensuring the data is of good quality and making sure the user rights comply with the set rules. Thus, corporate data governance should define the policies and tools for data owners and other users of data.
Data governance includes the idea of enabling access and visibility into the data for as many employees as possible—across organizational units. Data access should only be restricted for good reasons, such as privacy.
To comprehensively use the data and develop a data-driven business, the organization needs to have an existing and implemented data governance model. If the model does not exist yet, it's good to start from a data set that has the most business value and is prioritized by the organization's top management.
Often, the quickest results happen when the starting point is an analytics development project that generates a significant business advantage.
A data lifecycle refers to the different stages of data elements and data resources, from the creation of information to its destruction. The stages can include storing, warehousing, transferring, using, and archiving the information. Like every other product's lifecycle, the data lifecycle needs to be managed. Successful organizations govern each stage of the data lifecycle with policies and practices to maximize the data's value.
Due to data security and privacy requirements, it is important to set business requirements for the end of the data lifecycle as well. Those requirements can include rules, such as how long the information can/should be stored and why.
Data lineage refers to the visualization of the data lifecycle: metadata management systems visualize data transfers between various systems and describe how the data is transformed from its source to its users.
Data literacy is a term used to describe an individual’s ability to read, understand, and utilize data in different ways. As the ability to work with data and use it for decision-making has been recognized as an essential skill for executives, managers, and employees alike, data literacy training programs are often included in organization-wide data and analytics initiatives.
A data pipeline is a controlled function for data processing and data product creation that brings business value. A data product can be, for example, a report or a prediction produced by a machine learning algorithm that’s used via an interface.
The data pipeline includes and combines several components. The components cover data source reading, editing, analyzing, storing the data in different data models, and activating the data through the processed data product. The components are based on a micro-service model, which means that individual components may have different developers and life cycles.
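The component structure described above can be sketched as a chain of small, independently replaceable stages. The stage names and the data are invented for illustration; real pipelines run on orchestration and processing tools rather than plain functions.

```python
# Minimal data-pipeline sketch: read → transform → analyze → publish.
# Each stage is a separate component, mirroring the micro-service idea:
# any stage can be swapped out without touching the others.
raw_rows = ["2022-06-01,120", "2022-06-02,95", "2022-06-03,143"]

def read(rows):                      # source reading
    return [r.split(",") for r in rows]

def transform(records):              # editing and typing the raw values
    return [{"date": d, "amount": int(a)} for d, a in records]

def analyze(records):                # analysis step
    return {"days": len(records), "total": sum(r["amount"] for r in records)}

def publish(summary):                # the data product, e.g. one report line
    return f"{summary['days']} days, total {summary['total']}"

print(publish(analyze(transform(read(raw_rows)))))  # → 3 days, total 358
```

Because each stage has a clear input and output, the stages can have different developers and lifecycles, as the text notes.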
A business transformation aims for fundamental changes in a business or its processes. A data-driven business transformation uses data and analytics to enable those changes.
Today, many organizations either use little or no data beyond traditional financial reporting, or use it only in certain operations. The new data-driven way of thinking harnesses data to improve business, management, and service production processes across the organization.
A data-driven business transformation means not only deploying the technology but also developing data availability, data quality, procedures, and a data-driven culture.
A data-driven approach means that an organization makes decisions based on information. Being able to make data-driven decisions requires reliable and accessible data. Having the technology and systems is not enough; success also takes people and cultural change. The data-driven approach creates new opportunities: used correctly, your data will not only streamline your operations but also improve results, give you a competitive advantage, and create new business opportunities.
DataOps (data operations) refers to an operating model that uses various personnel roles and technologies to manage data pipelines automatically, and to support data-driven business development.
Companies understand the value of data better than before, but commercializing business data for judicious use requires collaboration between business processes and organizations. As it's important to be able to quickly produce value-adding entities (data products) from business data, this collaboration requires a new kind of approach. DataOps aims to meet that need.
In practice, DataOps is an interdisciplinary team formed around a business problem. The team uses an agile way of working, DevOps practices, and automation to manage the entire data supply chain from source to value. It organizes the data, tools, code, and development environments while taking care of scalability, functionality, and changes in data pipelines. Following the principles of continuous delivery, the team strives to swiftly generate valuable information from source data to support the business.
A data strategy is a shared view on how data and analytics are used to achieve strategic business objectives. A good data strategy includes a strong vision and business rationale, short-term and long-term objectives, defined roles and responsibilities, and metrics for success. Essentially, it aligns and prioritizes data and analytics activities with key organizational priorities and goals, and works as a tool for communication between business, IT, and the data organization.
The growth in data volumes, changes in source systems, and demands to make data available for decision-making in near real time have put pressure on traditional data warehousing design methods such as Kimball and Inmon. The Data Vault method is designed to improve agility and scalability in data modeling, especially in large-scale enterprise data warehouses.
The Data Vault paradigm embraces the idea that data models change and expand over time. The Data Vault model enables incremental changes and frequent updates to the data model. However, the modeling work needs to be done meticulously and correctly, making the process prone to human errors. Thus, a data warehouse automation tool is recommended to leverage the pros of a Data Vault.
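One Data Vault idiom behind that incremental extensibility can be sketched briefly: a hub stores only a business key and a derived hash key, while descriptive attributes live in satellites, so new attributes extend the model without altering existing tables. The table and column names below are illustrative, and real implementations live in SQL with automation tooling, as the text recommends.

```python
import hashlib

def hash_key(business_key: str) -> str:
    """Deterministic surrogate key from a normalized business key
    (a common Data Vault pattern; MD5 is one conventional choice)."""
    return hashlib.md5(business_key.strip().upper().encode()).hexdigest()

# Hub row: only the business key and its hash key.
hub_customer = {"hub_customer_hk": hash_key("cust-1001"), "customer_bk": "cust-1001"}

# Satellite row: descriptive, history-tracked attributes keyed by the same hash.
sat_customer = {"hub_customer_hk": hub_customer["hub_customer_hk"],
                "load_date": "2022-06-01", "name": "Acme Oy", "segment": "B2B"}

# The same business key always yields the same hash key, regardless of
# casing or stray whitespace in the source system:
assert hash_key(" CUST-1001 ") == hub_customer["hub_customer_hk"]
```

Adding a new attribute means adding a new satellite, not rewriting the hub, which is why frequent model changes stay manageable.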
Data warehouse automation (DWA) eliminates repetitive design, development, deployment, and operational tasks within the data warehouse lifecycle and thus accelerates the availability of analytics-ready data. Data warehouse automation solutions are integrated platforms that include design tools, development wizards, templates and visual interfaces. Following efficient design patterns instead of custom development improves speed and quality in data warehouse development and maintenance.
DWA solutions enable organizations to leverage the benefits of modern data modeling paradigms such as Data Vault, and serve as a tool for creating a shared understanding of the data model between data engineers and business users. Data warehouse automation is considered a crucial ingredient in the DataOps way of working.
Edge analytics is a model of data analysis that brings data analysis and processing to the location where the data is collected. Instead of sending data back to a centralized data store, incoming data streams are automatically analyzed at the network edge, for example in a self-driving car, a mobile phone, or another connected device. The key benefit of edge analytics is speed: it eliminates latency and enables real-time analytics.
Enterprise content management is an umbrella term for the methods, tools, and strategies that allow an organization to capture, manage, and deliver critical information in any format to its employees, stakeholders, and customers.
Today, over 70% of organizations' information is unstructured, and content ranges from paper documents, such as invoices and contracts, to emails, images, and video files. Content management solutions are designed to automate and digitalize content-intensive processes, deliver relevant content to users when they need it, and keep organizations legally compliant in managing their information.
Master data represents the key data entities of a company, such as customers, suppliers, and products, and the relationships between these data domains. It is the data that is commonly used across business processes, organizational units, and between operational systems and reporting & analytics systems, and therefore should be managed in one place.
Master data management refers to the discipline and technologies used to coordinate master data across the enterprise. The need for master data management emerges from the necessity for organizations to improve the consistency and quality of their core data assets.
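The consistency problem can be made concrete with a small sketch: the same real-world customer appears in two operational systems with slightly different spellings, and MDM matches the records on a normalized key to produce one "golden record". All system names, records, and the matching rule are invented for illustration; real MDM tools use far richer matching logic.

```python
# Toy master data consolidation across two hypothetical source systems.
crm = [{"id": "C-1", "name": "Acme Oy", "email": "info@acme.fi"}]
billing = [{"id": "B-7", "name": "ACME OY ", "email": "info@acme.fi"}]

def match_key(record: dict) -> str:
    """Simplistic matching rule: normalized email address."""
    return record["email"].strip().lower()

golden: dict[str, dict] = {}
for source in (crm, billing):
    for rec in source:
        key = match_key(rec)
        # First occurrence wins the attribute values; every source id is kept.
        golden.setdefault(key, {"name": rec["name"].strip(), "source_ids": []})
        golden[key]["source_ids"].append(rec["id"])

print(golden)  # one golden record pointing back to both source systems
```

The golden record keeps pointers back to each operational system, which is what lets reporting and operations agree on who the customer actually is.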
Machine learning and statistical methods allow us to model future events based on previous data. Such modeling is called predictive analytics. Typical applications include, for example, customer attrition forecasts, financial predictions, and predicting machinery maintenance needs.
Modeling that derives new information from past events is also called predictive analytics. Sentiment analysis is a good example: the tone of customer feedback is assessed automatically, which enables an immediate reaction to negative comments.
Predictive analytics is usually distinguished from descriptive analytics: instead of reporting the situation with available information, predictive analytics produces new information.
A data warehouse supports the organization's traditional core functions and obtains answers to defined questions from known source data. A data lake supports a more predictive and experimental approach.
A data warehouse is mainly for structural information processing. A data lake enables the processing of all kinds of data in the organization. As the data warehouse and data lake are used for different purposes, they complement each other.
A data lake is often used together with a data warehouse: the lake stores all the raw data, and only an applicable part of it is loaded into the data warehouse. Recently, we've seen new solutions on the market that combine a data lake and a data warehouse. Such a hybrid solution doesn't have a well-established term yet.
Artificial intelligence (AI) is an umbrella term for solutions that are regarded as intelligent. Search engines, smart speakers, and self-driving cars are examples of artificial intelligence. It’s often associated with system autonomy and independence from human decision-making. Analytics, on the other hand, refers to data-based reporting and visualization produced for human decision-making.
The information obtained with artificial intelligence, which in practice often means machine learning, can be utilized in analytics. At the same time, analytics on the available data is often used to develop artificial intelligence: for example, we can find out what people want from a smart speaker and how the product meets customers' needs. Likewise, artificial intelligence-based decision-making systems require accurate analysis of financial figures.
Compared to analytics, artificial intelligence takes many steps further towards independent data use.
Subscribe to our Tietoa Tulevasta podcast on Spotify and follow us on Instagram to stay tuned. Our podcast brings data glossary terms to life with practical examples from everyday business life. When your business needs new technology to reach your goals, Tietoevry's in-depth expertise is at your service.
Data changes the world – does your company take full advantage of its benefits? Join Data Insiders, the #1 Nordic data community, a powerful network of top professionals and visionaries of data-driven business.
Data Insiders addresses the trends and phenomena around this hot topic in an understandable and interesting way. Together we share knowledge, offer collegial support and reveal the truth behind hype and buzzwords. We seek the answer to one particular question: how can data help us all to do better business?
Join Data Insiders today and stay at the forefront of the data revolution with access to quality podcasts, peer events and insights.
Janne is a data generalist who uncovers something new about data and analytics every day. To him, data is the most thrilling domain to work in because the pace of change is fast, the opportunities are endless, and the community is always eager to collaborate and share knowledge. His mission is to uncover the truth behind buzzwords, so that the real success stories are heard.