A glossary will help you get up to speed on the world of data.
Content updated in November 2021.
To help you out, we put together a data glossary related to the many hot topics in the data domain. You’ll find links to other content produced by our experts to dive deeper into the fascinating world of data.
API monetization is the process by which businesses generate revenue from their existing data and APIs. Monetization allows API providers to move beyond their current business models, scale API programs, and create more value for their customers, developers, and partners.
An API (application programming interface) allows parties to exchange data or initiate transactions in a system or service. An API can also be seen as a contract between the API provider and API consumers: consumers receive the service (data or a transaction) as promised and documented. APIs enable businesses to develop new services faster, improve efficiency, and decrease the costs of the integration landscape. They also bring more agility and flexibility to the IT landscape.
There are three types of APIs: private/internal APIs, customer/partner APIs, and public/open APIs. Companies use internal APIs, for example, to integrate internal applications, microservices, and cloud services. Customer APIs open up new real-time services for customers, while partner APIs enable more agile information sharing and new business models with partners. Open APIs bring entirely new business possibilities for companies, such as various web and mobile applications.
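The API-as-contract idea can be sketched in miniature: the provider promises a documented response shape, and the consumer relies on it. The endpoint name, fields, and customer record below are invented for illustration.

```python
import json

def get_customer(customer_id: int) -> str:
    """Provider side: returns the documented JSON response for a customer."""
    record = {"id": customer_id, "name": "Acme Oy", "tier": "partner"}
    return json.dumps(record)

def consumer(customer_id: int) -> str:
    """Consumer side: trusts the contract and reads the promised fields."""
    data = json.loads(get_customer(customer_id))
    return f'{data["name"]} ({data["tier"]})'

print(consumer(42))  # Acme Oy (partner)
```

In a real integration the provider would sit behind an HTTP endpoint and the contract would be documented, for example, as an OpenAPI specification; the point here is only that both sides depend on the same promised shape.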
API management platforms ensure centralized, smooth control of the full API lifecycle. API management also secures APIs and provides the ability to optimize and monitor them and to produce analytics about their use.
A modern, cloud-based data and analytics platform combines traditional reporting with modern analytics and data science services. It provides a platform, for example, for data-based applications that use artificial intelligence.
The platform uses all forms of data from within the organization, from partners, and from external parties. The data is processed almost in real time into various data products, allowing an up-to-date view of the organization's situation. In addition to basic information, that view can include predictions produced by machine learning algorithms.
The platform consists of modern public cloud services. The services' license models and technical features are flexible, so the service's cost is calculated according to the usage. Moreover, the services can be sliced for different user groups, which can be very useful: For example, running a heavy analytical process does not interfere with normal operational data processes. Thus, an annual process run doesn't require reserving resources for a whole year.
Data architecture is a part of the overall architecture and can refer to several perspectives. It often relates to the artifacts of data architecture on multiple abstraction levels, such as data models, definitions, and descriptions of information flows and metadata.
With the artifacts, a system project's data processing can be designed and implemented to support data reuse, quality, data security, and privacy, as well as to meet business requirements across functional silos.
A data ecosystem is an open or closed network with an interest in exchanging data between the actors of the network, following common rules: interfaces and data models. The members of an ecosystem have one thing in common: they all benefit from the data so much that it's worth sharing their data with the network. The exchange can also rely on monetary compensation.
A data ecosystem shares a vision of enabling more diverse data and solutions than a single actor could achieve alone. An ecosystem can have an owner, in which case it is about the dominance and benefit of one actor. Alternatively, ecosystem ownership can be decentralized to the members, making all ecosystem actors equal.
In some cases, a data ecosystem has a separate operator that handles communication between the actors and the data transfer without utilizing the data in its own operations.
Being involved in a suitable data ecosystem or owning one can, at best, be a significant competitive advantage.
Basically, data governance is about data ownership. Just as the owners of a company's business units, equipment, and properties manage the use of those assets and strive to maximize their business benefits, so it should be with company-owned data sets.
The owner of a data set is responsible for ensuring the data is of good quality and making sure the user rights comply with the set rules. Thus, corporate data governance should define the policies and tools for data owners and other users of data.
Data governance includes the idea of enabling access and visibility into the data for as many employees as possible—across organizational units. Data access should only be restricted for good reasons, such as privacy.
To comprehensively use the data and develop a data-driven business, the organization needs to have an existing and implemented data governance model. If the model doesn't exist yet, it's good to start from a data set that has the most business value and is prioritized by the organization's top management.
Often, the quickest results happen when the starting point is an analytics development project that generates a significant business advantage.
A data lifecycle refers to the different stages of data elements and data resources from the creation of information to its destruction. The stages can include storing, warehousing, transferring, using, and archiving the information.
Due to data security and privacy requirements, it is important to set business requirements for the end of the data lifecycle as well. Those requirements can include rules, such as how long the information can/should be stored and why.
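End-of-lifecycle rules like these can be expressed as data and checked automatically. A minimal sketch, where the data classes and retention periods are invented for illustration:

```python
from datetime import date, timedelta

# Hypothetical retention policy: how long each class of data may be stored.
RETENTION = {
    "customer_pii": timedelta(days=365),  # e.g. a privacy requirement
    "telemetry": timedelta(days=90),      # e.g. a cost/relevance decision
}

def must_be_deleted(data_class: str, created: date, today: date) -> bool:
    """True when a record has outlived the retention period of its class."""
    return today - created > RETENTION[data_class]

print(must_be_deleted("telemetry", date(2021, 1, 1), date(2021, 11, 1)))     # True
print(must_be_deleted("customer_pii", date(2021, 1, 1), date(2021, 11, 1)))  # False
```

In practice such rules live in a data governance catalog rather than in code, but making them explicit and machine-checkable is what turns a lifecycle policy into something enforceable.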
Data lineage refers to the visualization of the data lifecycle: metadata management systems visualize the data transfers between various systems and describe how the data transforms on its way from the source to its users.
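Under the hood, lineage is a dependency graph: each data product points to its direct sources, and tracing lineage means walking that graph upstream. A toy sketch with invented system names:

```python
# Each data product maps to its direct sources.
lineage = {
    "sales_report": ["sales_fact"],
    "sales_fact": ["crm_export", "erp_orders"],
    "crm_export": [],
    "erp_orders": [],
}

def upstream(node: str, graph: dict) -> set:
    """Collect every source a data product ultimately depends on."""
    sources = set()
    for parent in graph.get(node, []):
        sources.add(parent)
        sources |= upstream(parent, graph)
    return sources

print(sorted(upstream("sales_report", lineage)))
# ['crm_export', 'erp_orders', 'sales_fact']
```

Real metadata management tools build and render this graph automatically from pipeline metadata; the recursive walk is the same idea.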
The business of manufacturing companies depends on building equipment that is either sold or rented to a customer. Such companies collect plenty of information about their business operations, including sales (what has been sold and to whom), the components used in production, equipment usage, and maintenance information.
If this data is managed properly, the life cycle of devices can be accurately modeled. This enables the production of various services, such as financing solutions based on the use of equipment, proactive maintenance, and the sale of spare parts.
If the data is not of high quality, the digitalization of the business is impossible.
Combining and enriching data from different basic systems and making predictions based on that data enables the automation of service processes related to all equipment delivered to customers.
Read more about the topic or listen to our Tietoa Tulevasta podcast, which explores data management and digitization of a globally operating manufacturing company. The article and podcast are in Finnish.
A data pipeline is a controlled function for data processing and data product creation that brings business value. A data product can be, for example, a report or a prediction produced by a machine learning algorithm that’s used via an interface.
The data pipeline includes and combines several components. The components cover data source reading, editing, analyzing, storing the data in different data models, and activating the data through the processed data product. The components are based on a micro-service model, which means that individual components may have different developers and life cycles.
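The component idea can be sketched as a composition of small, independently replaceable stages. The devices, thresholds, and field names below are hypothetical:

```python
# Each stage is an independent component, as in the micro-service model;
# the pipeline is just their composition, and any stage can be swapped out.

def read_source() -> list:
    """Stand-in for reading from a source system."""
    return [{"device": "pump-1", "temp_c": 71}, {"device": "pump-2", "temp_c": 104}]

def transform(rows: list) -> list:
    """Enrich the raw readings with a derived flag."""
    return [{**r, "overheating": r["temp_c"] > 90} for r in rows]

def publish(rows: list) -> list:
    """Stand-in for activating the data product via an interface."""
    return [r["device"] for r in rows if r["overheating"]]

def pipeline() -> list:
    return publish(transform(read_source()))

print(pipeline())  # ['pump-2']
```

In a production setting each stage would be a separately deployed service or job with its own owner and release cycle, which is exactly why individual components can have different developers and lifecycles.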
A business transformation aims for fundamental changes in a business or its processes. A data-driven business transformation uses data and analytics to enable those changes.
At the moment, many organizations use little or no data beyond traditional financial reporting, or use it only in certain operations. The new data-driven way of thinking harnesses data to improve business, management, and service production processes across the organization.
A data-driven business transformation means not only deploying the technology but also developing data availability, data quality, procedures, and a data-driven culture.
A data-driven approach means that an organization makes decisions based on information. Making data-driven decisions requires reliable and accessible data. Having the technology and systems is not enough; success also takes people and cultural change. The data-driven approach creates new opportunities: used correctly, your data will not only streamline your operations but also improve results, give you a competitive advantage, and create new business opportunities.
DataOps (data operations) refers to an operating model that uses various personnel roles and technologies to manage data pipelines automatically, and to support data-driven business development.
Companies understand the value of data better than before, but commercializing business data for judicious use requires collaboration between business processes and organizations. Since it's important to be able to quickly produce value-adding entities (data products) from business data, this collaboration requires a new kind of approach. The goal of DataOps is to meet that need.
In practice, DataOps means an interdisciplinary team formed around a business problem that uses an agile way of working, DevOps practices, and automation to manage the entire data supply chain from source to value. The team organizes the data, tools, code, and development environments while taking care of scalability, functionality, and changes in the data pipelines. Following the principles of continuous delivery, the team strives to swiftly generate valuable information from source data to support the business.
The growth in data volumes, changes in source systems, and demands to make data available for decision-making in near real time have put pressure on traditional data warehousing design methods such as Kimball and Inmon. The Data Vault method is designed to improve agility and scalability in data modeling, especially in large-scale enterprise data warehouses.
The Data Vault paradigm embraces the idea that data models change and expand over time. The Data Vault model enables incremental changes and frequent updates to the data model. However, the modeling work needs to be done meticulously and correctly, making the process prone to human errors. Thus, a data warehouse automation tool is recommended to leverage the pros of a Data Vault.
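The incremental nature of the model can be illustrated in miniature. In Data Vault, hubs hold business keys, satellites hold descriptive attributes, and links hold relationships; adding a new satellite extends the model without altering existing structures. The tables and keys below are invented, and plain Python lists stand in for database tables:

```python
# Hub: the stable business key, identified by a hash key ("hk").
hub_customer = [{"hk": "c1", "business_key": "CUST-001"}]

# Satellite: descriptive attributes attached to the hub.
sat_customer_name = [{"hk": "c1", "name": "Acme Oy", "load_date": "2021-11-01"}]

# An incremental change: a brand-new satellite, no ALTER on the hub.
sat_customer_segment = [{"hk": "c1", "segment": "enterprise", "load_date": "2021-11-15"}]

def customer_view(hub: list, *sats: list) -> dict:
    """Join the hub and its satellites on the hash key into flat records."""
    out = {row["hk"]: {"business_key": row["business_key"]} for row in hub}
    for sat in sats:
        for row in sat:
            out[row["hk"]].update(
                {k: v for k, v in row.items() if k not in ("hk", "load_date")}
            )
    return out

print(customer_view(hub_customer, sat_customer_name, sat_customer_segment))
```

In a real warehouse these would be tables and the view would be generated SQL; a data warehouse automation tool generates exactly this kind of repetitive join and load logic from the model, which is why it pairs so well with Data Vault.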
Data warehouse automation (DWA) eliminates repetitive design, development, deployment, and operational tasks within the data warehouse lifecycle and thus accelerates the availability of analytics-ready data. Data warehouse automation solutions are integrated platforms that include design tools, development wizards, templates and visual interfaces. Following efficient design patterns instead of custom development improves speed and quality in data warehouse development and maintenance.
DWA solutions make it possible to leverage the benefits of modern data modeling paradigms such as Data Vault, and they serve as a tool for creating a shared understanding of the data model between data engineers and business users. Data warehouse automation is considered a crucial ingredient of the DataOps way of working.
Industry 4.0 is a vision of an advanced industry that leverages ecosystems, the industrial Internet, modern technologies, and new business models. The vision is based on the digital transformation of traditional manufacturing and production methods. It is driven by the explosive growth of intelligence and compatibility of machines and devices, as well as rapidly evolving technologies such as digital production chains, robotics, sensors, 3D printing, augmented reality, digital twins, Big Data platforms, artificial intelligence, and machine learning.
Cyber-physical systems are at the core of Industry 4.0. They describe intelligent, interconnected industrial production and logistics units that can communicate with each other, and operate and adapt independently in versatile conditions. The operation of such systems also requires and produces a lot of data, and using this data requires analyzing and processing it with the help of artificial intelligence and machine learning.
Proactive integration, information transparency, and transmission between companies, customers, and products are thus the key to harnessing the benefits of technological development. Hence, data-driven thinking, analytics, data ecosystems, and data management will play an even more significant role in business in the future.
Machine learning and statistical methods allow us to model future events based on previous data. Such modeling is called predictive analytics. Typical applications for predictive analytics can be, for example, customer attrition expectations, financial data predictions, and predicting machinery maintenance needs.
Modeling that derives new information from past events is also a form of predictive analytics. Sentiment analysis is a good example: the tone of customer feedback is assessed automatically, enabling an immediate reaction to negative comments.
Predictive analytics is usually distinguished from descriptive analytics: instead of reporting the situation with the information already available, predictive analytics produces new information.
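At its simplest, "modeling future events based on previous data" can mean fitting a trend line to past observations and extrapolating it one step forward. A minimal sketch with invented monthly sales figures (real predictive models use richer features and machine learning, but the principle is the same):

```python
def fit_line(ys: list) -> tuple:
    """Ordinary least squares for y = a*x + b over x = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

sales = [100, 110, 120, 130]       # past monthly observations (hypothetical)
a, b = fit_line(sales)
print(a * len(sales) + b)          # predicted next month: 140.0
```

Descriptive analytics would stop at reporting the four known months; the prediction for the fifth month is the new information that predictive analytics adds.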
A data warehouse supports the organization's traditional core functions and obtains answers to defined questions from known source data. A data lake supports a more predictive and experimental approach.
A data warehouse is mainly for processing structured information. A data lake enables the processing of all kinds of data in the organization. As they are used for different purposes, the data warehouse and the data lake complement each other.
A data lake is often used together with a data warehouse to store all the raw data, and only an applicable part of it is transmitted to the data warehouse. Recently, we've seen new solutions on the market that combine a data lake and a data warehouse. Such a hybrid solution doesn't have a well-established term yet.
Artificial intelligence (AI) is an umbrella term for solutions that are regarded as intelligent. Search engines, smart speakers, and self-driving cars are examples of artificial intelligence. It’s often associated with system autonomy and independence from human decision-making. Analytics, on the other hand, refers to data-based reporting and visualization produced for human decision-making.
The information obtained with artificial intelligence—which often means machine learning—can be utilized with analytics. At the same time, analytics of the available data is often used to develop artificial intelligence. For example, we can find out what people want from a smart speaker and how the product meets customers’ needs. Artificial intelligence-based decision-making systems require accurate analysis of financial figures.
Compared to analytics, artificial intelligence takes many steps further towards independent data use.
Subscribe to our Tietoa Tulevasta podcast on Spotify and follow us on Instagram to stay tuned. Our podcast brings data glossary terms to life with practical examples from everyday business life. Once your business needs new technology to reach your goals, Tietoevry’s in-depth expertise is at your service.
Data changes the world – does your company take full advantage of its benefits? Join Data Insiders, the #1 Nordic data community, a powerful network of top professionals and visionaries of data-driven business.
Data Insiders addresses the trends and phenomena around this hot topic in an understandable and interesting way. Together we share knowledge, offer collegial support and reveal the truth behind hype and buzzwords. We seek the answer to one particular question: how can data help us all to do better business?
Join Data Insiders today and stay at the forefront of the data revolution with access to quality podcasts, peer events and insights.