Author: Rick F. van der Lans
Date: June 2018
The classic data warehouse architecture, consisting of a chain of databases and ETL processes, has served many organizations well for the last twenty-five years; see Figure 1. But is it still the right architecture? For example, is it the right architecture for all our new forms of data usage, such as self-service BI, data science, and customer-based apps, and can it easily deal with all forms of big data? For more and more organizations the answer is: no. Many of them have started to look for an alternative, more flexible architecture. The one many have found is the logical data warehouse architecture. This is the topic of this third article in a series on use cases of data virtualization.
The Logical Data Warehouse Architecture in a Nutshell
The logical data warehouse architecture is an agile architecture for developing BI systems, in which data consumers and data stores are decoupled from each other; see Figure 2. The logical data warehouse architecture presents all the data stored in a heterogeneous set of data stores as a single logical database. In this architecture, data consumers don’t have to be aware of where and how the data is stored. All the details of data storage are hidden from them. They should not have to know or care whether the data they’re using comes from a data mart, a data warehouse, or even a production database. They should not have to be aware that data from multiple data stores has to be joined, nor should they know whether they are accessing a SQL database, a Hadoop cluster, a NoSQL database, a web service, or simply one or more flat files. The structure of the data stores is hidden as well; data consumers only see the data in the way that’s convenient for them, and they only see data that is relevant to their task. This is all achieved by decoupling data consumers from data stores.
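To make the decoupling idea concrete, here is a minimal sketch in Python, not a real data virtualization server. The sources, table names, and the `customer_totals` function are all invented for illustration: one source is a SQL database (an in-memory SQLite stand-in) and the other is a flat file (a CSV held in a string), yet the consumer sees a single result and never learns where the data lives or that a join took place.

```python
import csv
import io
import sqlite3

# Source 1: a "production" SQL database (an in-memory SQLite stand-in).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Acme"), (2, "Globex")])

# Source 2: a flat file (a CSV document, held here in a string).
orders_csv = "customer_id,amount\n1,100\n1,250\n2,75\n"

def customer_totals():
    """A 'virtual table': joins two heterogeneous sources on demand.

    The data consumer calls this function and is unaware that one
    source is a SQL database and the other a flat file.
    """
    # Aggregate order amounts per customer from the flat file.
    totals = {}
    for row in csv.DictReader(io.StringIO(orders_csv)):
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0) + int(row["amount"])
    # Join with customer names from the SQL database.
    rows = db.execute("SELECT id, name FROM customers").fetchall()
    return {name: totals.get(cid, 0) for cid, name in rows}

print(customer_totals())  # {'Acme': 350, 'Globex': 75}
```

The consumer's view (`customer_totals`) could later be redirected to different physical stores without the consumer noticing, which is exactly the decoupling the architecture aims for.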
Here Comes Data Virtualization
Now, there is not one supernatural tool that can do all the above and magically turn an existing data warehouse architecture into a logical data warehouse one. No silver bullet exists. Several tools are needed to accomplish this, such as a database server, a master data management system, and a data cleansing tool. However, the most important component is the data virtualization server. It’s the driving technology of the entire architecture.
Data virtualization servers support all the right features to develop a logical data warehouse. They provide the right features for data security, scalability, query performance, agile development, reuse of metadata, discovery and search of specifications, big-data access, and so on. But most importantly, they offer a comprehensive abstraction layer that decouples data consumers from data stores.
Features of Data Virtualization Servers
The following features make data virtualization the right technology:
- On-demand data transformation
- On-demand data integration
- On-demand data federation
- On-demand data cleansing
- ETL (lite)
- Data source-aware query optimization
- Network-aware query optimization
- Scheduling jobs
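The first items on this list, on-demand transformation and cleansing, can be sketched as follows. This is an illustration, not a vendor API; the raw rows and the `virtual_sales` generator are invented. The point is that the source data stays untouched and messy, while cleansing rules run only at query time.

```python
# Raw source data: inconsistent casing, stray whitespace, numbers as text.
RAW_ROWS = [
    {"country": "us ", "revenue": "1200"},
    {"country": "NL", "revenue": "950"},
    {"country": " us", "revenue": "300"},
]

def virtual_sales():
    """Yield cleansed, typed rows on demand; nothing is materialized."""
    for row in RAW_ROWS:
        yield {
            "country": row["country"].strip().upper(),  # normalize codes
            "revenue": int(row["revenue"]),             # cast text to int
        }

# Consumers see clean, typed data even though the source is messy text.
by_country = {}
for row in virtual_sales():
    by_country[row["country"]] = by_country.get(row["country"], 0) + row["revenue"]

print(by_country)  # {'US': 1500, 'NL': 950}
```

If the cleansing rules change, only the specification (`virtual_sales`) changes; no stored copy of the data has to be rebuilt.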
Is “Logical” the Right Term?
One important side remark: due to the term logical data warehouse, some have the impression that such an architecture does not require physical data stores at all. They assume that every time data is queried, the production systems are accessed. This is not the case. For various reasons, data stores are still needed. For example, if a production system doesn’t keep track of historical data, that data has to be stored somewhere else, meaning the logical data warehouse architecture needs a separate data store; see also Figure 2. Or the production system can’t handle the extra workload generated by the data warehouse; in this case, data has to be physically copied to a separate data store. The caching mechanism of the data virtualization server can be used here.
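The caching mechanism mentioned above can be sketched in a few lines, under the assumption of a simple time-to-live (TTL) policy; real data virtualization servers offer richer refresh strategies, and the function names here are invented. The idea is that the expensive source is queried once, and subsequent queries within the TTL are served from the cached copy, shielding the production system from the extra workload.

```python
import time

def slow_production_query():
    """Stand-in for a query the production system can't afford to run often."""
    time.sleep(0.01)  # simulate load on the source system
    return [("2024-01", 100), ("2024-02", 140)]

_cache = {"rows": None, "loaded_at": 0.0}
TTL_SECONDS = 60.0

def cached_history():
    """Serve from the cache within the TTL; refresh from the source otherwise."""
    now = time.monotonic()
    if _cache["rows"] is None or now - _cache["loaded_at"] > TTL_SECONDS:
        _cache["rows"] = slow_production_query()  # hit the source once
        _cache["loaded_at"] = now
    return _cache["rows"]

first = cached_history()   # queries the production system
second = cached_history()  # served from the cache, source untouched
print(second is first)     # True
```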
In other words, the word logical in the name logical data warehouse doesn’t mean there are no physical data stores. It means that we try to minimize the number of physical data stores. If they are not really needed, they are not developed. The fewer physical data stores that are created and the less data that is duplicated, the more flexible the architecture is.
Why the Logical Data Warehouse is Agile
The technology used to develop classic data warehouse architectures demands that everything is built right the first time. Changing specifications afterwards can be time-consuming and expensive. This is not the case for data virtualization servers. Changing the data structures or the transformation logic of virtual tables only involves changing the specifications. There is no need, for example, to unload and reload tables. Almost all the work that has to be done is simply defining new specifications or changing existing ones. There is no large chain of databases.
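The claim that a change is "only a specification change" can be illustrated with a small sketch; the data and both versions of the virtual table are invented for this example. The underlying rows never move; a new requirement is met by redefining the virtual table, with no unloading or reloading of tables.

```python
# The underlying data never moves; only the specification on top changes.
SALES = [("widget", 3, 4.0), ("gadget", 1, 9.5), ("widget", 2, 4.0)]

# Version 1 of the virtual table: per-order-line revenue.
def v_sales():
    return [(name, qty * price) for name, qty, price in SALES]

# A changed requirement: report per-product totals instead.
# Redefining the virtual table is a pure specification change;
# the stored rows in SALES are untouched.
def v_sales():
    totals = {}
    for name, qty, price in SALES:
        totals[name] = totals.get(name, 0.0) + qty * price
    return sorted(totals.items())

print(v_sales())  # [('gadget', 9.5), ('widget', 20.0)]
```

In a classic architecture the same change could ripple through ETL jobs and derived tables in the chain of databases; here it is confined to one definition.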
The logical data warehouse architecture is suitable for all our new forms of data usage, such as self-service BI, data science and customer-based apps, and it’s capable of dealing with all forms of big data easily. It is the modern architecture that organizations have been looking for. The heart of this architecture is formed by a data virtualization server, making it a very dominant use case for this technology.
In the fourth article of this series [link to next article], we focus on a less well-known use case of data virtualization, namely database migration and acceleration.
For more information on the logical data warehouse, see the following articles and whitepapers:
Developing a Bi-Modal Logical Data Warehouse Architecture Using Data Virtualization
Designing a Logical Data Warehouse
The Logical Data Warehouse Architecture is Tolerant to Change
The Logical Data Warehouse Architecture is Not the Same as Data Virtualization