Understanding the alternatives to a data warehouse
Julian Thomas, Principal Consultant at PBT Group
Often, when an organisation believes it is implementing an alternative to a data warehouse, it is only using different technologies to do the same thing. A true alternative requires a design methodology that caters for the organisation’s data and information requirements.
Before I look at three such alternatives, it is worth understanding what a data warehouse is. Effectively, it is a central repository of data that is aligned to business processes. The design of the data warehouse is optimised for analytical performance, and for enabling self-service analysis and reporting of data. The key focus of the data warehouse is to enable strategic decision making. An important characteristic of the data warehouse is that it is time-based: it gives users the ability to perform time series analysis, trending and forecasting. This helps business users analyse past behaviour in order to make future predictions.
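To make the time-based aspect concrete, here is a minimal sketch in Python. It assumes a tiny, hypothetical set of dated sales facts (the rows, amounts and function names are invented for illustration) and shows the kind of roll-up and trend calculation that a time-based design makes straightforward.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (transaction date, sales amount).
sales_facts = [
    (date(2023, 1, 15), 100.0),
    (date(2023, 1, 28), 50.0),
    (date(2023, 2, 10), 180.0),
    (date(2023, 3, 5), 210.0),
]

def monthly_totals(facts):
    """Roll dated facts up to (year, month) totals, ordered by time."""
    totals = defaultdict(float)
    for day, amount in facts:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))

def month_over_month_change(totals):
    """Difference between each month and the month before it."""
    months = list(totals)
    return {cur: totals[cur] - totals[prev]
            for prev, cur in zip(months, months[1:])}

totals = monthly_totals(sales_facts)
changes = month_over_month_change(totals)
```

Because every fact carries a date, the same rows can be regrouped by quarter or year without changing how they are stored, which is what makes trending and forecasting possible.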
#1 Operational data store
An alternative to a data warehouse that has been around for some time is an operational data store (ODS). Unlike the data warehouse, it is not focused on the past, but tries to report on what happens at a company at an operational level. An ODS shows a real-time, current view of the data so an organisation can better manage the day-to-day tasks. It is therefore more of a tactical focus as opposed to the strategic perspective a data warehouse provides.
Furthermore, the design of an ODS closely follows that of the operational systems. For smaller companies, an ODS could be a replica of these systems. If the business is larger or more complex, the ODS could be a custom-designed operational model that integrates data across enterprise systems into a single view.
For example, if a company needs to support a call centre performing outbound sales, the ODS can provide a current view of the team, its performance and targets, deliver a sales funnel report, and identify the leads with the most potential, to name a few.
The key aspect of an ODS is that it tactically supports the company. It is not a long-term data storage solution: much of the data is typically limited to a short period, usually from six months to a year or two.
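The difference from a data warehouse can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OperationalStore` class, the lead identifiers and the 180-day retention window are hypothetical, chosen to mirror the "current view, short retention" behaviour described above.

```python
from datetime import datetime, timedelta

# Assumed retention window; the article only says six months to a year or two.
RETENTION = timedelta(days=180)

class OperationalStore:
    """Toy ODS: holds only the latest state of each record."""

    def __init__(self):
        self._rows = {}  # lead_id -> (status, updated_at)

    def upsert(self, lead_id, status, updated_at):
        # Overwrite in place: no history is kept, unlike a data warehouse.
        self._rows[lead_id] = (status, updated_at)

    def purge(self, now):
        # Drop anything not updated within the retention window.
        cutoff = now - RETENTION
        self._rows = {k: v for k, v in self._rows.items() if v[1] >= cutoff}

    def current_view(self):
        return {lead_id: status for lead_id, (status, _) in self._rows.items()}

ods = OperationalStore()
ods.upsert("lead-1", "new", datetime(2023, 1, 1))
ods.upsert("lead-1", "contacted", datetime(2023, 6, 1))  # replaces "new"
ods.upsert("lead-2", "new", datetime(2022, 1, 1))        # stale record
ods.purge(now=datetime(2023, 7, 1))
view = ods.current_view()
```

After the purge, only the current state of `lead-1` remains. The earlier "new" status was overwritten rather than kept, and the stale `lead-2` record has aged out.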
#2 Data Lake
More recently, solutions have emerged based on big data technologies. With big data platforms either being on-premises or in the cloud, most of these environments are based on Hadoop. The emergence of this technology has led to the concept of a Data Lake: a data store built on big data platforms that can serve as a repository for vast amounts of raw data.
The Data Lake allows for data of all formats, for instance non-relational data (images, audio, video), the Internet of Things (IoT), real-time data processing, and so on. As the Data Lake platform has evolved, more technology has come out that enables companies to create database engines running in the big data landscape that can expose this raw data through SQL-like technologies and formats. Today, companies are looking at building data warehouse solutions in the Data Lake. But the Data Lake is still just a technology. A company will still need a design pattern and methodology around its data in order to truly get the benefit and value out of the underlying data, and to enable the data warehouse use cases.
What the Data Lake does provide, however, is the ability to rapidly ingest raw data into the platform, where it can be exposed to data scientists and analysts for building predictive and machine learning models.
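The idea of ingesting raw data first and applying structure only when it is queried can be sketched as follows. This is a simplified illustration, not a real Data Lake: a Python list stands in for lake storage, the standard-library `sqlite3` module stands in for a SQL engine running over the lake, and the sensor records and function names are hypothetical.

```python
import json
import sqlite3

raw_zone = []  # stand-in for raw lake storage

def ingest(raw_line):
    """Land the record exactly as received: no schema, no validation."""
    raw_zone.append(raw_line)

def expose_as_table(conn):
    """Apply a schema at read time and expose the raw data through SQL."""
    conn.execute("CREATE TABLE readings (device TEXT, temperature REAL)")
    for line in raw_zone:
        record = json.loads(line)
        # Fields outside the chosen schema (e.g. "firmware") are ignored here.
        conn.execute(
            "INSERT INTO readings VALUES (?, ?)",
            (record.get("device"), record.get("temperature")),
        )

ingest('{"device": "sensor-a", "temperature": 21.5}')
ingest('{"device": "sensor-b", "temperature": 19.0, "firmware": "1.2"}')
ingest('{"device": "sensor-a", "temperature": 22.5}')

conn = sqlite3.connect(":memory:")
expose_as_table(conn)
averages = dict(
    conn.execute("SELECT device, AVG(temperature) FROM readings GROUP BY device")
)
```

Ingestion is fast because nothing is modelled up front. The cost is that every consumer has to decide what the data means at query time, which is why a design methodology is still needed on top of the technology.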
#3 Data Vault
Beyond the Data Lake, a newer design pattern has emerged: the Data Vault. It works along the same lines as an ODS and data warehouse, with a methodology focused on how to design data solutions from an ETL, reporting and querying point of view. This makes the Data Vault a better alternative to a data warehouse than the Data Lake.
It is a blend of an ODS and data warehouse and takes the best of both worlds. For instance, it has many of the relational aspects of the ODS to manage data better, while also incorporating data warehouse characteristics to focus on the business areas. Ultimately, the Data Vault provides a more flexible approach to the design of a data solution.
Let’s face it, designing a data warehouse is complicated as the organisation tries to model all its business processes. Each business unit has a different definition of things, so the design becomes laborious and time-intensive. And when the data warehouse design is finally completed, one of the lines of business might change significantly, impacting all the elements within the data warehouse.
A Data Vault design pattern lets the business break the model into sub-components, making it more flexible and scalable. This means a Data Vault is easier to change and can provide a far more effective alternative to the design challenges of a data warehouse.
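The article does not name the sub-components, but Data Vault modelling conventionally uses three kinds: hubs (business keys), links (relationships between hubs) and satellites (descriptive attributes with their load dates). The Python sketch below is a toy illustration of that convention. The class names follow the standard terms, while the customer and product data are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Hub:
    """One business concept, identified only by its business keys."""
    name: str
    keys: set = field(default_factory=set)

@dataclass
class Link:
    """A relationship between hubs, stored as tuples of business keys."""
    name: str
    hubs: tuple
    rows: set = field(default_factory=set)

@dataclass
class Satellite:
    """Descriptive attributes for one hub, kept with their load date."""
    name: str
    hub: str
    rows: list = field(default_factory=list)

    def load(self, key, load_date, attributes):
        # Append only: earlier versions are never overwritten.
        self.rows.append((key, load_date, attributes))

    def latest(self, key):
        history = [row for row in self.rows if row[0] == key]
        return max(history, key=lambda row: row[1])[2] if history else None

customer = Hub("customer", {"C1"})
product = Hub("product", {"P1"})
purchase = Link("purchase", ("customer", "product"), {("C1", "P1")})

customer_details = Satellite("customer_details", "customer")
customer_details.load("C1", date(2023, 1, 1), {"city": "Cape Town"})
customer_details.load("C1", date(2023, 6, 1), {"city": "Johannesburg"})

# A new business requirement arrives: add a satellite, touch nothing else.
customer_loyalty = Satellite("customer_loyalty", "customer")
customer_loyalty.load("C1", date(2023, 7, 1), {"tier": "gold"})
```

The last step is the point of the pattern: a change in one line of business is absorbed by adding a new satellite, while the existing hubs, links and satellites stay as they were.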
In my next article, I will take a closer look at another alternative – the Data Lakehouse.