Kimball vs Data Vault, or taking the best of both?
Dudley Drummond-Hay, Senior Data Architect at PBT Group
In 1996, Ralph Kimball introduced a dimensional modelling technique that used a set of defined methods, processes, and techniques to design and develop a data warehouse. Such has been its effectiveness that the Kimball methodology has been the standard for many years. But the advent of cloud technology and Big Data has created the need to structure data while still making it understandable in a business context. Enter Data Vault modelling for large-scale data warehouse platforms. But does this mean that Kimball has become irrelevant, or can organisations combine the respective strengths of both modelling techniques to harness the potential of their data more effectively?
One of the challenges of Kimball in the modern data environment is that it requires many updates and lookups. Continually updating Big Data in the cloud is expensive, making this a cost-prohibitive exercise. A Data Vault, by contrast, uses a pure insert-and-append model with no updates or lookups required. There is also no need to join data entities during loading, a step that tends to be problematic with the variety of new technologies being implemented.
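The difference between the two loading styles can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names update_dimension, load_satellite and hash_key, the customer_id and city columns, and the use of an MD5 hash of the business key are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone


def update_dimension(dimension: dict, source_rows: list) -> None:
    """Kimball-style load (simplified): look up each row, then update in place."""
    for row in source_rows:
        existing = dimension.get(row["customer_id"])  # lookup against the target
        if existing is None:
            dimension[row["customer_id"]] = {
                "surrogate_key": len(dimension) + 1,
                "city": row["city"],
            }
        elif existing["city"] != row["city"]:
            existing["city"] = row["city"]  # in-place update of stored data


def hash_key(business_key: str) -> str:
    """Derive the key from the business key itself, so no lookup is needed."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def load_satellite(satellite: list, source_rows: list, source: str) -> None:
    """Data Vault-style load (simplified): append only, never update."""
    load_ts = datetime.now(timezone.utc)
    for row in source_rows:
        satellite.append({
            "customer_hk": hash_key(row["customer_id"]),
            "load_ts": load_ts,
            "record_source": source,
            "city": row["city"],
        })
```

In the first function every incoming row costs a read and possibly a write against existing data. In the second, the history simply accumulates as new rows.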
A Data Vault is therefore a good model for structuring the data lake. Unfortunately, data lakes are all too often treated as mere dumping grounds for data, so companies must start structuring them better to allow data scientists to analyse the information they contain. Downstream of the Data Vault, an organisation can build business vaults using the Kimball methodology, as that data can be destroyed and recreated whenever needed. Simply put, a Data Vault allows for quick access to data in a structured way.
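To illustrate why the downstream Kimball layer can be destroyed and recreated, here is a small Python sketch that derives a current-state customer dimension from an append-only vault. The function name build_customer_dimension, the hub and satellite layouts, and the sample rows are hypothetical and exist only for this example.

```python
def build_customer_dimension(hub: list, satellite: list) -> list:
    """Derive a current-state dimension from append-only vault data."""
    latest = {}
    for row in satellite:
        current = latest.get(row["customer_hk"])
        # Keep only the most recently loaded row per customer.
        if current is None or row["load_ts"] > current["load_ts"]:
            latest[row["customer_hk"]] = row
    return [
        {"customer_id": h["customer_id"], "city": latest[h["customer_hk"]]["city"]}
        for h in hub
        if h["customer_hk"] in latest
    ]


# Hypothetical vault contents: one hub and one satellite with full history.
hub = [
    {"customer_hk": "hk1", "customer_id": "C1"},
    {"customer_hk": "hk2", "customer_id": "C2"},
]
satellite = [
    {"customer_hk": "hk1", "load_ts": 1, "city": "Cape Town"},
    {"customer_hk": "hk1", "load_ts": 2, "city": "Durban"},
    {"customer_hk": "hk2", "load_ts": 1, "city": "Pretoria"},
]

# The dimension is a pure function of the vault, so it can be dropped
# and rebuilt at any time without losing information.
dimension = build_customer_dimension(hub, satellite)
```

Because the vault keeps every row ever loaded, nothing is lost if the dimension is thrown away, and a changed business rule only means rerunning the build.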
An analytical journey
Kimball arose at a time when businesses had a singular focus. For example, a company that sold pencils would always be selling pencils, and its business processes would never change. Such organisations were shielded from change for long periods.
Today, there are lots of mergers and acquisitions taking place. Business processes must therefore adapt rapidly to changing markets. As such, the Kimball model has come under massive pressure to change within the evolving business environment. This involves expensive change control to model and reload data, or to rejig the model entirely to fit with new business processes.
For instance, one of my clients is a global manufacturing firm that grows predominantly through mergers and acquisitions. One of the challenges it faces is that whenever it acquires a business, there is a new ERP system to contend with. This means that it does not have easy access to data for analysis, especially during the acquisition process, when it is evaluating how the new entity will fit into its existing processes.
A Data Vault can be partitioned by source, so a company can quickly onboard a new source without affecting any of the existing structures already in place. That does not mean that Kimball has no use in today’s environment.
Data Vault is a very efficient methodology for modern technologies: it can land data in a structure that requires no updates or lookups. The data entities can still be related to one another and provide an understandable structure for the business user.
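Partitioning by source can be pictured with a short Python sketch. It is a simplified illustration under stated assumptions: the CustomerVault class, the erp_a and erp_b source names, and the row layout are invented for the example, and it assumes both ERP systems share the same customer business key.

```python
import hashlib


def hash_key(business_key: str) -> str:
    """Derive the key from the business key itself (assumed shared across sources)."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


class CustomerVault:
    """Hypothetical vault: one shared hub, one satellite per record source."""

    def __init__(self) -> None:
        self.hub = {}          # hash key -> business key
        self.satellites = {}   # record source -> list of appended rows

    def load(self, source: str, rows: list) -> None:
        # A new source gets its own satellite; existing ones are not touched.
        satellite = self.satellites.setdefault(source, [])
        for row in rows:
            hk = hash_key(row["customer_id"])
            # Insert-if-absent: the key is computed, not fetched from a key table.
            self.hub.setdefault(hk, row["customer_id"])
            satellite.append({"customer_hk": hk, **row["attributes"]})


vault = CustomerVault()
vault.load("erp_a", [{"customer_id": "C1", "attributes": {"city": "Durban"}}])

# Onboarding an acquired company's ERP, with different column names,
# adds a new satellite and leaves the erp_a structures as they were.
vault.load("erp_b", [
    {"customer_id": "c1", "attributes": {"town": "Durban", "segment": "retail"}},
])
```

The two satellites have different columns, yet both rows carry the same hash key, so the data from each source can still be related through the shared hub.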
If a business has a singular focus, like an SME that uses one ERP system to do some reporting and analysis, then the Data Vault methodology is not the right fit. A Kimball model can be built for reporting, as the SME will likely focus on its specific niche for several years or until it is acquired by a larger organisation.
Larger enterprises that absorb smaller companies into their business must give serious consideration to the Data Vault methodology. It is a strong foundation for cloud technology and online data storage repositories that deal with large volumes of data coming in frequently. It also makes sense for businesses that run different ERP systems in their environment to use a Data Vault.
It comes down to selecting the model that fits the business priorities of the organisation, and potentially even combining both methodologies to deliver value.