Data Warehouse modelling: Data Vault vs Persistent Staging Area


Data Vault vs. Persistent Staging Area (PSA) sounds to me like apples and pears: the two are hard to compare. You should not try to define a Data Vault to capture source data without knowing the business ontology; otherwise you are building a source system vault, which offers little or no benefit to the business. Building a Data Vault on top of a PSA or a data lake makes much more sense to me: land the data as an image of the source systems, and then build a sustainable data collection out of it step by step.
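To make the "land first, model later" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the source table crm.customer, the table names psa_crm_customer, hub_customer and sat_customer, the columns, and the use of SHA-256 for the hash key. A real load would also compare attribute values (for example with a hash diff) before writing a new satellite row.

```python
import hashlib
import sqlite3


def hash_key(business_key: str) -> str:
    """Derive a deterministic key from a business key (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(business_key.strip().upper().encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- PSA: append-only image of a hypothetical source table 'crm.customer'
CREATE TABLE psa_crm_customer (
    customer_no TEXT, name TEXT, city TEXT, load_ts TEXT
);
-- Data Vault: the hub holds the business key, the satellite holds its history
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY, customer_no TEXT, load_ts TEXT, record_source TEXT
);
CREATE TABLE sat_customer (
    customer_hk TEXT, load_ts TEXT, name TEXT, city TEXT,
    PRIMARY KEY (customer_hk, load_ts)
);
""")

# Step 1: land the source rows unchanged, only adding a load timestamp.
conn.executemany(
    "INSERT INTO psa_crm_customer VALUES (?, ?, ?, ?)",
    [
        ("C001", "Alice", "Berlin", "2024-01-01"),
        ("C002", "Bob", "Hamburg", "2024-01-01"),
        ("C001", "Alice", "Munich", "2024-02-01"),  # C001 moved
    ],
)


def load_vault_from_psa(conn: sqlite3.Connection) -> None:
    """Step 2: derive hub and satellite rows from the PSA; safe to re-run."""
    rows = conn.execute(
        "SELECT customer_no, name, city, load_ts "
        "FROM psa_crm_customer ORDER BY load_ts"
    ).fetchall()
    for customer_no, name, city, load_ts in rows:
        hk = hash_key(customer_no)
        # INSERT OR IGNORE keeps the first hub row per business key.
        conn.execute(
            "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
            (hk, customer_no, load_ts, "crm.customer"),
        )
        # One satellite row per (key, load timestamp); duplicates are skipped.
        conn.execute(
            "INSERT OR IGNORE INTO sat_customer VALUES (?, ?, ?, ?)",
            (hk, load_ts, name, city),
        )
    conn.commit()


load_vault_from_psa(conn)
```

Because the PSA keeps the raw history, the vault tables can be dropped and rebuilt later if the business model turns out to need a different shape.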


The added complexity comes from the relational model, which the Data Vault approach introduces earlier in the pipeline. I guess it depends on the level at which you want to model your data and how reusable it should be across different use cases, each resulting in its own data mart. What I mean is that data marts are designed for specific business cases, whereas the Data Vault model is designed to be overarching (an enterprise model). Hence, data marts based on a DV model do not need to physically materialise any data at all. A layer of views can be set up that looks like star schema tables but in fact has:

•   Zero maintenance cost.
•   Zero storage costs.
•   High flexibility.
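A view layer of this kind could look roughly like the following sketch, again using Python's built-in sqlite3 module. All table, view and column names (hub_customer, sat_customer, link_order, sat_order, dim_customer, fact_order) are hypothetical, and the hash keys are replaced by short placeholder strings to keep the example readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical Data Vault tables (hash keys shortened to placeholder strings)
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_no TEXT);
CREATE TABLE sat_customer (customer_hk TEXT, load_ts TEXT, name TEXT, city TEXT,
                           PRIMARY KEY (customer_hk, load_ts));
CREATE TABLE link_order (order_hk TEXT PRIMARY KEY, customer_hk TEXT, order_no TEXT);
CREATE TABLE sat_order (order_hk TEXT, load_ts TEXT, amount REAL,
                        PRIMARY KEY (order_hk, load_ts));

-- Virtual dimension: one row per customer, using the latest satellite version
CREATE VIEW dim_customer AS
SELECT h.customer_hk AS customer_key, h.customer_no, s.name, s.city
FROM hub_customer h
JOIN sat_customer s ON s.customer_hk = h.customer_hk
WHERE s.load_ts = (SELECT MAX(s2.load_ts) FROM sat_customer s2
                   WHERE s2.customer_hk = h.customer_hk);

-- Virtual fact: one row per order, using the latest satellite version
CREATE VIEW fact_order AS
SELECT l.order_no, l.customer_hk AS customer_key, s.amount
FROM link_order l
JOIN sat_order s ON s.order_hk = l.order_hk
WHERE s.load_ts = (SELECT MAX(s2.load_ts) FROM sat_order s2
                   WHERE s2.order_hk = l.order_hk);
""")

conn.executemany("INSERT INTO hub_customer VALUES (?, ?)",
                 [("hk_c001", "C001"), ("hk_c002", "C002")])
conn.executemany("INSERT INTO sat_customer VALUES (?, ?, ?, ?)", [
    ("hk_c001", "2024-01-01", "Alice", "Berlin"),
    ("hk_c001", "2024-02-01", "Alice", "Munich"),  # newer version wins in the view
    ("hk_c002", "2024-01-01", "Bob", "Hamburg"),
])
conn.executemany("INSERT INTO link_order VALUES (?, ?, ?)", [
    ("hk_o1", "hk_c001", "O1"),
    ("hk_o2", "hk_c001", "O2"),
    ("hk_o3", "hk_c002", "O3"),
])
conn.executemany("INSERT INTO sat_order VALUES (?, ?, ?)", [
    ("hk_o1", "2024-02-02", 100.0),
    ("hk_o2", "2024-02-03", 50.0),
    ("hk_o3", "2024-02-03", 70.0),
])

# A typical star-schema query, answered entirely from views
revenue_by_city = conn.execute("""
    SELECT d.city, SUM(f.amount)
    FROM fact_order f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.city
    ORDER BY d.city
""").fetchall()
print(revenue_by_city)
```

Nothing is copied into the mart: the views store only their definitions, so the work is done at query time. On large data volumes that trade-off may matter, in which case individual views could still be materialised selectively.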

Additionally, it is definitely nice to know how the data is related in a more general, organization-wide sense. Whether that knowledge and the advantages mentioned above justify the extra effort of building a DV model is difficult to judge.