What are Data Warehouses and Data Marts?
Data is the lifeblood of any good decision-making process. With the rise of big data and advanced analytics, there has been an explosion in the amount of data collected in manufacturing, allowing for quick and accurate insights.
The way businesses handle, store, analyze, report, and consume data will be critical to their success. In this blog, we will go through the fundamentals of data warehouses and data marts.
- A data warehouse is a centralized repository for data from various sources such as ERP, legacy, or external systems and applications. It collects data from across the enterprise. Then makes it available for analysis, reporting, and other business intelligence (BI) functions to aid manufacturers in making data-driven decisions.
- A data mart is a subset of a data warehouse that comprises structured data on a single subject or business line. It divides enormous datasets into smaller, more manageable bits. Such as inventory or production data, to increase data aggregation and agility without the need to retrieve data extracts from a centralized corporate database.
The difference between a data warehouse and a data mart is the size and or scope of information in the database.
Why are data warehouses and marts needed?
Data warehouses and marts are built because the information in their source databases is not organized to make it readily accessible. They are too complicated, difficult to access, or resource-intensive to access.
The influx and amount of data are what contributed to the problem. Sifting through the “noise” in data to provide quick and accurate insights is a key business value to BI.
Is a data warehouse just a database?
A data warehouse is more than just a database; it is a system.
The system brings in data from other systems, which could include databases or flat files. Data is generally not brought in directly from machines or PLCs, but it can be. In modern architectures, data warehouses can even be updated in real-time.
Which comes first, the data warehouse or the data mart?
There are several theories or practices (Inmon & Kimball) dealing with the flow of data. One of the differences pertains to the architecture of the systems. Either the data warehouse or the data mart can come first, as the following diagrams show. The architecture chosen is dependent on requirements.
How does the data get into the data warehouse?
A staging and integration area sits between the data sources and the data warehouse.
- A data staging area is where data is stored when brought over from the data sources. Staged data is typically a 1-1 of the source system. A data lake is a type of staging area. The data is in its natural/raw format and is usually object blobs or files.
- A data integration area is where data is transformed before being added to the data warehouse. The transformation includes combining data with multiple sources, cleaning, de-duplication, adding business rules, adding calculations, and optimizing the data for retrieval.
This is a simple representation of how data is handled throughout the data warehouse system.
What questions must be answered when establishing a data warehouse or data mart in a manufacturing environment?
As you dig deeper into data warehouses and data marts, many decisions will be made. Here are a few of those questions.
- What type of data needs to be loaded (images, for instance, are better stored as blobs than in a relational database)?
- Are there any regulatory requirements around the data or auditing that need to take place?
- What data is sensitive such as Personally Identifiable Information (PII)?
- Are there any accessibility requirements around whether the data can be pushed or pulled in the warehouse?
- How frequently can we load the data warehouse/mart?
- What is the volume of data that needs to be loaded, and how frequently does the data change?
- Will time-series data be included? Should it be summarized?
What is the time and effort required to build a data warehouse or data mart?
The book Business Intelligence Guidebook From Data Integration to Analytics by Rick Sherman talks about how moving from data to decisions can take significant effort.
Data preparation processes are the lion’s share of the work of any DW or BI project—estimated at 60 to 75% of the project time. Project delays and cost overruns are frequently tied to underestimating the amount of time and resources necessary to complete data preparation or, even more frequently, to do the rework necessary when the project initially skimps on these activities and then data consistency, accuracy, and quality issues arise
Almost everyone—IT, vendors, consultants, and industry analysts—associates data preparation solely with ETL development work. ETL tools are indispensable in a BI project because of the many benefits they provide (see section on data integration for further details), but do not significantly reduce or speed up the majority of the data preparation work. The reason is that the bulk of the time is spent on defining the sources (getting data from source systems), data profiling, defining the targets (putting data in DWs, ODSs, and data marts), source-to-target mapping, and business transformations. This definitional work is time-consuming because it involves meeting, discussing, and obtaining consensus on the definitions and transformations with source systems’ subject matter experts and business people. As the number of systems and people expands, these activities expand disproportionally.
Using the 5C’s to prepare your data
Sherman also defines the 5C’s. They give more clarity to why the effort is so great in the data preparation phase.
- Clean—dirty data can really muddy up a company’s attempt at real-time disclosure and puts the CFO at high risk when signing off on financial reports and even press releases based on incorrect information. Dirty data has missing items, invalid entries, and other problems that wreak havoc with automated data integration and data analysis. Customer and prospect data, for example, is notorious for being dirty. Most source data is dirty to some degree, which is why data profiling and cleansing are critical steps in data warehousing.
- Consistent—there should be no arguments about whose version of the data is the correct one. Management meetings should never have to break down into arguments about whose number is correct when they really need to focus on how to improve customer satisfaction, increase sales, or improve profits. Business people using different hierarchies or calculations for metrics will argue regardless of how clean the transactional data is.
- Conformed—the business needs to analyze the data across common, shareable dimensions if business people across the enterprise are to use the same information for their decision-making.
- Current—the business needs to base decisions on whatever currency is necessary for that type of decision. In some cases, such as detecting credit card fraud, the data needs to be up to the minute.
- Comprehensive—business people should have all the data they need to do their jobs—regardless of where the data came from and its level of granularity.
Although data preparation takes time, source systems and data warehouses can be configured to speed up the process.
Who benefits from using data warehouses and marts?
Decision-makers, the IT departments, and all business areas see benefit from data warehouses and data marts.
- Decision-makers get the most benefit from data warehouses because they can,
- Make decisions without having to understand the complexities of the data sources or the intricacies of data transformations.
- Use BI tools that can more effectively present data. A chart is worth a thousand words. Being able to drill into a chart’s detail provides immense value.
- Build and adjust their reports to answer new questions.
- IT departments see productivity improvements. When IT builds effective self-service models and marts for consumers, IT can spend their time building new models and less time changing fonts on reports or PowerPoints, for instance.
- All areas of business benefit when the data warehouse initiative is focused on ROI, business value, and continuous improvement.
Data marts are quicker and easier to build than data warehouses. The data marts focus on a single business area instead of the entire business. They improve end-user response time by allowing users to access the specific type of data they need to view most often by providing the data in a way that supports the collective view of a group of users.
Data warehouses and data marts benefit your company’s decision-makers. They provide the 5Cs of data, transforming it into information, allowing critical decisions to be made with confidence.