Build your first embedded data product now. Talk to our product experts for a guided demo or get your hands dirty with a free 10-day trial.
When setting up your business intelligence operations or determining how you want to store your data, data warehousing is a huge consideration. Storing large volumes of business-critical data should not be left up to chance, and data warehouses are the ideal setup for further operations, be it ETLs, data visualizations, or something else.
But there is one important consideration - how do you track changes in your data over time? You can do this with something called slowly changing dimensions, and today, we’re going to show you what that is all about.
Slowly changing dimensions are ways in which data in your data warehouse changes over time. No matter what kind of data you store, it’s not likely to be static. For example, the price of an item in your store may go up or down over time.
Depending on your business model and overall needs, you may want to:
If the data in your business is constantly changing and gets overwritten, you may want to have a list of historical changes to that data. On the other hand, if changes are not that frequent, you don’t need to track them in your data warehouse.
This crucial differentiation is reflected in something called slowly changing dimensions. Depending on how important it is for you to track the changes in your data, you’ll choose one of the six available types of slowly changing dimensions.
Depending on how you want to set up your data warehouse system, there are several different types of SCDs. Depending on the source you look at, there will be varying numbers of slowly changing dimensions. However, we’ll cover the most important ones and their use cases.
In this type, the data never changes in a dimensional table. These are common data points that, well, don’t change, such as the date of birth, zip codes, social security numbers, and more.
For example, you have a web app that logs the maximum number of log attempts to an account. You currently have it set at 5 attempts. Suppose you change it to 3 attempts. In the type 0 SCD, that information will not get stored in your data warehouse.
In type 1 of slowly changing dimensions, the new data in the data warehouse simply overwrites the old data. You only retain the most recent data values and there is no historical tracking. Type 1 is a good choice for those business intelligence needs where only the most recent version of the data is what matters. For example, the effective date of a new role in a company.
Let’s take a look at an example of SCD type 1 in practice. You run an ecommerce business and you stock a wide variety of products. A product changes its price and goes from $20 to $25. With a Type 1 SCD, the data warehouse would only store the most recent value of $25, while the old value of $20 is overwritten.
When data changes, a new record is made so you get historical data, as well as a new entry. The old record gets an end date, while the new record gets a start date in a new row. You get historical changes and the data warehouse stores a record of how the data has changed over time, with a timestamp for each change in dimension data.
You want to use this type when it’s crucial to track changes in datasets. Here’s an example. You run a business and have an HR system that tracks all of your employees and their data. You’ll have different dimension attributes such as job title, name, location, and others.
Over time, a change happens and employee John Doe gets promoted from Regional Sales Manager and moves from Basel to Geneva in Switzerland, which is reflected in your dimension records. With type 2 SCD, all the historical records remain and you can see the previous value at all times - and track where the employee worked.
In this type of slowly changing dimensions, you add new columns to the existing tables in a data warehouse and store historical changes. You won’t get all the data history, but it’s a good compromise between types 1 and 2.
Let’s consider a situation where you’re in charge of managing projects and the data on them is stored in a data warehouse. For each project, you store data points such as:
If you take a project and complete it, then the only thing that changes is the project status that goes from “in progress” to “completed”, and this information is stored in an additional column. This allows for a limited historical overview of data in your data warehouse, where you can see the current value and the most recent previous one.
This one is similar to SCD type 2. The main difference is that two tables are maintained at once. It’s often called a history table type as well because it entails historical data being stored in one table, while new data is stored in a different table altogether.
So, the main difference between the example in type 2 and the same one in type 4 is that you would still have all the historical data, just stored in a different location in your data warehouse.
So, if your employee changes their position and physical location (like in type 2), the historical changes would be stored in a new table.
The reason why this is called type 5? Because 4+1=5. This is a combination of these two types, and it entails embedding a current profile mini-dimension key in the base dimension table that gets overwritten as a type 1 attribute. Both the dimension and mini-dimension profiles are shown in a single table.
It’s called 6 because it’s a combination of 1, 2, and 3, and 1+2+3=6. This type of slowly changing dimensions was developed by Ralph Kimball, who dubbed it the “unpredictable changes with single version overlay”.
Let’s say we have a situation with an employee who changed their position, like in some of our previous examples. In this case, when a change happens in the data warehouse, the following takes place:
In short, it’s a bit more complex than the rest.
You’ll need the help of a data analyst or a data engineer to make this call. The truth is, the type you choose depends strictly on your data warehousing needs and what you plan to do with that data later on.
For an embedded analytics use case, you’re going to need historical and current data so you can have the ability to compare them, as well as get real-time insights. That’s why it’s probably best to use types 2 and 4, so you can cover all time periods and have real-time dashboards at your fingertips.
Once you’ve chosen the slowly changing dimension type that works best for your use case, it’s time to take things to the next level. If you want to create stunning data visualizations for your SaaS product, Luzmo has everything you need in one tool.
Our API connects to all the most popular data warehouses, ETLs and web services like Google Analytics.
Get your free demo and find out how Luzmo can help you with your embedded analytics!
Build your first embedded data product now. Talk to our product experts for a guided demo or get your hands dirty with a free 10-day trial.