When analyzing data, analysts should adhere to best practices. One such best practice is to use a star schema for the data. At its most basic level, a star schema contains one or more fact tables and several lookup tables. But data often doesn't come to the analyst as a star schema, which means the analyst must transform it. That often requires extracting lookup table information from the fact tables. This article will give you steps on how to do this. Specifically, it will show you how to create lookup tables from fact tables using merge in Power BI.
What Is Needed for This Tutorial?
You'll need the following:
- Power BI (most versions should work, though it's best to keep it up to date)
- A dataset containing information that should be in a separate lookup table but isn't, such as customer or product information
- Some basic knowledge of the Power BI interface and concepts
- An understanding of star schemas
Dataset Used
For this tutorial, I used a sales table from Kaggle. It was a perfect sample because it contains both product and customer information embedded inside the main sales table. In fact, the dataset consists of a single table, the sales table, which holds all the information. As Kaggle stores it, it is not a star schema.
NOTE: You will need to sign up for a free Kaggle account before downloading.
You don't have to use Kaggle for this exercise. Just make sure your main (fact) table contains a column that seems like it should have a corresponding ID column, even though no such ID column exists.
Steps for the Product Lookup Table
I am going to start with the product lookup table because the sample sales data I am using includes an identifying ID column for products. Creating a lookup table is easier when such an ID column exists than when it doesn't. The sample dataset also contains customer names (and other customer data) but no underlying customer ID column. That case will be explored after the product lookup table.
The steps are as follows:
- Duplicate the main fact (sales) table.
- Rename the duplicated table to Products.
- While in the new Products table, delete any column that is not associated with products. NOTE: This can sometimes be a judgment call. Other times, you'll need to speak with the owner of the data.
- Right-click the ID column and select Remove Duplicates.
- In the main (sales) fact table, delete all product-related columns that exist in the Products table, except for the Product ID.
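Power BI carries out these steps through the Power Query interface rather than code, but the underlying logic can be sketched in plain Python to show what each step does to the data. This is a minimal illustration under stated assumptions, not how Power BI implements it: the rows are invented, each table is assumed to be a list of dictionaries, and `build_products` and `slim_fact` are hypothetical helper names.

```python
import copy

# Invented sample rows. The column names follow the Kaggle sales table
# used in this article; the values are made up for illustration.
sales = [
    {"ORDERNUMBER": 1, "CUSTOMERNAME": "John Doe", "PRODUCTCODE": "P1",
     "PRODUCTLINE": "Motorcycles", "MSRP": 95},
    {"ORDERNUMBER": 2, "CUSTOMERNAME": "Jane Roe", "PRODUCTCODE": "P2",
     "PRODUCTLINE": "Classic Cars", "MSRP": 214},
    {"ORDERNUMBER": 3, "CUSTOMERNAME": "John Doe", "PRODUCTCODE": "P1",
     "PRODUCTLINE": "Motorcycles", "MSRP": 95},
]

PRODUCT_COLUMNS = ["PRODUCTCODE", "PRODUCTLINE", "MSRP"]
PRODUCT_ID = "PRODUCTCODE"


def build_products(fact_rows):
    """Steps 1, 3 and 4: duplicate the fact table, keep product columns, dedupe on the ID."""
    duplicate = copy.deepcopy(fact_rows)  # Step 1: work on a copy, not the original
    # Step 3: keep only the columns that describe products.
    trimmed = [{col: row[col] for col in PRODUCT_COLUMNS} for row in duplicate]
    # Step 4: remove duplicates on the ID column, keeping the first row seen.
    seen = set()
    products = []
    for row in trimmed:
        if row[PRODUCT_ID] not in seen:
            seen.add(row[PRODUCT_ID])
            products.append(row)
    return products


def slim_fact(fact_rows):
    """Step 5: drop product columns from the fact table, except the product ID."""
    drop = set(PRODUCT_COLUMNS) - {PRODUCT_ID}
    return [{k: v for k, v in row.items() if k not in drop} for row in fact_rows]


products = build_products(sales)  # Step 2: the new table is named "products"
sales_fact = slim_fact(sales)
```

Running it leaves two rows in `products`, one per product code, and three rows in `sales_fact`, each still carrying `PRODUCTCODE` so the two tables can be related. The original `sales` list is left unchanged because every step builds new rows.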
Helpful Tips
Rename the identifying column for products in both the Products table and the main (sales) fact table. If you give it the same name in both tables, Power BI will likely detect and create the relationship for you. In other words, rename the product identifier to Product ID in both the Products table and the sales table (whatever your sales table is called).
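The idea behind matching names can be sketched in plain Python: rename the key in two invented tables, list the column names they share, and then follow that shared key from a sale to its product. This is only a rough stand-in for the idea and not Power BI's actual relationship-detection logic; `rename_column` is a hypothetical helper and the rows are made up.

```python
# Invented tables after the Products lookup has been extracted.
products = [{"PRODUCTCODE": "P1", "PRODUCTLINE": "Motorcycles", "MSRP": 95}]
sales = [{"ORDERNUMBER": 1, "CUSTOMERNAME": "John Doe", "PRODUCTCODE": "P1"}]


def rename_column(rows, old, new):
    """Return new rows with column `old` renamed to `new`; other columns are untouched."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


# Give the identifier the same name in both tables.
products = rename_column(products, "PRODUCTCODE", "Product ID")
sales = rename_column(sales, "PRODUCTCODE", "Product ID")

# Column names the two tables now have in common.
shared = set(products[0]) & set(sales[0])
print(shared)  # {'Product ID'}

# Follow the relationship: look up each sale's product line through the shared key.
by_id = {p["Product ID"]: p for p in products}
for sale in sales:
    print(sale["ORDERNUMBER"], by_id[sale["Product ID"]]["PRODUCTLINE"])
```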
Rename the columns that you'll want to appear in your reports. In the Kaggle dataset, you'll notice that the column names are in all capital letters with the words run together. Rename them in title case, with the words separated by spaces.
The reason to rename these columns is to make your reports easier to read. Which looks better to you:
ORDERNUMBER CUSTOMERNAME
1 John Doe
OR
Order Number Customer Name
1 John Doe
I think you'll agree that the second version looks much better. The column names won't affect the functionality of the analysis, though.
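Because the words in names like ORDERNUMBER are run together, there is no reliable way to split them automatically, so an explicit old-to-new mapping is the simplest approach. The plain-Python sketch below illustrates that idea; in Power BI you would simply rename the columns in Power Query. The `DISPLAY_NAMES` mapping and the `apply_display_names` helper are hypothetical.

```python
# Hypothetical mapping from the raw Kaggle column names to report-friendly names.
DISPLAY_NAMES = {
    "ORDERNUMBER": "Order Number",
    "CUSTOMERNAME": "Customer Name",
}


def apply_display_names(rows, mapping):
    """Rename columns found in the mapping; leave every other column unchanged."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


rows = [{"ORDERNUMBER": 1, "CUSTOMERNAME": "John Doe"}]
print(apply_display_names(rows, DISPLAY_NAMES))
# [{'Order Number': 1, 'Customer Name': 'John Doe'}]
```

Columns missing from the mapping keep their original names, so you can build the mapping up gradually as you decide which columns will appear in reports.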
1
Duplicate the Fact Table
In Power Query (Home > Transform data), right-click the fact table in the Queries pane and select Duplicate.
2
Rename the New Table to Products
Select the new table and rename it to "Products" (no quotes).
3
Delete Non-Product Columns from Products
In our example, the only three columns that should exist within the Products table are:
- PRODUCTLINE
- MSRP
- PRODUCTCODE
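Keeping only those three columns amounts to a column selection, sketched here in plain Python on invented rows. The `keep_columns` helper is hypothetical; it also raises an error if a listed column is missing, a small safeguard added for illustration rather than something Power BI does in this step.

```python
# Invented rows using the Kaggle column names from this article.
sales = [
    {"ORDERNUMBER": 1, "CUSTOMERNAME": "John Doe", "PRODUCTCODE": "P1",
     "PRODUCTLINE": "Motorcycles", "MSRP": 95},
    {"ORDERNUMBER": 2, "CUSTOMERNAME": "Jane Roe", "PRODUCTCODE": "P2",
     "PRODUCTLINE": "Classic Cars", "MSRP": 214},
]

# The three columns that belong in the Products table.
PRODUCT_COLUMNS = ["PRODUCTLINE", "MSRP", "PRODUCTCODE"]


def keep_columns(rows, columns):
    """Return new rows containing only the listed columns; all others are dropped."""
    missing = [c for c in columns if rows and c not in rows[0]]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [{c: row[c] for c in columns} for row in rows]


products = keep_columns(sales, PRODUCT_COLUMNS)
```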
4
Remove Duplicates
In the Products table, right-click the PRODUCTCODE column and select Remove Duplicates. Each product code should now appear only once.
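Removing duplicates on the ID column can be sketched in plain Python as well. This sketch keeps the first row it sees for each product code. It also flags codes whose duplicate rows disagree, because removing duplicates on the ID alone quietly keeps just one version of those rows; that check is an extra added here for illustration, and it is a good trigger for the conversation with the data owner mentioned earlier. The rows and the `remove_duplicates` helper are hypothetical.

```python
# Invented Products rows after the non-product columns were deleted.
# P2 appears twice with different MSRP values on purpose.
products = [
    {"PRODUCTCODE": "P1", "PRODUCTLINE": "Motorcycles", "MSRP": 95},
    {"PRODUCTCODE": "P2", "PRODUCTLINE": "Classic Cars", "MSRP": 214},
    {"PRODUCTCODE": "P1", "PRODUCTLINE": "Motorcycles", "MSRP": 95},
    {"PRODUCTCODE": "P2", "PRODUCTLINE": "Classic Cars", "MSRP": 220},
]


def remove_duplicates(rows, key):
    """Keep the first row for each key value and report keys whose rows disagree."""
    first = {}  # dicts keep insertion order, so output order follows first appearance
    conflicts = []
    for row in rows:
        k = row[key]
        if k not in first:
            first[k] = row
        elif row != first[k] and k not in conflicts:
            # Same ID, different attributes: the dropped row held other values.
            conflicts.append(k)
    return list(first.values()), conflicts


unique_products, conflicts = remove_duplicates(products, "PRODUCTCODE")
print(len(unique_products), conflicts)  # 2 ['P2']
```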
5
Remove Columns in the Fact Table
In the main (sales) fact table, delete the product-related columns that now live in the Products table (PRODUCTLINE and MSRP). Keep PRODUCTCODE so that the two tables can be related.
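Trimming the fact table can be sketched in plain Python too. The sketch drops every Products column except the ID from invented sales rows, then checks that each product ID left in the fact table exists in Products. That orphan check is an assumption-level extra, not part of the Power BI steps above; all names and values are hypothetical.

```python
# Invented rows: the original fact table and the Products lookup built from it.
sales = [
    {"ORDERNUMBER": 1, "CUSTOMERNAME": "John Doe", "PRODUCTCODE": "P1",
     "PRODUCTLINE": "Motorcycles", "MSRP": 95},
    {"ORDERNUMBER": 2, "CUSTOMERNAME": "Jane Roe", "PRODUCTCODE": "P2",
     "PRODUCTLINE": "Classic Cars", "MSRP": 214},
]
products = [
    {"PRODUCTCODE": "P1", "PRODUCTLINE": "Motorcycles", "MSRP": 95},
    {"PRODUCTCODE": "P2", "PRODUCTLINE": "Classic Cars", "MSRP": 214},
]

PRODUCT_ID = "PRODUCTCODE"
# Every Products column except the ID is removed from the fact table.
to_drop = set(products[0]) - {PRODUCT_ID}

sales_fact = [{k: v for k, v in row.items() if k not in to_drop} for row in sales]

# Sanity check: every product ID in the fact table should exist in Products.
known_ids = {p[PRODUCT_ID] for p in products}
orphans = [row[PRODUCT_ID] for row in sales_fact if row[PRODUCT_ID] not in known_ids]
print(sales_fact[0])  # {'ORDERNUMBER': 1, 'CUSTOMERNAME': 'John Doe', 'PRODUCTCODE': 'P1'}
print(orphans)  # []
```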