At some point, you'll need to create lookup tables from data sources. Excel users often clump all entities together in each row. This is a natural way for them to view the data and it's the easiest for them to enter it this way. You'll likely find customer names and addresses, product names, prices and quantities sold, etc., repeated on each row of the spreadsheet.
The format these Excel users implement is not a natural fit for business intelligence (BI) tools. This is why these tools include Edit, Transform, and Load (ETL) functions. They can help place the data in the format that the BI tool works best. For this, you'll need to to learn how to create lookup tables from fact tables in Power BI.
The sample file we'll use for this tutorial is an Excel file. It is short by design: to help you understand the concepts without dealing with large files. This will serve as the fact table mentioned in the beginning of this tutorial. Here is the source to download:
In many cases, files you'll load in from clients or at work will contain data in the wrong format for Power BI to work optimally. The data usually contains every bit of information that a client or boss deals with.
For example, spreadsheets with sales information often contains the order number, order date, customer id and customer name, sometimes even the address of the customer. The order line item may contain the product id and name, the category, etc.
As you can imagine, the information is repeated for the purchase of similar products or if a customer is a repeat buyer. Take a look at the following:
This spreadsheet is the data we'll be working with for the tutorial. The Customer name is repeated several times (See Mary Resnor, etc.). The product name is repeated, too (look at Apple iPhone 5, etc.)
The spreadsheet is available for download earlier in this tutorial.
While the impact of repeating these values may not be too great with a data sample as small as this spreadsheet, the large files will start to become unwieldy. Imagine if the user misspelled a word in one of the product names that gets repeated.
The bad news is that the clients or owners of these spreadsheets will likely continue entering their data in the above spreadsheet format. No amount of processing by Power BI can prevent that from happening.
But the processing (known as ETL) can detect when it occurs and if you set rules ahead of time, you can reduce the number of these errors automically via the ETL process. This is an advanced topic however.
Most of the time, the owners of these spreadsheets will copy a previous entry to make sure that it does get copied properly.
Spreadsheet Owners Won't Change their Data-Entry Procedures
There's a good bet that the owners of these spreadsheets will not change how they enter their data. That requires a lot of revamping of the procedures and of the spreadsheet itself.
But this format will not work well in Power BI. It's possible, but you'll have a lot of work to do any time you make changes to your dashboard or reports. It also will prevent the assets that Power BI uses from working properly.
A small amount of effort during the ETL process can go a long way in making your life easier during design.
How the Tutorial Is Structured
If you notice from the spreadsheet above, there are four tables-in-one. You could likely find even more, but sometimes you can take the concept a bit too far. However, once you learn these techniques, you are welcome to use them as you see fit.
You could, for instance, break out the orders into an sales master and sales detail table. But for the purposes of this tutorial, we'll stick with the four tables.
The tables are as follows:
- Sales Type
All the sales information is contained in the spreadsheet. Once we process the other tables, what's left over in the main table will be the sales information.
To start the process, the first step is to load the data. It's a Microsoft Excel file called Sample Data.xlsx. Load the file as follows:
You'll notice the name of the is called Transactions. You are welcome to leave that name. For this tutorial, we'll rename it to Sales.
After the data is loaded, click on the Transform Data tab. Note - you could have clicked on the Transform Data in the Load screen to arrive at the same option.
Change the name of the table from Transactions to Sales.
We'll now duplicate the Sales table to use as our new customer table. The concept is to create a duplicate of the entire table and only keep the columns that we need, in this case, the ones related to customers.
Locate the Sales table (the one you renamed from Transactions) and right-click on it. Then, you'll see an option called Duplicate. Click on that option.
In the new table called Sales (2), select both the Customer ID and the Customer Name columns.
Select the Remove Columns and then the Remove Other Columns option. This will remove all columns that are not selected, leaving only the Customer ID and the Customer Name.
We have made some progress, but you may notice one issue - some of the rows are duplicates. We'll need to remove those duplicates. Luckily, in Power BI, the is easy.
Select the Customer ID column and right-click. Then, select the remove duplicates option.
As customer information is the only information contained in this table, let's rename the table to Customers. You can call the table whatever you like, including Customer_Lookup, etc.
Click on the main Sales table. You'll notice the Customer Name is still in this table. We don't need this anymore because this information is now contained in the Customers table. We use the Customer ID to link them (when set up correctly, Power BI handle this internally).
Select the Customer Name column, right-click and select the Remove option.
Table relationhips are established using some common bond, usually a key. When we duplicated the sales table and kept only the customer information, one of those columns is the Customer ID. Since the Customer ID still exists in the sales table, this is the common bond we need to "glue" the tables together.
Once the relationships are established, Power BI will know what to do with them. If you included the Customer ID from the Sales table in your reports, it will provide you with exactly the same information as when you include the Customer ID from the Customer table.
To illustrate this, I added the Customer ID from both tables (Sales and Customers) into a matrix control. Then I added the Quantity and Amount columns from the Sales table:
What this also means, however, is that we can include the Customer Name into the report, and Power BI will know how to make the connection.
The report shown includes the Customer ID from the Sales table and the Customer Name from the Customers table. Other than establishing those relationships, there is nothing that you needed to do to make this connection. This is part of the POWER in Power BI.
The dynamics of how table relationships work is beyond the scope of this tutorial (due to time restrictions). However, you'll find great tutorials on the internet that describe how to set up the relationships properly.
As an experiment, you can try deleting the relationship between Customers and Sales. See how the report components react.
This section will be shorter as the process is exactly the same. You'll duplicate the sales table, remove all columns except for the Product ID and the Product Name. Remove the duplicates (after seleting the Product ID column) and remove the Product Name from the Sales table. Then, rename the Sales (2) table to Products.
I'll show the images without adding too much commentary just because it's the same process as customers. Trust me! When you do this once (Customers) you'll have no struggles with doing it for Products.