Power BI Data Flows - Creating Re-usable Data

Thinking:

Data Source (e.g. Data-warehouse)
Data Flow (re-usable data in Power BI Service pulled from the data-source)
Power Query in Power BI Service for Data Flows (central data transformation point)
Data Set (data pulled from Data Flow in Power BI Desktop)
Data model (model in Power BI Desktop)
Report (Report published from Power BI Desktop to Power BI Service).

Challenges - Data Flows pull tables by default, which cannot be related in the Data Flow. You can however pull a query into a data flow. So how far can this go to full data centralisation in Power BI service > Data Marts!

Introduction
 

OK, so Power BI data flows are all about creating re-usable data and having the Power Query Editor within the Power BI service, rather than having to use it in Power BI. The benefits are therefore:

1. Having data that can be re-used would also be excellent. It can be refreshed once (or up to 8 times a a day with Power BI Pro license or up to 48 times a day with Power BI Premium license).
For most people a one of query is going to be fine at the start of the day, once your data warehouse is loaded. 

2. Having the Power Query editor within Power BI, means that the transformations can be made directly within the Power Bi service, which will not only be faster, but allows for other developers to be able to see and edit the transformations if required. Remember however, you can't create a report in Power BI Service from a data flow, you can only create a report from a data set, so you still need to use Power BI desktop to create the data set and report.

3.  If we can move to have all the data loaded into the Power BI service and nicely transformed, we can clearly separate the task of modelling the data and creating the Power BI data visualisations in Power BI desktop, by pulling the data flows in.

4. Once the Power BI work report is done, this can be published to the Power BI service. 
So we have the report that pulls data from the dataset, that gets data from the dataflow, which has previously been refreshed from the data source (e.g. the data warehouse).

Getting a Power BI Pro Free 60 day Trial account

If you have a work email, your ready to go, but if you don't you can buy a domain and email from a cheap domain provider. I used Fasthosts. I got a .co.uk domain for a £1 and webmail service for a £1 a month. Easy to setup and I was able to create a working email immediately. Once I had that I could then upgrade the desktop version to Pro, by clicking on the free trail link next to your account (top right hand side of Power BI desktop).

Creating a Data Flow (Simple method)

TIP: You can only create data flows with a Power BI Pro license and you cannot create a data flow in the my workspaces workspace. This will save you time getting lost.

So enter a workspace and select New > Data Flow 
Select Add new Tables from the Define new tables (first box)

TIP You can also create a dataflow and pull some data from an existing dataflow, so not to reuse tables. I think you need a premium license to do this, so it may not be an option for you if you only have Pro

You can then select the Database or file 


You can then connect the Power BI Service (not the desktop), to your data warehouse.
If you don't have a data warehouse, please see the blog on setting up a Microsoft Azure data warehouse using the Adventure Works DW 2019 sample. 





TIP: I kept getting authentication errors, but leaving the database name blank did the trick.

If you are using azure, the database server name and database name are all in azure (as above).
You can then select the tables you want to import.


Once the tables are imported to the data flow in the Power BI service, you can apply your transformation to the data, with the idea this is the only time and place you need to do this. 
Note: If you select map to entity, you are given a whole list of commonly used field names that make up the CDM or Common Data Model. This helps you use standardization for your field names.





OK....


So this is ok so far, but you will still need to do the modelling work in Power BI to connect to the tables. Relationships cannot be made within the dataflows - but we can pull queries using data flows and this is where it gets interesting.

So if we can pull SQL queries into data flows, we can use relations between tables that can't be done otherwise. This means we can refresh these re-usable queries with a tonne of data in them once. 

Method - Pulling SQL queries into Data Flows:

Data Flows use M code (Mashable code), which is the code used in the Power Query Editor. When you select buttons, such as text to columns, the M code is generated for you, but you can write M code yourself. 

If you select New Data Flow > Define New Tables (Add New Tables) > Other > Blank Query

You will be presented with an interface to enter M Code.




If you don't know M code, you can use Power BI Desktop to autogenerate the M code for you.
To do this first create a simple SQL query joining two tables together for testing. I did this first using SSMS as it was easy.



Next I go open Power Query in Power BI Desktop and create a new data source connection in Power Query (in this case I connect to an Azure SQL database. 
When you run the query the M Code is generated automatically for you. This is mine:

= Sql.Database("mydatabasename.database.windows.net", "adventureworks2019", [Query="select #(lf)[FirstName],[MiddleName],[LastName],[AddressLine1],[AddressLine2],[PostalCode]#(lf)from#(lf)[SalesLT].[Customer]#(lf)left outer join [SalesLT].[CustomerAddress]#(lf)on#(lf)[SalesLT].[Customer].[CustomerID] = [SalesLT].[CustomerAddress].[CustomerID]#(lf)left outer join #(lf)[SalesLT].[Address]#(lf)on#(lf)[SalesLT].[CustomerAddress].[AddressID] = [SalesLT].[Address].[AddressID]"])



The final step is to go back to Power Query service and open blank query from data flows and enter you M Code in there. My M query is below (you just need to remove the equals sign and paste):

Sql.Database("mydatabasename.database.windows.net", "adventureworks2019",  [Query = "select #(lf)[FirstName],[MiddleName],[LastName],[AddressLine1], [AddressLine2],[PostalCode]#(lf)from#(lf)[SalesLT].[Customer]#(lf)left outer  join [SalesLT].[CustomerAddress]#(lf)on#(lf)[SalesLT].[Customer].[CustomerID] =  [SalesLT].[CustomerAddress].[CustomerID]#(lf)left outer join #(lf)[SalesLT]. [Address]#(lf)on#(lf)[SalesLT].[CustomerAddress].[AddressID] = [SalesLT].[Address].[AddressID]"]) 

This is a great video explaining Data Flows





Comments