
Loading multiple JSON files using Power Query

I recently had to figure out how to load multiple JSON files using Power Query. It turned out to be less straightforward than expected, so I figured it was worth blogging about…

The scenario: I have multiple JSON files sitting in a container in Azure Blob Storage; I would like to load them all into a data model for use in Power BI. I am assuming all the files you want to load are in one container. My solution will not work for multiple containers.

I will be using Power Query for this, from the Power BI Designer. You could do the same using Power Query in Excel.

First, let’s connect to the blob storage. This part is easy. Just click Get Data → More in the Power BI Designer, then select ‘Azure’, choose Microsoft Azure Blob Storage and click Connect:

In Excel, navigate to the Power Query tab and select From Azure → From Microsoft Azure Blob Storage:

You will need to enter your Azure Storage account name and key. Next, you will see a list of containers in the blob storage. Select the container the data is in and choose Edit:

What we will need to do is create a function that loads the JSON files. To do this we use an approach similar to loading multiple Excel or CSV files (see here and here respectively): first we load just one file, and then we convert the query into a function which we call for every file we want to load.

So first, click on ‘Binary’ in the first column for one of the rows representing a JSON file. You will see a one-column table listing all records in the JSON file (the exact number of rows depends on the length of the JSON file):

What you want to do is convert the records into a table by clicking the ‘To Table’ button:

You will probably see a ‘To Table’ dialog allowing customization of the conversion; for JSON you normally should not have to change the defaults, so click OK.

Next step is to expand the resulting Column1 to see some actual data. To do this click the expand button to the right of the column header and click OK (I deselected the ‘use original column name as prefix’ option):

And voilà: a nice-looking table of the records in this JSON file:

We are not done however; this was the easy part. Remember we need to create a function that will enable us to iterate over multiple files.

To start editing the code, hop over to the Advanced Editor (View → Advanced Editor). Your code should look something like this (your last line will be different from mine, since it depends on the contents of the JSON):

First, we will need to wrap this in a function: add a line at the top declaring the function and its two parameters (the folder path and the file name), add a matching line at the bottom that returns the function, and edit the line that defines ‘contents’ so that it picks up the blob identified by those two parameters. Your code should then look like this:
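Here is a minimal sketch of what the finished function could look like; the storage account name, container name and the expanded field names are placeholders for my demo setup, so yours will differ:

    let LoadJSON = (path, name) =>
    let
        // connect to the storage account and pick the container (names are placeholders)
        Source = AzureStorage.Blobs("mystorageaccount"),
        container = Source{[Name = "mycontainer"]}[Data],
        // pick the blob whose folder path and file name match the parameters
        contents = container{[#"Folder Path" = path, Name = name]}[Content],
        #"Imported JSON" = Json.Document(contents),
        #"Converted to Table" = Table.FromList(#"Imported JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        // the fields to expand depend on the structure of your JSON
        #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"field1", "field2"})
    in
        #"Expanded Column1"
    in LoadJSON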

Click ‘Done’ and give the query a descriptive name (I suggest naming it the same as the function: LoadJSON)… phew, that was not too bad, right? So now, let’s use this function on all our JSON files. Let’s do the same as we did at the start: connect to the blob storage and stop at the screen where you have a list of files in the container:

Since we can only apply the function to JSON files, my first step is to filter the Extension column down to just ‘.json’:
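In M this filter is a single step, something like:

    #"Filtered Rows" = Table.SelectRows(Source, each [Extension] = ".json")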

Then, we need to get rid of all the columns except Name and Folder Path. To do this, select the columns to keep and choose Remove Other Columns.

Now, let’s call the function and pass in the path and name parameters. Insert a custom column (Add Column → Add Custom Column) with the following setting:
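If you kept the name LoadJSON, the formula in the dialog would be along these lines, passing in the folder path and file name columns:

    LoadJSON([Folder Path], [Name])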

Then, we need to expand the resulting custom column by clicking on the little expand button again:

Click ‘OK’ and you have all the contents visible. To clean up, let’s delete the Name and Folder Path columns, since we do not need them anymore. Since this is JSON, you will probably want to fix the data types before reporting on this.

And… you’re done. How cool is this?

Walkthrough: how to connect to Dynamics CRM Online with Power BI

In this walkthrough I will step through the process of connecting to a Dynamics CRM Online instance with Power BI (specifically Power Query).

For this you will need the latest version of Power Query installed. After launching Excel, navigate to the Power Query tab and choose From Other Sources → Dynamics CRM Online. You will need to enter the service URL in this window:

The OData service URL takes the following format: https://[tenant].crm.dynamics.com/XRMServices/2011/OrganizationData.svc.

Once you have filled in your tenant name, click OK. In the next screen you will be asked for your credentials. Since I use a demo environment, I will need to use an organizational account. If your organization uses Dynamics CRM Online in production, chances are you will be authenticated automatically or can use your Windows account.

Next step is to specify what tables I would like to load:

I chose OpportunitySet, since I wanted to get a list of the opportunities in the system. The opportunities have a modified date which I would like to show as an ‘age’ in days; meaning that I would like to show the number of days that have passed since the opportunity was last modified. I can easily do that using the Power Query editor (select the table and click Edit); select the ModifiedDate column and use Transform → Date → Age to calculate a rather exact age:

After the transformation the column looks like this:

This is awfully exact; I only wanted the age in number of days. To change this, choose Duration → Days:

And now the column reports 60 days.

As you can see, it is very easy to retrieve data from Dynamics CRM Online; we even had time left to do a typical ‘age’ or ‘number of days passed since’ type of calculation, because retrieving the data was so easy!
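For reference, the two ribbon actions above generate M along these lines (two steps in the query; the step names are the defaults, and ‘Source’ is assumed to be the OpportunitySet table):

    // Transform → Date → Age: duration between now and ModifiedDate
    #"Calculated Age" = Table.TransformColumns(Source, {{"ModifiedDate", each DateTime.LocalNow() - _, type duration}}),
    // Transform → Duration → Days: keep only the whole number of days
    #"Calculated Days" = Table.TransformColumns(#"Calculated Age", {{"ModifiedDate", Duration.Days, Int64.Type}})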

New Power Query update

Recently a Power Query update was released (see http://blogs.office.com/2015/03/05/3-updates-excel-power-query/). Major updates: performance on load, a Dynamics CRM Online connector and new transformations, most notably advanced date/time calculations. Personally I enjoy the CRM Online connector, but I am most fond of the ‘Age’ transformation; it makes it very easy to do the typical ‘number of days since this order was entered’ type of calculation, since it compares the date in the column with today.

The update to Power Query is available here: http://www.microsoft.com/en-us/download/details.aspx?id=39379&WT.mc_id=Blog_PBI_Announce_DI

Enjoy!

Power BI connectivity to Salesforce.com

Here is a short video on how to connect to Salesforce.com from Power BI:

You can now easily get data from Salesforce.com into your Excel / Power BI solution. Enjoy!

Power BI connectivity to SAP Business Objects Universe

Here is a short how-to video on how to connect to SAP Business Objects Universe through Power BI:

Enjoy!

Power BI Pro Tip: Dealing with errors when reading Excel files

If you use Power Query to read an Excel file that has errors (such as #VALUE!, #REF!, #N/A, etc.), the rows that contain an error will not be loaded into the data model but will instead be flagged as error rows. But what if you really want to keep the rows, with some replacement value in place of the error, instead of skipping them altogether?

To demonstrate this I created this simple Excel table that contains only errors (actually all errors that Excel can generate that I am aware of):

How they are generated and what they mean is not the point here, but I am sure most of you have seen some if not all of them before.

On loading this table using Power Query all five rows will be marked as error rows:

As a result, no data is loaded into your data model. This is as expected since Power Query filters out the rows that have an error, which in this case are all of the input rows.

However, in some cases this is not what you want. Suppose this was a 50-column dataset in which one column sometimes contains an error. Do you want to disregard the rows with errors, and thus throw away all the potentially valuable info in those rows, or do you want to somehow fix the errors, flag them as problematic, but still load them? The latter is sometimes the better choice.

This can easily be done by editing the Power Query and using the Replace Errors function (on the Transform tab):

Make sure you have the column with the errors selected, enter a valid value for the data type of the column (in my case the type was Any, since there is no other data available in the MyCol column) and hit OK.

Power Query will replace the error with the value you entered:
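Under the covers this is a single M step; a sketch, with ‘Error!’ as an assumed replacement label:

    #"Replaced Errors" = Table.ReplaceErrorValues(Source, {{"MyCol", "Error!"}})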

Now, on loading, the rows with errors will end up in the data model, flagged with the label you chose.

Hope this helps! Until next time!

Power BI pro tip: using Access Online for data entry

With powerful self-service BI tools such as Power BI comes the need for business user data entry; data does not exist in source systems, needs to be enhanced or enriched before going into the report, or the business user just wants to change the way the data is organized. In those cases (which come up more often than not) we need to give the business user an easy-to-use way to do data entry while keeping it robust: i.e. not a tool the user could easily make mistakes in and hurt the reporting process. You could use Excel, but you would have to secure it so no mistakes can be made. SharePoint lists are also a good option if you have fewer than 5,000 data rows (the list view threshold in SharePoint Online). If you need to store a lot of data and need a robust solution, Access Services (Access Online) is a great tool for the job, and the best part is that it works perfectly with Power BI.

Perhaps the biggest change in Access 2013 is that it now stores data in SQL Server databases rather than Access files. In this post I will show you how to build a sample application reporting on KPIs for production plants around the world. The data is entered by the business user using a web form generated by Access, and the dashboard is created using Power BI. So here we go.

First step is to get the data. For that I created a simple Access 2013 application that I published on my SharePoint Online site. The Access application consists of three dimension tables (KPIs, Periods and Plants) and, of course, the actual facts: the KPI Values. On top of this sits a very basic data entry screen that enables the user to enter new actuals and targets for a KPI for a period for a given plant:

I entered some test data and saved the app. Imagine your business user just entering their data in here.

The next step is to get the data out of the SQL database Access Services will store it in and build a report / dashboard on top of it. For this, you will need to go to the Info pane of the File menu in Access. Look for the ‘Manage’ button next to Connections:

If you click it you get a big flyout presenting you with a lot of options. You will need to select the following:

- From My Location or From Any Location (I chose From Any).

- Enable Read-Only Connections.

See this screenshot:

Now, click on ‘View Read-Only Connection Information’ and leave it open for now; you will need it later.

Next step is to start Excel, go to Power Query and select From Database → SQL Server (and not Access, since the data is stored in SQL Server by default in Access 2013).
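In M, the connection is a single Sql.Database call; the server and database names below are placeholders for what the Access connection information screen shows you:

    Source = Sql.Database("xxxxxxxx.database.windows.net", "db_xxxxxxxx")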

Copy/paste the server and database name from the connection information screen in Access and choose OK. In the next screen, enter your credentials (again copy/paste from the connection information screen in Access). After a while you can select the tables you are interested in and load the data into PowerPivot. I loaded my Plants, Periods and Values (I skipped KPIs since it contained only the KPI label):

Next step is to create relationships between the tables in PowerPivot, hide some columns and add a KPI definition. I ended up with this model:

Now, with Power View I created the following basic report (I did not give myself time to work on the layout; this is just quick and dirty):

This concludes this Power BI Pro Tip!

Combining text (csv) files using Power Query – follow-up

I have been getting quite a few responses to my original post on how to combine text (CSV) files using Power Query. One of the FAQs is how to keep the filename the data came from in your result set. Said differently: what you want is the contents of all files in a folder plus the filename of the originating file, all in one table. I thought it was simply a matter of adding a column, but Nicolas pointed out in a comment that adding a column would create a cross product of all data and all filenames. So, I needed to come up with another solution. The solution I present here might not be the best one, but it works. The magic trick here is knowing that the Csv.Document() function exists. Allow me to explain.

First of all, I followed the “normal” approach to list the files in the folder using Power Query. Now, it might be tempting to expand the ‘Content’ column (as I did in the original post). However, since the goal in this scenario is to get the contents as well as keep the originating filename, we need a different approach. What we need to do is add a custom column that equals the following:
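The formula is simply Csv.Document applied to the binary ‘Content’ column from the folder listing:

    Csv.Document([Content])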

What this is doing is opening the contents as a document in CSV format, in a single column. Next step is to expand the table; the contents will be displayed in your custom column. Then, I did a split on the CSV separator (a semicolon in this case), so I ended up with three columns. The only thing left to do is clean up (remove columns) and filter rows (since my CSVs had headers, the header row from the second CSV is still in my data). The end result is one table with the data from all files plus the originating filename. For your reference, here is my code:
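A sketch of the full query; the folder path, the split column names and the header value I filter on are specific to my demo files, so adjust them to yours:

    let
        // list all files in the folder
        Source = Folder.Files("C:\Demo\CSV"),
        // parse each file's binary content as a CSV document, keeping the file name
        #"Added Custom" = Table.AddColumn(Source, "Custom", each Csv.Document([Content])),
        #"Removed Other Columns" = Table.SelectColumns(#"Added Custom", {"Name", "Custom"}),
        // expand the parsed content next to the file name
        #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"Column1"}),
        // split on the CSV separator (semicolon in my case)
        #"Split Column" = Table.SplitColumn(#"Expanded Custom", "Column1", Splitter.SplitTextByDelimiter(";"), {"Column1.1", "Column1.2", "Column1.3"}),
        // remove the header rows that came along from the second and later files
        #"Filtered Rows" = Table.SelectRows(#"Split Column", each [Column1.1] <> "Header1")
    in
        #"Filtered Rows"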

With this I hope Sunflowers and Nicolas are happy 🙂

Analyzing Amsterdam Crime Rates with Power BI

Here is another example of what can be done with Power BI. This time I attempted to analyze crime rates in Amsterdam.

I retrieved burglary and burglary attempts figures from the Dutch Police website: http://www.politie.nl/misdaad-in-kaart/lijst?geoquery=amsterdam&categorie=1&categorie=2&pageSize=500&page=1

There are three pages here, so I used Power Query to get all three pages into Excel and then appended them together into one big table. I did this using three ‘From Web’ commands in Power Query, giving me the following queries:

Next step was to append the three tables together. I started with the ‘Append’ command in Power Query, which allowed me to append Page1 and Page2 together. A quick edit of the code gave me the result I was looking for:
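The Append command generates a step that combines two tables; editing it to take all three gives something like this (assuming the queries are named Page1, Page2 and Page3):

    Source = Table.Combine({Page1, Page2, Page3})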

And the resulting table:

Next, I loaded the table into PowerPivot, added a quick sum and then opened Power Map by going to Insert → Map in Excel.

In Power Map I created three layers, one using shapes, another one using a heat map and a third using a column graph. Then I created a tour using these layers. The resulting video is below.

As you can see, it is really easy to visualize geographical data using Power BI in Excel! Pretty cool huh?

Top 3 of what is new in Power Query for Excel

Just recently the December update of Power Query for Excel was published. See this blog post on the Power BI blog for the details: http://blogs.msdn.com/b/powerbi/archive/2013/12/13/one-step-closer-to-simplifying-data-analysis-and-visualization-new-features-added-to-power-bi-for-office-365-and-power-query-add-in-for-excel-preview.aspx.

In this post I want to give you my personal top 3:

1. Automatic detection of table relationships. Like PowerPivot, if you import multiple tables from the same source, Power Query now detects the relationships and brings them over to the Power Pivot Data Model automatically.

2. Connectivity to new data sources. Now supported: Exchange, Dynamics CRM Online, JSON light and Sybase IQ.

3. Fill Down transformation. Fills all empty cells in a column with the value of the first non-empty cell above them. Very handy for those matrix-like Excel sheets! See the sketch below.
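As a quick illustration, Fill Down translates to a single M step; ‘Category’ here is a hypothetical column with blanks below each group header:

    #"Filled Down" = Table.FillDown(Source, {"Category"})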

I will come back to some of these in later posts. Happy Power Querying!
