
Comparing Datazen, SSRS and Power View

It is a difficult task, but it can be done… comparing Datazen, SSRS and Power View. See http://www.sqlchick.com/entries/2015/6/20/comparison-of-datazen-vs-ssrs-reporting-services-vs-power-view for an in-depth comparison!

Webinar on Azure Machine Learning

Last week I did a live webinar on Azure Machine Learning. The webinar can serve as an introduction to machine learning, data mining and predictive analytics, as well as Microsoft's solution for it: Azure Machine Learning.

Watch the recording here (in Dutch, sorry).

Power BI Pro Tip: making date / time calculations work (Time Intelligence)

Every so often I get asked how to do a year-over-year, quarter-over-quarter, month-over-month or year-vs-year calculation in Power BI. In most cases people would like to create a KPI that compares a certain period's performance to another period. Power BI (specifically DAX) provides great functions for this: the Time Intelligence functions. In these scenarios PREVIOUSMONTH, PREVIOUSYEAR and SAMEPERIODLASTYEAR are used most. However, there are some frequent mistakes that result in errors when using these functions:

1) You will need to have a Date table in your model. Technically you do not need one, but you need to make sure the column you use for the time-based calculations contains only unique values/dates. This is often not the case with sales happening more than once a day! Once you have the date table in the model, make sure to create a relationship between your fact table's date column (for example the sales date) and the date table.

2) The time intelligence functions should really be used as measures, not as calculated columns. This means their place in the Excel 2013 PowerPivot window is in the calculation area below the horizontal line, not in the columns above it.

3) Time intelligence functions work best when using totals, averages or other aggregated info.

Here are some examples. I will use 'Date'[Date] as the reference to the date column in my date table. Also, as a best practice I split the calculation into two parts: the first part just calculates the total sales, while the other calculations refer to that base calculation.
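For example, a minimal sketch assuming a Sales table with an Amount column (replace these with the names in your own model):

    Total Sales := SUM ( Sales[Amount] )

    Sales Previous Month := CALCULATE ( [Total Sales], PREVIOUSMONTH ( 'Date'[Date] ) )

    Sales Previous Year := CALCULATE ( [Total Sales], PREVIOUSYEAR ( 'Date'[Date] ) )

    Sales Same Period Last Year := CALCULATE ( [Total Sales], SAMEPERIODLASTYEAR ( 'Date'[Date] ) )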

Note that the last one uses SAMEPERIODLASTYEAR, which is more flexible as it will select the same day in the previous year or the same month in the previous year, depending on the selection the user makes in the tables/graphs. This is, however, not always what you want, so you can make it more specific by using PREVIOUSYEAR / PREVIOUSMONTH etc.

You could also use DATEADD to be even more flexible:
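For example (a sketch, again building on the assumed Total Sales base measure):

    Sales Last Year := CALCULATE ( [Total Sales], DATEADD ( 'Date'[Date], -1, YEAR ) )

By changing the interval (DAY, MONTH, QUARTER, YEAR) and the number of intervals, this one pattern covers most period-over-period comparisons.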

Notice by the way that I tend to use the short-hand notation to prevent me from having to type CALCULATE all the time (yes, I am lazy :)). Here is an example of the two ways to get the sales for the previous year; the first line is the short-hand, the second is the more elaborate but no less correct option:
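Something along these lines, still using the assumed Total Sales measure; calling a measure with a table filter argument is short-hand for wrapping it in CALCULATE:

    Sales Previous Year := [Total Sales] ( PREVIOUSYEAR ( 'Date'[Date] ) )
    Sales Previous Year := CALCULATE ( [Total Sales], PREVIOUSYEAR ( 'Date'[Date] ) )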

Hope this helps!

 

Loading multiple JSON files using Power Query

I had to figure out recently how to load multiple JSON files using Power Query. It turned out to be less easy than expected, so I figured it is worth blogging about…

The scenario: I have multiple JSON files sitting in a container in Azure Blob Storage; I would like to load them all into a data model for use in Power BI. I am assuming all the files you want to load are in one container. My solution will not work for multiple containers.

I will be using Power Query for this, from the Power BI Designer. You could do the same using Power Query in Excel.

First, let's connect to the blob storage. This part is easy. Just click Get Data → More in the Power BI Designer, select 'Azure', choose Microsoft Azure Blob Storage and click Connect:

 

In Excel, navigate to the Power Query tab, select From Azure → From Microsoft Azure Blob Storage:

 

 

You will need to enter your Azure Storage account name and key. Next, you will see a list of containers in the blob storage. Select the container the data is in and choose Edit:

 

What we will need to do is create a function that loads the JSON files. To do this we use an approach similar to loading multiple Excel or CSV files (see here and here respectively): first we just load one file and then we convert it into a function which we will call for all files we want to load.

So first, click on 'Binary' in the first column for one of the rows representing a JSON file. You will see a one-column table listing all records in the JSON file (the exact number of rows changes with the length of the JSON file):

What you want to do is convert the records into a table by clicking the To Table button:

You will probably see a ‘to Table’ dialogue, allowing customization of the conversion; for JSON you normally should not have to change the defaults, so click OK.

Next step is to expand the resulting Column1 to see some actual data. To do this click the expand button to the right of the column header and click OK (I deselected the ‘use original column name as prefix’ option):

And voila: a nice looking table of the records in this JSON file:

We are not done however; this was the easy part. Remember we need to create a function that will enable us to iterate over multiple files.

To start editing the code, hop over to the Advanced Editor (View → Advanced Editor). Your code should look something like this (your last line will be different from mine, since it depends on the contents of the JSON):
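For reference, a rough sketch of what that generated code could look like; the storage account, container, file name and record fields ('mystorageaccount', 'mycontainer', 'file1.json', 'id', 'name', 'amount') are placeholders, and the auto-generated step names in your query will differ:

    let
        Source = AzureStorage.Blobs("mystorageaccount"),
        mycontainer = Source{[Name = "mycontainer"]}[Data],
        contents = mycontainer{[#"Folder Path" = "https://mystorageaccount.blob.core.windows.net/mycontainer/", Name = "file1.json"]}[Content],
        ImportedJSON = Json.Document(contents),
        ConvertedToTable = Table.FromList(ImportedJSON, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        ExpandedColumn1 = Table.ExpandRecordColumn(ConvertedToTable, "Column1", {"id", "name", "amount"}, {"id", "name", "amount"})
    in
        ExpandedColumn1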

First, we will need to wrap this in a function, so add this line at the top:
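A sketch of that line, assuming we call the function LoadJSON (matching the query name suggested below) and give it the folder path and file name as parameters:

    let LoadJSON = (path, name) =>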

Then, add this at the bottom:
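Again a sketch; this closes the outer let expression and returns the function:

    in
        LoadJSON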

We need to edit the line that defines ‘contents’ to look like this:
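That is, the hard-coded folder path and file name get replaced with the two function parameters (step names as in the sketch above):

    contents = mycontainer{[#"Folder Path" = path, Name = name]}[Content],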

Your code should look like this:
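Putting the pieces together, a sketch of the full function (all names are still placeholders):

    let LoadJSON = (path, name) =>
        let
            Source = AzureStorage.Blobs("mystorageaccount"),
            mycontainer = Source{[Name = "mycontainer"]}[Data],
            contents = mycontainer{[#"Folder Path" = path, Name = name]}[Content],
            ImportedJSON = Json.Document(contents),
            ConvertedToTable = Table.FromList(ImportedJSON, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            ExpandedColumn1 = Table.ExpandRecordColumn(ConvertedToTable, "Column1", {"id", "name", "amount"}, {"id", "name", "amount"})
        in
            ExpandedColumn1
    in
        LoadJSON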

Click 'Done' and give the query a descriptive name (I suggest naming it the same as the function: LoadJSON)… phew, that was not too bad, right? So now, let's use this function on all our JSON files. Let's do the same as we did at the start: connect to the blob storage and stop at the screen where you have a list of files in the container:

Since we can only apply the function to JSON files, my first step is to filter on the Extension being just '.json':

Then, we need to get rid of all the columns except Name and Folder Path. To do this, select the columns to keep and choose Remove Other Columns.

Now, let's call the function and pass in the path and name parameters. Insert a custom column (Add Column → Add Custom Column) with the following setting:
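The custom column formula simply calls the function, passing in the Folder Path and Name columns of each row (assuming the function is named LoadJSON):

    = LoadJSON([Folder Path], [Name])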

Then, we need to expand the resulting custom column by clicking on the little expand button again:

Click 'OK'. Now you have all the contents visible. To clean up, let's delete the Name and Folder Path columns, since we do not need them anymore. Since this is JSON, you will probably want to fix the data types before reporting on this.

And… you're done. How cool is this?

First look: Project Oxford – powerful face, vision, language and speech APIs

Just announced: Project Oxford (http://www.projectoxford.ai), a project that aims at providing powerful APIs for developers who are looking to add face, speech and language recognition capabilities to their applications. The Project Oxford site provides a way to interact with the APIs currently available.

In this first look we will focus on the Face APIs, which can be used for face detection (finding faces in photos, determining gender and age), face verification (checking whether the person in two photos is the same), similar face searching, face grouping and face identification.

Here is a sample result of the Face Detection API (you can do this yourself too!) using a photo of me:

This is actually quite good: this picture was taken a while ago and I was 29 at the time, so not too far off.

The Face Verification API allows you to check if the person in two photos is the same. Here is the result of my test (again, do this yourself!):

Have a look at that: I turn out to be the same person in both photos :)

This is very powerful stuff; I am looking forward to using this in projects. I will keep you posted on that.

Walkthrough: how to connect to Dynamics CRM Online with Power BI

In this walkthrough I will step through the process of connecting to a Dynamics CRM Online instance with Power BI (specifically Power Query).

For this you will need the latest version of Power Query installed. After you launch Excel, navigate to the Power Query tab and choose From Other Sources → Dynamics CRM Online. You will need to enter the service URL in this window:

The OData service URL takes the following format: https://[tenantname].crm.dynamics.com/XRMServices/2011/OrganizationData.svc.

Once you have filled out your tenant name, click OK. In the next screen you will be asked for your credentials. Since I use a demo environment, I need to use an organizational account. If your organization uses Dynamics CRM Online in production, chances are you will be authenticated automatically or can use your Windows account.

Next step is to specify what tables I would like to load:

I chose OpportunitySet, since I wanted to get a list of the opportunities in the system. The opportunities have a modified date which I would like to show as an 'age' in days, meaning the number of days that have passed since the opportunity was last modified. I can easily do that using the Power Query editor (select the table and click Edit): select the ModifiedDate column and use Transform → Date → Age to calculate a rather exact age:

After the transformation the column looks like this:

This is awfully exact; I only wanted the age in days. To change this, choose Duration → Days:

And now the column reports 60 days.
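For reference, the editor generates M code roughly along these lines for those two transformations (a sketch; the step and table names are placeholders):

    #"Calculated Age" = Table.TransformColumns(OpportunitySet, {{"ModifiedDate", each DateTime.LocalNow() - _, type duration}}),
    #"Calculated Total Days" = Table.TransformColumns(#"Calculated Age", {{"ModifiedDate", Duration.Days, Int64.Type}})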

 

As you can see, it is very easy to retrieve data from Dynamics CRM Online; and because retrieving the data was so easy, we even had time for a typical 'age' or 'number of days passed since' type of calculation!

Power BI Public Preview now available worldwide

Yesterday, Microsoft announced that the Power BI Public Preview is now available worldwide. Until now only US-based users could access the preview. Not anymore, so you can check out all the great new stuff right now at http://www.powerbi.com.

See http://blogs.msdn.com/b/powerbi/archive/2015/03/16/power-bi-preview-now-available-worldwide.aspx

Happy Power BI-ing!

New Power Query update

Recently a Power Query update was released (see http://blogs.office.com/2015/03/05/3-updates-excel-power-query/). Major updates: performance on load, a Dynamics CRM Online connector and new transformations, most notably advanced date/time calculations. Personally I enjoy the CRM Online connector, but I am most fond of the 'Age' transformation; it makes it very easy to do the typical 'number of days since this order was entered' type of calculations, since it compares the date in the column with today.

The update to Power Query is available here: http://www.microsoft.com/en-us/download/details.aspx?id=39379&WT.mc_id=Blog_PBI_Announce_DI

Enjoy!

 

In memory technology in SQL Server

Everybody has noticed the rise of technology using in-memory techniques. SAP is going all-in on HANA, and Oracle just started last year with its in-memory database. At Microsoft we started in 2012 with in-memory analytics and added in-memory OLTP in April 2014. The buzz is high, with big marketing events, lots of whitepapers and broad press coverage.

But what about real-life practice?

When I visit my customers (top-50 in The Netherlands) I rarely see in-memory databases being used. So I always ask why they don't make use of them. This resulted in the following reasons:

  1. I didn't know I could run my databases in memory.
    The marketing engine could be hitting the wrong people. Lots of database administrators are not up to speed.
  2. We don't do it because it must be very difficult.
    True – if you use SAP, then it's common knowledge that implementing SAP HANA is not very easy, and you have to rewrite some of your programs. False – if you use Microsoft SQL Server. To start using in-memory you can switch it on for certain tables (or parts of tables) and it will work without any change to the application (see the sketch after this list).
  3. The power of our servers is high enough. We don't need the extra power.
    This is of course a compliment that our SQL Servers run so smoothly (-:
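To give an idea of what switching it on looks like in SQL Server 2014, here is a minimal sketch; the database, filegroup, file path and table are purely illustrative placeholders:

    -- Add a memory-optimized filegroup and container to the database
    ALTER DATABASE SalesDB ADD FILEGROUP SalesDB_InMemory CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE SalesDB ADD FILE (NAME = 'SalesDB_InMemory1', FILENAME = 'C:\Data\SalesDB_InMemory1')
        TO FILEGROUP SalesDB_InMemory;

    -- Create (or recreate) the hot table as memory-optimized;
    -- the application keeps querying it with regular T-SQL
    CREATE TABLE dbo.ShoppingCart
    (
        ShoppingCartId INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
        UserId INT NOT NULL,
        CreatedDate DATETIME2 NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);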

But still I think that by using in memory technology you can achieve the following:

  • Prevent a hardware refresh. If servers run out of performance, moving to in-memory increases speeds by 5x – 10x, so the servers can remain the same.
  • Run more VMs on a host. By using in-memory technology, fewer cores are needed because of the more economical processing in memory, leaving more cores for new VMs on the same host.
  • Increase processing speed to reduce wait time for your users.

So my advice: Start experimenting with the technology and look for those business cases.

My ask to anyone who has experience with a practical implementation: please reply with your real-life experience!

R Plotting using Azure Machine Learning

Azure Machine Learning is Microsoft's cloud data mining and machine learning solution. It features a studio that is fully web based. One of the best features is integration with R through the 'Execute R Script' component. One of the best things about R is its plotting capability, and I recently decided to try to make R plots from Azure ML Studio. It is amazing how easily this works, and it really brings the power of Azure ML together with the great exploration, plotting and data manipulation capabilities of R.

Here is a very simple sample I made:

I used the Flight Delays sample dataset from Azure ML to make this. In the ML Studio you will need to create a new experiment and drag the 'Flight Delays Data' component onto the canvas. The only other component you will need to drop on the canvas is 'Execute R Script' (I told you this was a very simple example). Drag a line from the data to the left-most input port of the R script container, like so:

Click on the R script component and edit the R script on the right. Here is my script:
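A minimal sketch of such a script; the column plotted (DepDelay) is an assumption based on the Flight Delays dataset, and the maml.* helpers and the rbind/plot lines follow the component's default template:

    # Map the first input port to a data frame (Azure ML helper)
    dataset1 <- maml.mapInputPort(1)

    # rbind the input into data.set (with a single input this is effectively a copy)
    data.set <- rbind(dataset1)

    # A very simple plot using base R; it shows up in the R Device output port
    plot(data.set$DepDelay)

    # Send the data frame to the output port (not strictly necessary here,
    # but part of the default template)
    maml.mapOutputPort("data.set")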

 

This script gets the data from the input port and rbinds it into data.set. Then I executed a very simple plot using the plot function from base R to create the plot shown above. The last line of this code is not even necessary, but it was there by default.

After running the experiment, the plot can be seen by selecting the right-hand output port of the 'Execute R Script' container (the R Device port) and selecting 'Visualize':

The plot will be at the bottom of this page.

Pretty cool huh? Stay tuned for more as I will continue experimenting with R integration in Azure ML as well as other ML things.
