
Automatically building a Microsoft BI machine using PowerShell – preparation: logging infrastructure (post #4)

This post is #4 in the series to automatically build a Microsoft BI machine using PowerShell – see the start of series.

In this series so far:

Start of series – introduction and layout of subjects
Post #2 – Preparation: install files using Azure disk
Post #3 – Preparation: install files using Azure File Service

Our final step in preparation is setting up a logging infrastructure. I found a very simple-to-use function online; see the code below:
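
The original function is no longer shown in this version of the post, so here is a minimal sketch of what such a logging function could look like; the function name Write-Log and the log file location are my assumptions, based on the description below:

function Write-Log
{
    param([string]$Message)

    # Assumption: log to a fixed file; prefix every entry with a timestamp
    $logFile = "C:\InstallLogs\install.log"
    $logDir = Split-Path $logFile
    if (-not (Test-Path $logDir)) { New-Item -ItemType Directory -Path $logDir | Out-Null }
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    Add-Content -Path $logFile -Value "$timestamp  $Message"
}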

Including this function in the script enables any step to write to a log by passing a $Message to this function.

Next post will be our master script.

Power BI Pro Tip: Pareto analysis with DAX / Power Pivot

Today’s post is a guest post by Michiel Rozema (https://www.linkedin.com/in/michielrozema). Thanks Michiel!

Dutch Data Dude Jeroen approached me with the question whether it would be possible to create a Pareto chart from a Power Pivot model, using DAX. Doing a Pareto analysis in Excel is easy and numerous ways of doing it can be found online, but Jeroen wanted to use DAX formulas and could not find the solution online. I’m always up for a challenge, so here we go…

A Pareto chart (https://en.wikipedia.org/wiki/Pareto_chart) is a combo chart containing a column chart for a certain value, sorted in descending order, and a line chart with the cumulative column values, expressed as a percentage. Like this:

The issue here is, of course, the cumulative percentage. It resembles a year-to-date total where we have months on the X-axis: for, say, the month of May, the year-to-date total is the total for all months up to and including May. In the Pareto chart above, the percentage value for Accessories is the total of all product categories up to and including the Accessories category itself. There is no built-in DAX function for this, but as it turns out, a simple combination of a few DAX table functions does the trick, including a use of TOPN that I had not thought of before.

Let’s start with the data model. I have created a simple model with two tables, one for sales numbers and one for products:

We want to create a Pareto chart based on product categories, which is actually the chart shown above. For the column values in the chart, I create a basic calculated field:
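
The field itself is missing from this version of the post; a minimal sketch, assuming the sales table is called Sales and has an Amount column:

TotalAmount := SUM(Sales[Amount])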

For the cumulative percentage field, we need to calculate the cumulative total and divide that by the total amount for all categories. So let’s first create a calculated field for the latter one:
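
Again a sketch, reusing the assumed names (the ALL call is explained right below):

TotalAmountAllCategories := CALCULATE([TotalAmount]; ALL(Product[Category]))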

In this formula, ALL(Product[Category]) removes any existing filter from the Category column, thereby returning the result of [TotalAmount] for all categories instead of only one.

Now it’s time to calculate the cumulative total. Let’s take the Accessories category as an example. To calculate the cumulative total for Accessories, we need to somehow determine that there are three categories placed to the left of Accessories, calculate their values, and add up the whole thing.

Remember that in the chart, the results for [TotalAmount] are shown in descending order. So we can say that for Accessories, we need to sum all categories for which [TotalAmount] is larger than the result for Accessories. If we had a Category table in our model with [TotalAmount] as a column, we could have made this calculation in a calculated column with a formula like the following:
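
The formula is missing from this version of the post, but on such a hypothetical Category table it would look something like this calculated column, using EARLIER to refer to the current row:

CumulativeAmount =
    SUMX(
        FILTER('Category'; 'Category'[TotalAmount] >= EARLIER('Category'[TotalAmount]));
        'Category'[TotalAmount]
    )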

However, we don’t have this column, and [TotalAmount] cannot be a column either (we may want to add other tables to the model later on and be able to filter the chart on customer segment, or year); besides, using calculated columns is not a good idea in general. So we need to take a different approach using calculated fields, and we cannot use EARLIER, because we will not have a row context for EARLIER to refer to.

To rephrase the cumulative total problem: we need to be able to pick some categories out of the whole list of categories based on the results of [TotalAmount]. There is a DAX function that can do this: TOPN. The obvious use of TOPN is to do a calculation on, for instance, the top 10 customers, but in this case we will use a variable value of N in TOPN. Taking Accessories as an example again, we need to calculate the total amount for the top 4 categories. But to do that, we first need to determine that Accessories is the number 4 category when it comes to [TotalAmount]. For this, we use another DAX function, RANKX. So we first create the calculated field below:
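
A sketch of this ranking field, with the same assumed names (I call it [RankCategory] in the snippets that follow):

RankCategory := RANKX(ALL(Product[Category]); [TotalAmount])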

What does RANKX do? To quote the Power Pivot tool tip, it ‘Returns the rank of an expression in the current context in the list of values for the expression evaluated for each row in the specified table’. So, our calculation evaluates [TotalAmount] in the current context (in our example, the Accessories category), then loops through the rows of ALL(Product[Category]), which is a list of all categories (remember that ALL is a table function, and we need to use ALL because of the current context), and evaluates [TotalAmount] for each category. It then returns the rank of the result for Accessories in the list for all categories. Below is the list of results of [TotalAmount] for all categories:

When we sort the list in descending order, we can see that indeed, Accessories is the 4th category:

With the rank, we can now calculate the cumulative total using the TOPN function:
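
A sketch, feeding the rank into TOPN as a variable N:

CumulativeAmountCategory :=
    SUMX(
        TOPN([RankCategory]; ALL(Product[Category]); [TotalAmount]);
        [TotalAmount]
    )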

The calculated table we use in this SUMX statement:
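
TOPN([RankCategory]; ALL(Product[Category]); [TotalAmount])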

returns, in our Accessories example, the four categories with the largest values of [TotalAmount]. The SUMX itself then sums the [TotalAmount] values of these four categories.

Now, the only thing left to do is to calculate the Pareto percentage:
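
A sketch, dividing the cumulative total by the grand total defined earlier:

Pareto%Category := [CumulativeAmountCategory] / [TotalAmountAllCategories]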

In the chart, we sort on the [TotalAmount] field used for the columns, and put [Pareto%Category] as a line chart on the secondary axis.

Creating a Pareto analysis on the Product level works exactly the same way; the only difference is that we have to take care of two columns that can filter the products, [ProductCode] and [ProductName]. The calculated fields are below:
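
The original fields are missing from this version of the post; sketches following the same pattern as the category-level fields, now with both product columns inside the ALL calls:

RankProduct := RANKX(ALL(Product[ProductCode]; Product[ProductName]); [TotalAmount])

TotalAmountAllProducts := CALCULATE([TotalAmount]; ALL(Product[ProductCode]; Product[ProductName]))

CumulativeAmountProduct :=
    SUMX(
        TOPN([RankProduct]; ALL(Product[ProductCode]; Product[ProductName]); [TotalAmount]);
        [TotalAmount]
    )

Pareto%Product := [CumulativeAmountProduct] / [TotalAmountAllProducts]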


Here’s the Pareto chart with the large number of products:


Just for fun, we can add categories to the X-axis and have many Pareto charts in one. I don’t really think this makes sense, but it’s nice that it works and returns the right percentages in each category. It works this way because we used the right ALL statement in our calculations.

So, creating a Pareto chart with mostly DAX can be done. And the combination of RANKX and TOPN turns out to be a very powerful one, which will certainly prove useful in other situations.

Automatically building a Microsoft BI machine using PowerShell – preparation: install files using Azure File Service (post #3)

This post is #3 in the series to automatically build a Microsoft BI machine using PowerShell – see the start of series.


In this series so far:

Start of series – introduction and layout of subjects
Post #2 – Preparation: install files using Azure disk


In our last post we looked at one way of working with the install files required for automating the installation of a BI machine: using disks. This post will focus on sharing the install files using the Azure File Service, which exposes file shares using the standard SMB 2.1 protocol. It is in some ways an addition to storage accounts. See http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/12/introducing-microsoft-azure-file-service.aspx for more information. This service is in beta at the moment, so you will need to subscribe to the beta using the Azure Preview portal: http://azure.microsoft.com/en-us/services/preview/. Look for ‘Azure Files’ in the list and click ‘Try it’ to get your account activated for the preview.

The Azure File Service is not exposed in any portal, probably because it is in preview. Also, keep in mind that while the service is in preview, existing storage accounts will not have access to the File Service, so we will need to create a new storage account as well. To do this, log in to the portal, click ‘New’ and create a new storage account. After the storage account has been created, you will need to use PowerShell to create a file share. Make sure you have the latest version of Azure PowerShell installed and then run the following in Azure PowerShell or the ISE:
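
The original script is not included in this version of the post; a minimal sketch using the Azure storage cmdlets of that era, where the storage account name and share name are placeholders:

# Build a storage context for the new storage account
$accountName = "mybistorage"
$accountKey = (Get-AzureStorageKey -StorageAccountName $accountName).Primary
$ctx = New-AzureStorageContext -StorageAccountName $accountName -StorageAccountKey $accountKey

# Create the file share that will hold the install files
New-AzureStorageShare -Name "installfiles" -Context $ctx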

After this runs, you should be able to access the file share in multiple ways, but the easiest way I found is mapping the share as a drive in a VM by running:
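
A sketch of the mapping command (account name, share name and key are again placeholders):

# Map the share to drive letter Z:, authenticating with the storage account name and key
net use Z: \\mybistorage.file.core.windows.net\installfiles /u:mybistorage <storage-account-key>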

Now you can download and store files on the share just as you can with disks, as discussed in post #2 on using install files using Azure disks.

Next post will be our final step of the preparation: logging.


Automatically building a Microsoft BI machine using PowerShell – preparation: install files using disk (post #2)

This post is #2 in the series to automatically build a Microsoft BI machine using PowerShell – see the start of series.

The first step in our preparation is making the install files available. I see two options for this, namely using VHD / disk and Azure File Service. In this post we will walk through how to make the install files available using a VHD / disk in Azure.

The way this works is that you create a new disk on which you store the installer files. After you have created the virtual machine that you want to build automatically, you attach that disk so the installer files are available to the machine, and thus to the automated installer script.

To create a new disk, log in to the Azure portal (either the production or the preview portal) and navigate to Virtual Machines. Select a machine you have created. This could be the one you will be setting up or any other VM; the disk you will create is re-usable across machines. Once you have a machine selected, click ‘Attach’ and select ‘Attach empty disk’ to create a new empty disk:

Enter a file name for the new disk and set the size in GB you expect to need. No need to change the host cache settings here. When done, click the button at the bottom right:
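
The same step can also be scripted; a sketch using the classic (service management) Azure cmdlets, where the cloud service and VM names are placeholders:

# Attach a new, empty 50 GB data disk to an existing VM
Get-AzureVM -ServiceName "mybiservice" -Name "mybivm" |
    Add-AzureDataDisk -CreateNew -DiskSizeInGB 50 -DiskLabel "InstallFiles" -LUN 0 |
    Update-AzureVM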

After the disk has been created, log in to the machine you attached the disk to using Remote Desktop. You can now download the install files and save them to the disk you have just attached. In the VM, start ‘Disk Management’ (for example by right-clicking the Windows button and selecting it from the list). You will see a notification to initialize the disk. Accept the defaults and click OK:

Once the initialization is done we need to create a partition on the disk. In Disk Management, right-click on the new disk and choose ‘New Simple Volume’. Follow the steps of the wizard, taking note of the drive letter assigned. Also make sure to set a volume label and wait for the format to finish.

Once the formatting is done, you can download and store your install files on the disk. I created a folder in the root of the disk named ‘Resources’, with a sub-folder per software item required, and saved the install files in these folders. The scripts we will create will point to these install files. I store the scripts on the same install disk as well.

When you are done downloading the files (and, later, creating the scripts), you can detach the disk and re-attach it to another machine: in the portal, select the VM that currently has the install disk attached, click ‘Detach disk’ and choose the disk to detach. You can then re-attach the disk to another VM.
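
This, too, can be scripted; a sketch with the same placeholder names (the disk name is the one you chose when creating the disk):

# Detach the install disk (assumed to sit at LUN 0) from the current VM
Get-AzureVM -ServiceName "mybiservice" -Name "mybivm" |
    Remove-AzureDataDisk -LUN 0 |
    Update-AzureVM

# Attach the existing disk to another VM
Get-AzureVM -ServiceName "mybiservice" -Name "myothervm" |
    Add-AzureDataDisk -Import -DiskName "InstallFiles" -LUN 0 |
    Update-AzureVM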

This is an OK way to work with the install files. In the next post we will explore an alternative way using Azure File Service.


Automatically building a Microsoft BI machine using PowerShell – Start of Series

I used to spend quite some time on building and re-building Microsoft BI demo machines. As you can imagine this manual process takes a lot of time and effort. Therefore (and also for my own education on PowerShell) I decided to look into automating the whole process. I will explain this in this series of posts.

The goal

In the end, we want to have a virtual machine that is configured as follows: Windows Server 2012 R2, with Active Directory Domain Controller role. Additionally, SQL Server 2014 is installed and configured as well as SharePoint 2013. Finally, the BI tools like Power Pivot and Power View are configured.

Ok, but how do we build such a machine?

Here are the steps to take. I always do them in this order, partly because there are some dependencies and partly because it stops me from going insane.

  0. Install Windows (doh). I will skip this step (therefore it is number 0) since I use Azure and a VM in Azure comes with Windows Server pre-installed. I happen to use Windows Server 2012 R2, by the way.
  1. Disable Internet Explorer Enhanced Security Configuration. Although it is a great idea (see http://technet.microsoft.com/en-us/library/dd883248(v=WS.10).aspx for more info), it is hard to give a good demo on a machine with this thing on, so the first step is disabling it (a sketch follows below).
  2. Set up Active Directory; AD is required for the PowerPivot service.
  3. After AD has been set up, we need to promote the Domain Controller.
  4. After promotion, we configure a very unrestrictive password policy; remember, this is just a demo machine!
  5. Virus protection is important, even for a demo machine; therefore, set up System Center Endpoint Protection.
  6. Install SQL Server 2014.
  7. Install SharePoint.
  8. Install the PowerPivot service.
  9. Configure the PowerPivot service.
  10. Configure the last parts of the PowerPivot service.
  11. Configure Master Data Services.
  12. Configure Data Quality Services.
  13. Configure the other SharePoint service applications.
  14. Activate SharePoint site features.
  15. Add favorites in Internet Explorer pointing to the MDS and SharePoint sites.

In this blog series I will share my PowerShell code to accomplish all of this. Please note that I am not a developer, so things can probably be done a lot smarter.
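
As a small taste of what is to come, here is a sketch of step 1 (disabling IE Enhanced Security Configuration); the registry keys are the standard IE ESC component GUIDs, but treat this as an illustration rather than the exact code from this series:

# Disable IE Enhanced Security Configuration for administrators and for users
$adminKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A7-37EF-4b3f-8CFC-4F3A74704073}"
$userKey  = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A8-37EF-4b3f-8CFC-4F3A74704073}"
Set-ItemProperty -Path $adminKey -Name "IsInstalled" -Value 0
Set-ItemProperty -Path $userKey -Name "IsInstalled" -Value 0

# Restart Explorer so the change takes effect immediately
Stop-Process -Name explorer -ErrorAction SilentlyContinue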

Next step is preparation: the install files.

Power BI Pro Tip: Confusion about TOPN() versus RANKX()

This post serves as a follow-up on my Power BI Pro Tip about using RANKX to show Top X results. I am writing this because I discovered that there is a lot of confusion about the RANKX() versus the TOPN() function.

Let me try to explain what each function does. The RANKX() function ranks individual data rows according to a certain ranking attribute. Its result is a numerical value associated with each and every row of the data table, as you can see below:


Now, the TOPN() function sounds similar, right? That’s where the confusion comes from. Also, the call to the function looks really similar:

RANKX(<table>; <expression>)

TOPN(<n>; <table>; <order-by expression>)

However, TOPN does not return a value for each row in the data table. It returns a table that contains the top N items (N is the number you specified in the first argument) from the original data table according to the ranking attribute you specified.

In itself this is pretty useless, since you cannot display this table in any way. If you could, it would be an alternative to RANKX for getting a top N ranking.

To make TOPN useful you need to wrap it in another function, such as SUMX or AVERAGEX. Let’s see an example:

MyMeasure := SUMX(TOPN(10;Sales;Sales[Sales Amount]);Sales[Sales Amount])


Now, MyMeasure equals the sum of the sales amount for the ten best performing cities. At this point I do not know which cities those were, and maybe that is not even important to you: the total sales amount for the top 10 performers is returned.

This comes in handy when benchmarking an individual or organization against a bigger population. You could do an AVERAGEX of the TOPN result and that would be the average score for the top 10 performers. A dashboard showing how an individual measures up against the top 10 is then quickly created.
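
For example, a sketch of such a benchmark measure, using the same Sales table as above:

Top10Average := AVERAGEX(TOPN(10;Sales;Sales[Sales Amount]);Sales[Sales Amount])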


Power BI pro tip: using Access Online for data entry

With powerful self-service BI tools such as Power BI comes the need for business user data entry: data does not exist in source systems, needs to be enhanced or enriched before going into the report, or the business user just wants to change the way the data is organized. In those cases (which occur more often than not) we need to give the business user an easy way to do data entry while keeping things robust, i.e. not use a tool in which the user could easily make mistakes and hurt the reporting process. You could use Excel, but you would have to secure it so no mistakes can be made. SharePoint lists are also a good option if you have fewer than 5000 data rows (that’s the hard limit in SharePoint Online). If you need to store a lot of data and need a robust solution, Access Services (Access Online) is a great tool for the job, and the best part is that it works perfectly with Power BI.

Perhaps the biggest change in Access 2013 is that it now stores data in SQL Server databases rather than Access files. In this post I will show you how to build a sample application for reporting on KPIs for production plants around the world. The data is entered by the business user using a web form generated by Access, and the dashboard is created using Power BI. So here we go.

The first step is to get the data. For that I created a simple Access 2013 application that I published on my SharePoint Online site. The Access application consists of three dimension tables, KPIs, Periods and Plants, plus of course the actual facts: the KPI Values. On top of this sits a very basic data entry screen that enables the user to enter new actuals and targets for a KPI for a period for a given plant:

I entered some test data and saved the app. Imagine your business user just entering their data in here.

The next step is to get the data out of the SQL database Access Services will store it in and build a report / dashboard on top of it. For this, you will need to go to the Info pane of the File menu in Access. Look for the ‘Manage’ button next to Connections:

If you click it you get a big flyout presenting you with a lot of options. You will need to select the following:

-From My location or From Any location. I chose from Any.

-Enable Read Only connections.

See this screenshot:

Now, click on ‘View Read-Only Connection Information’ and leave it open for now; you will need it later.

Next step is to start Excel, go to Power Query and select From Database → SQL Server (and not Access, since data is stored in SQL Server by default in Access 2013).

Copy and paste the server and database name from the connection information screen in Access and click OK. In the next screen, enter your credentials (again copied from the connection information screen in Access). After a while you can select the tables you are interested in and load the data into PowerPivot. I loaded my Plants, Periods and Values tables (I skipped KPIs since it only contains the KPI label):

Next step is to create relationships between tables in PowerPivot, hide some columns as well as add a KPI definition. I ended up with this model:

Now, with Power View I created the following basic report (I did not give myself time to work on the layout, this is just quick and dirty):


This concludes this Power BI Pro Tip!

Power BI Pro Tip: Name your linked tables

Just a quick Power BI Pro Tip this time: if you use linked tables to add data to your data model in Excel, before you press the ‘Add To Data Model’ button, be sure to go to the table properties in Excel and give your table a better name (better than Table X). This makes figuring out which data you are looking at so much easier. You will thank me later.

Power BI Pro Tip: Show Top x results with RANKX() function

One of the more frequent scenarios is listing the top X results, such as most profitable products, biggest customers, top 10 best selling stores, etc. Also doing a top X selection helps reduce clutter in charts: a lot of data points can work as noise and obscure the data points that really matter and make the biggest impact.

In this post I describe an approach to implementing these scenarios using Power Pivot’s RANKX() function.

Let’s start with a simple dataset consisting of products (P1…P20 in my sample), Cities, Sales Amount and Number of products sold:


After adding this table to the Power Pivot data model, we can use the RANKX() function to get the best selling products / cities etc. I added the following measures to my table:

Sum of Sales Amount:=SUM([Sales Amount])

Sum of Number Sold:=SUM([Number Sold])

Rank of products by sales amount:=RANKX(ALL(Sales[Product]);[Sum of Sales Amount])

Rank of city by number sold:=RANKX(ALL(Sales[City]);[Sum of Number Sold])


These measures allow me to determine the top selling products by sales amount and best cities by number of products sold.

Only thing left to do is to use a Pivot Table / Pivot Graph or Power View / Power Map visualization and display the results.


If you create a new Pivot Table and add the Product column and the ‘Rank of products by sales amount’ measure, you get the following:


So how do we get the top 10 selling products by sales amount in a nice, ordered fashion? Very easy: it is just a matter of the right sorting and filtering. Click the little downwards-pointing triangle button at Row Labels and choose ‘More Sort Options’. There I chose Ascending and then selected the rank measure:


The Pivot Table is now sorted by rank, with the highest ranking product at the top. To filter out only the top ten, press the same button again and choose Value Filters, then Top 10. Here I made the following selections:


This may seem a bit counter-intuitive, but what it does is return the lowest ten ranks (which are 1 to 10, i.e. the highest ranking products). Alternatively, I could have used a Less Than Or Equal To value filter with these settings to produce the same result:

And here it is: a top 10 of products by sales amount.


Of course, you can also use Power View or Power Map to visualize these results. Here is a Power View based on the same information:

The trick here is to create the visualization just as normal (as above). The view above displays the sales amount by product and the number sold by city. The catch is that both the chart and the map have a filter on them that uses the rank measures I created. Here is the filter for the chart: the ‘Rank of products by sales amount’ measure is filtered to show only values less than or equal to 10, i.e. the top 10.

What’s best about this is that it is very easy to change from top 10 to top 15 to top 5 or anything you desire. Also, the Power View is fully interactive. For example, clicking on one of the cities on the right shows which products are sold in that city. Note that it does not show the top 10 products in that city however.

Hope you liked this Power BI Pro Tip!

Power BI Pro Tip: LOOKUPVALUE() function

Power Pivot is a great way to do data modelling and analysis right in Excel. It works great for data that is dimensionally organized (facts and dimensions) as well as other forms of data. It even enables you to define relationships between datasets regardless of source. However, one thing that has been hard is the following: consider the scenario where you have stock values for certain stocks on certain days, like below:


Let’s also assume you have a table that shows market share (or something else) per company on a certain date, like this:

Finally, you have a third table that lists the stock label by company, like so:

Now, assume that you would like to add the stock value of a company on a certain date next to the market share for that company at that date so as to provide more context to a potential relationship between market share and stock value. Maybe a bigger market share has an impact on stock value?

Naturally, what you would do is load these tables into Power Pivot so you get the following:

Now, the next step would be to add relationships between these tables. The relationships should be defined as follows:

That is: Stock Labels and Marketshares are related on the Company column, whereas Stocks and Stock Labels are related on the Stock / Stock Label column.

We can now try to get the stock value for the company on a certain date, but how? Just using RELATED() to get stock values will not work, since following the relationship in that direction yields a table of values rather than a single value. You could use MAX or MIN to then get the maximum or minimum value, but that is not what we are after: we want the stock value for that company on exactly that date. More generally, this problem occurs when a table is related “twice” to another table, such as monthly targets per person versus actuals (the relationship between the actuals and targets tables is double: on month as well as on person).

The solution is using LOOKUPVALUE() and here is how. In the Marketshares table I add a calculated column with the following definition:
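
The formula itself is missing from this version of the post; based on the explanation below, it would look something like the sketch here, where the column names Stocks[Stock], Stocks[Date], Stocks[Value] and Marketshares[Date] are assumptions:

StockValue =
    LOOKUPVALUE(
        Stocks[Value];
        Stocks[Stock]; RELATED('Stock Labels'[Stock Label]);
        Stocks[Date]; Marketshares[Date]
    )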


This might seem complex, so allow me to explain. What this does is the following: look up and return a value from the Value column in the Stocks table (Stocks[Value]), for which:

-The stock label is equal to the stock label on record for the company (RELATED('Stock Labels'[Stock Label])), and

-The date equals the date of the market share information.

The result is:

Pretty nifty, huh? It turns out that to use LOOKUPVALUE() this way, you do not even have to relate the lookup table to the data model at all. In my example the relationship between Stocks and Stock Labels is not even necessary, although I find it good practice to include all relationships just for clarity.
