Warning: preg_replace(): Compilation failed: invalid range in character class at offset 4 in /home/customer/www/dutchdatadude.com/public_html/wp-content/plugins/crayon-syntax-highlighter/crayon_langs.class.php on line 340
2014 - Dutch Data Dude

Yearly Archives

40 Articles

Power BI learning resources – follow up 2

Looking to get up to speed with Power BI in limited time? You might be interested in this. “Power BI fundamentals” is a new course published to Microsoft Virtual Academy:



Power BI connectivity to Salesforce.com

Here is a short video on how to connect to Salesforce.com from Power BI:

You can now easily get data from Salesforce.com into your Excel / Power BI solution. Enjoy!

Power BI connectivity to SAP Business Objects Universe

Here is a short how-to video on how to connect to SAP Business Objects Universe through Power BI:


First Look: Azure DocumentDB

The First Look series focusses on new products, recent announcements, previews or things I have not had the time to provide a first look at and serves as introduction to the subject. First look posts are fairly short and high level.

In this First Look: Azure DocumentDB. DocumentDB is a NoSQL document database service designed for mobile / web applications. Since it is a NoSQL database it is a schema-free system it allows flexible storage and processing of data with scale up and down on demand, while still allowing easy querying using a SQL dialect. As such it operates in the same space as other NoSQL systems such as Cassandra, MongoDB, HBase and CouchDB and provides a feature-rich, cost effective and enterprise ready solution.

Development against DocumentDB can be done using .NET, Node.js, JavaScript and Python. On top of this DocumentDB provides a SQL Query language and it is elastically scalable both on throughput and storage.

More info is available at http://azure.microsoft.com/en-us/documentation/articles/documentdb-introduction/.

More to come!

First Look: Azure Stream Analytics

The First Look series focusses on new products, recent announcements, previews or things I have not had the time to provide a first look at and serves as introduction to the subject. First look posts are fairly short and high level.

Today in First Look: Azure Stream Analytics. This service is currently in preview and provides low latency, high available and scalable complex event processing in the cloud. To do complex event processing SQL Server has StreamInsight; with Azure Stream Analytics we provide a CEP engine in the cloud.

Azure Stream Analytics is built to read data from a source, let’s say a device sensor and collect, process and take action on it in a blink of the eye. A simple example would be to read temperatures from a sensor, aggregate them to calculate an average temperature per 5 seconds and store that data into SQL server. Or a fairly more complex example would be to take output from a video camera that reads license plates, run the license plate through a list of license plates of stolen cars and immediately send a message to a police car nearby.

Because this solution is cloud based it is easy to get started. You can be up and running in literally minutes.

More info is available on http://azure.microsoft.com/en-us/documentation/articles/stream-analytics-introduction/

More to come!

Zet uw data aan het werk

Big Data en Analytics zijn de populaire termen die je overal hoort. Iedereen heeft er de mond van vol. Vaak komen daar nog een of meerdere van de volgende termen bij: Machine Learning, Predictive Analytics, Self-Service BI, Advanced Visualization en Data Exploration.

Het lijkt zo langerzamerhand wel of al die zaken een doel zijn geworden in plaats van een middel. Onlangs zei een klant tegen mij: “wij doen aan Big Data”. Omdat ik wilde weten wat het doel van die effort was, zei ik: “Stoer, maar waarom?” Het antwoord was:
“Hoezo waarom? Nu we aan Big Data doen, worden we succesvol!”

Wij (technologie leveranciers, Big Data enthousiastelingen, adviesbureaus, consultants, noem maar op) vergeten soms dat al die technologie slechts een middel is in de handen van mensen om een doel te bereiken. Het doel is dan vaak het optimaliseren van de bedrijfsvoering of op een andere manier een economisch voordeel te behalen. Om dat doel te bereiken zijn de technologieën een fantastische aanzet, maar niet genoeg. Mensen maken of breken de inzet van de technologie. Data does not lie, interpretation does. Oftewel: de data kun je wel hebben, maar het is de interpretatie die maakt of het een succes wordt. De vraag “wat zegt de data” is al lang niet meer de enige vraag die gesteld moet worden, maar vooral ook “wat kan ik hiermee” en “hoe communiceren we het”.

Als we samen Big Data e.d. een succes willen laten worden dan moeten we investeren in mogelijkheden om het gebruik, de inbedding en het beschikbaar stellen van informatie gemakkelijk te maken en ook leuk. Want data is leuk.

Onlangs kreeg ik om 22:30 een sms van een data analist bij een klant. Ik had hem die dag gesproken en had hem uitgerust met wat ideeën over hoe hij de middelen kon gebruiken en naar zijn hand zetten. De sms luidde: “Ben net ‘wakker’ geworden achter mijn laptop op kantoor. Het is hier helemaal donker en iedereen is al lang naar huis. Normaal zou ik als een van de eerste buiten staan na de werkdag, maar nu heb ik me enorm vermaakt en ben ik de tijd vergeten!”

Ondanks dat doorwerken tot 22:30 niet echt gezond is en ik dat zeker niet aan iedereen wil aanraden, gun ik wel iedereen die met data werkt die magische creatieve kriebel als je iets moois aan het maken bent; ach, en als je daardoor eens de tijd vergeet, daar zou ik mee kunnen leven…

Op maandag 24 en dinsdag 25 november wordt weer het BI & IM Symposium gehouden (http://www.bisymposium.nl); het grootste evenement in zijn soort van de Benelux. Graag zie ik je daar.

First Look: Azure Data Factory

This is the first post of my new first look series. This series focusses on new products, recent announcements, previews or things I have not had the time to provide a first look at and serves as introduction to the subject.
First look posts are fairly short and high level.

Today in first look: Azure Data Factory. This service was only recently announced as is available to all Azure customers in preview mode. To get a hold of it make sure you open the Azure preview portal. In your classic Azure portal click on your email address in the top right and choose ‘Switch to new portal’ or go directly to https://portal.azure.com.

So what is Azure Data Factory? I may be downplaying it a bit, but essentially Data Factory gives you ETL in the cloud. It connects to both on premises as well as cloud data stores and enables you to read data from the stores, do transformations and publish data in stores, while at the same time providing rich analytics on how the data flow is doing. The paradigm here is a factory floor: pieces of data enter the factory floor as raw materials, they undergo some treatment (transformations) and go out the door at the other end of the floor as finished product. The canvas of Data Factory closely resembles this floor and shows an assembly line for data. Here is very simple example, which retrieves data in hourly batches from a table in Blob Storage and stores it in a SQL Azure table:


More info is available on http://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/.

More to come!


Power BI Pro Tip: Confusion about TOPN() versus RANKX()

This post serves as a follow-up on my Power BI Pro Tip about using RANKX to show Top X results. I am writing this because I discovered that there is a lot of confusion about the RANKX() versus the TOPN() function.

Let me try to explain what each function does. The RANKX() function ranks individual data rows according to a certain ranking attribute. It’s result is a numerical value associated with each and every single row of the data table, as you can see below:


Now, the TOPN() function sounds the same, right? That’s were the confusion comes from. Also, the call to the function is really similar:

RANKX( ; )

TOPN( ; ; )


However, TOPN does not return a value for each row in the data table. It returns a table that contains the top N items (N is the number you specified in the first argument) from the original data table according to the ranking attribute you specified.

In itself this is pretty useless, since you cannot display this data in any way. If you could it would be an alternative way to get a top N ranking to RANKX.

To make TOPN useful you need to wrap it in another function, such as SUMX or AVERAGEX. Let’s see an example:

MyMeasure := SUMX(TOPN(10;Sales;Sales[Sales Amount]);Sales[Sales Amount])


Now, MyMeasure equals the sum of sales amount for the best performing cities. At this point I do not know which cities it were, and maybe that is not even important to you. The total sales amount for the top 10 performers is returned.

This comes in handy when benchmarking an individual or organization against a bigger population. You could do an AVERAGEX of the TOPN result and that would be the average score for the top 10 performers. A dashboard showing how an individual measures up against the top 10 is then quickly created.



Very inspiring talk about Big Data

Just wanted to share this with you. Kenneth Cukier talks about big data and its future, risks and issues. I found it very inspiring! Also interesting to see that he reads the same books as I do: we use the same examples 🙂

Power BI Pro Tip: Dealing with errors when reading Excel files

If you use Power Query to read an Excel file that has errors (such as #VALUE, #REF, #N/A, etc) the rows that contain the error will not be loaded into the data model but instead will be flagged as error rows. But what to do if you really want to keep the data rows with the errors with some replacement value for the error instead of skipping the error row altogether?

To demonstrate this I created this simple Excel table that contains only errors (actually all errors that Excel can generate that I am aware of):

How they are generated and what they mean is not the point here, but I am sure most of you have seen some if not all of them before.

On loading this table using Power Query all five rows will be marked as error rows:

As a result, no data is loaded into your data model. This is as expected since Power Query filters out the rows that have an error, which in this case are all of the input rows.

However, in some cases this is not what you want. Suppose this was a 50 column dataset and that in one column sometimes an error occurs. Then do you want to disregard the rows with errors and thus delete all possible valuable info in the rows or do you want to somehow fix the errors and flag them as problematic but still load them? The latter is sometimes the better choice.

This can easily be done by editing the Power Query and using the Replace Errors function (on the Transform tab):

Make sure you have the column with the errors in it selected, enter a valid value for the data type in the column (in my case it was Any since there is really no other data available in the MyCol) and hit OK.

Power Query will replace the error with the value you entered:

Now, on loading the rows with errors will end up in the data model and your data will be flagged with the label you chose.

Hope this helps! Until next time!

%d bloggers like this: