Monthly Archives

5 Articles

Power BI licensing explained: Free vs. Pro – when do you need Pro?

The pricing table on the Power BI website does a good job at explaining when a free account is acceptable and when a pro account is required. However, it does not explain all nor is really clear (in my opinion). So, after some digging I came up with this: an step-wise wizard that helps you determine if you can use a free account for Power BI or if you need pro (below); simply answer a series of Yes/No questions and you will know if you can use free or really need pro. Please note that this is no official communication and by no means I am responsible for any errors. Use this at your own risk.

Enjoy!

Using custom .NET activities in Azure Data Factory

Azure Data Factory provides a great number of data processing activities out of the box (for example running Hive or Pig scripts on Hadoop / HDInsight).

In many case though, you just need to run an activity that you already have built or know how to build in .NET. So, how would you go about that? Would you need to convert all those items to Hive scripts?

Actually, no. Enter Custom .NET activities. Using this you can run a .NET library on Azure Batch or HDInsight (whatever you like) and make it part of your Data Factory pipeline. Regardless of whether you use Batch or HDInsight you can just run your .NET code on it. I prefer using Batch since it provides more auto-scaling options, is cheaper and makes more sense to me in general; I mean, why run .NET code on a HDInsight service that runs Hive and Pig? It feels weird. However, if you already have HDInsight running and prefer to minimize the number of components to manage, choosing HDInsight might make more sense than using Batch.

So, how would you do this? First of all, you would need a custom activity. For this you will need to .NET class library and need to extend IDotNetActivity interface. Please refer to https://azure.microsoft.com/en-us/documentation/articles/data-factory-use-custom-activities/ for details. Trust me, it is not hard; I have done it.

Next, once you have a zip file as indicated on the page above, make sure to upload it to a Azure Blob store you can use later. The pipeline will need to know what assembly to load from where later on.

You will need to create an Azure Batch account and pool if you use that. If you decide to use HDInsight either let ADF spin one up on demand or make sure you have your HDInsight cluster ready.

You will need to create input and output tables in Azure Data Factory, as well as linked services to Storage and Batch or HDInsight. Your pipeline will look a bit like this:

 

Switching from Batch to HDInsight means to changing the LinkedServiceName for the activity to point to your HDInsight or HDInsight on demand cluster.

Tables are passed to the .NET activity using a connection string, so essentially if you have both input and output tables defined as blob storage items, your custom assembly will get a connection string to the blob storage items, read the input files, do its processing and write the output files before passing on the control to ADF.

Using this framework the sky is the limit: anything you can run in .NET can now be part of your ADF processing pipeline…pretty cool!

R package for Azure Machine Learning

A little while ago an R package for AzureML was released, which enables R users to interface with Azure Machine Learning (Azure ML). Specifically, it enables you to easily use one of the coolest features of Azure ML: publishing and consuming algorithms / experiments as web services.

Check it out: https://cran.r-project.org/web/packages/AzureML/vignettes/AzureML.html.

First Look: Cortana Analytics Suite

The First Look series focusses on new products, recent announcements, previews or things I have not had the time to provide a first look at and serves as introduction to the subject. First look posts are fairly short and high level.

Cortana Analytics Suite is Microsoft’s connecting and integrating suite of products for Big Data and Advanced Analytics. It combines a number of technologies Microsoft had before into one suite and adds new, ready to use capabilities for business solutions such as churn analysis.

For more information on Cortana Analytics Suite see http://www.microsoft.com/en-us/server-cloud/cortana-analytics-suite/overview.aspx.

Also, please note that there will be a Cortana Analytics Workshop 10/11 september 2015: https://microsoft.eventcore.com/caw2015 .

First Look: Azure Data Catalog

The First Look series focusses on new products, recent announcements, previews or things I have not had the time to provide a first look at and serves as introduction to the subject. First look posts are fairly short and high level.

Azure Data Catalog is a service that is now in public preview that provides a one-stop access layer to data sources; it abstracts away specifics of accessing data that are dependent on where and how data is stored, such as server names, protocols, ports, etc. It includes easy to use search and publishing tools, so both business and IT can collaborate together on providing a general, easy to use data access layer to all employees.

For more info on Azure Data Catalog see: http://azure.microsoft.com/en-us/services/data-catalog/

%d bloggers like this: