Category Archives

47 Articles

Reporting on your Azure subscription made easy using Power BI

Just released: an Azure Enterprise Subscription content pack for Power BI. Reporting on the usage of your Azure subscription does not get any easier!

Check it out: https://powerbi.microsoft.com/en-us/documentation/powerbi-content-pack-azure-enterprise/

Using custom .NET activities in Azure Data Factory

Azure Data Factory provides a great number of data processing activities out of the box (for example running Hive or Pig scripts on Hadoop / HDInsight).

In many cases, though, you just need to run an activity that you have already built, or know how to build, in .NET. So how would you go about that? Would you need to convert all those items to Hive scripts?

Actually, no. Enter custom .NET activities. Using these you can run a .NET library on Azure Batch or HDInsight (whichever you like) and make it part of your Data Factory pipeline. Regardless of whether you use Batch or HDInsight, you can just run your .NET code on it. I prefer using Batch since it provides more auto-scaling options, is cheaper, and makes more sense to me in general; I mean, why run .NET code on an HDInsight service that runs Hive and Pig? It feels weird. However, if you already have HDInsight running and prefer to minimize the number of components to manage, choosing HDInsight might make more sense than using Batch.

So, how would you do this? First of all, you need a custom activity. For this you will need a .NET class library with a class that implements the IDotNetActivity interface. Please refer to https://azure.microsoft.com/en-us/documentation/articles/data-factory-use-custom-activities/ for details. Trust me, it is not hard; I have done it.
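
To give an idea of the shape of such a class, here is a minimal sketch (the namespace, class name and log messages are made up for illustration; the Execute signature is the one described in the documentation linked above):

using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;

namespace MyAdfActivities // hypothetical namespace for this sketch
{
    public class MyDotNetActivity : IDotNetActivity // hypothetical class name
    {
        // ADF calls Execute and hands it the linked services, the input and
        // output datasets, the activity definition and a logger.
        public IDictionary<string, string> Execute(
            IEnumerable<LinkedService> linkedServices,
            IEnumerable<Dataset> datasets,
            Activity activity,
            IActivityLogger logger)
        {
            logger.Write("Custom activity started.");

            // Your own processing goes here: read the input blobs,
            // transform them and write the results to the output location.

            logger.Write("Custom activity finished.");

            // The returned dictionary can pass properties on to downstream activities.
            return new Dictionary<string, string>();
        }
    }
}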

Next, once you have a zip file as indicated on the page above, make sure to upload it to an Azure Blob Storage account you can use later. The pipeline will need to know which assembly to load and from where.

If you go with Azure Batch, you will need to create a Batch account and pool. If you decide to use HDInsight, either let ADF spin up a cluster on demand or make sure you have your HDInsight cluster ready.

You will need to create input and output tables in Azure Data Factory, as well as linked services to Storage and to Batch or HDInsight; the pipeline then ties these together, with the custom activity in between.

Switching from Batch to HDInsight means changing the LinkedServiceName for the activity to point to your HDInsight or HDInsight on-demand cluster.

Tables are passed to the .NET activity using a connection string. Essentially, if you have both input and output tables defined as blob storage items, your custom assembly gets a connection string to those blob storage items, reads the input files, does its processing and writes the output files before handing control back to ADF.
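
To illustrate how your code gets at that connection string, here is a hedged sketch based on the pattern in the documentation linked above (the helper method name is made up, and exact type and property names may differ between SDK versions):

// Extra using directive needed for this sketch: using System.Linq;

// A helper you could add to the activity class above; it resolves the blob
// storage connection string of the activity's input dataset.
private static string GetInputConnectionString(
    IEnumerable<LinkedService> linkedServices,
    IEnumerable<Dataset> datasets,
    Activity activity)
{
    Dataset inputDataset = datasets.Single(ds => ds.Name == activity.Inputs.Single().Name);
    var storageLinkedService = (AzureStorageLinkedService)linkedServices
        .First(ls => ls.Name == inputDataset.Properties.LinkedServiceName)
        .Properties.TypeProperties;

    // Use this connection string with the Azure Storage SDK to read the input
    // blobs and, via the output dataset, to write the output blobs.
    return storageLinkedService.ConnectionString;
}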

Using this framework the sky is the limit: anything you can run in .NET can now be part of your ADF processing pipeline…pretty cool!

R package for Azure Machine Learning

A little while ago an R package for AzureML was released, which enables R users to interface with Azure Machine Learning (Azure ML). Specifically, it enables you to easily use one of the coolest features of Azure ML: publishing and consuming algorithms / experiments as web services.

Check it out: https://cran.r-project.org/web/packages/AzureML/vignettes/AzureML.html.

First Look: Cortana Analytics Suite

The First Look series focuses on new products, recent announcements, previews, or other things I have not yet had the time to explore in depth, and serves as an introduction to the subject. First Look posts are fairly short and high-level.

Cortana Analytics Suite is Microsoft’s connecting and integrating suite of products for Big Data and Advanced Analytics. It combines a number of existing Microsoft technologies into one suite and adds new, ready-to-use capabilities for business solutions such as churn analysis.

For more information on Cortana Analytics Suite see http://www.microsoft.com/en-us/server-cloud/cortana-analytics-suite/overview.aspx.

Also, please note that there will be a Cortana Analytics Workshop on September 10–11, 2015: https://microsoft.eventcore.com/caw2015.

First Look: Azure Data Catalog

The First Look series focuses on new products, recent announcements, previews, or other things I have not yet had the time to explore in depth, and serves as an introduction to the subject. First Look posts are fairly short and high-level.

Azure Data Catalog is a service, now in public preview, that provides a one-stop access layer to data sources; it abstracts away the specifics of accessing data that depend on where and how the data is stored, such as server names, protocols, ports, etc. It includes easy-to-use search and publishing tools, so business and IT can collaborate on providing a general, easy-to-use data access layer to all employees.

For more info on Azure Data Catalog see: http://azure.microsoft.com/en-us/services/data-catalog/

SQL Server 2005 reaches end of life in April next year

For every DBA and database manager it is important to start preparing now for the fact that Microsoft support for SQL Server 2005 ends in April next year.

That may seem far away, but because as an administrator you deal with an application owner for each application, and therefore for each database, a lot of communication is still needed before the actual migration can take place.

The link below points to an article I wrote about this on Webwereld:

SQL Server 2005 EOL article on Webwereld.nl


Comparing Datazen, SSRS and Power View

It is a difficult task, but it can be done: comparing Datazen, SSRS and Power View. See http://www.sqlchick.com/entries/2015/6/20/comparison-of-datazen-vs-ssrs-reporting-services-vs-power-view for an in-depth comparison!

Webinar on Azure Machine Learning

Last week I did a live webinar on Azure Machine Learning. The webinar can serve as an introduction to the subjects of machine learning, data mining and predictive analytics, as well as Microsoft’s solution for them: Azure Machine Learning.

Watch the recording here (in Dutch, sorry).

Loading multiple JSON files using Power Query

I had to figure out recently how to load multiple JSON files using Power Query. It turned out to be less easy than expected, so I figured it is worth blogging about…

The scenario: I have multiple JSON files sitting in a container in Azure Blob Storage; I would like to load them all into a data model for use in Power BI. I am assuming all the files you want to load are in one container. My solution will not work for multiple containers.

I will be using Power Query for this, from the Power BI Designer. You could do the same using Power Query in Excel.

First, let’s connect to the blob storage. This part is easy. Just click Get Data → More in the Power BI Designer, then select ‘Azure’, choose Microsoft Azure Blob Storage and click Connect:


In Excel, navigate to the Power Query tab and select From Azure → From Microsoft Azure Blob Storage:


You will need to enter your Azure Storage account name and key. Next, you will see a list of containers in the blob storage. Select the container the data is in and choose Edit:


What we will need to do is create a function that loads the JSON files. To do this we use an approach similar to loading multiple Excel or CSV files (see here and here respectively): first we just load one file and then we convert it into a function which we will call for all files we want to load.

So first, click on ‘Binary’ in the first column for one of the rows representing a JSON file. You will see a one-column table listing all the records in the JSON file (the exact number of rows varies with the length of the JSON file):

What you want to do is convert the records into a table by clicking the To Table button:

You will probably see a ‘to Table’ dialogue, allowing customization of the conversion; for JSON you normally should not have to change the defaults, so click OK.

The next step is to expand the resulting Column1 to see some actual data. To do this, click the expand button to the right of the column header and click OK (I deselected the ‘use original column name as prefix’ option):

And voilà: a nice-looking table of the records in this JSON file.

We are not done however; this was the easy part. Remember we need to create a function that will enable us to iterate over multiple files.

To start editing the code, hop over to the Advanced Editor (View → Advanced Editor). Your code should look something like this (your last line will be different from mine, since it depends on the contents of the JSON):
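
As a rough sketch with made-up names (storage account 'mystorageaccount', container 'mycontainer', file 'file1.json', and placeholder field names 'id' and 'name' in the expand step), the generated query is along these lines:

let
    Source = AzureStorage.Blobs("mystorageaccount"),
    Container = Source{[Name = "mycontainer"]}[Data],
    Contents = Container{[#"Folder Path" = "https://mystorageaccount.blob.core.windows.net/mycontainer/", Name = "file1.json"]}[Content],
    ImportedJSON = Json.Document(Contents),
    ToTable = Table.FromList(ImportedJSON, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expanded = Table.ExpandRecordColumn(ToTable, "Column1", {"id", "name"})
in
    Expanded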

First, we will need to wrap this in a function: add a line at the top that declares the function and its two parameters (the folder path and the file name of the blob to load), and add a matching line at the bottom that returns the function. Then edit the line that defines ‘Contents’ so that it uses those two parameters instead of the hard-coded folder path and file name.
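
Under the same made-up account and container names as before (and again with placeholder field names in the expand step), the wrapped query could look something like this:

let LoadJSON = (path as text, name as text) =>
let
    Source = AzureStorage.Blobs("mystorageaccount"),
    Container = Source{[Name = "mycontainer"]}[Data],
    // the 'Contents' step now uses the path and name parameters instead of a hard-coded file
    Contents = Container{[#"Folder Path" = path, Name = name]}[Content],
    ImportedJSON = Json.Document(Contents),
    ToTable = Table.FromList(ImportedJSON, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expanded = Table.ExpandRecordColumn(ToTable, "Column1", {"id", "name"})
in
    Expanded
in LoadJSON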

Click ‘Done’ and give the query a descriptive name (I suggest naming it the same as the function: LoadJSON)… phew, that was not too bad, right? Now let’s use this function on all our JSON files. Do the same as we did at the start: connect to the blob storage and stop at the screen where you have the list of files in the container.

Since we can only apply the function to JSON files, my first step is to filter on the Extension column being exactly ‘.json’.

Then, we need to get rid of all the columns except Name and Folder Path. To do this, select the columns to keep and choose Remove Other Columns.

Now, let’s call the function and pass in the path and name parameters. Insert a custom column (Add Column → Add Custom Column) with the following setting:
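
The custom column formula is just a call to the function, something like this (assuming you named the function LoadJSON and call the new column Data):

= LoadJSON([Folder Path], [Name])

Behind the scenes, the steps for this part of the query (the extension filter, the column removal and the custom column) come down to a fragment along these lines, where 'Blobs' stands for whatever your step that lists the files in the container is called:

    FilteredJson = Table.SelectRows(Blobs, each [Extension] = ".json"),
    KeptColumns = Table.SelectColumns(FilteredJson, {"Name", "Folder Path"}),
    AddedData = Table.AddColumn(KeptColumns, "Data", each LoadJSON([Folder Path], [Name]))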

Then, we need to expand the resulting custom column by clicking on the little expand button again:

Click ‘OK’. Now you have all the contents visible. To clean up, let’s delete the Name and Folder Path columns since we do not need them anymore. Since this is JSON, you will probably want to fix the data types before reporting on this.

And… you’re done. How cool is this?

CTP2 of the new SQL Server 2016 has been released to the public

With this new beta release of SQL Server we get better insight into what we can expect in the full product.
Some of the top capabilities in SQL Server 2016 CTP2 are:

  • Real-time Operational Analytics & In-Memory OLTP: In-Memory OLTP, which delivers up to 30x faster transactions, has been enhanced to support a greater number of applications, and customers can configure the in-memory columnstore to work on top of a transactional database to achieve real-time operational analytics with breakthrough OLTP performance.
  • Always Encrypted helps protect data at rest, in motion and while in use, on-premises and in the cloud. With Always Encrypted, SQL Server can perform operations on encrypted data and, best of all, the encryption key resides with the application in the customer’s trusted environment.
  • Stretch Database technology keeps historical data at users’ fingertips by transparently stretching warm and cold data in a more secure manner to Microsoft Azure on demand without application changes.

I advise you to take a special look at Stretch Database.

It builds on techniques originating in Azure SQL Database, where it is already very easy to scale a running database up or down.
And not only a single database!
A set of databases (called an elastic pool) can also be tailored to meet the demand for power of each individual database, which varies over time.
It is a brand new concept that fits use cases like having a database per customer or per project, where the scale needs differ per database and over time.

In my next blog post I will dig deeper into the new concept of elastic databases.

Harry
