Using custom .NET activities in Azure Data Factory
Azure Data Factory provides a great number of data processing activities out of the box (for example running Hive or Pig scripts on Hadoop / HDInsight).
In many cases, though, you just need to run an activity that you have already built, or know how to build, in .NET. So, how would you go about that? Would you need to convert all that logic to Hive scripts?
Actually, no. Enter Custom .NET activities. Using these you can run a .NET library on Azure Batch or HDInsight (whichever you like) and make it part of your Data Factory pipeline. I prefer using Batch since it provides more auto-scaling options, is cheaper and makes more sense to me in general; I mean, why run .NET code on an HDInsight service that is there to run Hive and Pig? It feels weird. However, if you already have HDInsight running and prefer to minimize the number of components to manage, choosing HDInsight might make more sense than using Batch.
So, how would you do this? First of all, you need a custom activity. For this you will need a .NET class library containing a class that implements the IDotNetActivity interface. Please refer to https://azure.microsoft.com/en-us/documentation/articles/data-factory-use-custom-activities/ for details. Trust me, it is not hard; I have done it.
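To give you an idea of what that looks like, here is a minimal skeleton. The namespace and class names match the pipeline definition further down; the exact Execute signature depends on the version of the Data Factory SDK you reference, so treat this as a sketch and check the article above for the current interface:

    using System.Collections.Generic;
    using Microsoft.Azure.Management.DataFactories.Models;
    using Microsoft.Azure.Management.DataFactories.Runtime;

    namespace AzureDataFactoryCustomActivityNS
    {
        // Referenced from the pipeline through AssemblyName and EntryPoint.
        public class AzureDataFactoryCustomActivity : IDotNetActivity
        {
            public IDictionary<string, string> Execute(
                IEnumerable<LinkedService> linkedServices,
                IEnumerable<Dataset> datasets,
                Activity activity,
                IActivityLogger logger)
            {
                logger.Write("Custom activity started.");

                // Your processing logic goes here: read the input blobs,
                // transform the data, write the output blobs.

                // Anything returned here shows up in ADF monitoring.
                return new Dictionary<string, string>();
            }
        }
    }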
Next, once you have a zip file as indicated on the page above, make sure to upload it to an Azure Blob storage account you can use later. The pipeline will later need to know which assembly to load and where to load it from.
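If you want to script that upload rather than click through the portal, a few lines against the Azure Storage .NET SDK (Microsoft.WindowsAzure.Storage) do the trick. The connection string and local path below are placeholders; the container and blob names match the PackageFile value used in the pipeline further down:

    using System.IO;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    // Placeholder connection string; use your own storage account.
    CloudStorageAccount account = CloudStorageAccount.Parse(
        "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>");
    CloudBlobClient client = account.CreateCloudBlobClient();

    // Container and blob path as referenced by PackageFile in the pipeline.
    CloudBlobContainer container = client.GetContainerReference("adfcustomactivity");
    container.CreateIfNotExists();
    CloudBlockBlob blob = container.GetBlockBlobReference(
        "customactivitycontainer/AzureDataFactoryCustomActivity.zip");

    // Upload the zipped class library so ADF can fetch it at run time.
    using (FileStream stream = File.OpenRead(@"C:\temp\AzureDataFactoryCustomActivity.zip"))
    {
        blob.UploadFromStream(stream);
    }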
If you go the Batch route, you will need to create an Azure Batch account and pool. If you decide to use HDInsight, either let ADF spin up a cluster on demand or make sure your HDInsight cluster is ready.
You will need to create input and output tables in Azure Data Factory, as well as linked services for your storage account and for Batch or HDInsight. Your pipeline will then look a bit like this:
{ "name": "ADFTutorialPipelineCustom", "properties": { "description": "Use custom activity", "activities": [ { "Name": "MyDotNetActivity", "Type": "DotNetActivity", "Inputs": [ { "Name": "EmpTableFromBlob" } ], "Outputs": [ { "Name": "OutputTableForCustom" } ], "LinkedServiceName": "AzureBatchLinkedService1", "typeProperties": { "AssemblyName": "AzureDataFactoryCustomActivity.dll", "EntryPoint": "AzureDataFactoryCustomActivityNS.AzureDataFactoryCustomActivity", "PackageLinkedService": "AzureStorageLinkedService1", "PackageFile": "adfcustomactivity/customactivitycontainer/AzureDataFactoryCustomActivity.zip", "extendedProperties": { "SliceStart": "$$Text.Format('{0:yyyyMMddHH-mm}', Time.AddMinutes(SliceStart, 0))" } }, "Policy": { "Concurrency": 1, "ExecutionPriorityOrder": "OldestFirst", "Retry": 3, "Timeout": "00:30:00", "Delay": "00:00:00" } } ], "start": "2015-09-07T14:00:00Z", "end": "2015-09-07T18:00:00Z", "isPaused": false } }
Switching from Batch to HDInsight means changing the LinkedServiceName for the activity so it points to your HDInsight or on-demand HDInsight cluster.
Tables are passed to the .NET activity by way of a connection string: if both your input and output tables are defined as blob storage items, your custom assembly receives a connection string to that blob storage, reads the input files, does its processing and writes the output files before handing control back to ADF.
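Inside the Execute method that looks roughly like this (modelled on the Microsoft article linked earlier; it assumes the input table sits on an Azure Storage linked service and that a using System.Linq directive is in place):

    // Inside IDotNetActivity.Execute: find the input dataset and the
    // linked service it points to, then pull out the connection string.
    Dataset inputDataset = datasets.Single(d => d.Name == activity.Inputs.Single().Name);
    AzureStorageLinkedService inputLinkedService = (AzureStorageLinkedService)linkedServices
        .First(ls => ls.Name == inputDataset.Properties.LinkedServiceName)
        .Properties.TypeProperties;

    // With this connection string the custom code can open the blob
    // container behind the input table; the output side works the same way.
    string connectionString = inputLinkedService.ConnectionString;

From there it is ordinary .NET storage code: parse the connection string, open the container and read or write blobs as you see fit.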
Using this framework the sky is the limit: anything you can run in .NET can now be part of your ADF processing pipeline…pretty cool!