Working with Azure and HDInsight from SSIS

A while ago a whitepaper was published on how to work with Azure and HDInsight (Hadoop on Azure) from SSIS. The code samples from that whitepaper, along with some components, are also available here: http://code.msdn.microsoft.com/SSIS-Packages-Sample-for-2ffd9c32 . After downloading the zip, make sure you have Visual Studio installed, start the Developer Command Prompt for Visual Studio as Administrator, and run the ‘deploy_SSIS_packages_and_components.bat’ file included in the zip. This installs some of the included DLLs into the GAC. You can then open the solution in Visual Studio.
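If you would rather script the deployment step than type it by hand, the following is a minimal sketch, not part of the sample itself. It assumes Python is installed, that you launch it from the elevated Developer Command Prompt, and that the zip has already been extracted. The helper names find_deploy_script and run_deploy_script are made up for illustration; only the batch file name comes from the sample.

```python
import subprocess
from pathlib import Path

# File name as given in the sample zip.
SCRIPT_NAME = "deploy_SSIS_packages_and_components.bat"


def find_deploy_script(extracted_dir):
    """Return the path to the deployment batch file under extracted_dir, or None."""
    matches = sorted(Path(extracted_dir).rglob(SCRIPT_NAME))
    return matches[0] if matches else None


def run_deploy_script(extracted_dir):
    """Run the batch file from its own folder (Windows only, elevated prompt assumed)."""
    script = find_deploy_script(extracted_dir)
    if script is None:
        raise FileNotFoundError(f"{SCRIPT_NAME} not found under {extracted_dir}")
    # 'cmd /c' runs the batch file and then exits. The working directory is set to
    # the script's folder so relative paths inside the script resolve from there.
    return subprocess.run(["cmd", "/c", script.name], cwd=script.parent, check=True)
```

Calling run_deploy_script with the folder you extracted the zip into (the location is up to you) runs the batch file and raises an error if the script is missing or exits with a non-zero code.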

The solution includes the following sample SSIS packages:

  • PigSqoopPackage: shows how to work with Pig and SQOOP tasks
  • HadoopJobAutomation: shows how to start jobs on Hadoop and how to consume results
  • ComplexSourceDestination: shows how to get data from Azure Blob Storage and save the results into various targets, including Azure Blob Storage.
  • BlobSourceTestPackage: sample package showing how to read data from Azure Blob Storage.
  • BlobDestinationTestPackage: sample package showing how to write data to Azure Blob Storage.

The first two packages essentially contain script tasks with complete samples on how to work with Pig and SQOOP, and with Hadoop jobs, respectively. The other packages use the provided components and offer a quick start on reading data from Azure Blob Storage and writing data to Azure Blob Storage using SSIS.