Azure Data Factory and Data Pipeline - Copy data from Azure Blob to Azure SQL



Azure Data Factory and Data Pipeline

Azure Data Factory and Data Pipeline – Hands on activity


In reference to Azure Data Factory hands on activities, we already walked through in one of the previous post, provisioning an Azure Data Factory and copy data from a file in Azure Blob Storage to a table in an Azure SQL Database using Copy Wizard. Along with it, we covered some other activities simultaneously like creating a Storage Account and a Blob Container as well set up an Azure SQL Database in context of Copy Wizard task.

Next, in this series here we will extend some more hands on activities on top of Azure Data Factory, like – 
  • Implement a data Pipeline.
  • Copy data from a file in Azure Blob Storage to a table in an Azure SQL Database using Pipeline.


Pre-requisites


Before moving ahead, we need some pre-requisites to accomplish this Azure Data Factory hands on activities on top of Azure Cloud.

Meanwhile, in previous article already we covered provisioning few resources, from now assuming you completed those. In case not covered, then please move to that article and do all preliminary hands on activities.
  1. Azure subscription, if you don't have an account then sign up for a free Azure account - https://azure.microsoft.com/en-gb/free/
  2. Download the required lab files, essential to this task from the GitHub repository - https://github.com/rajendraazure/azure-data-factory-copy-wizard
  3. Previously provisioned Storage Account and a Blob Container
  4. Previously provisioned Azure SQL Database.
  5. Previously provisioned Azure Data Factory.


Validate the SQL Database and prepare the Table 


In this walk-through, you will consume all previously provisioned resources though better to validate before proceeding ahead. To avoid mix-up, you will have to prepare the SQL table again by deleting the database records, which were inserted by the Azure Data Factory Copy Wizard in the previous post.

Login to the Azure portal https://portal.azure.com/

STEP – 1: Launch SQL Database 


In the Microsoft Azure portal, click the All resources from the Hub and click your Azure SQL Database.

All resources


It will load the SQL database blade for the selected database; you either can connect this database remotely or can go ahead in the Azure web-based query interface.

SQL DB overview


STEP – 2: Launch Query editor (preview) 


Next, click the Query editor (preview) from the database blade. Subsequently, it will launch login blade and submit the same administrator credentials, which you had provided during the provision of SQL database.

Query editor connect


If you provide the correct credentials, then it will log on and load the query blade promptly where you can see all DB objects in left side as well a new query window in the right side.

New Query editor


STEP – 3: Execute and prepare the SQL Table


In the query editor, write the following T-SQL query to truncate and prepare the dbo.transactions table in your SQL database – 

TRUNCATE TABLE dbo.transactions; 

Next, click Run to execute the T-SQL statement.

Run Truncate statement


Run the following command to verify that the table is empty again - 

SELECT * FROM dbo.transactions;

Run Select statement

Validate the existing Linked Services 


An Azure Data Factory uses linked services to access, transform, and store data. Even in the previous article, you applied the same using Copy Wizard

Here in the case of the pipeline approach, you require linked services for the blob store account where the source data is stored as well the Azure SQL Database containing the table you want to load.

Move to the Microsoft Azure Portal.

STEP – 1: Launch Azure Data factory 


In the Microsoft Azure portal, click the All resources from the Hub and click your Azure Data Factory.

All resources - data factory


It will load the Data factory blade for the selected Azure Data Factory; and required to click the Author and deploy tile.

Data Factory - Author and deploy


STEP – 2: Validate existing Linked services 


By clicking the Author and deploy tile, the next blade will be available along with all required configurable settings like linked services, datasets and pipelines, etc.

New data store


In the pane on the left, expand the first option Linked Services and you can notice two linked services named destination-sqldatabase-linked-service and source-blobstorage-linked-service are already defined for the blob store and SQL database. 

In fact, the Copy Data wizard in the previous hands on activities created these linked services.

Linked services


Next, click the blob linked service source-blobstorage-linked-service, the JSON definition should like something this – 

{
    "name": "source-blobstorage-linked-service",
    "properties": {
        "hubName": "democopydatafactory_hub",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=dfactorystorageaccount;AccountKey=**********"
        }
    }
}

Source linked service


Similarly, click the SQL linked service destination-sqldatabase-linked-service, the JSON definition should like something this – 

{
    "name": "destination-sqldatabase-linked-service",
    "properties": {
        "hubName": "democopydatafactory_hub",
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Data Source=datafactorydbserver.database.windows.net;Initial Catalog=datafactorydb;Integrated Security=False;User ID=dbadmin;Password=**********;Connect Timeout=30;Encrypt=True"
        }
    }
}

Destination linked service


Create and configure Datasets


You see that datasets define schema for the data flows, the data, which required extracting from a source, and the data to load into a destination or sink. In other words, a dataset is a named view or a data structure of data that refer the precise data you want to use with the associated activities as input as well as output. 

Visit the precise post to know about all other relevant bits & pieces like pipelines, activities, datasets and linked services, etc. 

Henceforth must create datasets to define the text file data source and the database table, which will be used in your pipeline.

STEP – 1: Create a source Dataset (Azure Blob)


Click the context menu More which will display a couple of options, click the New dataset, which further will display different options.

New dataset


By clicking the New dataset a further context menu will be appeared with multiple options, then click the Azure Blob storage to create a new document for an Azure Blob store dataset.

Azure Blob storage


STEP – 2: Configure source Dataset JSON


By clicking the Azure Blob storage, a new JSON document would be launched. In the new JSON document, replace the default code with the following code, which you can copy and paste from the transactions-txt.json file available in the GitHub repository

{
"name": "transactions-txt",
"properties": {
"structure": [
{
"name": "tdate",
"type": "Datetime"
},
{
"name": "amount",
"type": "Decimal"
}
],
"type": "AzureBlob",
"linkedServiceName": "source-blobstorage-linked-service",
"typeProperties": {
"folderPath": "dfactorycontainer/data/",
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"firstRowAsHeader": true
}
},
"external": true,
"availability": {
"frequency": "Minute",
"interval": 15
}
}
}

The custom JSON defines a schema for comma-delimited text containing a tdate column of DateTime values, and an amount column of Decimal values. Along with it, the data is available in the dfactorycontainer/data folder of the blob storage account defined by the source-blobstorage-linked-service linked service, and new data will be available every fifteen minutes.

Once you validate the JSON parameters and values, deploy the new dataset definition to your Azure Data Factory by clicking the Deploy button that appears in the top menu.

New data store - draft 1

Sooner you will be notified once the new dataset is created successfully. Even you can verify the same thing, by selecting a new dataset named transactions-txt that exist under the Datasets option from the left pane.

Datasets - transaction


STEP – 3 : Create and configure destination Dataset (Azure SQL)


Similar to source dataset, required to create and configure the destination dataset for the Azure SQL Database. Once again, click the context menu More and subsequently click the New dataset from the displayed option. Next then click Azure SQL to create a new JSON document for an Azure SQL Database dataset.

Azure SQL


In the new JSON document, replace the default code with the following code, which you can copy and paste from the dbo-transactions.json file available in the GitHub repository

{
"name": "dbo-transactions",
"properties": {
"type": "AzureSqlTable",
"linkedServiceName": "destination-sqldatabase-linked-service",
"structure": [
{
"name": "tdate",
"type": "Datetime"
},
{
"name": "amount",
"type": "Decimal"
}
],
"typeProperties": {
"tableName": "dbo.transactions"
},
"availability": {
"frequency": "Minute",
"interval": 15
}
}
}

The custom JSON defines a schema for a table named dbo.transactions containing a tdate column of DateTime values, and an amount column of Decimal values in the database defined by the destination-sqldatabase-linked-service linked service. The dataset can be updated every fifteen minutes.

Once you validate the JSON parameters and values, deploy the new dataset definition to your Azure Data Factory by clicking the Deploy button that appears in the top menu.

New data store Azure SQL - draft


Sooner you will be notified once the new dataset is created successfully. Even you can verify the same thing, by selecting a new dataset named dbo.transactions that exist under the Datasets option from the left pane.

Datasets - dbo-transactions


Create and configure Pipeline


You know that a pipeline encapsulates a data flow that includes copying the data and transforming the data as we move it from one place to another. Since now that you have defined the linked services and datasets for your data flow, you can create a pipeline to encapsulate it. 

In other words, a pipeline defines a series of actions that use linked services to operate on datasets. In this case, your pipeline will consist of a single action to copy data from the source-blobstorage-linked-service  linked service to the destination-sqldatabase-linked-service  linked service.

You can visit the precise post to know about all other relevant bits & pieces like pipelines, activities, datasets and linked services, etc. 

STEP – 1: Create a Pipeline


Click the context menu More and subsequently click the New pipeline from the all available options.

New pipeline

STEP – 2: Configured Pipeline JSON


In the new JSON document, replace the default code with the following code, which you can copy and paste from the copytrans.json file available in the GitHub repository

{
"name": "copytrans",
"properties": {
"activities": [{
"type": "Copy",
"typeProperties": {
"source": {
"type": "BlobSource",
"recursive": false
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "tdate:tdate,amount:amount"
}
},
"inputs": [{
"name": "transactions-txt"
}],
"outputs": [{
"name": "dbo-transactions"
}],
"name": "CopyTransData"
}],
"pipelineMode": "OneTime"
}
}

The custom JSON defines a pipeline containing a Copy action to copy the transaction-txt dataset in the blob store to the dbo.transactions table in the SQL database.

Once you validate the JSON parameters and values, deploy the new pipeline definition to your Azure Data Factory by clicking the Deploy button that appears in the top menu.

New pipeline - draft 1

Sooner you will be notified once the new pipeline is created successfully. Even you can verify the same thing, by selecting new pipeline named copytrans that exist under the Pipelines option from the left pane.

Pipelines - copytrans

Validate the Pipeline Status


Now all datasets and pipeline have deployed successfully, time to validate and monitor the pipeline status using the Azure portal.

Move to your Azure Data Factory blade, and click the Pipelines tile.

All resources - Pipelines


By clicking the Pipelines tile, promptly the pipelines blade will be launched. Here you can notice two pipelines exist, one was created using the Copy Data Wizard previously and along with it you can see the newly created copytrans pipeline with a Pipeline State of Running.

Pipelines - overview


Next, click the copytrans pipeline, and observe the state of the activities it contains. Might be initially it contain no activities, as the datasets are validated and the pipeline is prepared.

Pipelines monitoring


For the moment close the Pipelines blade, and wait for a few minutes, approx. 15 minutes. Later on, reopen the Pipelines blade and the copytrans blade to view its state. When the pipeline has been run successfully, its State in the copytrans blade will be indicated as Ready.

Along with it, under the Pipelines blade its Pipeline State will be set to Completed simultaneously.

Pipelines Ready state

Verify the Data has been copied at Azure SQL end


Now onward your pipeline has run without any exception, time to verify that the data has been copied to a SQL database. Return to the Query editor (preview) from the Azure SQL database blade and submit the administrator credentials. 

Post login, run the following command to verify that data has been inserted into the table - 

SELECT * FROM dbo.transactions;

Verify the table


If you verify the table, you can see it contains the same transaction data, which has been copied from the text file, exist in the Azure Blob store.

Congratulation, Data Pipeline exercise completed!! 😊

As a result, we covered one more hand on activity in the context of Azure Data Factory using Data Pipeline to copy data from an Azure Blob Storage to an Azure SQL Database. Here we brushed up some previous accomplishments like pipeline, linked service and datasets, etc. as well did some JSON codes.

Keep visiting for further posts.




Clone an Azure Virtual Machine using Snapshot option



Cloned VM using Snapshot





Clone an Azure VM


We already talked about an Azure Virtual Machine as well how to clone or create a managed image of an Azure Virtual Machine. Undoubtedly, in some precise scenario, it is essential to keep the exact configuration and details as exist in the original VM, here cloning fulfil the desired need.

We discussed that, the introduction of managed disks, makes quite feasible to simplify working with Azure Virtual Machines (VM) and Virtual Machine Scale Sets (VMSS). Now cloning or creating a managed image of an Azure VM is realistic and can be achieved by different ways, using – 
  • Generalized
  • Snapshot
  • Site Recovery


In one of the previous article, we already covered how to clone an Azure Virtual Machine using Generalized option. Generalization has been a preparation process of a virtual machine creating an image, through a tool called as Sysprep that located under system32/sysprep.

However, we have noticed that post execution of Sysprep utility over an Azure VM, that VM would be considered as generalized for capturing the image, the VM gets deallocated and you cannot login to it again. In other words, the process of generalizing a Virtual Machine is not reversible and here overcome the situation Snapshot appears.

In this post, we will talk and walk through how to clone an existing Azure VM and keep old VM in working condition, i.e. both VM would be in running mode.

Snapshot an Azure Virtual Machine


In fact, a snapshot is a full, read-only copy of a Virtual Hard Drive (VHD) belongs to an Azure Virtual machine. Purpose behind taking snapshot of an OS or a data disk VHD to use as a backup, or to troubleshoot Azure Virtual Machine (VM) issues.

Using snapshot, you can create an Azure Virtual Machine by creating a managed disk from the snapshot and then attaching the new managed disk as the OS disk. You can either use the Azure Portal or instead, achieve the same through PowerShell.

In this hand on an activity, you will create a cloned VM using following steps as – 
  • Validate existing VM
  • Create Snapshot from the VM
  • Create Managed Disk from the Snapshot
  • Create/Restore VM from the Managed Disk

Clone VM using Snapshot steps


Note: If you are moving ahead to use the snapshot to create a new VM, it is preferred that you quickly shut down the VM before taking a snapshot, to clear out any processes that are in progress.

Pre-requisites


Before proceeding, we need some pre-requisites to accomplish this Cloning VM task using Azure Portal.
  1. Azure subscription, if you don't have an account then sign up for a free Azure account - https://azure.microsoft.com/en-gb/free/
  2. An existing Windows Azure virtual machine (VM). 


STEP – 1: Validate an Azure Virtual Machine (VM)


It is essential to exist a Windows based Azure Virtual Machine (VM) to accomplish this demo task, login to Azure portal https://portal.azure.com/.

On the left Hub menu, click All resources and select the existing virtual machine, verify the VM is either running or stopped. 

Hub menu


STEP – 2: Move to attached Disk


Next, required to select the Disks option under the Settings section that will load the existing virtual machine disk blade. Here you can see the OS disk with brief details like size, storage type etc.

Disks blade


Now launch the OS disk blade by clicking the disk link, which appears in the OS disk list. Here you can see the brief details under the Overview section.

Disk overview

STEP – 3: Create a Snapshot


Post launches the OS disk blade, you can notice a + Create snapshot option exist on the menu tab, that is being used to create a snapshot. 

Create Snapshot


By clicking the + Create snapshot option, the Create snapshot blade will be loaded and essential to fill the following parameters to proceed further –
  • Name – The name of snapshot.
  • Subscription – The name of your Azure subscription, it would be either free, paid or specific subscription etc. 
  • Resource group – The name of the resource group used to organize related resources, you can either create a new resource or choose an existing one.
  • Location – The region where to store your snapshot information, it would be pre-selected by default.
  • Account Type – Select the Account type to use to store the snapshot. Go ahead with by default option as Standard HDD, unless you need the snapshot to be stored on a high-performing disk.


Create Snapshot blade

Once you are done with all required properties, proceed by clicking the Create button to create the snapshot. Promptly the process will be started and complete the creation of the snapshot, the same you can validate in the Notifications section.

STEP – 4: Validate the newly created Snapshot


In previous steps we have created the snapshot of the Azure Virtual Machine’s OS disk, now validate the snapshot. Go to Resource groups and here you can see a new entry of the newly created snapshot of the disk.

Along with this, you can notice the existing OS disk is also exist, that means we can still log on the old VM, will validate later.

Resource Group


By clicking the newly created snapshot, you can launch the Snapshot blade and verify the details like source disk, size, account type etc.

Snapshot Overview



Congratulations, snapshot has been created!! 😊

STEP – 5: Create a Managed Disk


Post creation of a snapshot, essential to create a Managed disk so that further you can create a virtual machine (VM) from the disk accordingly.

Move to Hub menu and click the All services, which will launch the All service blade. Here in the All services search box, enter disk that will display the list of available disks.

Disk Search


Select the Disks that you would like to use; most probably, the first option is preferable, which will load the Disks blade.

Disks blade list


Here you can see there is only one disk in the list, which is in fact the disk belongs to the old existing virtual machine. Now similar to this disk need to create a new managed disk, click the + Add button from the top menu that will load the Create managed disk where essential to fill the following parameters to proceed further – 
  • Name – The name of your new managed disk.
  • Subscription – The name of your Azure subscription, it would be either free, paid or specific subscription etc. 
  • Resource group – The name of the resource group used to organize related resources, you can either create a new resource or choose an existing one.
  • Location – The region where to store your storage account information, prefer to go with default-selected location.
  • Availability zone – Since this is demo task and it is not required so go with default-selected option as None.
  • Account Type – Select the Account type to use to store the snapshot. Go ahead with by default option as Standard HDD, unless you need the snapshot to be stored on a high-performing disk.
  • Source type – Here make sure the Snapshot should be selected.
  • Source snapshot – Select the created snapshot from the Source snapshot drop-down list.
  • Size – Size of disk as per need, by default it would be pre-populated based on a snapshot.

Create Managed disk

If required, make any other adjustments as needed and then proceed by clicking the Create button to create the managed disk. Quickly the process will be started and complete the creation of managed disk, the same you can validate in the Notifications section.

Along with it, if you reload the Disks blade, you can see this new entry in the disk list.

Disk blade lists

STEP – 6: Restore (Create) Virtual Machine from the Managed Disk


Now you have the managed disk VHD, next you can restore or better to say can create an Azure Virtual Machine (VM) using the same managed disk.

Time to load the Disks blade again, which list all available managed disks.

Managed disk list


You can notice the newly created managed disk, proceed ahead by clicking the same and load the Disk blade.

Managed Disk overview


In the Overview page, ensure that Disk state appears as Unattached. If it is not, you might need to either detach the disk from the source. Here from the top menu, click the + Create VM to proceed creating a new VM.

It will load the Create a virtual machine blade the time you click the + Create VM button inside the disk blade in the previous step.

Create Virtual Machine Basics


On the Basics tab, under Project Details, make sure the following parameters need to be filled up correctly as – 
  • Subscription – The name of your Azure subscription, it would be either free, paid or specific subscription etc. 
  • Resource group – The name of the resource group used to organize related resources, you can either create a new resource or choose an existing one.

STEP – 7: Provide specifics for Instance Details 


Under the Instance Details section of Create a virtual machine blade, provide following property details in the context of new VM as – 
  • Virtual machine name – The name new VM which is being created using the snapshot.
  • Region – Since this is based on image so most probably it would be pre-selected, go with the default-selection.
  • Availability options – Go ahead with by default selection as No infrastructure redundancy required.
  • Image – You can see the earlier creared managed disk is already pre-selected, go with this selection.


Create Virtual Machine Instance Details


If you create a fresh VM then most probably the size would be pre-selected with Standard D2s v3 2 vcpus, 8 GB memory options (might be variations in some scenario), but here it appears blank and asking for Select size due to selection of image/managed disk.

Next, click the Select size link, it will load the Select a VM size blade where you can choose an appropriate size for the new VM. 

Select Size

Once you select the desired size, click the Select button which will lead to back to Instance Details with  selected option of Size.

Create Virtual Machine Size


STEP – 8: Provide specifics for Inbound Port rules


Under Inbound Port Rules - Public inbound ports, choose Allow selected ports and then select RDP from the drop-down. 

Along with under the Save Money – Already have a Windows license, go with the default selection as No.

Create Virtual Machine Save Money


Next, proceed ahead by clicking the Review + create button at the bottom of the page, promptly the validation will be started.

STEP – 9: Review and Creating the VM


On the Create a virtual machine page, post validation success you can see the details for the VM you are about to create from the managed disk (based on snapshot).

Create Virtual Machine Validation


If you feel each information is fine and ready, then click the Create button. It will take a few minutes for your VM to be deployed. Sooner you will be notified once the cloned VM is created successfully.

Virtual machine deployed


The new virtual machine using the cloned image is deployed successfully, time to connect the VM now.

STEP – 10: Connect to new Virtual Machine


Move to newly created VM Overview dashboard and copy the Public IP address or can click the Connect button from the menu bar to log on the VM.

Cloned VM overview


As usual, you have both options to log on the VM using either mstsc utility and copied the public IP address or go ahead with the downloaded RDP file, I am moving with mstsc.

Note: Here the same administrator credentials will be required, which was provided during the provisioning of old virtual machine (VM).

Remote connection to cloned VM


Once you get connected with newly created Azure VM, here you can notice the same Operating Software, all other applications and settings are appearing which were part of previous VM.

Cloned VM connected


Congratulation, Clone to an Azure VM using Snapshot done!! 😊

STEP – 11: Connect to old Virtual Machine


Since we created the new virtual machine based on Snapshot and Managed Disk option, respectively, henceforward still we can access the old virtual machine. In other words, still the old virtual machine is running.

Virtual Machine list


Let us connect the same, move to old VM Overview dashboard and copy the Public IP address or can click the Connect button from the menu bar to log on the VM.

Old VM overview


As usual, I am moving with mstsc utility and copied the public IP address to connect to the virtual machine. 

Remote connection to old VM


Post get connected with the old Azure VM, here you can notice nothing has been changed neither Operating Software nor all other applications and settings. 

Old VM connected


Therefore, in previous article (https://www.solutionmandi.com/2018/12/clone-azure-virtual-machine-using.html), we have seen that using sysprep.exe which is a part of Generalization process, the virtual machine (VM) is considered Generalized and cannot be restarted or access the same. In brief, using Generalized option we can create the cloned VM, but the old VM get deallocated so further you cannot access the old VM.

However, using the Snapshot option, we clone an existing Azure VM and keep old VM in working condition, i.e. both VM would be in running mode.

Keep visiting for further posts.