
Wednesday, 14 July 2021

Integration Runtime (IR) in Azure Data Factory - An Introduction

Introduction to Integration Runtime (IR)

This article explains the basics of Integration Runtime (IR): the types of IR, which IR you should use, and where the IR is actually located.

The Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory to provide data integration capabilities across different network environments. This is the very first step in Azure Data Factory before executing any pipeline. Whenever you create a Linked Service (a Linked Service is essentially a connection string that holds the connection details of the resource you want to connect to) to connect to any Azure database or service, it will ask for an Integration Runtime.


Private Network = On-Premises

Public Network = Azure Cloud
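
For illustration, here is a minimal Python sketch, using the azure-mgmt-datafactory SDK, of a Linked Service that names its Integration Runtime through the connect_via property. The subscription, resource group, factory, and connection-string values are placeholders, not values from this article.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzureSqlDatabaseLinkedService,
        IntegrationRuntimeReference,
        LinkedServiceResource,
    )

    # Placeholder identifiers -- replace with your own subscription, resource group and factory.
    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    linked_service = LinkedServiceResource(
        properties=AzureSqlDatabaseLinkedService(
            connection_string="Server=tcp:<server>.database.windows.net;Database=<db>;...",
            # connect_via ties the Linked Service to a specific Integration Runtime;
            # omit it to use the default AutoResolveIntegrationRuntime.
            connect_via=IntegrationRuntimeReference(reference_name="AutoResolveIntegrationRuntime"),
        )
    )

    client.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "AzureSqlLinkedService", linked_service
    )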


On the home page of the Azure Data Factory UI, select the Manage tab from the leftmost pane.


Select Integration runtimes on the left pane, and then select +New.

  • An IR specifies what kind of hardware is used to execute activities
  • Where this hardware is physically located
  • Who owns and maintains the hardware
  • And which data stores and services the hardware can connect to

ADF is a managed service; it maintains the IR for you.


IR Activities

  • Data Flow: Executes a Data Flow in a managed Azure compute environment. This is a no-code experience where the developer drags and drops the available Source, Sink, and Transformation activities to design the flow; it is very similar to the SSIS data flow task.
  • Data movement: Copies data between data stores in public networks and data stores in private networks (on-premises or a virtual private network); a sketch of a copy pipeline follows this list.
  • Activity dispatch: Azure sends an instruction to the respective service (Azure Databricks, HDInsight, Machine Learning, etc.) to execute the code in that service's own cluster or compute environment.
  • As the name suggests, an activity inside an Azure Data Factory pipeline is dispatched by Azure Data Factory to another compute environment.

    This is the area where developers can write their own code, for example using PySpark, R, Scala, or Python in Azure Databricks.

  • SSIS package execution: Natively executes SQL Server Integration Services (SSIS) packages in a managed compute environment.
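
As a small illustration of the data movement capability, the sketch below defines a pipeline with a single Copy activity between two blob datasets. The dataset names ("SourceBlobDataset", "SinkBlobDataset") and factory details are assumed placeholders.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        BlobSink,
        BlobSource,
        CopyActivity,
        DatasetReference,
        PipelineResource,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # A single Copy activity moving data from one blob dataset to another
    # (both datasets are assumed to already exist in the factory).
    copy_activity = CopyActivity(
        name="CopyBlobToBlob",
        inputs=[DatasetReference(reference_name="SourceBlobDataset")],
        outputs=[DatasetReference(reference_name="SinkBlobDataset")],
        source=BlobSource(),
        sink=BlobSink(),
    )

    client.pipelines.create_or_update(
        "<resource-group>", "<factory-name>", "CopyPipeline",
        PipelineResource(activities=[copy_activity]),
    )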

The IR acts as a bridge between your ADF and the network (public or private). If you want to connect to an on-premises SQL Server, you need to create a self-hosted IR, because on-premises is a private network while Azure SQL is in a public network.


IR location

  • IR location is an important concept throughout ADF when you run different activities. The IR location defines the location of its back-end compute, and essentially the location where the data movement, activity dispatching, and SSIS package execution are performed. The IR location can be different from the location of the data factory it belongs to.
  • When you create a data factory instance, you need to specify the location for the data factory. The Data Factory location is where the metadata of the data factory is stored and where the triggering of the pipeline is initiated from. Metadata for the factory is only stored in the region of your choice and will not be stored in other regions.
  • For the copy activity, ADF makes a best effort to automatically detect your sink data store's location, then uses the IR in either the same region, if available, or the closest one in the same geography; if the sink data store's region is not detectable, the IR in the data factory region is used as an alternative.
  • For Lookup/GetMetadata/Delete activity execution, ADF uses the IR in the data factory region.
  • For Data Flow, ADF uses the IR in the data factory region.

Data Factory offers three types of Integration Runtime (IR)

The following table describes the capabilities and network support for each of the integration runtime types:

IR type       Public network                                 Private network
Azure         Data Flow, Data movement, Activity dispatch    Data Flow, Data movement, Activity dispatch
Self-hosted   Data movement, Activity dispatch               Data movement, Activity dispatch
Azure-SSIS    SSIS package execution                         SSIS package execution

Azure (AutoResolveIntegrationRuntime)

  • This IR is managed entirely by Microsoft.
  • An Azure IR uses infrastructure and hardware managed by Microsoft.
  • Microsoft takes care of the installation, maintenance, patching, and scaling, while you pay only for the time you use it.
  • An Azure IR can only access data stores and services in public networks.
  • Your data factory will always have at least one Azure IR, called AutoResolveIntegrationRuntime.
  • This is the default Integration Runtime, and its region is set to auto-resolve. That means Azure Data Factory decides the physical location of where to execute activities based on the source, sink, or activity type.
  • If you need to ensure that data does not leave a specific region, for legal or compliance reasons, you can create a new Azure IR in that specific region (a sketch follows this list).
  • Example: copying data from India to Zimbabwe; if there is no IR in Zimbabwe, ADF searches for the nearest data centre (to Zimbabwe) to execute the activity.
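
A minimal sketch of creating such a region-pinned Azure IR with the azure-mgmt-datafactory SDK; the IR name and region are illustrative, and the exact properties you set may vary.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        IntegrationRuntimeComputeProperties,
        IntegrationRuntimeResource,
        ManagedIntegrationRuntime,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # An Azure IR pinned to a specific region, so its back-end compute stays
    # in that region instead of auto-resolving ("Central India" is an example).
    azure_ir = IntegrationRuntimeResource(
        properties=ManagedIntegrationRuntime(
            description="Azure IR pinned to a specific region",
            compute_properties=IntegrationRuntimeComputeProperties(location="Central India"),
        )
    )

    client.integration_runtimes.create_or_update(
        "<resource-group>", "<factory-name>", "AzureIR-CentralIndia", azure_ir
    )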

You use an Azure integration runtime when you

  • Copy data between cloud stores
  • Transform data between cloud stores using data flows
  • Execute activities using cloud stores and services

Self-hosted IR

  • A self-hosted IR uses infrastructure and hardware managed by us.
  • We take care of all installation, maintenance, patching, and scaling, but you also pay for the time you use it through Azure Data Factory.
  • A self-hosted IR can access resources in both public and private networks.
  • Public to public: use an Azure IR.
  • Public to private, private to public, or private to private: use a self-hosted IR.
  • Install the self-hosted IR on an on-premises machine or a virtual machine inside a private network.
  • A self-hosted IR works like a gateway. You install the IR on a machine inside the private network, and it then communicates with Azure Data Factory using an authentication key. Azure Data Factory provides the key, and you register it on the private-network machine on which you installed the IR (a sketch of this workflow follows this list).
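
A minimal sketch of that workflow: create the self-hosted IR definition in the factory, then fetch the authentication keys that you enter when registering the IR on the private-network machine. All names here are placeholders.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        IntegrationRuntimeResource,
        SelfHostedIntegrationRuntime,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Create the self-hosted IR definition in the data factory (the Azure side of the bridge).
    client.integration_runtimes.create_or_update(
        "<resource-group>", "<factory-name>", "OnPremSelfHostedIR",
        IntegrationRuntimeResource(
            properties=SelfHostedIntegrationRuntime(description="Bridge to on-premises SQL Server")
        ),
    )

    # Retrieve the authentication keys; one of them is entered when registering
    # the IR software on the machine inside the private network.
    keys = client.integration_runtimes.list_auth_keys(
        "<resource-group>", "<factory-name>", "OnPremSelfHostedIR"
    )
    print(keys.auth_key1)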

You use a self-hosted integration runtime when you

  • Copy data between cloud and on-premises stores
  • Copy data between on-premises stores
  • Execute activities using on-premises stores and services
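
For the on-premises scenarios above, here is a hedged sketch of a Linked Service for an on-premises SQL Server that routes through the self-hosted IR created earlier; "OnPremSelfHostedIR" and every connection detail are placeholders.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        IntegrationRuntimeReference,
        LinkedServiceResource,
        SecureString,
        SqlServerLinkedService,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Linked Service for an on-premises SQL Server; connect_via points at the
    # self-hosted IR so traffic flows through the machine inside the private network.
    on_prem_sql = LinkedServiceResource(
        properties=SqlServerLinkedService(
            connection_string="Server=<on-prem-server>;Database=<db>;Integrated Security=False;",
            user_name="<user>",
            password=SecureString(value="<password>"),
            connect_via=IntegrationRuntimeReference(reference_name="OnPremSelfHostedIR"),
        )
    )

    client.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "OnPremSqlServerLinkedService", on_prem_sql
    )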

Azure-SSIS IR


  • An Azure-SSIS IR can be provisioned in either a public network or a private network.
  • The Azure-SSIS IR is a fully managed cluster of Azure VMs dedicated to running your SSIS packages, managed by Microsoft.
  • Microsoft takes care of all installation, maintenance, patching, and scaling, but you also pay for the time you use it.
  • An Azure-SSIS IR is used for executing SSIS packages in Azure Data Factory.
  • Those SSIS packages can access resources in both public and private networks.
  • While creating this IR, you have to define the node size (server configuration) and the number of nodes required (a sketch follows this list).
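
A minimal, illustrative sketch of provisioning an Azure-SSIS IR programmatically, where node_size and number_of_nodes correspond to the node size and node count mentioned above; the VM size, node count, and region shown are placeholder choices.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        IntegrationRuntimeComputeProperties,
        IntegrationRuntimeResource,
        IntegrationRuntimeSsisProperties,
        ManagedIntegrationRuntime,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Azure-SSIS IR: a managed cluster of VMs sized by node_size and number_of_nodes.
    ssis_ir = IntegrationRuntimeResource(
        properties=ManagedIntegrationRuntime(
            description="Azure-SSIS IR for running SSIS packages",
            compute_properties=IntegrationRuntimeComputeProperties(
                location="West Europe",          # region of the back-end compute (example)
                node_size="Standard_D2_v3",      # server configuration per node (example)
                number_of_nodes=2,               # size of the cluster (example)
                max_parallel_executions_per_node=2,
            ),
            ssis_properties=IntegrationRuntimeSsisProperties(edition="Standard"),
        )
    )

    client.integration_runtimes.create_or_update(
        "<resource-group>", "<factory-name>", "AzureSsisIR", ssis_ir
    )
    # The IR must also be started before packages can run
    # (e.g. integration_runtimes.begin_start in recent SDK versions).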

You use an Azure-SSIS integration runtime when you

  • Execute SSIS packages through Azure Data Factory

