Merge Join Transmission in SSIS


In this article, I create a simple ETL package that join the data from two SQL tables using merge join transmission and finally loads the result into a SQL table.Merge Join is same as JOIN in t-sql, you can choose between different types of Inner join, Left outer join and Full outer join.

We have two source SQL tables an Employee and an Incentives. We want to join this data and load into one table in SQL Server called Output using Merge Join.

SQL Scripts


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Employee](
[EMPLOYEE_ID] [int] IDENTITY(1,1) NOT NULL,
[FIRST_NAME] [nchar](20) NULL,
[LAST_NAME] [nchar](20) NULL,
[SALARY] [money] NULL,
[JOINING_DATE] [date] NULL,
[DEPARTMENT] [nchar](20) NULL,
[REGION] [nchar](10) NULL,
CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED
(
[EMPLOYEE_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Incentives](
[EMPLOYEE_REF_ID] [int] NULL,
[INCENTIVE_DATE] [date] NULL,
[INCENTIVE_AMOUNT] [money] NULL
) ON [PRIMARY]

GO
SET IDENTITY_INSERT [dbo].[Employee] ON

INSERT [dbo].[Employee] ([EMPLOYEE_ID], [FIRST_NAME], [LAST_NAME], [SALARY], [JOINING_DATE], [DEPARTMENT], [REGION]) VALUES (1, N'John ', N'Abraham ', 1000000.0000, CAST(0x94360B00 AS Date), N'Banking ', N'South ')
INSERT [dbo].[Employee] ([EMPLOYEE_ID], [FIRST_NAME], [LAST_NAME], [SALARY], [JOINING_DATE], [DEPARTMENT], [REGION]) VALUES (2, N'Michael ', N'Clarke ', 800000.0000, CAST(0x94360B00 AS Date), N'Insurance ', N'South ')
INSERT [dbo].[Employee] ([EMPLOYEE_ID], [FIRST_NAME], [LAST_NAME], [SALARY], [JOINING_DATE], [DEPARTMENT], [REGION]) VALUES (3, N'Roy ', N'Thomas ', 750000.0000, CAST(0xB3360B00 AS Date), N'Banking ', N'North ')
INSERT [dbo].[Employee] ([EMPLOYEE_ID], [FIRST_NAME], [LAST_NAME], [SALARY], [JOINING_DATE], [DEPARTMENT], [REGION]) VALUES (4, N'Tom ', N'Jose ', 600000.0000, CAST(0xB3360B00 AS Date), N'Insurance ', N'North ')
INSERT [dbo].[Employee] ([EMPLOYEE_ID], [FIRST_NAME], [LAST_NAME], [SALARY], [JOINING_DATE], [DEPARTMENT], [REGION]) VALUES (5, N'Jerry ', N'Pinto ', 650000.0000, CAST(0xB3360B00 AS Date), N'Insurance ', N'East ')
INSERT [dbo].[Employee] ([EMPLOYEE_ID], [FIRST_NAME], [LAST_NAME], [SALARY], [JOINING_DATE], [DEPARTMENT], [REGION]) VALUES (6, N'Philip ', N'Mathew ', 750000.0000, CAST(0x94360B00 AS Date), N'Services ', N'South ')
INSERT [dbo].[Employee] ([EMPLOYEE_ID], [FIRST_NAME], [LAST_NAME], [SALARY], [JOINING_DATE], [DEPARTMENT], [REGION]) VALUES (7, N'Cook ', N'Adams ', 650000.0000, CAST(0x94360B00 AS Date), N'Services ', N'East ')
INSERT [dbo].[Employee] ([EMPLOYEE_ID], [FIRST_NAME], [LAST_NAME], [SALARY], [JOINING_DATE], [DEPARTMENT], [REGION]) VALUES (8, N'Ian ', N'L% ', 600000.0000, CAST(0xB3360B00 AS Date), N'Insurance ', N'West ')
SET IDENTITY_INSERT [dbo].[Employee] OFF

INSERT [dbo].[Incentives] ([EMPLOYEE_REF_ID], [INCENTIVE_DATE], [INCENTIVE_AMOUNT]) VALUES (1, CAST(0xB3360B00 AS Date), 5000.0000)
INSERT [dbo].[Incentives] ([EMPLOYEE_REF_ID], [INCENTIVE_DATE], [INCENTIVE_AMOUNT]) VALUES (2, CAST(0xB3360B00 AS Date), 3000.0000)
INSERT [dbo].[Incentives] ([EMPLOYEE_REF_ID], [INCENTIVE_DATE], [INCENTIVE_AMOUNT]) VALUES (3, CAST(0xB3360B00 AS Date), 4000.0000)
INSERT [dbo].[Incentives] ([EMPLOYEE_REF_ID], [INCENTIVE_DATE], [INCENTIVE_AMOUNT]) VALUES (1, CAST(0x94360B00 AS Date), 4500.0000)
INSERT [dbo].[Incentives] ([EMPLOYEE_REF_ID], [INCENTIVE_DATE], [INCENTIVE_AMOUNT]) VALUES (2, CAST(0x94360B00 AS Date), 3500.0000)
ALTER TABLE [dbo].[Incentives] WITH CHECK ADD CONSTRAINT [FK_Incentives_Employee] FOREIGN KEY([EMPLOYEE_REF_ID])
REFERENCES [dbo].[Employee] ([EMPLOYEE_ID])
ON UPDATE CASCADE
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[Incentives] CHECK CONSTRAINT [FK_Incentives_Employee]
GO


Employee
Employee

Incentives
02

Output Table

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Output Table](
[EMPLOYEE_ID] [int] NULL,
[FIRST_NAME] [nvarchar](20) NULL,
[LAST_NAME] [nvarchar](20) NULL,
[SALARY] [money] NULL,
[INCENTIVE_DATE] [date] NULL,
[INCENTIVE_AMOUNT] [money] NULL,
[JOINING_DATE] [date] NULL,
[DEPARTMENT] [nvarchar](20) NULL,
[REGION] [nvarchar](10) NULL
) ON [PRIMARY]

GO


Merge Join:-
The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join.

1. From the Toolbox Double-click or drag the two "OLB DB SOURCE" component to the Data Flow Task.

2. We first need to select one of the connection managers we created earlier. From the OLE DB connection manager drop-down list, select the connection manager you created to the "Test" database. On my system, the name of the connection manager is "LocalHost.Test". Next, select the "dbo.Employee" table from the Name of the table or the view drop-down list. The OLE DB Source Editor should now look similar to the one shown in below image

1

3. We first need to select one of the connection managers we created earlier. From the OLE DB connection manager drop-down list, select the connection manager you created to the "Test" database. On my system, the name of the connection manager is "LocalHost.Test". Next, select the "dbo.Incentives" table from the Name of the table or the view drop-down list. The OLE DB Source Editor should now look similar to the one shown in below image

2

4. From the Toolbox Double-click or drag the "SORT" component to the Data Flow Task. We need to use the SORT task to sort the data prior to using the MERGE JOIN task.

5. Right click the Sort task and click Edit. Then select which column the data should be sorted on. This needs to be done for both of the OLB DB sources

3

6. From the Toolbox Double-click or drag the "MERGE JOIN" component to the Data Flow Task.

7. Connect the Sort transformation to the Merge Join transformation, so drag the data path from the Sort transformation to the Merge Join transformation
We need to configure the Merge Join transformation in the following ways:
• Specify the join as FULL, LEFT, or INNER join.
• Specify the key columns to join.
• Specify the columns to output.

4

8. From the Toolbox Double-click or drag the "OLE DB DESTINATION" component to the Data Flow Task.

9. We first need to select one of the connection managers we created earlier. From the OLE DB connection manager drop-down list, select the connection manager you created to the "Test" database. On my system, the name of the connection manager is "LocalHost.Test". Next, select the "dbo.OutputTable" table from the Name of the table or the view drop-down list. The OLE DB Source Editor should now look similar to the one shown in below image

5

10. Connect the Merge transformation to the OLE DB Destination.

11. In Mappings tab, Map the Source and destination columns shown in the image below

6

12. Run the package, it will load all the merged data into the destination SQL Table.

7

13. Data is successfully loaded to Destination Table from three sheets. Refer the below screen shot

8

Points to remember:-

• Merge & Merge Join transformation can accept only two inputs whereas Union all can take more than two inputs

• Data has to be sorted before Merge & Merge Join Transformation whereas Union all doesn't have any condition like that.

• Union all & Merge transformation is straight concatenation. Merge Join transformation is not a straight concatenation. Instead, a join type is specified.


Comments

No responses found. Be the first to comment...


  • Do not include your name, "with regards" etc in the comment. Write detailed comment, relevant to the topic.
  • No HTML formatting and links to other web sites are allowed.
  • This is a strictly moderated site. Absolutely no spam allowed.
  • Name:
    Email: