Thursday, June 14, 2007

T-SQL or Merge JOIN or Lookup to load Dimension data

In my recent discussion with a friend we talked about loading the data in dimension table. So being a SQL guy we have at least two options available in Microsoft SQL Server 2005.

So let me first explain the objective behind this.

We need to load only new data in target dimension table. So we need to check if this data exists at the destination, if not then insert this new data based on the business keys e.g. CustomerCode, ProductCode etc.

T-SQL Solution

LEFT / RIGHT JOIN
Make a join of target table with source table on business keys and then use LEFT or RIGHT join to get only new rows.

SELECT s.CustomerCode, s.CustomerName
FROM SourceTable s
LEFT JOIN TargetTable t ON (s.CustomerCode = t.CustomerCode)
WHERE t.CustomerCode IS NULL

The above solution works fine if you have source and destination databases on the same server, or may be you can use linked server to achieve the same if it's on different servers.

SSIS Solution

Now the standard solution to use LEFT / RIGHT JOINS in SSIS is "Merge Join" transformation, you can do precisely the same thing, it works well even if you have source database in Oracle and target database in Microsoft SQL Server.

But as we are aware that Merge Join expects sorted data, so if you are planning to sort the data at source system or use SORT transformation in SSIS then it will prove to be a costly affair.

Hmm............then what else can I do.........here I present Lookup transformation in SSIS

Well, generally this component is used to load Fact tables; when you want to load surrogate keys derived from Dimension table into Fact table.

But we can also "Configure Error Output" for this component....what does that mean........let me explain.....

Technically we map a common field from two tables in this component and fetch other columns based on this common column data. So what if all the data is not available in both of these tables…..let me explain…





Let say we have 10 rows in source and 5 rows in target.........so by default this component will return error because it’s not able to find other 5 rows in target table................hmmm..........trust me this is really good for us....









Now I can "Configure Error Output" for this component, which will redirect the failure rows and these failure redirected rows are the one that we are missing in target............so we achieved the functionality of loading only new rows in target table.





I hope this will be helpful for you as well..........

4 comments:

Anonymous said...

Hey Mohit,
How are you doing...
Long time no communication with you... I was doing this SSIS development few days back and stuck on some deployment requirements. Basically, I need to install SSIS packages in specific folder in MSDB and not the root using depoyment utility. Is is possible??

Thanks...

-Suneet
techie@suneet.net

JH said...

Very cool ! Thanks !

Koteswara Rao Gadda said...
This comment has been removed by the author.
Anonymous said...

Instead we can use advanced TAB in loopup transformation and edit query and set <> then we can use the rows which are coming from default connectr