Showing posts with label Dimension. Show all posts

Thursday, May 29, 2008

SSIS approach to handle Inferred Members

While loading data into fact tables, we often encounter a scenario where the fact data is available but there is no corresponding business key in the related dimension.

In this case there are two common options to resolve the issue:
  1. Ignore (skip) that fact row.

  2. Insert the associated business key into the dimension table, return the newly generated surrogate key from the dimension table, and then store the fact row with that surrogate key.

The second approach relates to a term called “inferred members”. All the other attributes of that dimension row will get updated in the next run of the dimension load (usually a nightly load).

In SSIS there are multiple options available to implement the second case.

The first approach is to do a lookup on the dimension table, insert the business key into the dimension table for all the rows that are not matching, and then do the lookup again to get the surrogate key.

The second approach is to make use of the Lookup and Script components. The Lookup component ignores rows with no matching business key in the dimension table. The Script component then processes only those rows where no surrogate key was found: it inserts the business key into the dimension table and returns the associated surrogate key through a stored procedure output parameter.
This Script component approach is more efficient because it runs the existing Lookup component only once and then does all the remaining processing in the Script component.

An additional benefit comes if we use the .NET System.Collections.Generic.SortedDictionary class to cache the newly generated keys. Read more about it here:
http://msdn.microsoft.com/en-us/library/f7fta44c(VS.80).aspx
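As a rough sketch of that caching logic in Python (a plain dict stands in for the .NET SortedDictionary, and all names and sample data are made up), the point is that the stored procedure is called at most once per new business key, even when the same key appears on many fact rows:

```python
dimension = {"CUST001": 1, "CUST002": 2}   # Lookup's cached reference data
next_surrogate = [3]                        # simulates the identity column
cache = {}                                  # surrogate keys generated this run
inserts = []                                # records stored-procedure calls

def create_inferred_member(business_key):
    """Stand-in for the stored procedure that inserts the inferred member
    and returns the new surrogate key via an output parameter."""
    surrogate = next_surrogate[0]
    next_surrogate[0] += 1
    inserts.append(business_key)
    return surrogate

def lookup_surrogate(business_key):
    if business_key in dimension:          # the Lookup component's match path
        return dimension[business_key]
    if business_key not in cache:          # miss: consult the cache first,
        cache[business_key] = create_inferred_member(business_key)
    return cache[business_key]             # so each key is inserted only once

print([lookup_surrogate(bk) for bk in ["CUST001", "CUST003", "CUST003"]])
# → [1, 3, 3]  (the stored procedure runs once for CUST003, not twice)
```

The Lookup component's cached reference data does not see rows inserted mid-run, which is exactly why the in-memory dictionary is needed.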


- Mohit

Monday, August 27, 2007

SSIS - deadlock was detected while trying to lock variables

Recently I faced one MAJOR issue in my ETL packages because of new patches deployed on Microsoft Windows 2003 Server.

"A deadlock was detected while trying to lock variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."

OR

"The script threw an exception: A deadlock was detected while trying to lock variables "variable names (comma separated)" for read access and variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."


It has more to do with the Script Component/Task used in the package and the use of the "ReadOnlyVariables" and "ReadWriteVariables" properties to declare the variables that will be used in the script.

Technically, if we declare variables in these two properties then there is NO need to lock them in the script, and this is exactly what we were doing in the past, prior to these patches:

Windows 2003 Post-SP2 Hotfix - MS07-31/935840 W2K3 Server
Windows 2003 Post-SP2 Hotfix - MS07-34/929123 W2K3 Server
Windows 2003 Post-SP2 Hotfix - MS07-039/926122 W2K3 Server
Windows 2003 Post-SP2 Hotfix - KB924054 W2K3 Server
2.0 IE Update W2K3 Server

Now, I don't know which one of these patches is causing the real problem, but it's certainly one of them.

We can see these errors in the SSIS log files.

The quick fix for these errors is to NOT use the "ReadOnlyVariables" and "ReadWriteVariables" properties to declare the variables used in the script. Instead, a better way is to declare, lock, and unlock the variables in the script itself using the DTS object model:


Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()

        Dim myVar As Variables

        ' Request write locks for both variables, then acquire them together
        Dts.VariableDispenser.LockForWrite("User::Var1")
        Dts.VariableDispenser.LockForWrite("User::Var2")

        Dts.VariableDispenser.GetVariables(myVar)

        myVar("User::Var1").Value = "SomeValue"
        myVar("User::Var2").Value = "SomeOtherValue"

        ' Release the locks as soon as we are done with the variables
        myVar.Unlock()

        Dts.TaskResult = Dts.Results.Success

    End Sub

End Class



Let me know if you know the exact source of this problem.

- Mohit Nayyar

Thursday, June 14, 2007

T-SQL or Merge JOIN or Lookup to load Dimension data

In a recent discussion with a friend, we talked about loading data into a dimension table. Being SQL guys, we have at least two options available in Microsoft SQL Server 2005.

So let me first explain the objective behind this.

We need to load only new data into the target dimension table. So we need to check whether the data already exists at the destination and, if not, insert the new rows based on the business keys, e.g. CustomerCode, ProductCode, etc.

T-SQL Solution

LEFT / RIGHT JOIN
Join the target table with the source table on the business keys and use a LEFT or RIGHT join to get only the new rows:

SELECT s.CustomerCode, s.CustomerName
FROM SourceTable s
LEFT JOIN TargetTable t ON (s.CustomerCode = t.CustomerCode)
WHERE t.CustomerCode IS NULL

The above solution works fine if the source and destination databases are on the same server; if they are on different servers, you can use a linked server to achieve the same thing.

SSIS Solution

The standard way to get LEFT / RIGHT JOIN behavior in SSIS is the "Merge Join" transformation. It does precisely the same thing, and it works well even if the source database is in Oracle and the target database is in Microsoft SQL Server.

But as we are aware, Merge Join expects sorted input, so if you are planning to sort the data at the source system or use the Sort transformation in SSIS, it can prove to be a costly affair.
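The sorted-input requirement comes from the single forward pass Merge Join makes over both inputs. Here is a minimal Python sketch of a left anti merge join (the function name and table contents are made-up examples, not SSIS internals):

```python
def merge_left_anti(source, target):
    """Left anti merge join: yields source keys with no match in target.
    Both inputs MUST be sorted ascending; this single forward pass over
    the target is exactly why Merge Join demands sorted data."""
    result, i = [], 0
    for key in source:
        # Advance the target cursor past keys smaller than the current one
        while i < len(target) and target[i] < key:
            i += 1
        if i == len(target) or target[i] != key:
            result.append(key)   # no match in target: a new row
    return result

source_keys = ["C01", "C02", "C03", "C05"]   # sorted business keys
target_keys = ["C01", "C03"]
print(merge_left_anti(source_keys, target_keys))  # → ['C02', 'C05']
```

If either input were unsorted, the cursor could skip past a key it still needed, which is why an upstream ORDER BY or Sort transformation is unavoidable with this approach.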

Hmm... then what else can I do? Here I present the Lookup transformation in SSIS.

Well, generally this component is used while loading fact tables, when you want to look up the surrogate keys from a dimension table and store them in the fact table.

But we can also "Configure Error Output" for this component. What does that mean? Let me explain...

Technically, we map a common column between the two tables in this component and fetch other columns based on that common column. So what happens if the data is not available in both of these tables? Let me explain...

Let's say we have 10 rows in the source and 5 rows in the target. By default this component will raise an error because it is not able to find the other 5 rows in the target table. Trust me, this is really good for us...

Now I can "Configure Error Output" for this component to redirect the failing rows, and those redirected rows are exactly the ones missing from the target. So we have achieved the functionality of loading only new rows into the target table.
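This redirect pattern can be sketched in plain Python (the function name and row shape are my own inventions; the 10-row source and 5-row target mirror the example above):

```python
def lookup_with_error_output(source_rows, target_keys):
    """Mimics the Lookup transformation with its error output set to
    'Redirect row': rows whose business key is found go to the match
    output, the rest are redirected - and the redirected rows are
    exactly the new rows to insert into the target dimension."""
    matched, redirected = [], []
    for row in source_rows:
        if row["CustomerCode"] in target_keys:
            matched.append(row)        # key exists in the target
        else:
            redirected.append(row)     # key missing: a new dimension row
    return matched, redirected

source = [{"CustomerCode": f"C{i:02d}"} for i in range(1, 11)]  # 10 rows
target = {"C01", "C02", "C03", "C04", "C05"}                    # 5 keys
matched, new_rows = lookup_with_error_output(source, target)
print(len(matched), len(new_rows))  # → 5 5
```

Only the redirected output needs to feed the destination, which is how the Lookup replaces the LEFT JOIN without requiring sorted input.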

I hope this will be helpful for you as well.