
Thursday, May 29, 2008

SSIS approach to handle Inferred Members

While loading data into fact tables, we often see a scenario where the fact data is available but there is no corresponding business key in the related dimension.

In this case we have a couple of options to resolve the issue:
  1. Ignore that fact row.

  2. Insert the missing business key into the dimension table, retrieve the newly generated surrogate key, and then store the data in the fact table with that surrogate key.

The second approach is what the term "Inferred Members" refers to. All the other attributes of that dimension row get updated in the next run of the dimension load (usually the nightly load).

In SSIS there are multiple options available to implement the second case.

The first approach is to do a lookup against the dimension table and, for all the rows that are not matching, insert the business key into the dimension table and then do the lookup again to get the surrogate key.

The second approach is to use the Lookup component together with a Script component. The Lookup component is configured to ignore rows with no matching business key in the dimension table. The Script component then processes only the rows for which no surrogate key was found: it inserts the business key into the dimension table and gets the new surrogate key back through a stored procedure output parameter.
This approach is more efficient because it runs the existing Lookup component only once and does all the remaining work in the Script component.

An additional benefit comes from using the .NET Generic.SortedDictionary class to cache the newly generated keys, so the same missing business key is inserted only once per run. Read more about the class here:
http://msdn.microsoft.com/en-us/library/f7fta44c(VS.80).aspx
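A minimal sketch of that caching idea inside a VB.NET Script component. The names InferredMemberCache, GetOrCreateSurrogateKey, and CreateInferredMember are illustrative, not from any SSIS API; CreateInferredMember stands in for the stored procedure call described above:

```vb
Imports System.Collections.Generic

Public Class InferredMemberCache

    ' Maps a business key to the surrogate key generated for it in this run
    Private keyCache As New SortedDictionary(Of String, Integer)

    Public Function GetOrCreateSurrogateKey(ByVal businessKey As String) As Integer
        Dim surrogateKey As Integer
        ' Only call the stored procedure for business keys we have not seen yet
        If Not keyCache.TryGetValue(businessKey, surrogateKey) Then
            surrogateKey = CreateInferredMember(businessKey)
            keyCache.Add(businessKey, surrogateKey)
        End If
        Return surrogateKey
    End Function

    ' Placeholder for the stored procedure call that inserts the inferred
    ' member and returns the new surrogate key via an OUTPUT parameter
    Private Function CreateInferredMember(ByVal businessKey As String) As Integer
        ' ... ADO.NET call to the dimension's insert procedure goes here ...
    End Function

End Class
```

This way, repeated fact rows carrying the same missing business key hit the database only once instead of once per row.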


- Mohit

Monday, October 22, 2007

Selective Cube Measure Groups processing using “Analysis Services Processing Task” in SSIS

Sometimes it's not feasible to process the full cube, so it makes sense to process only selected measure groups.

This is especially true when you have a really big cube that takes a long time to process and you just need to load data for a few business metrics.

Here is a small example of processing only selective measure groups using the SSIS "Analysis Services Processing Task".

I am assuming that the measure group names are stored in a variable; after that we need to create the "Processing Commands" to process the cube.

Processing Commands: we need to set this property on the "Analysis Services Processing Task" to process the measure groups, so I will use a "Script Task" to generate the command.

Just make sure that the “Delay Validation” property of “Analysis Services Processing Task” is set to “True”.

Follow these steps:


  1. Get a comma-separated list of all the measure groups that need to be processed into a local SSIS variable, say "varMeasureGroups". You can get this list from some sort of configuration, so no hard-coding ;)
  2. Create the command using a "Script Task" and store the final command in a local SSIS variable, say "varCubeCommand".
  3. In the "Analysis Services Processing Task", set the "Processing Commands" property through an expression to the newly created variable "varCubeCommand".

That's it, no need to process the full cube anymore.



Dim strSplitMeasureGroup As String(), i As Integer, strCmd As String

' The variable holds a comma separated list of measure group IDs, e.g. "MG1, MG2"
strSplitMeasureGroup = Split(Dts.Variables("User::varMeasureGroups").Value.ToString, ", ")

strCmd = "<Batch xmlns=""http://schemas.microsoft.com/analysisservices/2003/engine"">"

' Build one <Process> element per measure group
For i = 0 To strSplitMeasureGroup.Length - 1

    strCmd = strCmd & "<Process xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" "
    strCmd = strCmd & "xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"">" & Chr(13)
    strCmd = strCmd & "<Object>" & Chr(13)
    strCmd = strCmd & "<DatabaseID>myCube</DatabaseID>" & Chr(13)
    strCmd = strCmd & "<CubeID>myCube</CubeID>" & Chr(13)
    strCmd = strCmd & "<MeasureGroupID>" & strSplitMeasureGroup(i) & "</MeasureGroupID>" & Chr(13)
    strCmd = strCmd & "</Object>" & Chr(13)
    strCmd = strCmd & "<Type>ProcessFull</Type>" & Chr(13)
    strCmd = strCmd & "<WriteBackTableCreation>UseExisting</WriteBackTableCreation>" & Chr(13)
    strCmd = strCmd & "</Process>" & Chr(13)

Next

strCmd = strCmd & "</Batch>"

Dts.Variables("User::varCubeCommand").Value = strCmd



- Mohit Nayyar

Sunday, September 23, 2007

UPDATE: SSIS - deadlock was detected while trying to lock variables

Recently I ran into this issue of deadlocked variables in SSIS and came up with two solutions:

1. Instead of declaring variables in the Script properties (ReadOnlyVariables / ReadWriteVariables), it is better to use Dts.VariableDispenser.LockForRead / Dts.VariableDispenser.LockForWrite in the script to lock the variables.

2. I also solved my problem by running child packages out-of-process (ExecuteOutOfProcess = TRUE in the Execute Package Task); this is more to do with script caching.

But recently I saw something strange and found that one of my scripts uses the following type of code, and I am getting the error again:

"The script threw an exception: A deadlock was detected while trying to lock variables "variable names (comma separated)" for read access and variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."

  

Dim var As Variables

Dts.VariableDispenser.LockForWrite("User::VarName")

Dts.VariableDispenser.GetVariables(var)

' Problem line, just remove the reference to Dts.Variables:
' Dts.Variables("User::VarName").Value = "SomeVal"

' Correct line, use the locally declared "var" collection
var("User::VarName").Value = "SomeVal"

var.Unlock()

 

To solve the situation I made a quick fix and changed my Dts.Variables references to the local variable collection "var" declared at the beginning.

As you can see in this script, we lock one variable for write. After calling GetVariables we MUST use the "var" collection to write the value and must NOT use Dts.Variables; that is what fixed my problem.

So I think this is some sort of double-locking issue: if I lock the variable using "Dts.VariableDispenser" and then try to access the same variable again through Dts.Variables instead of "var", SSIS tries to lock it again.

 

- Mohit Nayyar

Thursday, September 13, 2007

DTS to SSIS migration

Today I saw a new product coming soon for DTS to SSIS migration, called DtsXchange.

It looks promising, and it can also do some things the built-in DTS migration wizard doesn't, like migrating Dynamic Properties. Have a look here: http://www.pragmaticworks.com/dtsxchange.htm


Users can also migrate existing DTS packages to SSIS using the FREE migration wizard provided with SQL Server 2005, but this wizard doesn't cover the complete DTS feature set. So in some cases manual effort is required; SSIS doesn't support some DTS features, and the user has to implement that functionality manually in the new SSIS package.

You can find some known migration issues here: http://technet.microsoft.com/en-us/library/ms143462.aspx


But on top of this, SSIS allows you to run existing DTS packages without any change, using a wrapper called the "Execute DTS 2000 Package Task" in a new SSIS package.

So if you don't have the bandwidth to migrate the existing packages, keep running the old DTS packages from SSIS and start building new packages in SSIS.

Obviously you will miss the "all NEW SSIS" this way, but it's an easy workaround. I think you will miss a lot, though, because SSIS is NOT just a new name for DTS; it's a truly enterprise-level ETL tool. So it's always better to migrate to SSIS as soon as possible.


- Mohit Nayyar

Monday, August 27, 2007

SSIS - deadlock was detected while trying to lock variables

Recently I faced one MAJOR issue in my ETL packages because of new patches deployed on Microsoft Windows 2003 Server.

"A deadlock was detected while trying to lock variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."

OR

"The script threw an exception: A deadlock was detected while trying to lock variables "variable names (comma separated)" for read access and variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."


It's more to do with the Script Component/Task used in the package, and with using the "ReadOnlyVariables" and "ReadWriteVariables" properties to declare the variables that will be used in the script.

Technically, if we declare variables in these two properties then there is NO need to lock them in the script, and this is exactly what we were doing before these patches arrived:

Windows 2003 Post-SP2 Hotfix - MS07-31/935840 W2K3 Server
Windows 2003 Post-SP2 Hotfix - MS07-34/929123 W2K3 Server
Windows 2003 Post-SP2 Hotfix - MS07-039/926122 W2K3 Server
Windows 2003 Post-SP2 Hotfix - KB924054 W2K3 Server
2.0 IE Update W2K3 Server

Now I don't know which of these patches is causing the real problem, but it's one of them for sure.

We can see these errors in the SSIS log files.

The quick fix for these errors is to NOT use the "ReadOnlyVariables" and "ReadWriteVariables" properties for declaring the variables used in the script. Instead, declare them in the script itself and lock/unlock them using the DTS object model:


Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()

        Dim myVar As Variables

        ' Lock both variables for write, then fetch them into a local collection
        Dts.VariableDispenser.LockForWrite("User::Var1")
        Dts.VariableDispenser.LockForWrite("User::Var2")

        Dts.VariableDispenser.GetVariables(myVar)

        myVar("User::Var1").Value = "SomeValue"
        myVar("User::Var2").Value = "SomeValue"

        ' Release the locks as soon as the work is done
        myVar.Unlock()

        Dts.TaskResult = Dts.Results.Success

    End Sub

End Class
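The script above locks for write only; the read-only case uses LockForRead in exactly the same pattern. A minimal sketch ("User::Var3" is just an example variable name, not from the package above):

```vb
Dim myReadVars As Variables

' Lock the variable for read access, then fetch it into a local collection
Dts.VariableDispenser.LockForRead("User::Var3")
Dts.VariableDispenser.GetVariables(myReadVars)

' Read through the local collection, never through Dts.Variables
Dim currentValue As String = myReadVars("User::Var3").Value.ToString()

myReadVars.Unlock()
```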



Let me know if you know the exact source of this problem.

- Mohit Nayyar

Thursday, June 14, 2007

T-SQL or Merge JOIN or Lookup to load Dimension data

In a recent discussion with a friend we talked about loading data into a dimension table. Being SQL guys, we have at least two options available in Microsoft SQL Server 2005.

So let me first explain the objective.

We need to load only new data into the target dimension table. So we check whether the data already exists at the destination and, if not, insert the new rows based on the business keys, e.g. CustomerCode, ProductCode, etc.

T-SQL Solution

LEFT / RIGHT JOIN
Join the target table to the source table on the business keys and use a LEFT or RIGHT join to keep only the rows with no match:

SELECT s.CustomerCode, s.CustomerName
FROM SourceTable s
LEFT JOIN TargetTable t ON (s.CustomerCode = t.CustomerCode)
WHERE t.CustomerCode IS NULL

The above solution works fine if the source and destination databases are on the same server; if they are on different servers, you can use a linked server to achieve the same thing.

SSIS Solution

Now the standard way to get LEFT / RIGHT JOIN behavior in SSIS is the "Merge Join" transformation. You can do precisely the same thing, and it works even if the source database is Oracle and the target database is Microsoft SQL Server.

But Merge Join expects sorted input, so if you plan to sort the data at the source system or use the SORT transformation in SSIS, it can prove to be a costly affair.

Hmm... then what else can I do? Here I present the Lookup transformation in SSIS.

Well, generally this component is used to load fact tables, when you want to load surrogate keys derived from a dimension table into the fact table.

But we can also "Configure Error Output" for this component. What does that mean? Let me explain.

Technically, we map a common field from the two tables in this component and fetch other columns based on that common column. So what if not all the data is available in both tables?

Let's say we have 10 rows in the source and 5 rows in the target. By default this component will raise an error because it is not able to find the other 5 rows in the target table. Trust me, this is really good for us...

Now I can "Configure Error Output" for this component to redirect the failing rows, and those redirected rows are exactly the ones missing from the target. So we have achieved the functionality of loading only new rows into the target table.
I hope this will be helpful for you as well.