While loading data into fact tables, we often see fact rows arrive whose business key has no corresponding row in the related dimension.
In this case there are two options to resolve the issue.
- Ignore that fact row.
- Insert the business key into the dimension table, return the newly generated surrogate key, and store the fact row with that surrogate key.
The second approach is known as “inferred members”. The remaining attributes of the inferred dimension row are filled in on the next run of the dimension load (usually the nightly load).
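The inferred-member idea can be sketched outside SSIS. The following is a minimal illustration using Python's built-in sqlite3 module, not the SSIS implementation. The table `dim_customer`, its columns, the `is_inferred` flag and both function names are assumptions made up for this example.

```python
import sqlite3


def create_dimension(conn):
    # Hypothetical dimension table: surrogate key, business key, one attribute,
    # and a flag marking rows that were created as inferred members.
    conn.execute(
        """
        CREATE TABLE dim_customer (
            customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_bk   TEXT NOT NULL UNIQUE,
            customer_name TEXT,
            is_inferred   INTEGER NOT NULL DEFAULT 0
        )
        """
    )


def get_or_create_surrogate_key(conn, business_key):
    """Return the surrogate key, inserting an inferred member if none exists."""
    row = conn.execute(
        "SELECT customer_sk FROM dim_customer WHERE customer_bk = ?",
        (business_key,),
    ).fetchone()
    if row is not None:
        return row[0]
    # No dimension row yet: insert only the business key and flag it as inferred.
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_bk, is_inferred) VALUES (?, 1)",
        (business_key,),
    )
    return cur.lastrowid


def complete_inferred_member(conn, business_key, name):
    """What the next dimension load would do: fill in the missing attributes."""
    conn.execute(
        "UPDATE dim_customer SET customer_name = ?, is_inferred = 0 "
        "WHERE customer_bk = ?",
        (name, business_key),
    )


conn = sqlite3.connect(":memory:")
create_dimension(conn)
sk_first = get_or_create_surrogate_key(conn, "C-100")   # creates the inferred row
sk_again = get_or_create_surrogate_key(conn, "C-100")   # finds the same row
complete_inferred_member(conn, "C-100", "Contoso Ltd")  # later dimension load
```

The fact row can be stored with `sk_first` straight away, and the surrogate key stays the same after the dimension load completes the row.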
In SSIS there are multiple ways to implement the second option.
The first approach is to do a lookup on the dimension table, insert the business key into the dimension table for all rows that do not match, and then do the lookup again to get the surrogate key.
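The flow of this first approach can be sketched in plain Python. This is only an illustration of the two lookup passes, not SSIS code. The dimension is modelled as a dict from business key to surrogate key, and the function name and the row fields `business_key` and `amount` are assumptions for the example.

```python
def load_facts_two_pass(fact_rows, dimension):
    """Resolve surrogate keys with two lookups; `dimension` is updated in place."""
    # Pass 1: look up every row and collect the distinct business keys that miss.
    missing = []
    seen = set()
    for row in fact_rows:
        bk = row["business_key"]
        if bk not in dimension and bk not in seen:
            seen.add(bk)
            missing.append(bk)

    # Insert one inferred member per missing business key.
    next_key = max(dimension.values(), default=0) + 1
    for bk in missing:
        dimension[bk] = next_key
        next_key += 1

    # Pass 2: look up again; every business key now has a surrogate key.
    return [
        {"surrogate_key": dimension[row["business_key"]], "amount": row["amount"]}
        for row in fact_rows
    ]


dimension = {"A": 1, "B": 2}
facts = [
    {"business_key": "A", "amount": 10},
    {"business_key": "C", "amount": 20},
    {"business_key": "C", "amount": 30},
    {"business_key": "D", "amount": 40},
]
loaded = load_facts_two_pass(facts, dimension)
print([row["surrogate_key"] for row in loaded])  # prints [1, 3, 3, 4]
```

Every row goes through the lookup twice, which is the cost the second approach tries to avoid.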
The second approach is to use a Lookup component together with a Script component. The Lookup component is set to ignore rows with no matching business key in the dimension table, so they pass through without a surrogate key. The Script component then processes only those rows: it inserts the business key into the dimension table and gets the new surrogate key back through a stored procedure output parameter.
The Script component approach is more efficient because it uses the Lookup component only once and does all the remaining processing in the Script component.
There is an additional benefit if we use the .NET System.Collections.Generic.SortedDictionary class to cache the newly generated keys, so that a business key that has already been inserted does not need another stored procedure call. Read more about this here…
http://msdn.microsoft.com/en-us/library/f7fta44c(VS.80).aspx
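The caching idea can be sketched as follows. This is a Python illustration and not the Script component code itself: a plain dict stands in for SortedDictionary, and `InferredMemberResolver`, `insert_member` and `fake_stored_procedure` are hypothetical names, with the last one standing in for the real stored procedure call.

```python
class InferredMemberResolver:
    """Caches surrogate keys so each missing business key is inserted only once."""

    def __init__(self, insert_member):
        # insert_member: callable that inserts the business key into the
        # dimension and returns the new surrogate key (stand-in for the
        # stored procedure with an output parameter).
        self._insert_member = insert_member
        self._cache = {}

    def resolve(self, business_key):
        surrogate_key = self._cache.get(business_key)
        if surrogate_key is None:
            # Cache miss: call the database once and remember the result.
            surrogate_key = self._insert_member(business_key)
            self._cache[business_key] = surrogate_key
        return surrogate_key


calls = []


def fake_stored_procedure(business_key):
    # Records each call and hands out sequential surrogate keys.
    calls.append(business_key)
    return len(calls)


resolver = InferredMemberResolver(fake_stored_procedure)
keys = [resolver.resolve(bk) for bk in ["X", "Y", "X", "X", "Y"]]
print(keys)   # prints [1, 2, 1, 1, 2]
print(calls)  # prints ['X', 'Y']
```

Five unmatched rows lead to only two database calls here, because the repeated business keys are answered from the cache.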
- Mohit
2 comments:
Dear Mohit,
I think these solutions are "Pre-data flow checks", because you are checking whether the dimension member is defined or not before adding fact data.
Thanks for the practical solutions that you described. These are fantastic.
Best wishes,
Amin
Thanks Mohit, good article!
I've used a SQL command execution to insert inferred members, but I was dumbfounded that SSIS doesn't have a cache-refresh option. Since the lookup continues to use the old cache, I get a lot of cache misses, which trigger further SQL insert executions and lead to poor performance.
This is a solution to that problem, although I think the option should exist in the lookup component. Let's hope it gets added in SQL2008.
Best regards,
Stefan Verzel