Monday, October 22, 2007

Selective Cube Measure Groups processing using “Analysis Services Processing Task” in SSIS

Sometimes it's not feasible to process the full cube, so it makes sense to process only selected measure groups in the cube.

This is especially true when you have a really big cube that takes a long time to process, and sometimes we just need to load data for a few business metrics.

Here is a small example of processing only selected measure groups using the SSIS "Analysis Services Processing Task".

I am assuming that the measure group names are stored in a variable; after that, we need to create the "Processing Commands" to process the cube.

Processing Commands: we need to set this property on the "Analysis Services Processing Task" to process the measure groups, so I will use a "Script Task" to generate the command.

Just make sure that the "Delay Validation" property of the "Analysis Services Processing Task" is set to "True".

Follow these steps:


  1. Get a comma-separated list of all measure groups that need to be processed into a local SSIS variable, let's say "varMeasureGroups". You can get this list from some sort of configuration, so no hard-coding ;)
  2. Create the command using a "Script Task" and store the final command in a local SSIS variable, let's say "varCubeCommand" (see the script below).
  3. In the "Analysis Services Processing Task", set the "Processing Commands" property via an expression to the newly created variable "varCubeCommand".

That's it; no need to process the full cube now.



Dim strSplitMeasureGroup As String(), i As Integer, strCmd As String

' Split the comma-separated list of measure group IDs
' (the Trim below tolerates optional spaces after the commas)
strSplitMeasureGroup = Split(Dts.Variables("User::varMeasureGroups").Value.ToString(), ",")

' Open the XMLA batch
strCmd = "<Batch xmlns=""http://schemas.microsoft.com/analysisservices/2003/engine"">"

' Emit one <Process> element per measure group
For i = 0 To strSplitMeasureGroup.Length - 1

    strCmd = strCmd & "<Process xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" "
    strCmd = strCmd & "xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"">" & Chr(13)
    strCmd = strCmd & "<Object>" & Chr(13)
    strCmd = strCmd & "<DatabaseID>myCube</DatabaseID>" & Chr(13)   ' Analysis Services database ID
    strCmd = strCmd & "<CubeID>myCube</CubeID>" & Chr(13)           ' cube ID (same name here)
    strCmd = strCmd & "<MeasureGroupID>" & strSplitMeasureGroup(i).Trim() & "</MeasureGroupID>" & Chr(13)
    strCmd = strCmd & "</Object>" & Chr(13)
    strCmd = strCmd & "<Type>ProcessFull</Type>" & Chr(13)
    strCmd = strCmd & "<WriteBackTableCreation>UseExisting</WriteBackTableCreation>" & Chr(13)
    strCmd = strCmd & "</Process>" & Chr(13)

Next

' Close the batch and hand the command to the processing task
strCmd = strCmd & "</Batch>"

Dts.Variables("User::varCubeCommand").Value = strCmd



- Mohit Nayyar

Sunday, September 23, 2007

UPDATE: SSIS - deadlock was detected while trying to lock variables

Recently I ran into this variable-deadlock issue in SSIS and came up with two solutions:

1. Instead of declaring variables in the Script properties (ReadOnlyVariables / ReadWriteVariables), it is better to use Dts.VariableDispenser.LockForRead / Dts.VariableDispenser.LockForWrite in the script to lock the variables

2. I also solved my problem by running child packages out-of-process (ExecuteOutOfProcess=TRUE in the Execute Package Task); this is more to do with script caching

But recently I saw something strange and found out that one of my scripts was using the following type of code, and I was getting this error again:

"The script threw an exception: A deadlock was detected while trying to lock variables "variable names (comma separated)" for read access and variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."

  

Dim var As Variables

' Lock the variable through the VariableDispenser...
Dts.VariableDispenser.LockForWrite("User::VarName")

' ...and fetch the locked variables into the local "var" collection
Dts.VariableDispenser.GetVariables(var)

' Problem line -- do NOT go through Dts.Variables here,
' it tries to lock the variable a second time:
' Dts.Variables("User::VarName").Value = "SomeVal"

' Correct line -- use the locally declared "var" collection
var("User::VarName").Value = "SomeVal"

var.Unlock()

 

To solve the situation I made a quick fix and switched from Dts.Variables to the local variable collection "var" declared at the beginning.

As you can see in this script, we lock one variable for write. But after calling GetVariables we MUST use the "var" collection to write data, and should NOT use Dts.Variables; that fixed my problem.

So I think this is some sort of double-locking issue: if I lock the variable using "Dts.VariableDispenser" and then try to access the same variable again through Dts.Variables instead of "var", SSIS tries to lock it again.

 

- Mohit Nayyar

Friday, September 14, 2007

SQL Query tool that generates HTML / PDF / Word / Excel output

I just saw this new query tool from SQL Answers.

The beauty of this tool is that it can directly generate the output of any query in PDF / HTML / Excel / Word / CSV format. Please take a look here:

http://www.sqlanswers.com/Software/SAQ/Default.aspx



This seems to be a lightweight tool for querying multiple database platforms like Microsoft SQL Server, Oracle, DB2, Microsoft Access, and MySQL. I have only tested it against Microsoft SQL Server 2005, and I am really happy with it.

This makes things really easy when you need to send output as PDF or HTML without any extra effort.

Along similar lines, Microsoft also offers the "SQL Server Web Data Administrator" tool, which is very useful because of its simple web interface. You can find it here:

http://www.microsoft.com/Downloads/details.aspx?familyid=C039A798-C57A-419E-ACBC-2A332CB7F959&displaylang=en

- Mohit Nayyar

Thursday, September 13, 2007

DTS to SSIS migration

Today I saw a new product coming soon for DTS to SSIS migration, called DtsXchange.

Looks promising, and it can also handle some things the built-in DTS migration doesn't, like Dynamic Properties. Have a look here: http://www.pragmaticworks.com/dtsxchange.htm


Users can also migrate existing DTS packages to SSIS using the FREE migration wizard provided with SQL Server 2005, but this wizard doesn't cover everything in a DTS package. So in some cases manual effort is required, and since SSIS doesn't support some DTS features, the user has to re-implement that functionality manually in the new SSIS package.

You can find some known migration issues here: http://technet.microsoft.com/en-us/library/ms143462.aspx


But on top of this, SSIS lets you run existing DTS packages even without any change, using a wrapper called the "Execute DTS 2000 Package Task" inside a new SSIS package.

So if you don't have the bandwidth to migrate your existing packages, keep running the old DTS packages under SSIS and start building new packages in SSIS.

Obviously you will miss the "all new SSIS", but it's an easy workaround. And I think you will miss a lot, because SSIS is NOT just a new name for DTS; it's a truly enterprise-level ETL tool. So it's always better to migrate to SSIS as soon as possible.


- Mohit Nayyar

Wednesday, August 29, 2007

Get all user defined table information with RowCount and Size (space used)

"sp_spaceused" procedure is really handy to get the updated information about space usage by database or tables.

 

I combined it with another very useful procedure, "sp_MSforeachtable", to get this information for all user-defined tables.

 

If you see any problem with the numbers, just change the last parameter and set it to TRUE (it is FALSE in the script below). That will update the space usage information first and then report the refreshed details.

 

-- Temp table matching the result set of sp_spaceused
CREATE TABLE #temp
(
    TableName            NVARCHAR(128),
    RowsCnt              VARCHAR(11),
    ReservedSpace        VARCHAR(18),
    DataSpace            VARCHAR(18),
    CombinedIndexSpace   VARCHAR(18),
    UnusedSpace          VARCHAR(18)
)

-- Run sp_spaceused once per user table ("?" is the table-name placeholder)
EXEC sp_MSforeachtable 'INSERT INTO #temp (TableName, RowsCnt, ReservedSpace, DataSpace, CombinedIndexSpace, UnusedSpace) EXEC sp_spaceused ''?'', FALSE'

SELECT TableName, RowsCnt, ReservedSpace, DataSpace, CombinedIndexSpace, UnusedSpace
FROM #temp
ORDER BY TableName

DROP TABLE #temp
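

If you'd rather see the biggest tables first, note that sp_spaceused returns the size columns as strings like "1234 KB", so they need a quick conversion before sorting. A minimal sketch (run it before the DROP TABLE above):

-- Sort by reserved space, largest first; strip the ' KB' suffix
-- before converting the string to a number
SELECT TableName, RowsCnt, ReservedSpace
FROM #temp
ORDER BY CONVERT(BIGINT, REPLACE(ReservedSpace, ' KB', '')) DESC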

 

 

- Mohit Nayyar

 

Monday, August 27, 2007

SSIS - deadlock was detected while trying to lock variables

Recently I faced one MAJOR issue in my ETL packages because of new patches deployed on Microsoft Windows Server 2003:

"A deadlock was detected while trying to lock variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."

OR

"The script threw an exception: A deadlock was detected while trying to lock variables "variable names (comma separated)" for read access and variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."


It's more to do with the Script Task/Component used in the package making use of the "ReadOnlyVariables" and "ReadWriteVariables" properties to declare variables used in the script.

Technically, if we declare variables in these two properties then there is NO need to lock them in the script, and this is exactly what we were doing before these patches arrived:

Windows 2003 Post-SP2 Hotfix - MS07-31/935840 W2K3 Server
Windows 2003 Post-SP2 Hotfix - MS07-34/929123 W2K3 Server
Windows 2003 Post-SP2 Hotfix - MS07-039/926122 W2K3 Server
Windows 2003 Post-SP2 Hotfix - KB924054 W2K3 Server
2.0 IE Update W2K3 Server

Now, I don't know which one of these patches causes the real problem, but it's one of them for sure.

We can see these errors in the SSIS log files.

The quick fix for these errors is NOT to use the "ReadOnlyVariables" and "ReadWriteVariables" properties to declare the variables used in the script. Instead, a better way is to declare them in the script itself and then lock and unlock them through the DTS object model:


Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

Public Sub Main()

Dim myVar As Variables

' Lock both variables for write through the VariableDispenser
Dts.VariableDispenser.LockForWrite("User::Var1")
Dts.VariableDispenser.LockForWrite("User::Var2")

' Fetch the locked variables into the local collection
Dts.VariableDispenser.GetVariables(myVar)

' Write through the local collection, NOT through Dts.Variables
myVar("User::Var1").Value = "SomeValue"
myVar("User::Var2").Value = "SomeValue"

' Release the locks
myVar.Unlock()

Dts.TaskResult = Dts.Results.Success

End Sub

End Class



Let me know if you know the exact source of this problem.

- Mohit Nayyar

Friday, August 17, 2007

Microsoft SQL Server common query performance problems

At some point we all have to fine-tune our old SQL Server queries, or some of our colleagues' queries ;)

So I thought of writing something about common query performance problems: a checklist we can use to make sure we didn't miss anything.

1. Large Result Set

The common mistake of selecting * instead of specific column names takes first place in any performance investigation.

We should always ask for the specific columns required by the application instead of all available columns.

Using * gets even worse when image/text/binary or other very large columns are present in the table.

Making use of TOP / TOP PERCENT / SET ROWCOUNT is another smart move to restrict the data, where the technical solution allows it.
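

For illustration, a minimal sketch against a hypothetical dbo.Orders table:

-- Ask only for the columns the application needs,
-- and cap the result set while we're at it
SELECT TOP 100 OrderID, OrderDate, CustomerID
FROM dbo.Orders
ORDER BY OrderDate DESC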

2. Missing WHERE Clause

We should always include specific criteria when asking for rows.

A WHERE clause is especially beneficial when appropriate indexes exist on the columns in the WHERE and SELECT clauses.

3. Lack of useful statistics

This is a very important point: statistics are the key input the query optimizer uses to build an execution plan and get the data out.

SQL Server creates useful statistics automatically; just make sure "Auto Create Statistics" is set to "True" in the database properties (the T-SQL for both statistics options is sketched after point 4).

4. Out-of-date statistics

Statistics also need to stay current, or the optimizer will build plans from stale information.

SQL Server updates statistics automatically as well; just make sure "Auto Update Statistics" is set to "True" in the database properties, as in the sketch below.
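

Both options can also be toggled in T-SQL, and statistics can be refreshed on demand; a minimal sketch, assuming a database named myDB:

-- Turn both statistics options on for a database
ALTER DATABASE myDB SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE myDB SET AUTO_UPDATE_STATISTICS ON

-- Or refresh statistics on demand for the whole current database
EXEC sp_updatestats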

5. Lack of useful indexes

Indexes are the key to any data I/O operation, so make sure all required indexes are in place.

Make sure you have created the required clustered / non-clustered indexes based on the business requirements and the technical solution.

Also try to build indexes on the smallest possible (ideally numeric) columns that satisfy the indexing requirement. I recommend reading BOL to understand indexes thoroughly.

6. Missing Joins

Yeah, that again is a good point to check. I have seen developers simply write statements like "select * from table1, table2, table3".

Never do that; make sure every table is joined with a proper join condition, and I recommend ANSI joins instead of the "=, *=, =*" style. They are easier to read and manage, and the ANSI syntax is effectively required for future SQL Server versions.
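

A sketch of the same query in both styles, with hypothetical Orders/Customers tables:

-- Old style: the join condition is buried in the WHERE clause (or forgotten!)
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders o, dbo.Customers c
WHERE o.CustomerID = c.CustomerID

-- ANSI style: the join condition is explicit and cannot be missed
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders o
INNER JOIN dbo.Customers c ON o.CustomerID = c.CustomerID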

7. Blocking

Transactions fighting for resources is nothing new in any RDBMS application, but we need to make sure we use all available options (short transactions, sensible indexing, appropriate isolation levels) to avoid excessive blocking.

8. Query / Table Hints

They can be good or bad, depending on the scenario, but it's recommended not to use them extensively unless we have an excellent understanding of the technical implementation, future business requirements, and data load trends.

SQL Server's optimizer takes care of this very well, so it's better not to use hints.

9. Use the best possible data type

We are talking about database systems, so being careful about what we store is very important when designing the whole system.

We should always choose the smallest data type that fits the scenario. Using INT to store a value like the year 2000 is not a good option when SMALLINT would do.
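

For instance, with a hypothetical summary table:

-- SMALLINT (2 bytes) is plenty for a year; INT (4 bytes) wastes space
-- on every row and in every index that includes the column
CREATE TABLE dbo.SalesSummary
(
    SalesYear   SMALLINT NOT NULL,   -- e.g. 2000
    SalesMonth  TINYINT  NOT NULL,   -- 1..12
    TotalSales  MONEY    NOT NULL
)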

10. Index Fragmentation

Having indexes alone doesn't solve the whole problem; we also need to make sure the indexes are not fragmented.

DBCC SHOWCONTIG, and on SQL Server 2005 the new DMVs, are very useful for checking fragmentation. Rebuilding or defragmenting the index then solves the problem easily.
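

A minimal sketch against a hypothetical dbo.Orders table, using the SQL Server 2005 DMV:

-- Check fragmentation for one table's indexes
SELECT OBJECT_NAME(object_id) AS TableName, index_id,
       avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'),
                                    NULL, NULL, 'LIMITED')

-- Defragment in place, or rebuild when fragmentation is heavy
ALTER INDEX ALL ON dbo.Orders REORGANIZE
-- ALTER INDEX ALL ON dbo.Orders REBUILD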

11. Cursors

We should use cursors only when there is no other option left, and then the READ ONLY / FAST FORWARD options are recommended. Also make sure the cursor asks only for the smallest result set possible, using a WHERE clause and specific column names instead of a star (*) in the cursor definition.

Otherwise cursors put extra overhead on SQL Server and make extensive use of tempdb.
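

A minimal sketch of a well-behaved cursor, again using a hypothetical dbo.Orders table:

-- FAST_FORWARD gives a forward-only, read-only cursor;
-- select only the rows and columns actually needed
DECLARE @OrderID INT

DECLARE cur CURSOR FAST_FORWARD FOR
    SELECT OrderID FROM dbo.Orders WHERE OrderDate >= '20070101'

OPEN cur
FETCH NEXT FROM cur INTO @OrderID
WHILE @@FETCH_STATUS = 0
BEGIN
    -- ...per-row work here...
    FETCH NEXT FROM cur INTO @OrderID
END
CLOSE cur
DEALLOCATE cur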

12. Extensive temp table (tempdb) usage

Big, bulky queries with lots of ordering and grouping clauses that produce huge result sets can use tempdb extensively.

So try to write simple, smart queries that generate small result sets; where that isn't possible, make sure tempdb sits on the fastest available disk system and is pre-sized appropriately for the technical solution.

Using the TABLE data type is another good option for holding a temporary result set, and a CTE (Common Table Expression), new in Microsoft SQL Server 2005, can achieve the same to some extent.
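

Sketches of both alternatives, with hypothetical table and column names:

-- Table variable instead of a #temp table
DECLARE @TopCustomers TABLE (CustomerID INT, TotalSales MONEY)

INSERT INTO @TopCustomers
SELECT TOP 10 CustomerID, SUM(Amount)
FROM dbo.Orders
GROUP BY CustomerID
ORDER BY SUM(Amount) DESC

-- The same intermediate result as a CTE (SQL Server 2005)
;WITH TopCustomers AS
(
    SELECT TOP 10 CustomerID, SUM(Amount) AS TotalSales
    FROM dbo.Orders
    GROUP BY CustomerID
    ORDER BY SUM(Amount) DESC
)
SELECT * FROM TopCustomers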

13. Triggers

Try to use constraints instead of triggers; for a referential integrity task, for example, a trigger is not a good option when the same thing is easily achievable with a constraint.

Triggers are more resource-expensive and run as an integral part of the transaction, which makes the transaction longer as well.
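

For example, a referential-integrity rule declared as a constraint rather than coded in a trigger (hypothetical tables):

-- A foreign key enforces referential integrity declaratively;
-- no trigger code to write, test, or pay for at run time
ALTER TABLE dbo.Orders
ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID)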

14. Dynamic SQL Statements

Using views and stored procedures is always preferable to dynamic SQL, and dynamic SQL is an open invitation to SQL injection attacks.

When there is no other option left, we should use sp_executesql instead of the EXEC command.
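

A minimal sketch of parameterized dynamic SQL with sp_executesql (hypothetical table and column):

-- Parameterized dynamic SQL: the plan is reusable, and @City is
-- passed as data, never concatenated into the statement text
DECLARE @sql NVARCHAR(500)
SET @sql = N'SELECT CustomerID, CustomerName
             FROM dbo.Customers WHERE City = @City'
EXEC sp_executesql @sql, N'@City NVARCHAR(50)', @City = N'London'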

15. Lack of Views / Stored Procedures

Accessing tables directly is not a good practice in any engagement; a developer should always access data through views and stored procedures.

Stored procedures also provide additional benefits like modular programming, a security layer, reduced network traffic, and faster execution. SQL Server 2005 also introduces statement-level recompilation, which is again very useful for bulky procedures.

16. Use of DISTINCT clause

Removing duplicate rows is itself a resource-intensive operation, so use of the DISTINCT clause should be kept to a minimum in all queries.

17. Missing SET NOCOUNT ON statement

Setting this option stops the "rows affected" message being sent to the calling client after every statement; that message costs network traffic, and client applications typically don't use it anyway.
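

Typically it goes right at the top of every stored procedure; a sketch with hypothetical names:

CREATE PROCEDURE dbo.GetCustomerOrders
    @CustomerID INT
AS
BEGIN
    -- Suppress the "(n row(s) affected)" chatter sent to the client
    SET NOCOUNT ON

    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
END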

18. Using UNION clause

UNION removes duplicate rows from the combined result set, which is a resource-intensive operation, so UNION ALL is the preferred way to do the same thing whenever duplicates are impossible or acceptable.
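

A quick sketch with a hypothetical archive table:

-- UNION ALL skips the implicit sort / duplicate-removal step
SELECT OrderID, OrderDate FROM dbo.Orders
UNION ALL
SELECT OrderID, OrderDate FROM dbo.OrdersArchive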

19. Multiple Network trips

Making use of stored procedures, views, and small result sets can save a lot of network traffic.

Instead of sending a lengthy query to SQL Server, we should send a procedure name with parameters, and then fetch the smallest possible result set by using all the options explained here.

20. Lack of useful data striping

Data partitioning is a very useful option for reducing the effective size of a table; we can partition horizontally or vertically.

This way we can save a lot of I/O overhead and other related resources. SQL Server 2005 offers various options for this, and developers can also build custom solutions based on business requirements.

Remember, small tables are always faster than bulky ones.

21. Front-end Application issues like looping statements

Sometimes things are going well on the SQL Server side, but because of a problem in the front-end application we assume SQL Server is not doing its job.

So it's always better to test the query at the SQL Server end, and to check network resources, before blaming poor SQL Server.

22. Third-party applications like anti-virus or other services like IIS

Sometimes resource overhead from other applications running on the same server as SQL Server is the main cause of poor performance.

So we should always check the memory and CPU utilization of the other applications on the server.

23. Slow network communication

Network bandwidth plays a vital role in the whole solution and should be tested thoroughly.

24. Inadequate memory in the server, or not enough memory available for SQL Server

Configuring SQL Server memory settings is an art; SQL Server will NOT utilize all available memory until asked to do so.

The /3GB, PAE, and AWE options are the key to memory utilization for 32-bit Microsoft SQL Server.
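

On 32-bit SQL Server 2005, AWE is enabled through sp_configure; a minimal sketch (the memory value is illustrative):

-- Enable AWE and cap SQL Server's memory
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'awe enabled', 1       -- takes effect after a service restart
EXEC sp_configure 'max server memory (MB)', 6144
RECONFIGURE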

25. High Processor usage by other applications

This is the CPU-specific flavor of point 22: another process on the box can starve SQL Server of processor time.

So check per-process CPU utilization (Performance Monitor or Task Manager) before assuming the queries themselves are slow.

26. Incorrectly configured system parameters

Setting SQL Server system parameters is a highly skilled job, and in some cases an art. Things that work well on one server may not hold true on another because of different business/technical requirements.

So it's always better NOT to change any default option; and if we do need to change something, read about the option thoroughly and test it on a development server before putting it in production.

27. Disk bottlenecks

Disk I/O is the key to any RDBMS application, because all we have is data, and lots of it. So make sure you are using the best possible RAID configuration along with the fastest possible disk I/O.