Wednesday, August 29, 2007

Get all user defined table information with RowCount and Size (space used)

"sp_spaceused" procedure is really handy to get the updated information about space usage by database or tables.

 

I combined it with another very useful procedure, "sp_MSforeachtable", to get this information for all user-defined tables.

 

If the figures look stale, just change the last parameter (@updateusage) from FALSE to TRUE. That forces SQL Server to correct the space usage counters before reporting the details.

 

CREATE TABLE #temp
(
    TableName            NVARCHAR(128),
    RowsCnt              VARCHAR(11),
    ReservedSpace        VARCHAR(18),
    DataSpace            VARCHAR(18),
    CombinedIndexSpace   VARCHAR(18),
    UnusedSpace          VARCHAR(18)
)

-- sp_spaceused returns: name, rows, reserved, data, index_size, unused
EXEC sp_MSforeachtable 'INSERT INTO #temp (TableName, RowsCnt, ReservedSpace, DataSpace, CombinedIndexSpace, UnusedSpace) EXEC sp_spaceused ''?'', ''false'''

SELECT TableName, RowsCnt, ReservedSpace, DataSpace, CombinedIndexSpace, UnusedSpace
FROM #temp
ORDER BY TableName

DROP TABLE #temp

 

 

- Mohit Nayyar

 

Monday, August 27, 2007

SSIS - deadlock was detected while trying to lock variables

Recently I faced a MAJOR issue in my ETL packages after new patches were deployed on a Microsoft Windows Server 2003 machine.

"A deadlock was detected while trying to lock variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."

OR

"The script threw an exception: A deadlock was detected while trying to lock variables "variable names (comma separated)" for read access and variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."


The problem has to do with the Script Task/Component used in the package, and with using the "ReadOnlyVariables" and "ReadWriteVariables" properties to declare the variables the script will use.

Technically, if we declare variables in these two properties then there is NO need to lock them in the script, and that is exactly what we were doing before these patches arrived....

Windows 2003 Post-SP2 Hotfix - MS07-31/935840 W2K3 Server
Windows 2003 Post-SP2 Hotfix - MS07-34/929123 W2K3 Server
Windows 2003 Post-SP2 Hotfix - MS07-039/926122 W2K3 Server
Windows 2003 Post-SP2 Hotfix - KB924054 W2K3 Server
2.0 IE Update W2K3 Server

Now, I don't know which one of these patches causes the problem, but it's definitely one of them.

These errors show up in the SSIS log files.

The quick fix is to stop using the "ReadOnlyVariables" and "ReadWriteVariables" properties to declare the variables used in the script. Instead, declare them in the script itself and lock and unlock them explicitly through the DTS object model:


Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()

        Dim myVar As Variables

        ' Lock the variables explicitly instead of listing them in the
        ' ReadOnlyVariables / ReadWriteVariables task properties
        Dts.VariableDispenser.LockForWrite("User::Var1")
        Dts.VariableDispenser.LockForWrite("User::Var2")

        ' Acquire the locked variable collection
        Dts.VariableDispenser.GetVariables(myVar)

        myVar("User::Var1").Value = "SomeValue"
        myVar("User::Var2").Value = "SomeValue"

        ' Release the locks as soon as the work is done
        myVar.Unlock()

        Dts.TaskResult = Dts.Results.Success

    End Sub

End Class



Let me know if you know the exact source of this problem.

- Mohit Nayyar

Friday, August 17, 2007

Microsoft SQL Server common query performance problems

At some point we all have to fine-tune our old SQL Server queries, or some of our colleagues' queries ;)

So I thought of writing something about common query performance problems, along with a checklist that can be used to make sure we didn't miss anything.

1. Large Result Set

The common mistake of selecting * instead of specific column names takes first place in any performance investigation.

We should always ask for only the columns the application needs instead of all available columns.

Using * gets even worse when the table contains image/text/binary or other very large data columns.

Making use of TOP / TOP ... PERCENT / SET ROWCOUNT is another smart move to restrict the rows returned, where the technical solution allows it.
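As a minimal sketch (the Orders table and its columns are hypothetical names for illustration):

```sql
-- Ask only for the columns the application needs, and cap the rows
SELECT TOP 100 OrderID, CustomerID, OrderDate
FROM Orders
ORDER BY OrderDate DESC

-- Avoid: SELECT * FROM Orders
-- This drags every column, including any large text/image columns,
-- across the network for every row.
```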

2. Missing WHERE Clause

We should always include specific criteria when asking for rows, rather than pulling back the whole table.

A WHERE clause pays off most when appropriate indexes exist on the columns used in the WHERE and SELECT clauses.

3. Lack of useful statistics

This is a very important point: statistics are the key input the query optimizer uses to build an execution plan and get the data out.

SQL Server creates useful statistics automatically; just make sure that "Auto Create Statistics" is set to TRUE in the database properties.

4. Out-of-date statistics

Equally important: a plan built from stale statistics can be badly wrong even when the statistics exist.

SQL Server maintains statistics automatically; just make sure that "Auto Update Statistics" is set to TRUE in the database properties.
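Both settings, and a manual refresh, can also be handled in T-SQL (the database and table names below are placeholders):

```sql
-- Make sure both statistics options are ON for the database
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON

-- Statistics can also be refreshed manually when needed:
UPDATE STATISTICS Orders     -- one table
EXEC sp_updatestats          -- all tables in the current database
```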

5. Lack of useful indexes

Indexes are the key to any data I/O operation, so make sure all the required indexes are in place.

Create the required clustered and non-clustered indexes based on the business requirement and the technical solution.

And try to build indexes on the smallest possible (ideally numeric) columns that satisfy the indexing requirement. I recommend reading BOL to understand indexes thoroughly.
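A quick sketch of both kinds, on the same hypothetical Orders table used above:

```sql
-- Clustered index on the narrow numeric key column
CREATE CLUSTERED INDEX IX_Orders_OrderID
    ON Orders (OrderID)

-- Non-clustered index supporting a common search column
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON Orders (CustomerID)
```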

6. Missing Joins

Yeah, that again is a good point to check. I have seen developers simply write statements like "SELECT * FROM table1, table2, table3".

Never do that; make sure proper joins are in place, and I recommend ANSI joins over the old "=, *=, =*" style. They are easier to read and maintain, and the old outer-join syntax is on its way out in future SQL Server versions.
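Side by side, using hypothetical Orders and Customers tables:

```sql
-- Old-style join: the join criteria are buried in the WHERE clause
SELECT o.OrderID, c.CustomerName
FROM Orders o, Customers c
WHERE o.CustomerID = c.CustomerID

-- ANSI join: the intent is explicit, and outer joins have a fully
-- supported syntax (LEFT/RIGHT/FULL OUTER JOIN) instead of *= / =*
SELECT o.OrderID, c.CustomerName
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
```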

7. Blocking

Transactions fighting over resources is nothing new in any RDBMS application, but we need to make sure we use every available option (short transactions, appropriate isolation levels, good indexing) to avoid excessive blocking.

8. Query / Table Hints

Hints can be good or bad depending on the scenario, but it's recommended not to use them extensively unless we have an excellent understanding of the technical implementation, future business requirements, and data growth trends.

SQL Server's optimizer handles this very well on its own, so it's better not to use them.

9. Use the best possible data type

We are talking about database systems, so caring about exactly what we store is central to designing the whole system.

We should always choose the smallest data type that safely fits the data for the scenario. Using INT to store a value like 2000 is not a good option when SMALLINT would do.
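For example (a hypothetical table, purely for illustration):

```sql
-- Right-sizing columns: a value like 2000 fits comfortably in
-- SMALLINT (2 bytes, up to 32,767) instead of INT (4 bytes)
CREATE TABLE ProductStock
(
    ProductID   INT      NOT NULL,
    Quantity    SMALLINT NOT NULL,   -- instead of INT
    IsActive    BIT      NOT NULL    -- instead of CHAR(1) or INT
)
```

Smaller rows mean more rows per page, which means less I/O for the same result set.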

10. Index Fragmentation

Having indexes alone doesn't solve the whole problem; we also need to make sure those indexes are not fragmented.

DBCC SHOWCONTIG, and some of the newer DMVs in SQL Server 2005, are very useful for checking fragmentation. Rebuilding or defragmenting the index then solves the problem easily.
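A sketch of both approaches, again against the hypothetical Orders table:

```sql
-- SQL Server 2000 style:
DBCC SHOWCONTIG ('Orders')

-- SQL Server 2005 DMV (watch avg_fragmentation_in_percent):
SELECT OBJECT_NAME(object_id) AS TableName, index_id,
       avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED')

-- Then defragment or rebuild as appropriate:
ALTER INDEX ALL ON Orders REORGANIZE   -- light, online defrag
ALTER INDEX ALL ON Orders REBUILD      -- full rebuild
```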

11. Cursors

We should use cursors only when no set-based option is left, and then preferably declared READ_ONLY / FAST_FORWARD. And please make sure the cursor is defined over the smallest possible result set, using a WHERE clause and specific column names instead of a star (*).

Otherwise cursors put extra overhead on SQL Server and make heavy use of tempdb.
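When a cursor really is unavoidable, a minimal sketch of the recommended shape (table and column names hypothetical):

```sql
DECLARE @OrderID INT, @OrderDate DATETIME

-- FAST_FORWARD = read-only, forward-only; restrict rows up front
DECLARE order_cur CURSOR FAST_FORWARD FOR
    SELECT OrderID, OrderDate
    FROM Orders
    WHERE OrderDate >= '20070101'

OPEN order_cur
FETCH NEXT FROM order_cur INTO @OrderID, @OrderDate
WHILE @@FETCH_STATUS = 0
BEGIN
    -- ... row-by-row work here ...
    FETCH NEXT FROM order_cur INTO @OrderID, @OrderDate
END
CLOSE order_cur
DEALLOCATE order_cur
```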

12. Extensive temp table (tempdb) usage

Big, bulky queries with lots of ordering and grouping clauses that produce huge result sets can use tempdb extensively.

So try to write simple, smart queries that generate small result sets; where that isn't possible, make sure tempdb sits on the fastest possible disk system and is pre-sized appropriately for the workload.

Using a table variable is another good option for storing a temporary result set, and the CTE (Common Table Expression), new in Microsoft SQL Server 2005, can achieve the same to some extent.
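Both alternatives in sketch form (names and the date filter are hypothetical):

```sql
-- Table variable: scoped to the batch, no explicit DROP needed
DECLARE @Recent TABLE (OrderID INT, OrderDate DATETIME)
INSERT INTO @Recent
SELECT OrderID, OrderDate FROM Orders WHERE OrderDate >= '20070801'

-- CTE (SQL Server 2005): no intermediate object at all
;WITH Recent AS
(
    SELECT OrderID, OrderDate FROM Orders WHERE OrderDate >= '20070801'
)
SELECT OrderID, OrderDate FROM Recent
```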

13. Triggers

Try to use constraints instead of triggers. For a referential-integrity task, for example, a trigger is not a good option when the same thing is easily achievable with a constraint.

Triggers are more expensive, and because they run as an integral part of the transaction they make the transaction longer as well.
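The constraint version of that referential-integrity task, on the hypothetical tables used above:

```sql
-- Declarative referential integrity: no trigger code to maintain,
-- and the check is enforced inside the engine itself
ALTER TABLE Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)
```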

14. Dynamic SQL Statements

Views and stored procedures are always preferable to dynamic SQL, and dynamic SQL is an open invitation to SQL injection attacks.

When there is no other option, use sp_executesql instead of the EXEC command, since it supports parameters and plan reuse.
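A parameterized sketch (the query and value are hypothetical):

```sql
-- The parameter value is never concatenated into the statement text,
-- so the plan can be reused and injection is not possible here
DECLARE @sql NVARCHAR(500)
SET @sql = N'SELECT OrderID, OrderDate FROM Orders WHERE CustomerID = @CustID'

EXEC sp_executesql @sql, N'@CustID INT', @CustID = 42

-- Avoid: EXEC ('SELECT ... WHERE CustomerID = ' + @SomeUserInput)
```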

15. Lack of Views / Stored Procedures

Accessing tables directly is not a good practice in any engagement; a developer should always access data through views and stored procedures.

Stored procedures also provide additional benefits: modular programming, a security layer, reduced network traffic, and faster execution. SQL Server 2005 also introduced statement-level recompilation, which is very useful for bulky procedures.

16. Use of the DISTINCT clause

Removing duplicate rows is itself a resource-intensive operation, so the DISTINCT clause should be used as little as possible.

17. Missing SET NOCOUNT ON statement

Setting this option stops SQL Server from sending the "rows affected" message to the calling client after every statement. Sending those messages costs network traffic, and client applications rarely make any use of them.
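In practice it goes at the top of every stored procedure (the procedure below is a hypothetical example):

```sql
CREATE PROCEDURE GetRecentOrders
AS
BEGIN
    SET NOCOUNT ON   -- suppress the "(n row(s) affected)" messages

    SELECT OrderID, OrderDate
    FROM Orders
    WHERE OrderDate >= DATEADD(day, -7, GETDATE())
END
```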

18. Using the UNION clause

UNION removes duplicate rows from the combined result set, which is a resource-intensive operation; when duplicates are acceptable (or cannot occur), UNION ALL is the preferred way of doing the same thing.
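Side by side (both tables hypothetical):

```sql
-- UNION must sort or hash the combined set to remove duplicates:
SELECT CustomerID FROM Orders
UNION
SELECT CustomerID FROM ArchivedOrders

-- UNION ALL simply concatenates the two sets and skips that work:
SELECT CustomerID FROM Orders
UNION ALL
SELECT CustomerID FROM ArchivedOrders
```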

19. Multiple network trips

Making use of stored procedures, views, and small result sets can save a lot of network traffic.

Instead of sending a lengthy query to SQL Server, send a procedure name with parameters, and then return the smallest possible result set by applying the options explained here.

20. Lack of useful data striping

Data partitioning is a very useful way to reduce the effective size of a table, and it can be done horizontally or vertically.

This saves a lot of I/O overhead and other related resources. SQL Server 2005 offers several built-in options for this, and developers can also build custom solutions based on the business requirement.

Remember, small tables are always faster than bulky ones.

21. Front-end application issues, like looping statements

Sometimes things are going well at the SQL Server end, but because of a problem in the front-end application we assume SQL Server is not doing its job.

So it's always better to test the query at the SQL Server end, and then check the network resources, before blaming poor SQL Server.

22. Third-party applications, like anti-virus software, or other services like IIS

Sometimes the resource overhead of other applications running on the same server as SQL Server is the main cause of poor performance.

So we should always check the memory and CPU utilization of the other applications on the server.

23. Slow network communication

Network bandwidth plays a vital role in the whole solution, and it should be tested thoroughly.

24. Inadequate memory in the server, or not enough memory available to SQL Server

Choosing appropriate memory settings for SQL Server is an art; SQL Server will NOT utilize all the available memory until asked to do so.

On 32-bit systems, the /3GB, /PAE, and AWE options are the key to memory utilization for Microsoft SQL Server.
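A sketch of the server-side part on 32-bit SQL Server 2005 (the memory figure is just an example; AWE additionally requires the Windows /PAE boot switch and the "Lock pages in memory" privilege for the service account):

```sql
EXEC sp_configure 'show advanced options', 1
RECONFIGURE

EXEC sp_configure 'awe enabled', 1
EXEC sp_configure 'max server memory (MB)', 6144   -- example value
RECONFIGURE
```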

25. High processor usage by other applications

As with other shared-resource problems, heavy CPU consumption by another application on the SQL Server machine can be the real cause of poor performance.

Use Performance Monitor or Task Manager to check the CPU utilization of the other processes on the server.

26. Incorrectly configured system parameters

Setting SQL Server system parameters is a highly skilled job, and in some cases an art. Things that work well on one server may not turn out to be true on another because of different business and technical requirements.

So it's always better NOT to change any default option; and if some option must be changed, read about it thoroughly and test it on a development server before putting it on the production server.

27. Disk bottlenecks

Disk I/O is the key to any RDBMS application, because all we have is data, and lots of it. So make sure you use the best possible RAID level along with the fastest possible disk I/O subsystem.