At some point of time we all have to fine tune our old SQL Server queries or some of our colleague’s queries ;)
So I thought of writing something about common query performance problems and a checklist which can be used a make sure we didn’t miss anything.
1. | Large Result Set | Common mistake to include * instead of specific column names takes first place in investigation for performance issue. We should always use specific column required by the application instead of asking all available columns. Using * get worst when there are image/text/binary or very large data columns available in the table. Making use of TOP / TOP Percent / SET ROWCOUNT is another smart move to restrict the data based on technical solution. |
2. | Missing Where Clause | We should always include the specific criteria when asking for rows. Making use of WHERE clause is very beneficial when we have appropriate indexes on the columns in WHERE and SELECT clause. |
3. | Lack of useful statistics | This is again very important point, statistics are the key for query optimizer to make an execution plan and get the data out. SQL Server automatically creates useful statistics, but just makes sure that “Auto Create Statistics” is set to “TRUE” in database properties. |
4. | Out-of-date statistics | This is again very important point, statistics are the key for query optimizer to make an execution plan and get the data out. SQL Server automatically creates useful statistics, but just makes sure that “Auto Update Statistics” is set to “TRUE” in database properties. |
5. | Lack of useful indexes | Again Indexes are the key to do any data I/O related operation. So just make sure you have all required indexes in place. Just make sure you have created required clustered / non-clustered indexes based on the business requirement and technical solution implementation. And try to create indexes on smallest possible numeric columns with respect to indexing requirement. I recommend reading BOL to understand the indexes thoroughly. |
6. | Missing Joins | Yeah, that’s again is a good point to check. I have seen developers who simply put statements like “select * from table1, table2, table3”. Never do that, just make sure you have followed the proper join and I recommend ANSI joins instead of “=, *=, =*” type joins. They are easy to read/manage and sort of required for future SQL Server versions. |
7. | Blocking | Transactions’ fighting for resources is nothing new in any RDBMS application but we need to make sure that make use of all possible options to avoid excessive blocking. |
8. | Query / Table Hints | They can be good or bad, based on the scenario. But it’s recommended that we should not use them extensively unless we have excellent understanding of technical implantation and future business requirement and data load trends. SQL Server takes care of this very well, so better don’t use. |
9. | Use best possible data type | We are talking about database systems, so taking care of what we are trying to store is very important in designing the whole system. We should always choose the best possible smallest data type based on the scenario. Using INT for storing something like number 2000 is not a good option. |
10. | Index Fragmentation | Hmm, not only having indexes solves the whole problem, we need to make sure that indexes are not fragmentated as well. DBCC SHOWCONTIG and some of the latest DMVs are very useful in checking the fragmentation. And then re-building or defragmenting the index will solve the problem easily. |
11. | Cursors | We should use cursors only when there is no option left and that too using READ ONLY, FAST FORWARD is recommended. And please make sure that we should only ask for smallest result-set possible using WHERE clause and with specific columns name instead of using star (*) in cursor definition. Otherwise cursors put extra overhead on SQL Server and make extensive use of TempDB. |
12. | Extensive temp table (tempdb) usage | Big bulky queries with lots of ordering, grouping clauses which result in huge result set can make use of TempDB extensively. So try to use simple, smart queries which generate small result set and if this is not possible then make sure TempDb is on fastest possible disk system with setting a optimum size for the TempDB based on the technical solution. Using TABLE data type is another good option for storing temporary result set. CTE (Common Table Expression), new to Microsoft SQL Server 2005 can also be used to achieve the same to some extent. |
13. | Triggers | Try to use Constraints instead of triggers, like for referential integrity task using triggers is not a good option and that can be easily achievable by using constraints. Triggers are more resource expensive and they work as integral part of transaction, which will makes transaction lengthy as well. |
14. | Dynamic SQL Statements | Using Views, stored procedures are always beneficial over using dynamic SQL. And using dynamic SQL is an open invitation to SQL injection attacks. And when there is no option left then we should make use of sp_ExecuteSQL instead of using EXEC command. |
15. | Lack of Views / Stored Procedures | Accessing direct table is not a good practice in any engagement, and a developer should always access data through views and stored procedures. SP also provides additional benefits like modular programming, security later, saves network traffic, and faster execution. SQL Server 2005 also introduces a statement level re-compilation, which is again very useful for bulky procedures. |
16. | Use Distinct clause | Deleting duplicate data itself is a resource intensive operation, so making use of DISTINCT clause should be minimal in all queries. |
17. | Missing SET NOCOUNT ON statement | Setting this option will stop sending the rows affected message to the calling client, which itself is a network intensive operation and moreover client applications doesn’t make use of those messages. |
18. | Using UNION clause | Again using UNION will delete the duplicate rows in the result set, which is a resource intensive operation, so using UNION ALL is preferred way to doing the same thing. |
19. | Multiple Network trips | Making use of stored procedures, views and small result set can save lots of network traffic. Instead of sending a lengthy query to SQL server we should send procedure name with parameters to SQL Server and then get the smallest possible result set by using all possible options I explained here. |
20. | Lack of useful data striping | Data partition is the very useful option to reduce the size of the table. We can do the same using horizontal or vertical partition. This way we can save a lot of IO overhead and other related resources. There are various options available in SQL Server 2005 to do the same stuff and developers can also make their custom solutions based on business requirement. Remember, small tables are always faster than bulky ones. |
21. | Front-end Application issues like looping statements | Sometimes things are going well at SQL server end but because of some problem at front-end application we assume that SQL server is not doing his job. So it’s always better to test the query at SQL Server end and then checking network resources before blaming poor SQL Server. |
22. | Third party applications like anti-virus or some other services like IIS | Sometimes resource overhead because of other application running on same SQL Server is the main cause of poor performance. So we should always check the memory and CPU utilization of other applications on the server. |
23. | Slow network communication | Network bandwidth plays a very vital role in whole solution and this should be tested thoroughly in the solution. |
24. | Inadequate memory in the server computer or not enough memory available for SQL Server | Setting appropriate settings for SQL Server is an art; SQL Server will NOT utilize all memory available until asked to do so. Using 3GB, PAE, AWE options are the key to memory utilization for Microsoft SQL Server. |
25. | High Processor usage by other applications | Sometimes resource overhead because of other application running on same SQL Server is the main cause of poor performance. So we should always check the memory and CPU utilization of other applications on the server. |
26. | Incorrectly configured system parameters | Setting SQL Server system parameters is a highly skilled job, and in some case it’s an art. Things works well on one server but this doesn’t out be true on another server because of changed business/technical requirement. So it’s always better NOT to change any default option and even if we need to change some option then please read thoroughly about the option and test on development server before putting the same on production server. |
27. | Disk bottlenecks | Disk IO is the key to any RDBMS application, because all we have is data and lots of data. So make sure you have used best possible RAID along with fastest possible Disk IO. |
No comments:
Post a Comment