Friday, March 11, 2011

BEST PRACTICES FOR WRITING BETTER SQL


Background:
The performance of the SQL we write for different projects is a crucial issue. In many cases we have to perform time-consuming optimization work to get out of these performance problems. Surprisingly, most of the time we succeed in improving our poor SQL to an acceptable level. In my experience, while working on these optimizations we found that if we had followed some very simple practices when writing the SQL, and had been a little more aware of performance, the issue could have been avoided from the beginning. So our goal should be to write the best possible SQL on the first attempt, so that we do not need to spend expensive time optimizing it later.
Here I am sharing some of the experience I gained while working on different optimization tasks.
Analyzing SQL:
Analyzing the performance of your SQL as best you can, in parallel with development, is very important. If you perform this analysis for every query or operational statement you write, before moving on to the next one in the procedure, you can easily trace where the performance is suffering. Always remember that the time your procedure takes is the sum of all the queries and operational statements that execute inside it. As a rule of thumb, if any of your queries takes more than about 20 milliseconds, revise the SQL to see if there is a better way to write it.
Here is a utility function that I wrote and used in some of my optimization work, and found very helpful for tracing the execution time of any SQL at the millisecond level:
CREATE FUNCTION [dbo].[GetCurrentTime]()
RETURNS varchar(100)
AS
BEGIN
    DECLARE @time varchar(100)
    --Build a "minute/second/millisecond" string from the current time
    SET @time = (CAST(DATEPART(mi, GETDATE()) AS varchar(50)) + ' Min: '
               + CAST(DATEPART(ss, GETDATE()) AS varchar(50)) + ' Sec: '
               + CAST(DATEPART(ms, GETDATE()) AS varchar(50)) + ' Ms')
    RETURN @time
END
You can use the above function in the following way:
DECLARE @time varchar(100)
SET @time = (dbo.GetCurrentTime())
PRINT 'Current Time: ' + @time
PRINT '----------------------------------------------------------'
--Write your SQL statement here.
PRINT '----------------------------------------------------------'
SET @time = (dbo.GetCurrentTime())
PRINT 'Current Time: ' + @time
This will print the execution start time and execution end time of your SQL in milliseconds when you run it in the query analyzer, so from the output you can easily find how long your procedure takes to execute. Instead of this function you can use SQL Profiler as well, but this function gives you the timing within the same window you are working in. SQL Profiler, however, provides some additional information about your query that can also help you analyze its performance, so do not keep it away from you.
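As an alternative to the function above, SQL Server can also report per-statement timing for you. A minimal sketch using the built-in SET STATISTICS TIME option (the query shown is just a placeholder):
SET STATISTICS TIME ON

SELECT COUNT(*) FROM Cac_Contact  --placeholder query; substitute your own

SET STATISTICS TIME OFF
--CPU time and elapsed time are reported in the Messages tab of the query window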
Best Practices:
1. Database Design:
The design of your database is very important for writing well-performing SQL. You must ensure the database is normalized to a reasonable extent, but be sure that normalization does not hurt the performance of your database. For a large and frequently accessed database, normalization can cause a performance downfall in some cases. So you must analyze the design carefully to decide the level of normalization you require.
2. Writing Triggers:
In many scenarios we write triggers on database tables. In one of my optimization tasks I found a few triggers that were extremely slow due to some conceptual gaps on the developer's part, so here I want to share my experience with those triggers.
Before writing a trigger we need to understand what a trigger is and how exactly it works. A trigger is essentially a procedure that runs in response to an event fired by performing some operation on a database table; the events can be insert, update, or delete. Now, the question is how the database handles the execution of a trigger when it fires. If you write a trigger for the insert operation on a table, when it fires the database creates an in-memory table named “INSERTED”. It then performs the insert, and after that the statements inside the trigger execute. We can query the “INSERTED” table to work with the inserted row(s) from the trigger. Similarly, if you write a trigger for the delete operation on a table, the database creates an in-memory table named “DELETED” and then deletes the row. Most importantly, you must understand how an update trigger works. When an update trigger fires, it works in the following sequence:
  1. All constraints are enforced.
  2. All declarative referential integrity (DRI) constraints are enforced (via foreign keys).
  3. The inserted and deleted tables are created for use within the trigger.
  4. The triggering action (in this case the UPDATE statement) is executed.
  5. The AFTER UPDATE trigger executes.
From the above steps you can see that no table called “UPDATED” is created. In fact, as far as the trigger is concerned, no operation called update executes: internally the row to be updated is deleted and then a row with the new, updated values is inserted. So from an update trigger we can access both the INSERTED and DELETED tables, even though we did not directly execute any insert or delete operation. This is a very important concept. Here are the mistakes you may make if you are not clear on it.
Consider the following trigger:
CREATE TRIGGER TriggerName ON YourTableName
AFTER UPDATE, INSERT, DELETE
AS
IF (EXISTS(SELECT Contact_Id FROM inserted WHERE Contact_Id IS NOT NULL))
BEGIN
    --Do your operation
END
IF (EXISTS(SELECT Contact_Id FROM deleted WHERE Contact_Id IS NOT NULL))
BEGIN
    --Do your operation
END
Here the developer wrote a single trigger for all the events on the table, expecting to perform some operation whenever Contact_ID is inserted, deleted, or updated. Now note carefully the mistakes the developer made. Suppose an operation on the table updates some field other than Contact_ID. If Contact_ID is a NOT NULL column, we will never get NULL from the INSERTED or DELETED tables, so the trigger's operations will execute even though Contact_ID was not updated.
It does not end there. The developer wrote the second IF condition assuming the DELETED table is created only when the trigger fires for a delete operation. But as we saw, this table is also available when the trigger fires for an update. The situation is worst if the developer assumes the first IF statement will succeed only for INSERT and UPDATE operations and the second only for DELETE: in that case he might put the same instructions in both IF blocks, and an update would then execute the same operation twice. Doesn't it sound silly and surprising? Yes. But if you are not careful these mistakes can happen at any time, and they can take you from the heaven you are currently in straight to hell.
So, while writing a trigger, keep an eye on the following points (a corrected sketch follows the list):
i. If you write a single trigger for multiple events, be very careful to ensure that your trigger does not execute for unwanted events.
ii. When writing an update trigger, always check whether your desired column was updated by using IF UPDATE(ColumnName).
iii. Be very careful when querying the INSERTED and DELETED tables.
iv. Try to avoid cursors inside triggers.
v. Ensure that your trigger does not create any deadlock or infinite loop on your database.
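To make points (i) and (ii) concrete, here is a minimal sketch of how the trigger above could guard its work; TriggerName, YourTableName, and Contact_Id are the same placeholders used above:
CREATE TRIGGER TriggerName ON YourTableName
AFTER UPDATE, INSERT, DELETE
AS
--Run the Contact_Id handling only when that column was actually
--part of the INSERT or UPDATE statement
IF UPDATE(Contact_Id)
BEGIN
    --Do your insert/update operation
END
--A real delete puts rows in DELETED but none in INSERTED;
--an update puts rows in both, so this block is skipped for updates
IF EXISTS(SELECT 1 FROM deleted) AND NOT EXISTS(SELECT 1 FROM inserted)
BEGIN
    --Do your delete operation
END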
3. Using INNER JOIN Vs WHERE clause:
It is true that, from a performance perspective, there is no difference between a query whose join is written in the WHERE clause and one written using INNER JOIN. If we look at the execution plan of both queries we will see they execute in the same way. Then which one should you use for joining two tables? You should always use INNER JOIN, not the WHERE clause. That means if you have n tables in your query, you should have at least n-1 joins. Why? There are several reasons (an example follows the list):
i. INNER JOIN makes your query more readable and more logically structured.
ii. It makes your query more manageable.
iii. It separates the joining conditions from the filtering conditions of your query.
iv. It will make your life easier when you have complex queries with several joins.
v. It minimizes the possibility of errors in the result of the query.
vi. Above all, trust me, it will reduce the time needed to write the query.
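For illustration, the two forms look like this. The Cac_Contact and Cac_User tables are the sample tables used in the readability example later in this article, and the IS_ACTIVE filter column is invented for the example:
--Join expressed in the WHERE clause (old style; avoid)
SELECT c.CONTACT_ID, u.USER_ID
FROM Cac_Contact c, Cac_User u
WHERE c.CONTACT_ID = u.CONTACT_ID
  AND u.IS_ACTIVE = 1

--Same query with INNER JOIN: the join condition and the filter stay separate
SELECT c.CONTACT_ID, u.USER_ID
FROM Cac_Contact c
INNER JOIN Cac_User u ON c.CONTACT_ID = u.CONTACT_ID
WHERE u.IS_ACTIVE = 1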
4. Index Creation:
Creating the necessary indexes on your database is very important. In one of my optimization tasks, just creating 2 indexes on a table gave my query a 50% performance boost. Determining the correct indexing policy is crucial for your database. Remember that proper indexing is good for your database's performance, but creating indexes unwisely can affect it significantly: unnecessary indexes create overhead on database server memory and can also result in extremely slow (sometimes never-ending) query performance. So what basics should you follow when considering an index? Here are some key characteristics that make database columns candidates for indexing:
i. Ensure that you have at least one primary key in your table. SQL Server automatically indexes primary keys.
ii. All foreign keys are good candidates for indexing.
iii. Columns on which ordering operations are performed very frequently are very good candidates for indexing.
iv. Columns on which a large table is filtered or joined in different queries are also good candidates for indexing.
Remember that no indexing rule can be treated as the Bible. You should always analyze the database and do extensive testing to find out whether an index actually helps. For a better understanding you may want to study the topic further.
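As a concrete sketch, an index supporting a frequently filtered and joined column might be created like this; the table and column names are the placeholders used elsewhere in this article:
--Nonclustered index supporting frequent joins and filters on CONTACT_ID
CREATE NONCLUSTERED INDEX IX_CacUser_ContactId
ON Cac_User (CONTACT_ID)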
5. Using Temporary Table
In many stored procedures I have found that we use temporary tables without giving them much thought. We should know that temporary tables are created in tempdb, so any insert/update/delete operation on, or query from, a temporary table carries extra overhead because internally you are accessing a table outside your own database. So try to avoid temporary tables as long as you can. If you see there is no alternative, define the table with a CREATE TABLE statement, with proper primary/unique key constraints, and if required create indexes on it. This can improve the performance of your query.
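A minimal sketch of this advice; the table and column names are illustrative only:
--Explicitly defined temporary table with a primary key
CREATE TABLE #ContactWork
(
    CONTACT_ID int NOT NULL PRIMARY KEY,
    USER_ID int NULL
)
--Create an extra index only if your queries need it
CREATE NONCLUSTERED INDEX IX_ContactWork_UserId ON #ContactWork (USER_ID)
--...use the table, then clean up
DROP TABLE #ContactWork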
6. Using Table Variable
In SQL Server we have an alternative to the temporary table called the table variable. Perhaps many of you are not aware of it. A table variable is created in memory and performs a lot better than a temporary table, reducing the overhead on the database. But if you need a table that will contain a huge amount of data, a temporary table may actually be better for you, because you cannot create indexes on a table variable, while on a temporary table you can.
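A small sketch of declaring and using a table variable; the names are placeholders, and note that inline key constraints are the only indexing a table variable allows:
--Table variable with an inline primary key constraint
DECLARE @ContactWork TABLE
(
    CONTACT_ID int NOT NULL PRIMARY KEY,
    USER_ID int NULL
)

INSERT INTO @ContactWork (CONTACT_ID, USER_ID)
SELECT CONTACT_ID, USER_ID FROM Cac_User

SELECT CONTACT_ID, USER_ID FROM @ContactWork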
7. Avoid Using Dynamic SQL:
Think twice before writing any dynamic SQL in your procedure; if possible, avoid it. Here are a few drawbacks of dynamic SQL:
i. It does not allow your stored procedure to be precompiled, which affects performance.
ii. It makes your query less manageable.
iii. It makes your query less readable.
iv. It can open your procedure up to SQL injection attacks.
In many cases we can avoid writing dynamic SQL. For example, one of the most common reasons to write dynamic SQL is to use a variable with the TOP keyword. Yes, you needed to do that in SQL Server 2000, but in SQL Server 2005 it can be achieved without a dynamic query: just put brackets around your variable after the TOP keyword and it will work. Example: SELECT TOP (@size) marks FROM result
So, the key point is: do not write a dynamic query as long as it can be avoided.
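When dynamic SQL genuinely cannot be avoided, a parameterized call is the safer form. Here is a minimal sketch using the built-in sp_executesql procedure; the table and column names are illustrative:
DECLARE @sql nvarchar(500)
DECLARE @groupId int
SET @groupId = 42  --example value

--Parameters are passed separately instead of being concatenated into
--the string, so the plan can be reused and injection is avoided
SET @sql = N'SELECT CONTACT_ID FROM Cac_Contact WHERE GROUP_ID = @p_GroupId'
EXEC sp_executesql @sql, N'@p_GroupId int', @p_GroupId = @groupId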
8. Using Like Operator:
Filtering your query with the LIKE keyword can be very slow on a table with a huge amount of data, especially if you have other joins combined with this filtering criterion; in particular, a pattern with a leading wildcard (LIKE '%value%') cannot use an index at all. So when you need LIKE-style matching on a high-volume table, consider using a full-text query instead.
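A hedged sketch of the full-text alternative, assuming a full-text index has already been created on an illustrative NOTES column of the sample Cac_Contact table:
--Slow on a large table; the leading % defeats any index
SELECT CONTACT_ID FROM Cac_Contact WHERE NOTES LIKE '%meeting%'

--Full-text equivalent; requires a full-text index on NOTES
SELECT CONTACT_ID FROM Cac_Contact WHERE CONTAINS(NOTES, 'meeting')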
9. Using OR in your WHERE condition:
Be very careful using OR in your filtering criteria; it can make your query significantly slower. In one of my optimization tasks I got a query of the following format:
FROM
    Action
    ,Relationship rel
WHERE
    (Action.GROUP_ID = @GroupID
     OR (rel.GROUP_ID = @GroupID
         AND rel.CONTACT_ID = Action.CONTACT_ID
        )
    )
A query written in the above format was taking almost 7-8 seconds to execute. I simply split it into 2 queries and combined them using the UNION operator, and the query then executed in just a few milliseconds. You see the significant performance gain you can achieve this way? So always try to use UNION instead of an OR filtering criterion.
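Here is roughly what that rewrite looks like. The SELECT list is invented for illustration, since the original query's columns are not shown:
--First branch: rows matched directly through Action
SELECT Action.CONTACT_ID
FROM Action
WHERE Action.GROUP_ID = @GroupID

UNION

--Second branch: rows matched through the relationship
SELECT Action.CONTACT_ID
FROM Action
INNER JOIN Relationship rel ON rel.CONTACT_ID = Action.CONTACT_ID
WHERE rel.GROUP_ID = @GroupID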
10. Use of cursor:
The cursor is probably the most dangerous tool in SQL for killing the performance of your procedures. Remember that in most cases cursors can be avoided with different kinds of JOINs, subqueries, etc. So think, analyze, and if needed take help from others before writing any cursor in your procedure. In one optimization task I worked on, I improved query performance by more than 40% just by removing a cursor from the query.
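As an illustration of replacing a cursor with a set-based statement, here is a hedged sketch; the Cac_Contact table is the sample table used elsewhere here, and the LAST_SEEN column is invented for the example:
--Cursor version: row-by-row processing (slow)
DECLARE @contactId int
DECLARE contact_cursor CURSOR FOR
    SELECT CONTACT_ID FROM Cac_Contact
OPEN contact_cursor
FETCH NEXT FROM contact_cursor INTO @contactId
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Cac_Contact SET LAST_SEEN = GETDATE() WHERE CONTACT_ID = @contactId
    FETCH NEXT FROM contact_cursor INTO @contactId
END
CLOSE contact_cursor
DEALLOCATE contact_cursor

--Set-based version: one statement does the same work
UPDATE Cac_Contact SET LAST_SEEN = GETDATE()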
11. Never Use SELECT * FROM TABLE:
Many of us have a tendency to write SQL in the “SELECT * FROM TABLE” format. Do not do it. Even if you need all the columns returned, specify each of them by name in your query. Moreover, always ensure you are returning only those columns you really need for your purpose, nothing more. This is simply a matter of habit that you can build by not becoming lazy while writing SQL.
12. Print statement:
Sometimes you write a lot of PRINT statements to trace your procedure's execution, and then after completing the procedure you do not delete them. You should remember that even printing a single digit takes your processor some time and memory. It may be less than a microsecond, but do not forget that microseconds add up to milliseconds and milliseconds add up to seconds. Yes, I gained almost 1 second of performance just by deleting all the print statements in one of my optimization tasks.
13. Using SCOPE_IDENTITY, IDENT_CURRENT, and @@IDENTITY:
SCOPE_IDENTITY, IDENT_CURRENT, and @@IDENTITY are similar functions in that they return values inserted into IDENTITY columns.
IDENT_CURRENT is not limited by scope or session; it is limited to a specified table. IDENT_CURRENT returns the value generated for a specific table in any session and any scope. For more information, see IDENT_CURRENT in the documentation.
SCOPE_IDENTITY and @@IDENTITY return the last identity value generated in any table in the current session. However, SCOPE_IDENTITY returns the value inserted only within the current scope; @@IDENTITY is not limited to a specific scope.
For example, you have two tables, T1 and T2, and an INSERT trigger defined on T1. When a row is inserted to T1, the trigger fires and inserts a row in T2. This scenario illustrates two scopes: the insert on T1, and the insert on T2 as a result of the trigger.
Assuming that both T1 and T2 have IDENTITY columns, @@IDENTITY and SCOPE_IDENTITY will return different values at the end of an INSERT statement on T1.
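A minimal sketch of the T1/T2 scenario described above, with invented single-column tables:
CREATE TABLE T1 (Id int IDENTITY(1,1), Val varchar(10))
CREATE TABLE T2 (Id int IDENTITY(100,1), Val varchar(10))
GO
CREATE TRIGGER trg_T1_Insert ON T1 AFTER INSERT
AS
    INSERT INTO T2 (Val) SELECT Val FROM inserted
GO
INSERT INTO T1 (Val) VALUES ('test')

SELECT SCOPE_IDENTITY()      --identity from T1: the current scope
SELECT @@IDENTITY            --identity from T2: the trigger's insert
SELECT IDENT_CURRENT('T1')   --last identity for T1, any session and scope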
14. Using @@ERROR:
Make sure that the @@ERROR global variable is checked after every statement that causes an update to the database (INSERT, UPDATE, DELETE). Make sure that rollbacks (if appropriate) are performed prior to inserting rows into an exception table.
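A minimal sketch of the pattern; the table names are placeholders, and note that @@ERROR must be captured immediately because the very next statement resets it:
DECLARE @err int

BEGIN TRANSACTION

UPDATE Cac_Contact SET GROUP_ID = @GroupID WHERE CONTACT_ID = @ContactID
SET @err = @@ERROR           --capture immediately; any statement resets @@ERROR

IF @err <> 0
BEGIN
    ROLLBACK TRANSACTION     --roll back before logging the failure
    INSERT INTO ErrorLog (ErrorCode) VALUES (@err)
    RETURN
END

COMMIT TRANSACTION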
15. Minimize using Not Equal operator:
Minimize the use of not-equal operations, <> or !=. SQL Server has to scan a table or index to find all values and check that they are not equal to the value given in the expression. Try rephrasing the expression using ranges: WHERE KeyColumn < 'TestValue' OR KeyColumn > 'TestValue'
16. DO NOT start the name of a stored procedure with sp_: All the system stored procedures follow this convention, so a procedure that is valid today may clash with the name of a system procedure bundled with a service pack or security patch tomorrow. Hence, do not follow this convention.
17. Using Indexed View:
Sometimes we require a view to be indexed. This feature comes with SQL Server 2000. The result set of an indexed view is persisted in the database and indexed for fast access. Because indexed views depend on base tables, you should create indexed views with the SCHEMABINDING option to prevent table or column modifications that would invalidate the view. Using indexed views can take a lot of load off the base tables, but it increases the maintenance cost.
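A minimal sketch of creating an indexed view over the sample tables used in this article; the view name is invented, and the unique clustered index assumes CONTACT_ID is unique in the join result:
--SCHEMABINDING (and two-part table names) are required before the view can be indexed
CREATE VIEW dbo.vw_ContactUser
WITH SCHEMABINDING
AS
    SELECT c.CONTACT_ID, u.USER_ID
    FROM dbo.Cac_Contact c
    INNER JOIN dbo.Cac_User u ON c.CONTACT_ID = u.CONTACT_ID
GO
--The first index on a view must be unique and clustered;
--this is what persists the view's result set
CREATE UNIQUE CLUSTERED INDEX IX_vwContactUser
ON dbo.vw_ContactUser (CONTACT_ID)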
18. Readability:
i. Ensure the readability of your SQL.
ii. Capitalize all keywords like SELECT, WHERE, etc.
iii. Use camel casing when naming variables.
iv. Break your SQL down across multiple lines; never write a line so long that it becomes tough to trace. A very well-indented query could look like the following:
SELECT
c.CONTACT_ID
,u.USER_ID
FROM
Cac_Contact c
INNER JOIN
Cac_User u
ON
c.CONTACT_ID = u.CONTACT_ID
End Words:
In this article I have tried to point out the core best practices that you should follow whenever you write any SQL. I have come to learn all of these through experience rather than from study documents or browsing the internet, so I hope this can serve as a practical guideline for all of us, ensuring best-performing SQL in our development. Above all, remember that to ensure the best performance there is no alternative to testing during development. Keep tracing your procedures in different scenarios and with different data loads so they can withstand any future situation. Get accustomed to using SQL Profiler to analyze your queries, and always keep an eye on the execution plan that the query analyzer generates. I believe that, above everything, patience is the most important thing in improving your SQL's performance. So do not hurry to meet your deadline, because that can lead to expensive optimization work later. Rather, take your time to write the best SQL, discuss with your teammates and managers if needed, and ensure that you are writing the best possible implementation of your query.
NB: Your feedback and contributions on this article are most welcome. Please write to mahmud.dev@gmail.com or talk to me in person.