Wednesday, April 9, 2014

Improve query performance using index in SQLServer

Index Basics:
Clustered Index: clustered index will arrange the pages of table using clustered key. It will arrange the entire physical table either in ascending or descending order in the disk.
If there is no clustered index on a table then it is called as heap. By default when a row is inserted the row identifier will allocate 8 KB Space. For small table index and table scan will perform the same.

For larger table, the index seek will perform better than the table scan.
Clustered index leaf nodes contain data pages of those tables. It uses binary tree mechanism.
Only one clustered index can be created on a table. Clustered index can be created using 16 columns (maximum).

Non Clustered indexes can also be used for improving query performance.  Non- clustered index will store the information for only the columns used in indexes. The leaf node of non-clustered index stores the row identifier if there is no clustered index on the table else it will stores the clustered key. Row identifier stores the information about data pages. Page contains 8 KB. The sum of 8 pages can be called as extent.

As per books online you can create 999 non- clustered indices per table.

Indexes will help us to improve query performance. It will be mainly used for selecting the data.  I will show here how indexes will improve the query performance. The down side of indexes is it will allocate extra space in database. If Inserts, Updates and delete are more it will create blocking, fragmentation, page splits and deadlocks.

Covering index is a non-clustered that satisfies the selectivity of the query.
Filtered index is a non-clustered index with where clause.

IF we create a clustered index on a view with schembinding then that index is called as indexed view.

I have created the table EMP table with the following details.

create table emp (id int identity ,name varchar(200),Datejoined date default getdate()) .

Let us populate some test data.

insert into emp (name)
select (name) from sys.all_parameters
union all
select name from sys.all_objects
union all
select name from sys.all_columns
union all
select name from sys.indexes where name is not null
union all
select name from sys.databases
GO

Suppose we have a search condition to fetch emp table details based on id column as shown in below query.

select id,name,datejoined from emp where id = 1

In order verify the above query performance we can use the three set statistics statements.

set statistics io on|off
set statistics time on
set statistics xml on 

The above statistics statement is used for identifying IO requests like scan count, logical reads etc.
Scan count refers number of times the table scanned or index scanned.
Logical reads is used for fetching the data pages from memory

set statistics io on
set statistics time on
set statistics xml on
select id,name,datejoined from emp where id = 1
set statistics time off
set statistics xml off
set statistics io off


See the execution plan below






There is no clustered index on the table. So it is showing the table scan.

Execution statistics


SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 4 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)
Table 'emp'. Scan count 1, logical reads 4365, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 178 ms.


The above statistics shows that query optimizer scanned entire table. Logical reads are high.
Now we will create the clsutered index on emp table.
create unique clustered index  IXUC_emp_ID ON Emp (id)

Now verify the execution plan below






Now query optimizer is using clustered index seek.
Verify the statistics time and IO

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.

(1 row(s) affected)
Table 'emp'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Now the query executed is 0 ms.  Only few logical reads required to fetch the data.
There is no table scan. Scan count is zero. So we have optimized the query using the clustered index.
But there is limitation to clustered index. We can create only one clustered index per table.
Suppose we have a query for fetching the employee details using the employee name.


set statistics io on
set statistics time on
set statistics xml on
select id,name,datejoined from emp where name = 'emp'
set statistics io off
set statistics time off
set statistics xml off

Execution statistics for the above query


 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 15 ms.

(2 row(s) affected)
Table 'emp'. Scan count 1, logical reads 140, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 39 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

The above statistics shows that query optimizer used the clustered index to scan enitre. Because in where clause we have used name column but we have created the clustered index on ID column.

Scan count 1 means query optimizer scanned Clustered index one time.
Logical reads 140 means query optimizer used 140 data pages to fetch the data.


  Execution plan for the above query




       i.          Clustered index scan is almost equal to table scan.
      ii.          Optimizer will not choose the clustered index if the clustered key not exists in where clause of the query.
     iii.          In order improve query performance we can use the non-clustered index because we can create only one clustered index per table.

Create the non-clustered index using the below query

 create nonclustered index IXUNC_emp_name ON Emp (name)

 Now execute the same query and verify plan and the statistics





Now query for cost for index seeks is 28% and key lookup is 72 %. Even though we have created the non-clustered index on name column, we are retrieving two more columns id, datejoined in select clause of the query.  Non clustered index will store data of the non-clustered key columns. Non-clustered index leaf node contains the clustered key if clustered index exists on a table. So for fetching data about the two columns id,datejoined using non clustered index.

 DROP INDEX [IXUNC_emp_name] ON [dbo].[emp]
GO
CREATE NONCLUSTERED INDEX [IXUNC_emp_name] ON [dbo].[emp]
(
       [name] ASC
)
INCLUDE
(
id,
datejoined

)


  Execution plan for the above query

















Execution statistics for the above query

SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parses and compiles time:
   CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)
Table 'emp'. Scan count 1, logical reads 3, and physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, and lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.




Using the above covering index we can improve the query performance.













No comments: