Index Basics:
Clustered Index: clustered index will arrange the pages of
table using clustered key. It will arrange the entire physical table either in ascending
or descending order in the disk.
If there is no
clustered index on a table then it is called as heap. By default when a row is
inserted the row identifier will allocate 8 KB Space. For small table index and
table scan will perform the same.
For larger table, the
index seek will perform better than the table scan.
Clustered index leaf
nodes contain data pages of those tables. It uses binary tree mechanism.
Only one clustered
index can be created on a table. Clustered index can be created using 16
columns (maximum).
Non Clustered indexes can also be used for improving query performance. Non- clustered index will store the
information for only the columns used in indexes. The leaf node of non-clustered
index stores the row identifier if there is no clustered index on the table
else it will stores the clustered key. Row identifier stores the information
about data pages. Page contains 8 KB. The sum of 8 pages can be called as
extent.
As per books online
you can create 999 non- clustered indices per table.
Indexes will help us
to improve query performance. It will be mainly used for selecting the
data. I will show here how indexes will
improve the query performance. The down side of indexes is it will allocate
extra space in database. If Inserts, Updates and delete are more it will create
blocking, fragmentation, page splits and deadlocks.
Covering index is a
non-clustered that satisfies the selectivity of the query.
Filtered index is a
non-clustered index with where clause.
IF we create a
clustered index on a view with schembinding then that index is called as
indexed view.
I have created the
table EMP table with the following details.
create table emp (id int identity ,name varchar(200),Datejoined date default getdate()) .
Let us populate some
test data.
insert
into emp (name)
select
(name)
from sys.all_parameters
union
all
select
name from sys.all_objects
union
all
select
name from sys.all_columns
union
all
select
name from sys.indexes where name is not null
union
all
select
name from sys.databases
GO
Suppose we have a
search condition to fetch emp table details based on id column as shown in
below query.
select
id,name,datejoined from
emp where id =
1
In order verify the
above query performance we can use the three set statistics statements.
set
statistics io on|off
set statistics time on
set statistics xml on
The above statistics
statement is used for identifying IO
requests like scan count, logical reads etc.
Scan count refers number of times the table scanned or index scanned.
Logical reads is used for fetching the data pages from memory
set
statistics io on
set
statistics time
on
set
statistics xml on
select
id,name,datejoined from
emp where id =
1
set
statistics time
off
set
statistics xml off
set
statistics io off
See
the execution plan below
There
is no clustered index on the table. So it is showing the table scan.
Execution
statistics
SQL Server
Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
SQL Server
parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row(s)
affected)
Table 'emp'.
Scan count 1, logical reads 4365, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s)
affected)
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 178 ms.
The above statistics shows
that query optimizer scanned entire table. Logical reads are high.
Now we will create the
clsutered index on emp table.
create unique clustered index IXUC_emp_ID ON
Emp (id)
Now
verify the execution plan below
Now query optimizer is
using clustered index seek.
Verify the statistics
time and IO
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server
parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s)
affected)
Table 'emp'.
Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s)
affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Now the query executed
is 0 ms. Only few logical reads required
to fetch the data.
There is no table
scan. Scan count is zero. So we have optimized the query using the clustered
index.
But there is
limitation to clustered index. We can create only one clustered index per
table.
Suppose we have a
query for fetching the employee details using the employee name.
set
statistics io on
set
statistics time
on
set
statistics xml on
select
id,name,datejoined from
emp where name =
'emp'
set
statistics io off
set
statistics time
off
set statistics xml off
Execution statistics
for the above query
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server
parse and compile time:
CPU time = 0 ms, elapsed time = 15 ms.
(2 row(s)
affected)
Table 'emp'. Scan count 1, logical reads 140, physical reads
0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
(1 row(s)
affected)
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 39 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
The above statistics shows that query optimizer used the
clustered index to scan enitre. Because in where clause we have used name
column but we have created the clustered index on ID column.
Scan count 1 means query optimizer scanned Clustered index
one time.
Logical reads 140 means query optimizer used 140 data pages
to fetch the data.
Execution plan for the above query
i.
Clustered
index scan is almost equal to table scan.
ii.
Optimizer
will not choose the clustered index if the clustered key not exists in where
clause of the query.
iii.
In order
improve query performance we can use the non-clustered index because we can
create only one clustered index per table.
Create the non-clustered
index using the below query
create nonclustered index IXUNC_emp_name ON
Emp (name)
Now execute the same query and verify plan and
the statistics
Now query for cost for
index seeks is 28% and key lookup is 72 %. Even though we have created the
non-clustered index on name column, we are retrieving two more columns id,
datejoined in select clause of the query.
Non clustered index will store data of the non-clustered key columns.
Non-clustered index leaf node contains the clustered key if clustered index
exists on a table. So for fetching data about the two columns id,datejoined
using non clustered index.
DROP INDEX
[IXUNC_emp_name] ON
[dbo].[emp]
GO
CREATE NONCLUSTERED INDEX [IXUNC_emp_name]
ON [dbo].[emp]
(
[name] ASC
)
INCLUDE
(
id,
datejoined
)
Execution plan for the above query
Execution statistics
for the above query
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parses and compiles
time:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row(s) affected)
Table 'emp'. Scan count 1,
logical reads 3, and physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, and lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Using the above
covering index we can improve the query performance.
No comments:
Post a Comment