In this post I am going to discuss about what is covering index and how INCLUDE column enhance the performance of Non Clustered index.
What is covering Index:
Index with INCLUDE clause is the covering index. Covering index is basically used to cover the query(include columns from Select list which are not part of index) and to avoid bookmark lookup. INCLUDE clause applicable for Non Clustered index only , because Clustered index holding all the data in leaf level pages.
An index with nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index.
The INCLUDE clause adds the data at the lowest/leaf level, rather than in the index tree. This makes the index smaller because it's not part of the tree
Lets take a simple example to understand the concept.
We have a table called dbo. Customer (Ref: AdventureWorksDW2012 Database ). Initially we do not have any Clustered or NC index created. It’s a HEAP table.
Pic-1 |
As shown in this Pic-1 the Index Type is HEAP, means no index exists in this table.
Lets create a NC index on CustomerKey column.
-- Create a Non clustered index
CREATE NONCLUSTERED INDEX IDX_NonClusteredIndex ON [dbo].[Customer] (CustomerKey);
GO
CREATE NONCLUSTERED INDEX IDX_NonClusteredIndex ON [dbo].[Customer] (CustomerKey);
GO
Now the NC index have only CustomerKey in the index tree. And the leaf level of the index holding the Non Clustered Key and RID (No Clustered index in this table as of now) to point the actual data pages.
Now let's Select CustomerKey, CustomerName and PINCode and use CustomerKey in the where clause.
If you see the execution plan the optimizer choose NC index seek operation. NC index can give only customerKey, but for rest of the columns that are in select list optimizer need RID Lookup or Key Lookup to get all the required column.
RID Lookup (In HEAP table) if no clustered index exists in the table or Key Lookup if any Clustered index exists in the table.
Here in this case RID Lookup chosen by optimizer, because as of now we do not have any Clustered Index created in this table.
Pic-2 |
Pic-3 |
Pic-4 |
Now I am going to create a Clustered Index on the same table and check the execution plan.
-- Create a clustered index
CREATE CLUSTERED INDEX IDX_ClusteredIndex ON [dbo].[Customer] (CustomerAlternateKey);
GO
Pic-5 |
Pic-5 shows the pages distribution after creating the Clustered Index.
Now again we Select these columns (as shown in Pic-6) and use Non Clustered Index column in the where clause
Pic-6 |
In Pic-6 you can see the optimizer choose Key Lookup to fetch rest of the required result set after creating the Clustered Index on the table instead of RID Lookup in case of HEAP table.
Pic-7 |
Pic-8 |
But in both the cases optimizer need additional page read. In case of only presence of Non Clustered Index, optimizer choose RID Lookup and choose key Lookup if table has a Clustered Index. Also you can see the no. of pages read in each cases.
To remove the extra page read by optimizer, SQL server introduce INCLUDE COLUMN in the Non Clustered index. We can INCLUDE those non key columns here that are frequently used in the Select list but remember these columns are not participating in the where/Join/Group/Order By clause.
Before creating a Non Clustered Index with INCLUDE column, just drop the existing Non Clustered index.
-- Create a Non clustered index with INCLUDE clause
CREATE NONCLUSTERED INDEX IDX_NonClusteredIndex ON [dbo].[Customer] (CustomerKey)
INCLUDE (FirstName,Lastname,EmailAddress)
GO
After creating NC index with INCLUDE column , lets see the pages distribution
Pic-9 |
In Pic-9 you can see the no. of pages at leaf level increased by 240 instead of 78 (without INCLUDE column in Pic-5). No. of pages increased or decreased at leaf level of Non Clustered Index based on the no. of columns used in the INCLUDE column list.
One important things here is INCLUDE Columns are not participate in index ROOT and Intermediate levels. Those columns are only occupy space in leaf level of the index tree.
You would use the INCLUDE to add one or more columns to the leaf level of a non-clustered index, if by doing so, you can "cover" your queries.
Obviously, you cannot include every column in every non-clustered index - but if you do have queries which are missing just one or two columns to be "covered" (and that get used a lot), it can be very helpful to INCLUDE those into a suitable non-clustered index.
Now we try to execute the same query and let's check the execution plan.
Pic-10 |
In Pic-10 you can see, No Lookup operation happened. I Mean to say required requested column in the select list satisfied by Non Clustered index, so no need to read extra pages to fetch those additional details. Also see the pages read (In Pic-12) by optimizer to satisfy the query result set.
Pic-11 |
Pic-12 |
No comments:
Post a Comment