INF: Optimizer Index Selection with Stored Procedures

ID: Q89385


The information in this article applies to:


SUMMARY

SQL Server incorporates an intelligent, cost-based query optimizer which, given a user's query, can quickly determine the best access plan for returning the data. The optimizer is invoked for every query that is sent to SQL Server, and its primary goal is to minimize execution time, which generally has a direct link to minimizing page accesses. The optimizer functions slightly differently when stored procedures are executed, as opposed to when ad-hoc queries are submitted; however, understanding these differences will help ensure consistent performance regardless of how the query is sent to the server.


MORE INFORMATION

When a user creates a stored procedure, a normalized form of the query, called the "query tree," is stored in the sysprocedures table. Upon first execution of this procedure, the query tree is read into the procedure cache and a "query plan" is created for the procedure. The query plan is the optimized path to the data, produced in part by the optimizer. As long as this plan remains in the procedure cache, all subsequent executions of the procedure will use the same plan. This point becomes very important when dealing with stored procedures that have queries whose WHERE clauses depend on parameters passed to the procedure.

For example, suppose a table exists with the following structure:


   CREATE TABLE Member_Info
      (member_no int,
      l_name varchar(20),
      f_name varchar(10),
      group_name char(10) NULL
      ) 

Assume that two indexes have been defined on this table: "C_Idx" is a unique, clustered index on the member_no column, and "NC_Idx1" is a nonunique, nonclustered index on the group_name column. For this example, assume the table has 10,000 rows, with member_no numbered consecutively from 1 to 10,000.

A procedure is created that accepts an input parameter and returns all rows in the table that have a member_no value larger than that parameter:

   CREATE PROCEDURE GetNLargest @var1 int
   AS
   SELECT *
   FROM Member_Info
   WHERE Member_no > @var1 

The plan that is generated for this procedure will be based on the value of @var1 that was used the first time the procedure was executed. For example, if the procedure is executed with a parameter of 9900, the optimizer will choose to access the data through the clustered index (to verify this, execute SET SHOWPLAN ON prior to executing the procedure).

If the procedure is then executed with a parameter of 1, it will use the same plan, and thus access the data through the clustered index. Because all but the first row will need to be returned in this case, it would make more sense from a performance standpoint to start at the first page of the table and scan each row on each page, rather than incur the extra overhead of searching the index tree. To generate a new query plan for the procedure, execute the procedure using the WITH RECOMPILE option. This causes the optimizer to generate a new query plan for the procedure based on the current parameter. In general, any time you suspect that the query plan may not be the optimum one for the given parameter(s), executing the procedure using WITH RECOMPILE will ensure that the plan used is the most efficient one based on the given parameter.

Another factor that can affect the plan generated for a stored procedure is the data type the procedure's parameter is defined as. It is good practice to always define the parameter as the same data type as the column it refers to in the table. This includes user-defined data types: if the member_no column was defined as type "mem_type", and mem_type was defined as "int", the parameter to the procedure should be defined as type "mem_type", not "int". By doing this, the optimizer will know that the values it is comparing are the same type and will not mistakenly instruct SQL Server to convert the parameters when the query is executed.

There is one exception to the above guideline. When a column is defined as type "char(n) NULL", SQL Server internally represents that column as though it were defined as "varchar(n) NULL". Thus, there is a distinct difference between the types "char(n)" and "char(n) NULL". Given this fact, when a table has a column defined as type "char(n) NULL", a stored procedure parameter that will refer to that column should be defined as "varchar(n)" to ensure that the optimizer recognizes the two items as having the same data type.

Additional query words: 4.20 datatype


Keywords          : kbother 
Version           : 4.2
Platform          : OS/2 
Issue type        : 

Last Reviewed: March 12, 1999