INF: Optimizer Index Selection with Stored Procedures
ID: Q89385
|
The information in this article applies to:
-
Microsoft SQL Server for OS/2, version 4.2
-
Microsoft SQL Server version 4.2x
SUMMARY
SQL Server incorporates an intelligent, cost-based query optimizer
which, given a user's query, can quickly determine the best access
plan for returning the data. The optimizer is invoked for every query
that is sent to SQL Server, and its primary goal is to minimize
execution time, which generally has a direct link to minimizing page
accesses. The optimizer functions slightly differently when stored
procedures are executed, as opposed to when ad-hoc queries are
submitted; however, understanding these differences will help ensure
consistent performance regardless of how the query is sent to the
server.
MORE INFORMATION
When a user creates a stored procedure, a normalized form of the
query, called the "query tree," is stored in the sysprocedures table.
Upon first execution of this procedure, the query tree is read into
the procedure cache and a "query plan" is created for the procedure.
The query plan is the optimized path to the data, produced in part by
the optimizer. As long as this plan remains in the procedure cache,
all subsequent executions of the procedure will use the same plan.
This point becomes very important when dealing with stored procedures
that have queries whose WHERE clauses depend on parameters passed to
the procedure.
For example, suppose a table exists with the following structure:
CREATE TABLE Member_Info
(member_no int,
l_name varchar(20),
f_name varchar(10),
group_name char(10) NULL
)
Assume that two indexes have been defined on this table: "C_Idx" is a
unique, clustered index on the member_no column, and "NC_Idx1" is a
nonunique, nonclustered index on the group_name column. For this
example, assume the table has 10,000 rows, with member_no numbered
consecutively from 1 to 10,000.
A procedure is created that accepts an input parameter and returns all
rows in the table that have a member_no value larger than that
parameter:
CREATE PROCEDURE GetNLargest @var1 int
AS
SELECT *
FROM Member_Info
WHERE Member_no > @var1
The plan that is generated for this procedure will be based on the
value of @var1 that was used the first time the procedure was
executed. For example, if the procedure is executed with a parameter
of 9900, the optimizer will choose to access the data through the
clustered index (to verify this, execute SET SHOWPLAN ON prior to
executing the procedure).
If the procedure is then executed with a parameter of 1, it will use
the same plan, and thus access the data through the clustered index.
Because all but the first row will need to be returned in this case,
it would make more sense from a performance standpoint to start at the
first page of the table and scan each row on each page, rather than
incur the extra overhead of searching the index tree. To generate a
new query plan for the procedure, execute the procedure using the WITH
RECOMPILE option. This causes the optimizer to generate a new query
plan for the procedure based on the current parameter. In general, any
time you suspect that the query plan may not be the optimum one for
the given parameter(s), executing the procedure using WITH RECOMPILE
will ensure that the plan used is the most efficient one based on the
given parameter.
Another factor that can affect the plan generated for a stored
procedure is the data type the procedure's parameter is defined as. It
is good practice to always define the parameter as the same data type
as the column it refers to in the table. This includes user-defined
data types: if the member_no column was defined as type "mem_type",
and mem_type was defined as "int", the parameter to the procedure
should be defined as type "mem_type", not "int". By doing this, the
optimizer will know that the values it is comparing are the same type
and will not mistakenly instruct SQL Server to convert the parameters
when the query is executed.
There is one exception to the above guideline. When a column is
defined as type "char(n) NULL", SQL Server internally represents that
column as though it were defined as "varchar(n) NULL". Thus, there is
a distinct difference between the types "char(n)" and "char(n) NULL".
Given this fact, when a table has a column defined as type "char(n)
NULL", a stored procedure parameter that will refer to that column
should be defined as "varchar(n)" to ensure that the optimizer
recognizes the two items as having the same data type.
Additional query words:
4.20 datatype
Keywords : kbother
Version : 4.2
Platform : OS/2
Issue type :
Last Reviewed: March 12, 1999