
Measuring SQL Performance

By Christoffer Hedgate, 2005/12/23 (first published: 2004/04/02)


One thing that often amazes me is that many SQL Server developers do not actually measure the performance of their queries. When you are working on a small site or home project you might not see a big difference, but when implementing systems with large numbers of users and high levels of traffic you cannot simply settle for the fact that your query returns the expected result. You must also make sure that your queries use the least amount of resources and execute as quickly as possible. Sure, you can read articles and literature that describe how to write queries that perform well, but you still cannot be sure that they work in an optimal way for your specific situation. After all, different schema designs, amounts of data, hardware resources etc. all affect how a query performs. And one of the problems with SQL is that you can write the same query (i.e., one that returns the same results) in many different ways, and the performance of these different formulations will often differ as well. When I started investigating why some developers did not compare the performance of their queries, it became clear to me that the main reason is that they do not know how to do this in an easy way. Many of them thought that you needed external tools, more or less complicated, to run against your server, and they did not have the time or inclination to learn and try these. This article describes a couple of much easier methods of measuring the performance of queries.

Time

The simplest way to measure performance is of course to measure the time it takes to execute a query. If you have not already noticed it, take a look at the status bar in the bottom-right corner of Query Analyzer. There you will find a timer that shows how many hours:minutes:seconds it took for a query (or rather the entire script) to execute. This is of course not a very exact measurement. Most queries you want to measure will probably not take more than a second to run; in a high-traffic environment they should probably execute in milliseconds if they are correctly optimized. So you need a better instrument to measure the execution time of a query.

Another, and better, way to measure the time it takes for a query to execute is to use the built-in function GETDATE(). Example 1 shows how you can do this. The example uses the command WAITFOR to make the query execution 'stand still' for as long as we specify with DELAY. By storing the current date and time when the execution begins and comparing it to the value when the execution has finished, we get a more exact measurement, down to milliseconds. Note however that datetime values are only accurate to 1/300 of a second (i.e. 3.33 ms), so if a query is reported as taking 40 ms to execute, it actually took somewhere between 40 and 43 ms.

-- Example 1
DECLARE @start datetime, @stop datetime
SET @start = GETDATE()
 
WAITFOR DELAY '00:00:00.080' -- do not do anything for 80 ms
 
SET @stop = GETDATE()
 
SELECT 'The execution took ' + CONVERT(varchar(10), DATEDIFF(ms, @start, @stop)) + ' ms to finish'
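
The same pattern can of course be wrapped around a real query instead of the WAITFOR. As a minimal sketch, assuming the Northwind sample database that is also used in Example 2 below:

-- Sketch: timing an actual query with GETDATE()
USE Northwind
GO
DECLARE @start datetime, @stop datetime
SET @start = GETDATE()
 
SELECT * FROM orders -- the query we want to measure
 
SET @stop = GETDATE()
 
SELECT 'The execution took ' + CONVERT(varchar(10), DATEDIFF(ms, @start, @stop)) + ' ms to finish'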

STATISTICS TIME

The best way to measure time, however, is to use the configuration setting SET STATISTICS TIME. The syntax is as follows:

SET STATISTICS TIME {ON | OFF}

When this setting is ON, the results pane of Query Analyzer will show statistics for the time it took to execute a query. Note that if you are running QA in grid mode you will need to switch to the Messages tab to see them. Example 2 demonstrates this:

-- Example 2
USE Northwind
GO
SET STATISTICS TIME ON 
SELECT * FROM orders

In my results pane I get the following text:

SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

(830 row(s) affected)

SQL Server Execution Times:
   CPU time = 30 ms, elapsed time = 500 ms.

At first glance this might seem complicated to understand, but more or less the only thing you need to do is look for the row with SQL Server Execution Times that is printed right after the text specifying the number of affected rows. Above this you can see the time it took to parse and compile the query, but that is not the time we are interested in here. Most of the time this will be 0 ms if you run the same query several times in a row, since the execution plan will already be cached. As mentioned earlier, what we are looking for is the time it took to execute the query. In the example above it needed 30 ms of CPU time, but the total amount of time needed was 500 ms (try replacing the WAITFOR statement in Example 1 with the SELECT statement in Example 2 and see if GETDATE() gives you the same measurement). But if the CPU time was only 30 ms, then where did the remaining 470 ms go? The answer is I/O.
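
If you want to see a parse and compile time other than 0 ms again after the plan has been cached, the plan cache can be cleared between runs. This is only a sketch of the idea; DBCC FREEPROCCACHE throws away all cached plans on the server, so it belongs on a test machine, never in production:

-- Sketch: forcing recompilation so parse and compile time shows up again
USE Northwind
GO
SET STATISTICS TIME ON
GO
DBCC FREEPROCCACHE -- WARNING: clears ALL cached plans on the server (test machines only)
GO
SELECT * FROM orders -- this execution must be parsed and compiled again
GO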

STATISTICS IO

As you probably know, I/O is short for Input/Output. You could say it means reading/writing resources, and normally it refers to reading/writing from/to disk or memory. Very simply described, SQL Server needs to have the data pages containing the data to return to the client stored in memory (RAM). If they are not already cached there, they must first be read from disk, where they are physically stored, and placed in memory, from where they can then be returned to the client. The data pages will then stay cached in memory for an unspecified time, which depending on several factors can range from no time at all to indefinitely. Therefore a query might need more time to execute the first time you run it, and because of this you should always execute a query a couple of times when measuring its performance.
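
Conversely, if you specifically want to measure the first-execution case with nothing cached, the buffer cache can be emptied between runs. Again, this is only a sketch; DBCC DROPCLEANBUFFERS affects the whole server and should only ever be run on a test machine:

-- Sketch: measuring a query with a cold data cache
USE Northwind
GO
CHECKPOINT -- write dirty pages to disk so they count as 'clean'
GO
DBCC DROPCLEANBUFFERS -- WARNING: empties the buffer cache for the whole server (test machines only)
GO
SELECT * FROM orders -- the data pages must now be read from disk again
GO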

It is not only the time it takes for a query to execute that is interesting when measuring performance. Equally important (and often even more so) is the amount of system resources needed to execute it. Since I/O is normally the slowest part of a query, especially if physical disk access is needed, it is very important to know the amount of I/O resources a query requires. The way to measure this is to use another configuration setting, SET STATISTICS IO. The syntax is similar to that of SET STATISTICS TIME:

SET STATISTICS IO {ON | OFF}

The output, however, is different. Again, look at the text in the results pane of QA. I executed Example 2 a couple of times and the result is shown below:

Table 'Orders'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0.

First we have the table name. Then comes the number of times this table was scanned, or rather accessed, to fetch the results of the query. The next parts tell us how many pages (data and/or index) were read from the cache in memory to fetch the results and how many pages were read from disk, and the final number, read-ahead reads, shows how many pages were placed into the cache for the query. The numbers you should normally look at are logical and physical reads plus scan count, and all of them should of course be as low as possible. It might be better to have 100 logical reads than 10 physical reads, since it is faster to read from memory, but generally speaking both should be as low as possible. If you execute a query a couple of times, physical reads will often be 0, since the data pages will already be cached after the first execution. Use these numbers to compare the resources needed when executing the same query formulated in different ways.
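
Comparing different formulations is exactly what these numbers are for. As a minimal sketch, assuming the standard Customers and Orders tables in Northwind, the following runs the same query written in two ways, once with IN and once as a join, so that the execution times and reads of the two versions can be compared in the Messages tab:

-- Sketch: comparing two formulations of the same query
USE Northwind
GO
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO
-- Formulation 1: subquery with IN
SELECT o.OrderID, o.OrderDate
FROM orders o
WHERE o.CustomerID IN (SELECT c.CustomerID
                       FROM customers c
                       WHERE c.Country = 'USA')
GO
-- Formulation 2: the same result written as a join
SELECT o.OrderID, o.OrderDate
FROM orders o
INNER JOIN customers c ON c.CustomerID = o.CustomerID
WHERE c.Country = 'USA'
GO
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO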

Other tools

With the above-mentioned tools you have a good way of deciding which of several different versions of a query you should use to get the results you want in an optimal way. There are lots of other tools available as well, but I will not discuss them in this article. If you want to experiment with them yourself, I would recommend taking a look at the following:

  • Show execution plan: by pressing Ctrl-K you get an extra tab when executing queries in QA. This tab shows a graphical representation of the execution plan used by SQL Server to execute your query. There is plenty of information in Books Online about how to use the information shown there, as well as articles online.
  • SET STATISTICS PROFILE: This configuration option gives you a text-based variant of the execution plan.
  • SET SHOWPLAN_ALL and SET SHOWPLAN_TEXT: These options both present information regarding the resources and execution plan that would be used to execute the query, without actually executing it (a small sketch follows after this list).
  • Profiler, Sysmon (Performance Monitor) and other external applications: Finally, there are several external applications that can be used to measure and show different events and measurements in SQL Server and the system it runs on. Profiler, a tool in the SQL Server client tools pack, connects to SQL Server and logs all kinds of different events that occur, and Sysmon can of course be used to log a huge number of performance counters, both for SQL Server and the system as a whole.
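
As a small taster of the SHOWPLAN options, here is a minimal sketch of SET SHOWPLAN_TEXT against Northwind. Note that the SET statement must be alone in its own batch, and that the SELECT is not actually executed; only its estimated plan is returned:

-- Sketch: viewing the estimated execution plan without executing the query
USE Northwind
GO
SET SHOWPLAN_TEXT ON
GO
SELECT * FROM orders -- not executed; the estimated plan is returned instead
GO
SET SHOWPLAN_TEXT OFF
GO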