Achieving Optimal Database Application Performance: Benchmark Tips and Techniques (Part 3 of 3)
In this final part of our three-part excerpt adapted from The Data Access Handbook, authors John Goodson and Rob Steward conclude their discussion of nine guidelines that will help you create meaningful, useful benchmarks in your own organization.
Excerpted from The Data Access Handbook by John Goodson and Rob Steward; published by Prentice-Hall; used by permission. For more information, visit http://dataaccesshandbook.com/
As you'll recall, last week (http://esj.com/articles/2009/09/15/achieving-database-performance-2.aspx) we pointed out that standardized benchmarks are available for measuring the performance of databases (TPC, for example) and application servers (SPECjAppServer, for example), but standard industry benchmarks for measuring data access code and database middleware don’t exist.
To help you develop benchmarks that measure these important factors, we presented nine guidelines:
- Define benchmark goals
- Reproduce the production environment
- Isolate the test environment
- Reproduce the workload
- Measure the right tasks
- Measure over a sufficient duration
- Prepare the database
- Make changes one at a time
- Assess other factors
We explored the first four guidelines in the second part of our series. In this article, we focus on the last five guidelines.
Measure the Right Tasks
Not all tasks that a database application performs are equally important. For example, a mail-order company that accepts orders over the phone may require a quick response time when referencing inventory availability to minimize the wait for the customer on the phone. That same company may not care as much about the response time required for the actual order to be processed. Ask your user liaisons what tasks are most important to them, and make testing of those tasks a priority.
Make sure the benchmark makes the same API calls your database application makes. For example, we often see benchmarks that execute a query and retrieve a result set but do nothing with the data -- something that would never happen in a real-world application. Suppose you are tasked with designing a benchmark that measures the time it takes for a JDBC application to process 50,000 rows. Let's take a look at the following simple benchmark.
Statement stmt = con.createStatement();
// Get start time
ResultSet resultSet = stmt.executeQuery("SELECT acct.bal FROM table");
while (resultSet.next())
{}
// Get finish time
Notice that the statement is opened and executed but is never closed, so resources are not released. Also notice that the application positions a cursor on a row in the result set but subsequently ignores the data in that row. Database drivers differ in when they retrieve data from network buffers and when they convert it. For example, some drivers retrieve all requested data when a query is executed; others leave some data in the network buffer on the database server until the application actually requests it. If you don't realize that this type of optimization occurs, you might not recognize that the results generated by the previous benchmark code are greatly influenced by which driver you use.
Although these lapses in real-world modeling may not seem like a big deal, they can add up to make a big difference in performance. For most applications, 75 percent to 95 percent of the time it takes to process data is spent in the database middleware and on the network. The difference between 75 percent and 95 percent can represent a big disparity in your application’s performance.
Let’s rewrite the benchmark to reflect how the application would work in the real world.
Statement stmt = con.createStatement();
// Get start time
ResultSet resultSet = stmt.executeQuery("SELECT acct.bal FROM table");
while (resultSet.next()) {
    int id = resultSet.getInt(1);
}
resultSet.close();
stmt.close();
// Get finish time
Also, exclude writing output from your benchmark timings. For example, suppose that your benchmark writes data to a console so that you can verify the results of each Select statement, and it includes the following line of code:
System.Console.WriteLine("Value of Column 2: " +
    dataReader.GetInt32(2));
If done once, it may add only a second or two to your benchmark results, but if done repeatedly, that time can add up, skewing your true results. Make sure that the console output occurs outside your timing loop.
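As a rough sketch of how that looks in JDBC -- building on the earlier snippet, so con is assumed to be an open connection and the table and column names are the same illustrative ones used above -- the verification output can be collected during the run and written only after the finish time has been captured:

// Minimal sketch: collect data inside the timed loop, but write console output
// only after the finish time has been recorded.
java.util.List<Integer> balances = new java.util.ArrayList<Integer>();

Statement stmt = con.createStatement();
long start = System.nanoTime();                    // get start time
ResultSet resultSet = stmt.executeQuery("SELECT acct.bal FROM table");
while (resultSet.next()) {
    balances.add(resultSet.getInt(1));             // process the data; no output here
}
resultSet.close();
long finish = System.nanoTime();                   // get finish time
stmt.close();

// Verification output happens outside the timed section
for (Integer balance : balances) {
    System.out.println("Value of column 1: " + balance);
}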
Measure over a Sufficient Duration of Time
Design benchmarks so that they measure tasks over a sufficient duration. Benchmarks that are run over short durations make it difficult to reproduce meaningful and reliable results because:
- They produce results that often do not scale. In most cases, you cannot extrapolate the results from a short duration and apply them to the larger context of your application.
- Computer system clocks, used to time benchmark runs, are notoriously imprecise because of design limitations, temperature changes, and diminished battery voltage over time. In fact, the time kept by computer system clocks can drift from the real time by as much as several minutes a day. If a benchmark is run over a short duration, perhaps 10 seconds or less, the drift caused by a system clock can produce inconsistent results.
- Factors such as Java class loaders and the .NET Just-in-Time (JIT) compiler cause application start-up performance costs that skew performance results over short durations.
For example, suppose you want to measure the throughput of an application that retrieves 1,000-byte rows from a database table containing a million rows. First, the benchmark is run over 5 seconds, resulting in a throughput of 5 rows per second. What if another short-term process running in the background caused a “blip” in the system during that 5 seconds? You could run the same benchmark a second time for 5 seconds, and the outcome may result in a completely different metric -- for example, 10 rows per second, which is a huge variance on this scale.
However, if you run the same benchmark again for 100 seconds, the throughput result is a more useful and reliable metric -- for example, 30,000 rows per second -- because any blips caused by another service running are averaged over a longer period.
Similarly, a system clock used to measure a benchmark can experience blips in its timekeeping that cause the clock to drift suddenly. For example, suppose that you run a benchmark over 5 seconds and a blip occurs causing the system clock to drift by 500ms. That’s a significant difference that you may not even realize occurred. Running the benchmark for a sufficient duration -- 100 seconds, for example -- ensures that any system clock blips are averaged over a longer period.
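One way to structure such a run, sketched here in JDBC under the assumption that con is an open connection and that the table and column names are the same illustrative ones used earlier, is to repeat the workload until a target duration has elapsed and then compute throughput from the total:

// Minimal sketch: run the workload for a fixed target duration and report throughput.
final long targetNanos = 100L * 1000000000L;       // 100-second run
long rowsProcessed = 0;

Statement stmt = con.createStatement();
long start = System.nanoTime();                    // monotonic timer
while (System.nanoTime() - start < targetNanos) {
    ResultSet rs = stmt.executeQuery("SELECT acct.bal FROM table");
    while (rs.next()) {
        int balance = rs.getInt(1);                // actually touch the data
        rowsProcessed++;
    }
    rs.close();
}
long elapsedNanos = System.nanoTime() - start;
stmt.close();

double rowsPerSecond = rowsProcessed / (elapsedNanos / 1000000000.0);
System.out.println("Throughput: " + rowsPerSecond + " rows/second");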
Prepare the Database
Because disk I/O is much slower than memory I/O, any time the database retrieves data from or stores data to the disk on the database server, performance degrades significantly. The first time the application accesses a table row, the database reads the row from disk and places a copy of it into a fixed-length block of memory known as a page. If the database can find the requested data on a page in memory when subsequent data requests are processed, it optimizes its operation by avoiding disk I/O.
When the database fills up a page with data, it creates a new page. The pages in memory are ordered from MRU (Most Recently Used) to LRU (Least Recently Used). If the allocated memory buffer becomes full, the database makes room for a new page by discarding the LRU page. This method of memory management counts on the fact that the LRU page will probably not be needed any time soon.
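To make the MRU/LRU idea concrete, here is a toy Java illustration of a page cache that discards its least recently used entry when it fills up. It is only a sketch of the concept -- the page IDs and capacity are illustrative, and this is not how any particular database implements its buffer manager:

import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of LRU page replacement: pages are kept in access order,
// and the least recently used page is discarded when the buffer is full.
class PageCache extends LinkedHashMap<Long, byte[]> {
    private final int maxPages;

    PageCache(int maxPages) {
        super(16, 0.75f, true);          // accessOrder = true: a read moves a page toward MRU
        this.maxPages = maxPages;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxPages;        // evict the LRU page once capacity is exceeded
    }
}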
When your application retrieves, inserts, or updates data in the real world, typically, the database has been running for some time, allowing your application to access data in memory. Running the benchmark at least once without timing it allows the database to place some, or possibly all, the data you will be working with in memory where it can be accessed on subsequent runs of the benchmark. This also helps model how your application will run in your production environment because applications typically access the same tables over and over.
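A simple way to build this warm-up into a benchmark harness, sketched below with a hypothetical runBenchmark() method standing in for whatever workload you are measuring, is to execute one untimed pass before the timed runs:

// Minimal sketch: one untimed warm-up pass, then timed passes.
// runBenchmark() is a hypothetical method representing the measured workload.
runBenchmark(con);                                 // untimed warm-up: lets the database cache the pages

for (int run = 1; run <= 5; run++) {
    long start = System.nanoTime();
    runBenchmark(con);                             // timed run against warmed-up data
    long elapsedMillis = (System.nanoTime() - start) / 1000000;
    System.out.println("Run " + run + ": " + elapsedMillis + " ms");
}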
Make Changes One at a Time
The most important guideline to remember when running a benchmark is that a seemingly insignificant change can have a dramatic effect on performance. It’s crucial that you can demonstrate whether any change has a positive or negative impact; otherwise, your benchmark efforts are useless. With this in mind, make sure that you only change one variable at a time when you run the benchmark.
For example, suppose you want to explore the effect of setting two different connection options in your database driver. Instead of making both changes at once, make the changes one at a time and rerun the benchmark after each change. If you make both changes at once, how do you know if either change made a difference? Suppose one change has a positive effect on performance and a second change has a negative effect, cancelling any performance gain caused by the first change. How would you know if either change was good or bad?
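For instance, a benchmark that explores the effect of the JDBC fetch size might look like the following sketch, where the fetch size is the only variable that changes between runs (the connection and the query are the same illustrative ones used earlier):

// Minimal sketch: vary exactly one setting per benchmark run.
// Here the only variable is the JDBC fetch size; everything else stays fixed.
int[] fetchSizes = {10, 100, 1000};

for (int fetchSize : fetchSizes) {
    Statement stmt = con.createStatement();
    stmt.setFetchSize(fetchSize);                  // the single change for this run

    long start = System.nanoTime();
    ResultSet rs = stmt.executeQuery("SELECT acct.bal FROM table");
    while (rs.next()) {
        int balance = rs.getInt(1);
    }
    rs.close();
    stmt.close();
    long elapsedMillis = (System.nanoTime() - start) / 1000000;

    System.out.println("fetchSize=" + fetchSize + ": " + elapsedMillis + " ms");
}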
Assess Other Factors
If your application does not perform in the real world as your benchmark predicted, what then? Look at external influences such as corporate traffic slowing down the network or excessive CPU and memory usage on the client or database server.
In addition, be aware that some tasks (such as stored procedures or application-side caching of a large amount of data) can mask the performance of the network, database middleware, or application code. For example, if the execution of a stored procedure takes 50 times longer than retrieving the data, changing the application code or database driver so that it’s 100 percent faster at data retrieval would not show a noticeable difference in the benchmark results because most of the time processing the data requests is spent on the database server. Using tools such as code profilers and network sniffers to create a log that times each operation can tell you where processing time is being spent and help you make educated guesses about where your efforts can improve performance.
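As a first pass before reaching for a profiler or network sniffer, even a simple timing log around each operation can show where the time goes. The following sketch assumes con is an open connection; the stored procedure and query names are illustrative:

// Minimal sketch: log elapsed time per operation to see where time is spent.
long t0 = System.nanoTime();
CallableStatement proc = con.prepareCall("{call process_orders()}");  // illustrative procedure
proc.execute();
long t1 = System.nanoTime();

Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("SELECT acct.bal FROM table");
long t2 = System.nanoTime();

long rows = 0;
while (rs.next()) {
    rows++;
}
long t3 = System.nanoTime();

rs.close();
stmt.close();
proc.close();

System.out.println("Stored procedure: " + (t1 - t0) / 1000000 + " ms");
System.out.println("Query execute: " + (t2 - t1) / 1000000 + " ms");
System.out.println("Fetch " + rows + " rows: " + (t3 - t2) / 1000000 + " ms");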
In summary, when developing any benchmark, consider the following guidelines:
- Define benchmark goals to align with your business needs.
- Measuring CPU and memory use can be a good predictor of scalability, but make sure that you do it appropriately so your results are useful.
- Reproduce the essential characteristics of both your production environment and your production workload.
- Isolate your test environment from outside environmental influences.
- Measure the right tasks over a sufficient duration of time.
- Prepare your database before measuring by running the benchmark at least once, and introduce only one change at a time into the benchmark.
Rob Steward has spent more than 15 years developing database access middleware, including .NET data providers, ODBC drivers, JDBC drivers, and OLE DB data providers. He is currently vice president of research and development for DataDirect Technologies (http://www.datadirect.com). John Goodson is the executive lead at DataDirect Technologies. For more than 15 years, he has worked closely with Sun Microsystems and Microsoft on the development and evolution of database connectivity standards including J2EE, JDBC, .NET, ODBC, and ADO.NET.