Downright's WebSledge Testing Services
UNNAMED CLIENT Stress Test Report
Introduction
Configuration Tested
Testing Protocol
Testing Results
Observations
Recommendations/Summary

1.0 Introduction
This document captures the observations, results, and
recommendations of the UNNAMED CLIENT Stress Testing engagement
conducted May 10, 1999 through May 13, 1999. This testing engagement
focused on the performance of UNNAMED CLIENT's on-line bill paying
application. During a planning meeting held on May 12th, CLIENT
REPRESENTATIVE, UNNAMED CLIENT project manager, described the goals of
the engagement as:
- Identify any bottlenecks in the online bill paying application.
- Identify any configuration changes required to optimally support
the maximum number of users on the current HW/SW configuration.
- Identify the optimal number of Haht processes to support a given
number of concurrent users.
- Identify the HW/SW growth requirements as the number of
concurrent users grows.
UNNAMED CLIENT identified the following performance metrics as
defining "acceptable" performance:
- Average per-page response time of less than 15 seconds
(regardless of statement size).
- Maximum per-page response time of less than 45 seconds
(regardless of statement size).
- Average "internal" response time of less than 2
seconds. ("Internal" response time consists of the time the
Backend HTML Generator needs to generate a statement, plus any
Oracle processing required for a page.)
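The three acceptance criteria can be expressed as a single pass/fail check over a test run. The sketch below is a hypothetical illustration: the function name and the input format (plain lists of per-page times in seconds) are assumptions, since the report does not define how WebLoad or HAHTsite data would be exported.

```python
from statistics import mean

# Thresholds from UNNAMED CLIENT's acceptance criteria (seconds).
AVG_PAGE_LIMIT_S = 15.0
MAX_PAGE_LIMIT_S = 45.0
AVG_INTERNAL_LIMIT_S = 2.0


def meets_acceptance(page_times_s, internal_times_s):
    """Return True if a run satisfies all three "acceptable" criteria.

    page_times_s: per-page response times seen by the load generator.
    internal_times_s: per-page "internal" times (HTML generation plus Oracle).
    Both inputs are hypothetical; the report does not specify a data format.
    """
    avg_ok = mean(page_times_s) < AVG_PAGE_LIMIT_S
    max_ok = max(page_times_s) < MAX_PAGE_LIMIT_S
    internal_ok = mean(internal_times_s) < AVG_INTERNAL_LIMIT_S
    return avg_ok and max_ok and internal_ok


if __name__ == "__main__":
    # Hypothetical sample values, not measurements from this engagement.
    print(meets_acceptance([4.2, 9.8, 12.5], [0.4, 1.1, 0.9]))
    # Average is 10.8 s (passes) but one page took 46 s (fails the maximum).
    print(meets_acceptance([2.0, 2.0, 2.0, 2.0, 46.0], [0.4, 1.1, 0.9]))
```

Because the criteria apply "regardless of statement size," the check would be run separately for each agenda (CLIENT1 and CLIENT2) rather than on pooled data.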
The testing and data collection processes went quite smoothly due
to the planning and support of the UNNAMED CLIENT staff. The effort
was supported by a number of UNNAMED CLIENT personnel, notably Russell
Niesz and Gary Greg. Russ was very helpful in collecting performance
data from the Solaris system(s) and Gary made sure that all required
resources (people, hardware, and facilities) were available. This support
enabled the on-site HAHT representatives to focus on the testing and
work very efficiently. The stability of the application and the
servers under test, in combination with the professionalism and
competence of the UNNAMED CLIENT staff directly contributed to an
efficient and successful testing effort.

2.0 Configuration Tested
Due to scheduling constraints, the system under test (SUT) was
UNNAMED CLIENT's development configuration rather than their
production system. The SUT consisted of the following HW/SW
configuration:
| Hardware | Software |
|---|---|
| Sun Ultra 2 (dual 300 MHz UltraSPARC II processors); 896 Mbytes memory; 2 x 4.2GB SCSI internal disks; external RAID 5 controller with 4 x 4.2GB disks | Solaris V2.6; Oracle V7.3.4; Haht V3.1 Build 104; Backend HTML Generator |
For comparison, the production system consists of a two machine
configuration connected via a 100Mbit backbone:
| Machine | Hardware | Software |
|---|---|---|
| Haht server | Sun Ultra 2 (dual 300 MHz UltraSPARC II processors); 384 Mbytes memory; 2 x 4.2GB SCSI internal disks; external RAID 5 controller with 4 x 4.2GB disks | Solaris V2.6; Haht V3.1 Build 104 |
| Oracle/Backend server | Sun Ultra 450 (dual 300 MHz processors); 500 Mbytes memory; 8 x 4.2GB SCSI internal disks; external RAID 5 controller with 7 x 4.2GB disks | Solaris V2.6; Oracle V7.3.4; Backend HTML Generator |
Examination of these configurations suggests that any performance
measurements made on the development system should, if anything, be
worse than the performance of the production system. The production
system spreads the same workload over two separate machines, and the
only hardware difference on the machine running the Haht software is
memory size (384 Mbytes in production versus 896 Mbytes in development).
During testing, the combined load of Oracle and the Backend HTML
Generator never exceeded 10% of the total CPU demand on the development
system.
The load for the test was generated using 3 NT V4.0 (SP 3) PCs
running WebLoad Version 3.01.321.

3.0 Testing Protocol
Stress/Load testing was conducted by using WebLoad to simulate a
number of concurrent users performing a set of tasks. Based on input
from UNNAMED CLIENT developers, a set of scripts (called agendas in
WebLoad terminology) was developed, each representing a single user
performing a set of bill-paying operations. To test the impact of
statement size, agendas that referenced the CLIENT1 site and the
CLIENT2 site were developed. (The CLIENT2 site generated significantly
larger statements.) The table below outlines the specific tasks
incorporated in each agenda.
| Agenda | Tasks |
|---|---|
| Agenda 1 | Login to CLIENT1; Export Statement; Review Outstanding Balance; Submit Payment; Review Payment History; Logout |
| Agenda 4 | Login to CLIENT2; Review Current Statement; Submit Payment; Logout |
For load testing, the agendas were configured to simulate users by
incorporating random "think times" between pages (but not between
frames). These "think times" represent the time a user spends
reading and responding to a page before submitting data or clicking
through to another page. For these agendas, the random think times were
uniformly distributed between 5 and 30 seconds.
For stress testing, the per-page think times were set to 0. This type of
test agenda will generate the maximum amount of traffic and the most stress on
a system.
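The difference between the load and stress agendas comes down to how the pause before each page request is chosen. The sketch below is a hypothetical illustration of that rule, not WebLoad's configuration API: load mode draws a uniform random think time from 5 to 30 seconds, and stress mode uses none.

```python
import random


def think_time_s(mode, rng=random):
    """Return the pause, in seconds, inserted before the next page request.

    "load": uniform random think time between 5 and 30 seconds.
    "stress": no think time, so pages are requested back to back.
    Think time applies between pages only, not between frames of a page.
    This function is a hypothetical illustration, not part of WebLoad.
    """
    if mode == "load":
        return rng.uniform(5.0, 30.0)
    if mode == "stress":
        return 0.0
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same pauses
    print([round(think_time_s("load", rng), 1) for _ in range(5)])
    print(think_time_s("stress"))
```

Since the uniform distribution on 5 to 30 seconds has a mean of 17.5 seconds, each page in a load-mode agenda adds about 17.5 seconds to the round time on average, which is why load-mode rounds are far longer than stress-mode rounds.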
Stress/Load tests were conducted in two different modes:
- WebLoad was used to simulate a given number of users, each performing
the tasks in a specific agenda. When an agenda was completed (a user
completed their interaction with the application), WebLoad repeated the
agenda until the test was concluded. This mode results in an even load on
the SUT, giving performance information about the application at a given
number of users.
- WebLoad was used to gradually increase the number of users performing an
agenda. As a simulated user completed an agenda, a new copy was started.
For the UNNAMED CLIENT tests, we started with 15 simulated users and added
an additional 15 users every 2 minutes. This mode results in an ever-increasing
load on the SUT. The testing cycle stops when a given maximum
number of simulated users is reached, or when the measured average and/or
maximum response times exceed defined limits. This mode reveals the
maximum supportable user level for a given set of response time criteria. This
mode is commonly referred to as "cruise control" mode.
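The "cruise control" ramp described above (start at 15 users, add 15 every 2 minutes, stop at a maximum or when response times exceed a limit) can be sketched as a schedule plus a stopping rule. This is a simplified, hypothetical model: `measure_avg_s` stands in for the averages WebLoad would actually report at each level, and the function names are illustrative only.

```python
from typing import Callable, Iterator, Tuple


def cruise_control_schedule(max_users: int, start: int = 15, step: int = 15,
                            interval_min: float = 2.0) -> Iterator[Tuple[float, int]]:
    """Yield (elapsed_minutes, simulated_users) for a stepped ramp-up."""
    users, minute = start, 0.0
    while users <= max_users:
        yield minute, users
        users += step
        minute += interval_min


def max_supported_users(max_users: int,
                        measure_avg_s: Callable[[int], float],
                        avg_limit_s: float = 15.0) -> int:
    """Walk the ramp and return the last user level whose measured average
    response time stayed under the limit (0 if the first level already fails)."""
    supported = 0
    for _minute, users in cruise_control_schedule(max_users):
        if measure_avg_s(users) >= avg_limit_s:
            break  # limit exceeded: the run stops here
        supported = users
    return supported


if __name__ == "__main__":
    print(list(cruise_control_schedule(60)))
    # Hypothetical response model: 0.2 s of average response time per user.
    print(max_supported_users(300, lambda users: 0.2 * users))
```

With the hypothetical model the 75-user level reaches 15 seconds, so the sketch reports 60 users as the last acceptable level; real runs would substitute measured averages (and could apply the 45 second maximum in the same way).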
The time to complete a single agenda by a single user is called "round
time." Because of the random per-page think times, round times vary.
Round times will also vary based on the number of concurrent users and the
tasks performed by the users.
During test runs, WebLoad collects the following data on the performance of
the SUT every 20 seconds:
- Simulated Load (number of simulated users)
- Min, max, average and current Round Time
- Number of successful rounds
- Number of failed rounds (the agenda generated some sort of error)
- Min, max, average and current Response time for each frame/page
(Note: WebLoad collects a wide variety of statistics. The statistics
listed above are the ones most relevant to the UNNAMED CLIENT testing
scenarios and goals.)
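The min, max, average and current figures that WebLoad samples every 20 seconds can be thought of as running statistics over the response times seen so far. The class below is a hypothetical sketch of that bookkeeping, not WebLoad's implementation; in particular, treating "current" as the average of the most recent 20-second interval is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResponseStats:
    """Running response-time statistics, updated once per sampling interval.

    'current' is assumed to be the average of the most recent interval;
    minimum, maximum, and session average cover every response recorded
    since the run started. Hypothetical illustration only.
    """
    samples: List[float] = field(default_factory=list)
    current: float = 0.0

    def add_interval(self, interval_times_s: List[float]) -> None:
        if not interval_times_s:
            return  # no pages completed during this interval
        self.samples.extend(interval_times_s)
        self.current = sum(interval_times_s) / len(interval_times_s)

    @property
    def minimum(self) -> float:
        return min(self.samples)

    @property
    def maximum(self) -> float:
        return max(self.samples)

    @property
    def session_average(self) -> float:
        return sum(self.samples) / len(self.samples)


if __name__ == "__main__":
    stats = ResponseStats()
    stats.add_interval([2.0, 4.0])  # first 20-second sample (hypothetical values)
    stats.add_interval([10.0])      # second 20-second sample
    print(stats.current, stats.maximum, round(stats.session_average, 2))
```

The distinction matters when reading the cruise-control charts: the session average lags behind a rising load, while the current figure reflects the load level at that moment.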
A variety of other measurements were collected during each run:
- To measure time spent processing HAHTsite dynamic pages, HAHTsite
collects session and page statistics on page wait times, run times, and
CPU times. These statistics can be used to determine the "internal
response time" metric, which can be estimated as the difference between
a Haht page's run time and its CPU time.
- Memory and CPU utilization rates on the SUT were measured using vmstat
on the Solaris system.
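The run-time-minus-CPU-time estimate of "internal" response time is simple arithmetic per page. The sketch below assumes page statistics are available as (run time, CPU time) pairs in seconds; that layout is hypothetical. One caveat worth keeping in mind: when the CPU is saturated, the difference also includes time the process spent waiting for a processor, so it then overstates the time actually spent in Oracle and the Backend HTML Generator.

```python
from statistics import mean


def internal_response_times(page_stats):
    """Estimate per-page "internal" time as run time minus CPU time.

    page_stats: iterable of (run_time_s, cpu_time_s) pairs; this layout is a
    hypothetical stand-in for HAHTsite's page statistics. Negative results
    (measurement noise) are clamped to zero.
    """
    return [max(run - cpu, 0.0) for run, cpu in page_stats]


if __name__ == "__main__":
    stats = [(1.5, 0.5), (2.0, 1.5), (0.75, 0.25)]  # hypothetical values
    waits = internal_response_times(stats)
    print(waits, mean(waits) < 2.0)  # [1.0, 0.5, 0.5] True
```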
Finally, at several points in the testing process, WebLoad and/or Oracle
parameters were changed to overcome problems encountered while stress testing
or to increase the overall throughput of the system.

4.0 Testing Results
The first set of tests was used to generate a set of baseline performance
metrics for each agenda. During these tests, the agendas were run in load test
mode (using per-page think times) at a load level of 50 simulated users. These
test runs gave a baseline set of performance metrics that were used to identify
key agendas for further testing.
The application was then stress tested using the agenda(s) that generated
the most load. This series of tests revealed the need to modify Oracle
parameters and increase the number of configured HAHTsite processes.
Finally, the maximum number of concurrent users was identified using the
most "stressful" agenda run in "cruise control" mode. The
most "stressful" agenda was identified based on the size of the
returned statement, the CPU and memory loads revealed by vmstat, and the
WebLoad and HAHTsite response time statistics.
4.1 Average Statement Size
As can be seen from the table below, the CLIENT2 agenda generated
significantly larger statement sizes than the CLIENT1 agenda(s). For
this reason, test runs used for calculating the maximum number of
supportable users were based on the CLIENT2 agenda.
| Agenda | Statement Size |
|---|---|
| CLIENT1 | 12,900 bytes |
| CLIENT2 | 61,775 bytes |
4.2 Rounds per Minute
The number of rounds completed per minute is a useful measure in
determining the overall throughput and capacity of the application.
Rounds-per-minute measurements are shown below for the CLIENT1 agenda
for both the 5-process and 10-process configurations of HAHTsite.
CLIENT1 Agenda (rounds per minute)
| # of Users | 5 Processes | 10 Processes |
|---|---|---|
| 50 | 10.5 rpm | - |
| 100 | 18.7 rpm | 19.1 rpm |
| 150 | - | 27.9 rpm |
While the rounds-per-minute measure shows only about a 2% increase in
throughput at 100 users (18.7 rpm versus 19.1 rpm), another critical
measure was impacted by increasing the number of HAHTsite processes.
A significant component of the round time can be the time dynamic HAHT
pages wait for a free process. This measurement, Page Request Time on
Queue, averaged 1.5 seconds while supporting 100 users with 5 processes
and dropped to 0.6 seconds with 10 processes. The maximum measured Page
Request Time on Queue was 18.8 seconds with 5 processes and 12.3
seconds with 10 processes.
The rounds-per-minute statistics for the CLIENT2 agenda reveal that
the application breaks down somewhere between 100 and 150 users:
throughput falls from 20.5 rpm to 13.3 rpm as the load increases.
CLIENT2 Agenda (rounds per minute)
| # of Users | 10 Processes |
|---|---|
| 100 | 20.5 rpm |
| 150 | 13.3 rpm |
4.3 Page Response Times
One of the most CPU-intensive pages in the on-line bill paying
application is the login page. This page makes a number of calls to
Oracle to assemble information about the user and their current
status.
The other CPU-intensive page is the Request Statement Detail page.
The charts on the next two pages show the Current Average, Session
Average, and Max response times for the Login page and the Request
Statement Detail page when tested in "cruise control" mode.
The "Max" measurement reports the recorded maximum
response time across the test run. The "Session Average"
measurement represents the average response time across the test run.
The "Current Average" measurement represents the average
response time at the identified user load.

The Login page chart reveals that the average and maximum response time
constraints are exceeded when the SUT is supporting between 50 and 70
users.
Comparing this chart to the Request Statement Detail page response
times reveals that the login process takes longer than the statement
generation process. The Request Statement Detail pages are
consistently returned in under 15 seconds until a load of 180 users is
reached, and this operation never exceeds the 45 second maximum
response time threshold.
Inspection of the vmstat logs show the CPU idle time dropping to 0%
at about 60 concurrent users (with a queue of 3 to 7 processes waiting
to execute.) Between 60 and 75 users, the CPU idle time
"bounces" between 0% and 50%. After exceeding 75 concurrent
users, the CPU idle time remains consistently at 0%, with the queue of
runnable processes growing to 8 to 12 processes.
Further inspection of the vmstat logs reveals that even at a user
level of 220 users, no swapping occurred and there remained at least
81Mbytes of available memory. The principal constraint on the
performance of this application is therefore available CPU cycles.
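The vmstat conclusions above rest on three columns: the run queue, free memory, and CPU idle time. The sketch below extracts those columns from Solaris-style vmstat output and reports the worst case. It assumes the classic layout whose first columns are `r b w swap free` (free memory in Kbytes) and whose last column is `id`; the number of disk columns varies between systems, so idle time is read from the end of each line. Note that the first data row vmstat prints summarizes activity since boot and would normally be discarded before analysis. The sample lines are hypothetical, not data from this engagement.

```python
def parse_vmstat(lines):
    """Return (run_queue, free_kb, idle_pct) tuples from Solaris-style vmstat output.

    Assumes the first columns are "r b w swap free" and the last is "id".
    """
    samples = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and header rows ("kthr memory ..." / "r b w ...").
        if not fields or not fields[0].isdigit():
            continue
        samples.append((int(fields[0]), int(fields[4]), int(fields[-1])))
    return samples


def summarize(samples):
    """Worst-case CPU and memory figures across all samples."""
    return {
        "max_run_queue": max(r for r, _, _ in samples),
        "min_idle_pct": min(idle for _, _, idle in samples),
        "min_free_mb": min(free for _, free, _ in samples) / 1024,  # free is in KB
    }


if __name__ == "__main__":
    # Hypothetical vmstat excerpt, not data captured during this engagement.
    log = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 5 0 0 812340 90112  0   4  0  0  0  0  0  1  0  0  0  612 4021  388 78 22  0
 9 0 0 809876 84992  0   2  0  0  0  0  0  0  0  0  0  640 4310  401 80 20  0
"""
    print(summarize(parse_vmstat(log.splitlines())))
```

A summary showing 0% idle with a persistent run queue, while free memory stays well above zero, is the CPU-bound signature described in this section.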


5.0 Observations
- The stress testing revealed the need to increase the number of
Oracle cursors as the number of supported concurrent users was
increased.
- Overall response of the application was improved by increasing
the number of HAHTsite processes from 5 to 10. At user loads that
resulted in 15 second average response times, the CPU utilization
was measured at 100% with a run queue of 4 to 8 processes. This
measurement implies that further increasing the number of HAHTsite
processes will have little impact on increasing the overall
throughput of the application (unless hardware capacity is
increased.)
- Based on vmstat and top reports on the SUT while under load,
Oracle and the Backend HTML Generator used only about 5% to 8% of
the CPU capacity of the SUT.
- Based on vmstat and top reports, the http server on the SUT
consumed approximately 10% of the CPU during the tests using the
CLIENT2 agenda. (The CLIENT2 agenda produced statements that were
62Kbytes in size, resulting in a significantly heavier load during
SSL processing.)
- The principal constraint (bottleneck) in the application is
available CPU cycles. The HAHTsite and vmstat statistics reveal
that the application spent very little time waiting on Oracle or the
Backend HTML Generator, and the server supporting HAHTsite, Oracle,
and the Backend HTML Generator did not swap or run out of memory.
- The agendas used in this testing emulated "graceful"
users; i.e., they ended every interaction with the application by
logging out. Many (perhaps most) "real" users will not
bother to log out. Instead, they will pay their bill and leave the
site. To the application, each such user will appear to be a
quiescent but still active user. As currently configured, HAHTsite
will "time-out" their state after 15 minutes of inactivity.
If this inactivity is factored in, 600 concurrent users will have
the same CPU load as 81 "graceful" users.
- At a load level of 75 users, the system was able to complete 24
rounds of the CLIENT2 agenda per minute. This represents a system
capacity to support 1440 bill paying operations per hour, or 8640
per 6-hour day.
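Two of the observations above are arithmetic that can be made explicit. The throughput figure follows directly from 24 rounds per minute, assuming one bill payment per CLIENT2 round. The report does not say how the 600-to-81 equivalence for non-graceful users was derived; the second function is one simple, hypothetical model under which it can be reproduced, assuming idle sessions cost essentially no CPU and each real user is active for a fixed number of minutes before their session idles out after 15 minutes.

```python
def bill_payments(rounds_per_minute, hours):
    """Bill-paying operations completed, assuming one payment per agenda round."""
    return rounds_per_minute * 60 * hours


def equivalent_graceful_users(sessions, active_min, idle_timeout_min=15.0):
    """CPU-equivalent number of continuously active ("graceful") users.

    Hypothetical model: each real user works for `active_min` minutes, leaves
    without logging out, and the idle session lingers until the timeout.
    Only the active share of each session's lifetime contributes CPU load.
    """
    return sessions * active_min / (active_min + idle_timeout_min)


def implied_active_min(sessions, graceful_users, idle_timeout_min=15.0):
    """Invert the model: active minutes per visit implied by an equivalence."""
    return idle_timeout_min * graceful_users / (sessions - graceful_users)


if __name__ == "__main__":
    print(bill_payments(24, 1), bill_payments(24, 6))  # 1440 8640
    t = implied_active_min(600, 81)
    print(round(t, 2), round(equivalent_graceful_users(600, t)))  # 2.34 81
```

Under this model, the 600-to-81 figure corresponds to roughly 2.3 minutes of activity per visit; a different real-world visit length would change the equivalence proportionally.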

6.0 Recommendations/Summary
- The current development configuration, and by implication the
production system, will comfortably support 60 to 75 concurrent
users with response times that fall within UNNAMED CLIENT's
defined goals. Since the main constraint was CPU cycles, and
HAHTsite performance typically scales linearly with additional
hardware, a quad CPU configuration should comfortably support 120 to
150 users.
- Additional growth to 200 or more concurrent users will require
moving to a distributed implementation of HAHTsite. Again, since
HAHTsite scales linearly, adding another HAHTsite server
(configured similarly to the HAHT production server) will permit
the configuration to support a total of 240 to 300 users.
- Supporting 60 to 75 concurrent users on the production system
will probably require increasing the memory configuration on the
system supporting HAHTsite.
- Overall application performance and number of supportable
concurrent users can be positively impacted by investigating and
improving the performance of the login process. If the overhead of
this process can be reduced, the current configuration(s) should
be able to support more than 75 concurrent users with quite
respectable response times.
- These recommendations and observations are based on the
assumption that the similarity in configurations between the
development platform and the production platform allows a simple
comparison. This assumption can be easily validated by running a
test using the CLIENT2 agenda in "cruise control" mode
against the production system. This validation was not possible
during the on-site testing due to UNNAMED CLIENT's concerns
about the test having a negative impact on the performance of the
production system.
- These tests reflect the performance of the application without
the impact of Internet (and/or modem) induced delays. They
represent a "best case" performance benchmark and
reflect a test of the parameters within the control of UNNAMED
CLIENT. (UNNAMED CLIENT cannot control the latency introduced by a
very active Internet.) WebLoad can be configured to generate the
simulated users from load generators located "outside"
of UNNAMED CLIENT. If UNNAMED CLIENT wants to attempt to quantify
the impact of Internet latency, this same series of tests should
be run using load generators located in the Internet
"cloud".
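The scaling estimates in the recommendations above follow from assuming that capacity is CPU-bound and grows linearly with processor count. The sketch below makes that assumption explicit; the function name is hypothetical, and real hardware may scale less than linearly, so the projections are only as good as the linearity assumption.

```python
def project_users(measured_range, measured_cpus, target_cpus):
    """Scale a supported-user range in proportion to CPU count.

    Assumes the workload stays CPU-bound and that capacity grows linearly
    with processors, which is the assumption behind the recommendations.
    """
    low, high = measured_range
    factor = target_cpus / measured_cpus
    return low * factor, high * factor


if __name__ == "__main__":
    measured = (60, 75)  # users supported on the dual-CPU development system
    print(project_users(measured, 2, 4))  # (120.0, 150.0)
    print(project_users(measured, 2, 8))  # (240.0, 300.0)
```

The first projection reproduces the 120 to 150 users quoted for a quad CPU configuration. Under this model, the 240 to 300 total corresponds to eight CPUs' worth of HAHTsite capacity (for example, two quad CPU servers); the report does not spell out that arithmetic.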