Performance testing is one of the most common topics raised by our clients and prospects. We have encountered situations where it was not clear at the beginning of a test effort what constituted good performance.
Benchmarking web services usually involves simulating many users and sending many messages to reproduce a heavy production load. We are often called upon to simulate production environments using our gateway, so we often become central to discussions around performance. Equally important is understanding which statistics are available to measure, and how they relate to one another.
We have identified the key metrics that various organizations treat as the most important statistics to report. The reasons for choosing one or another vary but follow a short list of themes. Some organizations characterize one metric or another as the key performance indicator. We've encountered several in our engagements and the following list stands out:
Planning for a good user experience and sizing your enterprise solution is a complex undertaking, as there are at least five different input parameters and quite a few ways of looking at the problem. Doing the math is important to understanding the issues.
One of the most common assumptions in sizing is that large concurrency is required to support a large number of simultaneous users interacting with the application. The usual mandate is to support your user base and to plan for a worst case situation, so let's see what real concurrency is needed by a large number of users.
Let's assume 20,000 users. This analysis assumes a somewhat casual user base, because an enterprise that has 20,000 users using a single line of business application demands quite a bit more analysis than a short paper.
Let's further assume the application is web based, but has a core component that is sourced from some services component, i.e. the portal model. Most of the HTTP requests for a given page are for things like images, CSS and other small static files. These are serviced by web servers, not application servers, and so don't figure into this analysis. The calculations I present here are also applicable to fat client GUI-style applications, because the same technology choices around minimizing server round trips to heavyweight services hold true for GUI applications.
We think it is important to digress into application design for a moment. Designing an application to do live queries for small parts of user interface content is not good practice, whether it's client/server, a web app, or a fat client. Waiting for even local network latency to fill in the content of UI elements like drop down lists adds extra wait states during the painting of a display. This makes the UI appear unresponsive, and makes a large client base roll-out impractical for even unsecured applications, just from sheer request volume.
We are making a best practices assumption that services applications are designed to do one or two larger critical path requests as the core of the application service. We'll assume that most pages have a single type of information the user wants to view, but some will be more complex. I've chosen to use an average of 1.25 service requests per page view, reflecting a mix of page types.
Let's talk about what medium service latency means. Static content requests that require no processing should be sub-millisecond in latency, but actual service requests normally fall in the range of 10 milliseconds to 5000 milliseconds on the back end. Later I'll show how the service request latency is a hugely important number in determining required concurrency.
So far we have 20,000 users, with 1.25 service requests per page, and each of those requests taking from 10 to 5000 milliseconds to process.
Next we need to determine how many requests those users will generate.
Given the way that people read and use applications, the bare minimum time it takes to recognize a fully rendered page or UI, find the content you are looking for, then choose a navigation element to initiate another request is likely 3 to 5 seconds. That's the bare minimum. I'm calling this time that users are not generating new requests to back end services the page dwell time.
Dwell time on a page of something like an invoice, a purchase order or a line of business task like a shipping request is going to be longer than 5 seconds.
So given a page dwell time between 5 and 120 seconds, over the course of an hour 20,000 users are going to generate between 0.75 and 18 million requests, or between 208 and 5000 requests per second. This is a reasonable range for the overall requests per second statistic, but it leads us into the discussion of needed concurrency and how latency is by far the critical statistic.
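The request-rate arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of any product: the function name and the list of dwell times are assumptions chosen for the example, and it uses the same simplification as the text, namely that every user loads one page per dwell period.

```python
def requests_per_second(users, requests_per_page, dwell_seconds):
    """Average service request rate, assuming each user loads one page
    (requests_per_page service requests) every dwell_seconds."""
    if dwell_seconds <= 0:
        raise ValueError("dwell_seconds must be positive")
    return users * requests_per_page / dwell_seconds


if __name__ == "__main__":
    users = 20_000            # user base assumed in the text
    requests_per_page = 1.25  # average service requests per page view
    # Illustrative dwell times in seconds (hypothetical sample points).
    for dwell in (5, 30, 60, 120):
        rps = requests_per_second(users, requests_per_page, dwell)
        per_hour_millions = rps * 3600 / 1e6
        print(f"dwell {dwell:>3} s: {rps:8.1f} req/s, "
              f"{per_hour_millions:5.2f} million req/hour")
```

Halving the dwell time doubles the request rate, which is why the shortest realistic dwell time sets the worst case.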
The calculation for the required concurrency is as follows: 20,000 users generating 1.25 service requests per page every 5 seconds would generate, on average, 20K * 1.25 * (60/5) or 300,000 requests per minute, which is 5000 requests per second. We need to retire 5000 requests every second and the service takes 10 milliseconds to retire a single request. In one second there are 100 periods of 10 milliseconds, so in each of these 10 millisecond periods we need to retire 5000/100 or 50 simultaneous requests.
Required Concurrency = Requests per second / (1 / Latency in seconds) = Requests per second * Latency in seconds
In our first example 5000/(1/0.010) = 5000/100 = 50. The point to emphasize here is the effect of latency on concurrency. Keeping the minimum dwell at 5 seconds and varying service latency from 10 to 5000 milliseconds, the concurrency required to service 20K users goes from only 50 all the way to 5000/(1/5) = 25,000. At that point performance of the system is at a crawl, because at 1.25 requests per page it would take an average of 6.25 seconds just to make the data available to render the page.
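As a rough sketch, the required concurrency formula can be evaluated across the latency range to see how quickly the requirement grows. The function name and the sample latencies are assumptions made for illustration. The relationship itself is the one stated in the formula, and it has the same form as Little's Law from queueing theory (items in flight = arrival rate * time in system).

```python
def required_concurrency(requests_per_second, latency_seconds):
    """Simultaneous in-flight requests needed to sustain a request rate,
    using: concurrency = requests per second / (1 / latency in seconds)."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return requests_per_second / (1.0 / latency_seconds)


if __name__ == "__main__":
    rate = 5000  # requests per second at a 5 second dwell time
    # Illustrative latencies in seconds, spanning 10 ms to 5000 ms.
    for latency in (0.010, 0.050, 0.250, 1.0, 5.0):
        needed = required_concurrency(rate, latency)
        print(f"latency {latency * 1000:6.0f} ms -> "
              f"concurrency {needed:8.0f}")
```

Because the relationship is linear, every tenfold increase in service latency demands ten times the concurrency at the same request rate.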
There are a large number of simplifications in this calculation but it does demonstrate that characterizing the load and the user experience has a huge impact on a prediction of required concurrency. How long will your users wait for data before they decide the system is too slow?
The number of requests per user action has a direct relationship to concurrency as well. Less clear is the effect of page dwell time. These worst case numbers reflect a given user, on average, asking for new content every 5 seconds. That's kind of fast for most pages, unless you've built the system with lots of paging through content. Then, when what they need is on page 3, users won't wait 5 seconds to ask for new content. This can create a worst case scenario unintentionally, because user acceptance testing may not accurately reflect how often people generate new requests: UAT environments are often not loaded with enough data to require paging through content.
In the discussion of concurrency I presented a look at application analysis with total application service latency as a huge determining factor in concurrency requirements. There are many contributors to latency, and our gateway product is often a focal point for analysis of latency.
The above sequence diagram describes the processing steps and messages, internal lookup requests and points of latency when servicing a single inbound request at the Gateway.
The Layer7 SecureSpan Manager Dashboard specifically reports the time between steps 1 and 12 as the Front End Response time and the time between 8 and 10 as the Back End response time.
Experience has shown us that those are the two most important items to report when measuring latency.
Of note in this example is that the maximum front end response time, or more accurately the latency experienced by the end user, was only 132 milliseconds even though the back end response time was 100 milliseconds.
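One way to read those two figures is to subtract the back end response time from the front end response time, which gives a rough estimate of the time spent everywhere other than waiting on the back end. The sketch below is only that arithmetic. The function name is hypothetical, and treating the whole difference as overhead on the gateway side is an assumption, since it also includes any external lookups made while processing the request.

```python
def gateway_overhead(front_end_ms, back_end_ms):
    """Return (overhead_ms, overhead_share) where overhead is the part of
    the front end response time not spent waiting on the back end."""
    if front_end_ms <= 0:
        raise ValueError("front_end_ms must be positive")
    if back_end_ms < 0 or back_end_ms > front_end_ms:
        raise ValueError("back_end_ms must be between 0 and front_end_ms")
    overhead_ms = front_end_ms - back_end_ms
    return overhead_ms, overhead_ms / front_end_ms


if __name__ == "__main__":
    # Figures quoted in the example: 132 ms front end, 100 ms back end.
    overhead_ms, share = gateway_overhead(132, 100)
    print(f"overhead: {overhead_ms} ms ({share:.1%} of front end time)")
```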
In almost all scenarios we've encountered in the field, Step 9, the back end processing time, is the bulk of the latency. This is beyond our control, but we do our best to help here with an efficient requester subsystem, controls on concurrency, and connection caching for SSL.
There are some components of overall latency that we end up classifying as "our local processing overhead". One of them, Step 4, LDAP Lookup Time, is minimized somewhat by our authentication cache but can still be a limiting factor. This call to LDAP has close analogs in Single Sign On authorizations and other references to external decision points. This latency is not separately described in our UI, and may in some cases result in the gateway itself being suspected as a source of latency.
Cryptography also has a particular impact. Cryptographic operations can incur latency and/or heavy CPU usage depending on the use of internal HSM, internal software cryptography or external HSM solutions. We have very efficient cryptographic capabilities, but there is a mathematical complexity associated with public key operations that no system can avoid.
With back end latency so dominating normal performance testing, we optimized our systems to minimize delay in back end processing. Our simplest case with small messages has us processing 20k requests per second, with latency in the sub-millisecond range, so in most cases the gateway is not contributing any significant amount to latency.
Some policy elements carry latency of their own and can be avoided in latency sensitive applications. Auditing is the obvious one, as it depends on synchronously waiting for the auditing subsystem to write to hard disk. Layer7 has identified some usage patterns that add undue amounts of overhead to requests, and we can help you with those situations, just ask.
Latency in the whole transaction is one of the most important determining factors in sizing services deployments. Layer7's SecureSpan SOA Gateway product line is not a large contributor to latency but can be used to measure and in some cases help alleviate these issues. We're happy to help you analyze your prospective workload and help plan for sizing your installation.