Designing Load Performance Test Suites For Ruby On Rails Applications

IS YOUR RUBY ON RAILS APPLICATION READY FOR LAUNCH?

Imagine the following situation: after several weeks of development, your team is nearing launch, the moment when you can finally start selling your product to your customers and, hopefully, get rich. But how can you tell your product is ready to go live?

For some, “an application ready for launch” means an app with a sufficient set of outstanding features wrapped in a beautiful design. It is obviously good to settle that point before you launch. Having done so, you should immediately move on to the next question: “Will my application actually deliver all those outstanding features and create a pleasant experience for the user?”

Hence, it does not matter whether you are the development team leader, the product owner or another stakeholder; sooner or later you will have to deal with questions such as:

  • Will the application meet the non-functional requirements / performance goals?
  • How will the application behave under extreme conditions like heavy traffic? Will it be secure at that point or will some functionalities be compromised under heavy usage?
  • Can we rest assured that the site will not go down under the load generated by visitors hitting the site as a result of a brilliant marketing campaign that is to accompany the launch?
  • How will scheduled tasks, such as the system backup, affect the user experience? Can they lead to inconsistent / unacceptable response times?
  • If the application shuts down unexpectedly, what will happen to partially completed transactions? Subsequently, when the application is restarted (automatically or by the administration crew) will it resume operation at the correct point? Will it pick up unfinished transactions and process them properly?

This is a rather lengthy list, and yet there are still more issues to address in order to make an informed decision, i.e., the decision whether to launch the application or postpone the launch to gain the time needed to address such performance issues. Nor is it true that you cannot fix performance issues before actually launching the application simply because no real production data are available yet. There are measures which can be taken during the development process to ensure that the app will be capable of performing under the projected circumstances and business volumes. What is more, timing is crucial. A few days or weeks before the launch may not be enough to deal with the performance questions and to effectively solve the associated problems. It may well be too late. In a poorly managed application development process that is sometimes the case.

A mature and well thought out development process addresses performance goals, objectives and priorities from the very beginning of the development cycle. Right from the start, the development team should not only discuss the features and create the functional tech specs, but also try to establish how the web application should perform. The output may come in the form of a non-functional requirements specification. The crucial task for the development team is to determine the real intent and specific values behind the client’s expectations concerning performance. In practice, the client’s statements can be vague, for example, “it should be fast and responsive”, or they can be very specific: “under these circumstances, the average response time should not exceed 3 seconds”. Especially when the expectations are subjective, ambiguous or incomplete, the development team should ask questions and clarify the requirements to arrive at verifiable acceptance criteria. To cite a few examples:

  • How do you expect your app to perform when compared to some other applications / web sites you are familiar with?
    • How much better? (“Slightly better would be just perfect,” “Dramatically better”, “About 20% faster.”)
    • How did you arrive at the requirement / number?
  • How do you expect your app to behave under high traffic conditions?
    • Is it acceptable for the application’s performance to degrade dramatically for all users after a specified critical point is reached, or
    • should the application reject excessive connections, display a “temporarily unavailable, please try later” message and handle regular-capacity connections in a timely manner?

The answers to such questions not only help the development team to understand the client’s needs, but they also:

  1. play a significant role in designing the performance / load tests for the application,
  2. enable better resource accounting (performance tuning budget, the cost of hardware / cloud infrastructure),
  3. allow developers to make better decisions when it comes to the system architecture.

Now that we see the importance of performance in software development projects, it is time to clear up some misunderstandings about the terms “performance testing”, “load testing” and “stress testing”. Understanding the different types of performance tests reduces risks, minimizes costs and helps to apply the appropriate test types over the course of the project. Performance test types differ in their objectives and context.

PERFORMANCE, LOAD AND STRESS TESTS ON YOUR RUBY ON RAILS APPLICATION

The purpose of performance testing is to investigate the parameters of the tested system, such as responsiveness, speed, scalability, stability, etc. Performance is often defined in terms of the response times, throughput and hardware utilization levels achieved during the test, in relation to the performance objectives specified for the project.

Performance testing can be defined as a technical investigation conducted to measure or validate the speed and/or stability of a given application. Performance tests, load tests and stress tests are all subsets of performance testing. Each type is meant to serve a different purpose and should be designed accordingly. You can find a brief overview of the different test types below:

Performance test
  • Purpose: Determining (or validating) the speed, scalability and stability of the tested application.
  • Benefits: Real application parameters enable informed decision making (concerning the launch schedule or performance tuning). Helps to establish whether the end user will be satisfied with the application performance. Exposes the gap between the expected and the actual performance of the application.
  • Gotchas: Not suitable for the detection of functional defects that appear only under load. Does not identify the performance parameters of the system properly if designed poorly.

Load test
  • Purpose: Verifying application performance under normal and peak load conditions.
  • Benefits: Determines the throughput required to support the anticipated peak production load, as well as the adequacy of the hardware environment. Evaluates the load balancer implementation. Detects race conditions and functionality errors which occur only under load. Helps to determine application capacity (in terms of concurrent user sessions).
  • Gotchas: Usually not designed with a focus on testing the application’s response times.

Stress test
  • Purpose: Determining application behaviour under extreme conditions.
  • Benefits: Determines whether system data can be corrupted under extreme load conditions. Determines the amount of traffic / load needed to cause failures and errors in addition to slowness. Helps to configure monitoring alerts that can warn of possible incoming failures. Exposes security vulnerabilities caused by stressful conditions (e.g. denial-of-service attacks). Helps to define the types of failures that are most valuable to plan for.
  • Gotchas: It is difficult to quantify the amount of stress to apply while designing this type of test. It is important to isolate the test environment to avoid disruptive network and/or application failures.

To sum up:

  • schedule and execute performance tests if you want to keep or improve product quality,
  • use load tests to check whether your performance requirements and performance goals are met,
  • conduct stress tests to discover bugs caused by extreme loads.

Notes on execution

A solid development process incorporates all three performance test types into the development life cycle as early as possible. And there are good reasons for it; the tests bring specific benefits, including:

  • identifying bottlenecks,
  • establishing a baseline for future testing,
  • supporting performance tuning efforts,
  • ensuring the non-functional requirements are met rather than becoming an afterthought.

The data gathered during performance tests should constitute a basis for decisions concerning performance tuning. For example, if the performance parameters are unacceptable, the development team should shift their focus from developing new features to optimizing the performance of the existing ones. The approach whereby you introduce features first and delay system performance improvements may be more risky and expensive than dealing with performance problems at the time they arise. As the complexity of the system grows, it becomes ever harder to track the dependencies between the different parts of the system, which may seriously affect performance and make it difficult to bring it back under control. Regular performance checks therefore ensure better isolation and identification of the poorly performing parts of the system. If you ask what “regular” means, the answer depends on the software development methodology used. For instance, if a development team uses Agile and delivers a batch of functionality every week or two, performance testing should be executed at least as often. If the team follows a Waterfall approach, performance testing will most likely be conducted less frequently.

DESIGNING A PERFORMANCE TEST SUITE FOR YOUR RUBY ON RAILS APPLICATION

Knowing the types of performance tests as well as understanding their rationale, we can now move on to the challenge of designing them. The topic of designing a performance test suite for a web application is too broad to be covered comprehensively in a blog article. I have therefore limited myself to compiling a list of tips, as well as answers to questions which typically arise when designing performance test suites.

Always think in the context of your specific RoR app

The fundamental purpose of load testing a web app is to realistically simulate the user’s experience. Consequently, the test environment should be as close to the (projected) production usage as possible. The test conditions should always reflect the context of the project if the results obtained are to be accurate and predictive of the future behaviour of the application.

Do not overstress load generators. Check your network.

Ensure that the client computers used as load generators are not overly stressed. Keep the utilization of their resources (CPU, memory) low enough to have the confidence that the load-generation environment is not itself a bottleneck. Similarly, take a closer look at the network environment of your test suite: it may happen that your network does not offer enough bandwidth to simulate the desired number of concurrent user sessions ‘hitting’ your RoR application.

Baselines

The performance test results should be archived in a way that enables their comparison with the data acquired in later stages. The archived data is most valuable if the development team designs the performance suites based on a set of reusable test assets. Creating a baseline is the process of capturing the differences between subsequent archived datasets describing application behaviour over time. With such a baseline, the development team is able to evaluate the effectiveness of changes introduced to improve the performance of the system, as well as to establish the impact of newly developed features on system performance.

Web server logs acquired from the prototype or legacy application

Sometimes the application being developed has a predecessor. If that is the case and you have access to the Web server logs of the legacy application, you may use them to validate and possibly enhance the data collected during performance testing. The data acquired from an application prototype or its beta release may also come in handy for validating the various assumptions made while designing the performance test suite.

The 80% server CPU utilization rule

There seems to be a general consensus about the maximum acceptable CPU utilization for a working web application: 80%. Once 80% CPU utilization has been reached, the queues of user requests grow exponentially. For the user this means a poor experience, even though the server might still appear to be running fine, at least from the metrics perspective. Knowing that, design your performance test suite so that the target workload keeps utilization at around 70%, and treat 80% as the hard threshold. Having done so, you have clearly set the expectations (a simple check along these lines is sketched after the list below):

  • developers should be alerted if during a test the application utilizes more than 70% processor power,
  • if under the target workload developers see more than 80% utilization, they should consider it to be a defect.
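
To make these thresholds actionable, you can fold them into your test harness. The snippet below is a minimal Ruby sketch, not part of any particular tool: the threshold constants, the helper name and the sample values are my own assumptions. It evaluates CPU utilization samples that you collect on the server (or on a load generator) while the test runs, for example from `sar`, `top` or your monitoring stack.

```ruby
# Minimal sketch: judging CPU utilization samples gathered during a test run.
# `samples` is an array of utilization percentages (0-100) collected by you.
WARN_THRESHOLD   = 70.0 # alert developers above this level
DEFECT_THRESHOLD = 80.0 # treat peaks above this level as a defect

def evaluate_cpu_utilization(samples)
  peak    = samples.max
  average = samples.sum / samples.size.to_f

  status =
    if peak >= DEFECT_THRESHOLD
      :defect
    elsif peak >= WARN_THRESHOLD
      :warning
    else
      :ok
    end

  { average: average.round(1), peak: peak.round(1), status: status }
end

# Example usage with made-up samples:
result = evaluate_cpu_utilization([42.0, 55.3, 68.9, 77.5, 81.2])
puts "avg=#{result[:average]}% peak=#{result[:peak]}% status=#{result[:status]}"
# => avg=65.0% peak=81.2% status=defect
```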

Benchmarks

You may need to play ‘by the rules’ and use benchmarks recognized in the industry. This brings the benefit of being able to evaluate your application against other systems or applications that calculate their scores for the same benchmark. It may be useful if you want to convince the product owner or other stakeholders that the application is ‘great’ or ‘better than <>’. Additionally, such a score can be useful when setting realistic performance goals. Still, we need to remember that each application is unique to some extent, and so is its environment. Therefore, if you want to be truly confident that your application behaves in a given way under certain conditions, use benchmarks which are tailored / modified to suit your specific application.

Quantifying the end-user satisfaction / frustration

It is useful to capture end-user feedback on the performance of your app to verify your assumptions. And you do not need to be far into the development process to start doing so; a prototype or a demo will often do. With a very limited codebase you are in a position to, for example, control the load time of each page, screen, graphic, control or list. The relative simplicity of the procedures, as well as the low costs involved, enables you to create several versions of the application with different parameters and, consequently, with different degrees of responsiveness. You may then ask the (test) users to try each version and provide you with feedback to compare against the performance goals and requirements adopted for your application.

Dynamic data vs. static data

It is a rule of thumb to use dynamic rather than static data in a performance test suite. Some points to consider in this respect:

  • Static data can skew the results of performance testing. Operating on the same data may trigger various caching mechanisms in the system. As a result, your application may not render a view or fetch data from the database for each request; it may instead serve copies stored in memory and thus distort the results.
  • Dynamic data in load tests usually allows you to discover more complicated and time-sensitive bugs, such as errors caused by multiple user sessions using the application simultaneously.
  • Using dynamic test data in a load test brings the additional benefit of discovering possible security vulnerabilities in the system, since hackers often exploit application errors.
  • Random values allow you to replicate such errors, for example a full scan of a database table when an invalid value is supplied.

Even if you plan to leverage caching techniques in order to achieve decent performance, it is a good idea to incorporate some cache independent scenarios in your performance test suite to catch bugs and discover potential vulnerabilities.
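
If you roll your own load scripts, randomizing the request data is straightforward. Below is a minimal Ruby sketch using only the standard library; the host, path and parameter names are hypothetical placeholders for the endpoints of your own application.

```ruby
require "net/http"
require "securerandom"
require "uri"

# Minimal sketch: issuing requests with dynamic (randomized) data so that naive
# response caching is not hit on every request. BASE_URL, the /search path and
# the query parameters are assumptions - substitute your own endpoints.
BASE_URL = "https://staging.example.com"

def random_search_uri
  query = URI.encode_www_form(q: SecureRandom.hex(4), page: rand(1..50))
  URI("#{BASE_URL}/search?#{query}")
end

20.times do
  uri      = random_search_uri
  started  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  response = Net::HTTP.get_response(uri)
  elapsed  = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  puts format("%-60s %s %.3fs", uri, response.code, elapsed)
end
```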

PERFORMANCE GOALS IN TERMS OF CONCURRENT USERS

Should the application be prepared for two hundred, two thousand, or maybe ten thousand concurrent users? What is the cost of guessing wrong? If you have not prepared your application for the traffic that arrives after launch, it may crash at a critical moment. You will disappoint your customers and lose money. If you overestimate the projected traffic, you will be paying for underutilized resources. Although there are a number of tools that can facilitate the process of designing a performance suite and answering questions like “How many concurrent user sessions do I need to simulate in order to properly test my application?”, it is beneficial to understand the maths behind such calculations. Here is a recipe for a reasonable guesstimate:

  1. Let’s assume the performance goal / requirement is to serve 100,000 visitors per month.
  2. We may be tempted to divide that number by 30, i.e., the average number of days in a month, but that approach is most likely based on a wrong assumption, as a web application usually experiences both busy and idle periods. We need to account for such fluctuations. To cite a relatively simple example, let’s assume that the application is used mostly on weekdays, during the daytime, and that the traffic is relatively stable. Then we arrive at around 21 active days per month and 12 active hours per day.
  3. The average number of visits per hour would be 100,000 / 21 / 12 =~ 396. In the next step we need to make an assumption concerning the average length of a session, i.e., try to figure out how much time the average user is going to spend using the application. Web metrics may be helpful; if they are not available, you can estimate the number by conducting an experiment whereby a test user tries to mimic real user behaviour in the system and you measure how long it takes to achieve the objectives of a given persona.
  4. Next, we can calculate the concurrent user level. To illustrate it, let’s assume that a user spends on average 10 minutes on the site. Hence, one concurrent user corresponds to 60 minutes / 10 minutes per visit = 6 visits per hour.
  5. Accordingly, the number of concurrent users (the average concurrent user level) is 396 / 6 = 66. It is useful to establish the average number of users on the site at any one time, but it is even more useful to know the instantaneous peak in the number of concurrent users. This is the only point at which we need to apply some kind of “magic” multiplier, as there are too many variables and unknowns to tackle it in a simple way. Design practice shows that a multiplier between 3 and 6 helps to establish the likely peak values. Hence, if we take the lower number, we arrive at a final result of 66 * 3 = 198 concurrent users, which is our target when designing a load test under the above circumstances (the whole guesstimate is summarized in the sketch below).
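
The arithmetic above is easy to script, so you can re-run it whenever the traffic projections change. The following Ruby sketch simply restates the guesstimate; all the input values are the assumptions from the example, to be replaced with your own numbers.

```ruby
# Minimal sketch of the concurrency guesstimate; replace the inputs with your
# own traffic projections.
monthly_visits        = 100_000
active_days_per_month = 21   # mostly weekday usage
active_hours_per_day  = 12   # mostly daytime usage
avg_visit_minutes     = 10
peak_multiplier       = 3    # the "magic" multiplier, typically between 3 and 6

visits_per_hour            = monthly_visits.to_f / active_days_per_month / active_hours_per_day
visits_per_concurrent_user = 60.0 / avg_visit_minutes
average_concurrent_users   = visits_per_hour / visits_per_concurrent_user
peak_concurrent_users      = average_concurrent_users * peak_multiplier

puts "visits per hour:          #{visits_per_hour.round}"          # ~397
puts "average concurrent users: #{average_concurrent_users.round}" # ~66
puts "peak concurrency target:  #{peak_concurrent_users.round}"    # ~198
```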

“Automated” as a key term in “Automated performance/load testing”

Persuading 50 of your friends or co-workers to perform a given action in the system concurrently may be interesting from a social and logistical point of view, yet it is impractical from the performance testing perspective. Therefore, whenever you come across the term “automated performance testing”, treat it as a pleonasm: manual performance testing will not get you very far. You cannot manually simulate loads of thousands of virtual users hitting your website for hours. Manual tests cannot be repeated in a comparable manner, nor can they scale. Performance / load tests should, in principle, be automated tests.
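
To give a sense of what “automated” means in practice, here is a minimal Ruby sketch of a hand-rolled load test: a number of virtual users issuing requests concurrently and reporting response-time statistics. The target URL and the user/request counts are placeholders; a real suite would normally rely on a dedicated load testing tool rather than ad-hoc threads, but the principle is the same.

```ruby
require "net/http"
require "uri"

# Minimal sketch: simulate VIRTUAL_USERS concurrent users, each sending
# REQUESTS_PER_USER requests, and report basic response-time statistics.
TARGET            = URI("https://staging.example.com/") # placeholder URL
VIRTUAL_USERS     = 50
REQUESTS_PER_USER = 20

timings = Queue.new

threads = VIRTUAL_USERS.times.map do
  Thread.new do
    REQUESTS_PER_USER.times do
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      Net::HTTP.get_response(TARGET)
      timings << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    end
  end
end
threads.each(&:join)

samples = []
samples << timings.pop until timings.empty?
samples.sort!

puts "requests:              #{samples.size}"
puts "average response time: #{(samples.sum / samples.size).round(3)}s"
puts "95th percentile:       #{samples[(samples.size * 0.95).floor].round(3)}s"
```

Such a script can be run repeatedly and versioned together with the application code, which is exactly what manual testing cannot offer.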

THE ULTIMATE GOAL & THE KEY CONCERN

The differences between the various terms associated with performance testing are, hopefully, clearer now. You also know the answers to some important questions that may arise while designing the performance suite for your Ruby on Rails application. All said and done, let me stress the fundamental goal behind performance tests: it is all about the quality and the user experience delivered by the product you implement. And the key concern is the app response time your users are exposed to. It is the ultimate metric, which translates into the percentage of application users who are frustrated by poor app performance. To put it bluntly, your app users do not care about the values in your performance test results, your goals or your assumptions concerning application performance. They do notice, however, that the application is slow, too slow for them at least. And some of them decide to leave. Let the end-user response time constitute the core concern around which to develop and implement your performance testing strategies. Performance / load tests may help ensure your customers/users will not be frustrated the moment you eventually launch your application. And there is a chance they will decide to stay for longer.

So what next? After you have laid the foundations of your performance test suite, you need to choose the tools with which to benchmark and monitor the performance of your application. I will cover that topic in the next blog post, i.e., the second part of this article.
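
If you want to condense response times into a single user-satisfaction figure, an Apdex-style score is one common way to do it. Below is a minimal Ruby sketch of the idea; the 0.5-second threshold and the sample values are assumptions, not numbers from this article.

```ruby
# Minimal sketch: an Apdex-style score from raw response times (in seconds).
# Requests faster than T "satisfy" the user, those up to 4*T are "tolerated",
# the rest count as frustrating.
APDEX_THRESHOLD = 0.5 # seconds; a hypothetical target, tune it to your goals

def apdex(response_times, t = APDEX_THRESHOLD)
  satisfied  = response_times.count { |rt| rt <= t }
  tolerating = response_times.count { |rt| rt > t && rt <= 4 * t }
  (satisfied + tolerating / 2.0) / response_times.size
end

# Example with made-up samples:
puts apdex([0.2, 0.4, 0.7, 1.3, 2.5]).round(2)
# => 0.6  (2 satisfied, 2 tolerating, 1 frustrated)
```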

Post Scriptum

Special thanks to Michał Czyż for his feedback on this article.