I’ve spoken about the effectiveness of performance testing before and what to look out for. There are many tools that can be used to test the performance and load of a system. One of the biggest issues with most approaches to performance testing, though, is that teams leave it too late, waiting for the majority of their new code to be developed before undertaking these important tests.
The problem with this approach, as with security testing, is that leaving the testing of an application’s performance and load until the end opens up the risk of important application issues being discovered that will require extensive redesign to rectify. Issues that could easily have been identified had the system been satisfactorily performance and load tested earlier in the development cycle.
There are many reasons why performance testing is not conducted earlier. Some of these are that the initial test systems will not mimic production, that the data is not representative, or that testers want to replicate customer behaviour in the tests and need a full system to do this against. While all of these points make sense, and you still want to ensure you do performance testing based on actual use-cases to get a sense of what the customer will experience, the truth is that you can still conduct performance testing at even the smallest component level to at least ensure that each aspect of the code remains performant. And while having every aspect of a system meet its expected benchmarks is no guarantee that it won’t crumble under a high load when all the modules are brought together, it greatly reduces the chances, and the likelihood is then that environmental or data issues are causing the poor performance rather than poorly optimized code.
Another reason why performance testing is often left to the very end of development is the resources required for the tests to be executed on. After all, with performance tests often running at reasonably high volumes, they can put strain on all the systems involved and even crash your testing environments if these aren’t suitably prepared. However, when you break the different aspects of a system into smaller components and isolate them in a container ecosystem, they can still be reliably tested without much strain on the infrastructure and provide accurate results of how performant each aspect of an application is.
This is where a tool like k6 has proven to be so useful in the performance testing space, as it allows for the easy testing of system components simply by spinning each up in a container and, through mocking, can show the isolated performance of the code under test. And due to its small footprint, it takes up little space on a server, so you can spin up a sizable number of tests to provide realistic results. It can also still be used for bigger end-to-end tests when these are required, so it meets those broader performance and load testing needs too, allowing everything to be done in one tool. Which is why I want to unpack how to work with it more clearly.
I am not one to prescribe certain tools and prefer to be as tool-agnostic as possible, but I do believe the core principles behind k6 are important to all forms of performance testing and that other tools are likely to come out and adopt similar models. The core aim is to bring performance testing into the development pipelines, and k6 allows this to be done easily.
What is k6
k6 is a free, open-source tool, written in Go, which can ingest tests written in JavaScript (ES5.1+) and turn them into requests to load test your website or API. k6 works with the concept of virtual users (VUs), which run your scripts, and a duration, a string specifying the total time a test should run.
Scripts must contain, at the very least, a default function — this defines the entry point for your VUs, similar to the main() function in many other languages. The code inside the default function is called “VU code”, and is run over and over for as long as the test is running.
When creating a new load test, the first thing that you’ll often do is define the HTTP requests that will be used to test your system. A simple example that just performs a GET request looks like this:
import { check } from "k6";
import http from "k6/http";

export default function () {
  // Not an actual site, just used for illustration purposes
  let res = http.get("https://test.repl.com/");
  check(res, {
    "is status 200": (r) => r.status === 200,
  });
}
This script essentially sets up a function where you are asking k6 to hit a specific URL and get a desired response from it. You can also add sleep commands to it and a host of other functions to make the script match a specific user scenario that you are intending to replicate.
While k6 scripts can be quite simple, you can also build far more complicated test scenarios, including multiple stages that ramp the load up and down, directly within a single k6 script.
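As a rough sketch of what that looks like (reusing the placeholder URL from the earlier example, so the stage durations and targets here are purely illustrative), staged load is declared through the stages option in the init part of the script:

import http from "k6/http";
import { sleep } from "k6";

// Init code: options are read once at start-up and apply to the whole test
export let options = {
  stages: [
    { duration: "1m", target: 10 }, // ramp up to 10 VUs over one minute
    { duration: "3m", target: 10 }, // hold at 10 VUs
    { duration: "1m", target: 0 },  // ramp back down to zero
  ],
};

// VU code: run repeatedly by every virtual user
export default function () {
  http.get("https://test.repl.com/"); // placeholder URL, not a real site
  sleep(1); // mimic a user's think time between requests
}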
Different parts of the k6 script
Imports: This is where you import the k6 script API and any other JavaScript modules (libraries) you want to make use of. These modules can be loaded in various ways: bundled NPM modules on your local machine or remote modules (from a URL, including CDNJS and GitHub).
Init code: This is the part of the script outside the exported “default” function. It is usually used to provide options for the whole test, such as how to run it and how to distribute it on the cloud, and for initialization of the test itself.
VU code: k6 supports a feature called virtual users, meaning you can use separate “smart” virtual users to test your system. The code in this section, which sits inside the exported “default” function, is run over and over inside each VU, and the aggregated results of all these VUs are processed and reported by k6. The annotated sketch below shows how these three parts fit together.
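To make those three parts concrete, here is a minimal annotated skeleton (again using the illustrative URL from before, and with options values chosen purely as an example):

// Imports: k6 APIs and any other JavaScript modules you want to use
import http from "k6/http";
import { check, sleep } from "k6";

// Init code: everything outside the default function, including test-wide options
export let options = { vus: 10, duration: "30s" };

// VU code: the exported default function, run over and over by each VU
export default function () {
  let res = http.get("https://test.repl.com/");
  check(res, { "is status 200": (r) => r.status === 200 });
  sleep(1);
}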
Running k6
Once you have saved this script somewhere accessible to k6, naming it script.js or your own desired file name, you can run it with 10 virtual users (VUs) over a period of 30 seconds, as follows:
k6 run -u 10 -d 30s script.js
It basically runs k6 on your machine using 10 virtual users over the course of 30 seconds, and it checks to see if the test URL returns 200 (OK) in all tests.
These are additional k6 commands that you can use alongside run (a short usage example follows the list):
pause: pause a running test.
resume: resume a paused test.
run: run a test with various flags, e.g. --paused to run a script in paused mode.
stats: show statistics about the currently running or paused test, including the number of VUs.
status: show the status of the current k6 instance (running, paused or tainted) and the number of VUs.
archive: creates a bundled tar file of your script along with all the dependencies, which you can later use with “k6 run”.
login: authenticates with Load Impact Insights cloud service and provides an authentication token to be used by k6.
cloud: runs your authenticated test on the k6 cloud service.
convert: browsers can log requests/responses as HAR (HTTP Archive) files; k6 can convert such a HAR file into a script, which you can then edit and run locally or on the cloud.
inspect: basically outputs the consolidated script options for that script or .tar bundle.
scale: scale a paused/running test with a new number of virtual users (VUs). It can do this either for local or for cloud execution.
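As a quick sketch of how a few of these fit together (assuming a single k6 instance running locally with its default settings), you can start a test in paused mode and then control it from a second terminal:

k6 run --paused -u 10 -d 30s script.js
k6 status    # is the test running or paused, and with how many VUs?
k6 resume    # let the paused test start executing
k6 pause     # pause it again mid-run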
k6 results
After executing a k6 test, you will see a summary output for the run.
Interpreting that output, you can see that the script was executed locally for a duration of 30 seconds and spawned 10 separate virtual users (VUs) to test the URL. You can also see the highlighted green checked item, “is status 200”, which signifies that our check passed for 100% of the cases. The total data sent and received across the 2775 requests is also present. For each HTTP request, there is an http_req_* metric (key) with several values, corresponding to the average, minimum, median, maximum, 90th percentile and 95th percentile across all the requests.
A breakdown of the different times displayed is shown below. The purpose is to help identify where the potential performance issues are.
Note: k6 simply runs the script and doesn’t monitor the underlying performance of specific servers or applications, so once you identify requests that are responding below expectation, it’s important to delve deeper into the code/application/server to find the real root of the performance issues.
http_reqs: how many HTTP requests k6 has generated, in total.
http_req_blocked: time spent blocked (waiting for a free TCP connection slot) before initiating the request.
http_req_connecting: time spent establishing a TCP connection to the remote host.
http_req_tls_handshaking: time spent handshaking the TLS session with the remote host.
http_req_sending: time spent sending data to the remote host.
http_req_waiting: time spent waiting for a response from the remote host (a.k.a. “time to first byte”, or “TTFB”).
http_req_receiving: time spent receiving response data from the remote host.
http_req_duration: total time for the request. It’s equal to http_req_sending + http_req_waiting + http_req_receiving.
You can also make k6 output detailed statistics in CSV format by using the --out/-o option for k6 run, like this:
$ k6 run --out csv=my_test_result.csv script.js
The results shown on screen are the aggregated results across all the requests. If you want to be able to use all the generated data points, and not only the aggregates, you should use the -o flag to send the output to a file, another piece of software or the cloud service.
The raw results can be written to a JSON file using the JSON output. There are other outputs that push the metrics to InfluxDB, Apache Kafka, StatsD or Datadog.
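For example, the JSON output follows the same pattern as the CSV example above, writing each raw metric sample to the named file:

k6 run --out json=my_test_result.json script.js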
The Metrics of Performance Testing
Performance testing is an umbrella term that encompasses many types of tests. Each test type tries to answer a set of questions to make the performance testing process more goal-oriented. This means that just running tests is not enough; you have to have a set of goals to reach.
Concurrency: systems that set this goal usually have a concept of an end-user and need to see how the system behaves while many concurrent users try to access it. The aim is basically to test how many requests fail or pass under a high load of users. This includes both many concurrent users and each user requesting multiple resources at the same time.
Throughput: systems with no concept of end-users set this goal to see how the system behaves overall while a large volume of requests/responses is flowing in and out of the system.
Server response time: this goal signifies the time it takes from the initial request from the client to the server up until a response is sent back from the server.
Regression testing: sometimes the goal is not to put “heavy load” on the system, but rather “normal load”, using functional and regression testing to see how a change affects the system’s performance and whether it still adheres to our defined SLAs.
The general idea is to measure how a system, or system of systems, behaves under heavy load in terms of speed, scalability, stability and resiliency, each of which can be measured against these goals.
Speed can be measured by the time it takes for a request to be handled by the server and for the response to come back.
Scalability can be measured by how well the system scales as the load is increased, and whether it can sustain that load over a period of time.
Stability can be measured by how well the system sustains the load, and whether it stays responsive and stable in the face of a high number of errors and events.
Resiliency can be measured by how the system recovers from crashes and downtime, and whether it responds to requests again after too much or too frequent load has eventually crashed it.
Rerunning Tests to Verify the Results
You can rerun the tests to see if they produce roughly the same results across different runs, and compare them to see if they deviate. If they are almost the same, you can analyse the tests and derive your results; otherwise you should pinpoint where they deviate, find the cause, such as a bottleneck, and try to prevent it from happening.
k6 and the Metrics
k6 supports a set of built-in and custom metrics that can be used to measure various things and to either confirm the above-mentioned goals or prove them wrong. The metric types that can be used to define custom metrics are: Counter, Gauge, Rate and Trend.
1. Counter
This is a simple cumulative counter that can be used to measure any cumulative value, like the number of errors during the test. For example, you can count the number of 404 errors returned during a test run and have the total reported alongside the built-in metrics.
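As a minimal sketch of how that looks (the metric name and URL path here are made up for illustration), a Counter is created in the init code and incremented from the VU code:

import http from "k6/http";
import { Counter } from "k6/metrics";

// custom cumulative metric; the name is our own choice
let notFoundCounter = new Counter("errors_404");

export default function () {
  let res = http.get("https://test.repl.com/does-not-exist"); // illustrative path
  if (res.status === 404) {
    notFoundCounter.add(1); // add to the running total for every 404 response
  }
}

The counter’s total then appears alongside the built-in metrics in the end-of-test summary.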
2. Gauge
This metric keeps the last value that is added to it. It’s a simple over-writable metric that holds its most recently added value, and it can be used to retain the last value of any test item, be it response time, delay or any other user-defined value. The sketch below, for instance, catches the latest error status code returned, such as a 404.
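A minimal sketch of that idea (again with an illustrative metric name and path):

import http from "k6/http";
import { Gauge } from "k6/metrics";

// a Gauge only keeps the most recent value added to it
let lastErrorCode = new Gauge("last_error_code");

export default function () {
  let res = http.get("https://test.repl.com/does-not-exist"); // illustrative path
  if (res.status >= 400) {
    lastErrorCode.add(res.status); // overwrite the gauge with the latest error code
  }
}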
3. Rate
This built-in metric keeps track of the percentage of added values that are true or non-zero. For example, if you add two false values and one true value, the rate becomes 33%. It can be used to keep track of the rate of successful requests/responses and compare it with the error rate.
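A minimal sketch (the metric name is made up for illustration) that tracks the success rate of requests:

import http from "k6/http";
import { Rate } from "k6/metrics";

// a Rate reports the percentage of added values that were true/non-zero
let successRate = new Rate("success_rate");

export default function () {
  let res = http.get("https://test.repl.com/");
  successRate.add(res.status === 200); // true counts towards the rate, false against it
}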
4. Trend
This metric allows you to statistically analyse a custom value. It will give you the minimum, maximum, average and percentiles, just as you saw for the built-in http_req_* metrics earlier.
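A minimal sketch (with an illustrative metric name) that builds a Trend from the waiting time of each request:

import http from "k6/http";
import { Trend } from "k6/metrics";

// a Trend collects values and reports min, max, average and percentiles for them
let waitingTime = new Trend("waiting_time");

export default function () {
  let res = http.get("https://test.repl.com/");
  waitingTime.add(res.timings.waiting); // time spent waiting for the response (TTFB)
}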
Regular scheduling and benchmarking
Performance tests are not something that should be run once and forgotten about as soon as the software meets the desired performance expectations. Even if the code doesn’t change in a portion of an application, other dependencies, libraries or the servers themselves can change, so it’s important to run performance tests regularly and get consistent benchmarks for your application to identify any potential issues that may arise at any given time.
Performance Testing Throughout the Pipeline
While I talk about k6 in this article and certainly try to promote its capabilities, the goal is not to focus on the tool, but rather on the importance of early and regular performance and load testing. In order to work in an automated manner within a development pipeline, the tooling also needs to be lightweight and easy on resources, which is where the likes of k6 comes in handy. There are other options though, and as long as the tool gives your teams the opportunity to explore and measure code performance as early in the development cycle as possible, it should make a big difference in improving your overall application performance.