Most performance testing is fluff: full of twists and turns, culminating in a report that most people don’t understand, if they read it at all.
The scripts execute and the results get generated. The document ends up somewhere like SharePoint or MS Teams. Nothing the team does next week changes because of it. But the box got ticked, so the job is done. You’ve done your part.
I have spent enough of the last 10 years watching this play out to know it is not the tool’s fault. It’s not JMeter and it’s not k6. It is not the engineers running the test either. As with a lot of feedback loops, the problem sits upstream. We didn’t agree as a business on what “acceptable” actually looks like, so the test has no true value. It produced output, sure, but without action it means nothing. And everyone just quietly moves on.
That’s basically the whole article right there, but I’ll dig in a little deeper given the nuance.
The Theatre Problem
Performance testing falls into the “we should probably do this” bucket for a lot of teams. It gets treated like a release-gate activity rather than an ongoing discipline. And as a release-gate activity it usually takes longer and generates more complex work. Someone runs the scripts, exports the results, and we move on unless something is “broken enough”. The next release rolls around, the same script runs against the application. We didn’t learn lessons from the last run, so we didn’t action anything of value. Nobody cares until now, when “suddenly” we’re at critical mass.
I have seen teams pay for expensive performance testing platforms and environments to test their application once or twice a year. I have seen teams use what’s in the browser DevTools as proof that performance is fine. Yet neither is able to answer the simplest question a product manager will ask – “is this release going to be slower than the last one?”
I don’t think this is even the right question, ’cos sometimes the data can give you a false sense of security. You need experts who know what is going on to know for sure. The tool is not usually the problem; it’s more about the conversations that should’ve been had, but never were.
Performance testing that adds real value looks different to what most think, and starts much earlier.

Start with the User, Not the Tool
Here’s the thing: performance is a user experience problem, not necessarily an infrastructure problem (remember I mentioned nuance).
When someone says “the system is slow,” they don’t mean a specific endpoint returned in 840ms instead of 520ms. They mean they clicked something, and waited long enough to get annoyed. The feeling that the product is slow has almost nothing to do with server-side latency on its own, and everything to do with the shape of the journey they just went through.
That’s where performance testing has to start. Not with “let’s beef up resources, then load up the API,” but with “what does too slow feel like on this user’s journey?” Checkout is not the same as search. A dashboard loading from cold is not the same as refreshing (F5) in place. A background report is not the same as an interactive query. The thresholds are different, and treating them as if they are not is how you end up tuning the wrong thing.
Service Level Objectives (SLOs) are the tool here. An SLO on checkout might read “95% of checkout page loads complete in under 2 seconds.” That is a sentence a product person can own, a developer can build against, and a tester can verify. But you really need all three in the room when it gets written. If SRE writes it alone, it reads like infrastructure. If product writes it alone, it reads like a wish. If QA writes it alone, it reads like a testing spec.
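To make that checkout sentence concrete, here is a minimal sketch of how it could be checked against a batch of measured page loads. The `Slo` class, the function names and the sample data are all made up for illustration; they are not from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical container for one objective, e.g. the checkout example."""
    journey: str         # user journey the objective belongs to
    target_ratio: float  # share of loads that must be fast enough (0.95 = 95%)
    threshold_s: float   # what "fast enough" means, in seconds


def compliance(slo: Slo, load_times_s: list[float]) -> float:
    """Fraction of observed loads that completed under the threshold."""
    if not load_times_s:
        raise ValueError("no samples to evaluate")
    fast = sum(1 for t in load_times_s if t < slo.threshold_s)
    return fast / len(load_times_s)


def is_met(slo: Slo, load_times_s: list[float]) -> bool:
    """True when enough loads were fast enough to satisfy the objective."""
    return compliance(slo, load_times_s) >= slo.target_ratio


# "95% of checkout page loads complete in under 2 seconds"
checkout = Slo(journey="checkout", target_ratio=0.95, threshold_s=2.0)

# Made-up measurements: 96 fast loads and 4 slow ones
samples = [0.8] * 96 + [3.5] * 4

print(f"compliance: {compliance(checkout, samples):.0%}")
print("SLO met:", is_met(checkout, samples))
```

The useful part is not the code, it is that the three numbers in `Slo` are the ones product, dev and QA have to agree on.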
In my experience, the teams that do performance well have an initial SLO conversation once and revisit it at least every quarter. The teams that do it badly likely never had the conversation at all and don’t have a goal in mind. They just argue about whether “slow” is really slow every time someone looks at the data or when complaints come in. What’s the problem, “it runs fast on my computer”?
Percentiles, Not Averages
Two modelling mistakes make most performance results lie.
The first is averages. The average load time across a checkout flow tells you almost nothing. If 90% of users complete in 1 second and 10% take 30 seconds, the average is about 4 seconds (0.9 × 1 + 0.1 × 30 = 3.9), which describes nobody. The experience your angriest user is having is invisible in that number. p95 and p99 are where the pain lives. That is what you tune for; averages have no place here.
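The 90/10 example is easy to reproduce. This sketch uses a nearest-rank percentile (one of several common definitions, so other tools may report slightly different values on real data), and the helper name `percentile` is my own.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# 90 users finish in 1 second, 10 users wait 30 seconds
load_times_s = [1.0] * 90 + [30.0] * 10

print("mean:", mean(load_times_s))            # 3.9, a load time nobody experienced
print("p50: ", percentile(load_times_s, 50))  # 1.0, the typical user
print("p95: ", percentile(load_times_s, 95))  # 30.0, where the pain lives
print("p99: ", percentile(load_times_s, 99))  # 30.0
```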
The second is treating load as concurrent users rather than arrival rate. A system behaves very differently when 100 users arrive every second and stay for a while, versus 100 users sitting in various states of a session at the same time. Real traffic usually behaves like an arrival rate: new users keep turning up whether or not the earlier ones have finished. Modelling it as a fixed concurrency count produces clean-looking graphs that bear almost no resemblance to what happens at 9am on a Monday in the real world.
Neither of these is a new idea. They have been in the performance testing literature for twenty years. They are still missing from most production load tests I’ve seen. I am no performance test expert, but so many people see load as load, not the variations of what load actually is.
To put this simply, think of takeaway coffee shops and restaurants. A takeaway coffee shop has lots of traffic, but only a few people hanging around at any moment, and usually for short periods of time. A restaurant, on the other hand, has less traffic, but people hang around for a long time to enjoy their food.
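The coffee shop and the restaurant can be put into numbers with Little’s Law, which says the average number of people inside equals the arrival rate multiplied by the average time each one stays. The figures below are invented purely to fit the analogy.

```python
def average_inside(arrivals_per_min: float, minutes_per_visit: float) -> float:
    """Little's Law: average number in the system = arrival rate x average time in the system."""
    return arrivals_per_min * minutes_per_visit


# Invented numbers for the analogy
coffee_inside = average_inside(arrivals_per_min=4.0, minutes_per_visit=3.0)
restaurant_inside = average_inside(arrivals_per_min=0.25, minutes_per_visit=48.0)

print("coffee shop:", coffee_inside, "people inside,", 4.0 * 60, "arrivals per hour")
print("restaurant: ", restaurant_inside, "people inside,", 0.25 * 60, "arrivals per hour")
```

Both places have 12 people inside on average, yet one serves 240 arrivals an hour and the other 15. A test that only pins “12 concurrent users” cannot tell those two workloads apart.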
The Four Flavours
Load, stress, soak, spike. One paragraph each, and the honest answer on which ones your team probably does not need (from a non-expert).
Load testing – “how does it handle normal expected traffic?” This is the one everyone runs. It is stock standard performance testing.
Stress testing – “how does it break when we push past normal?” Useful if you care about graceful degradation, which you should. Less useful if your team has never actually acted on a stress test result or the application never experiences increased load.
Soak testing – running a realistic load for hours. This is the one that catches memory leaks, connection pool exhaustion, and slow degradation. Almost nobody runs it because it takes a lot of time and effort, but it is the most valuable of the four for finding real problems.
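What makes a soak result actionable is usually a trend rather than a single number. As a rough sketch, assuming you have sampled process memory once an hour during the run, a least-squares slope tells you how fast it is growing. The samples and the 5 MB/hour limit are invented for illustration, and a real check would also need to allow for warm-up and garbage collection noise.

```python
def slope_per_hour(hours: list[float], values: list[float]) -> float:
    """Ordinary least-squares slope of values against hours."""
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_x = sum(hours) / n
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, values))
    denominator = sum((x - mean_x) ** 2 for x in hours)
    return numerator / denominator


# Invented hourly memory samples (MB) from an 8-hour soak at steady load
hours = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
memory_mb = [512.0, 530.0, 549.0, 565.0, 584.0, 601.0, 620.0, 637.0]

GROWTH_LIMIT_MB_PER_HOUR = 5.0  # assumed tolerance, agree your own

growth = slope_per_hour(hours, memory_mb)
leaking = growth > GROWTH_LIMIT_MB_PER_HOUR
print(f"memory growth: {growth:.1f} MB/hour, suspicious: {leaking}")
```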
Spike testing – sudden traffic bursts. Useful for products with launch moments, or with marketing-driven or event-driven workloads. Not useful for most internal applications.
There are others, but these are the stock four. Keeping it focused is more likely to help with first steps.
If your team can only do one of these “properly”, make it soak. It will find things the others won’t.

Shift-Left but Don’t Slow Delivery
There is a version of “shift-left performance testing” that looks like forcing every commit or PR to run a full load test. Do not do that! You will have people yelling at you before midday.
The version that works is lightweight. A k6 or Artillery script that hits a handful of realistic user journeys against a staging environment, checks whether any of them crossed a latency threshold, and flags the PR for a second look if they did. It takes ~90 seconds and runs on every PR. Simples!
That’s not the full load test. That’s the canary in the coalmine. The full load test will still happen, just less often, against an isolated environment, controlled and with much more intent.
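The shape of that PR check is simple enough to sketch. This is not a k6 or Artillery script; it is a plain Python stand-in, with sleeping functions pretending to be journeys, meant only to show the logic of “time each journey, compare to its threshold, flag the ones that crossed it”. All names and thresholds are hypothetical.

```python
import time
from typing import Callable


def timed_ms(journey: Callable[[], None]) -> float:
    """Run one journey and return how long it took, in milliseconds."""
    start = time.perf_counter()
    journey()
    return (time.perf_counter() - start) * 1000


def journeys_over_threshold(
    journeys: dict[str, Callable[[], None]],
    thresholds_ms: dict[str, float],
    runs: int = 5,
) -> list[str]:
    """Names of journeys whose slowest run crossed their latency threshold."""
    flagged = []
    for name, journey in journeys.items():
        worst = max(timed_ms(journey) for _ in range(runs))
        if worst > thresholds_ms[name]:
            flagged.append(name)
    return flagged


# Stand-ins for real journeys; a real check would drive staging instead
def fake_search() -> None:
    time.sleep(0.01)


def fake_checkout() -> None:
    time.sleep(0.05)


flagged = journeys_over_threshold(
    journeys={"search": fake_search, "checkout": fake_checkout},
    thresholds_ms={"search": 500.0, "checkout": 20.0},
    runs=3,
)
print("flag PR for a second look:", flagged)
```

Note the outcome is a flag for a human, not a hard failure. That is what keeps it from slowing delivery.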
Production Monitoring Is Performance Testing
This is the part that tends to get left out of the conversation, possibly because of limited or zero production access (regulations/compliance). Hopefully not because of “what do you need access for?”
Synthetic monitoring is a performance test. Real user monitoring is a performance test. A feature flag rollout with a latency gate is a performance test. All of those produce better signal than a staged load test, because they are measuring the real system serving real users in real conditions on real infrastructure. Really.
The pre-release load test tells you whether the release will survive the rollout. The production monitoring tells you whether it is actually surviving, or requires rollback. That is where most decisions should be coming from. If your team is running load tests religiously but nobody is watching p95 in production, the priorities are backward.
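A latency gate on a rollout can be as small as comparing p95 for the new version against p95 for the current one. The sketch below assumes you can pull two lists of response times from your monitoring; the 10% allowed regression, the function names and the sample data are all invented.

```python
import math


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def rollout_decision(
    baseline_ms: list[float],
    canary_ms: list[float],
    max_regression: float = 0.10,  # assumed tolerance: canary p95 may be up to 10% worse
) -> str:
    """Return "rollback" if the canary's p95 regressed past the tolerance, else "continue"."""
    limit = p95(baseline_ms) * (1 + max_regression)
    return "rollback" if p95(canary_ms) > limit else "continue"


# Invented response times in milliseconds
baseline = [200.0] * 95 + [400.0] * 5
good_canary = [205.0] * 95 + [420.0] * 5
bad_canary = [210.0] * 90 + [900.0] * 10

print(rollout_decision(baseline, good_canary))  # continue
print(rollout_decision(baseline, bad_canary))   # rollback
```

In the bad canary most requests are barely slower, so an average would look fine. The gate catches it because it is watching p95.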
What QA Actually Owns Here
I’ll be honest – QA does not own performance tooling. QA does not own SRE dashboards. QA does not own the production metrics pipeline. And that’s mostly fine ‘cos performance testing is something of a specialisation.
What QA does own is the question and whether proper actions have been built to deal with it, based on risk. “What does too slow mean to this user, and how will we know?” That question should show up in sprint refinement via acceptance criteria. It should show up in release readiness reviews. It should show up in exploratory test sessions. It should be set by the business as their line in the sand that QA maintains. QA is often the only one in the team who will keep asking these things out loud.
The teams that do performance well have a QA team that has made questioning a habit, a sprint team that has had the discussions and understands the risk involved in avoiding that question, and a business that has made that question part of its development processes. The ones that do it badly treat performance as “someone else’s problem”, refusing to consider proper solutions to help themselves. At least until customer support starts getting complaints. And by then, it’s a problem dragging everyone in.

The One-Page Performance Brief
A useful exercise for any change/feature with performance implications – one page, four things.
- Journey: Name the user journey you care about. Not the endpoint, because the impact lands on the user’s journey.
- SLO: What threshold, at what percentile, defines acceptable? Agreed by product, dev, and QA in the same room.
- Method: How will we test this before release? What will we watch in production?
- Gate: What result means “do not ship”? What result means “ship with eyes on”?
If you can fill this out in 30 minutes, you have a better performance strategy than most teams. It is not a full-blown plan by any means. But it is a shared understanding, which is worth a lot. This can also become part of any Jira ticket.
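If it helps to see the four things as data, here is one hypothetical way to write the brief down so the gate is unambiguous. The field names and the example thresholds are mine; pick whatever your team agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceBrief:
    """Hypothetical one-page brief: journey, SLO, method, gate."""
    journey: str
    slo: str
    method: str
    eyes_on_above_s: float      # observed p95 above this: ship, but watch it
    do_not_ship_above_s: float  # observed p95 above this: do not ship

    def gate(self, observed_p95_s: float) -> str:
        if observed_p95_s > self.do_not_ship_above_s:
            return "do not ship"
        if observed_p95_s > self.eyes_on_above_s:
            return "ship with eyes on"
        return "ship"


# Example values are invented
brief = PerformanceBrief(
    journey="checkout",
    slo="95% of checkout page loads complete in under 2 seconds",
    method="PR smoke check on staging, then p95 watched in production",
    eyes_on_above_s=2.0,
    do_not_ship_above_s=3.0,
)

for observed in (1.5, 2.4, 3.5):
    print(observed, "->", brief.gate(observed))
```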
I understand you may have a team looking after performance testing, and some of these decisions need to go through them and/or your DevOps Manager. Find a path that works for all of you, so you are working together and not against each other.
The Take
Performance testing done well is not just a report. It is a conversation that the test results feed into, supported by shift-left processes the team actually follow, to avoid repeat issues or surprises.
Start with the user journey. Define the SLO with the whole team in the room. Pick your method based on what will actually change a decision. Watch the production metrics, because that is where the real-world signal is. And let QA own the question of what “too slow” means, even when QA does not own the tooling or overarching decisions. Work as a whole (business) to share the understanding, the decisions, the effort, and the outcomes.
The PDF report in the SharePoint folder is not the goal. The goal is that when someone asks “is this release going to be slower,” the team already has the answer.
A Note on Context
Every business and every project is different. What works in one place won’t work in another, and that’s the point.
Nothing here is meant to be a step-by-step prescription. It’s guidance, drawn from my own experiences, and deliberately kept general to avoid pointing fingers anywhere.
Take what’s useful, ignore what isn’t, and adapt it to your own context. Or as Joe Colantonio always says: “Test everything and keep the good.”

