Quantitative vs Qualitative Reporting (Part 2)

In my last post, I talked about how I presented a set of individual quantitative reports to some management figures to illustrate how misleading they could be, and why it's necessary to understand the story behind the numbers rather than simply place value on the numbers themselves. After I had led them successfully up the garden path, I presented a low-tech dashboard for the same hypothetical project (Project 007). Before doing that, I asked them to put the quantitative reports to one side for the moment.

Low Tech Dashboard

After showing them the dashboard, the immediate reaction was positive.

‘Ah, this looks interesting’ they commented excitedly.

I could see their attention was immediately drawn to two things – Key Area and Comments. I guessed that if you've never seen a low-tech dashboard before (or any sort of dashboard for that matter), your attention is most likely going to be drawn to the elements that need little or no explanation. I could see their eyes flick between the extremities, trying to understand the data.

“So what’s the initial gut feel about this? What jumps out at you?” I began
“Well, it’s good to see the breakdown of key areas” they replied
“And you see value in that, why?” I pressed
Almost immediately, they said “Well, it’s more informative!”
“It’s more useful to see the key areas!”
“It allows us to better assess any risks!”
“And what are the risks here?” I asked
“Well…”

“Some fields may require a bit of explaining” I interrupted

“Yes, what does Effort actually mean?”
“What about Coverage?”
“Don’t forget Quality!”

“Let’s start with the simplest” I proposed, “Does everyone understand the fields Key Area and Comments?”
Heads nodded in unison. “Yes, they’re self-explanatory”
“Ok, so let’s look at Quality first” I suggested

Qualitative Key Quality

“So just think of this as a traffic light status”, I added.
This seemed to make sense. No real questions.
“Ok, so let’s add another”

Qualitative Key Effort

“So what about a ‘Medium’ value, so we have Low, Medium and High efforts?” they queried
“We could add that”, I replied “But the point of this field is to help you understand which key areas we’re focusing on at any point in time”
“So how do we know how much you’ve tested?” they added
Time for the coverage legend…

Qualitative Key Coverage

After a bit of head nodding and some confirmatory noises, the questions started to come:

“So if something is green does that mean you’ve tested it thoroughly?” they asked
“Not necessarily”, I replied

“…and does that mean that the coverage will be 3 if you’ve got an effort of High?” another followed
“Absolutely not”

“So what if something has an Effort of ‘None’ and Coverage of ‘2’?” they contested.
“That would depend,” I countered. “It could be that a key area merits inclusion in the dashboard but is tested implicitly as part of other key areas – I’d make that clear in the comments. However, as a rule of thumb I’d try to avoid that by stating that anything included in the dashboard will have an explicit test effort associated with it.”

“So there are no dependencies between effort, coverage & quality?”
“Not really, no.”

“Hmm, we like it,” they replied, “but how do we know when you’re finished?”
“Define ‘finished’” I countered
“Well, when all your test cases are complete” came the response
“Not really, but remember what we’re trying to achieve with this” I said “We’re trying to take your focus away from numbers so you can understand the story behind them. We could easily overload the dashboard with many other bits and pieces of information, but the effectiveness of this particular dashboard lies in its simplicity”

“Compare this to the previous graphs showing all the numerical information” I continued “All you’re being presented with as part of the ‘traditional’ reports are numbers – 70% complete, 100 test cases, 134 defects raised, etc. but how does that really help you when it comes to making important release decisions?”

“We agree, but what about an amalgamation of the two – dashboard and graphs?” they queried

“That’s of course a possibility, but previous experience of distributing such reports has suggested that your attention would be drawn to the graphs and numbers FIRST, and the dashboard LAST. Like I said, I want to shift your attention away from the numbers and understand the story behind them.”

“To address your original question of ‘how do we know when you’re finished?’ – well, I could test for as long as required and continue to gather new information about the product, but in terms of whether something is good to go or not, I would focus your attention on the ‘Effort’ and ‘Quality’ fields. If they are ‘Go Live’ and ‘Green’ respectively, then I’m suggesting that key area is ready for acceptance.”
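
To make the structure a little more concrete, here is a rough sketch of what a few dashboard rows might hold if you were to capture them as data. The key areas, values and comments below are invented purely for illustration – they're not taken from Project 007.

```python
# Hypothetical sketch of a low-tech dashboard held as data (illustrative values only).
# Effort shows where testing attention currently is, Coverage is a rough 0-3 scale,
# Quality is a traffic-light status, and Comments carry the story behind the values.

dashboard = [
    # (key area,        effort,    coverage, quality, comments)
    ("User management", "High",    2,        "Amber", "Concerns around permissions; see notes"),
    ("Reporting",       "Go Live", 3,        "Green", "Nothing outstanding"),
    ("Data import",     "Low",     1,        "Red",   "Blocked pending an environment fix"),
]

for area, effort, coverage, quality, comments in dashboard:
    # As described above: 'Go Live' effort plus 'Green' quality suggests that
    # the key area is ready for acceptance.
    ready = (effort == "Go Live") and (quality == "Green")
    print(f"{area:15} | {effort:7} | {coverage} | {quality:5} | ready: {ready} | {comments}")
```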

Our discussion went on but they admitted that the dashboard was a more effective way of communicating critical information.

Now, to put it into practice…… 🙂

Quantitative vs Qualitative Reporting (Part 1)

For many years I worked in structured environments where, in the field of (test) reporting, numbers were definitely king. The upper echelons of the management team demanded a quantitative style of report and that's what I delivered. I had suggested that an alternative style of reporting would be beneficial so that they could actually see and understand what was going on, but they loved their graphs too much to let go of them.

While this was somewhat disappointing, I never let the reporting style they demanded dictate my thought process when dealing with the unique set of circumstances and considerations that inevitably come with each and every project. For instance, I never concerned myself with whether the progress graph inched another percentage point towards the exalted 100% each week, but rather focused on learning as much as we could within the time we had.

I've always had a firm belief that there's more than one way to do a job. Whatever worked for you at Company A would almost certainly not work at Company B, as the context would be very different, and I've prided myself on being very adaptable in that respect. My current role is at a comparatively small company where reporting (mercifully) isn't governed by the digit. That said, when I came on board management did ask me to produce some reporting stats which they considered 'normal' – sadly, their definition of 'normal' turned out to be quantitative.

The management here however, are open to suggestion so with that in mind, I conducted a test on some management staff using different reporting styles, primarily quantitative and qualitative, and asked them for some honest feedback about what was really useful. I have the Rapid Software Testing course from last November to thank for so much inspiration in this area.

(Note to Shmuel Gershon – this was slightly different to the one I told you about earlier 🙂 )

I took a hypothetical project, Project 007 (Yesh, Mish Moneypenny!) and began providing them with snippets of what would constitute a potential quantitative report. We started with the progress graph:

Quantitative Test Progress

They felt comfortable with this graph.

“Ah, you’re making good progress I see!” they said “although you’ve stalled a bit. Anything we should be worried about?”
“Would you be less concerned if the last entry had advanced past 70%?” I queried
“Yes!” came the resolute reply
“Ok, so any progress is good?” I pressed
“Definitely”

So I decided to add another metric to the fray. [sarcasm] But not just any metric, oh no, this one is the daddy of (test) metrics, the top dog, the big kahuna. The metric that spawned a million high fives, popped champagne corks, and congratulation speeches on a job well done in that everything is now ‘fully’ tested. Ladies and gentlemen, I give you the test case status graph. [/sarcasm]

Quantitative Test Case Status

“Nice, looks like you’ll finish on time!” they quipped after a cursory glance. I must admit, this took me by surprise.
“How would you define ‘finished’”, I asked
“Well, the majority of test cases pass.” they replied. A few heads nodded sagely in agreement
“Did you check the legend at the top of the graph?” I asked
“No, wait….what does red mean….oh no, wait, this isn’t good at all. Oh dear!”
“Still think we’ll ‘finish’ on time?” I pressed
“No chance!” came the resolute reply

The discussion then quickly turned towards defects – how many, and of what severity – so I added another metric to the ensemble: defect arrival rate (which simply means the number of (project) defects raised on each day).
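
For anyone who hasn't met the metric before, the arithmetic behind it is trivial – it's just a count of defects grouped by the day they were raised (and the cumulative defect count that appears further down is a running total of the same data). A minimal sketch, with invented dates:

```python
# Defect arrival rate: defects counted per day raised (dates invented for illustration).
from collections import Counter
from itertools import accumulate

defect_creation_dates = [
    "2010-12-01", "2010-12-01", "2010-12-02",
    "2010-12-07", "2010-12-07", "2010-12-08",
]

arrival_rate = Counter(defect_creation_dates)            # defects raised per day
daily_counts = [arrival_rate[day] for day in sorted(arrival_rate)]
cumulative = list(accumulate(daily_counts))              # running total over time

print(dict(sorted(arrival_rate.items())))  # {'2010-12-01': 2, '2010-12-02': 1, ...}
print(cumulative)                          # [2, 3, 5, 6]
```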

Quantitative Defect Arrival Rate

There was around thirty seconds of synchronised head twisting, much like an audience watching a very rapid tennis match as they cross referenced the defect data with the test case data. It made me a little dizzy.

“Why weren’t there more defects being raised around the time the majority of the test cases were run?” they asked
(When asked for clarification, they meant between 7/12/10 and 15/12/10.)
“I think you’ll find there are around the same number of defects raised in that time frame” I replied.
They considered this for a moment, “So you failed thirty test cases on a set of Minor defects?”
“Yes” I confirmed. “Would you rather I had passed the test cases despite the defects?”
“No, of course not” came the reply.
They looked puzzled “But what about all those Major and Critical defects created between the 1st and the 7th?”
“Ah, the majority of those were encountered while not actually running any test cases” I stated
They took a moment once again to consider this.
“You are fully testing this product, aren’t you?” they asked
“Can you define ‘fully tested’, please?” I countered
[Lots of mumbling…]
“Well, fully tested obviously means testing all the requirements” they agreed
“Ah, I suspected you might say that – all requirements are ‘checked’ as part of the test cases,” I assured them.
[More mumbling…]
“Would it help if I provided one more metric?” I queried
“Yes!”

Enter the cumulative defect count…….

Quantitative Cumulative Defects

[Even more mumbling…]
“This doesn’t help us” they retorted.
(I didn’t expect it would, but wanted to see their reaction)

Now obviously I was leading them up the garden path and I was deliberately playing devil’s advocate. Why? To illustrate to them that misuse of numbers as part of reporting can be dangerous and misleading. Although I’ve paraphrased a lot of the dialogue here, I don’t want to portray my management in a bad light, or for you to have a poor opinion of them. To their credit, once they saw the failure of this particular reporting style (in this particular context) they were open to alternatives but I wanted to dispel any expectations that numeric based reports would be a way forward for them.

Enter the ‘Low-Tech Dashboard’ (and their response) – coming in part 2 – watch this space.

Battling the Bias

There seems to be a particular word that crops up again and again in blogs, papers and presentations about software testing and beyond:

Bias

Honestly, it’s everywhere. Some examples:

Observational Bias (Darren McMillan, Requirements Analysis & Testing Traps)
Darren quite rightly points out the dangers of having visual references (wireframes) at a very early stage in the project lifecycle that could take your attention away from something more fundamental within the text of the requirements themselves.

Reporting Bias (Pete Houghton, Conspicuous in their Absence)
In this rather superbly written blog post, Pete highlights how valuable information can be skewed to make a problem seem less severe (e.g. <1% of our customer base use *that* browser, so we can't do XYZ). Can anyone honestly claim they've never had someone try to sway their opinion with an argument like this?

Survivorship Bias (Pete Houghton, Conspicuous in their Absence)
In the same post, Pete extends the examples of survivorship bias to an advertising slogan you've probably seen dozens of times in a multitude of places.

Confirmation Bias (Michael Bolton, I wouldn’t have seen it if I hadn’t believed it: Confirmation Bias in Testing)
Perhaps this particular bias is the big daddy of all biases, since it has so many variations – see slide number six. If you only looked at slide six, you might wonder how you could ever engage in any testing activity without suffering from one form of bias or another. Thankfully, Michael provides some really useful tips for escaping confirmation bias.

Anchoring Bias (Michael Kelly, Anchoring Bias)
In this rather odd piece, Michael talks about how he was struggling to come up with ideas about how to test a second iteration of software coming his way, but while walking up the stairs to a meeting he decided to simply sketch out a schematic of sorts and talk through his ideas (not necessarily solutions). There’s a heuristic to be found in there somewhere – “Talk it through with your mates” heuristic? I can appreciate this particular problem though and admit to being influenced by a little bit of anchoring bias in some projects many years ago.

Congruence Bias (Pete Houghton, The Arrogance of Regression Testing)
“We stop looking for problems that we don’t think are caused by the new changes.” claims Pete. How true. This has a particularly strong resonance as I believe it’s so easy for something like this to adjust our mindset if we’re not aware of it. Good post – go read.

Emotional Bias (Wikipedia)
I couldn’t find any specific examples of emotional bias for software testing, but it did prompt the question “How would your approach to testing change if you were asked to test something that you had a strong (adverse) emotional reaction to?” (Ignoring for now the obvious option of saying “I would find another job”). For example: a weapons targeting system, an adult entertainment website, etc. Perhaps this is more of an ethical bias?

And there are many others, such as Automation Bias, Assimilation Bias (nothing to do with the Borg, Star Trek fans), etc. So as you can see, there are quite a few biases out there, and you may wonder how testers even get out of the starting blocks with so many possible ways for their judgement and work to be skewed.

That particular thought reminds me of a problem a well-known UK darts player had back in the late eighties. Eric Bristow was at the top of his game, had won the world title five times, and was pretty much considered the Muhammad Ali of the darts world. However, he started having problems releasing the darts and was diagnosed as suffering from a condition known as Dartitis. The condition is believed to be psychologically rooted, and this was the first time it had come to public attention, because Bristow was so well known.

I do wonder now whether it was some sort of bias that caused the condition. Taking into account the pressure of managing stress and expectations when playing at world events, could he have been over-scrutinising one or more parts of his technique, leading – without him realising it – to an unbalanced approach and, ultimately, to the physical manifestation of the condition?

In comparison, I wonder if there’s ever been a reported case of something called Testitis #1 – the psychologically rooted condition that stops testers dead in their tracks, unable to test for fear of any sort of bias impinging on their work. John Stevenson even suggests that bias can be infectious. So not only could testers be afraid to actually do any testing, they could also be unwilling to work in a team for fear of bias contamination? Haz-mat suits at the ready!

I've raised a lot of problems here and not really provided any remedies (though you can find excellent suggestions in the links above), but at some point in the future I'll post another blog with examples of what I've personally done to combat bias.

#1 – Don’t Google Testitis, by the way – it doesn’t exist, and you’ll probably get some undesirable results 🙂

One Line Change

Over the years, I've heard one phrase consistently used by developers across different organisations to try and mitigate the risk associated with one or more deliverables due for the test team. These deliverables tend to address specific issues, such as a critical bug that needs to be fixed in order for testing to continue, or a particular business need requested at the 11th hour.

Naturally, there's a time pressure aspect to all this, so unit & integration testing of the deliverables may have been fast-tracked or even omitted altogether. However, the phrase I hear again and again is:

“Don’t worry, Del. It’s only a one line change!”

At first, I was astonished that so many developers resort to the same simplistic method of assessing the risk of their code changes, but I've also seen the same principle employed on a wider scale to assess the risk of a complete subsystem at end-of-project checkpoints. It's one of many meaningless metrics presented to stakeholders and labelled 'code churn'. The bizarre thing about code churn is not just the metric itself (or indeed why people place so much importance on it), but the binary nature of the stakeholders' reactions (technical and non-technical alike). The reactions I've witnessed are as follows:

Reaction 1: A subsystem with a code base of 5,000 lines has a churn of 500 lines (10%).
Stakeholders nod sagely, secure in the knowledge that any risk is minimal.
The figure is low.
This is good.

Reaction 2: A subsystem with a code base of 5,000 lines has a churn of 2,500 lines or more (>=50%).
Stakeholders collectively have a sharp intake of breath. They look at each other in distress.
The figure is high.
This is bad.

Even more absurdly, the tension from the second reaction is quashed by some equally flawed statistical mitigation from the subsystem owner. Rarely have I heard stakeholders say “well, let’s see what the testing of that subsystem revealed” and put their reaction on hold. Testing is, after all, about “gathering information with the intention of informing a decision” (Jerry Weinberg).
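
For what it's worth, the arithmetic behind the metric couldn't be simpler, which is perhaps part of its appeal. A quick sketch – the line counts are taken from the reactions above, everything else is invented for illustration:

```python
# Code churn is just changed lines divided by total lines. It says nothing
# about *which* lines changed, or what those lines affect.

def churn_percentage(changed_lines: int, total_lines: int) -> float:
    return 100.0 * changed_lines / total_lines

print(churn_percentage(500, 5_000))    # 10.0 -> "low, this is good"?
print(churn_percentage(2_500, 5_000))  # 50.0 -> "high, this is bad"?

# A one-line change to a pricing calculation and a 2,500-line comment tidy-up
# score very differently here, yet the riskier change is arguably the 'low churn' one.
```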

Anyway, to get back to the developers and their one line change. I counter their statement in an attempt to adjust their thinking by suggesting the following scenarios:

Would a passenger be willing to fly in an aircraft if the maintenance engineer had told them,
“Don’t worry, the aircraft is only missing one bolt!”?

Would a heart bypass patient be any more comforted if the surgeon had told them,
“Don’t worry, I’m only going to make one incision!”?

Trying to equate risk with a numerical value of any kind is obviously foolhardy, yet the practice remains widespread throughout our industry. We must endeavour to educate developers and stakeholders alike with meaningful information gathered from testing so they can start to see the real risks and issues and break free of these numerical shackles.

Software Testing ShuHaRi

Having been a martial arts practitioner for over 25 years, I've long been aware of a concept that could loosely be applied to software testers making the transition into the world of exploratory testing from what we may refer to as 'traditional' testing (dare I say, ISTQB :-)).

This concept is known as Shu Ha Ri

ShuHaRi

And in very simple terms it deals with three stages of learning – Imitation, Understanding & Transcendence.

Shu

In martial arts terms, when you begin training you spend your time blindly copying your instructor, absorbing everything he/she imparts, eager to learn and willing to accept all correction and constructive criticism on offer. Shu stresses basics in an uncompromising fashion so you have a solid foundation for future learning. Most students at this stage perform techniques in identical fashion even though their personalities, body structure, age, and abilities all differ.

In testing terms, this may be indicative (at least for those of us a bit longer in the tooth) of how we were introduced into the world of software testing as newbies. It may even reflect how some people *still* teach software testing today (I therefore use the term in its loosest sense in this particular context). Pillars of flawed wisdom such as:

– All knowledge about testing the system is embedded in your test cases.
– To demonstrate full test coverage, you have requirement-to-test case mappings.
– The more test cases you have, the better coverage you’re getting.

The above statements are obviously laughable, but in terms of your teachings, I think it’s probably safe to say they were imparted in good faith, and your mentor was trying to provide you with a prescribed approach (however misguided) to testing that would cater for most of the problems you would encounter in your testing career. I’m sure there are people out there who still use this flawed logic and feel they’re doing a pretty good job.

Ha

In martial arts terms, when you reach a certain level of proficiency (generally, Dan grade or above), you begin to break free of the fundamentals and apply the principles acquired from the practice of basics in new, freer, and more imaginative ways, as well as beginning to question and discover more through personal experience.

In testing terms, I’m sure we can all appreciate the situation where a project/product has gone live, a critical defect has been discovered that escaped the attention of the test effort, and your manager has invariably asked “But we tested against all the requirements, how could this have happened?”

It’s at this point when you realise that blindly following your prescribed approach wasn’t going to cut it (although one would hope that the proverbial penny would have dropped some time before) and you understand that your approach will need to be adapted (or better, revamped) to fit in with your current situation. This doesn’t have to be a solo journey either; it can be done in collaboration with your mentor as you seek new ideas and solutions.

Ri

In martial arts terms, this is when you reach a high degree of proficiency and have absorbed all you can learn from your teacher. You are now learning and progressing more through self-discovery than through instruction, can give an outlet to your own creative impulses, and your techniques carry facets of your character and personality.

In testing terms, you will be proficient at constructing and blending approaches to best suit your current context, largely influenced by your experiences to date, which will most likely outweigh the formal teachings handed down in the prescribed approach.

One may be tempted to use terms such as mastery here, but I believe transcendence is more appropriate – why? The term mastery traditionally suggests that there’s nothing else to learn and I firmly believe there’s always something new to learn. Besides, who’d ever want to be in that situation? Learning and the continual opportunity to learn is probably the single most invigorating facet of a career in software testing. The same is true in martial arts, even after 25 years I’m still learning new things as my understanding grows deeper.

I think it’s safe to say that our experiences largely dictate the kind of testers we become and the approaches we take, but we should all continually strive to better ourselves through sharing of experiences and active discussion (amongst other things) and with a thriving online community this is made so much easier.

So wherever you are on your journey, take comfort in the fact that there’s always something new to learn.