Building a Model for Quality Maturity.

Craig Risi
Sep 14, 2021
Updated: Sep 15, 2021

Software quality is something most development teams aspire to when they release software to production. Producing high-quality software is not easy, though, and takes a lot of dedicated effort and planning by a team to get right.

I have shared various thoughts on that effort, and the strategies behind it, on this blog. However, just because these best practices exist doesn’t mean teams follow them. Bad habits and cultures can easily infiltrate teams, causing them to compromise on good quality practices and leaving them struggling to deliver the standard of software they intended.

This is a difficult problem to solve, especially for big companies with many cross-functional teams, where it’s difficult to retain core oversight of how those teams function. This is why things like Maturity Models exist. They give companies a way to audit software development teams, measure how well those teams adhere to certain practices, and incentivize them to adopt those practices in order to score higher.

The problem with most maturity models, though, is that they go out of date very quickly: best practices defined several years ago are not always a good idea when working with modern architectures and development processes. Secondly, having teams adhere to certain processes can simply create a tick-boxing exercise, where teams add processes for the sake of it without fully grasping their purpose, creating unnecessary admin and slowing down their overall delivery with no actual quality gain.

This is why, aside from companies measuring teams more effectively through a range of internal quality metrics, it’s also important for these maturity models to have some form of enforced metrics, to assess not just whether teams are following the right processes but whether they are actually delivering on them too. And while there is no one-size-fits-all solution here, below is a model that can help teams include aspects of both process control and successful execution. It can be quite complicated to implement, depending on the tools in use and the maturity of a company, but where gaps exist it will hopefully encourage teams to rethink these issues and start using tools that help them measure these things better.

Similarly, whereas I define the measurement criteria for some areas quite precisely, others are a little loose, largely because there may be vast differences in how they are measured across tools and architectures. In these cases, I would encourage you to understand the core principle of what each metric is trying to measure and see how it can best be achieved given a team’s current tools and approach, without compromising on any of the core measurement criteria, which are all important.

The core behaviours that you want to drive through a model like this are:

Quality is the responsibility of every member of the team, and all should be playing their part in delivering high-quality software.
Software testing should be automated as close to the code as possible, with a focus on unit testing.
The appropriate automation coverage is achieved at the different levels of the software.
Defects are tracked and resolved, with the appropriate root causes identified.
Overall defect leakage is low for the team.
Performance and security measures are in place and showing results.

To do this, the model assesses how a team collaborates on quality, the tooling in place, whether the reporting of quality is correct, and the overall processes followed for testing and quality-related issues, along with some NFR (non-functional requirement) specifics around performance and security.

Specific focus areas and metrics to be assessed across these measures include:

Early involvement of testing
Test Driven Development, where tests are identified and unit tests scripted before development has started
Code Coverage for tests
The prescribed testing framework and tools are being utilised
Automation coverage across the appropriate levels (Unit, Component API, Front-end, Visual, End-to-end)
Pipelines in place for all testing levels, with pipeline execution times of adequate quality
Security scanning in place - with a scoring to ensure teams adhere to it
Whether the appropriate performance testing is in place, and where
Measurement of software performance against benchmarks
Monitoring in place across both test environments and production
Defect leakage rate (what the team misses in their sprints)
Defect process (is the prescribed defect process adhered to with proper triage and RCA processes in place)
Sprint retros are taking place, with issues and mitigation plans appropriately documented to drive improvement

How the Scoring Works

The model below scores seven areas, each worth up to 100 points, which are then added together to provide a score out of 700.

The seven scoring areas are:

Code Coverage Scoring
Pipeline Scoring
E2E Test Coverage Scoring
Process Scoring
Defect Scoring
Performance Scoring
Security Scan Scoring

These scores will be collected from a combination of tooling, which identifies the coverage rates and defect calculations, and a quarterly audit in which teams are scored on how well they are performing. Teams are incentivized to improve their scores each time and face mitigation actions where scores are unsuitably low.
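As a rough illustration of how the seven area scores roll up into the total, here is a minimal Python sketch. The class and field names are hypothetical and not part of the model itself; the only thing taken from the model is that seven area scores are added together to give a result out of 700.

```python
from dataclasses import dataclass, fields

MAX_TOTAL = 700  # seven areas, each worth up to 100 points


@dataclass(frozen=True)
class MaturityScores:
    """One team's scores for a single audit (field names are illustrative)."""
    code_coverage: float
    pipeline: float
    e2e_coverage: float
    process: float
    defect: float        # may be negative if defect leakage exceeds 100 percent
    performance: float
    security_scan: float

    def total(self) -> float:
        # Add the seven area scores together to get the maturity score.
        return sum(getattr(self, f.name) for f in fields(self))


team = MaturityScores(80, 60, 50, 70, 90, 65, 75)
print(f"{team.total()} / {MAX_TOTAL}")
```

Keeping each area as a separate field makes it easy to show teams where their points come from, rather than only the combined number.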

Code Coverage Scoring

This score should be based on the real code coverage of the application under test, across all measured tests.

Pipeline Automation Scoring

0 – No pipeline automation

20 – Unit & integration tests are running after merge to the main branch

40 – Unit & integration tests are running on every pull request

60 – E2E tests run after a merge to the main branch

80 – Integration tests validate dependent services before merging to the main branch

100 – Full e2e tests running before merging to the main branch
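The rubric above could be captured in code so that audits record a named level rather than a bare number. This is only a sketch: the enum member names are invented for illustration, and it assumes a team is scored at the highest level it has reached.

```python
from enum import IntEnum
from typing import Iterable


class PipelineLevel(IntEnum):
    """Pipeline automation levels and their scores (member names are illustrative)."""
    NO_AUTOMATION = 0
    UNIT_INTEGRATION_AFTER_MERGE = 20
    UNIT_INTEGRATION_ON_PULL_REQUEST = 40
    E2E_AFTER_MERGE = 60
    INTEGRATION_VALIDATES_DEPENDENCIES_BEFORE_MERGE = 80
    FULL_E2E_BEFORE_MERGE = 100


def pipeline_score(levels_reached: Iterable[PipelineLevel]) -> int:
    # Assumption: the team is scored at the highest level it has reached.
    return int(max(levels_reached, default=PipelineLevel.NO_AUTOMATION))


print(pipeline_score([
    PipelineLevel.UNIT_INTEGRATION_ON_PULL_REQUEST,
    PipelineLevel.E2E_AFTER_MERGE,
]))
```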

E2E Test Coverage Scoring

0 – You do not verify the app is available

25 – You verify the app is available

50 – You verify the critical path scenarios

75 – You verify negative path scenarios

100 – You verify all path scenarios
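One possible reading of this scale is as a ladder, where each level only counts once the levels below it are met. The sketch below takes that reading as an assumption (the model does not state it explicitly), and the check names are hypothetical labels for the four levels.

```python
from typing import Iterable

# (score, check) pairs in ascending order; check names are illustrative.
E2E_RUNGS = [
    (25, "app_available"),
    (50, "critical_paths"),
    (75, "negative_paths"),
    (100, "all_paths"),
]


def e2e_coverage_score(verified_checks: Iterable[str]) -> int:
    """Climb the ladder until the first check the team does not verify."""
    verified = set(verified_checks)
    score = 0
    for points, check in E2E_RUNGS:
        if check not in verified:
            break  # assumption: higher levels do not count if a lower one is missing
        score = points
    return score


print(e2e_coverage_score({"app_available", "critical_paths"}))
```

Under this reading, a team that verifies negative paths but skips the critical path scenarios would still score 25, which keeps the focus on the basics first.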

Process Scoring

This is a measure of how well teams are adhering to certain testing processes. The processes identified here are good ones to have, but these will vary based on teams and culture and so should be adjusted to suit the needs of the company.

Defect Scoring - Defect Leakage

While there is more to tracking defects than pure defect leakage, we will use the metric below to assess the defect leakage score, as it measures what teams are missing in their project cycle. The reason for using a formula that counts defects both before and after a release is that it helps to factor in the size and complexity of that release: bigger releases will likely have more defects post-release, but the team should have identified more beforehand as well, which makes the metric fairer.

CD = No. of defects raised after release

DR = No. of defects raised before release

ID = No. of invalid defects among those raised before release (e.g. duplicate, cannot fix, error in test environment)

Formula to calculate Defect Leakage:

Defect Leakage = 100 - ((CD / (DR - ID)) * 100)

We calculate the defect leakage as a percentage and then subtract it from 100, so that better performance gives a higher score. Should the defect leakage be over 100 percent, the result is a negative number, which is added to the remaining scores and so lowers the total maturity model score.
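To make the arithmetic concrete, here is a small Python sketch of the formula. The function and parameter names are my own, and raising an error when there are no valid pre-release defects is an assumption, since the formula does not say how to handle that case.

```python
def defect_leakage_score(cd: int, dr: int, invalid: int) -> float:
    """Return 100 minus the defect leakage percentage.

    cd      -- defects raised after release (CD)
    dr      -- defects raised before release (DR)
    invalid -- invalid defects among those raised before release (ID)
    """
    valid_before_release = dr - invalid
    if valid_before_release <= 0:
        # Assumption: the formula is undefined here, so fail loudly.
        raise ValueError("need at least one valid defect raised before release")
    leakage_percent = (cd / valid_before_release) * 100
    return 100 - leakage_percent


# 25 defects before release, 5 of them invalid, 5 found after release:
# leakage is 5 / 20 = 25 percent, so the score is 75.
print(defect_leakage_score(cd=5, dr=25, invalid=5))

# Leakage over 100 percent produces a negative score.
print(defect_leakage_score(cd=30, dr=25, invalid=5))
```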

Performance Benchmarks

Once benchmarks have been clearly established, the performance of an application will be scored on how frequently performance runs take place and on the difference between the measured results and the benchmark scores. The following metrics will form part of this benchmark measurement:

All of these scores should have a predetermined benchmark based on the needs of the functionality they support. The scores will then be determined based on the combined scores across all scenarios under test for the team:

Security Scans

Software scans need to be run against the code to ensure that security best practices are adhered to and that known vulnerabilities have been mitigated. Scoring depends on the tools chosen to measure the security of the code and will need to be adjusted accordingly. The scoring below is advised when using tools like Black Duck or Checkmarx which, apart from raising security risks to the development team, also provide a score out of 10 (or 100) that will be used to measure how a team performs in this area. Note, though, that security scan scores can easily be improved by excluding certain modules from scanning, so only scores from scans that cover all developed code will count:

Scoring

Tool score out of 10: Security Scan score = security tool score from 100% coverage * 10
Tool score out of 100: Security Scan score = security tool score from 100% coverage
In the absence of a scoring system, you can leverage a percentage of failures versus the lines of code as a score to measure.
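The two tool-score cases above can be normalised with a few lines of Python. This is a sketch only: the function name and parameters are hypothetical, and rejecting partial scans with an error is an assumption about how to enforce the rule that only scans of all developed code count.

```python
def security_scan_score(tool_score: float, scale: int,
                        all_code_scanned: bool = True) -> float:
    """Convert a security tool's score (out of 10 or 100) to a score out of 100."""
    if not all_code_scanned:
        # Assumption: a partial scan is rejected rather than given a score.
        raise ValueError("score only counts when all developed code is scanned")
    if scale not in (10, 100):
        raise ValueError("scale must be 10 or 100")
    if not 0 <= tool_score <= scale:
        raise ValueError("tool_score must lie between 0 and the scale")
    # Out of 10: multiply by 10. Out of 100: use the tool score as-is.
    return float(tool_score * 10) if scale == 10 else float(tool_score)


print(security_scan_score(7.5, scale=10))
print(security_scan_score(82, scale=100))
```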

This might all sound like a complex set of calculations and measurements to maintain, especially for companies with a high number of teams. However, most of these measures can be tracked easily through tooling and made visible to teams live, with only the basic process audit requiring a deliberate manual effort to score. Even this can be tracked through the evidence provided by certain process management or sprint tools to make things easier. Ultimately, you can remove this measure as well if you feel the other scores will help drive the right testing behaviours.

It would be great not to need a maturity model, especially in this DevOps world where the majority of quality should be controlled in a coding pipeline. However, the truth is that many teams find themselves in very different spaces depending on their history, architecture and purpose, so a maturity model is a way of helping teams move in the right direction without restricting their ability to stay productive.