Quality Insight

Written by Dilpesh Bhhesania (IBM), and Richard Jordan (Nationwide Building Society).

Quality is not an act; it is a habit. Aristotle’s observation holds true today, whether applied to a product or a process. Quality is the foundation of everything an organisation strives for. Although quality can be subjective, having a way to measure the quality position is important. There are many reasons teams might choose to measure quality in the context of product delivery or an application; some of these are discussed here.

Traditional Waterfall delivery required that the product’s perceived quality be verified by a separate testing group, which would design and execute a series of tests to verify that the solution met the organisation’s requirements. This approach raises the question of who is ultimately responsible for quality. The relevant point here, however, is the type of metrics that were used to evaluate the quality of a product or process.

In traditional delivery, quality is judged by the number of defects found, either in total or by the rate at which they are detected. This should not be confused with Defect Density, which is a separate metric. The total number of defects raised is simply a numerical value: the higher the number, the lower the perceived quality.

There are other areas that measure quality:

  • Test Status – Passed/Failed/Blocked/No Run
  • Test Schedule – Progress against Plan
  • Planned vs. Actual

These metrics were presented in different flavours, divided between platforms, applications, teams, etc., alongside call-outs of top blockers, RAG statuses, and the path to green. Areas with more mature reporting might also include metrics such as the total effort and/or cost associated with testing, feature stability, etc.

These metrics were useful for showing progress in testing, but they did not show the quality of the product or of the processes. They also failed to provide a consistent, unified view of quality throughout the entire delivery lifecycle.

A more controversial view is that in traditional delivery structures, teams produced metrics simply because management told them to. Project managers preferred to see numbers focused on the count of tests, on the view that more tests equals better quality and progress. It was difficult to show the value of testing and, more importantly, of quality when metrics were not properly interpreted.

There is a common misconception that agile methods automatically bring more collaboration. This is not true, especially if an organisation’s culture or mindset remains traditional.

Quality is not confined to one area. This article aims to highlight some measures that can be used to show the quality of a product and to embed quality throughout the lifecycle. Rather than the Test function simply using these metrics to show progress, they encourage engineers to adopt a “Quality first” approach and to actively look for areas where improvements or efficiencies can be made.

It will also identify efficiency measures that can complement the overall reporting structure. The key, however, is to help people shift to the mindset that metrics are meant to provide valuable insight into products, processes, and teams. Blindly producing “throwaway” metrics simply because that is how it has always been done is a hindrance to an organisation’s Quality transformation journey.

Why should teams measure – Purpose

There are many reasons teams might use certain metrics to track progress, efficiency, and quality. For example:

  • Evaluate Product Health – A quick visual and/or numerical indicator of the “healthiness” of your product can help you determine where to focus to correct or maintain your current trajectory.
  • Track Progress – Metrics can be used to determine progress and to understand when a goal or story will be achieved. They also surface any obstacles a team might be facing that could hinder progress and ultimately affect the quality of the product.
  • Requirements Coverage & Traceability – Quickly determine where there are gaps in coverage and areas of risk
  • Uncover Inefficiencies – This can come from many perspectives, including code rework, value realization, time to market, etc. A set of metrics that shows where inefficiencies exist throughout the product’s lifecycle makes it easier to identify areas that need remediation.
  • Articulate Product Complexity – You might ask, “Why should I care about Product Complexity?” Many organisations wouldn’t consider this a quality indicator or measure it. Poorly managed complexity can lead to poor-quality products: bloated features, inaccuracy, or poor design. This not only leads to more rework and higher costs, but also reduces value. McCabe’s Cyclomatic Complexity helps you understand how complex a product might be; a complex problem does not necessarily require a complex product. This metric can also help inform unit test coverage (see the sketch after this list).
  • Adherence to Acceptance Criteria/Definition of Done
  • Articulation of risk / Problem areas – Metrics can give valuable insight into the risks that may be present in both the product and the process. It is rare to see metrics that help to define risk positions, which can be used in other areas.
  • Feedback Loops – Increasing the frequency with which metrics are issued creates feedback loops, either adding more feedback loop instances or creating one where none existed before.
  • Investment Payoffs – Quantify the impact of certain strategies or investments on the quality of a product, or on a team’s performance.
  • Automation effectiveness – This will help you determine whether your Automation strategy is helping to deliver your product(s), without compromising on quality.
  • Accountability – Quality is everyone’s responsibility, and everyone should be held accountable for it; metrics alone are not enough.
  • Speed to Market / Product Release – End users want functionality quickly; this is a constant demand in today’s rapidly changing technology landscape. Looking at how quickly features are released can give insight into whether your customer base is being satisfied.
  • Trend Analysis/Continuous Improvement – Determine quickly if improvements are paying off, or if quality has dropped. This view allows you to make earlier corrections and remediations.
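To make the complexity point above more concrete, here is a minimal, illustrative Python sketch (not a full implementation of McCabe’s metric) that approximates cyclomatic complexity by counting decision points in each function’s abstract syntax tree; in practice a static-analysis tool would normally supply this number.

```python
# Illustrative only: approximate cyclomatic complexity by counting decision
# points in each function's AST. Real tooling would normally provide this.
import ast

DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp, ast.comprehension)

def approximate_complexity(source: str) -> dict:
    """Return an approximate cyclomatic complexity per function in `source`."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Complexity starts at 1 and increases with every decision point.
            decisions = sum(isinstance(child, DECISION_NODES)
                            for child in ast.walk(node))
            results[node.name] = 1 + decisions
    return results

sample = """
def classify(value):
    if value < 0:
        return "negative"
    elif value == 0:
        return "zero"
    return "positive"
"""
print(approximate_complexity(sample))  # {'classify': 3}
```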

DevOps Research & Assessment (DORA) Programme

DORA was a seven-year research programme that gathered data points from more than 32,000 IT professionals worldwide. Google acquired it in 2018.

In 2018, the DORA team also published the book “Accelerate”, in which they identified a core group of metrics that can be used to show the capability of software delivery. These four metrics are designed to help teams make informed, data-driven decisions so they can improve their delivery rate, working practices, and product reliability. The research found that top performers continually strive to improve against these four metrics.

  • Change Lead Time – This measures how long it takes for a team’s code to be running in production after it has been committed in a development environment. It can indicate how mature the delivery process is and also highlight inefficiencies. A lack of CI/CD pipelines and shared environments, isolated testing and development teams, and cumbersome route-to-live processes can all lead to longer lead times. Teams that achieve “elite” status can have code running in production within a day, while others may only deploy monthly, quarterly, or even half-yearly. Long lead times increase the risk of trying to deliver too many features at once, which can lead to poor user experience or system outages.
  • Deployment Frequency – This shows how often changes are deployed to production. It can be applied at any level of the organisation, and where teams have greater maturity in DevOps practices it can also be applied locally; in both cases the underlying insight is still valid. Teams strive to make smaller, more frequent production changes, which speeds up the realisation of benefits for end users and has the added advantage of reducing the chance of regressions or production failures. Teams striving for elite status can deploy almost on demand, while teams with less DevOps maturity may only release monthly, quarterly, or even half-yearly. Deployment frequency is easy to calculate and can help determine whether other impediments are preventing faster, more frequent production releases.
  • Change Failure Rate – This shows how often a change pushed into production results in a failure, whether a complete outage, degraded service, or inaccessibility; in short, when a change enters production and requires a fix. A team can measure its change failure rate to assess the maturity of its deployment process, whether manual or fully integrated, and to highlight quality issues that may have been missed earlier in the product’s lifecycle. Quality should always take priority over quantity. The organisation can set failure-rate targets based on its risk appetite, but teams should aim for near-zero failure rates. More mature, “elite” teams will achieve near-zero failure rates thanks to their team culture and the other metrics discussed in this article.
  • Mean Time to Recovery (MTTR) – This measures the reliability of your applications and systems: the time it takes an organisation to recover after an incident or outage. Failures are inevitable, but being able to recover from them quickly is important. Teams that aim for elite status can recover in minutes or hours, while others may take weeks or months. This information can provide valuable insight into other areas, such as whether an organisation has adequate alerting and monitoring capabilities, or whether the number and size of production deployments need to be reviewed. Smaller, iterative releases are better for users and reduce the chance of production failures. (A sketch of how these four metrics might be computed follows this list.)
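As an illustration only, the sketch below shows one way the four DORA metrics could be derived from simple deployment and incident records; the field names and figures are invented for the example and are not taken from any particular tool.

```python
# Illustrative only: deriving the four DORA metrics from simple deployment and
# incident records. Field names and figures are invented for the example.
from datetime import datetime
from statistics import mean

deployments = [
    {"committed": datetime(2024, 3, 1, 9),  "deployed": datetime(2024, 3, 1, 15), "failed": False},
    {"committed": datetime(2024, 3, 2, 10), "deployed": datetime(2024, 3, 3, 11), "failed": True},
    {"committed": datetime(2024, 3, 4, 8),  "deployed": datetime(2024, 3, 4, 12), "failed": False},
]
incidents = [{"started": datetime(2024, 3, 3, 11), "resolved": datetime(2024, 3, 3, 13)}]
window_days = 7  # reporting window

# Change Lead Time: average commit-to-production time (hours).
lead_time_h = mean((d["deployed"] - d["committed"]).total_seconds() / 3600 for d in deployments)

# Deployment Frequency: deployments per day over the window.
frequency = len(deployments) / window_days

# Change Failure Rate: share of deployments that caused a failure in production.
failure_rate = sum(d["failed"] for d in deployments) / len(deployments) * 100

# Mean Time to Recovery: average time from incident start to resolution (hours).
mttr_h = mean((i["resolved"] - i["started"]).total_seconds() / 3600 for i in incidents)

print(f"Lead time {lead_time_h:.1f} h | {frequency:.2f} deploys/day | "
      f"failure rate {failure_rate:.0f}% | MTTR {mttr_h:.1f} h")
```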

The DORA metrics are not the only way to measure quality, stability, or organisational maturity, but combined with other metrics they can be a useful starting point for organisations and teams.

What should teams measure – Quality & Efficiency Indicators

The metrics below can help to show the efficiency and quality of a product, team, or organisation. Where possible, each metric is linked to one or more of the DORA metrics; some cannot, or should not, link back to DORA, which illustrates that four metrics alone will not provide comprehensive insight. These metrics are not intended to be prescriptive, but rather to shift the mindset from “We have to provide metrics because that’s what’s being asked” to “What are these metrics actually telling us about my organisation’s position on quality?”

Each entry lists the metric/indicator, its type, a description/primary objective, the calculation, additional information, and the DORA metric(s) it links to.
Manual vs Automated Testing
Type: Efficiency
Description/Primary Objective: Helps to understand the split between manual and automated testing and to track progress towards higher levels of automation. Higher levels of automation generally result in greater efficiency, not just because of shorter run times but also because people can spend more time on other tasks.
Calculation: Automated Test Coverage = (Total No. of Automated Tests / Total No. of Tests) * 100
Additional Information: This metric can supplement the case for retaining some manual tests by helping to articulate why certain elements should never be automated, e.g. higher complexity, limited rate of execution, low rate of change, etc.
DORA Link: Change Lead Time; Deployment Frequency
Test Suite Run / Execution Time
Type: Efficiency
Description/Primary Objective: Helps you understand the time it takes to execute a set of tests, whether they run as part of a DevOps/CI/CD process or stand-alone. Paired with the automation measure, it is a good way to show how quickly value can be demonstrated.
Calculation: Run/Execution Time = End Time – Start Time
Additional Information: Although the time factor may not appear valuable on its own, it starts to raise questions such as: How sharp is the feedback loop? Is the time taken to run a test suite too long compared to the amount of change? Am I running too much or too little? Is it possible to target areas for verification based on the code that changed?
DORA Link: Change Lead Time; Deployment Frequency
Automation Testing Stability
Type: Efficiency
Description/Primary Objective: Extremely useful in determining the stability of automation collateral. Tests that fail frequently or require constant refactoring can be a sign that the product is not being tested effectively. Stability should show an upward trend over time.
Calculation: Automation Test Stability = (Total No. of Failures / Total No. of Executions) * 100
Additional Information: This should be calculated per test to avoid skewing the overall view of stability; for example, only a small subset of tests might be problematic.
DORA Link: Change Lead Time
In-Sprint Automation vs. Out of Sprint (Value Realisation)
Type: Efficiency
Description/Primary Objective: Helps show how much value was realised within the sprint through automation. All new features delivered during the sprint should be paired with automated test assets, making integration into CI/CD pipelines more efficient.
Calculation: In-Sprint Automation = (No. of Automated Tests Created in Sprint / Total No. of Tests Created in Sprint) * 100
DORA Link: Change Lead Time
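As a minimal illustration of the efficiency ratios above (automated test coverage, automation stability, and in-sprint automation), the sketch below assumes the raw counts are pulled from a test management tool; the function names and figures are invented.

```python
# Illustrative only: the efficiency ratios described above, assuming the raw
# counts come from a test management tool. Names and figures are invented.
def automated_test_coverage(automated: int, total: int) -> float:
    """Automated Test Coverage = (No. of Automated Tests / Total No. of Tests) * 100"""
    return automated / total * 100

def automation_test_stability(failures: int, executions: int) -> float:
    """Automation Test Stability = (No. of Failures / No. of Executions) * 100, per test"""
    return failures / executions * 100

def in_sprint_automation(automated_in_sprint: int, created_in_sprint: int) -> float:
    """In-Sprint Automation = (Automated Tests Created in Sprint / Tests Created in Sprint) * 100"""
    return automated_in_sprint / created_in_sprint * 100

print(automated_test_coverage(180, 240))   # 75.0 -> 75% of the suite is automated
print(automation_test_stability(3, 60))    # 5.0  -> one test failing in 5% of its runs
print(in_sprint_automation(18, 24))        # 75.0 -> 75% of new tests automated in sprint
```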
Defect Density
Type: Qualitative
Description/Primary Objective: Gives a clear indication of potential problems within the product. Defect Density can be calculated for an entire product or broken down into smaller areas such as modules, technology types, etc. It is another useful feedback tool for sprint teams to determine where additional focus might be required. Defect Density is usually measured per KLOC (1,000 lines of code).
Calculation: Defect Density = (Total No. of Defects / Total Lines of Code) * 1000
Additional Information: This can be used to show why a team cannot achieve a certain velocity; a higher number of defects will hinder progress.
DORA Link: Change Failure Rate
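As a small worked example of the KLOC scaling, the sketch below applies the Defect Density calculation to illustrative figures.

```python
# Illustrative only: Defect Density per KLOC (1,000 lines of code).
def defect_density_per_kloc(defects: int, lines_of_code: int) -> float:
    """Defect Density = (Total No. of Defects / Total Lines of Code) * 1000"""
    return defects / lines_of_code * 1000

# e.g. 42 defects raised against a 65,000-line module:
print(round(defect_density_per_kloc(42, 65_000), 2))  # 0.65 defects per KLOC
```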
Build Stability
Type: Qualitative
Description/Primary Objective: Demonstrates build stability and gives insight into the code being produced. It helps to identify impediments within the sprint and to determine where to focus in order to eliminate them entirely or reduce the pressure. A high number of build failures could indicate an issue in the DevOps toolchain, CI/CD pipeline, or related areas. This metric can only be collected if teams use a DevOps toolchain.
Calculation: Build Stability = Total No. of Build Failures / Total No. of Builds
Additional Information: This can also be linked to the stability of the code, Unit Test Coverage, Code Churn, etc., which are covered further below.
DORA Link: Deployment Frequency
Unit Testing Coverage
Type: Quality and Completeness
Description/Primary Objective: A good way to understand how much of the source code is exercised by tests. It can be measured in a variety of ways, such as: Statement Coverage – the number of statements that have been tested; Branch Coverage – the number of control structures that have been tested; Line Coverage – the number of lines of code that have been tested.
Calculation: Statement Coverage = (No. of Executed Statements / Total No. of Statements) * 100; Branch Coverage = (No. of Executed Branches / Total No. of Branches) * 100; Line Coverage = (No. of Executed Lines / Total No. of Lines) * 100
Additional Information: A robust set of unit tests is a great way to get an early indication of product quality issues and can also show the breadth of coverage of the product under test. The greater the coverage at these early stages, the better the chances of finding material problems.
DORA Link: Change Lead Time; Deployment Frequency
Code Churn
Type: Qualitative
Description/Primary Objective: This is fundamentally “rework”. Measuring the level of code churn can help identify problematic areas and reduce the time/effort spent on making changes. Although it is common for code to undergo constant change, persistent churn can indicate poor quality. Churn levels typically fluctuate throughout a product’s lifecycle, from very high during initial development to low and stable after release.
Calculation: Total Code Churn = Lines of Code Added + Lines of Code Deleted + Lines of Code Modified
Additional Information: This can give an indication of the defects that could be expected once the code is released to production. A rising level of code churn as a sprint/release/production deadline approaches can be a warning sign of fault-prone code.
DORA Link: Change Lead Time; Deployment Frequency
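As one possible way of gathering the raw numbers, the sketch below approximates code churn for a local git repository using `git log --numstat`; note that git reports a modified line as one deletion plus one addition, so “modified” lines are already counted within the added and deleted totals.

```python
# Illustrative only: approximate code churn from a local git repository.
# git reports a modified line as one deletion plus one addition, so modified
# lines are already counted within the added and deleted totals.
import subprocess

def code_churn(since: str = "30 days ago", repo_path: str = ".") -> int:
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    churn = 0
    for line in out.splitlines():
        parts = line.split("\t")          # "<added>\t<deleted>\t<file>"
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            churn += int(parts[0]) + int(parts[1])
    return churn

print(code_churn())  # total lines added + deleted over the last 30 days
```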
Performance Monitoring
Type: Quality and Coverage
Description/Primary Objective: In today’s world, performance is crucial, and having a way to monitor it is important, especially when user demand is growing. Being able to see where there are problems such as memory leaks, poor SLA compliance, and non-compliant garbage collection allows you to target the areas that need attention. This can be tracked release by release to identify trends and correct any issues.
Calculation: There are many low-level calculations that can be made from a performance perspective. Areas that could be considered for assessing performance quality in the early stages include: conformance with architectural practices; connection pooling vs static connections; memory management; adherence to technology-aligned practices, e.g. object-oriented design; constant benchmarking against agreed SLAs.
Additional Information: Trends in system or application characteristics can be tracked to determine whether there is any degradation. This could include: Load Analysis; Stress Profiles; Soak/Endurance Analysis.
DORA Link: MTTR
Reliability
Type: Quality and Coverage
Description/Primary Objective: Reliability is a growing priority: both organisations and end users want reliable applications and systems. Having a way to measure it is extremely useful for identifying potential flaws. Reliability is a strong indicator of resilience; it can be used to measure the reliability of products or applications and give valuable insight into potential resiliency problems.
Calculation: Rolling metrics, which can be applied to any time period or interval: Mean Time Between Failures = Total Operational Time / Total No. of Failures; Average Failure Rate = (Total No. of Production Failures / Total No. of Components Deployed) * 100; Mean Time to Repair = Total Repair Time / Total No. of Repairs; Mean Time to Recover = Total Downtime / Total No. of Incidents.
Additional Information: Reliability metrics can also be used to assess the effectiveness of Incident Management processes and support systems; if the average time to resolve an issue grows, this could be a sign of problems with those processes. Software and code complexity are closely linked to reliability: more complex code will generally lead to more reliability problems.
DORA Link: Change Failure Rate; MTTR
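The sketch below illustrates how the rolling reliability metrics above might be computed from a simple list of incident records; the record layout and figures are invented for the example.

```python
# Illustrative only: rolling reliability metrics from a simple incident log.
from datetime import datetime

incidents = [
    {"started": datetime(2024, 1, 5, 2, 0),  "resolved": datetime(2024, 1, 5, 3, 30)},
    {"started": datetime(2024, 2, 9, 14, 0), "resolved": datetime(2024, 2, 9, 14, 45)},
]
window_hours = 60 * 24        # e.g. a rolling 60-day window
components_deployed = 50      # invented figure

downtime_h = sum((i["resolved"] - i["started"]).total_seconds() / 3600 for i in incidents)

mtbf_h = (window_hours - downtime_h) / len(incidents)      # Mean Time Between Failures
mttr_h = downtime_h / len(incidents)                        # Mean Time to Recover
failure_rate = len(incidents) / components_deployed * 100   # Average Failure Rate (%)

print(f"MTBF {mtbf_h:.0f} h | MTTR {mttr_h:.2f} h | failure rate {failure_rate:.0f}%")
```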
Security
Type: Quality and Coverage
Description/Primary Objective: It is crucial to know the security status of your code: code should be capable of resisting attacks. According to an IBM report, the average data breach cost $4.24 million in 2021, which is why security is so important when producing high-quality products. It is not enough simply to assume the code is secure; being able to track threat-resistance trends, code-scan output trends, and whether applications meet compliance requirements gives you a picture of the level of risk you are taking.
Calculation: Rolling metrics, which can be applied to any time period or interval: Application Infiltration Rate = (Total No. of Infiltrated Applications / Total No. of Applications) * 100; Security Defect Density = (Total No. of Security Defects / Total Lines of Code) * 1000; Vulnerability Creation Rate = Total No. of Vulnerabilities Created; Vulnerability Remediation Rate = Total No. of Vulnerabilities Remediated; Vulnerability Growth Rate = Vulnerability Creation Rate – Vulnerability Remediation Rate.
Additional Information: This can help you understand the types of practices or tools that should be used during the product’s lifecycle, for example Dynamic Application Security Testing (DAST) or Static Application Security Testing (SAST) tools. It can identify weaknesses in coding standards and code libraries, or highlight capability concerns.
DORA Link: MTTR
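As an illustration of the vulnerability trend calculations above, the sketch below assumes per-period counts exported from a scanning tool; the numbers are invented.

```python
# Illustrative only: vulnerability trend metrics from per-week scan counts.
created_per_week    = [12, 9, 14, 7]   # new findings, e.g. from SAST/DAST scans
remediated_per_week = [8, 10, 11, 9]

creation_rate    = sum(created_per_week) / len(created_per_week)
remediation_rate = sum(remediated_per_week) / len(remediated_per_week)

# Vulnerability Growth Rate = Creation Rate - Remediation Rate
# (a positive value means the backlog of open vulnerabilities is growing)
growth_rate = creation_rate - remediation_rate

print(f"created/week {creation_rate:.1f} | remediated/week {remediation_rate:.1f} | "
      f"growth {growth_rate:+.1f}")
```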
Maintainability
Type: Qualitative
Description/Primary Objective: Maintainability is about how easy it is to update code: improving it, removing redundancy, and addressing problems. The ability to maintain a product’s code is directly linked to its success; it is a continuous activity to address new demands, make code more efficient, fix security vulnerabilities, and optimise performance. It is also important to know how quickly code can be made operational again after an outage. To determine how efficiently this can be done, and to highlight areas for improvement, a metric such as Mean Time to Repair can be used.
Calculation: Mean Time to Repair (MTTR) = Total Time Spent on Repairs / Total No. of Repairs. The Maintainability Index (MI) can be determined using a combination of metrics; the higher the index, the more maintainable the code: MI = 171 – 5.2 * ln(Halstead Volume) – 0.23 * (Cyclomatic Complexity) – 16.2 * ln(Number of Statements)
Additional Information: Additional measures that help determine the maintainability of code include: degree of coupling; level of cohesion; duplicated code; consistency of naming conventions.
DORA Link: MTTR
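The sketch below implements the Maintainability Index formula quoted above; in practice the inputs (Halstead Volume, cyclomatic complexity, statement count) would come from a static-analysis tool, and the figures here are invented.

```python
# Illustrative only: the classic Maintainability Index formula quoted above.
import math

def maintainability_index(halstead_volume: float, cyclomatic_complexity: float,
                          statements: int) -> float:
    """MI = 171 - 5.2*ln(Halstead Volume) - 0.23*Cyclomatic Complexity - 16.2*ln(Statements)"""
    return (171
            - 5.2 * math.log(halstead_volume)
            - 0.23 * cyclomatic_complexity
            - 16.2 * math.log(statements))

# e.g. a module with Halstead Volume 1200, complexity 14 and 180 statements:
print(round(maintainability_index(1200, 14, 180), 1))  # higher is more maintainable
```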

Where should metrics be made available to teams?

Metrics can be made available on any platform the teams already use. There are no set rules, but the metrics should be consistent: reporting different metrics in Jira and in Confluence, for example, does not make sense and does not give all stakeholders the same information. It is also a question of efficiency.

Platforms such as Confluence, Jira, and test management tools can all be used. Where more is being done in CI/CD, it might be beneficial to generate reports from the DevOps toolchain itself to demonstrate those capabilities.

Access to information is more important than ever. It’s essential to have the right tools to help you make informed decisions. Stakeholders don’t want to have to search in many places for the information they require.

Who should use these metrics?

Reporting should be automated, taking input from all the tools used to support delivery activities. This could include design, build, testing, infrastructure, pipeline activities, and releases; these feeds enable the metrics above to be reported. The metrics can be owned by, for example:

  • Sprint Teams
  • Scrum Masters
  • Quality Engineers/SMEs/Leads
  • Technology Leads
  • Support Teams
  • Incident Management
  • Product owners

Reporting alone is not enough. These metrics must provide tangible and meaningful insights that allow teams to continuously improve their performance and to strive for faster, more agile, more iterative delivery. Recognising the areas that are performing well, and taking responsibility for those that aren’t, must be a coordinated effort from all parts of the delivery function.

When are these metrics to be collected?

There is no set rule for how frequently metrics should be collected; collect them as often as is useful. More frequent collection allows quicker feedback and, in turn, better quality. An automated reporting system allows reports to feed into various events, whether stand-ups or retrospectives, and is also useful for prioritising backlogs and identifying the key quality areas to focus on.

  • Daily – Report generated automatically from hourly metric collection (depending on velocity, team construct, etc.)
  • Weekly Summary / End of Sprint Summary – Enabling trend analysis and continuous improvement
  • Monthly – Meetings with key business stakeholders, etc.

These metrics are intended as a guideline only and should not be interpreted as prescriptive. It is clear that no single metric will provide holistic or complete insight into the quality of software, a product, or code. They are intended as examples of elements an organisation could include in its reporting and, more importantly, of how to provide meaningful insight into the Quality journey. How the metrics are produced and consumed will depend on the individual teams, but it is important to agree on consistent generation. You might consider tying metric generation to key delivery milestones or aligning it with sprints. The frequency of the metrics is flexible, as long as value is derived from them.

When your organisation begins capturing metrics, trying to measure all of them can be overwhelming and can make it difficult to see the forest for the trees. You will gain more insight and influence greater change by starting with a subset of metrics that are relevant to your stakeholders. Consider this: what would you choose if you were just beginning to report metrics?