Software delivery performance is no longer explained well by the old four-metric DORA model. Today, DevOps Research and Assessment at Google Cloud uses five metrics split into throughput and instability. That matters because teams need to track delivery speed and delivery risk at the same time.
Google-backed DORA research shows that a generative culture links to 30% higher organizational performance, and user-centricity links to 40% higher organizational performance. More than 80% of respondents say AI improves their productivity, yet software delivery performance still drops when stability slips. This matters even more in cloud-native architectures such as microservices and container orchestration. DORA's research shows that speed and stability are linked, not opposed.
- Software delivery performance means speed and stability.
- DORA now uses 5 metrics, not 4.
- The model splits delivery into throughput and instability.
- One metric alone can mislead the team.
- Good measurement starts with one service and clear event rules.
- Culture, context, and AI all shape delivery performance.
What is software delivery performance, and why do DORA metrics define it today?
Software delivery performance means one simple thing: whether a team can move changes to production fast and keep the product stable at the same time. It reflects a team's ability to deliver software safely, quickly, and efficiently, and it shows up directly in throughput and stability metrics. DORA is the model people use for this today because it no longer looks only at release speed. Google's DevOps Research and Assessment (DORA) program has studied software delivery performance for years, identified the metrics that distinguish high-performing teams, and looks at throughput and instability together. By 2024, the model had moved from 4 metrics to 5, after replacing MTTR with Failed Deployment Recovery Time in 2023 and adding deployment rework rate in 2024.
When I explain this to clients, I do not start with jargon. I start with the business problem. Software delivery performance tells you whether your roadmap can move forward without creating extra delay, extra cost, or extra cleanup in the next sprint. That matters to a CTO because slow delivery increases time to market, and unstable delivery increases technical debt. Effective delivery means shipping features quickly while maintaining stability, which is essential for user satisfaction and business success. In a real Scrum team, you see this during code review, in blocked pull requests, and in stories that stay open after the sprint should have ended. Google Cloud's DORA initiative focuses specifically on assessing DevOps performance, helping organizations benchmark and improve their delivery processes.
The confusing part is that a lot of content still talks about the old four key metrics. That is not wrong in a historical sense, but it is not the full picture anymore. The current five metric model is the better reference point because it shows both planned delivery and reactive work after production problems. Measuring software delivery performance using these metrics helps teams identify bottlenecks and drive continuous improvement. That extra layer matters in practice. A team can look busy in Jira and still lose capacity because unplanned fixes keep pushing roadmap work to the right. The same logic applies in custom software development, where progress only counts when working software reaches users without creating more repair work.
There is one more reason this topic matters so much in search and in buying decisions. DORA gives leaders a shared language for software delivery, delivery performance, and business outcomes. It helps separate useful key performance indicators from noisy activity metrics such as ticket volume or lines of code. Think about it this way: a team can produce a lot of code and still have weak software delivery performance metrics if lead time is long, recovery is slow, and changes keep failing in production. That is why this topic also connects naturally with human-centered design, because shipping faster has value only when the team is solving the right user problem.
When we talk to clients about delivery performance, we do not begin with a dashboard. We begin with one practical question. How long does a small change take to reach production without creating extra work for the next sprint?
Which DORA metrics make up the current five metric model for delivery performance?
Here’s the reality. The current DORA metrics are not five random numbers. They are five connected performance metrics that help a CTO measure software delivery performance in a way that matches real software delivery work. DORA groups them into 3 metrics for throughput and 2 metrics for instability, so you can see both delivery speed and delivery risk in one model. That matters when you need to decide whether the problem sits in flow, release quality, or recovery after production issues.
The 3 throughput metrics are deployment frequency, lead time for changes, and failed deployment recovery time. The 2 instability metrics are change failure rate and deployment rework rate. This split works because software delivery throughput answers how fast code moves, while instability answers how much damage that movement creates in production. In practice, this gives you a clearer read on the team's ability to deliver quality software efficiently and safely than classic devops metrics that only count activity and ignore operational friction inside CI/CD, code review, QA, and release.
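To make the split concrete, here is a minimal sketch of how the five metrics could be computed from a list of deployment records. The `Deployment` shape and its field names are assumptions for illustration, not an official DORA schema.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

# Hypothetical record: one entry per production deployment.
# Field names are illustrative, not an official DORA schema.
@dataclass
class Deployment:
    commit_at: datetime                   # first commit in the change
    deployed_at: datetime                 # when the change hit production
    failed: bool                          # needed immediate intervention
    restored_at: datetime | None = None   # when service came back, if it failed
    is_rework: bool = False               # deployed only to fix a prior failure

def dora_metrics(deploys: list[Deployment], window_days: int = 30) -> dict:
    n = len(deploys)
    recovery_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in deploys
        if d.failed and d.restored_at
    ]
    return {
        # Throughput: how often and how fast changes move
        "deployments_per_day": n / window_days,
        "median_lead_time_hours": median(
            (d.deployed_at - d.commit_at).total_seconds() / 3600 for d in deploys
        ),
        "median_recovery_hours": median(recovery_hours) if recovery_hours else None,
        # Instability: how much damage that movement creates
        "change_failure_rate": sum(d.failed for d in deploys) / n,
        "deployment_rework_rate": sum(d.is_rework for d in deploys) / n,
    }
```

The point of keeping all five in one function is the one-model view: you cannot read the throughput keys without the instability keys sitting next to them.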
Most leadership teams get into trouble when they try to optimize one number in isolation. DORA warns against the one metric trap, because speed without stability creates incident load, and stability without flow slows the roadmap. The useful view is not one KPI but a system of tension, where the core metrics show how the delivery process behaves under real sprint pressure. DORA metrics also help organizations benchmark their teams, distinguishing elite performers from low performers—teams with slower development cycles and higher failure rates. Research shows that elite performers who excel in DORA metrics are twice as likely to meet organizational performance targets. That is why these DORA metrics matter for organizational performance too. They help you measure performance across the whole delivery path instead of rewarding one silo for looking good on paper.
How do deployment frequency, lead time for changes, and failed deployment recovery time describe software delivery throughput?
Software delivery throughput is about movement through the system. DORA measures that movement with deployment frequency, lead time for changes, and failed deployment recovery time. These 3 metrics tell you how often the team ships, how long a code commit needs to reach production, and how quickly the team can restore service after a bad deployment. That combination is useful because a team can look fast in Jira and still move slowly through the real SDLC.
Deployment frequency is the simplest signal. It shows how many times the team can deploy code to production in a given period. Lead time for changes shows how long work takes from commit to production deployment, so it reveals waiting time in code reviews, test runs, approvals, and release queues. A CTO can read these 2 numbers together and see whether the team ships in small batches or stores up risk for one larger release. This sounds simple on paper. In a two week sprint, it shows up as smaller pull requests, shorter feedback loops, and fewer end of sprint surprises.
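One way to turn that combined read into a habit is a tiny rule of thumb; the thresholds below are mine for illustration, not DORA benchmarks.

```python
def batch_size_read(deploys_per_week: float, median_lead_days: float) -> str:
    # Thresholds are illustrative, not DORA-defined cutoffs.
    if deploys_per_week >= 5 and median_lead_days <= 1:
        return "small batches, short queues"
    if deploys_per_week < 1 and median_lead_days > 7:
        return "large batches, risk stored up for one big release"
    return "mixed signal: inspect review and release queues"
```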
Failed deployment recovery time closes the loop. Shipping fast is not enough when the team needs half a day to fix production after a broken release. A team with fast deployment frequency and weak recovery does not have strong throughput. It has fragile throughput. That is the part many software teams miss. Real throughput includes the recovery loop, because every hour spent restoring service pulls senior people out of roadmap work and into firefighting.
How do change failure rate and deployment rework rate expose instability in devops performance?
Instability is the part of devops performance that shows what happens after code hits production. Change failure rate tells you how many deployments create production failures requiring immediate intervention, and deployment rework rate tells you how much deployment work happens only because production already went wrong. These 2 metrics expose reactive work that burns sprint capacity, support time, and trust in the release process. Think about it this way: two planned releases and three hotfixes still look like five deployments in a dashboard, but the delivery system tells a very different story.
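That dashboard-versus-system gap fits in a few lines of code; the event labels are assumptions for illustration.

```python
deploys = [
    {"type": "planned"}, {"type": "planned"},                    # roadmap releases
    {"type": "hotfix"}, {"type": "hotfix"}, {"type": "hotfix"},  # reactive fixes
]
total = len(deploys)                           # dashboard view: 5 deployments
rework = sum(d["type"] == "hotfix" for d in deploys)

print(f"deployments: {total}")                 # looks like healthy frequency
print(f"rework rate: {rework / total:.0%}")    # 60% of deploys were cleanup
```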
Change failure rate shows direct release damage. Deployment rework rate shows the extra operational load that follows that damage. Together, they tell a CTO whether the team is delivering value or spending too much energy cleaning up after itself. This is also where siloed ownership becomes expensive. One team pushes the release, another team handles the incident, and the product team loses time that was meant for planned work. Organizations using peer reviews rather than Change Approval Boards (CABs) tend to perform better in software delivery.
How should devops teams measure software delivery performance without gaming the numbers?
Start with one application or one service. Then define the events. DORA ties good measurement to a single service boundary, and a 30, 60, or 90 day baseline gives you a usable frame before comparisons begin.
Here’s how I explain it to clients at Selleo. If you mix five products, two teams, and three release paths into one chart, the chart stops helping. The safest way to measure software delivery performance is to pick one application or service and keep the scope stable until the numbers start to mean something. That gives you a clean view of the software delivery process, the delivery process, and the development process. It also makes staff augmentation easier to evaluate, because you can see whether extra hands improve flow or only create more coordination work.
- Choose one application or service as the unit of measurement.
- Define change, deployment, incident, recovery, and rework in plain language.
- Connect repo, CI/CD, and incident tooling before you compare teams.
- Build a 30, 60, or 90 day baseline before you start judging improvement efforts.
The next step is simple, but teams skip it. They open a dashboard before they agree on what a deployment or incident actually is. If one team counts a staging push as a production deployment and another team does not, your actionable data is already broken. In practice, this is where GitHub, GitLab, CI/CD, automated tests, quality assurance, feature flags, and code reviews all need the same event taxonomy. I see this mistake when a software outsourcing company enters the project and each side brings its own labels, workflow rules, and reporting habits.
If the team cannot explain what counts as a deployment in one sentence, the dashboard will confuse the client. We always fix the language first. Then we measure the numbers.
Measurement dictionary for software delivery performance
Now comes the part that protects the team from gaming the numbers. Metrics are useful when they help you identify bottlenecks, not when they turn into targets for personal ranking. The moment deployment frequency becomes a goal on its own, people start slicing work into tiny pushes that look good in reports and change nothing in real process performance. That is why I frame the in-house vs outsourced software development discussion around shared ownership, shared definitions, and shared recovery rules, not around one chart in a weekly status meeting. Four Keys is still a practical pattern here, because it shows how to normalize changes, deployments, incidents, and recovery signals from different tools before you implement DORA metrics in a serious way.
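A minimal version of that dictionary can live next to the pipeline code, in the spirit of the Four Keys normalization pattern. This is a sketch under my own assumptions: the event names and the simplified payload shape are illustrative, not a published schema.

```python
from enum import Enum

class DeliveryEvent(Enum):
    CHANGE_MERGED = "change_merged"            # PR merged to mainline
    PROD_DEPLOYMENT = "prod_deployment"        # release reached production
    INCIDENT_OPENED = "incident_opened"        # production failure declared
    SERVICE_RESTORED = "service_restored"      # recovery confirmed
    REWORK_DEPLOYMENT = "rework_deployment"    # deploy caused by a prior failure

def normalize_deploy_event(raw: dict) -> DeliveryEvent | None:
    """Map a raw CI/CD payload onto the shared taxonomy.

    Assumes the payload exposes environment and status fields. The point
    is that a staging push never counts as a production deployment,
    no matter which tool reported it.
    """
    if raw.get("environment") == "production" and raw.get("status") == "success":
        return DeliveryEvent.PROD_DEPLOYMENT
    return None  # everything else stays out of the metrics
```

Once every tool maps into the same enum, comparisons stop depending on who labeled the data.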
How do deployment frequency, lead time, and change failure rate work together in real delivery performance?
Here’s how I explain it to clients. Deployment frequency, lead time, and change failure rate are useful only when you read them together. That combination shows whether software delivery throughput is healthy or whether the team is just moving risk around the system. A CTO does not need three separate charts. A CTO needs one clear picture of the team’s ability to deploy code, protect code quality, and keep system reliability under control.
A high deployment frequency can be a very good sign. It can also hide a weak delivery process. Low lead time looks strong on paper too, but it loses value when change failure rate is high and production keeps pushing work back into the sprint. High performing teams improve speed and stability at the same time, while low performing teams tend to improve one number and break another. That is why the 2021 DORA benchmark still matters. Elite teams reported on demand deploys, lead time under 1 hour, and change failure rate between 0% and 15%. The benchmark is older, but it still helps because it gives a hard reference point for what strong devops performance looked like before the newer seven team archetypes added more diagnosis.
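For a quick self-check against that reference point, something like this sketch works; the thresholds come straight from the 2021 benchmark above, and everything else is illustrative.

```python
def gaps_vs_2021_elite(on_demand: bool, lead_time_hours: float, cfr: float) -> list[str]:
    """Compare one service against the 2021 DORA elite profile:
    on-demand deploys, lead time under 1 hour, CFR between 0% and 15%."""
    gaps = []
    if not on_demand:
        gaps.append("deployment frequency below on-demand")
    if lead_time_hours >= 1:
        gaps.append("lead time at or above 1 hour")
    if cfr > 0.15:
        gaps.append("change failure rate above 15%")
    return gaps or ["matches the 2021 elite profile"]
```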
This is where pattern diagnosis becomes useful. High deployment frequency and high change failure rate mean release chaos. Low deployment frequency and low change failure rate can mean over control, oversized batch size, long code review latency, or weak continuous improvement. Two software teams can show similar lead time and still need two different fixes, because one is blocked by testing coverage and the other by approval delay. I see this when a team moves from an interactive prototype into full delivery and keeps informal habits that do not scale. I also see it in software development services for enterprise, where approval chains can slow lead time long before anyone notices the business impact.
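The quadrant logic is simple enough to write down, which keeps the team arguing about the diagnosis instead of the data. The labels below mirror the patterns described in this section and are illustrative only.

```python
def diagnose(deploy_freq_high: bool, change_failure_high: bool) -> str:
    # Four quadrants from the discussion above; wording is illustrative.
    if deploy_freq_high and change_failure_high:
        return "release chaos: shipping fast and breaking production"
    if deploy_freq_high:
        return "healthy flow signal: confirm recovery time holds up"
    if change_failure_high:
        return "slow and unsafe: check testing coverage and release quality"
    return "possible over-control: check batch size and review latency"
```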
When do DORA metrics stop being enough, and how do Flow Metrics, SPACE, and AI tools change devops performance?
DORA metrics stop being enough when the numbers tell you something is wrong but not where, which is why Flow Metrics and the SPACE framework are often layered on top to cover work flow and developer experience. Treat AI tools as amplifiers: they speed up what already exists in the system. Investing in CI/CD pipelines and automating testing and deployment can significantly improve software delivery performance by helping teams ship faster with fewer errors. High-performing organizations also actively manage technical debt so it does not drag down future development. And in a DevSecOps approach, security is embedded in the pipeline itself rather than audited only at the end, with automated vulnerability scanning and compliance-as-code keeping delivery secure.
AI is where many teams get confused. Individual developers feel faster, but the release system does not always get healthier. That gap is the difference between local productivity and system productivity. In practice, AI-assisted software development can create larger pull requests, slower code reviews, and more pressure on testing coverage when the process is already weak. The hard truth is simple: AI tools do not repair poor ownership, poor communication, or weak process performance inside complex systems. They expose those problems faster.
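One way to watch for that gap is to track pull request size and review wait together over time. A minimal sketch, with field names that are my assumptions rather than any specific tool's API:

```python
def review_pressure(prs: list[dict]) -> dict:
    """Average PR size and review wait. Rising size together with rising
    wait suggests local AI speed is loading the system, not helping it."""
    sizes = [p["lines_changed"] for p in prs]
    waits = [
        (p["first_review_at"] - p["opened_at"]).total_seconds() / 3600
        for p in prs
        if p.get("first_review_at")
    ]
    return {
        "avg_pr_size_lines": sum(sizes) / len(sizes),
        "avg_review_wait_hours": sum(waits) / len(waits) if waits else None,
    }
```

Here `opened_at` and `first_review_at` are assumed to be datetime values pulled from the team's own tooling.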
How does Selleo improve software delivery performance for engineering teams in practice?
From my side at Selleo, better software delivery performance starts when the team stops treating release as one big event. We improve delivery performance by making changes smaller, ownership clearer, and feedback loops shorter. That is the practical version of what DORA points to when it connects fast code reviews, continuous integration, and loosely coupled teams with better software delivery and operational performance. In a real development team, this means smaller pull requests, visible QA and DevOps handoffs, and fewer surprises at the end of a sprint. The same rule applies in Ruby On Rails development, in custom LMS software, and in HR management software, even though each product has a different release rhythm.
The second part is context. We do not treat delivery as one generic pipeline, because a compliance flow, an EdTech module, and a health communication feature do not carry the same risk. We get better results when one service has one clear owner, one review cadence, and one shared release signal for product, engineering, QA, and DevOps. That fits DORA’s point that metrics work best one application or service at a time, and it also explains why user centricity links with 40% higher organizational performance. In practice, that means discovery before throughput, domain context before automation, and service ownership before more reporting. You can see that difference in Case Study Selleo: Finpay, Case Study Selleo: ClickAula, and Case Study Selleo: Catalyst, where release visibility, code quality, and review depth depend on the product context, not on one fixed template.
- small batches before hero releases
- explicit review ownership instead of hidden queues
- production visibility shared by dev, QA, and product
- domain context first, automation second
Allowing teams to experiment fosters innovation and ownership within high-performing cultures. A culture of learning treats failures as learning opportunities rather than occasions for blame. In Westrum's terms, high-performing cultures prioritize trust and information flow and run blameless post-mortems.
How do continuous deployment, deployment rework rate, and AI tools affect time to restore for devops teams?
When I explain this to clients, I start with one simple point: recovery gets faster when the team works in smaller pieces and sees the same release picture. Smaller batches mean each deployment carries less risk, so a failed release is easier to locate, roll back, and repair, which pulls down both time to restore and deployment rework rate. Continuous deployment helps only when ownership is clear and feedback loops are short; otherwise it just multiplies the number of releases that can go wrong. AI tools follow the same rule, because they amplify the flow that already exists instead of repairing it. In daily work this shows up as smaller pull requests, faster code reviews, clearer QA handoffs, and better release visibility for product, engineering, and DevOps.
The next part is context. We do not push every product through one generic pipeline, because a compliance feature, an EdTech flow, and a health communication module create different delivery pressure. We get better results when one service has one clear owner, one review cadence, and one release signal that the whole engineering team can trust. In a two week sprint, this means less waiting, fewer hidden queues, and less confusion about who decides what is ready for production. It also improves code quality, because the team stops guessing and starts working inside one visible release path.
This is also why discovery comes before throughput in our work. We do not want teams to ship faster in the wrong direction. Context first gives the team a better delivery process, because the release rhythm, QA depth, and DevOps decisions start matching the real product risk. The case studies above show the same rule at different levels of release visibility: one service, one owner, one trusted flow. That is what keeps continuous deployment useful instead of stressful for engineering teams.
Not exactly. The old model used four key performance indicators, but the current model uses five. In plain terms, DevOps Research and Assessment moved from a simpler view to a fuller one. That helps a CTO see both planned delivery and reactive work.
Not by itself. A team can deploy multiple times and still create cleanup work after release. I look at deployment frequency together with change failure rate and time to restore service. That shows whether speed is real or whether the team is just pushing risk forward.
Start with one service and one shared definition of deployment, incident, recovery, and rework. That gives you a clean base for assessing devops performance. Then check where work slows down in code reviews, CI, QA, and release. That is where process improvements usually start.
Look for one thing first. Are you measuring the real delivery path from code commit to production and recovery, or only reporting activity? Evaluating process performance well means using signals that show bottlenecks, rework, and recovery. It does not mean counting tickets or commits.
It means the system is moving fast but not safely. I see this when teams reduce lead time but ignore testing coverage, release ownership, or rollback discipline. The result is simple: deployments fail frequently, and the next sprint pays for it.
They need to separate local speed from system performance. AI tools can help one developer write more code, but they can also overload reviews and increase rework. That is why DORA research and broader DevOps research keep pointing to the same lesson: better output is not the same as better software delivery.