THE AGILE CULTURE: LEADING THROUGH TRUST AND OWNERSHIP (2014)
Appendix E. What to Do about Metrics
A Detailed Description
Now that we understand the characteristics of effective metrics, it is easy to understand how inappropriate metrics can act as walls to the team and good metrics can drive effective improvement. Poor metrics will stop teams from improving by forcing them to focus on activities that will not increase the effectiveness of their value delivery. Good metrics focus effort where it has real effect.
You can use metrics in a variety of areas—as long as they provide useful and actionable information and don’t create walls.
Some of the major areas of metrics include internal measures, organizational measures, and external measures.
The actual metrics you choose to use will, of course, be based on your actual situation.
You gather internal metrics during the delivery cycle, and the delivery team acts on them in as close to real time as possible. The shorter the cycles, the better. To have the most positive effect, they need to be well aligned and have integrity.
Teams and individuals will align their priorities to the organization’s behaviors. If the organization stresses schedule, then quality will certainly be compromised. If the organization values process adherence, then that will be more important than customer and business value delivered.
Again, consider the cost of gathering and analyzing a metric. To improve early defect analysis, some teams record all development-discovered Unit Test defects in their problem-tracking system. This has a huge impact on team productivity with little corresponding value.
Internal measures can be very helpful in allowing leaders and teams to improve effectiveness in target areas. Well-aligned measures are predictive. But if not well-designed, they can cause bad behavior. Internal measures should promote action at the team level.
Good examples of internal measures
Test coverage (path or branch)
Orthogonal defect classification (ODC)1
1. Ram Chillarege, “Orthogonal Defect Classification,” http://www.chillarege.com/articles/odc-concept
Bad examples of internal measures
Iteration velocity when used as a team-effectiveness metric rather than as a means to assess project progress.
Causal Analysis—this is very costly to use on a continuous basis.
Unit Test defect tracking—this has a tendency to create a lot of non-value work and bureaucracy.
Organizational measures are used to assess the effectiveness of operations or strategies. These differ from internal metrics in that they are targeted for action beyond the individual team. There is a high risk of misuse if these are applied below the organizational level. Incidentally, applying organizational metrics at the team level is one of the most common reasons that metrics get a bad name.
Cycle times for organizational metrics are typically longer than for internal metrics.
An organization wanted to evaluate the effectiveness of their software development teams. The company was well aware that incorrect measures could cause bad behavior in the teams. At the same time, the organization wanted to understand the effectiveness of its investments in technology and process.
The organization settled on metrics of Lines of Code (LOC) per Programmer Month (PM) and Cost per LOC as the key measures. Their initial assumption was that there was a positive relationship (though possibly not linear) between code delivered and customer value. Because such measures are known to be dangerous and often cause bad behavior (teams feel that they are being compared to others) the company took great care in collecting and using the data. The company collected the data directly from the development libraries and averaged it, by target platform, across the whole business. This eliminated any chance of comparing teams or projects.
The raw data collected was
Unique lines of code in the libraries. They used this to assess the code creation velocity of the development teams. It was clearly different by platform and by product and gave a good indication of the effectiveness of investments in tooling, training, and process enhancements. Note that it was not used to compare teams or individuals because their product profiles were quite different.
Lines of code being shipped in products with credit given for multiple use. This was used to assess the effect of reuse on the delivery of business value. The results clearly showed that this was by far the most effective way of increasing value delivered to the customer.
Lines of code in the support libraries. This explained support effort trends and informed the migration strategies for existing customers. This was also the measure with the fastest growth, as the company supported customers at different software levels.
Overall development and support staffing and costs. This gave an indication of overall productivity and the effect of various organizational initiatives.
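The aggregation the company used can be sketched in a few lines. The record layout and numbers below are purely illustrative assumptions; the point is that totals are rolled up by target platform across the whole business, so individual teams or projects can never be compared.

```python
from collections import defaultdict

# Hypothetical per-project records; field names and values are illustrative only.
projects = [
    {"platform": "mainframe", "unique_loc": 120_000, "programmer_months": 400, "cost": 4_800_000},
    {"platform": "mainframe", "unique_loc": 90_000,  "programmer_months": 310, "cost": 3_600_000},
    {"platform": "web",       "unique_loc": 150_000, "programmer_months": 250, "cost": 2_500_000},
]

def by_platform(projects):
    """Aggregate LOC per PM and cost per LOC by platform only,
    never by team, so projects cannot be ranked against each other."""
    totals = defaultdict(lambda: {"loc": 0, "pm": 0, "cost": 0})
    for p in projects:
        t = totals[p["platform"]]
        t["loc"] += p["unique_loc"]
        t["pm"] += p["programmer_months"]
        t["cost"] += p["cost"]
    return {
        plat: {"loc_per_pm": t["loc"] / t["pm"], "cost_per_loc": t["cost"] / t["loc"]}
        for plat, t in totals.items()
    }

print(by_platform(projects))
```

Aggregating before computing the ratios is the safeguard: the averages reflect platform-level investment effects, not individual performance.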
The company used this data to show multiple-year trends across many projects to validate and guide investments in the development process and infrastructure. As expected, the data showed significant variation in LOC productivity depending on the hardware platform and the quality requirements of the target market. There were also some interesting findings such as
Unique LOC productivity per programmer was basically flat but assessed business value trended up sharply. The company’s investment in improved infrastructure and processes enabled the move to higher-level platforms and languages and more complex systems.
Shipped or sold LOC increased at over 18 percent per year over a decade as the company emphasized the reuse of common components across multiple products and solutions.
Cost per LOC supported was falling fast from increased quality and investments that made it easier to produce high-quality software.
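It is worth pausing on the compounding in the second finding: 18 percent annual growth sustained over a decade multiplies shipped LOC roughly fivefold, which a one-line check confirms.

```python
# 18% annual growth compounded over 10 years
growth = 1.18 ** 10
print(f"{growth:.2f}x")  # → 5.23x
```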
Thus, even sometimes controversial metrics can be effective if used carefully and in a way that does not compromise integrity.
Good examples of organizational metrics
Aggregate Manager NPS—assessing the degree to which the managers are seen to support their development teams.
Average Cycle Time.
Bad examples of organizational metrics
LOC per PM at a team level.
We use external metrics to understand the market view of a product or an organizational capability.
A common and powerful measure is the Net Promoter Score (NPS)2 of the solutions or products being sold.
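The text does not spell out the NPS calculation, so here is a minimal sketch using the standard definition: respondents rate 0–10, promoters score 9–10, detractors 0–6, and NPS is the percentage of promoters minus the percentage of detractors.

```python
def nps(scores):
    """Net Promoter Score from 0-10 survey responses:
    % promoters (9-10) minus % detractors (0-6), rounded to an integer."""
    if not scores:
        raise ValueError("no responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

# 4 promoters, 1 passive (8), 1 detractor (3) out of 6 responses:
print(nps([10, 9, 9, 10, 8, 3]))  # → 50
```

Note that passives (7–8) count in the denominator but not in either group, which is why NPS can be low even when few respondents are outright detractors.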
An organizational example is the Trust Index Employee Survey done by Great Place to Work.3
Each year the Great Place to Work Institute polls many companies to assess the degree of trust felt by employees. They analyze and collate this data to include in their annual report.
The survey has high levels of integrity and the institute has demonstrated very strong alignment between an organization’s Trust Index and its business performance.
The survey’s action cycle time is relatively long: organizations take the survey once a year.
Good examples of external metrics
Net Promoter Score for individual products.
Bad examples of external metrics
Customer satisfaction surveys completed in the presence of the salesperson.
Net Promoter Score across products, which is not actionable.
Examples of Possible Metrics
We always resist the temptation to tell anyone what metrics they should use. At the same time, people beg us to give them meaningful metrics to consider. That said, here is a list of metrics—many focused on software development (that most challenging of processes)—that you can use as a starting point, but only as a starting point. To be meaningful, metrics must be yours and must change as you, your teams, and your organization progress and improve—and as the marketplace changes. To keep the example list short, we have not included detailed descriptions, as most are well known and more detail is just a search away.
To assess organizational support of delivery
Percentage Flow Time (time without interruptions)
Anxiety Boredom Continuum
Meeting Net Promoter Score (NPS for regular meetings)
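Percentage Flow Time can be computed in several ways; the sketch below uses one plausible interpretation—the share of the workday spent in uninterrupted blocks of at least an hour. The 60-minute threshold and 480-minute workday are assumptions for illustration, not definitions from the book.

```python
def percentage_flow_time(blocks_minutes, workday_minutes=480, threshold=60):
    """Share of the workday spent in uninterrupted work blocks of at
    least `threshold` minutes. Threshold and workday length are
    illustrative assumptions."""
    flow = sum(b for b in blocks_minutes if b >= threshold)
    return 100 * flow / workday_minutes

# Five work blocks between meetings and interruptions:
print(round(percentage_flow_time([90, 25, 120, 45, 70]), 1))  # → 58.3
```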
To predict project progress
Story Point Burndown or Burnup
Function Point delivery
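A story point burndown is simple enough to compute by hand, but a small sketch makes the mechanics concrete: subtract the points completed each day from the running total. The iteration size and daily completions below are made-up example data.

```python
def burndown(total_points, completed_per_day):
    """Remaining story points at the start of the iteration and
    after each day."""
    remaining = [total_points]
    for done in completed_per_day:
        remaining.append(remaining[-1] - done)
    return remaining

# A 40-point iteration over 10 working days:
print(burndown(40, [3, 5, 0, 6, 4, 2, 7, 5, 4, 4]))
# → [40, 37, 32, 32, 26, 22, 20, 13, 8, 4, 0]
```

Plotted against an ideal straight line from 40 to 0, flat stretches like day 3 above are what the team inspects, not the absolute velocity.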
To improve delivered quality and reduce technical debt
Open Defects at iteration end, particularly Severity 1 and 2
Severity 3 and 4 defects that cross two iteration boundaries
Unit or Regression Test Code Coverage
Successful test cases run
Orthogonal Defect Classification
To reduce development wasted effort
Integration Lag (the average time a changed module waits before build)
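Using the parenthetical definition above—the average time a changed module waits before build—Integration Lag reduces to averaging timestamp differences. The change/build timestamps here are hypothetical examples.

```python
from datetime import datetime

def integration_lag_hours(changes):
    """Average hours between a module change landing and its first
    build. Input: list of (changed_at, built_at) datetime pairs."""
    lags = [(built - changed).total_seconds() / 3600 for changed, built in changes]
    return sum(lags) / len(lags)

changes = [
    (datetime(2014, 3, 1, 9, 0), datetime(2014, 3, 1, 17, 0)),   # 8 h
    (datetime(2014, 3, 2, 10, 0), datetime(2014, 3, 2, 14, 0)),  # 4 h
]
print(integration_lag_hours(changes))  # → 6.0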
To assess process effectiveness and improvement
Timely Iteration Completion
Closure of Actions from Reflections
Deferred User Stories
Value Stream Map
To assess organizational productivity
Delivered code trends—unique/shipped or sold/supported
To assess practice adoption
Survey teams on whether they use the practice in fewer than 25 percent, 25 to 75 percent, or more than 75 percent of the places where it might be used.
Green/Amber/Red/Black assessment of progress
Green—data collected, meeting goal
Amber—data collected, not meeting goal
Red—data collected, not analyzed
Black—no data available
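The four-level assessment above maps cleanly to three yes/no checks, which a small sketch makes explicit (the function and its argument names are our illustration, not the book's):

```python
def rag_status(has_data, analyzed, meets_goal):
    """Map the practice-adoption checks to the Green/Amber/Red/Black
    scale described above."""
    if not has_data:
        return "Black"   # no data available
    if not analyzed:
        return "Red"     # data collected, not analyzed
    return "Green" if meets_goal else "Amber"  # analyzed: goal met or not

print(rag_status(True, True, True))     # → Green
print(rag_status(True, False, False))   # → Red
print(rag_status(False, False, False))  # → Black
```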
To assess practice credibility
To evaluate customer attitudes to the product
Overall Cycle Time
Time to Break Even
Time to first dollar returned
Profit & Loss
Learning from failure
Availability of cross-functional teams
Participation of full team in program success
Metrics for management
Head count to E/R percentage
Head count to products delivered
What not to measure:
Velocity as a measure of team effectiveness
Technical debt as a measure of the team
Progress against plan
Story points as hours worked