
Getting started with data-driven software management

Writer: Dan Sturtevant

Regain control by measuring productivity and quality to drive results.


Why measure software economics?

If you can’t measure, you can’t manage.  Fortunately, with the right tooling in place, it is possible to measure software quality and its impact on maintainability, agility, and cost.  The goal of this paper is to help you understand software economic measurement and think about how to get started.  Once objective measurement is in place, software leaders can steer projects more effectively by driving technology in a direction that will improve economic outcomes.  Figure 1 illustrates the results of a successful data-driven software management effort.  This organization set out to capture quality and performance metrics in an integrated way.  As a result, managers could:

  • Distinguish components in their large, long-lived codebase with good architecture health from those with challenged architecture health

  • Measure defect density and productivity

  • Determine that developers working in parts with good design quality produced 2.5 times as much code as developers working in the challenged parts.

  • Determine that developers working in good parts spent 80% of their time implementing useful features, while developers in challenged parts spent 70% of their time investigating and fixing bugs.

 

Figure 1: Measured impact of design quality on defects and productivity 

With this kind of measurement, managers could make important decisions in more objective and informed ways.  For example, it becomes possible to reason about whether an initiative – such as test coverage improvement, architecture improvement refactoring, or a rewrite – would have a positive long-term ROI.  Without measurement, decisions to improve were often subjective, political, and risky.

 

How to measure quality and software economics

Most development projects do not capture the data required to do detailed maintainability, agility, and cost analyses. Fortunately, it is not overly difficult to do so.  Projects using modern best practices are already 80% there. Necessary metrics can often be pulled directly from normal software management IT systems.


Figure 2: The sources of quality and software economic data

Figure 2 shows common software management systems and artifacts typically found in a software enterprise.  These include:


  • Change Management System: Managers and developers create feature requests and bug reports (or ‘tasks’) that get stored and tracked.  Tasks can be scheduled and prioritized.  Time estimates for each can be made to aid in resource allocation and planning.  Developers use this system to track work progress.  Examples include Jira, Trac, Bugzilla, and Microsoft Project.

  • Version Control System: This system stores all versions of each source code file and information about changes (or patches) that go into it over time.  It allows developers to look at the history and evolution of every file and determine who contributed each line of code, and at what time.  Examples include Git, Subversion, Perforce, and CVS.

  • Software Source Code: The aim of a development enterprise is to produce a high-quality source code base.  This code compiles directly into the product that is deployed.

 

Management activities flow from left-to-right in Figure 2 from planning through production.  Feature and bug-fix tasks are created, prioritized, and tracked in the Change Management System.  Developers pick a task, work until it is complete, and then submit a patch into the version control system.  At regular intervals, the version control system is used to create entire copies of the software source code to be compiled, tested, and deployed.

 

Interestingly, software economics flow in the opposite direction (right-to-left) in Figure 2.  Code quality, design quality, and test quality (properties of the codebase itself) are dominant drivers of the quality and productivity of future development.  These influence the rate of change observable in version control system logs.  They also strongly influence defect rates and schedule slippage observable in change management system records.  For this reason, project management techniques are often powerless to correct problematic projects where quality has degraded.  Project managers leveraging only data from ‘change tracking systems’ are heavily reliant on subjective and game-able measures of overhead activity, rather than the true drivers of project success.  ‘Product agility’ (i.e. modularity) drives ‘process agility,’ not the other way around.

 

Basic questions to understand the current state of project tools

To start down the path towards data-driven software management, begin with the following checklist:

  1. Do you have access to source code?

  2. Is a version control system in use?

  3. Do you have access to the version control system?

  4. Is a modern change management system in use?

  5. Do you have access to the change management system?

  6. Does the change management system have a programmatic API for data extraction?

 

What data should be exploited?

In order to analyze and leverage defect density, productivity, and other software economic outcome information, one should capture and link data from the three sources above, working in reverse order: first from source code, then from version control, and finally from change management systems.

 

From source-code (with software analysis tools) we can capture:

  • System name, version, and release dates

  • Entities, relationships between entities, and metrics

  • Counts, Sizes, Languages

  • ‘Architecture health’ metrics related to modularity, hierarchy, and APIs such as ‘Core-Periphery’ measures.

  • ‘Test quality’ metrics such as unit-test or system-test code coverage

  • ‘Code quality’ metrics such as ‘Cyclomatic Complexity’.
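As a minimal illustration of a ‘code quality’ metric, the sketch below approximates McCabe’s cyclomatic complexity for a Python snippet by counting branch nodes with the standard-library ast module.  Commercial analysis tools compute this per function across many languages; this toy counter (and its choice of which nodes count as decisions) is only meant to show the idea.

```python
import ast

# Node types treated as decision points.  This is an approximation of
# McCabe's metric (complexity = decision points + 1); e.g. a BoolOp
# with n operands really adds n-1 decisions, but we count it once.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                   ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Roughly estimate cyclomatic complexity of a Python snippet."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, _DECISION_NODES)
                    for node in ast.walk(tree))
    return decisions + 1

example = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
"""
# Two `if` branches -> 2 decision points -> complexity 3
print(cyclomatic_complexity(example))  # -> 3
```

High values of this metric flag functions with many independent paths, which are harder to test exhaustively and more likely to harbor defects.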

 

From version control we can capture:

  • Change-sets, or code submissions, made by a developer at a single point in time.

  • Patches, or changes to one file at a single point in time.  Multiple patches may belong to the same change-set.

  • Version numbers, identifiers, and dates

  • Line counts associated with each patch

  • The identity of the developer making a change

  • The identity of the file being patched

  • File age and code change volume.
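Most of these version control facts can be pulled from a single command such as `git log --numstat --pretty=format:'%H|%an|%aI'`.  The sketch below parses that output format into change-sets and their per-file patches; the sample log text is made up for illustration, and in practice you would capture the real output with a subprocess call.

```python
# Expected input: one header line per commit ("hash|author|ISO-date"),
# followed by numstat lines ("added<TAB>deleted<TAB>path").
def parse_numstat(log_text: str):
    """Parse numstat-formatted git log text into change-sets."""
    change_sets = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        if "\t" not in line:                      # commit header line
            sha, author, date = line.split("|", 2)
            change_sets.append({"sha": sha, "author": author,
                                "date": date, "patches": []})
        else:                                     # per-file numstat line
            added, deleted, path = line.split("\t", 2)
            change_sets[-1]["patches"].append({
                "file": path,
                # binary files report "-" instead of line counts
                "added": int(added) if added != "-" else 0,
                "deleted": int(deleted) if deleted != "-" else 0,
            })
    return change_sets

sample = (
    "abc123|Ada|2023-05-01T10:00:00+00:00\n"
    "10\t2\tsrc/core.py\n"
    "3\t1\tsrc/util.py\n"
    "def456|Grace|2023-05-02T11:30:00+00:00\n"
    "-\t-\tdocs/logo.png\n"
)
changes = parse_numstat(sample)
print(len(changes), changes[0]["author"], changes[0]["patches"][0]["added"])
```

Each parsed change-set carries the developer identity, timestamp, and per-file line counts listed above, ready to be joined with change management records.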

 

From change management, we can capture information about each task:

  • Task identifiers

  • Type – such as ‘bug,’ or ‘feature’

  • Subtypes such as ‘critical bug’, ‘bug with customer impact,’ or ‘bug shipped in the product.’

  • Priority information

  • Timestamps such as creation time, closed time.

  • Status such as ‘ToDo’ or ‘Completed’

  • Identity of individuals working on issue
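As a sketch of programmatic extraction, the snippet below pulls these task fields out of a Jira-style JSON response (Jira’s REST search endpoint returns an `issues` array with `key` and `fields`).  The canned payload and the exact field names are illustrative; a real script would fetch the JSON over HTTP with credentials and adapt the field mapping to its own change management system.

```python
import json

def extract_tasks(payload: dict):
    """Flatten a Jira-style search response into task records."""
    tasks = []
    for issue in payload.get("issues", []):
        f = issue["fields"]
        tasks.append({
            "id": issue["key"],
            "type": f["issuetype"]["name"],      # e.g. 'Bug' or 'Feature'
            "priority": f["priority"]["name"],
            "status": f["status"]["name"],
            "created": f["created"],
            "resolved": f.get("resolutiondate"),  # None if still open
        })
    return tasks

# Canned response, shaped like the output of a Jira JQL search.
sample_response = json.loads("""
{"issues": [
  {"key": "PROJ-42",
   "fields": {"issuetype": {"name": "Bug"},
              "priority": {"name": "High"},
              "status": {"name": "Done"},
              "created": "2023-04-01T09:00:00.000+0000",
              "resolutiondate": "2023-04-03T16:20:00.000+0000"}}
]}
""")
tasks = extract_tasks(sample_response)
print(tasks[0]["id"], tasks[0]["type"], tasks[0]["status"])
```

The creation and resolution timestamps in these records are what make cycle-time and schedule-slippage measurement possible later on.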

 

Additional sources of data:

  • Test suites might be run to capture unit-test or system-test coverage.

  • HR databases might be mined for information about developers such as managerial status, years with organization, role, etc.  An org chart could also be extracted as well.  Such information is useful when studying developer or team productivity.

 

Link between version control & change management systems

To measure software economic outcomes, we need traceability between version control and change management system data.  For example, we must know which file-changes fixed bugs (vs. implemented new features) to compute defect density in a file.

Unfortunately, ‘change management’ and ‘version control’ systems were historically created to serve separate functions.  ‘Change management systems’ were developed for project managers, while version control systems were created for developers.  Because of this separation, many IT systems do not contain data linking ‘tasks’ and ‘patches.’

 

Fortunately, some integrated ‘change management’ and ‘version control’ systems are available.  (One example is Jira’s integration with Git.)  These systems typically store unique identifiers associated with a ‘change management task’ inside logs for a ‘version control patch.’  Some integrated systems require and enforce this connection via tooling.  Other systems require or request that developers manually establish this connection by entering a ‘task id.’  Some manual-entry systems do error checking, while others do not.  Data quality is better when policy is enforced by tools, when management requires developers to comply with policy, when developers are conscientious about using ‘change management’ categories appropriately, and when tools check for errors during manual entry.
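When task identifiers are embedded in commit messages, the link can often be recovered with a simple pattern match.  The sketch below assumes a hypothetical Jira-style key convention (`PROJ-123`) and made-up commit and task records; it extracts the keys, measures what fraction of commits are linked, and computes a per-file defect-density proxy (the share of a file’s patches that fixed bugs).

```python
import re
from collections import defaultdict

# Hypothetical convention: task keys like "ABC-123" appear in commit
# messages.  Task types come from the change management system;
# commit -> file mappings come from version control.
TASK_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

task_types = {"PROJ-1": "feature", "PROJ-2": "bug", "PROJ-3": "bug"}

commits = [
    {"msg": "PROJ-1 add export option", "files": ["export.py"]},
    {"msg": "PROJ-2 fix crash on empty input", "files": ["export.py", "core.py"]},
    {"msg": "PROJ-3 fix off-by-one", "files": ["core.py"]},
    {"msg": "tidy whitespace", "files": ["core.py"]},   # unlinked commit
]

linked = unlinked = 0
bug_fixes = defaultdict(int)      # file -> number of bug-fix patches
patch_counts = defaultdict(int)   # file -> total patches

for c in commits:
    keys = TASK_KEY.findall(c["msg"])
    if keys:
        linked += 1
    else:
        unlinked += 1
    is_bug_fix = any(task_types.get(k) == "bug" for k in keys)
    for f in c["files"]:
        patch_counts[f] += 1
        if is_bug_fix:
            bug_fixes[f] += 1

for f in sorted(patch_counts):
    print(f, bug_fixes[f], "/", patch_counts[f])
print("linked fraction:", linked / len(commits))  # -> 0.75
```

The ‘linked fraction’ printed at the end is exactly the traceability measure asked about in the checklist below; files with a high share of bug-fix patches are candidates for deeper quality analysis.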

 

Advanced questions to understand the current state

  1. Are the version control and change management systems integrated?

  2. Is this integration enforced by tooling?

  3. Is this integration accomplished by manual entry?

  4. If by manual entry, does the system check for correctness when IDs are entered?

  5. What fraction of version control changes are properly linked to tasks in the change management system?

  6. Do management and development culture require proper use of the change management system?

  7. Are developers rewarded or evaluated based on the contents of the change management system?

  8. Is there reason to believe the change management system is being ‘gamed’ (e.g. ‘bugs’ misclassified as ‘features’ to make defect numbers look better)?


Contact Us

Silverthread’s mission is to advance the state of software measurement practice by quantifying complexity and design quality. Our measurement know-how can establish a more trustworthy foundation for improving software economics.


