The liveliest debates I regularly encounter while leading an Engineering team revolve around how to allocate resources between fixing bugs and developing new features. “Why doesn’t Engineering fix all the bugs?” exclaims a customer support person. “Why don’t we allocate all Engineering resources to New_Shiny_Feature_X?” wonders the salesperson whose major deal depends on this feature.
These are both absolutely legitimate questions! … But that does not mean they are easy to answer.
The main challenge in satisfying these two rightful requests is that they compete for the same resources, and that different people within the company hold strongly differing perspectives. The same person can even switch camps in a matter of days. It all depends on the last sales call. Do we have a customer threatening not to renew until we fix “their bugs”, or do we have a big deal that depends on the delivery of a new feature?
As a consequence, it is imperative to create a business and technical framework for decision making, in which every stakeholder can not only express their perspective but also be satisfied with the decision process, and thus with the decisions that come out of it.
Framework for Decision Making
What’s more important? Or more precisely, what’s more important to implement in this release cycle?
- New features driven by product roadmap and corporate strategy
- Customer-driven enhancement requests
- Bug fixes requested by existing customers
- Paying down technical debt: upgrading the architecture, refactoring ugly code, optimizing operational infrastructure, etc.
The process to reach a decision is basically the same as for any business decision: we weigh how much income each item will generate and how much investment it will require.
Implementing a new feature, fixing a bug, enhancing a released feature and paying down technical debt all demand the same activities: define requirements, design, code, test, deploy. They also all draw from the same pool of product managers, developers, QA and DevOps engineers. As a consequence, it is relatively easy to define the “investment” side of the equation.
Estimating the income side is a bit more complex, because income comes in multiple flavors. However, the process is the same as for prioritizing the backlog of new features: we need to articulate the business case, which can draw on:
- Expected revenue stream (new features & enhancements)
- Reduction in subscription churn (enhancements & bug fixes, as well as new features)
- Cost reduction (technical debt / architecture) through increased future development velocity
- Customer satisfaction (bug fixes & enhancements), which translates into better advocacy for the brand and reduced churn
- Strategic objectives (market positioning, competitive move, commitment to win a major deal)
Each of these categories is important in its own right. Since they cannot all be translated into a common unit of measure (e.g. dollars), I recommend quantifying each of these elements relative to one another (e.g. using T-shirt sizes: S, M, L, XL, …) for each item on the list.
In practice, I create a matrix with one row per feature, bug, enhancement request or technical-debt item, and the following columns:
- Short Description
- Link to longer description (Jira, Wiki, …)
- Summary business case
- Estimated engineering effort
- Estimated calendar duration
- Expected increase in revenue (if any)
- Expected cost reduction (if any)
- Customer satisfaction impact
- Strategic value
While this is not perfect (ideally we would assign a single score to each item), it allows us to (a) resolve the no-brainers (high benefit at low cost, or high cost with low benefit), and (b) frame the discussion of the remaining items against the business context of the company:
- Are we in a tight competitive race where we need to show momentum in our innovation?
- Do we have one, or more, major deals dependent on a given set of features?
- Are our customers grumbling about our product quality, or worse threatening to leave?
- Is our scalability at risk because of legacy code?
- Are we being hampered in our ability to deliver new features by too much legacy code?
While this will not eliminate passionate debates at Product Council, it will hopefully bound them, particularly if we can first agree on high-level priorities for the business.
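To make the matrix and the “no-brainer” pass more concrete, here is a minimal sketch, assuming the T-shirt sizes map onto an ordinal scale (no impact = 0, S = 1, M = 2, L = 3, XL = 4). The `Item` and `triage` names, the thresholds, and the rule that an item’s benefit is its strongest single benefit column are all hypothetical choices for illustration, not a prescribed scoring method; the calendar-duration column is omitted for brevity. Everything that is not a clear “do” or “drop” is left for the Product Council discussion.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordinal scale for T-shirt sizes; None means "no impact".
SIZES = {None: 0, "S": 1, "M": 2, "L": 3, "XL": 4}


@dataclass
class Item:
    """One row of the prioritization matrix (a subset of the columns)."""
    description: str
    link: str                              # e.g. a Jira or Wiki reference
    effort: str                            # estimated engineering effort
    revenue: Optional[str] = None          # expected increase in revenue
    cost_reduction: Optional[str] = None   # expected cost reduction
    satisfaction: Optional[str] = None     # customer satisfaction impact
    strategic: Optional[str] = None        # strategic value

    def peak_benefit(self) -> int:
        # Hypothetical heuristic: benefits are not comparable, so we only
        # look at the strongest single benefit column.
        return max(SIZES[b] for b in (self.revenue, self.cost_reduction,
                                      self.satisfaction, self.strategic))


def triage(item: Item) -> str:
    """Resolve the no-brainers; everything else goes to discussion."""
    effort = SIZES[item.effort]
    benefit = item.peak_benefit()
    if benefit >= SIZES["L"] and effort <= SIZES["S"]:
        return "do"        # high benefit at low cost
    if benefit <= SIZES["S"] and effort >= SIZES["L"]:
        return "drop"      # high cost with low benefit
    return "discuss"       # frame against the business context


# Hypothetical backlog rows; the descriptions and keys are made up.
backlog = [
    Item("Fix CSV export encoding", "JIRA-101", effort="S", satisfaction="L"),
    Item("Rewrite reporting engine", "JIRA-202", effort="XL", cost_reduction="S"),
    Item("SSO integration", "JIRA-303", effort="L", revenue="XL", strategic="L"),
]

if __name__ == "__main__":
    for item in backlog:
        print(f"{item.description}: {triage(item)}")
```

With these made-up rows, the cheap, high-satisfaction fix comes out as “do”, the expensive rewrite with little benefit as “drop”, and the large but strategic integration as “discuss”, which is exactly the kind of item the business-context questions are meant to settle.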
Why Not Have a Dedicated Sustaining Engineering Team?
There are two primary reasons why a Sustaining Engineering team is a bad idea: first, it does not answer the question of prioritization; second, it is a bad practice because it turns some engineers into “second-class citizens”.
Say you want to have a Sustaining Engineering team. How large should it be? 5%, 10%, 20%, 50% of all engineering? Why? Should its size remain constant? Or are we allowed to shift resources in and out depending on business priorities? Answering these questions requires the same analysis and decision making as I propose above, but is burdened by the inflexibility of a split organization.
Regardless of whom you assign to Sustaining Engineering, these engineers will be considered second-class by the self-proclaimed hotshots who get to work on new features. Worse, it encourages the “new feature team” to be lazy about quality: they know that Sustaining will clean up whatever mess they leave. This attitude is pervasive and, over time, can even lead to cherry-picking of work, so that Sustaining ends up completing the “new feature” work. For example, the “new feature” team releases a new product on Chrome (so that they can meet “their date”), but Sustaining gets to make it work on Internet Explorer.
A Useful Best Practice
Any bug older than 12 months should be removed from the bug backlog. It should either be marked as “Won’t Fix” or moved to a secondary backlog list (which, I predict, will never be reviewed). The justification is simple: if a bug has lived through a year’s worth of bug triages without rising to the top and being fixed, then it is almost certain that it will never be prioritized for resolution. Better to put it out of its misery. Furthermore, this keeps the bug backlog at a reasonable size and bug triage a manageable task. Finally, if for some reason the bug’s visibility rises anew, it can be returned to the active backlog.
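The 12-month rule is simple enough to automate as part of backlog grooming. Below is a minimal sketch, assuming each bug record carries the date it was opened and that “12 months” is approximated as 365 days. The `Bug`, `prune_backlog` and `reopen` names are hypothetical, and the sketch takes the “Won’t Fix” route rather than the secondary-backlog option; a real tracker such as Jira would need its own query or API calls instead.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple

# Assumption: "12 months" is approximated as 365 days.
MAX_AGE = timedelta(days=365)


@dataclass
class Bug:
    key: str          # hypothetical tracker key, e.g. "BUG-1"
    opened: date      # date the bug was filed
    status: str = "Open"


def prune_backlog(bugs: List[Bug], today: date,
                  max_age: timedelta = MAX_AGE) -> Tuple[List[Bug], List[Bug]]:
    """Split bugs into (active, retired); retired bugs are marked "Won't Fix"."""
    active: List[Bug] = []
    retired: List[Bug] = []
    for bug in bugs:
        if today - bug.opened > max_age:
            bug.status = "Won't Fix"   # survived a year of triage unfixed
            retired.append(bug)
        else:
            active.append(bug)
    return active, retired


def reopen(bug: Bug, active: List[Bug]) -> None:
    """Return a retired bug to the active backlog if its visibility rises again."""
    bug.status = "Open"
    active.append(bug)


# Usage with made-up records:
bugs = [Bug("BUG-1", date(2023, 1, 10)), Bug("BUG-2", date(2024, 3, 1))]
active, retired = prune_backlog(bugs, today=date(2024, 6, 1))
# BUG-1 (over a year old) is retired as "Won't Fix"; BUG-2 stays active.
```

Keeping `reopen` next to the pruning step reflects the last point above: retiring a bug is a reversible bookkeeping decision, not a permanent verdict.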
The adage “Software always has bugs” remains true, not because it is impossible to write perfect software (I argue that this IS possible), but rather because in a business context, quality is not an end in and of itself. Don’t get me wrong: high quality is critical, but fixing ALL the bugs is not a requirement for business success.
As a consequence, only one criterion matters: “What moves the business forward most effectively?”
Typically, this means making customers happy. Sometimes customers are happier if we fix bugs; at other times they prefer to see a new feature brought to market earlier. The answer depends on what drives their business. Do they prefer that we fix a bug that costs them an extra hour of work per day, or that we launch a new feature that will allow them to grow their business by 10% in 6 months?