Customer Discovery – Software Engineering

Why Digital Transformations Fail – the Monolith Syndrome

Previously published on Silicon Valley Software Group Insights in March 2023.

A number of our engagements come from clients who experience a similar pattern of symptoms: release velocity is trending down, critical bugs pop up with each release, yet hiring more developers does not seem to improve anything. In parallel, the digital imperative, which has gained momentum over the past couple of years, whether imposed by the pandemic, or simply overall evolution, keeps building the pressure: consumers require a flawless digital experience. When the technology team does not deliver, the consequences for the business are painful: customers are disappointed, competition edges ahead and, even more heartbreaking, our clients are unable to capture the demand that their marketing has generated.

The goal of this post is to inform both CEOs and CTOs on how to diagnose what we term the “Monolith Syndrome”. As with any condition, early diagnosis vastly improves the chances of success. It is thus critical for CEOs and CTOs to know how to recognize this pattern, and take the necessary early actions. Further, it often falls on the CEO to identify the situation, because the CTO is usually consumed in trying to just keep up.

Symptoms

The symptoms of what we term the “Monolith Syndrome” look like this:

The application’s response time keeps degrading;
Outages are becoming more frequent;
As outages occur, new features requests do not get delivered. Customer complaints rise;
Re-prioritization of the product roadmap occurs before the main features of the previous roadmap are delivered (because they took too long);
Distrust between the executive and the technology teams grows.

Like any challenge, each company faces its own flavor of the “Monolith Syndrome”, yet to the experienced eye, the pattern is easily recognizable. More fundamentally, it is absolutely normal: it occurs when a company has grown into a new stage of maturity – where a new way of running the business, including the technology, is now necessary. Like most living organisms, when looking on a short time horizon, companies grow incrementally. However, when taking a step back, discrete stages become evident. On the technical front, transitioning between maturity stages call for what is called a “Digital Transformation”.

The Monolith Syndrome encapsulates scenarios of pain when the technology team cannot keep up with the needs of the business through “business as usual”.

There are multiple scenarios that require a digital transformation, the Monolith Syndrome is one of them. We will explore the others in subsequent posts.

Causes

From a technical perspective, the root causes of the “Monolith Syndrome” are often a combination of:

The architecture of the current codebase was developed more than five years ago, and has changed little since;
The code is built on a single codebase and uses a single database – hence the term “monolith”;
Development expediency has been the priority which has led to: poorly organized code, little documentation, few tests, and even fewer automated tools for QA, release and operational management;Critical areas of functionality are implemented in “dark code”: code that was written by developers who are no longer employed by the company, and which current developers are scared to touch, because the code is difficult to understand and there is no documentation.

The Monolith Syndrome encapsulates scenarios of pain when the technology team cannot keep up with the needs of the business through “business as usual”. We described the symptoms above in technical terms. Yet, the underlying cause is that the company has grown into a different maturity level – where “what got you here” no longer works.

To be clear, a monolithic codebase is usually the right way to go in the early stages of a company: there are a handful of developers, a manageable number of lines of code, and few features that are quick to test manually. Yet, at some point in the company’s growth, the nimbleness and expediency become a detriment rather than an asset. For example, it becomes cumbersome to develop, let alone release, when twenty-plus developers are writing code in a monolith: different developers’ new code interact with each other in a way that creates unforeseen bugs.

The underlying cause of the Monolith Syndrome is that the company has grown into a different maturity level, but not the technology team.

As a company battles through the Monolith Syndrome, the CEO and CTO have a heart-to-heart: the CEO asks “what do you need to develop new features faster?” – to which the CTO invariably answers “I need more engineers”, and then proceeds to build a “better monolith”, i.e continue to work on the same codebase with the same processes and tools. Yet with poor architecture, software organization, and documentation, the extra developers only create more confusion and barely accelerate development velocity. The root cause of this lack of progress is that the business side has gone through a change of paradigm, but not the technology team.

Again, this is why it is the CEO, who understands the business context, who needs to recognize the pattern.

The goal of the transformation is not to update to the latest and greatest technologies, but rather to identify the technologies most appropriate for the foreseeable needs of the business.

The Proper Mindset

In order for the transformation to be successful, everyone needs to have the proper mindset:

Recognize that this effort is the “price of success”. Understand that current architecture, code, tools, etc. were not a mistake – no one deserves blame. On the contrary, they were optimal for the previous stage of maturity. Now that the business has grown, and evolved, technology also has to transform to a more mature architecture.
The goal of the transformation is not to update to the latest and greatest technologies, but rather to identify the technologies most appropriate for the foreseeable needs of the business.
The transformation will require a set of skills that is typically not present in-house. Rare are the CTOs who have successfully led digital transformations. Hence, it is usually wise to enlist the help of technical leaders who do have this experience.

SVSG’s Framework

SVSG follows the following framework:

Re-align the technology to the business: understand the main stakeholder journeys (customer and employee), which have likely evolved since the current architecture was designed.
Design the architecture – and data models – before coding, based on the new stakeholder experiences, as well as needs for scale, resilience, security, etc.
Incorporate the full business context such as scale, security, resiliency, etc.
Design an incremental migration path from the current state to the desired state. For example, start by breaking up the monolith by creating one additional microservice, validating its design before moving one to a second microservice.
Evangelize that the transformation goes beyond architecture and code. The whole development process, from end to end, must align with the company’s new stage of growth.

Final Thoughts

Digital transformations are rare events in the life of a company. Technology leaders are usually selected and trained to design and build technology incrementally. Unless you have gone through it before, detecting that your company might be experiencing the Monolith Syndrome is an unusual, and difficult, challenge for both CTOs and CEOs; but when the symptoms arise, it’s important to act swiftly if the business is to keep up with its growth.

Scalable Software Architecture for a Startup

Say we are the founders of a startup and we just got a big fat check for our A-round funding. The VCs love our idea, and we all know that our app will attract millions of users in no time. This means that from day one we architect for millions of page-views per day…

But wait … do we really need to deploy Hadoop now? Do we need to design for geographical redundancy now? OR should we just build something that’s going to take us through the next 3 months, so that we can focus our energy on customer development and fine-tuning our product features? …

This is a dilemma that most startups face.

Architecting for Scale

The main argument for architecting for scale from the get-go is akin to: “do it right the first time”: we know that lots of users will be using our app, so we want to be ready when they come, and we certainly don’t want the site going down just as our product catches fire.

In addition, for those of us who have been through the pain of a complete rewrite, a rewrite is something we want to avoid at all costs: it is a complex task that is fun under the right circumstances, but very painful under time pressure, e.g. when the current version of the product is breaking under load, and we risk turning away customers, potentially for ever.

On a more modest level, working on big complex problems keeps the engineering team motivated, and working on bleeding or leading edge technology makes it easier to attract talent.

Keeping It Simple

On the other hand, keeping the technology as simple as possible allows the engineering team to be responsive to the product team during the customer development phase. If you believe, as I do, one of Steve Blank’s principles of customer development: “No Business Plan Survives First Contact with Customers”, then you need to prepare for its corollary namely: “no initial product roadmap survives first contact with customers”. Said differently, attempting to optimize the product for scale until the company has reached clear validation of its business assumptions, and product roadmap, is premature.

On the contrary, the most important qualities that are needed from the Engineering team in the early stages of the company are velocity and adaptability. Velocity, in order to reduce time-to-market, and adaptability, so that the team can rapidly adapt to feedback from “outside the building”.

Spending time designing and implementing a scalable architecture is time that is Not spent responding to customer needs. Similarly, having built a complex system makes it more difficult to adapt to changes.

Worst of all, the investment in early optimization may be all for naught: as the product evolves with customer feedback, so do the scalability constraints.

Case Study: Cloudtalk

I lived through such an example at Cloudtalk. Cloudtalk is designed as a social communication platform with emphasis on voice. The first 2 products “Cloudtalk” and “Let’s Talk” are mobile apps that implement various flavors of group messaging with voice (as well as text and other media). Predicint rapid success, Cloudtalk was designed around the highly scalable noSQL database Cassandra.

I came on board to launch “Just Sayin”, another mobile app that runs on the same backend (very astute design). Just Sayin is targeted to celebrities and allows them to cross-post voice messages to Twitter and Facebook. One of my initial tasks coming on board was to scale the app, and it was suggested that we needed it to move it to Amazon Web Services so that we can scale rapidly as more celebrities (such as Ricky Gervais) adopt our product. However, a quick analysis revealed that unlike the first two products (Let’s Talk and Cloudtalk), Just Sayin’ impact on the database was relatively light, because communications were 1-to-many (e.g. Lady Gaga to her 10M fans). Rather, in order to scale, we first needed a Content Delivery Network (CDN) so that we could feed the millions of fans the messages from their celebrities with low response time.

Furthermore, while Cassandra is a great product, it was somewhat immature at the time (stability, management tools) and consequently slowed down our development. It also took us a long time to train new engineers.

While Cassandra will have been a good choice in the long run, we would have been better served in the formative stages of the company to use more established technology like mySQL. Our velocity in developing new features, and our ability to respond to changes in product strategy would have been significantly faster.

Architecting for Scale is a Process, not an Event

A startup needs to earn the right to design for scale, by first proving that it has found a legitimate market. During this first phase adaptability and velocity are its most important attributes.

This being said, we also need to anticipate that we will need to scale the system at some point. Here is how I like to approach the problem:

First of all, scaling is an on-going process. Even if traffic increases dramatically over a short period of time, not all parts of the system need to be scaled at the same time. Yet, as usage increases, it is likely that any point in time, some part of the system will need to be scaled.
In order to avoid complete rewrites of the system, we need to break it into independent components. This allows us to redesign each component independently, and have different teams work on different problems concurrently. As a consequence, good modularization of the system is much more important early on, than designing for scale
Every release cycle needs to budget time and resources for redesign – including both modularization and scalability. This is just like maintenance on the Golden Gate bridge: the painters are always working; when they finish at one end, they start all over at the other end.
We need to treat our software architecture the same way, and budget maintenance work every release cycle: dollars, time, people. CEOs have to be trained to not only think about the “shiny features” – those that are customer-facing – but also about the “continuous improvements” of the architecture that has to be factored in every release cycle.
We also need to instrument the code to tell us were it is under strain. Unlike the Golden Gate bridge, we can’t always see where it’s breaking, or even rationalize it. Scaling sometimes works in mysterious ways that are not always obvious to predict.

In summary, designing for scale is a high-class problem, on which we only get to work once we have demonstrated true demand for our product. During this first phase, velocity and adaptability are critical, and are better served with well-understood technologies, and a well modularized design. Once our product reaches an adoption phase, then designing for scale is a continuous process that hopefully can be focused on individual modules in turn – guided by proper instrumentation of the code

MVP – Minimum Viable Product

Defining the Minimum Viable Product requires the selection of a segment of target customers and deliver the smallest critical mass of features – as early as possible – provided that you can charge a high enough price for it.

I have recently discovered, with great delight, Eric Ries’ “Startup Lessons Learned” blog , and in particular, his post about Minimum Viable Product (MVP). This is not surprising, since we are both fans of Steve Blank‘s Customer Discovery Process.

Eric’s post reminded me, how critical, yet how difficult in practice, the concept of Minimum Viable Product is.

Defining the minimum viable product correctly allows you to release products that are valuable to your customers with the minimal amount of energy and time invested – because as the name says, you have done the minimum, and yet you provide value. Said differently, if you only need to have 2 features in your product in order to sell it for $100, then you’d be crazy to spend the extra effort to add a 3rd or a 4th feature. Plus, by only delivering the minimum, you get to market fast – and hopefully beat the competition.

So why is this so difficult in practice … at least in my experience 🙂 ?

My first answer is that it is a lot easier to define the Maximum Product than it is to define the Minimum Viable Product.

Defining the Maximum Product entails compiling a list of all the possible features that your product could possibly have: you only need to talk to a handful of customers and take good notes. Critical thinking is not required. It is easy to get consensus on the Maximum Product: More is always better. The only problem is that no company can afford the time it takes to deliver this “ideal” product. Hence this need for the MVP.

The first step in defining the MVP is the one that is most often overlooked: you first need to define the segment of your customers that you target with the new product. The segment has to be small enough to group customer with similar requirements, but large enough that your new product will generate enough revenue.

The second step is to define the theme of the product in terms of benefits (not features). One of the best tools to help define this theme is by imagining that you are putting up a huge billboard on 101 (one the main arteries of Silicon Valley) that will advertise the new product: what does the billboard say?

The third and final step is to define the critical mass of features in the release. In this step, ruthless time vs feature vs price trade-offs need to be made – because the question is not just “what features do our target customers absolutely need?” (this list will always be too long), but rather: “Will our customers be willing to buy the product with these features – available at this date – at this price? Economically, this question may have multiple correct answers. However, in practice, presented with this question, customers will often select a date in the near term, which in turn defines the minimum viable product.

	Ely Shemer on Lessons Learned From 50 Techni…
	Lessons Learned From… on Lessons Learned From 50 Techni…
	DevOps Consult on DevOps-Driven Development
	devops training on DevOps-Driven Development
	Time Tested Engineer… on (Boosting) Morale in Engineeri…