Scalable Software Architecture for a Startup

Say we are the founders of a startup and we just got a big fat check for our A-round funding. The VCs love our idea, and we all know that our app will attract millions of users in no time. This means that from day one we architect for millions of page-views per day…

But wait … do we really need to deploy Hadoop now? Do we need to design for geographical redundancy now? OR should we just build something that’s going to take us through the next 3 months, so that we can focus our energy on customer development and fine-tuning our product features? …

This is a dilemma that most startups face.

Architecting for Scale

The main argument for architecting for scale from the get-go is akin to: “do it right the first time”: we know that lots of users will be using our app, so we want to be ready when they come, and we certainly don’t want the site going down just as our product catches fire.

In addition, for those of us who have been through the pain of a complete rewrite, a rewrite is something we want to avoid at all costs: it is a complex task that is fun under the right circumstances, but very painful under time pressure, e.g. when the current version of the product is breaking under load, and we risk turning away customers, potentially for ever.

On a more modest level, working on big complex problems keeps the engineering team motivated, and working on bleeding or leading edge technology makes it easier to attract talent.

Keeping It Simple

On the other hand, keeping the technology as simple as possible allows the engineering team to be responsive to the product team during the customer development phase. If you believe, as I do, one of Steve Blank’s principles of customer development: “No Business Plan Survives First Contact with Customers”, then you need to prepare for its corollary namely: “no initial product roadmap survives first contact with customers”. Said differently, attempting to optimize the product for scale until the company has reached clear validation of its business assumptions, and product roadmap, is premature.

On the contrary, the most important qualities that are needed from the Engineering team in the early stages of the company are velocity and adaptability. Velocity, in order to reduce time-to-market, and adaptability, so that the team can rapidly adapt to feedback from “outside the building”.

Spending time designing and implementing a scalable architecture is time that is Not spent responding to customer needs. Similarly, having built a complex system makes it more difficult to adapt to changes.

Worst of all, the investment in early optimization may be all for naught: as the product evolves with customer feedback, so do the scalability constraints.

Case Study: Cloudtalk

I lived through such an example at Cloudtalk. Cloudtalk is designed as a social communication platform with emphasis on voice. The first 2 products “Cloudtalk” and “Let’s Talk” are mobile apps that implement various flavors of group messaging with voice (as well as text and other media). Predicint rapid success, Cloudtalk was designed around the highly scalable noSQL database Cassandra.

I came on board to launch “Just Sayin”, another mobile app that runs on the same backend (very astute design). Just Sayin is targeted to celebrities and allows them to cross-post voice messages to Twitter and Facebook. One of my initial tasks coming on board was to scale the app, and it was suggested that we needed it to move it to Amazon Web Services so that we can scale rapidly as more celebrities (such as Ricky Gervais) adopt our product. However, a quick analysis revealed that unlike the first two products (Let’s Talk and Cloudtalk), Just Sayin’ impact on the database was relatively light, because communications were 1-to-many (e.g. Lady Gaga to her 10M fans). Rather, in order to scale, we first needed a Content Delivery Network (CDN) so that we could feed the millions of fans the messages from their celebrities with low response time.

Furthermore, while Cassandra is a great product, it was somewhat immature at the time (stability, management tools) and consequently slowed down our development. It also took us a long time to train new engineers.

While Cassandra will have been a good choice in the long run, we would have been better served in the formative stages of the company to use more established technology like mySQL. Our velocity in developing new features, and our ability to respond to changes in product strategy would have been significantly faster.

Architecting for Scale is a Process, not an Event

A startup needs to earn the right to design for scale, by first proving that it has found a legitimate market. During this first phase adaptability and velocity are its most important attributes.

This being said, we also need to anticipate that we will need to scale the system at some point. Here is how I like to approach the problem:

  • First of all, scaling is an on-going process. Even if traffic increases dramatically over a short period of time, not all parts of the system need to be scaled at the same time. Yet, as usage increases, it is likely that any point in time, some part of the system will need to be scaled.
  • In order to avoid complete rewrites of the system, we need to break it into independent components. This allows us to redesign each component independently, and have different teams work on different problems concurrently. As a consequence, good modularization of the system is much more important early on, than designing for scale
  • Every release cycle needs to budget time and resources for redesign – including both modularization and scalability. This is just like maintenance on the Golden Gate bridge: the painters are always working; when they finish at one end, they start all over at the other end.
  • We need to treat our software architecture the same way, and budget maintenance work every release cycle: dollars, time, people. CEOs have to be trained to not only think about the “shiny features” – those that are customer-facing – but also about the “continuous improvements” of the architecture that has to be factored in every release cycle.
  • We also need to instrument the code to tell us were it is under strain. Unlike the Golden Gate bridge, we can’t always see where it’s breaking, or even rationalize it. Scaling sometimes works in mysterious ways that are not always obvious to predict.

 

In summary, designing for scale is a high-class problem, on which we only get to work once we have demonstrated true demand for our product. During this first phase, velocity and adaptability are critical, and are better served with well-understood technologies, and a well modularized design. Once our product reaches an adoption phase, then designing for scale is a continuous process that hopefully can be focused on individual modules in turn – guided by proper instrumentation of the code

 

Advertisements

MVP – Minimum Viable Product

Defining the Minimum Viable Product requires the selection of a segment of target customers and deliver the smallest critical mass of features – as early as possible – provided that you can charge a high enough price for it.

I have recently discovered, with great delight, Eric Ries’ “Startup Lessons Learned” blog , and in particular, his post about Minimum Viable Product (MVP). This is not surprising, since we are both fans of Steve Blank‘s Customer Discovery Process.

Eric’s post reminded me, how critical, yet how difficult in practice, the concept of Minimum Viable Product is.

Defining the minimum viable product correctly allows you to release products that are valuable to your customers with the minimal amount of energy and time invested – because as the name says, you have done the minimum, and yet you provide value. Said differently, if you only need to have 2 features in your product in order to sell it for $100, then you’d be crazy to spend the extra effort to add a 3rd or a 4th feature. Plus, by only delivering the minimum, you get to market fast – and hopefully beat the competition.

So why is this so difficult in practice … at least in my experience 🙂 ?

My first answer is that it is a lot easier to define the Maximum Product than it is to define the Minimum Viable Product.

Defining the Maximum Product  entails compiling a list of all the possible features that your product could possibly have: you only need to talk to a handful of customers and take good notes. Critical thinking is not required. It is easy to get consensus on the Maximum Product: More is always better. The only problem is that no company can afford the time it takes to deliver this “ideal” product. Hence this need for the MVP.

The first step in defining the MVP is the one that is most often overlooked: you first need to define the segment of your customers that you target with the new product. The segment has to be small enough to group customer with similar requirements, but large enough that your new product will generate enough revenue.

The second step is to define the theme of the product in terms of benefits (not features). One of the best tools to help define this theme is by imagining that you are putting up a huge billboard on 101 (one the main arteries of Silicon Valley) that will advertise the new product: what  does the billboard say?

The third and final step is to define the critical mass of features in the release. In this step,  ruthless time vs feature vs price trade-offs need to be made – because the question is not just “what features do our target customers absolutely need?” (this list will always be too long), but rather: “Will our customers be willing to buy the product with these  features – available at this date –  at this price? Economically, this question may have multiple correct answers. However, in practice, presented with this question, customers will often select a date in the near term, which in turn defines the minimum viable product.