Lessons Learned From 50 Technical Due Diligence Reviews, Part 2

Previously published on Forbes Technology Council – April 26, 2022

In a prior post, Lessons Learned From 50 Technical Due Diligence Reviews, we offered information and advice to founders, CEOs and CTOs on what to expect from, and how to approach, technical due diligence review (tech DD). In this post, we cover how founders, CEOs and CTOs can prepare for tech DD.

In our prior post, we emphasized that tech DD is forward-looking. Investors want to confirm that, as a management team, you will master the future opportunities and challenges on both the business and technical fronts. Furthermore, an injection of capital usually comes around an inflection point in the growth of the company: for example, when the primary focus shifts from developing the product to increasing revenues, or when a major product line extension is added to conquer new markets. This period is conducive to a strategic reflection on how the company will win in this new phase, in the spirit of the Marshall Goldsmith adage “What got you here won’t get you there.” After gaining a baseline for where the technology is today, the focus of a tech DD review will, by and large, be very similar to the questions addressed in a strategic review.

Below, we share the most common questions that we ask during tech DD in the hope that they help you prepare your strategic review as well as the tech DD itself.

We always start our reviews with the business context because the role of the technology team is to deliver the products that will enable the business to reach its goals. The product road map for the upcoming 24 months is, for the purposes of tech DD, the materialization of the business objectives. In this context, the product road map must include noncustomer-facing features such as performance, scale, security, business resilience and continuity.

The focus of the tech DD is to determine how well prepared the technology team is to deliver the product road map as promised and the associated revenues or other business metrics.

At the risk of oversimplifying, tech DD asks the same questions, listed below, for every area related to technology (see the list further down); we will later illustrate these with examples:

• Is what you are doing today working?

• What are your plans for fixing what’s not working?

• Will the next 24 months create a discontinuity compared to the past year?

• If so, do you have a plan? If not, do you have a plan for a plan?

• Are there areas where you need to acquire competence (learn, experiment, hire, buy)?

The standard areas we investigate, and to which we apply the questions above, are:

• Software architecture and data architecture.

• Technical stack: frameworks for back end and front end, data stores, APIs.

• Performance and scale.

• Security, compliance, data privacy.

• Testing.

• Operational management: deployment, management, alerting, performance.

• SDLC process and toolchains: code analyzers, test automation, SCCS, CI/CD.

• Team: talent, organization.

In practice, this translates to questions like:

• Is the product road map aspirational (wishful thinking) or actionable (backed up by an engineering plan)?

• How much technical debt is there? What is the plan to tackle it, if need be?

• Will major components of the code require re-architecting or major refactoring?

• Are the data models consistent with the main use cases as well as future ones?

• How do your security, data privacy and risk profiles look, both now and once you have scaled 10x?

• Will you require new certifications for compliance?

• What critical hires do you need to make? By when?

• What would be the impact, financial and technical, of a two-day outage of your cloud hosting provider availability zone?

• Have the major technology initiatives (re-architecture, technical debt reduction, security upgrade) all been approved by the business team and budgeted (i.e., have time, resources and money been allocated)? Is the product road map based on budgeted resources?

Investors in high-growth companies, by and large, have a strong stomach and anticipate that at least one major software project will be needed every year or two. From what we can observe, investors generally have the strongest negative reactions to misalignments. For example, an aspirational rather than actionable road map (which implies that both budget and revenue plans are aspirational), or leadership that does not acknowledge that architecture, code quality, processes or security have been outgrown by the success of the company (which implies either lack of competence or lack of teamwork in the business and technical leadership).

In summary, the best scenario is when the business and technical leaders have performed a strategic review prior to fundraising. Preparing for the technical due diligence review should not be about cramming to figure out the answers to anticipated questions; rather, it should be about visualizing how the business, and its technology, will grow over the next two years and identifying the new categories of challenges that growth, and success, will bring.

Lessons Learned From 50 Technical Due Diligence Reviews, Part 1

Previously published on Forbes on 3/18/2022

Over the past couple of years, I’ve led, in collaboration with other CTOs in my company, about 50 technical due diligence reviews, primarily for the benefit of venture capital firms and sometimes for M&A deals. The target companies ranged in maturity from early stage to a hundred million dollars in revenues.

Occurring after a term sheet has been signed and before the full contract is executed, a proper technical due diligence review is far more than an evaluation of a snapshot in the life of a company. It evaluates the ability of the target company’s technical team to deliver the technology that underlies the growth objectives of the company in the next two years.

Having performed these technical due diligence reviews across a variety of industries and company sizes has allowed us to empirically identify patterns, which I’m sharing here in a series of articles for the benefit of founders, CEOs, CTOs, investors and acquirers. My goal is to help each participant be more effective in these situations. In this first part of the series, I’ll start with founders, CEOs and CTOs.

1. Embrace the technical due diligence process.

First, technical due diligence is good news: It means that an investor, or acquirer, is committed to investing in your company. Furthermore, the presumption about the technology is positive—after all, it got you this far.

Don’t ruin this positive vibe by being coy or holding back detailed information about your technology and what makes it unique under the guise of protecting the company’s intellectual property (European companies seem more prone to doing this). NDAs protect you. Withholding information simply causes more questions and, thus, more emails and more time on Zoom.

In the worst case, resisting standard requests for information raises questions on what you have to hide. Said differently, it’s impossible for your technical due diligence review provider to provide good recommendations on something they haven’t seen.

In fact, the worst conclusion that we can report to our clients is that the target company doesn’t have any differentiated intellectual property. Consequently, rather than be secretive about your algorithms and technology, “sell” your review provider on how innovative you are. Impress them, and they’re likely to share their enthusiasm with your investors.

2. Use technical due diligence to your advantage.

There’s no right or wrong architecture. By definition, the current architecture is pretty good because it allowed your company to grow to this stage. During the many technical due diligence reviews we’ve performed, we’ve seen the same categories of problems solved successfully with different technical stacks. A good technical due diligence review provider should be polyglot and agnostic. What matters is its understanding of the strengths and weaknesses of the technical stack and how it needs to evolve based on how the business will evolve.

We also know from personal experience that nothing is ever perfect, particularly in a high-growth company. What matters is to demonstrate the awareness of what works and what doesn’t, as well as the decision-making process that’s guided trade-offs over time.

Granted, having to provide documents and answer questions for the technical due diligence review may seem like a huge waste of time. But how often do you get a chance to have experienced peers review your architecture, code, processes and tools? Most CTOs we work with tell us that they learn a lot from the questions that are asked. A question like, “What factors led to the selection of a given framework?” implicitly guides the discussion toward whether these factors will be relevant in the next two years and whether others need to be included as well.

3. Technical due diligence is all-encompassing.

Investors care less about your current state than whether you have a realistic assessment of it (including the problems you haven’t yet solved) and of what it will take to meet the growth numbers you posted in the pitch deck. A good technical due diligence review provider will evaluate the team (the CTO, specifically) as well as the technology.

Sharing how you think, your approach to problem-solving and how trade-offs are made among budget, features, time and resources shows that you’re open and confident. In addition, it lays a solid foundation for the working relationship with the investors over the next three to 10 years. This is similar to job interviews: The interviewer cares more about how you think about a problem than whether you’ve memorized the correct answer.

4. Technical due diligence is nonbinary.

As mentioned, technical due diligence occurs between the signing of the term sheet and that of the contract—in other words, after the deal has already been made. The investors, or acquirers, really want the deal to happen. They don’t ask our opinion about the deal. They just want us to help them paint a picture of what the technology journey will be over the next two years.

In cases in which we’ve suggested that the CTO has reached their peak or the software needs to be rewritten, this has rarely canceled a deal. Instead, investors may decide to reduce their pre-money valuation, increase their investment amount (e.g., to pay for the rewrite) or rework the business plan with the management team. Very rarely do they walk away from the deal, and to our knowledge, it’s never because of technology only.

5. Technical due diligence is forward-looking.

Technical due diligence is the start of the collaboration between the CEO, CTO and the future board members. As technical leaders, you’ll want to demonstrate that you understand the needs of the business and how to architect the technology, team, tools and processes to support these needs over the next two years.

As I’ll illustrate in a forthcoming article in this series, technical due diligence should be considered not as a test but as an opportunity to have a conversation about what lies ahead. Ideally, this conversation should take place internally prior to fundraising, which will likely result in a smooth technical due diligence review.

How To Maximize The Value Of Technical Due Diligence

Previously published on Forbes on 11/16/2021

Technical due diligence (TDD) is typically requested by investors prior to closing a growth-stage investment or when acquiring a company. A smart investor should expect a lot more out of TDD than a “yes or no” answer to the question “Are there any red flags that warrant canceling the investment or acquisition?” 

Instead, as I highlighted previously in my article “The Art of Technical Due Diligence,” “Technical due diligence should provide actionable information about the upcoming 24 months, including critical dependencies, risk factors and major technical milestones that will usher in product milestones.” 

TDD allows future board members to track technical milestones and thus anticipate the financial ones. Technical milestones typically precede some of the financial milestones by three to six months — for example, when software needs to be re-architected to deliver the scale to serve the expected growth. 

A good technical due diligence identifies: 

• When and where the past is no longer a predictor of the future.

• What new skills will need to be developed in the technology and product teams.

• What new risks need to be handled.

Here are some examples: 

Scale will hit a wall.

This is almost a universal concern in technical due diligence projects. The deal is based on four times or 10 times revenue growth in the next 24 months, but can the software keep up? If the answer is “no,” investors will want to know what it will take to meet the growth projections: architecture redesign, implementation plan along with schedule, resources and budget estimates.

There is a large amount of technical debt.

Only close inspection of the code by a talented CTO can identify whether the code is ready for the next phase of growth. Some of the more frequent scenarios include:

• The company is generating millions of dollars of revenues on code based on its first prototype, typically a monolith, with layers of dead code that supported use cases abandoned in the quest for product-market fit. This not only degrades operational performance but also hinders development velocity once the team grows beyond a dozen developers.

• The code base is “legacy” and poorly maintained. This often happens with companies that were early on the market, persevered through years of slow growth and now suddenly take off. The code is based on old technology, has been updated — expediently — over time by different teams of developers and has poor documentation. In this situation, a rewrite from scratch is usually the only practical solution.

• For enterprise companies, another common scenario occurs when the software and the data storage are still single-tenant. Transitioning to a multi-tenant architecture is a problem with a known solution, but it is time-consuming and costly (a sketch of tenant scoping follows this list).
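
To make the single-tenant versus multi-tenant gap concrete, here is a minimal sketch of shared-database multi-tenancy: every table carries a tenant identifier, and every query is scoped to the calling tenant in one place so that no code path can forget the filter. The schema and helper name are hypothetical, chosen only for illustration.

```python
import sqlite3

# Minimal illustration of shared-database multi-tenancy: every table carries a
# tenant_id column, and all reads are scoped to the calling tenant.
# The schema and helper name are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, tenant_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "acme", 100.0), (2, "acme", 250.0), (3, "globex", 75.0)],
)

def invoices_for_tenant(tenant_id: str):
    # The tenant filter lives in one helper so no query can bypass it.
    rows = conn.execute(
        "SELECT id, amount FROM invoices WHERE tenant_id = ?", (tenant_id,)
    )
    return rows.fetchall()

print(invoices_for_tenant("acme"))    # [(1, 100.0), (2, 250.0)]
print(invoices_for_tenant("globex"))  # [(3, 75.0)]
```

The time-consuming part of the transition is usually not this query scoping but migrating existing single-tenant databases and every code path that assumes one customer per deployment.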

Development velocity will tank.

Probably the hardest transition for a startup to navigate is when the size of the user base dictates that quality trumps new features. When a company has a large number of customers, the cost of a serious bug — let alone a DOA release — becomes prohibitive.

This is when test automation and CI/CD automation (including Infrastructure as Code) need to be deployed, which is usually a painful process because existing code must be “retrofitted” with automated regression tests. In addition, development velocity temporarily stalls before accelerating again once a critical mass of automation has been reached.
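
As an illustration of that “retrofitting” step, teams typically start by pinning down the current behavior of existing code with characterization tests before refactoring it. Here is a minimal sketch using pytest; the discount function is a hypothetical stand-in for legacy code.

```python
# Characterization test retrofitted onto existing code: we record what the
# code does today so refactoring (or re-architecting) cannot silently change it.
# The discount function is a hypothetical stand-in for legacy logic.
import pytest

def legacy_discount(order_total: float, returning_customer: bool) -> float:
    # Existing production behavior we do not want to change yet.
    if order_total > 100 and returning_customer:
        return round(order_total * 0.9, 2)
    return order_total

@pytest.mark.parametrize(
    "total, returning, expected",
    [
        (50.0, True, 50.0),     # below threshold: no discount
        (150.0, False, 150.0),  # new customer: no discount
        (150.0, True, 135.0),   # both conditions met: 10% off
    ],
)
def test_legacy_discount_behavior_is_preserved(total, returning, expected):
    assert legacy_discount(total, returning) == expected
```

Once a critical mass of such tests runs in CI on every commit, the team regains the confidence to refactor, and velocity recovers.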

Another common scenario occurs when the target company is developing products like “three founders in a garage,” i.e., with very little documentation, limited QA, manual deployments. Scaling the team will require changing processes as well as attitudes and, possibly, the CTO.

Risk arbitration is drastically different.

A company with one million users should look at security — and business continuity — very differently than a company that has 10,000 users. At the risk of oversimplifying, the cost of implementing state-of-the-art security is the same in both scenarios, yet the ROI is different: The cost of being hacked is much greater for the former than the latter. Similarly for business continuity: The cost of a one-day outage may be acceptable for the latter company, but may kill the former company.

One of the companies we reviewed at my organization had grown organically from a prototype to one that stored hundreds of thousands of credit cards in its database. Because the growth had been organic and moderate, no one on the executive team noticed that the company had reached a scale where a hacker could destroy it.

There is an inefficient development process.

An often-overlooked factor affecting development velocity is the alignment, or misalignment, between the executive team, product team and technology team.

This shows up in two ways: a product road map that is aspirational (i.e., dates are not backed up by engineering estimates) and a product road map that zig-zags (i.e., changes every quarter). This situation is normal, and possibly desirable, when the company is searching for product-market fit but counterproductive when it is attempting to conquer the large market it has discovered.

Moving from chasing opportunities to a mode where formal business cases for new features are developed cooperatively is challenging for the company’s leadership but essential to ensure stability in the product road map, which, in turn, allows the technology team to develop a technology road map as well as predictable releases.

Conclusion

None of the issues presented above are deal killers, but they can lead to a modification of the terms of the deal. For example, investors may want to increase their investment to cover the rewrite of major components of the products. In all situations, even with a well-performing technical team, TDD delivers a list of major milestones that can be tracked by the investors as the company grows.

DevOps-Driven Development

It is now time to add the concept of “DevOps-Driven Development” to our repertoire.

“Test-driven” development, which originated around the same time as Extreme Programming and Agile Development, encourages us to think about testing as we architect our software and plan our tasks. Similarly, a “DevOps-Driven Development” approach ensures that we consider operational implementation as well as the deployment process during the design phase. To be clear, DevOps thinking needs to augment (not replace) testing strategy.

Definition and Motivation

First, a definition: I am using the word DevOps here as a shortcut to include both DevOps (build and deployment tools) and Ops (IT/data center operations).

How many times have you heard “… but it works on my machine!!” from a developer whose code was found to have a bug in the QA environment or, worse, in production? We all agree that these situations are a horrible waste of time for everyone involved, most of all customers. This post thus advocates that DevOps thinking, just like quality thinking, must occur at the design phase and continue throughout the development of the software, until the software is released to production and even after that.

Practicing DevOps-Driven Development

I have always advocated “If you don’t know how to test it, you don’t know how to design it” (Who Owns Quality? Part 3) to articulate the fact that quality cannot be debugged out; it has to be designed in. Similarly, if we want to know – before our customers call us – when our code crashes in production, or becomes unusably slow, then we must build the proper instrumentation and administration capabilities into our code.

We must now add this mantra: “If you don’t know how to deploy it and manage it in Production, you don’t know how to design it.”

Just as we don’t allow code to be merged into Trunk (the main branch) without complete unit tests, code cannot be merged without correct deployment scripts, release notes, and production instrumentation.

Here is a “thinking DevOps” checklist:

Deployable

First of all, we must ensure that the code deploys successfully not only in Production but in all environments: Dev, QA, Stage, etc.

This implies:

  • Developers write/update release notes, e.g., highlighting any changes required in the configuration of the environments: opening a new port, adding a column in the database, adding a new property to config files, etc.
  • Developers, in collaboration with the DevOps team, update deployment scripts, e.g., to account for a new executable or for schema changes in the database.

The management of config/property files is beyond the scope of this post, but I strongly recommend the “Infrastructure as Code” approach: fully automating server/image configuration for deployment, and managing configuration, deployment scripts and application property files under source code control.
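
A minimal sketch of the spirit of that recommendation: one property file per environment, kept in the same repository as the code, loaded by the application at startup. The file layout, keys and the APP_ENV variable are hypothetical.

```python
# Load environment-specific settings from property files that are versioned
# alongside the code. File names, keys and the APP_ENV variable are hypothetical.
import json
import os
import pathlib

def load_config(env=None):
    env = env or os.environ.get("APP_ENV", "dev")    # dev, qa, stage, prod
    path = pathlib.Path("config") / f"{env}.json"    # e.g. config/prod.json
    with path.open() as f:
        return json.load(f)

# config/dev.json  might contain: {"db_url": "sqlite:///dev.db", "workers": 2}
# config/prod.json might contain: {"db_url": "postgresql://...", "workers": 16}
```

Because the files live under source code control, a configuration change is reviewed, versioned and deployed exactly like a code change.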

Monitor-able

If we want to detect problems before our (irate) customers call us, our code needs to be monitor-able – not only at the physical server level, but also at the level of each virtual machine, service and process, as well as the networking and storage systems.

Monitor-ability goes beyond keeping track of CPU load, disk space and network bandwidth. We, developers, (should) know which parameters indicate when our system is misbehaving, whether it is a queue exceeding a given size or certain operations timing out. As a consequence, we must publish these parameters to interfaces compatible with the Ops monitoring tools.

Furthermore, by making performance metrics easily observable, we ensure that each new release maintains (or improves) the performance of the prior release.
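
As one possible illustration of publishing such a parameter, the sketch below exposes a queue-depth gauge over HTTP for the Ops monitoring stack to scrape. The use of the prometheus_client library, the port and the metric name are assumptions on my part; any interface your Ops team already supports works the same way.

```python
# Expose an application-level health indicator (queue depth) so Ops dashboards
# and alerting can watch it. The metric name and port are hypothetical.
import queue
import time
from prometheus_client import Gauge, start_http_server

work_queue = queue.Queue()
queue_depth = Gauge("work_queue_depth", "Items waiting to be processed")

def report_metrics_forever(poll_seconds=5.0):
    start_http_server(9100)            # Ops scrapes http://host:9100/metrics
    while True:
        queue_depth.set(work_queue.qsize())
        time.sleep(poll_seconds)
```

The point is that the developer, not the Ops team, knows that queue depth is the early-warning signal here; publishing it is what makes the system monitor-able.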

Diagnosable

Despite our best intentions, we must humbly assume that at some point our code will crash, or seriously misbehave, and thus require troubleshooting. In the worst case, Development will be called in (usually in the wee hours of the night) to assist the Ops team. As anyone who has had to figure out why a given system intermittently crashes will attest, having log files that capture meaningful information prior to the incident is invaluable. Having to add logging statements after the fact is a painful process. Consequently, a solid logging hygiene is critical (and worthy of a dedicated post); a minimal sketch follows the checklist below:

  • Log statements must be written in a format compatible with the log management system (Splunk, GrayLog2, …)
  • All log statements used during the coding and QA phase must be removed
  • Comprehensive Operations-focused logging must be added to document all operations that may fail due to environmental and data-related problems: out-of-memory, disk full, time out, user not found, access denied, etc. These are not bugs, but failures due to either environment (e.g. a server or connection is down) or incorrect data (e.g. the user has been deleted).
  • The hierarchy of logging levels must be enforced so that in normal operations log files are kept small and, conversely, meaningful information is output when troubleshooting is required
  • Log statements must include all the information necessary to bind all operations across various services that are related to a single user-level transaction (e.g. clicking on a link to a new page, adding an item to cart) – more details below in “Tunable”.
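
Here is a minimal sketch of these guidelines using Python’s standard logging module: a structured format, enforced levels, Operations-focused messages for environmental failures, and a transaction tag on every statement. The field names and the downstream call are hypothetical.

```python
# Operations-focused logging: structured format, enforced levels, and a
# transaction ID on every statement so entries from one user-level action
# can be correlated. Field names and the downstream call are hypothetical.
import logging
import uuid

logging.basicConfig(
    level=logging.INFO,   # DEBUG statements stay silent in normal operations
    format="%(asctime)s %(levelname)s txn=%(txn)s %(message)s",
)
log = logging.getLogger("checkout")

def add_to_cart(user_id, item_id, txn_id=None):
    txn_id = txn_id or str(uuid.uuid4())        # root transaction identifier
    extra = {"txn": txn_id}
    log.debug("validating item %s", item_id, extra=extra)  # dev-time detail only
    try:
        reserve_inventory(item_id)               # hypothetical downstream call
    except TimeoutError:
        # Environmental failure, not a bug: record it for Ops, then handle it.
        log.warning("inventory timed out for item %s", item_id, extra=extra)
        raise
    log.info("user %s added item %s to cart", user_id, item_id, extra=extra)

def reserve_inventory(item_id):
    pass   # stand-in for the real service call
```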

Security

This again is worthy of its own post, but code that is deployed to Production must both support the security practices implemented by the Ops team (e.g. authentication protocols, networking infrastructure) and ensure that the code itself is secure (e.g. no SQL injection, buffer overflows, etc.).

Business Continuity

Business continuity is often overlooked, but we must ensure that any persistent data is stored in a storage system that is backed up by the Ops team. In other words, if we add a new database, we’d better ask the Ops team to add it to their backup scripts.

Similarly, if our infrastructure is deployed (or even just deployable) across multiple data centers, our code must support this through configuration.

The above represent the basic DevOps requirements that any developer must address before even thinking that their code is ready to release. The sections below detail additional practices that are highly recommended, but not strictly necessary.

Scalable

The code must be designed so that the Ops team can scale it in the datacenter without needing help from Development.

This may involve deploying the code to a bigger server. This implies that the code can be configured (and documented for the Ops team) to make use of the expanded resources, whether that is the number of cores, RAM, threads, I/O, etc.

This may also involve adding instances to a cluster. Consequently, the code must be discoverable (the load balancer must find out that a new instance has been added/subtracted), as well as cluster-aware (e.g. stateless).
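
A minimal sketch of keeping that scaling knob in configuration rather than code: the worker pool size is read from the environment (the WORKER_THREADS variable name is hypothetical), so moving to a bigger server only requires the Ops team to change a setting and restart.

```python
# Size the worker pool from configuration so Ops can exploit a larger server
# (more cores, more RAM) without a new build. WORKER_THREADS is hypothetical.
import os
from concurrent.futures import ThreadPoolExecutor

def make_worker_pool():
    workers = int(os.environ.get("WORKER_THREADS", os.cpu_count() or 4))
    return ThreadPoolExecutor(max_workers=workers)

pool = make_worker_pool()
print(list(pool.map(lambda x: x * x, range(8))))   # stateless work scales out
```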

Tunable

Because it is so hard to simulate all real-life user activities and behaviors in non-production environments, we must provide tools for the Ops team to tune the performance of our code through configuration rather than code deployment (e.g. JVM heap size, number of threads, queue sizes, hash table sizes, etc.).

We must thus provide the metrics to observe performance. Let’s take the example of response time: depending on the complexity of the application, a user request may be handled by tens, or even hundreds, of services. In order to allow the Ops team to build a timeline of the interactions between all the services involved, each log entry must carry at least one tag that identifies the root transaction that generated the request. Otherwise, it is impossible to determine whether a performance degradation comes from a given service, a specific server, or even the network infrastructure.

The same tagging will be used to troubleshoot failures (e.g. to discover why a given service fails intermittently).
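
To make the tagging concrete, here is a minimal sketch of propagating a root-transaction identifier to downstream services via an HTTP header, so that the Ops team can stitch the timeline back together from the logs. The X-Txn-Id header and the service URL are hypothetical.

```python
# Propagate a root transaction ID from the entry point to downstream services
# so every log entry along the way can be joined into one timeline.
# The X-Txn-Id header and the downstream URL are hypothetical.
import uuid
import requests

def handle_user_request(incoming_headers):
    # Reuse the caller's transaction ID, or mint one at the edge of the system.
    txn_id = incoming_headers.get("X-Txn-Id", str(uuid.uuid4()))
    print(f"txn={txn_id} frontend: handling request")        # goes to the logs
    resp = requests.get(
        "http://pricing-service.internal/quote",
        headers={"X-Txn-Id": txn_id},     # downstream logs carry the same tag
        timeout=2,
    )
    print(f"txn={txn_id} frontend: pricing returned {resp.status_code}")
    return txn_id
```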

QA-able

As I mentioned in an earlier blog, QA does not stop in QA: we have to anticipate “unknown unknowns”, i.e. usage (or performance) scenarios that we have not modeled in our QA environments. By definition, there is not much we can do other than ensure that our code is easy to troubleshoot (see above) and that logs and associated data can be made available easily and rapidly to the developers and QA team (e.g. by giving them access to the log management console).

Sometimes this requirement is more complex than it sounds, e.g. when user data must be deleted or obfuscated for privacy or security reasons. Again, this should be thought through before code is deployed.

Analytics – Growth Hacking – Usability

This last requirement stems from Marketing and Sales rather than Operations, but it is equally important since it drives revenue growth.

In most companies, marketing and sales rely on usage reports to drive new marketing campaigns, pricing, product offerings and even new features. As a consequence, any new feature must integrate with the analytics infrastructure, whether by integrating with usage-tracking applications (e.g. Mixpanel, Flurry, …) or simply through log management consoles (Splunk, GrayLog2, …). However, I highly recommend using separate logging infrastructure for operations monitoring and for usage analytics, if only because usage analytics requires additional data that is not useful for operations monitoring (e.g. the time a user spends on a page is extremely valuable for usage analytics but irrelevant for Operations).

Even More So for Microservices

As we migrate towards a microservices architecture, early “DevOps thinking” becomes even more critical. As “Microservices: Four Essential Checklists when Getting Started” advises: “Microservices introduces a lot of moving parts that were previously non-existent in a monolithic system”.

What was a monolithic application running in a single virtual machine can morph into 5, 10 or even 20 microservices. Consequently, Development, DevOps and Ops must collaborate on microservices infrastructure tools – service registration, scaling each service up or down independently, health monitoring, error detection, etc. – to provide visibility on the status of these 20 microservices as a whole. This challenge has even prompted dedicated product categories (SignalFx, Nirmata, etc.).

Summary

Only with a holistic approach to product architecture can we ensure customer satisfaction with software that works the first time, and all the time. Deployment and operations management concerns, just like testability, must be addressed at design time, so that these capabilities are meshed natively into the code rather than “bolted on” after the fact. Failing to do so will likely impact the delivery schedule, or worse, create outages in production.

More importantly, there is so much we can learn from observing how our code behaves in Production – operational efficiency, stability, performance, usability – that we would do ourselves a disservice if we did not avail ourselves of this valuable information to drive further improvements to our product.

Scalable Software Architecture for a Startup

Say we are the founders of a startup and we just got a big fat check for our A-round funding. The VCs love our idea, and we all know that our app will attract millions of users in no time. This means that from day one we architect for millions of page-views per day…

But wait … do we really need to deploy Hadoop now? Do we need to design for geographical redundancy now? Or should we just build something that’s going to take us through the next 3 months, so that we can focus our energy on customer development and fine-tuning our product features? …

This is a dilemma that most startups face.

Architecting for Scale

The main argument for architecting for scale from the get-go is akin to “do it right the first time”: we know that lots of users will be using our app, so we want to be ready when they come, and we certainly don’t want the site going down just as our product catches fire.

In addition, for those of us who have been through the pain of a complete rewrite, a rewrite is something we want to avoid at all costs: it is a complex task that is fun under the right circumstances, but very painful under time pressure, e.g. when the current version of the product is breaking under load and we risk turning away customers, potentially forever.

On a more modest level, working on big complex problems keeps the engineering team motivated, and working on bleeding or leading edge technology makes it easier to attract talent.

Keeping It Simple

On the other hand, keeping the technology as simple as possible allows the engineering team to be responsive to the product team during the customer development phase. If you believe, as I do, in one of Steve Blank’s principles of customer development – “No Business Plan Survives First Contact with Customers” – then you need to prepare for its corollary: “no initial product roadmap survives first contact with customers”. Said differently, attempting to optimize the product for scale before the company has clearly validated its business assumptions, and its product roadmap, is premature.

Instead, the most important qualities needed from the engineering team in the early stages of the company are velocity and adaptability: velocity, in order to reduce time-to-market, and adaptability, so that the team can rapidly respond to feedback from “outside the building”.

Spending time designing and implementing a scalable architecture is time not spent responding to customer needs. Similarly, having built a complex system makes it more difficult to adapt to changes.

Worst of all, the investment in early optimization may be all for naught: as the product evolves with customer feedback, so do the scalability constraints.

Case Study: Cloudtalk

I lived through such an example at Cloudtalk. Cloudtalk is designed as a social communication platform with an emphasis on voice. The first two products, “Cloudtalk” and “Let’s Talk”, are mobile apps that implement various flavors of group messaging with voice (as well as text and other media). Predicting rapid success, Cloudtalk was designed around the highly scalable NoSQL database Cassandra.

I came on board to launch “Just Sayin”, another mobile app that runs on the same backend (a very astute design). Just Sayin is targeted at celebrities and allows them to cross-post voice messages to Twitter and Facebook. One of my initial tasks was to scale the app, and it was suggested that we needed to move it to Amazon Web Services so that we could scale rapidly as more celebrities (such as Ricky Gervais) adopted our product. However, a quick analysis revealed that, unlike the first two products (Let’s Talk and Cloudtalk), Just Sayin’s impact on the database was relatively light, because communications were one-to-many (e.g. Lady Gaga to her 10M fans). Rather, in order to scale, we first needed a Content Delivery Network (CDN) so that we could deliver the celebrities’ messages to millions of fans with low response times.

Furthermore, while Cassandra is a great product, it was somewhat immature at the time (stability, management tools) and consequently slowed down our development. It also took us a long time to train new engineers.

While Cassandra would have been a good choice in the long run, we would have been better served in the formative stages of the company by more established technology like MySQL. Our velocity in developing new features, and our ability to respond to changes in product strategy, would have been significantly greater.

Architecting for Scale is a Process, not an Event

A startup needs to earn the right to design for scale, by first proving that it has found a legitimate market. During this first phase adaptability and velocity are its most important attributes.

This being said, we also need to anticipate that we will need to scale the system at some point. Here is how I like to approach the problem:

  • First of all, scaling is an on-going process. Even if traffic increases dramatically over a short period of time, not all parts of the system need to be scaled at the same time. Yet, as usage increases, it is likely that, at any point in time, some part of the system will need to be scaled.
  • In order to avoid complete rewrites of the system, we need to break it into independent components. This allows us to redesign each component independently and have different teams work on different problems concurrently. As a consequence, good modularization of the system is much more important early on than designing for scale.
  • Every release cycle needs to budget time and resources for redesign – including both modularization and scalability. This is just like maintenance on the Golden Gate Bridge: the painters are always working; when they finish at one end, they start all over at the other end.
  • We need to treat our software architecture the same way, and budget maintenance work every release cycle: dollars, time, people. CEOs have to be trained to think not only about the “shiny features” – those that are customer-facing – but also about the “continuous improvement” of the architecture, which has to be factored into every release cycle.
  • We also need to instrument the code to tell us where it is under strain. Unlike the Golden Gate Bridge, we can’t always see where it’s breaking, or even rationalize it; scaling sometimes works in mysterious ways that are not obvious to predict. A minimal instrumentation sketch follows this list.
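
A minimal sketch of such instrumentation: a timing decorator that flags calls exceeding a latency budget, so production logs show which module is under strain before it actually breaks. The threshold and logger name are hypothetical.

```python
# Flag slow calls in production so we can see where the system is under strain.
# The threshold and logger name are hypothetical.
import functools
import logging
import time

log = logging.getLogger("perf")

def timed(threshold_seconds=0.5):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed > threshold_seconds:
                    log.warning("%s took %.3fs", func.__name__, elapsed)
        return wrapper
    return decorator

@timed(threshold_seconds=0.2)
def render_dashboard():
    time.sleep(0.05)   # stand-in for real work
```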


In summary, designing for scale is a high-class problem that we only get to work on once we have demonstrated true demand for our product. During this first phase, velocity and adaptability are critical, and they are better served by well-understood technologies and a well-modularized design. Once our product reaches the adoption phase, designing for scale becomes a continuous process that can hopefully focus on individual modules in turn, guided by proper instrumentation of the code.