The Art Of Technical Due Diligence

Previously published in Forbes on April 11, 2019

Technical due diligence (TDD) takes place once an investor (such as a venture capitalist, private equity manager or another company) has decided to invest in, or acquire, a technology company. Once they make this decision, they have limited time to dig into the company in order to ensure that its technology, its engineering team and its development velocity are as advertised.

As someone who has experienced this process from both sides — and whose company provides it — I understand the stress TDD sometimes causes. That’s because a poorly executed technical due diligence initiative can derail a deal and hurt the bottom line for investors — as well as the management team of the company receiving the investment — so it is worth understanding how to do it right as a CTO.

TDD Isn’t A Beauty Contest

After participating in dozens of TDD projects, I’ve learned that there are many ways to solve a technical challenge — including using frameworks or programming languages that I personally wouldn’t touch. It is thus critical to put aside one’s own ideas of technical purity and “the right way of doing things” during TDD and to have an open mind about how technology can be used. (You could even learn something along the way.)

Furthermore, it’s important to be clear about the purpose of TDD. If they’ve already made the decision to invest in a company, investors should assume that its technology is good enough today. What you want to know instead is whether the company can execute its business plan. A technical review should thus stay away from judging the beauty of today’s product architecture and take a more dynamic view to validate whether the technical team can deliver the future that the company has drawn for itself.

This dynamic perspective is all the more important because, for companies lucky enough to grow at a fast pace, life is messy. Code architecture is constantly evolving, and documentation is often incomplete and out of date. Concurrently, penetrating new markets and creating major new features often requires introducing novel technology or considerable re-architecting. This apparent chaos is reined in through major core projects that temporarily do not produce user-facing features yet allow the engineering team to maintain velocity in the long run. A wise TDD will use these projects to distinguish between the “normal growth-driven chaos” and signs of any additional structure the company may need to reach a new stage of growth. An investor should expect one, two or even more of these fundamental projects in a two-year timeframe.

It’s All About The Product Road Map

The product road map is the engineering team’s commitment to the company to deliver specific features, products and capabilities on a given schedule. The management team, in turn, makes revenue projections based on the availability of these new features. Consequently, delays in the product road map can have a direct impact on the company’s revenue stream — and thus its valuation.

Beyond giving the product road map a simple thumbs up or down, your technical due diligence should provide actionable information about the upcoming 24 months, including critical dependencies, risk factors and major technical milestones that will usher in product milestones. As a TDD assessor, you should gather this information to track the success of your investment over the short- to mid-term.

In order to evaluate what the technical team must accomplish in order to execute the product road map, you should:

• Capture the business context of the road map

• Understand the business objectives for the next two years or more

• Evaluate today’s technical foundation to appreciate whether it can support future plans

• Internalize the future plans

• Evaluate the team’s ability to deliver these plans — and to mitigate risks

Understanding The Business Context Is Critical

Technology serves the business. It follows that you should assess technology in the business context of the company: Consider market (consumer, enterprise, or government), space (finance, health, social, tools and so on) and company maturity (five versus 10,000 enterprise seats and 1,000 versus 1 million daily users) as a few obvious dimensions. “Scalability” or “security” have very different meanings depending on the company’s business context — and so do the solutions. Similarly, you should evaluate talent, processes, tools and operational playbooks differently based on the business context.

Skills And Experience Matter

Nowadays, many companies use multiple technology stacks. As a consequence, if you’re a CTO performing TDD, you should be “multilingual” so you can evaluate all components of the technology.

To assess development velocity, your investigation should also show how well the code is written and organized and include an evaluation of the tools for test automation, continuous integration/continuous deployment, data center deployment, monitoring, alerting, business intelligence, data science and so on. In addition, assessing a company’s specific expertise in artificial intelligence has become a must in many industries.

As if all that was not enough, TDD assessors should understand engineers as well. It is critical to assess individual and collective talent on the team, as well as organizational dynamics and methodology.

Finally, because so many of the risks and critical milestones can depend on the maturity of the company, one of the most important skills that you can bring as a CTO performing TDD is the ability to identify the inflection points in the company’s growth, assess the impact on technology and translate insights to the technology team: For example, what new technology requirements will you have when the company has reached product-market-fit and enters the growth stage? For this work, there’s no substitute for “I’ve been there.”

The Good News Is Also Important

In parallel to identifying what could go wrong, it is critical to highlight the company’s unique strengths. This starts with its intellectual property (whether it’s patentable or not) and includes unique sources of talent, internally developed tools and methodologies that increase development velocity and difficult-to-recreate data sets … all of which may have been overlooked by non-technologically-inclined investors. Ultimately, it is the balance of a company’s unique strengths and weaknesses that will determine its success, and a good due diligence report will highlight that.

For a company seeking investment, TDD may seem like an unnecessary hurdle; however, when it’s properly conducted, TDD adds value and insight for both the investor and the startup.

How Machine Learning Will Disrupt The Established Cloud Providers

Previously published in Forbes on October 24, 2017

In the past few years, new categories of products have emerged thanks to the extraordinary advances in machine learning (ML) and deep learning (DL). These new techniques power product recommendations, computer-aided diagnosis in medical imaging and self-driving cars, just to name a few.

Most ML and DL algorithms require compute profiles (hardware, software, storage, networking) that are significantly different from those optimized for traditional applications. Consequently, as more and more companies develop their own ML/DL solutions and deploy them to production, the demand for the ML-optimized compute resources will grow dramatically and create opportunities for new entrants to offer solutions that compete with today’s dominant cloud providers: Amazon AWS, Microsoft Azure and Google Cloud.

The ML/DL Cloud Is Different

In an article on Mesosphere’s blog page, Edward Hsu presented the case that web applications are now primarily data-driven. Consequently, a new set of frameworks (a.k.a. stacks), namely SMACK (Spark, Mesos, Akka, Cassandra, Kafka), must replace the traditional LAMP (Linux, Apache, MySQL, PHP) stack used to build web-based applications. In my view, rather than replacing LAMP, SMACK will coexist side by side with, and feed data to, traditional web-based frameworks, which are still needed to present nice-looking webpages and interface with mobile phones.
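
To make the “SMACK feeds data to the web tier” idea concrete, here is a minimal sketch of my own (not from Hsu’s article): application events are published to Kafka, and a small consumer aggregates them into counts that a LAMP-style page could simply read and render. It assumes the kafka-python package and a local broker; the topic and field names are invented for the example.

```python
# Illustrative sketch only: a Kafka-based data pipeline feeding a web tier.
# Assumes a broker on localhost:9092 and the kafka-python package.
import json
from collections import Counter

from kafka import KafkaProducer, KafkaConsumer

TOPIC = "page_views"  # hypothetical topic name

# Producer side: the application emits raw events as they happen.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"user": "u123", "page": "/pricing"})
producer.flush()

# Consumer side: a separate process aggregates events into something the
# web front end can display (e.g., most-viewed pages).
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop polling after 5s of silence, for the demo
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
views_per_page = Counter(msg.value["page"] for msg in consumer)
print(views_per_page.most_common(10))
```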

Yet the main point is well-taken. We need to update Marc Andreessen’s famous line about how “Software is eating the world” to “Data is eating the world.” Let’s unpack this statement and derive the consequences.

Hardware

The disruption created by machine learning and deep learning extends well beyond the software stack into chips, servers and cloud providers. This disruption is rooted in the simple fact that GPUs are much more efficient processors for ML and DL than traditional CPUs.

Up until recently, the solution was to augment traditional servers with GPU add-on cards. We are now at a point where demand for ML/DL computing is such that special-purpose servers, optimized for ML/DL compute loads, are being built.

Data centers are also being re-architected to support the extremely large amount of data consumed by ML and DL. Imagine you are designing the brains for self-driving cars. You need to process thousands and thousands of hours of video (and other such signals as GPS, gyroscopes, LIDAR) to train your algorithms. The amount of data that a Tesla on the road records in one second is a million times larger than a tweet or a post on Facebook.

ML/DL data centers thus require both huge amounts of storage and extremely high bandwidth.

Software

The software side is even more complex. A new infrastructure stack, typically using machine learning-specific frameworks such as TensorFlow (originally developed by Google) or PyTorch (originally developed at Facebook), is required to shepherd data around and manage the execution of the compute jobs. Furthermore, open-source code libraries (pandas, scikit-learn, matplotlib) are used to implement the models (e.g., neural networks, data displays). These model libraries are critical because they are optimized both for ease of use in algorithm research and for high performance in production.
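
As a minimal sketch of what this “framework plus model library” layer looks like in practice, here is a tiny PyTorch network trained on synthetic data. This is my own illustration of the idea, not code from any vendor discussed here; the shapes and hyperparameters are arbitrary.

```python
# Illustrative sketch: a small PyTorch model, the kind of code these
# frameworks make easy to write for research and to run in production.
import torch
import torch.nn as nn

# Synthetic dataset: 256 samples, 20 features, binary labels.
X = torch.randn(256, 20)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)

# A small feed-forward model.
model = nn.Sequential(
    nn.Linear(20, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.3f}")
# The same loop runs on a GPU by moving the model and tensors with
# .to("cuda"), which is exactly why ML-optimized hardware matters.
```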

Finally, each vendor offers complete building blocks for specific use cases. For example, Amazon Lex, Google Cloud Speech and Microsoft Bing Speech provide speech recognition and can even recognize intent. Each has its own API and unique behavior, making the migration from one vendor to the other time-consuming.

New Entrants

In addition to the Big Three cloud providers (Amazon AWS, Microsoft Azure and Google Cloud) that have offered GPU-accelerated instances for a few years, new ML-optimized offerings have emerged:

• NVIDIA, which is already the dominant provider of GPUs that power the graphics cards that drive computer displays, recently introduced a portfolio of “purpose-built AI supercomputers” known as its DGX systems.

• Servers.com offers its Prisma Cloud with dedicated GPU-optimized servers.

• Rescale, one of the niche cloud providers that focuses on high-performance computing (HPC), just announced the availability of the latest generation of GPU-powered servers, along with high-bandwidth interconnect, to create high-performance multi-node clusters.

What’s At Stake

The Big Three cloud providers are the ones most immediately at risk to be disrupted by new entrants such as NVIDIA, Servers.com and Rescale. ML/DL innovation is still running at a torrid pace thanks to innovation in algorithms as well as compute efficiency. This is creating a small arms race where end users are constantly looking for the provider that can give that extra edge.

On one hand, end users are benefiting hugely from this arms race to provide the best software and hardware compute environment. On the other, this requires constant vigilance to keep abreast of the latest offerings. Even more importantly, when deploying ML/DL products to production, CEOs and CTOs need to pick the winner — or at least a future survivor — that will keep their edge for the next two to five years. This is not an easy task.

We will delve deeper into these two topics in future posts — stay tuned.

The Machine Learning Imperative

Previously published in Forbes on June 28, 2017

There’s no longer a debate as to whether companies should invest in machine learning (ML); rather, the question is, “Do you have a valid reason not to invest in ML now?”

Machine learning is here, and it’s finally mature enough to cause a major seismic shift in virtually every industry. For example, Matt Swanson, founder of SVSG, wrote an article last year about how chatbots will disrupt a $200 billion industry. While ML cannot solve every problem, it has demonstrated a game-changing impact in enough markets that every CEO and CTO must ask themselves whether they understand ML well enough to rule it out for their own business. While appreciating the rewards of ML may be difficult, we do know the risks: ML has already disrupted several industries, including e-commerce, autonomous driving and customer engagement. The risk of ignoring ML today is one that is probably too large for any established company to take.

Machine Learning Changes The Game

While artificial intelligence grabs most of the spotlight in discussions about machine learning (primarily due to its easily graspable life-altering implications), it is but one of many disciplines in ML. Big data has demonstrated the enormous value of data: Netflix and Amazon recommend films and products based on our own purchase history and those of customers like us. Thus, big data has helped us answer questions we already knew to ask, questions such as, “What more can I sell to my customers?”

Machine learning allows us to make even better use of the data we have, as well as the data we don’t currently possess, and answer the questions we didn’t know we should ask.

Machine Learning Uses Data We Don’t Yet Have

Analytics and business intelligence extract information from structured data (i.e., data stored in databases: customer information, purchase history, etc.). But thanks to ML, we can now extract information from unstructured data such as texts, phone calls, images and videos.

Search engines used to return pages based on the exact words of the query. ML takes this text analysis a few steps further. First, it extracts concepts out of words and associates pages that discuss the same concept with different words: A search for “artificial intelligence” will produce results that mention machine learning and robotics but not explicitly the words “artificial intelligence.” Beyond this, ML is now becoming proficient at sentiment analysis and determining intent in a given context. This means that ML can deduce, via our posts on social media, whether we are happy or angry (sentiment analysis), for whom we are likely to vote, or what purchase we are considering next (intent).
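
To make the sentiment-analysis idea tangible, here is a minimal sketch using a scikit-learn text pipeline. It is an illustrative toy of my own (a tiny hand-made dataset, no tuning), not a production model and not taken from the article.

```python
# Illustrative sketch: learning positive vs. negative sentiment from text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled posts: 1 = positive sentiment, 0 = negative sentiment.
posts = [
    "I love this product, it works great",
    "Fantastic support, very happy",
    "Terrible experience, totally disappointed",
    "This is the worst update ever",
]
labels = [1, 1, 0, 0]

# TF-IDF turns raw text into numeric features; logistic regression learns
# which words correlate with each sentiment.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(posts, labels)

print(classifier.predict(["really happy with the new release",
                          "disappointed, it keeps crashing"]))
```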

Similarly, ML techniques like natural language processing (NLP) and image categorization interpret and translate people’s speech as well as the content of images (e.g., facial recognition on Facebook).

This means that, thanks to ML, the huge amount of publicly available content — which, up until recently, was of little use — can now give us useful new insights.

Machine Learning Makes Better Use Of The Data We Have

Machine learning provides a new class of algorithms that manipulates structured data that we already possess. AWS has a nice blog, including code, on how to build a prediction engine for customer churn. BlackRock is using machines to manage funds.
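
In the spirit of the churn example mentioned above (though not taken from the AWS post), here is a minimal sketch of a churn-prediction engine over structured data. The features (tenure, monthly spend, support tickets) and the synthetic labels are made up purely for illustration.

```python
# Illustrative sketch: predicting customer churn from structured data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
tenure_months = rng.integers(1, 60, n)
monthly_spend = rng.uniform(10, 200, n)
support_tickets = rng.poisson(2, n)

# Synthetic ground truth: short-tenure customers with many tickets churn more.
churned = ((support_tickets > 3) & (tenure_months < 12)).astype(int)

X = np.column_stack([tenure_months, monthly_spend, support_tickets])
X_train, X_test, y_train, y_test = train_test_split(X, churned, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# In production, the same model would score live customers (predict_proba)
# so the retention team can intervene before they leave.
```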

In addition, data that every company gathers from its customers (emails, chats, comments, support requests, etc.) can now be analyzed by ML to extract accurate customer sentiment (satisfaction with the service, suggestions, identifying emergency requests). Even polls and surveys may be replaced by ML algorithms that can mine Facebook, Twitter and news sites to capture the sentiment of millions of people expressing themselves openly.

Machine Learning Answers Questions We Didn’t Know To Ask

At the risk of stating the obvious, the power of machine learning is that it learns. The more information provided, the faster it learns and the better it answers.

While traditional business intelligence techniques can tell us how often products A and B are purchased together, these techniques fail in the face of a massive organization such as Amazon, which sells over 368 million products. However, ML can digest the flow of purchase transactions and identify patterns of joint purchases. ML can even use these predictions to automatically make purchase decisions (see German e-commerce merchant Otto as an example).
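
As a minimal illustration of the “products bought together” idea (my own sketch, far simpler than what Amazon or Otto run in production), one can count pairwise co-occurrences across purchase transactions:

```python
# Illustrative sketch: counting which products are bought together.
from collections import Counter
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
]

pair_counts = Counter()
for basket in transactions:
    # Count every unordered pair that appears in the same basket.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pairs are candidates for joint recommendations.
print(pair_counts.most_common(3))
```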

Furthermore, by leveraging data we don’t have — such as stock market indices, weather data, political news and government statistics — we can correlate external events with our business data and thus enrich the accuracy of our predictions and decisions.

Why Now?

The rapid growth of machine learning leads to uncertainty, which may entice business leaders to hesitate in utilizing it. Yes, machine learning is complex, but it is also a powerful force of disruption. Because ML is still developing, it presents an opportunity to pull ahead of the competition by taking advantage of this maturation period. The choice is simple: disrupt or be disrupted.

It will take some time to ascertain what use cases are relevant to your company, so it is important to start this investigation now. ML is complex and challenging to master, yet the tools for machine learning are all readily available to you and are already being employed by Amazon, Google and Microsoft.

The journey to machine learning must start now.

Time-Tested Engineering Leadership Principles

I put together the first three of these four leadership principles during my first VP of Engineering gig, twenty years ago. Thirteen companies later, and having shared them with hundreds of engineers, I feel it is time to share the secret 🙂

These leadership principles have been honed (a) for Engineers and (b) in the context of startups, typically with fewer than 150 employees. No claim is being made outside of these parameters.

1.   I commit to give you more responsibilities than you can handle … and help you succeed

The vast majority of Engineers are highly motivated (see my previous blog, “(Boosting) Morale in Engineering”). They are motivated by their career, naturally, yet they are primarily driven by a need to accomplish and an intense desire to learn.

Another way of articulating this commitment is: “I am going to challenge you, and let you work as hard as you want, and exercise as many of your skills as possible”. Engineers hate being bored; on the contrary, they work extra hard when challenged. So my job is to continuously provide new challenges to each engineer in my team, and remove any impediments to their desire to fulfill these challenges.

2.   I commit to give you clarity, both strategic & tactical

I work hard to ensure that everyone knows where we, as a company and as an Engineering team, are going, what our objectives are (strategic), and how we plan to get there (tactical).

In practice, I make sure, during our periodic 1-on-1s, that each engineer understands how his/her own project and role align with the company mission and Engineering’s product roadmap.

Included in this commitment is a promise to each member of the Engineering team that on any given day, his/her #1 priority is clear. As a logical consequence, this implies that each engineer has only one #1 priority (I have seen a lot of companies where this logic is violated). Their manager, or I as a last resort, will handle situations where, for example, three VPs are breathing down an engineer’s neck, each with their own “top priority”.

Having everyone in the team understand and share the same strategic context empowers developers to make correct micro-decisions every day. As a side benefit, this frees me and their managers to work on bigger problems.

 

Taking a step back, if I have correctly communicated commitments 1 and 2, then everyone in the team is working at the maximum of their ability, and all are working in the same direction. This is a good foundation for solid productivity.

Having made two commitments to everyone in the team, I ask for two in return.

3.   In return, I demand teamwork & 3-D communications

I put teamwork and communications in the same sentence because one is meaningless without the other. Teamwork can’t exist without meaningful communications, and if we communicate but don’t work together, we don’t go very far.

No interview question will ever suss out whether a candidate is a team player or not. Instead, I explicitly declare that they should not join my team if they are not a team player.

Teamwork is important because product development is a team effort. Every engineer interacts with product managers, UX designers, front-end engineers, middle-tier, backend, data, QA, tech support, etc. Poor interactions with other team members result in poor individual efficiency.

Teamwork means that “together, we succeed”. Teamwork is not merely about helping out a teammate who needs help. More importantly, being a team player means asking for help when we need it, so as not to delay the whole team.

3-D communications simply expands the definition of “team” beyond one’s daily scrum. We are all inter-dependent, and we each must ensure that information gets to the people who need it, no matter where their name sits in the org chart. Making sure information is received in a timely fashion, rather than waiting for questions to be asked, is incumbent upon each of us.

In particular, this means that everyone on my team has the responsibility to inform me if I am not meeting commitments #1 and #2 stated above. I don’t read minds, and I can only take corrective actions if someone lets me know that they are bored, confused, pulled in too many directions, or under-utilized, etc.

4.   At the end of the day, we need to be proud of our work

I added this fourth principle a few years later. I had been working at a company for about a year and had delivered a handful of successful releases, yet I sensed burnout and a loss of creativity in the team.

A startup demands almost contradictory qualities from its Engineering team: speed and creativity (quality is a given). Because the demand for speed is often explicit, while the demand for creativity is often implicit, it is easy to fall into the trap of focusing only on execution to the detriment of innovation, or even the beauty of the code.

Yet, if we continuously succumb to the mantra of “ship, ship, ship”, and give up trying to build something cool, then we start on a slippery downward slope towards creating “blah” products. There are always pressures to ship more features faster, but if each of us is not proud of the product we are releasing to our customers then our customers won’t be excited about the product, and we won’t be having fun at work. Life is too short for us to accept either of these issues.

Making It All Work

There is nothing new, or magic, about these four leadership practices. The magic is in their daily practice. They work for me because I force myself to apply them on a daily basis, and I remind my teammates of their existence, their rationale and their own commitments, whether I am welcoming a new member, holding a 1-on-1, running my weekly staff meeting, giving exec staff or monthly Engineering updates, or chatting at the water cooler.

Scalable Software Architecture for a Startup

Say we are the founders of a startup and we just got a big fat check for our A-round funding. The VCs love our idea, and we all know that our app will attract millions of users in no time. This means that from day one we architect for millions of page-views per day…

But wait … do we really need to deploy Hadoop now? Do we need to design for geographical redundancy now? OR should we just build something that’s going to take us through the next 3 months, so that we can focus our energy on customer development and fine-tuning our product features? …

This is a dilemma that most startups face.

Architecting for Scale

The main argument for architecting for scale from the get-go is akin to: “do it right the first time”: we know that lots of users will be using our app, so we want to be ready when they come, and we certainly don’t want the site going down just as our product catches fire.

In addition, for those of us who have been through the pain of a complete rewrite, a rewrite is something we want to avoid at all costs: it is a complex task that is fun under the right circumstances, but very painful under time pressure, e.g. when the current version of the product is breaking under load and we risk turning away customers, potentially forever.

On a more modest level, working on big complex problems keeps the engineering team motivated, and working on bleeding or leading edge technology makes it easier to attract talent.

Keeping It Simple

On the other hand, keeping the technology as simple as possible allows the engineering team to be responsive to the product team during the customer development phase. If you believe, as I do, one of Steve Blank’s principles of customer development, “No Business Plan Survives First Contact with Customers”, then you need to prepare for its corollary, namely: “no initial product roadmap survives first contact with customers”. Said differently, attempting to optimize the product for scale before the company has clearly validated its business assumptions and product roadmap is premature.

On the contrary, the most important qualities that are needed from the Engineering team in the early stages of the company are velocity and adaptability. Velocity, in order to reduce time-to-market, and adaptability, so that the team can rapidly adapt to feedback from “outside the building”.

Spending time designing and implementing a scalable architecture is time that is not spent responding to customer needs. Similarly, having built a complex system makes it more difficult to adapt to changes.

Worst of all, the investment in early optimization may be all for naught: as the product evolves with customer feedback, so do the scalability constraints.

Case Study: Cloudtalk

I lived through such an example at Cloudtalk. Cloudtalk is designed as a social communication platform with an emphasis on voice. The first two products, “Cloudtalk” and “Let’s Talk”, are mobile apps that implement various flavors of group messaging with voice (as well as text and other media). Predicting rapid success, Cloudtalk was designed around the highly scalable NoSQL database Cassandra.

I came on board to launch “Just Sayin”, another mobile app that runs on the same backend (a very astute design). Just Sayin is targeted at celebrities and allows them to cross-post voice messages to Twitter and Facebook. One of my initial tasks coming on board was to scale the app, and it was suggested that we needed to move it to Amazon Web Services so that we could scale rapidly as more celebrities (such as Ricky Gervais) adopted our product. However, a quick analysis revealed that, unlike the first two products (Let’s Talk and Cloudtalk), Just Sayin’s impact on the database was relatively light, because communications were 1-to-many (e.g. Lady Gaga to her 10M fans). Rather, in order to scale, we first needed a Content Delivery Network (CDN) so that we could feed the millions of fans the messages from their celebrities with low response time.

Furthermore, while Cassandra is a great product, it was somewhat immature at the time (stability, management tools) and consequently slowed down our development. It also took us a long time to train new engineers.

While Cassandra would have been a good choice in the long run, we would have been better served in the formative stages of the company by more established technology like MySQL. Our velocity in developing new features and our ability to respond to changes in product strategy would have been significantly greater.

Architecting for Scale is a Process, not an Event

A startup needs to earn the right to design for scale, by first proving that it has found a legitimate market. During this first phase, adaptability and velocity are its most important attributes.

This being said, we also need to anticipate that we will need to scale the system at some point. Here is how I like to approach the problem:

  • First of all, scaling is an on-going process. Even if traffic increases dramatically over a short period of time, not all parts of the system need to be scaled at the same time. Yet, as usage increases, it is likely that, at any point in time, some part of the system will need to be scaled.
  • In order to avoid complete rewrites of the system, we need to break it into independent components. This allows us to redesign each component independently, and have different teams work on different problems concurrently. As a consequence, good modularization of the system is much more important early on than designing for scale.
  • Every release cycle needs to budget time and resources for redesign – including both modularization and scalability. This is just like maintenance on the Golden Gate bridge: the painters are always working; when they finish at one end, they start all over at the other end.
  • We need to treat our software architecture the same way, and budget maintenance work every release cycle: dollars, time, people. CEOs have to be trained to think not only about the “shiny features” – those that are customer-facing – but also about the “continuous improvements” of the architecture that have to be factored into every release cycle.
  • We also need to instrument the code to tell us where it is under strain (see the sketch after this list). Unlike the Golden Gate bridge, we can’t always see where it’s breaking, or even rationalize it. Scaling sometimes works in mysterious ways that are not always obvious to predict.
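
Here is a minimal sketch of the kind of instrumentation meant above: a decorator that records how long each call takes so we can see which parts of the system come under strain as usage grows. This is an illustrative approach of my own, not a specific tool; in practice these numbers would be exported to a metrics system rather than kept in memory.

```python
# Illustrative sketch: timing instrumentation to reveal where the code strains.
import time
from collections import defaultdict
from functools import wraps

call_timings = defaultdict(list)  # function name -> list of durations (seconds)

def instrumented(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            call_timings[func.__name__].append(time.perf_counter() - start)
    return wrapper

@instrumented
def load_user_feed(user_id):
    time.sleep(0.05)  # stand-in for a database or service call
    return ["item1", "item2"]

for uid in range(20):
    load_user_feed(uid)

for name, durations in call_timings.items():
    print(f"{name}: {len(durations)} calls, "
          f"avg {sum(durations) / len(durations) * 1000:.1f} ms")
```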

 

In summary, designing for scale is a high-class problem, one we only get to work on once we have demonstrated true demand for our product. During this first phase, velocity and adaptability are critical, and they are better served by well-understood technologies and a well-modularized design. Once our product reaches an adoption phase, designing for scale becomes a continuous process that hopefully can be focused on individual modules in turn, guided by proper instrumentation of the code.

 

New Year’s One Wish – and One Resolution

New Year’s One Wish: Specs on Time
New Year’s One Resolution: Fostering More Innovation

It is that time of year again … New Year’s resolutions. Well, I won’t talk – in this blog – about staying fit or a better work-life balance, but rather I’ll keep it focused on software engineering. I will also keep it simple and limit it to one wish: specs on time — and one resolution: fostering more innovation.

New Year’s One Wish: Specs on Time

If I could wave a magic wand and change just one thing in my professional life, it would be this: Engineering receives specs 8 weeks prior to the official start of the release.

Why?

As obvious as the answer is, in general the Business team (product management and the execs who are involved in the product roadmap) doesn’t seem to grasp that if Engineering gets incomplete specs when we are supposed to start coding, well … we can’t start coding – which means we will not be able to deliver as many features as we could have.

For example, let’s say that one release finishes July 31, and that – as is typically the case – we only start discussing the December release on August 1. The result is that the specs do not get finalized until the end of August. Thus, rather than having the expected five months to implement the new release, we have four. Actually, we have much less …

Why “8 weeks prior”?

A lot of work must take place between the time Engineering receives the spec, and the time all developers can start coding in earnest:

  • Scope the deliverables: to figure out what subset of the proposed list of features submitted by PM can actually be delivered in the imposed timeframe
  • Finalize the exact deliverables: Based on the “cost” (scope) provided by Engineering, the PM team determines the final list of deliverables for the release
  • Architecture and design: some features require thinking before coding 🙂
  • Technology analysis: evaluate and validate any new third-party tools, libraries, or packages
  • Task breakdown and allocation to individual developers

Only when all these tasks have been completed – at least on the most important features – can development start in earnest.

8 weeks seems like a long time, and there is obviously flexibility in the number. The newer the functionality, the more uncertainty on the scope, and the more research required, the more lead time is needed by Engineering to plan out the release.

A good rule of thumb: the point at which the previous release is feature-complete is when PM and Engineering need to start planning the next release, specs in hand.

What Spec?

… Enough to be able to scope, and plan

In order for Engineering to commit to deliver certain features by a certain date, Engineering needs enough information about these features to be able to scope them – in other words, something more substantial than a handful of bullets on a PowerPoint slide, or a wiki.

There are many ways to write up requirements, and in the spirit of Agile Development, we don’t really want, nor need, everything written down upfront. On the other hand, Engineering needs a modicum of clarity and specificity about what needs to be built. This can be delivered via use cases, UI mock-ups, flow charts or feature lists, as long as the Engineering team can appreciate the scope of work: e.g., is this a feature enhancement or brand-new functionality? Is new technology required? Are there performance challenges? Are there new partners or systems with whom we need to interface? Etc.

New Year’s One Resolution: Fostering More Innovation

The day-to-day, week-to-week, month-to-month pressure on the Engineering team is “to deliver on time, on quality and on budget.” The vast majority, not to say all, of these demands are short-term and reactionary: requests from customers, or responses to competitive pressures.

However, the Engineering team also has the responsibility to innovate: to add to our products something that nobody else has thought of. If we don’t, then within a couple of years, a competitor, or a new startup, will, forcing us to react and catch up.

So, how does one drive innovation in an Engineering team, when there is never enough time to do the things that the business team requires? The answer is that we have to make, sometimes steal, the time, and we have to be efficient – both are admittedly easier blogged about than done.  Please send me your suggestions!

Making the Time to Innovate

Few companies are as profitable as Google and can afford to grant a blanket 20% of their time to employees to do self-directed research. On the other hand, in a world where technologies become obsolete within a couple of years, it is suicidal not to heed Stephen Covey’s 7th Habit: “Sharpen the Saw”. This applies to each of us as individuals, but also as a team.

In practice, this means each engineer must, on his/her own, make time to research new technologies and approaches. It also means that as a team, we need to organize time to discuss promising technologies.

The best approach that I have found is “Tech Talks” where members of the Engineering team present their recent work; articles, books or blogs they have read; or simply a new technique that they have invented.

Directed Innovation

“Directed innovation” is admittedly a contradiction in terms. Yet a small company cannot afford to disperse its resources. It is thus incumbent on the leadership to give the Engineering team the right context in which to explore: why do our customers buy our product, and what will make a true difference for our business (performance, scalability, usability, reliability, etc.)?

With this context, we can direct our investigations and leverage our efforts. Again, teamwork and group discussions usually accelerate the growth of ideas.

Getting a startup to articulate its vision and strategy, and particularly how these translate into technology needs, has traditionally been a challenge in most of the startups where I have worked. The mode of operation has typically been more reactive – following the orders that will make the quarter …

This is exactly why fostering more innovation is my one and only New Year’s resolution.

Planning – and Executing the Plan – are Part of the Job

Along with writing good code, planning and meeting the plan are part of an engineer’s responsibilities, in order for the product to be successful and the business to thrive

Being an engineer entails more than writing good code. It also requires being a good corporate citizen. We write and test software so that it can be used by our customers. The Engineering team is one of the teams that constitute the business. As such, we need to coordinate our activities with those of the other teams in the business: Marketing, Sales, Operations, and Support. We are dependent on these other teams for our software to find its way into the hands of our customers. We also depend on them for the business to survive. Let’s not forget that Engineering is an expense center, and that without the Sales team, there would not be any paycheck.

Our obligation to the other teams in the company can be summarized fairly simply: we need to deliver what we promised, on time. We thus need to be able to forecast within a reasonable horizon what we will be able to create, and then deliver against our forecast.

Planning is Difficult but Necessary

Some argue that writing software is a creative and innovative endeavor which, as such, cannot be predicted. The comparison is made with Civil Engineering, where designing a new building amounts to applying well-documented formulas and following well-defined processes that lend themselves to formulaic forecasting. While there is truth to the argument, it cannot be taken to the limit. It does not mean that forecasting a software project is impossible, but rather that it is hard.
This being said, we don’t have a choice. As I often point out to my colleagues, sales people have to forecast every quarter, and one can argue that forecasting sales is eminently more challenging, since it relies on the behavior of people over whom we have very little control: our customers. Yet, no company can operate without a sales forecast, and forecasting is one of the skills that salespeople need to develop, along with their sales acumen. Engineers are in the same situation.

More specifically, the reasons we make plans are:

  • To forecast when a given release will be complete. This in turn drives sales projections and staffing assignments in services – which in turn drive financial projections and how the company manages its expenditures, such as our salaries.
  • To make strategic decisions: for example, if a certain set of features takes too long, or too many resources, we may decide to postpone their implementation and allocate resources to another product or set of features.
  • To make our own decisions: knowing how much work each task will take allows us to staff projects appropriately, and thus be as efficient as possible. Over-staffing and under-staffing both have negative consequences that are easy to understand.
  • To align internal resources: the most obvious example is that the QA team needs to know when a certain feature will be ready to be tested.

The above illustrates how important it is to meet our commitments, once we have announced our plans. If we don’t meet our plans, we let other people down, and force them to scramble to make alternate plans. Yet, meeting one’s commitments is not only about working hard. It starts with making good plans.

Making Good Plans

How does one make good plans?

  • First and foremost: include everything (easier said than done but nonetheless critical)
    • Think through ALL the tasks that are required to complete the job: create a new Maven project, become familiar with the idiosyncrasies of a new software package, upgrade libraries to a new version, organize design reviews, code, unit tests, integration tests, performance tests, error recovery tests, security intrusion tests, documentation, training, etc.
    • Account for everything that happens in a typical day/week: e.g. Meetings, interrupts from Ops, support, or other
  • Be realistic: Engineers tend to be optimistic – make sure that you take into account that something at some point is going to go wrong
    • The best technique that I know is to use history as a reference (see the sketch after this list). Have you typically been late or early on your past projects? Are there activities that you typically fail to account for?
  • Build some buffer – because it is important to meet the commitment (and if you don’t need the buffer, you’ll use the time to implement an extra feature, or start the next release early)
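
One simple way (my own illustration, not a prescribed method) to turn the “use history as a reference” advice into numbers is to derive a buffer multiplier from how past estimates compared to actuals, then apply it to new estimates. The history below is hypothetical.

```python
# Illustrative sketch: deriving an estimation buffer from project history.
past_projects = [
    # (estimated days, actual days) -- hypothetical history
    (10, 13),
    (20, 26),
    (5, 6),
    (15, 21),
]

# Average overrun factor observed historically.
overrun = sum(actual / estimate for estimate, actual in past_projects) / len(past_projects)

raw_estimate_days = 30  # the team's optimistic estimate for the next release
buffered_estimate = raw_estimate_days * overrun
print(f"overrun factor: {overrun:.2f}, buffered estimate: {buffered_estimate:.0f} days")
```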

Tracking Progress

A tool like Atlassian’s Jira allows each developer to enter their tasks and the time for each task. It is critical that each developer enter their own time estimate. No task should be longer than 2-3 days; if it is, it is best to break it up. I have found this granularity to be the right balance between having enough detail in each task to grasp its whole scope, while keeping the total number of tasks manageable.

It is important to think of a task as a complete project, including reviewing requirements, design, code, integration, testing, documentation and hand-off to QA. Of course, each of these sub-tasks can be spelled out when their scope warrants it. Again, include the typical daily overhead in the estimates.

Once we have entered the tasks in Jira, it is critical to track them accurately. Don’t be shy about entering time beyond your original estimates if you are running late: your teammates and your team lead need to know — so that they can make alternate plans if necessary. Progress tracking tools are not meant to find fault, but to support project management and communication: it is a much worse offense to your team to keep quiet about being late, or struggling, on a task than the fact of being late itself. Being late is a problem that can be dealt with – keeping quiet is a professional fault that hurts the project even more than bad code.

One important note: a task is DONE when you won’t need to put any more work into it. In particular, this means a piece of code is not done until it has been fully tested and validated.

Cloud Computing – The Miracle Tool for Testing

Cloud Computing eliminates restrictions due to the number of servers in the QA lab, and thus allows concurrent testing by developers and QA engineers. By making it easy to test often, and to expose early releases to the outside world, Cloud Computing will improve product quality

Does this story ring familiar? You are in a planning meeting for the next release, and learn that in addition to supporting Oracle 11g, the product will also need to support Microsoft SQL Server 2008 (or DB2, or MySQL, or PostgreSQL). Once the typical brouhaha dies down about how complicated this will be, how the whole code will need to be ripped apart, and how much time this will take, the Director of QA turns to you and asks for a couple of additional servers for the QA lab, so that the software can be tested on the two databases in parallel; a minimum of three servers: one for the database, one for our software, and one for the test fixtures. The following day, it’s the developer lead’s turn to ask for more servers: they need at least one “populated” database against which the developers can test, plus another set up for the daily build, etc. Makes perfect sense … Except that no budget has been allocated for these servers! Soon you find yourself with your beggar’s cup in the CEO’s office, explaining to him and the CFO why your team needs these extra servers when “you already have so many!!”

Rejoice! Here comes Cloud Computing to the rescue …

Cloud Computing could not only eliminate the need to purchase servers for testing, but also radically improve your ability to test, and thus improve product quality.

Cloud Computing, such as Amazon EC2,  offers the ability to deploy (and un-deploy) software on demand. One pays “by the hour” of computing used, and storage and bandwidth consumed. This is perfect for testing (by developers and by QA): compute load varies greatly over the cycle of the day, as well as the cycles of the release.

First of all, every developer can now have his/her own test setup against which to test. There is no limitation of hardware, no begging, borrowing or stealing from your colleagues for unutilized servers. One can just deploy at will. Furthermore, there is no restriction on the number of servers. So if you need to test a four-server cluster, you don’t have to hunt around for free servers, you just do it.

Similarly the daily build can deploy to multiple test environments concurrently and thus accelerate the validation of the build.

Finally, the QA team can also test in multiple environments simultaneously, e.g. Oracle and SQL Server at the same time! This offers the potential benefit of being able to test a much larger number of deployment scenarios, than would be possible using one’s own hardware.

Naturally, leveraging a Cloud Computing infrastructure requires new tools.

First and foremost, all the tests must be automated. While technology has created virtual servers, it has not yet invented virtual test engineers 🙂. Secondly, one will have to build tools to automatically deploy the new version of the software and the test fixtures (e.g., from the build environment), as well as collect the results of the test runs.
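
As a minimal sketch of the “deploy a test environment on demand” idea, here is what launching and tearing down a throwaway EC2 test instance with boto3 could look like. The AMI ID, instance type and bootstrap scripts below are placeholders of my own; a real pipeline would also handle security groups, key pairs, result collection and error handling.

```python
# Illustrative sketch: spin up an EC2 instance for a test run, then tear it down.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Bootstrap script run at boot: fetch the build under test and run the suite.
user_data = """#!/bin/bash
/opt/ci/fetch_build.sh latest   # hypothetical helper scripts
/opt/ci/run_tests.sh --suite regression
"""

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI with the test tooling
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
)
instance_id = response["Instances"][0]["InstanceId"]

# Wait until the instance is up; the bootstrap script would push results to
# shared storage, after which the instance is terminated so we only pay for
# the hours actually used.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
# ... poll for test results here ...
ec2.terminate_instances(InstanceIds=[instance_id])
```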

One can be quite creative with the test management tools. For example, if a test setup encounters a high-severity bug, you could configure your test software to pause the test, deploy to a second environment and continue testing in the second environment. This allows you to go back to the first test setup to troubleshoot, and find the cause of the crash.

Another fascinating advantage is that you can deploy demo or beta systems at will (assuming your deployment model allows it), and let your sales team or prospective customers “play with” the early release. By making it easier to expose early releases of the product to the outside world, Cloud Computing further improves the quality of your product.

Will you save money by testing in a Cloud Computing infrastructure?

Obviously the answer depends … on your usage, but also on factors like how much data you need to keep permanently in the cloud. For example you may need to permanently store a synthetic database of a million users (it would be too slow to upload it each time). You will also incur higher networking traffic.

In addition, you may not want to move all your tests to the cloud. For example, you may want to keep your stress-tests, or longevity tests in-house, since these will be running 24×7, and you may want the option of running them on bare-metal.

At the end of the day, to me the attraction of Cloud Computing for testing is that it will increase quality (in addition to reducing costs). It will allow each developer to have access to a test environment at will.  It will create an additional impetus for test automation. Cloud Computing will also allow the concurrent deployment of tests to an arbitrary number of computing environments, and make it easier to give early access to your customers. Net-net, this translates to more tests in the same amount of time with less effort. It’s all goodness.

MVP – Minimum Viable Product

Defining the Minimum Viable Product requires selecting a segment of target customers and delivering the smallest critical mass of features – as early as possible – provided that you can charge a high enough price for it.

I have recently discovered, with great delight, Eric Ries’ “Startup Lessons Learned” blog, and in particular his post about Minimum Viable Product (MVP). This is not surprising, since we are both fans of Steve Blank‘s Customer Discovery Process.

Eric’s post reminded me, how critical, yet how difficult in practice, the concept of Minimum Viable Product is.

Defining the minimum viable product correctly allows you to release products that are valuable to your customers with the minimal amount of energy and time invested – because as the name says, you have done the minimum, and yet you provide value. Said differently, if you only need to have 2 features in your product in order to sell it for $100, then you’d be crazy to spend the extra effort to add a 3rd or a 4th feature. Plus, by only delivering the minimum, you get to market fast – and hopefully beat the competition.

So why is this so difficult in practice … at least in my experience 🙂 ?

My first answer is that it is a lot easier to define the Maximum Product than it is to define the Minimum Viable Product.

Defining the Maximum Product entails compiling a list of all the possible features that your product could possibly have: you only need to talk to a handful of customers and take good notes. Critical thinking is not required. It is easy to get consensus on the Maximum Product: more is always better. The only problem is that no company can afford the time it takes to deliver this “ideal” product. Hence the need for the MVP.

The first step in defining the MVP is the one that is most often overlooked: you first need to define the segment of your customers that you target with the new product. The segment has to be small enough to group customers with similar requirements, but large enough that your new product will generate enough revenue.

The second step is to define the theme of the product in terms of benefits (not features). One of the best tools to help define this theme is to imagine that you are putting up a huge billboard on 101 (one of the main arteries of Silicon Valley) to advertise the new product: what does the billboard say?

The third and final step is to define the critical mass of features in the release. In this step, ruthless time vs. feature vs. price trade-offs need to be made – because the question is not just “What features do our target customers absolutely need?” (this list will always be too long), but rather: “Will our customers be willing to buy the product with these features – available at this date – at this price?” Economically, this question may have multiple correct answers. However, in practice, presented with this question, customers will often select a date in the near term, which in turn defines the minimum viable product.