The Art Of Technical Due Diligence

Previously published in Forbes on April 11, 2019

Technical due diligence (TDD) takes place once an investor (such as a venture capitalist, private equity manager or another company) has decided to invest in, or acquire, a technology company. Once they make this decision, they have limited time to dig into the company in order to ensure that its technology, its engineering team and its development velocity are as advertised.

As someone who has experienced this process from both sides — and whose company provides it — I understand the stress TDD sometimes causes. That’s because a poorly executed technical due diligence initiative can derail a deal and hurt the bottom line for investors — as well as the management team of the company receiving the investment — so it is worth understanding how to do it right as a CTO.

TDD Isn’t A Beauty Contest

After participating in dozens of TDD projects, I’ve learned that there are many ways to solve a technical challenge — including using frameworks or programming languages that I personally wouldn’t touch. It is thus critical to put aside one’s own ideas of technical purity and “the right way of doing things” during TDD and to have an open mind about how technology can be used. (You could even learn something along the way.)

Furthermore, it’s important to be clear about the purpose of TDD. If they’ve already made the decision to invest in a company, investors should assume that its technology is good enough today. What you want to know instead is whether the company can execute its business plan. A technical review should thus stay away from judging the beauty of today’s product architecture and take a more dynamic view to validate whether the technical team can deliver the future that the company has drawn for itself.

This dynamic perspective is all the more important because, for companies lucky enough to grow at a fast pace, life is messy. Code architecture is constantly evolving, and documentation is often incomplete and out of date. Concurrently, penetrating new markets and creating major new features often requires introducing novel technology or considerable re-architecting. Companies rein in this apparent chaos through major core projects that temporarily produce no user-facing features yet allow the engineering team to maintain velocity over the long run. A wise TDD will use these projects to distinguish the normal growth-driven chaos from signs of any additional structure the company may need to reach a new stage of growth. An investor should expect one, two or even more of these fundamental projects in a two-year timeframe.

It’s All About The Product Road Map

The product road map is the engineering team’s commitment to the company to deliver specific features, products and capabilities on a given schedule. The management team, in turn, makes revenue projections based on the availability of these new features. Consequently, delays in the product road map can have a direct impact on the company’s revenue stream — and thus its valuation.

Beyond giving the product road map a simple thumbs up or down, your technical due diligence should provide actionable information about the upcoming 24 months, including critical dependencies, risk factors and major technical milestones that will usher in product milestones. As a TDD assessor, you should gather this information to track the success of your investment over the short- to mid-term.

In order to evaluate what the technical team must accomplish in order to execute the product road map, you should:

• Capture the business context of the road map

• Understand the business objectives for the next two years or more

• Evaluate today’s technical foundation to appreciate whether it can support future plans

• Internalize the future plans

• Evaluate the team’s ability to deliver these plans — and to mitigate risks

Understanding The Business Context Is Critical

Technology serves the business. It follows that you should assess technology in the business context of the company: Consider market (consumer, enterprise, or government), space (finance, health, social, tools and so on) and company maturity (five versus 10,000 enterprise seats and 1,000 versus 1 million daily users) as a few obvious dimensions. “Scalability” or “security” have very different meanings depending on the company’s business context — and so do the solutions. Similarly, you should evaluate talent, processes, tools and operational playbooks differently based on the business context.

Skills And Experience Matter

Nowadays, many companies use multiple technology stacks. As a consequence, if you’re a CTO performing TDD, you should be “multilingual” so you can evaluate all components of the technology.

To assess development velocity, your investigation should also show how well the code is written and organized and include an evaluation of the tools for test automation, continuous integration/continuous deployment, data center deployment, monitoring, alerting, business intelligence, data science and so on. In addition, assessing a company’s specific expertise in artificial intelligence has become a must in many industries.

As if all that was not enough, TDD assessors should understand engineers as well. It is critical to assess individual and collective talent on the team, as well as organizational dynamics and methodology.

Finally, because so many of the risks and critical milestones can depend on the maturity of the company, one of the most important skills that you can bring as a CTO performing TDD is the ability to identify the inflection points in the company’s growth, assess the impact on technology and translate insights to the technology team: For example, what new technology requirements will the company face when it has reached product-market fit and enters the growth stage? For this work, there’s no substitute for “I’ve been there.”

The Good News Is Also Important

In parallel to identifying what could go wrong, it is critical to highlight the company’s unique strengths. This starts with its intellectual property (whether it’s patentable or not) and includes unique sources of talent, internally developed tools and methodologies that increase development velocity and difficult-to-recreate data sets … all of which may have been overlooked by non-technologically-inclined investors. Ultimately, it is the balance of a company’s unique strengths and weaknesses that will determine its success, and a good due diligence report will highlight that.

For a company seeking investment, TDD may seem like an unnecessary hurdle; however, when it’s properly conducted, TDD adds value and insight for both the investor and the startup.

Prediction: Self-Driving Car Manufacturers Will Own The Car Insurance Business

Previously published in Forbes on July 2, 2018

Can you picture the day when your car insurance bill drops every month? This could very well happen as self-driving car manufacturers (SDCMs) take over the car insurance business.

As it turns out, SDCMs have several powerful incentives to do so.

Their primary motivation is to remove an adoption barrier to self-driving cars: The cost, and even the availability, of car insurance could deter consumers, as well as the new generation of “taxi” companies, from purchasing an autonomous vehicle. Today’s incumbent car insurance companies do not have statistical tables for accidents and fatalities involving self-driving cars, since such cars are not yet in circulation. As a consequence, they are likely to be conservative and set high initial prices for insuring autonomous vehicles.

By contrast, SDCMs will have the next best thing to real-life statistics — they have data centers full of data not only about accidents but also about near misses (albeit for their own cars only). This means they can generate accurate accident statistics for their own cars as often as they want and thus estimate the cost of insuring them. An SDCM will be able to turn a barrier to adoption into a potential sale.

Furthermore, by offering car insurance themselves, self-driving car manufacturers not only remove a barrier to adoption of their product but also project confidence in it. In addition, SDCMs will improve a customer’s purchase experience by eliminating one painful step in the car purchase process (because who enjoys shopping for car insurance?), as well as eliminating a third party from the process. Even better, pricing for car insurance will be greatly simplified since the most important variable in the pricing equation — the human — will be taken out of the system. The price of insurance will be determined by the hardware and software installed in the car — not by the human driver. Whether it’s a 16-year-old who just passed their driver’s license exam, a soccer mom with 15 years of accident-free driving or a retired senior, the price will be the same, assuming the technology in the car is the same in all three scenarios.
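
To make that idea concrete, here is a deliberately simplified sketch (in Python) of what such pricing could look like: the premium depends only on the vehicle’s autonomy hardware and software profile and on mileage, never on the driver. Every rate, version name and multiplier below is invented for illustration.

```python
# Hypothetical premium calculator: price depends on the car's autonomy stack,
# not on the human behind the wheel. All figures are invented for illustration.
BASE_ANNUAL_PREMIUM = 400.0

# Assumed per-version risk multipliers, e.g. derived from the SDCM's fleet data.
RISK_BY_SOFTWARE_VERSION = {
    "autopilot-v9": 1.00,
    "autopilot-v10": 0.85,   # newer release, fewer incidents per mile
    "autopilot-v11": 0.70,
}

def annual_premium(software_version: str, miles_per_year: float) -> float:
    risk = RISK_BY_SOFTWARE_VERSION[software_version]
    return BASE_ANNUAL_PREMIUM * risk * (miles_per_year / 12_000)

# Same price for the 16-year-old, the soccer mom and the retiree:
print(annual_premium("autopilot-v11", miles_per_year=12_000))
```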

By the way, according to a US market research report on automobile insurance by IBISWorld, industry revenues totaled $259 billion in 2017. This is no small market, which, in and of itself, provides ample motivation for self-driving car manufacturers to enter it.

Since they will have actual real-time data on accidents and fatalities, SDCMs will radically drive down the cost of car insurance and make car ownership more affordable, thus expanding their market. Furthermore, reducing accidents is one of their primary business drivers to increase adoption. This will provide another incentive to drive down the cost of car insurance.

Since self-driving cars will only be commercialized once SDCMs have proved that they are safer than human-driven cars, SDCMs will by then be in a position to compute the exact probabilities of accidents and their cost, because they will have all the data in their data centers. Beth Buczynski is correct in predicting, in her article “With Self-Driving Cars, Auto Insurance’s Time Is Limited,” that the cost of auto insurance will fall significantly over time and that consumers will no longer pay directly for it. However, auto insurance will not disappear, because self-driving cars won’t eliminate all accidents. The liability will simply no longer be carried by consumers but by the “robot-taxi” companies or by the SDCMs.

Most importantly, SDCMs will be able to offer car insurance from the moment they bring self-driving cars to market, because they will be able to offer it at a much lower price than traditional insurance companies. SDCMs will need this cost reduction to help offset the additional cost of the autonomous-driving equipment and reduce the total cost of ownership of their product.

Finally, since the price of insurance will be determined by how smart the autonomous driving system is, each time the car manufacturer (or the vendor of the autonomous driving software) publishes a new release, the cost of insurance could come down. I can’t wait.

For Machine Learning, It’s All About GPUs

Previously published in Forbes on December 1, 2017

Isn’t it curious that two of the top conferences on artificial intelligence are organized by NVIDIA and Intel? What do chip companies have to teach us about algorithms? The answer is that nowadays, for machine learning (ML), and particularly deep learning (DL), it’s all about GPUs.

In a previous article, I made the case to every CEO and CTO that “Machine learning allows us to make even better use of the data we have, as well as the data we don’t currently possess, and answer the questions we didn’t know we should ask.”

As more companies build AI-driven products, technology providers are responding to this demand by providing products that are computationally more powerful and easier to use and manage in production.

GPUs are driving the next wave of breakthroughs.

Why GPUs Are So Important To Machine Learning

GPUs have almost 200 times more processors per chip than a CPU. For example, an Intel Xeon Platinum 8180 Processor has 28 cores, while an NVIDIA Tesla K80 has 4,992 CUDA cores. While a CPU core is more powerful than a GPU core, the vast majority of this power goes unused by ML applications. A CPU core is designed to support an extremely broad variety of tasks (e.g., render a webpage, drive word processors and enterprise software, manage peripherals) in addition to performing computations, whereas a GPU core is optimized exclusively for data computations. Because of this singular focus, a GPU core is simpler and has a smaller die area than a CPU core, allowing many more GPU cores to be crammed onto a single chip. Consequently, ML applications, which perform large numbers of computations on a vast amount of data, can see huge (i.e., five to 10 times) performance improvements when running on a GPU versus a CPU.
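
To see this difference for yourself, you can time the same dense matrix multiplication on a CPU and on a GPU. The sketch below uses PyTorch purely as an illustration; the matrix size is arbitrary, and the measured speedup will vary with your hardware.

```python
# Illustrative micro-benchmark: the same matrix multiplication on CPU and GPU.
import time
import torch

def time_matmul(device: str, n: int = 4096, repeats: int = 10) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)                 # warm-up so one-time setup is not measured
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()       # wait for the GPU to finish before timing
    return (time.perf_counter() - start) / repeats

print(f"CPU: {time_matmul('cpu'):.3f} s per multiply")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f} s per multiply")
```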

Having recognized this fundamental fact a few years ago, the tech industry, particularly the ML crowd, has focused its efforts on taking advantage of the GPU. However, this is not a simple task. All layers of the compute stack have to be redesigned to take advantage of the GPU’s power.

Recent Developments For GPUs

NVIDIA has so far been the main provider of GPU chips for ML acceleration. Its GPUs have powered AWS’s GPU-accelerated instances for the past year.

Furthermore, chip manufacturers are about to release chips architected specifically for ML from the ground up (rather than continuing to optimize GPUs, which were originally designed for graphics processing). NVIDIA is shipping the Tesla V100, which incorporates Tensor Cores designed specifically for DL in addition to GPU cores. Last year, Google announced its Tensor Processing Unit (TPU), which powers its main services: Google Search, Street View, Photos and Google Translate. Finally, Intel announced this month its Nervana Neural Network Processor, which was also architected, in collaboration with Facebook, to optimize neural network computing.

Building The GPU Compute Stack

Having super-fast GPUs is a great starting point. In order to take full advantage of their power, the compute stack has to be re-engineered from top to bottom.

• Servers

A new category of servers needs to be built to feed the beast. These servers must store data and send it to the GPUs at the rate at which they can consume it, which requires up to a 10x improvement in bandwidth.
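
As a rough illustration of what “feeding the beast” involves on the software side, the sketch below uses PyTorch’s DataLoader to prepare batches on several CPU worker processes, with pinned host memory and non-blocking copies, so the GPU is not left waiting for data. The dataset and batch size are made up for the example.

```python
# Illustrative only: keep the GPU busy by preparing batches on CPU workers
# while the previous batch is being processed on the device.
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Stand-in dataset: 100,000 random samples with 512 features each.
    dataset = TensorDataset(torch.randn(100_000, 512),
                            torch.randint(0, 2, (100_000,)))
    loader = DataLoader(
        dataset,
        batch_size=1024,
        shuffle=True,
        num_workers=4,    # prepare batches in parallel on the CPU
        pin_memory=True,  # page-locked memory speeds up host-to-GPU copies
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for features, labels in loader:
        # non_blocking=True lets the copy overlap with GPU compute
        features = features.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        # ... forward/backward pass of the model would go here ...

if __name__ == "__main__":   # required when worker processes are spawned
    main()
```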

NVIDIA just started shipping its DGX-1 server. Data throughput and storage have been optimized in order to take full advantage of the processing power of the eight Tesla V100 processors included in the box.

Facebook recently announced its second generation of AI hardware (“Big Basin”) to power its own core services: speech and text translations, photo classifiers and real-time video classification.

• Data Center

An article I wrote last month highlighted the impact of ML for cloud providers. Since then, new GPU-related developments have emerged.

Google just made its TPUs available on its compute platform.

Intel just announced its Nervana DevCloud, which is limited for the time being to research and experimentation.

Finally, Cray, a supercomputing veteran of 45 years, is entering the fray. Leveraging its decades of experience in high-performance computing (HPC), it will soon offer its supercomputers for rent on Microsoft Azure. These servers can host a large number of NVIDIA Tesla GPUs.

• Frameworks, Models And Algorithms

Optimized hardware requires optimized software. All cloud providers have optimized the major frameworks (TensorFlow, PyTorch, Caffe, MXNet) for their platforms. Furthermore, GPU vendors are rewriting the major models and algorithms (NVIDIA Digits, Intel Nervana Graph) to take full advantage of the GPU’s power.

Through the GPU Open Analytics Initiative, companies such as MapD (database, visualization) and H2O (ML) are rewriting fundamental technologies like databases and programming languages in order to eliminate data copies, which, if ignored, can significantly increase overall execution time.
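
The cost of those copies is easy to demonstrate. The sketch below (PyTorch, with illustrative sizes) contrasts a loop that re-copies the same data from host memory to the GPU on every step with one that keeps the data resident on the device.

```python
# Illustrative comparison: repeatedly copying data to the GPU vs. keeping it there.
import time
import torch

assert torch.cuda.is_available(), "example requires a GPU"
host_data = torch.randn(8192, 8192)          # lives in CPU memory

def run(copy_every_step: bool, steps: int = 20) -> float:
    resident = host_data.cuda()              # copied once, used in the resident case
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        x = host_data.cuda() if copy_every_step else resident
        x = x * 2.0                          # stand-in for real GPU work
    torch.cuda.synchronize()
    return time.perf_counter() - start

print("copy every step :", run(True))
print("keep on the GPU :", run(False))
```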

Finally, some technologies have reached a degree of fidelity high enough to be offered as services: AWS, Google and Microsoft each offer various flavors of speech recognition, translation and synthesis. Similarly, the face recognition service from China’s Megvii has become very popular.

• The Edge

For some applications, the ML models that have been trained in the data center must be computed at the edge (i.e., close to the end user). In the case of autonomous driving, for example, the car’s brain is trained in the data center but must be run in the car.

Now that machine learning has become mainstream in the data center, dedicated products are being released for edge computing. For example, NVIDIA provides the Drive PX family of accelerator cards, which host one to four GPUs as well as multiple video and other sensor inputs. They can thus power anything from simple highway driving today to fully autonomous driving in the future.
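
As a generic illustration of the “train in the data center, run at the edge” pattern (this is not NVIDIA’s Drive PX toolchain), the sketch below exports a trained PyTorch model to TorchScript so that it can be loaded by a lightweight runtime on the target device. The tiny network and input shape are placeholders.

```python
# Illustrative export of a trained model for edge inference (TorchScript).
# The small network and input shape are placeholders, not a real perception model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4))
model.eval()

example_input = torch.randn(1, 3, 224, 224)        # one RGB frame
scripted = torch.jit.trace(model, example_input)   # freeze the computation graph
scripted.save("edge_model.pt")                     # ship this file to the device

# On the edge device: no Python model definition needed, just the runtime.
loaded = torch.jit.load("edge_model.pt")
with torch.no_grad():
    prediction = loaded(example_input)
```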

A New GPU-Driven ML Landscape

From this whirlwind survey of innovation driven by GPUs, one can anticipate increases in processing power of two to five times over the coming months, from which a second wave of machine learning breakthroughs is bound to emerge, allowing us to solve a brand-new class of challenges.

How Machine Learning Will Disrupt The Established Cloud Providers

Previously published in Forbes on October 24, 2017

In the past few years, new categories of products have emerged thanks to the extraordinary advances in machine learning (ML) and deep learning (DL). These new techniques power product recommendations, computer-aided diagnosis in medical imaging and self-driving cars, just to name a few.

Most ML and DL algorithms require compute profiles (hardware, software, storage, networking) that are significantly different from those optimized for traditional applications. Consequently, as more and more companies develop their own ML/DL solutions and deploy them to production, the demand for the ML-optimized compute resources will grow dramatically and create opportunities for new entrants to offer solutions that compete with today’s dominant cloud providers: Amazon AWS, Microsoft Azure and Google Cloud.

The ML/DL Cloud Is Different

In an article on Mesosphere’s blog, Edward Hsu presented the case that web applications are now primarily data-driven. Consequently, a new set of frameworks (a.k.a. stacks), namely SMACK (Spark, Mesos, Akka, Cassandra, Kafka), must replace the traditional LAMP (Linux, Apache, MySQL, PHP) stack used to build web-based applications. In my view, rather than replacing LAMP, SMACK will coexist side by side with, and feed data to, traditional web-based frameworks, which are still needed to present nice-looking webpages and interface with mobile phones.

Yet the main point is well-taken. We need to update Marc Andreessen’s famous line about how “Software is eating the world” to “Data is eating the world.” Let’s unpack this statement and derive the consequences.

Hardware

The disruption created by machine learning and deep learning extends well beyond the software stack into chips, servers and cloud providers. This disruption is rooted in the simple fact that GPUs are much more efficient processors for ML and DL than traditional CPUs.

Up until recently, the solution was to augment traditional servers with GPU add-on cards. We are now at a point where demand for ML/DL computing is such that special-purpose servers, optimized for ML/DL compute loads, are being built.

Data centers are also being re-architected to support the extremely large amounts of data consumed by ML and DL. Imagine you are designing the brains for self-driving cars. You need to process thousands and thousands of hours of video (and other signals such as GPS, gyroscopes and LIDAR) to train your algorithms. The amount of data that a Tesla on the road records in one second is a million times larger than a tweet or a post on Facebook.
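
To make that comparison concrete, here is a back-of-envelope calculation. Every input (tweet size, camera count, resolution, frame rate) is an assumption chosen for illustration, not a published Tesla specification.

```python
# Rough, illustrative arithmetic only - all inputs are assumptions.
tweet_bytes = 280                      # ~280 characters of text
cameras = 8
width, height, bytes_per_pixel = 1280, 960, 1
fps = 36

sensor_bytes_per_second = cameras * width * height * bytes_per_pixel * fps
print(f"{sensor_bytes_per_second / 1e6:.0f} MB per second of raw camera data")
print(f"{sensor_bytes_per_second / tweet_bytes:,.0f}x the size of a tweet")
```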

ML/DL data centers thus require both huge amounts of storage and extremely high bandwidth.

Software

The software side is even more complex. A new infrastructure stack, typically built on machine learning-specific frameworks such as TensorFlow (originally developed by Google) or PyTorch (originally developed at Facebook), is required to shepherd data around and manage the execution of compute jobs. Furthermore, open-source code libraries (pandas, scikit-learn, matplotlib) are used to implement the models (e.g., neural networks, data displays). These libraries are critical because they are easy to use for algorithm research while also offering high performance in production.
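
As a minimal, hypothetical illustration of how these layers fit together, the sketch below uses pandas to hold the data, scikit-learn for preprocessing and PyTorch for a small neural network. The data is synthetic, and a real pipeline would add proper train/test splits, batching and tuning.

```python
# Minimal sketch of the ML software stack: pandas -> scikit-learn -> PyTorch.
# The data is synthetic; real pipelines add validation, batching and tuning.
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler

# Fabricated tabular data standing in for real business data.
df = pd.DataFrame(np.random.randn(1000, 4), columns=["f1", "f2", "f3", "f4"])
df["label"] = (df["f1"] + df["f2"] > 0).astype(int)

X = StandardScaler().fit_transform(df[["f1", "f2", "f3", "f4"]])
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(df["label"].values, dtype=torch.float32).unsqueeze(1)

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(20):                  # tiny training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.3f}")
```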

Finally, each vendor offers complete building blocks for specific use cases. For example, Amazon Lex, Google Cloud Speech and Microsoft Bing Speech provide speech recognition and can even recognize intent. Each has its own API and unique behavior, making migration from one vendor to another time-consuming.

New Entrants

In addition to the Big Three cloud providers (Amazon AWS, Microsoft Azure and Google Cloud) that have offered GPU-accelerated instances for a few years, new ML-optimized offerings have emerged:

• NVIDIA, which is already the dominant provider of the GPUs that power the graphics cards driving computer displays, recently introduced a portfolio of servers billed as “purpose-built AI supercomputers,” known as its DGX systems.

• Servers.com offers its Prisma Cloud with dedicated GPU-optimized servers.

• Rescale, one of the niche cloud providers that focuses on high-performance computing (HPC), just announced the availability of the latest generation of GPU-powered servers, along with high-bandwidth interconnect, to create high-performance multi-node clusters.

What’s At Stake

The Big Three cloud providers are the ones most immediately at risk of being disrupted by new entrants such as NVIDIA, Servers.com and Rescale. ML/DL innovation is still running at a torrid pace, thanks to advances in both algorithms and compute efficiency. This is creating an arms race in which end users are constantly looking for the provider that can give them that extra edge.

On one hand, end users are benefiting hugely from this arms race to provide the best software and hardware compute environment. On the other, this requires constant vigilance to keep abreast of the latest offerings. Even more importantly, when deploying ML/DL products to production, CEOs and CTOs need to pick the winner — or at least a future survivor — that will keep their edge for the next two to five years. This is not an easy task.

We will delve deeper into these two topics in future posts — stay tuned.

The Machine Learning Imperative

Previously published in Forbes on June 28, 2017

There’s no longer a debate as to whether companies should invest in machine learning (ML); rather, the question is, “Do you have a valid reason not to invest in ML now?”

Machine learning is here, and it’s finally mature enough to cause a major seismic shift in virtually every industry. For example, Matt Swanson, founder of SVSG, wrote an article last year about how chatbots will disrupt a $200 billion industry. While ML cannot solve every problem, it has demonstrated a game-changing impact in enough markets that every CEO and CTO must ask themselves whether they understand ML well enough to rule it out for their own business. While appreciating the rewards of ML may be difficult, we do know the risks: ML has already disrupted several industries, including e-commerce, autonomous driving and customer engagement. The risk of ignoring ML today is one that is probably too large for any established company to take.

Machine Learning Changes The Game

While artificial intelligence grabs most of the spotlight in discussions about machine learning (primarily due to its easily graspable life-altering implications), it is but one of many disciplines in ML. Big data has demonstrated the enormous value of data: Netflix and Amazon recommend films and products based on our own purchase history and those of customers like us. Thus, big data has helped us answer questions we already knew to ask, questions such as, “What more can I sell to my customers?”

Machine learning allows us to make even better use of the data we have, as well as the data we don’t currently possess, and answer the questions we didn’t know we should ask.

Machine Learning Uses Data We Don’t Yet Have

Analytics and business intelligence extract information from structured data (i.e., data stored in databases: customer information, purchase history, etc.). But thanks to ML, we can now extract information from unstructured data such as texts, phone calls, images and videos.

Search engines used to return pages based on the exact words of the query. ML takes this text analysis a few steps further. First, it extracts concepts out of words and associates pages that discuss the same concept using different words: A search for “artificial intelligence” will produce results that mention machine learning and robotics without explicitly using the words “artificial intelligence.” Beyond this, ML is now becoming proficient at sentiment analysis and at determining intent in a given context. This means that ML can deduce, from our posts on social media, whether we are happy or angry (sentiment analysis), whom we are likely to vote for, or what purchase we are considering next (intent).
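
As a toy illustration of sentiment analysis, the sketch below trains a tiny classifier with scikit-learn on a handful of made-up examples; production systems rely on far larger datasets and more sophisticated models, but the principle is the same.

```python
# Toy sentiment classifier - the training examples are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I love this product", "absolutely fantastic service",
         "terrible experience, never again", "this is awful and slow",
         "really happy with my purchase", "worst support I have ever had"]
labels = [1, 1, 0, 0, 1, 0]   # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["the service was fantastic"]))   # expect positive
print(clf.predict(["awful, slow and terrible"]))    # expect negative
```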

Similarly, ML techniques like natural language processing (NLP) and image categorization interpret and translate people’s speech as well as the content of images (e.g., facial recognition on Facebook).

This means that, thanks to ML, the huge amount of publicly available content — which, up until recently, was of little use — can now give us useful new insights.

Machine Learning Makes Better Use Of The Data We Have

Machine learning provides a new class of algorithms that manipulates structured data that we already possess. AWS has a nice blog, including code, on how to build a prediction engine for customer churn. BlackRock is using machines to manage funds.
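
A minimal sketch of such a churn predictor is shown below. It is not the code from the AWS post; the customers.csv file and its columns (monthly_spend, support_tickets, months_active, churned) are assumptions made purely for illustration.

```python
# Hypothetical churn model; 'customers.csv' and its columns are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")             # assumed schema, for illustration
features = df[["monthly_spend", "support_tickets", "months_active"]]
target = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```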

In addition, data that every company gathers from its customers (emails, chats, comments, support requests, etc.) can now be analyzed by ML to extract accurate customer sentiment (satisfaction with the service, suggestions, identifying emergency requests). Even polls and surveys may be replaced by ML algorithms that can mine Facebook, Twitter and news sites to capture the sentiment of millions of people expressing themselves openly.

Machine Learning Answers Questions We Didn’t Know To Ask

At the risk of stating the obvious, the power of machine learning is that it learns. The more information provided, the faster it learns and the better it answers.

While traditional business intelligence techniques can tell us how often products A and B are purchased together, these techniques fail in the face of a massive organization such as Amazon, which sells over 368 million products. However, ML can digest the flow of purchase transactions and identify patterns of joint purchases. ML can even use these predictions to automatically make purchase decisions (see German e-commerce merchant Otto as an example).
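
As a pared-down illustration of mining joint purchases, the sketch below simply counts how often pairs of products appear in the same basket; the baskets are invented, and Amazon-scale systems use far more sophisticated models, but the underlying pattern detection starts from the same idea.

```python
# Toy co-occurrence counting over invented shopping baskets.
from collections import Counter
from itertools import combinations

baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"bread", "butter", "beer"},
    {"diapers", "wipes"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Most frequently co-purchased pairs - candidates for recommendations.
print(pair_counts.most_common(3))
```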

Furthermore, by leveraging data we don’t have — such as stock market indices, weather data, political news and government statistics — we can correlate external events with our business data and thus enrich the accuracy of our predictions and decisions.

Why Now?

The rapid growth of machine learning leads to uncertainty, which may entice business leaders to hesitate in utilizing it. Yes, machine learning is complex, but it is also a powerful force of disruption. Because ML is still developing, it presents an opportunity to pull ahead of the competition by taking advantage of this maturation period. The choice is simple: disrupt or be disrupted.

It will take some time to ascertain what use cases are relevant to your company, so it is important to start this investigation now. ML is complex and challenging to master, yet the tools for machine learning are all readily available to you and are already being employed by Amazon, Google and Microsoft.

The journey to machine learning must start now.