Previously published in Forbes on October 24, 2017
In the past few years, new categories of products have emerged thanks to the extraordinary advances in machine learning (ML) and deep learning (DL). These new techniques power product recommendations, computer-aided diagnosis in medical imaging and self-driving cars, just to name a few.
Most ML and DL algorithms require compute profiles (hardware, software, storage, networking) that are significantly different from those optimized for traditional applications. Consequently, as more and more companies develop their own ML/DL solutions and deploy them to production, the demand for the ML-optimized compute resources will grow dramatically and create opportunities for new entrants to offer solutions that compete with today’s dominant cloud providers: Amazon AWS, Microsoft Azure and Google Cloud.
The ML/DL Cloud Is Different
In an article on Mesosphere’s blog page, Edward Hsu presented the case that web applications are now primarily data-driven. Consequently, a new set of frameworks (a.k.a. stacks), namely SMACK (Spark, Mesos, Akka, Cassandra, Kafka), must replace the traditional LAMP (Linux, Apache, MySQL, PHP) stack used to build web-based applications. In my view, rather than replacing LAMP, SMACK will coexist side by side with, and feed data to, traditional web-based based frameworks, which are still needed to present nice-looking webpages and interface with mobile phones.
Yet the main point is well-taken. We need to update Marc Andreesen’s famous line about how “Software is eating the world” to “Data is eating the world.” Let’s unpack this statement and derive the consequences.
The disruption created by machine learning and deep learning extends well beyond the software stack into chips, servers and cloud providers. This disruption is rooted in the simple fact that GPUs are much more efficient processors for ML and DL than traditional CPUs.
Up until recently, the solution was to augment traditional servers with GPU add-on cards. We are now at a point where demand for ML/DL computing is such that special-purpose servers, optimized for ML/DL compute loads, are being built.
Data centers are also being re-architected to support the extremely large amount of data consumed by ML and DL. Imagine you are designing the brains for self-driving cars. You need to process thousands and thousands of hours of video (and other such signals as GPS, gyroscopes, LIDAR) to train your algorithms. The amount of data that a Tesla on the road records in one second is a million times larger than a tweet or a post on Facebook.
ML/DL data centers thus require both huge amounts of storage and extremely high bandwidth.
The software side is even more complex. A new infrastructure stack, typically using machine learning-specific frameworks such as Tensorflow (originally developed by Google) or PyTorch (originally developed at Facebook), is required to shepherd data around and manage the execution of the compute jobs. Furthermore, open-source code libraries (pandas, scikit-learn, matplotlib) are used to implement the models (e.g., neural networks, data displays). These model libraries are critical because they are optimized to be both easy to use for algorithm research and offer high performance for use in production.
Finally, each vendor offers complete building blocks for specific use cases. For example, Amazon Lex, Google Cloud Speech and Microsoft Bing Speech provide speech recognition and can even recognize intent. Each has its own API and unique behavior, making the migration from one vendor to the other time-consuming.
In addition to the Big Three cloud providers (Amazon AWS, Microsoft Azure and Google Cloud) that have offered GPU-accelerated instances for a few years, new ML-optimized offerings have emerged:
• NVIDIA, which is already the dominant provider of GPUs that power the graphics cards that drive computer displays, recently introduced a portfolio of “purpose-built AI supercomputers” servers known as its DGX systems.
• Servers.com offers its Prisma Cloud with dedicated GPU-optimized servers.
• Rescale, one of the niche cloud providers that focuses on high-performance computing (HPC), just announced the availability of the latest generation of GPU-powered servers, along with high-bandwidth interconnect, to create high-performance multi-node clusters.
What’s At Stake
The Big Three cloud providers are the ones most immediately at risk to be disrupted by new entrants such as NVIDIA, Servers.com and Rescale. ML/DL innovation is still running at a torrid pace thanks to innovation in algorithms as well as compute efficiency. This is creating a small arms race where end users are constantly looking for the provider that can give that extra edge.
On one hand, end users are benefiting hugely from this arms race to provide the best software and hardware compute environment. On the other, this requires constant vigilance to keep abreast of the latest offerings. Even more importantly, when deploying ML/DL products to production, CEOs and CTOs need to pick the winner — or at least a future survivor — that will keep their edge for the next two to five years. This is not an easy task.
We will delve deeper into these two topics in future posts — stay tuned.