Software Engineering – from the Trenches – Page 2 – The art and practice of delivering software products

The CTO’s Yearly Checklist

Previously published on Forbes on 8/19/2020

In a startup, as in any adventure, one needs to raise one’s head toward the horizon once in a while to ensure that one is still headed in the right direction. Well-run companies typically hold quarterly executive off-sites, and at least once per year, the product road map is refreshed.

This is the perfect impetus to refresh everything in engineering: technology stack, tools, methodology, team and employee roles. Technology, tools or processes that used to work may become inadequate, or even break, as the company grows. A well-executed yearly review will identify the key challenges and opportunities for the following year, and thus allow you to identify the key decisions to be made inside engineering and to prepare for these decisions.

While the executive review of the product road map will focus on the execution part of the road map, it is equally important to lead an innovation review within the engineering team to ensure that you retain your technology leadership against the competition.

Finally, in order to have an effective yearly review, a lot of work must be done prior to the review (in order to inform the product road map decisions), as well as after it (in order to reflect the new product road map).

Before The Product Road Map Review

During the product road map review, the executive team will usually concentrate on customer-facing features and will ask for dates for key deliverables. In order to make this discussion as effective as possible, you need to research what the likely top requests will be. In addition, you need to identify technical debt, as well as noncustomer-facing features (quality, robustness, performance, business continuity, compliance/security) that must be addressed — and build a business case for each of these, along with timing and resource allocations.

Because your development capacity, your velocity for paying back technical debt and your customer-facing work are all determined by the resources available, you need to negotiate your budget for the coming year in parallel with building your future plans. Making commitments to a product road map without a clear idea of the resources available will lead to uncomfortable discussions later.

With a good idea of the major engineering projects in place, you can refresh your technology road map and discuss the new technologies you need to acquire in order to deliver next year — whether this technology is inside the product or part of your internal tools. For example, have there been any significant advances in AI, cloud computing or analytics that will improve your efficiency or increase your competitive differentiation?

Finally, a good retrospective of the team will complete the preparation for the annual review. Based on this year’s accomplishments and next year’s objectives, how does the team need to evolve? How do you need to evolve? Do you need to radically improve quality? Will your market demand a step up in security? Who on the team has delivered beyond expectations? Do you need to take new classes or get a mentor? A thorough retrospective should involve a broad consultation with people inside and outside the engineering team.

During The Product Road Map Review

Product road map review meetings — particularly when part of an executive off-site — are usually intense affairs with lots of passionate discussions (usually a good thing). As CTOs, we must accomplish two critical objectives:

1. Avoid committing to any delivery dates on the spot, unless we have absolute clarity on both requirements and resource availability. However, we must provide estimates of scope for key features to inform decisions on priorities.

2. Ensure that the most important deliverables on the road map have well-documented business cases, from which it will be straightforward to extract precise requirements.

After The Product Road Map Review

Even when the yearly product road map review does not bring major surprises, the aftermath always entails a lot of work, which consists of delivering the actionable product road map and figuring out the changes necessary to execute this road map — beyond writing the code.

An actionable product road map is a commitment from the engineering team to deliver certain features by certain dates. This implies that the budget has been finalized, requirements and resources are clear, and you have done a detailed-enough design and task breakdown to make these commitments with enough confidence and buffer that you will not disappoint your customers.

In parallel, you must solidify your plans to refresh how you innovate, as well as how you execute.

On the technical side, you need to complement the customer-facing product road map with your internal technology road map, your technical debt payback plan, and your tools and infrastructure upgrade plans.

Finally, and too often forgotten, the organization must be refreshed: Team structure, culture, metrics, methodology, communication processes, technical skills and talent all need to be reevaluated with the active contribution of the teams’ leaders.

This massive effort culminates with extensive communications: The product road map, once it has become actionable, is shared with the business teams inside the company. In addition, when sharing the road map with the engineering team, it is critical to highlight the planned improvements in engineering, which will make this road map realistic, along with associated growth opportunities for each individual. This communication must be well orchestrated through all-hands, team and individual meetings so that every single engineer continues to be motivated, challenged and rewarded by the year ahead.

Finally, you need to give your team the tools for success, whether building up your direct reports and delegating more, defining new challenges to feed your continued motivation, learning new ways to lead, or implementing new technologies.

It is a lot of work to properly prepare and execute this yearly review. Yet, like most planning exercises, it bears fruit through the very process of thinking about the future. Going into a new year with a well-thought-out and well-communicated actionable product road map provides a guiding path for everyone inside, and outside, the engineering department.

Growth Is A Feature: Five Immediate Actions CTOs Can Take When Growth Skyrockets

Previously published on Forbes Technology Council, July 22, 2020

The magic moment for which you have been working for so long has finally arrived: Usage of the product is accelerating — the company is taking off!

As a CTO, this is wonderful news and the validation of years of dedication. Having gone through this critical stage a few times, and having advised companies going through this transition many times, it has become clear that many companies forget that reaching success requires more than just “feeding the beast” with more and more new features.

Growth is a long game, which requires its own dedicated share of mind. Having worked so hard to pull ahead of the competition, making the proper investments now will ensure your market dominance. Focusing on team organization, alignment of success metrics, software architecture, quality, user experience and automation in parallel with new feature development may initially seem a distraction, but it soon pays off in increased efficiency and averted disasters.

1. Celebrate And Prepare The Team

Because the pace of work will soon increase for everyone in the team, it is important to directly acknowledge your success in order to prepare the company mentally and organizationally for the future.

In particular, it is important for everyone in the company to acknowledge that growth is a feature. This means that in addition to “doing one’s job,” everyone must invest additional time to support the growth. For example, more time will be spent interviewing candidates. In addition, developing new features will take longer than in the past because of higher demands in quality and reliability, among others. In this instance, be sure to allocate time for growth in your schedule and task estimates. Get help early — because consultants can bring in expertise on short notice.

2. Update Business Operational Metrics

Most often, a high growth rate is not only generated by a growing number of users, but also by attracting new types of users. When “early majority” users join “early adopters,” they bring new ways of using the product, they navigate the product differently, have new favorite features, etc.

This new cohort of users is probably less emotionally invested in the product and, thus, needs a simpler onboarding process. They have lower tolerance for bugs and higher expectations for uptime, security and response time. For the development team, everything needs to go faster: page load, new features, new releases and new hires. At the same time, the cost of failure is higher: any outage impacts 10 times more users than last year.

You must make sure to review and update key success factors (KSFs) with the whole business team to match the new needs of the business. For example, does quality now become as important as the rate of releasing new features? The conversation around KSFs — and the process of getting teams all across the business aligned — is more important than the actual numbers assigned to each KSF. This is an ideal time to pay down technical debt in usage and conversion tracking tools, as well as analytics.

3. Improve Quality Tenfold

As a developer, there is nothing worse than being interrupted in the middle of developing a new feature to fix a critical bug from the previous release. As usage grows, bugs that were previously “acceptable” now gather enough customer ire to be classified as “must fix.” In addition, as the product reaches a broader market, new users may be less educated about, and less patient with, the product.

Rather than wait for the avalanche of bug requests to drown the development team, it is best to anticipate and raise the breadth and depth of testing in the development phase, pre-release. A 10-times increase in volume requires a 10-times improvement in quality to keep the same number of trouble tickets and, thus, keep the size of the support team from growing 10 times.
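The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the user counts and the ticket rate below are made-up numbers, and the function name is hypothetical.

```python
def expected_tickets(active_users: int, tickets_per_1000_users: float) -> float:
    """Expected trouble tickets per period for a given user base and defect rate."""
    return active_users * tickets_per_1000_users / 1000


# Hypothetical baseline: 10,000 users generating 20 tickets per 1,000 users.
baseline_users = 10_000
baseline_rate = 20

baseline = expected_tickets(baseline_users, baseline_rate)

# Ten times the users at the same quality level: ten times the tickets.
grown_same_quality = expected_tickets(baseline_users * 10, baseline_rate)

# Ten times the users with a tenfold quality improvement: ticket volume is unchanged.
grown_better_quality = expected_tickets(baseline_users * 10, baseline_rate / 10)

print(baseline, grown_same_quality, grown_better_quality)
```

Under these assumptions, the ticket count, and with it the support workload, stays flat only when the defect rate per user drops by the same factor that usage grows.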

As the number of users increases, the definition of quality must be expanded to include ease of use, in addition to “absence of bugs.” Know — and instrument — your app. Instrument the code so that performance can be easily measured. Similarly, instrument the app in production to accurately track usage, as well as conversion, since new users may have different patterns.
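As a minimal sketch of what instrumenting the code for performance can look like, the Python decorator below records how long each call to a function takes. The names `timed`, `timings`, `summarize` and `load_dashboard` are hypothetical, and a real product would more likely send these measurements to a monitoring service than keep them in memory.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory store of call durations, keyed by operation name (assumption:
# a production system would export these to a metrics backend instead).
timings = defaultdict(list)


def timed(name):
    """Decorator that records the wall-clock duration of each call under `name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raises.
                timings[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("load_dashboard")
def load_dashboard(user_id):
    """Hypothetical operation standing in for a real page or API handler."""
    time.sleep(0.01)  # simulate work
    return f"dashboard for user {user_id}"


def summarize(name):
    """Return the call count, mean and worst-case duration for one operation."""
    samples = timings[name]
    return {
        "count": len(samples),
        "mean_s": sum(samples) / len(samples),
        "max_s": max(samples),
    }
```

After a few calls to `load_dashboard`, `summarize("load_dashboard")` returns the number of calls along with the mean and maximum duration in seconds, which is the kind of baseline that makes a performance regression visible from one release to the next.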

4. Refactor To Match Dominant Use Case(s)

A typical growth strategy involves moving to new segments of the market. Frequently, a startup will target a beachhead of a broader market when launching the first version of the product. Over time, as the product’s capabilities expand, the market expands as well. As a corollary, the predominant use case at launch may no longer be the most favored once a company reaches the growth stage. In order to keep the product easy to use as new dominant use cases emerge, the user experience needs to be redesigned and the code needs to be refactored (and sometimes re-architected) to support these new use cases at scale.

Increasing modularization (i.e., breaking services into smaller independent services) and refactoring APIs is usually a good strategy to support new use cases. Other factors may motivate refactoring, including performance, scaling, ease of operations and even being able to scale the development team. Increased componentization will also make testing more efficient. Finally, calibrate the degree of modularization of the architecture to the traffic on the app. There are a limited number of companies that have the traffic that justifies going all out on microservices.

5. Automate

As the development team delivers more features faster, tasks that were done once a week must now be done several times a day. With this increased pace, manual tasks become more error-prone and affect the team’s velocity. Consequently, all processes must be considered for automation: testing, CI/CD, DevOps, SysOps and even security and business continuity.

For maximum efficiency, you can coordinate efforts around actions three through five in the same project, as they are mutually reinforcing.

With these tips, you should be well on your way toward embracing a mindset that not only continues to spur growth, but also embraces it.

How To Make The CEO-CTO Relationship Work, Part Two

Previously published on Forbes on July 5, 2019

The relationship between CEO and CTO is pivotal to the success of technology-driven companies. Yet, the personalities and working styles of these driven individuals can be different, which sometimes leads to suboptimal results. I had the experience of joining a company with an established CEO and of greeting a new CEO to my company, so I decided to write two letters to help CEOs and CTOs get on the same page.

This is the counterpart to my last article, “How To Make The CEO-CTO Relationship Work”: It’s the letter that I wish I had received from my CEOs and gives CTOs tips on how to operate and communicate most effectively in service to the CEO and the executive team.

Dear CTO,
I know you have a brilliant and creative mind and an impressive mastery of technology, along with a solid track record of developing world-class products. As you may have guessed, your technical skills alone will not suffice for your success as an executive and as a productive working partner to me. To ensure our joint success, I want to share advice with you about how we can most profitably combine our efforts.
Let’s start with a pair of obvious observations. First, your colleagues on the executive team, myself included, do not have a technical background. Second, the purpose of the company is to grow as rapidly as possible by delivering products that users want and to generate income.
These two realities may clash with your natural tendencies as a gifted creator, particularly when it comes to the technical sophistication of products. Developing the coolest, fastest and slickest product is not always the best business strategy — particularly if it takes a long time. We will need to develop a partnership that allows us to make decisions that include both business needs and technical options. Not every release needs to be perfect in terms of scalability, usability, security, and every other technical consideration. Yet every release must meet the company’s business objectives of the moment. In order to achieve this, you can learn to never say “no,” but rather to present trade-offs, and explain them in terms of their business impact rather than their technical features (which we don’t understand). For example, if we need to deliver on an aggressive schedule, we need you to inform us of what is feasible within the desired time frame in order to achieve the desired business outcome. Do we need to license technology, take away specific features or limit some aspects of the product?
In a similar vein, the team as a whole will benefit enormously if you hone a new kind of creativity, or rather add a new dimension to your technical creativity. This new dimension is one that meets the needs of our customers in new ways, that identifies new markets that we can expand into easily, and that drives the growth of the company. This is a rare talent — one that combines creative understanding of the market with technical innovation.
Your (non-technical) peers on the executive team need you to use language that they understand; we know that you’ve mastered the technical ins and outs. Also, don’t mistake us for your sounding board — rather, you can go to members of your team for that. What is meaningful to us is the impact on the business. Often, it simply boils down to this binary outcome: whether or not we will meet our sales projections for the quarter. Meeting our quarterly objectives is paramount — it ensures we get to “fight another day” — and for that opportunity, we may occasionally ask you to temporarily compromise on technical purity or the efficiency of the engineering team.
We also ask you to be strong. At times, the executive team may “groupthink” into an idea that’s really bad from a technical perspective. Should we do so, we’ll need you to stand your ground and find a way to communicate to us — in terms that we understand — the error of our ways. Use the technical facts as a foundation to illustrate the business outcomes. You are the only person in the company who knows what it will take to deliver a certain product, what technology, team, methodology, tools, and so on are best suited, and ultimately how long it will take to deliver the product to our customers.
I will do my best to listen when these situations arise. Even so, this process is not easy: You don’t want to give up simply because you are in the minority. Perhaps the hardest part is that, once you are confident that the executive team understands both engineering costs and the business consequences of their proposal, you’ll need to let the team make the decision. A typical scenario is when an important new feature is prioritized ahead of a major software re-architecture. Shipping the new feature on the old architecture will require rewriting it once the new architecture is complete. Yet, sometimes this inefficiency is the “right call”: for example, if it makes lighthouse customers happy and blocks out the competition.
Finally, understand that we welcome your input on all topics — not just technology and engineering. I’ve worked with remarkable CTOs who were brilliant business strategists, marketers, and even salespeople. While we seek your input, the final decision belongs to the designated executive team member.
These skills and contributions are all essential to the success of our shared enterprise, and you should develop them while retaining the qualities that inspired us to hire you in the first place. While I have emphasized communications and business acumen, your top priority remains to be a world-class innovator and technical leader. I will help you acquire these new skills over time so that your influence can reach its full potential within the executive team and as a partner to me, but you should continue (and I can’t help you here) to be a world-class technologist.
I hope you will find these tips useful, and I look forward to building a strong partnership together.
Sincerely,
CEO

How To Make The CEO-CTO Relationship Work

Previously published on Forbes on June 17, 2019

The success of a venture-backed company usually depends on two main factors: its technical innovation and the velocity with which it introduces new products. In order to sustain these competitive advantages throughout their growth, companies must ensure that the delicate relationship between the CEO and the CTO is effective.

The CEO and CTO have a fluid relationship that changes over time. As the company grows, the relationship evolves because of the expansion of the executive team beyond the original founders. As the company grows, investors may also replace the CEO with “a real business person.” Sometimes, the CTO decides to leave the company and its politics to found yet another company.

I’ve experienced this rapidly shifting dynamic from both sides — as an outside CTO coming in to replace, or supplement, the founding CTO and welcoming a new CEO after the VCs replaced the founder CEO. In both scenarios, I have observed (and suffered from) misaligned expectations between the CTO/VP of engineering and CEO that lead to frustration and a lack of effectiveness on both sides.

With the benefit of hindsight, I have written two letters. The first, which I will present here, is one that I wish I had written to my CEOs so they could have understood the nature of my job, my contribution and how to get the best out of me. The other, the letter that I wish I had received from my CEOs, would have helped me understand how to be most effective, not only in leading the engineering team but also in my role on the executive team.

Here’s the letter that I, as a CTO, wish I had written to my CEOs:

Dear CEO,

I want to thank you for placing your trust in me to be the new CTO of your incredible company. During the interview process, I thoroughly enjoyed our exchanges, and I was equally impressed by your past accomplishments, your business sense, your knowledge of the market and your drive.

Since you mentioned that you are “not technical,” yet you are responsible for leading a company whose success is highly dependent on the strength of its technology, I thought that I would take a running start in our relationship-building by sharing my thoughts on what will make our relationship effective.

My primary advice is that you allow me to do the things I am good at without second-guessing me. You hired me because I have proven more than once that I can build and lead a team of world-class engineers and launch world-class products into the market. While I expect to be challenged, like every member of the executive staff, when I say that developing a new feature will take three months, please don’t ask if it could be done in two weeks. I too want to win. The three-month figure will not come out of thin air, as my team and I will have spent time coming up with this number. If we ever need to build something with roughly the same features in two weeks, it will have to be an extremely watered-down version that we’ll call “demo-ware,” (which does have its place in certain circumstances), or we’ll need to pare the release down to one or two features.

For my team to succeed, I will also need you to work with the whole executive team to create an actionable product road map. By “actionable,” I mean that the priority of the features needs to be vetted by the business team and that the engineering team will need to be given the time to estimate the scope of major features so that the time frames published on the road map are realistic. If we follow this process, a sanitized version of the road map can be shared with the sales team and even customers.

The other major benefit of an actionable road map is that the engineering team can build a technology road map that will allow us to develop breakthrough features because we’ll have had time for research, experimentation and prototyping. Conversely, a road map that zigzags is not conducive to engineering efficiency because it wastes the time spent on design and planning work required for major features that are deprioritized. All of us in engineering understand that sometimes a major opportunity presents itself and that the whole company has to pivot to take advantage of it. We embrace those opportunities because we want to win just as strongly as you do. Yet the decision to pivot should consider the impact on engineering velocity as well as the new business potential.

Building a good product road map requires that we understand each other about schedule estimates: Loose requirements, changing priorities, a high velocity of development and accurate schedule estimates are not compatible. If you — and by extension, the business — require reliable schedule estimates, then engineering needs precise requirements that do not change, plus the time to work out a solid design from which a list of tasks and a schedule can be derived. If the nature of the business requires frequent changes of priorities, then let’s not bother with detailed estimates. Since it is a rare business that does not see priority changes, I strongly recommend that both the business and engineering teams embrace lean product and agile development methodologies.

Finally, at the risk of stating the obvious, engineers have different personalities than salespeople. When the engineering pen is quiet, it is not an indication of low morale. On the contrary, it shows that engineers are focused on writing code. I know that can be disconcerting to extroverts.

We’ll have to move fast in the journey we have undertaken together, and to do that, we need to communicate directly and trust each other. This letter is my attempt to do this, and if you’ve made it this far, there’s a good chance that we are at the start of a productive and fruitful partnership. I can’t wait.

Bernard Fraenkel

CTO

The letter I wish I had received from my CEOs will be published in a subsequent article.

The Art Of Technical Due Diligence

Previously published in Forbes on April 11, 2019

Technical due diligence (TDD) takes place once an investor (such as a venture capitalist, private equity manager or another company) has decided to invest in, or acquire, a technology company. Once they make this decision, they have limited time to dig into the company in order to ensure that its technology, its engineering team and its development velocity are as advertised.

As someone who has experienced this process from both sides — and whose company provides it — I understand the stress TDD sometimes causes. That’s because a poorly executed technical due diligence initiative can derail a deal and hurt the bottom line for investors — as well as the management team of the company receiving the investment — so it is worth understanding how to do it right as a CTO.

TDD Isn’t A Beauty Contest

After participating in dozens of TDD projects, I’ve learned that there are many ways to solve a technical challenge — including using frameworks or programming languages that I personally wouldn’t touch. It is thus critical to put aside one’s own ideas of technical purity and “the right way of doing things” during TDD and to have an open mind about how technology can be used. (You could even learn something along the way.)

Furthermore, it’s important to be clear about the purpose of TDD. If they’ve already made the decision to invest in a company, investors should assume that its technology is good enough today. What you want to know instead is whether the company can execute your business plan. A technical review should thus stay away from judging the beauty of today’s product architecture and take a more dynamic view to validate whether the technical team can deliver the future that the company has drawn for itself.

This dynamic perspective is all the more important because, for companies lucky enough to grow at a fast pace, life is messy. Code architecture is constantly evolving, and documentation is often incomplete and out of date. Concurrently, penetrating new markets and creating new major features often requires introducing novel technology or considerable re-architecting. Companies rein in this apparent chaos through major core projects that temporarily do not produce user-facing features yet allow the engineering team to maintain velocity in the long run. A wise TDD will use these projects to distinguish between the “normal growth-driven chaos” and signs of any additional structure the company may need to reach a new stage of growth. An investor should expect one, two or even more of these fundamental projects in a two-year time frame.

It’s All About The Product Road Map

The product road map is the engineering team’s commitment to the company to deliver specific features, products and capabilities on a given schedule. The management team, in turn, makes revenue projections based on the availability of these new features. Consequently, delays in the product road map can have a direct impact on the company’s revenue stream — and thus its valuation.

Beyond giving the product road map a simple thumbs up or down, your technical due diligence should provide actionable information about the upcoming 24 months, including critical dependencies, risk factors and major technical milestones that will usher in product milestones. As a TDD assessor, you should gather this information to track the success of your investment over the short- to mid-term.

In order to evaluate what the technical team must accomplish in order to execute the product road map, you should:

• Capture the business context of the road map

• Understand the business objectives for the next two years or more

• Evaluate today’s technical foundation to appreciate whether it can support future plans

• Internalize the future plans

• Evaluate the team’s ability to deliver these plans — and to mitigate risks

Understanding The Business Context Is Critical

Technology serves the business. It follows that you should assess technology in the business context of the company: Consider market (consumer, enterprise, or government), space (finance, health, social, tools and so on) and company maturity (five versus 10,000 enterprise seats and 1,000 versus 1 million daily users) as a few obvious dimensions. “Scalability” or “security” have very different meanings depending on the company’s business context — and so do the solutions. Similarly, you should evaluate talent, processes, tools and operational playbooks differently based on the business context.

Skills And Experience Matter

Nowadays, many companies use multiple technology stacks. As a consequence, if you’re a CTO performing TDD, you should be “multilingual” so you can evaluate all components of the technology.

To assess development velocity, your investigation should also show how well the code is written and organized and include an evaluation of the tools for test automation, continuous integration/continuous deployment, data center deployment, monitoring, alerting, business intelligence, data science and so on. In addition, assessing a company’s specific expertise in artificial intelligence has become a must in many industries.

As if all that was not enough, TDD assessors should understand engineers as well. It is critical to assess individual and collective talent on the team, as well as organizational dynamics and methodology.

Finally, because so many of the risks and critical milestones can depend on the maturity of the company, one of the most important skills that you can bring as a CTO performing TDD is the ability to identify the inflection points in the company’s growth, assess the impact on technology and translate insights to the technology team: For example, what new technology requirements will you have when the company has reached product-market-fit and enters the growth stage? For this work, there’s no substitute for “I’ve been there.”

The Good News Is Also Important

In parallel to identifying what could go wrong, it is critical to highlight the company’s unique strengths. This starts with its intellectual property (whether it’s patentable or not), and it includes unique sources of talent, internally developed tools and methodologies that increase development velocity and difficult-to-recreate data sets … all of which may have been overlooked by non-technologically-inclined investors. Ultimately, it is the balance of a company’s unique strengths and weaknesses that will determine its success, and a good due diligence report will highlight that.

For a company seeking investment, TDD may seem like an unnecessary hurdle; however, when it’s properly conducted, TDD adds value and insight for both the investor and the startup.

Prediction: Self-Driving Car Manufacturers Will Own The Car Insurance Business

Previously published in Forbes on July 2, 2018

Can you picture the day when your car insurance bill drops every month? This could very well happen as self-driving car manufacturers (SDCMs) take over the car insurance business.

As it turns out, SDCMs have several powerful incentives to do so.

Their primary motivation is to remove an adoption barrier to self-driving cars: The cost, and even the availability, of car insurance could be a deterrent when purchasing an autonomous vehicle for consumers, as well as the new generation of “taxi” companies. Today’s incumbent car insurance companies do not have statistical tables for accidents and fatalities for self-driving cars since self-driving cars are not yet in circulation. As a consequence, they are likely to be conservative and set high initial costs for insuring autonomous vehicles.

By contrast, SDCMs will have the next best thing to real-life statistics — they have data centers full of data not only about accidents but also about near misses (albeit for their own cars only). This means they can generate accurate statistics about accidents of their own cars as often as they want and thus estimate the cost to insure their cars. An SDCM will be able to turn a barrier to adoption into a potential sale.
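
As a rough illustration of how accident data could be turned into a cost estimate, here is a minimal sketch of the standard "frequency times severity" calculation of expected claim cost per vehicle. The FleetStats structure and all of the numbers are made up for illustration; they do not come from any manufacturer, and a real premium would add margins, expenses and regulatory constraints on top.

```python
from dataclasses import dataclass


@dataclass
class FleetStats:
    """Hypothetical fleet summary for one hardware/software configuration."""
    vehicle_years: float     # total exposure: vehicles multiplied by years on the road
    accidents: int           # accidents recorded over that exposure
    total_claim_cost: float  # total cost of those accidents, in dollars


def pure_premium(stats: FleetStats) -> float:
    """Expected yearly claim cost per vehicle: accident frequency times average severity."""
    if stats.vehicle_years <= 0:
        raise ValueError("exposure must be positive")
    if stats.accidents == 0:
        return 0.0
    frequency = stats.accidents / stats.vehicle_years      # accidents per vehicle-year
    severity = stats.total_claim_cost / stats.accidents    # average cost per accident
    return frequency * severity


# Made-up numbers, for illustration only.
stats = FleetStats(vehicle_years=100_000, accidents=500, total_claim_cost=4_000_000)
print(f"Expected claim cost per vehicle-year: ${pure_premium(stats):.2f}")
```

Note that the driver does not appear anywhere in the inputs: under the article's premise, only the fleet configuration matters.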

Furthermore, by offering car insurance themselves, the self-driving car manufacturers not only remove a barrier to adoption to their product but they also project confidence in their product. In addition, SDCMs will improve a customer’s purchase experience by eliminating one painful step in the car purchase process (because who enjoys shopping for car insurance?), as well as eliminate a third party in the process. Even better, pricing for car insurance will be greatly simplified since the most important variable in the pricing equations — the human — will be taken out of the system. The price of insurance will be determined by the hardware and software installed in the car — not by the human driver. Whether it’s a 16-year-old who just passed their driver’s license exam, a soccer mom with 15 years of accident-free driving or a retired senior, the price will be the same assuming the technology in the car is the same in all three scenarios.

By the way, according to a US Market Research Report on automobile insurance by IBISWorld, the industry revenues totaled $259 billion in 2017. This is no small market, which, in and of itself, provides ample motivation for the self-driving car manufacturers to enter this market.

Since they will have actual real-time data on accidents and fatalities, SDCMs will radically drive down the cost of car insurance and make car ownership more affordable, thus expanding their market. Furthermore, reducing accidents is one of their primary business drivers to increase adoption. This will provide another incentive to drive down the cost of car insurance.

Since self-driving cars will only be commercialized once SDCMs have proved that they are safer than human-driven cars, at that point in time, SDCMs will be in a position to compute the exact probabilities of accidents and their cost because they will have all the data in their data centers. Beth Buczynski is correct in predicting in her article “With Self-Driving Cars, Auto Insurance’s Time Is Limited” that the cost of auto insurance will fall significantly over time and that consumers will no longer pay directly for auto insurance. However, auto insurance will not disappear, because self-driving cars won’t eliminate all accidents. This liability will no longer be carried by consumers but either by the “robot-taxi” companies or by the SDCMs.

Most importantly, SDCMs will be able to offer car insurance from the get-go as soon as they market self-driving cars because they will be able to offer it at a much lower price than traditional insurance companies. SDCMs will need this cost reduction to help offset the additional cost of the autonomous driving equipment in order to reduce the total cost of ownership of their product.

Finally, since the price of insurance will be determined by how smart the autonomous driving system is, each time the car manufacturer (or the vendor of the autonomous driving software) publishes a new release, the cost of insurance could come down. I can’t wait.

For Machine Learning, It’s All About GPUs

Previously published in Forbes on December 1, 2017

Isn’t it curious that two of the top conferences on artificial intelligence are organized by NVIDIA and Intel? What do chip companies have to teach us about algorithms? The answer is that nowadays, for machine learning (ML), and particularly deep learning (DL), it’s all about GPUs.

In a previous article, I made the case to every CEO and CTO that “Machine learning allows us to make even better use of the data we have, as well as the data we don’t currently possess, and answer the questions we didn’t know we should ask.”

As more companies build AI-driven products, technology providers are responding to this demand by providing products that are computationally more powerful and easier to use and manage in production.

GPUs are driving the next wave of breakthroughs.

Why GPUs Are So Important To Machine Learning

GPUs have almost 200 times more processors per chip than a CPU. For example, an Intel Xeon Platinum 8180 Processor has 28 cores, while an NVIDIA Tesla K80 has 4,992 CUDA cores. While a CPU core is more powerful than a GPU core, the vast majority of this power goes unused by ML applications. A CPU core is designed to support an extremely broad variety of tasks (e.g., render a webpage, drive word processors and enterprise software, manage peripherals) in addition to performing computations, whereas a GPU core is optimized exclusively for data computations. Because of this singular focus, a GPU core is simpler and has a smaller die area than a CPU core, allowing many more GPU cores to be crammed onto a single chip. Consequently, ML applications, which perform large numbers of computations on a vast amount of data, can see huge (i.e., 5 to 10 times) performance improvements when running on a GPU versus a CPU.
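
The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the core counts and the 5-to-10-times speedup quoted above; the last step, which divides speedup by core ratio, is a back-of-the-envelope assumption (real speedups do not scale linearly with core count) and is meant only to show why far more cores does not mean a proportionally faster chip.

```python
# Core counts quoted in the paragraph above.
XEON_PLATINUM_8180_CORES = 28
TESLA_K80_CUDA_CORES = 4_992


def core_ratio(gpu_cores: int, cpu_cores: int) -> float:
    """How many GPU cores there are for each CPU core."""
    return gpu_cores / cpu_cores


ratio = core_ratio(TESLA_K80_CUDA_CORES, XEON_PLATINUM_8180_CORES)
print(f"GPU cores per CPU core: {ratio:.0f}")  # 178, i.e. "almost 200 times"

# Crude assumption: if the whole-chip speedup is 5x to 10x, each GPU core
# contributes only a small fraction of what one CPU core does on that workload.
for speedup in (5, 10):
    per_core = speedup / ratio
    print(f"{speedup}x speedup -> one GPU core is worth about {per_core:.1%} of a CPU core")
```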

Having recognized this fundamental fact a few years ago, the tech industry, particularly the ML crowd, has focused its efforts on taking advantage of the GPU. However, this is not a simple task. All layers of the compute stack have to be redesigned to take advantage of the GPU’s power.

Recent Developments For GPUs

NVIDIA has so far been the main provider of GPU chips for ML acceleration. The company has powered the AWS compute-optimized instances for the past year.

Furthermore, chip manufacturers are about to release chips that are architected specifically for ML from the ground up (rather than continuing to optimize GPUs, which were originally designed for graphics processing). NVIDIA is shipping the Tesla V100, which incorporates Tensor Cores designed specifically for DL, in addition to GPU cores. Last year, Google announced its Tensor Processing Unit (TPU), which powers its main services: Google Search, Street View, Photos and Google Translate. Finally, Intel announced this month its Nervana Neural Processor, which was also architected, in collaboration with Facebook, to optimize neural network computing.

Building The GPU Compute Stack

Having super-fast GPUs is a great starting point. In order to take full advantage of their power, the compute stack has to be re-engineered from top to bottom.

• Servers

A new category of servers needs to be built to feed the beast. This is necessary to send (and store) data to the GPU at the rate at which it is capable of consuming it, requiring up to 10x improvement in bandwidth.

NVIDIA just started shipping its DGX-1 server. Data throughput and storage have been optimized in order to take full advantage of the processing power of the eight Tesla-V100 processors included in the box.

Facebook recently announced its second generation of AI-hardware (“Big Basin”) to power its own core services: speech and text translations, photo classifiers and real-time video classification.

• Data Center

An article I wrote last month highlighted the impact of ML for cloud providers. Since then, new GPU-related developments have emerged.

Google just made its TPUs available on its compute platform.

Intel just announced its Nervana DevCloud, which is limited for the time being to research and experimentation.

Finally, a super-computing veteran of 45 years is entering the fray. Leveraging its decades of experience in high-performance computing (HPC), Cray will soon be offering its supercomputers for rent on Microsoft Azure. These servers can host a large number of NVIDIA Tesla GPUs.

• Frameworks, Models And Algorithms

Optimized hardware requires optimized software. All cloud providers have optimized the major frameworks (TensorFlow, PyTorch, Caffe, MXNet) for their platforms. Furthermore, GPU vendors are rewriting the major models and algorithms (NVIDIA DIGITS, Intel Nervana Graph) to take full advantage of the GPU’s power.

Through the GPU Open Analytics Initiative, companies such as MapD (DB, visualization) and H2O (ML) are rewriting fundamental technologies like databases and programming languages in order to eliminate data copies, which, if ignored, may significantly increase overall execution time.

Finally, some technologies have reached a degree of fidelity high enough to be offered as services: AWS, Google and Microsoft each offer various flavors of speech recognition, translation and synthesis. Similarly, the face recognition service from China’s Megvii has become very popular.

• The Edge

For some applications, the ML models that have been trained in the data center must be computed at the edge (i.e., close to the end user). In the case of autonomous driving, for example, the car’s brain is trained in the data center but must be run in the car.

Now that machine learning has become mainstream in the data center, dedicated products are being released for edge computing. For example, NVIDIA provides the Drive PX family of accelerator cards that host 1-4 GPUs, as well as multiple video and other sensor inputs. They can thus power anything from simple highway driving today to fully autonomous driving in the future.

A New GPU-Driven ML Landscape

From this whirlwind survey of innovation driven by GPUs, one can anticipate increases in processing power of two to five times over the next months, from which a second wave of machine learning breakthroughs is bound to emerge, allowing us to solve a brand-new class of challenges.

How Machine Learning Will Disrupt The Established Cloud Providers

Previously published in Forbes on October 24, 2017

In the past few years, new categories of products have emerged thanks to the extraordinary advances in machine learning (ML) and deep learning (DL). These new techniques power product recommendations, computer-aided diagnosis in medical imaging and self-driving cars, just to name a few.

Most ML and DL algorithms require compute profiles (hardware, software, storage, networking) that are significantly different from those optimized for traditional applications. Consequently, as more and more companies develop their own ML/DL solutions and deploy them to production, the demand for the ML-optimized compute resources will grow dramatically and create opportunities for new entrants to offer solutions that compete with today’s dominant cloud providers: Amazon AWS, Microsoft Azure and Google Cloud.

The ML/DL Cloud Is Different

In an article on Mesosphere’s blog page, Edward Hsu presented the case that web applications are now primarily data-driven. Consequently, a new set of frameworks (a.k.a. stacks), namely SMACK (Spark, Mesos, Akka, Cassandra, Kafka), must replace the traditional LAMP (Linux, Apache, MySQL, PHP) stack used to build web-based applications. In my view, rather than replacing LAMP, SMACK will coexist side by side with, and feed data to, traditional web-based frameworks, which are still needed to present nice-looking webpages and interface with mobile phones.

Yet the main point is well-taken. We need to update Marc Andreessen’s famous line about how “Software is eating the world” to “Data is eating the world.” Let’s unpack this statement and derive the consequences.

Hardware

The disruption created by machine learning and deep learning extends well beyond the software stack into chips, servers and cloud providers. This disruption is rooted in the simple fact that GPUs are much more efficient processors for ML and DL than traditional CPUs.

Up until recently, the solution was to augment traditional servers with GPU add-on cards. We are now at a point where demand for ML/DL computing is such that special-purpose servers, optimized for ML/DL compute loads, are being built.

Data centers are also being re-architected to support the extremely large amount of data consumed by ML and DL. Imagine you are designing the brains for self-driving cars. You need to process thousands and thousands of hours of video (and other such signals as GPS, gyroscopes, LIDAR) to train your algorithms. The amount of data that a Tesla on the road records in one second is a million times larger than a tweet or a post on Facebook.

ML/DL data centers thus require both huge amounts of storage and extremely high bandwidth.

Software

The software side is even more complex. A new infrastructure stack, typically using machine learning-specific frameworks such as TensorFlow (originally developed by Google) or PyTorch (originally developed at Facebook), is required to shepherd data around and manage the execution of the compute jobs. Furthermore, open-source code libraries (pandas, scikit-learn, matplotlib) are used to implement the models (e.g., neural networks, data displays). These model libraries are critical because they are optimized both to be easy to use for algorithm research and to offer high performance in production.

Finally, each vendor offers complete building blocks for specific use cases. For example, Amazon Lex, Google Cloud Speech and Microsoft Bing Speech provide speech recognition and can even recognize intent. Each has its own API and unique behavior, making the migration from one vendor to the other time-consuming.

New Entrants

In addition to the Big Three cloud providers (Amazon AWS, Microsoft Azure and Google Cloud) that have offered GPU-accelerated instances for a few years, new ML-optimized offerings have emerged:

• NVIDIA, which is already the dominant provider of GPUs that power the graphics cards that drive computer displays, recently introduced a portfolio of “purpose-built AI supercomputers” known as its DGX systems.

• Servers.com offers its Prisma Cloud with dedicated GPU-optimized servers.

• Rescale, one of the niche cloud providers that focuses on high-performance computing (HPC), just announced the availability of the latest generation of GPU-powered servers, along with high-bandwidth interconnect, to create high-performance multi-node clusters.

What’s At Stake

The Big Three cloud providers are the ones most immediately at risk of being disrupted by new entrants such as NVIDIA, Servers.com and Rescale. ML/DL innovation is still running at a torrid pace thanks to innovation in algorithms as well as compute efficiency. This is creating a small arms race where end users are constantly looking for the provider that can give that extra edge.

On one hand, end users are benefiting hugely from this arms race to provide the best software and hardware compute environment. On the other, this requires constant vigilance to keep abreast of the latest offerings. Even more importantly, when deploying ML/DL products to production, CEOs and CTOs need to pick the winner — or at least a future survivor — that will keep their edge for the next two to five years. This is not an easy task.

We will delve deeper into these two topics in future posts — stay tuned.

The Machine Learning Imperative

Previously published in Forbes on June 28, 2017

There’s no longer a debate as to whether companies should invest in machine learning (ML); rather, the question is, “Do you have a valid reason not to invest in ML now?”

Machine learning is here, and it’s finally mature enough to cause a major seismic shift in virtually every industry. For example, Matt Swanson, founder of SVSG, wrote an article last year about how chatbots will disrupt a $200 billion industry. While ML cannot solve every problem, it has demonstrated a game-changing impact in enough markets that all CEOs and CTOs must ask themselves whether they understand ML well enough to rule it out for their own business. While appreciating the rewards of ML may be difficult, we do know the risks: ML has already disrupted several industries, including e-commerce, autonomous driving and customer engagement. The risk of ignoring ML today is one that is probably too large for any established company to take.

Machine Learning Changes The Game

While artificial intelligence grabs most of the spotlight in discussions about machine learning (primarily due to its easily graspable life-altering implications), it is but one of many disciplines in ML. Big data has demonstrated the enormous value of data: Netflix and Amazon recommend films and products based on our own purchase history and those of customers like us. Thus, big data has helped us answer questions we already knew to ask, questions such as, “What more can I sell to my customers?”

Machine learning allows us to make even better use of the data we have, as well as the data we don’t currently possess, and answer the questions we didn’t know we should ask.

Machine Learning Uses Data We Don’t Yet Have

Analytics and business intelligence extract information from structured data (i.e., data stored in databases: customer information, purchase history, etc.). But thanks to ML, we can now extract information from unstructured data such as texts, phone calls, images and videos.

Search engines used to return pages based on the exact words of the query. ML takes this text analysis a few steps further. First, it extracts concepts out of words and associates pages that discuss the same concept with different words: A search for “artificial intelligence” will produce results that mention machine learning and robotics but not explicitly the words “artificial intelligence.” Beyond this, ML is now becoming proficient at sentiment analysis and determining intent in a given context. This means that ML can deduce, via our posts on social media, if we are happy or angry (sentiment analysis), for whom we are likely to vote, or what purchase we are considering next (intent).

Similarly, ML techniques like natural language processing (NLP) and image categorization interpret and translate people’s speech as well as the content of images (e.g., facial recognition on Facebook).

This means that, thanks to ML, the huge amount of publicly available content — which, up until recently, was of little use — can now give us useful new insights.

Machine Learning Makes Better Use Of The Data We Have

Machine learning provides a new class of algorithms that manipulates structured data that we already possess. AWS has a nice blog, including code, on how to build a prediction engine for customer churn. BlackRock is using machines to manage funds.

In addition, data that every company gathers from its customers (emails, chats, comments, support requests, etc.) can now be analyzed by ML to extract accurate customer sentiment (satisfaction with the service, suggestions, identifying emergency requests). Even polls and surveys may be replaced by ML algorithms that can mine Facebook, Twitter and news sites to capture the sentiment of millions of people expressing themselves openly.

Machine Learning Answers Questions We Didn’t Know To Ask

At the risk of stating the obvious, the power of machine learning is that it learns. The more information provided, the faster it learns and the better it answers.

While traditional business intelligence techniques can tell us how often products A and B are purchased together, these techniques fail in the face of a massive organization such as Amazon, which sells over 368 million products. However, ML can digest the flow of purchase transactions and identify patterns of joint purchases. ML can even use these predictions to automatically make purchase decisions (see German e-commerce merchant Otto as an example).
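
To make "patterns of joint purchases" concrete, here is a toy sketch that tallies how often pairs of products appear in the same order as transactions stream in. It is plain counting rather than machine learning, and the product names and orders are invented; at the scale described above, a learned model would take the place of this exhaustive pair count.

```python
from collections import Counter
from itertools import combinations


def joint_purchase_counts(orders):
    """Count how often each unordered pair of products shares an order.

    `orders` is any iterable of product lists, so it can be consumed as a stream.
    """
    counts = Counter()
    for order in orders:
        items = sorted(set(order))              # dedupe and fix a canonical pair order
        counts.update(combinations(items, 2))   # every pair bought together in this order
    return counts


# Invented transactions, for illustration only.
orders = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag"],
    ["stove", "fuel"],
    ["tent", "stove", "fuel"],
]

counts = joint_purchase_counts(orders)
for pair, n in counts.most_common():
    print(pair, n)
```

With millions of products, the number of possible pairs grows roughly with the square of the catalog size, which is one reason simple tallies like this stop being practical.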

Furthermore, by leveraging data we don’t have — such as stock market indices, weather data, political news and government statistics — we can correlate external events with our business data and thus enrich the accuracy of our predictions and decisions.

Why Now?

The rapid growth of machine learning leads to uncertainty, which may cause business leaders to hesitate in utilizing it. Yes, machine learning is complex, but it is also a powerful force of disruption. Because ML is still developing, it presents an opportunity to pull ahead of the competition by taking advantage of this maturation period. The choice is simple: disrupt or be disrupted.

It will take some time to ascertain what use cases are relevant to your company, so it is important to start this investigation now. ML is complex and challenging to master, yet the tools for machine learning are all readily available to you and are already being employed by Amazon, Google and Microsoft.

The journey to machine learning must start now.

Everything You Ever Wanted to Know About Technical Debt

Check out the white-paper I recently authored at the Silicon Valley Software Group.

Its main objective is to build a bridge between technical and non-technical executives to have rational discussions about technical debt, and then make rational decisions on how to tackle it.

Some of the main takeaways are:

• Technical debt is on-going: Technical debt originates from a variety of sources, some legitimate, others less so, throughout the life of a product. This means that technical debt should be integrated into the product roadmap process.
• There are different types of technical debt, characterized mainly by the risk they entail and the cost to remedy them. Consequently, there are different strategies to address different types of technical debt.
• Ranking the various types of technical debt of a product on the two-dimensional plane of risk vs. cost-to-fix provides a good vehicle to foster dialogue, and decisions, about engineering priorities between technical and business executives.
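
One way to picture the risk vs. cost-to-fix plane is the short sketch below. The 1-to-5 scales, the threshold, the quadrant labels and the backlog items are all hypothetical choices made for illustration; they are not taken from the white-paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    name: str
    risk: int         # hypothetical scale: 1 (low) to 5 (high)
    cost_to_fix: int  # hypothetical scale: 1 (cheap) to 5 (expensive)


def quadrant(item: DebtItem, threshold: int = 3) -> str:
    """Place an item in one of four quadrants of the risk vs. cost-to-fix plane."""
    high_risk = item.risk >= threshold
    high_cost = item.cost_to_fix >= threshold
    if high_risk and not high_cost:
        return "fix now"
    if high_risk and high_cost:
        return "plan on the road map"
    if not high_risk and not high_cost:
        return "fix opportunistically"
    return "monitor"


def rank(items):
    """Order items by highest risk first, then by cheapest fix."""
    return sorted(items, key=lambda i: (-i.risk, i.cost_to_fix))


# Invented backlog, for illustration only.
backlog = [
    DebtItem("inconsistent naming in legacy module", risk=1, cost_to_fix=1),
    DebtItem("monolithic database schema", risk=4, cost_to_fix=5),
    DebtItem("unpatched authentication library", risk=5, cost_to_fix=2),
    DebtItem("rarely used report generator rewrite", risk=2, cost_to_fix=4),
]

for item in rank(backlog):
    print(f"{item.name}: risk={item.risk}, cost={item.cost_to_fix} -> {quadrant(item)}")
```

The value of such a listing lies less in the exact scores than in giving technical and business executives a shared picture to argue over.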

For more details, please download the white-paper at: svsg.co/sme
