Another tool won’t fix your MLOps problems

David Hershey
5 min read · Oct 5, 2022

There are way too many MLOps tools.

There are 100+ MLOps tools
Thanks to the AI Infrastructure Alliance for capturing the sprawl

By my count there are:

  • 39 tools that help with monitoring or observability
  • 32 tools to help deploy models
  • 31 tools for experiment tracking
  • … (it doesn’t get better)

And new products are still popping up in all of these categories! With all this investment and development, surely things are settling into place in MLOps, right?

…right?

As you might have guessed, the answer is (mostly) no. What’s happened?

Let’s see if we can learn something from the discipline that we grew out of, DevOps. Here’s a tidy definition of DevOps from AWS (emphasis my own):

DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity

Those three ingredients (culture, practice, and tools) seem like the critical pieces of the recipe that has taken DevOps mainstream. Can we learn something from the DevOps recipe? What if MLOps has the right ingredients, but the wrong ratios?

Percentages entirely based on vibes

I think MLOps (to date) is too focused on the tools we can use, and not focused enough on the culture and practices that those tools should enable.

MLOps has Tool Overload

In the last few years, I’ve worked with hundreds of companies trying to improve their ML infrastructure. Almost all of those teams are building towards a target architecture, something that looks like this:

Overview of the ML infrastructure components
Courtesy of a16z (Matt Bornstein, Jennifer Li, Martin Casado)

“Building an ML Platform to solve MLOps” resembles the classic “If You Build It, They Will Come” approach. Many teams I’ve worked with (and worked on!) have hoped that if they find the top MLOps tools, then “MLOps” will follow.

Unfortunately for most of those teams, this hasn’t materialized. Some of the challenges I’ve seen over and over again:

  • Lack of Buy-In: Many teams don’t have sufficient executive or practitioner buy-in, meaning it can be almost impossible to get the help you need to actually deliver products (cloud engineers, product engineers, etc.).
  • Confusion: Tools are used as a crutch for documentation and process. If every practitioner uses the same tools in different ways, confusion will ensue.
  • Alignment: Thanks to the first two problems, most practitioners still choose to build in silos. Data scientists in particular end up with isolated tools that make collaboration painful.

In my mind, these challenges (at least partially) stem from a lack of investment in building an MLOps culture. This isn’t surprising! Building culture is a lot harder than selecting tools. Building culture requires organizations to shift, teams to collaborate, and job titles to change.

So how do we build an MLOps culture?

Learning from DevOps

The easiest way to shift culture is universal buy-in, in particular among decision-makers. The biggest barrier for MLOps is that most people don’t care yet!

As someone who has been working (a lot!) to raise awareness of MLOps for ~4 years, this chart kinda stings! All of the work we’ve done, all of the VC dollars we’ve spent, and (to the extent that Google searches are the truth) MLOps is still only ~5% as cared about as DevOps.

To change the culture of our organizations, we need to make people care about not just machine learning systems, but also the tools and processes that make those systems effective, performant, and reliable. And importantly, the people who make decisions need to care. Typically (not always) the people who make decisions care about money.

So how can we make decision-makers care about MLOps? Some thoughts:

  • For Practitioners: You understand ML and its potential — that makes it your job to be loud about the opportunities that ML presents to disrupt industries. You need to paint a clear picture about both opportunities in ML ( $$$ 📈) and the price of not acting (you lose to competition).
  • For Early Adopters: If you have already had measurable success, and your ability to rapidly develop ML systems has made your company measurably more successful, shout it from the rooftops! Write blog posts, attend conferences, whatever! Your voice will raise the whole industry, and bring about the sea change we need to reorganize the culture of work in ML.
  • For Tool-Makers: If you’re making an MLOps tool, you cannot be (extremely) successful unless the culture comes along for the ride. If you simply build another MLOps tool, you’ll find yourself lost in the landscape. If you’re a loud voice that helps ML practitioners convince decision-makers that their culture needs to change, you will stick out from the crowd.

I’ll close with a quick palate cleanser after the (sad) chart above. DevOps (at least the term) was born in ~2007, and MLOps was born in ~2015. We’re eight years behind! Docker 1.0 was released eight years ago in 2014.

You probably haven’t (yet) heard of technologies that will elevate MLOps to the level of DevOps. We’re still in the early days. Let’s figure it out together.

David Hershey is an investor at Unusual Ventures, where he invests in machine learning and data infrastructure. David started his career at Ford Motor Company, where he started their ML infrastructure team. Recently, he worked at Tecton and Determined AI, helping MLOps teams adopt those technologies. If you’re building a data or ML infrastructure company, reach out to David on LinkedIn.
