Introduction

In the previous chapters, we introduced a way to reason about developing ML solutions without having to start from scratch. In particular, we recommended against a bottom-up approach to building or adopting ML platforms and services; instead, one should systematically design a set of services that tie into each other, solve the problems of today, and leave room to change, grow, and solve the problems of the future.

This last chapter offers a brief overview of the vast topic of platforms and services. Our goal here is simple: to provide examples of platforms and tools that can effectively support the requirements stemming from the ML lifecycle.

It is outside the scope of this wiki to name every platform, tool, or service on the market; instead, we mention and link to some of the tools that are at the center of most implementations today, based on the anecdotal experience of appliedAI and our partners. Take a look at the landscape overviews linked below for a more complete list. (Note that we don't differentiate too much between libraries, services, and frameworks, as the lines between them in ML are often very blurry.)

A reliable and even larger overview of the current landscape of services and platforms is maintained by Dr. Ibrahim Haddad at the Linux Foundation (https://landscape.lfai.foundation/, https://www.ibrahimatlinux.com/).

Why it matters in ML

We have argued in the preceding chapters that the mentality, processes, frameworks, tools, and services in ML are all relatively immature, meaning that hardly any of them have been battle-tested for more than a few years. While services from the big-data domain such as Airflow, or solutions for scaling microservice workloads such as Kubernetes, are mature, most modern ML systems themselves are not. It is therefore crucial to understand which gap each system fills in this evolving field.

Additionally, the field is overloaded with hype and advertising. Obtaining a reliable recommendation that is not inflated by sales pressure or technological excitement is often hard, if not impossible.

To further complicate matters, one needs to understand where the services offered by the big-tech cloud providers fit into the landscape, regardless of whether they are managed versions of open-source projects or truly proprietary systems.

Verticalization vs horizontalization

The most apparent trend among providers of lifecycle-management tooling for ML is the decision to "go wide or go deep": in other words, either to specialize in individual challenges along the lifecycle or to build an E2E solution that covers at least the minimum set of challenges a data scientist or ML engineer will encounter. To be clear: there is no right or wrong approach here. For some adopters, it makes sense or might even be necessary to use highly specialized tools, which are then glued together (a minimal sketch of this gluing approach follows below). For others, an E2E approach might be the more reasonable and cost-effective way, aiming for speed of innovation rather than uniqueness in the underlying infrastructure layer.
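
To make the "glue specialized tools together" path a little more concrete, the following is a minimal sketch of an Airflow DAG (Airflow was mentioned earlier as one of the mature workflow services) that chains three placeholder steps; each step would call whichever specialized tool an adopter has picked for that stage. All task names and bodies are illustrative, not recommendations.

```python
# Minimal sketch: orchestrating independently chosen, specialized tools
# as one pipeline with Airflow. All names and task bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_data():
    ...  # e.g. call a dedicated data-ingestion or labeling tool


def train_model():
    ...  # e.g. launch a job on a dedicated training service


def register_model():
    ...  # e.g. push the resulting artifact to a dedicated model registry


with DAG(
    dag_id="glued_ml_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule=None,  # triggered manually here; before Airflow 2.4 use schedule_interval=None
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_data", python_callable=ingest_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    register = PythonOperator(task_id="register_model", python_callable=register_model)

    # The orchestrator only defines ordering; the "intelligence" lives in
    # whatever specialized tools the individual callables wrap.
    ingest >> train >> register
```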

Vertical enterprise platforms

Some tool providers offer enterprise platforms that follow an end-to-end approach and attempt to solve as many problems as possible along the lifecycle. This also means that the services and platforms offered span from the hardware infrastructure layer to the API layer and cover almost all tasks, from data ingestion to model monitoring. A typical example is the platform from the well-known provider Domino Data Lab. Other platforms taking a similar approach include H2O.ai, ClearML, Vertex AI, SageMaker, and Azure Machine Learning. Such systems are typically built to integrate easily into larger data-management solutions, from warehouses to lakes; Delta Lake, Snowflake, and the well-known Amazon Redshift are typical candidates to complete the stack. Such solutions often come with relatively heavy feature sets, which is why many ML engineers feel overwhelmed by the complexity and sheer power of these systems.
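
To give a feeling for what working against such an E2E platform looks like in code, here is a minimal sketch of experiment tracking with ClearML, one of the platforms listed above; the project name, task name, and hyperparameters are placeholders, and SageMaker or Vertex AI expose comparable functionality through their own SDKs.

```python
# Minimal sketch of experiment tracking with ClearML; all names are placeholders.
from clearml import Task

# Registers this run with a ClearML server (SaaS or self-hosted) and starts
# capturing the environment: git state, installed packages, console output.
task = Task.init(project_name="demo-project", task_name="baseline-model")

# Hyperparameters connected to the task become visible and editable in the UI,
# which is what enables cloning and re-running experiments with new values.
params = {"learning_rate": 1e-3, "epochs": 10}
task.connect(params)

# Scalars reported here show up as live plots in the experiment dashboard.
logger = task.get_logger()
for epoch in range(params["epochs"]):
    fake_loss = 1.0 / (epoch + 1)  # stand-in for a real training loop
    logger.report_scalar(title="loss", series="train", value=fake_loss, iteration=epoch)

task.close()
```

The point here is less the specific API than the fact that a single platform SDK reaches from tracking and data management all the way to deployment and monitoring, which is exactly the trade-off of the E2E approach described above.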

Horizontal cloud services

It is not always a straightforward decision whether one should move into a cloud at all and, if so, which one. The service portfolios of the big tech providers can be overwhelming, and the price points can be quite intimidating. Nonetheless, when it comes to managed services, whether proprietary or open-source, the big public clouds offer several promising services that can accelerate the creation of ML-based services and products. Excellent examples, nearly all of which are practical data-warehouse and storage technologies, include Amazon S3 in combination with the Amazon Redshift and Amazon Athena layers, the well-known Google BigQuery, or Oracle's counterparts. All cloud players have categorized their services for pure ML tasks into three layers: ML infrastructure, ML platform/services, and niche AI services that offer a particular ML capability as a commoditized good. See, for example, the breadth and depth of AWS's service portfolio.
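
As a small, concrete illustration of how such horizontal building blocks are consumed, the following is a minimal sketch of querying data that lives in Amazon S3 through Amazon Athena with boto3; the region, database, table, and output-bucket names are placeholders, not recommendations.

```python
# Minimal sketch: running a SQL query against data in S3 via Amazon Athena.
# All names (region, database, table, output bucket) are illustrative only.
import time

import boto3

athena = boto3.client("athena", region_name="eu-central-1")

# Athena executes the SQL directly against files in S3 and writes the
# result set to the given output location.
response = athena.start_query_execution(
    QueryString="SELECT label, COUNT(*) AS n FROM training_events GROUP BY label",
    QueryExecutionContext={"Database": "ml_feature_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/queries/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes (simplified: no timeout or error details).
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows) - 1} result rows")  # the first row holds column headers
```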