Lifecycles and the need for an ML lifecycle

Introducing lifecycles

Building and integrating sufficiently complex software systems successfully, in any domain, requires a deep understanding of the challenges, pitfalls, and risks involved in creating them. Over the past decades, software developers and companies around the globe have accumulated a vast body of knowledge and best practices about these challenges. The global community of software engineers has distilled this knowledge into software development processes, collectively known as the software development lifecycle (SDLC). While software engineering is broadly concerned with all aspects of software production, the SDLC, which we will simply call the "lifecycle" from now on, is concerned with the distinct phases and activities that form the basis for the successful development and delivery of a software product.

Simply put, a lifecycle describes how a software system is deployed and how the system, together with the activities carried out by the developers and others working on it, changes over time. It includes all activities carried out from the system's early beginnings, through its maintenance over the years, until it reaches its end of life. For example, in software development contexts, lifecycle management includes activities such as fault monitoring, fault correction, performance improvement, and many more.

With the advent of modern machine learning applications, new challenges emerge along the software development lifecycle. Machine learning applications are developed in a fundamentally different way from "traditional software", and this difference raises a multitude of new challenges that must be solved or accounted for in the lifecycle of any software product that includes ML components. Examples include learned rather than programmed behavior, logic based on patterns in data, explainability problems, human-centered AI, and human oversight of AI algorithms, to name just a few.

In this first chapter, we will discuss in what sense software development lifecycle management changes when the scope of the system includes an ML component.

The case for a new lifecycle model

Most modern perspectives on software development are relatively mature and robust. Even the agile mindset and most of its implementations, such as Scrum or XP, are more than 20 years old. When it comes to ML, what is it that motivates us to question any of these established processes, including the structure and production of software artifacts?

One of the most recognized articles aiming to answer these questions is Software 2.0 by Andrej Karpathy. In this article, Karpathy, who is leading Tesla's AI efforts at the time of writing, lays out why ML-infused systems (which he calls “Software 2.0”) are a completely different kind of software. He argues that Software 2.0 is about using machine learning techniques to search for a program that solves the problem at hand, whereas Software 1.0 is about writing that program yourself. To quote him:

“It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data (or more generally, identify a desirable behavior) than to explicitly write the program. In these cases, the programmers will split into two teams. The 2.0 programmers manually curate, maintain, massage, clean and label datasets; each labeled example literally programs the final system because the dataset gets compiled into Software 2.0 code via the optimization.” - Andrej Karpathy

This implies that producing an ML system is much more concerned with steering a search process than with coding a software artifact. The challenges of the ML lifecycle therefore revolve around the mental model behind ML components and the way they are realized within larger software systems. Both differ inherently from traditional software and require a new way of looking at the lifecycle.
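To make the contrast concrete, here is a minimal sketch in Python. It is not taken from Karpathy's article: the temperature task, the function names, the example data, and the choice of a one-feature logistic model trained by gradient descent are all assumptions made purely for illustration. The first function is written by hand (Software 1.0); the second is found by an optimization procedure from labeled examples (Software 2.0).

```python
import math

# Software 1.0: a person chooses the rule and writes it down explicitly.
def is_hot_1_0(temp_c: float) -> bool:
    return temp_c > 25.0  # threshold picked by the developer


# Hypothetical labeled dataset: (temperature in Celsius, 1 = "hot", 0 = "not hot").
EXAMPLES = [(5, 0), (10, 0), (15, 0), (18, 0),
            (32, 1), (35, 1), (40, 1), (45, 1)]


def _scale(temp_c: float) -> float:
    # Rough feature scaling so gradient descent behaves well (an assumption).
    return (temp_c - 20.0) / 10.0


# Software 2.0: the developer supplies data and a search procedure;
# the decision rule (w, b) is the result of the optimization.
def train_is_hot_2_0(examples, epochs=2000, lr=0.1):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for temp_c, label in examples:
            x = _scale(temp_c)
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            grad_w += (p - label) * x                 # log-loss gradient w.r.t. w
            grad_b += (p - label)                     # log-loss gradient w.r.t. b
        w -= lr * grad_w / len(examples)
        b -= lr * grad_b / len(examples)

    def is_hot(temp_c: float) -> bool:
        return w * _scale(temp_c) + b > 0.0

    return is_hot


is_hot_2_0 = train_is_hot_2_0(EXAMPLES)
print(is_hot_1_0(40.0), is_hot_2_0(40.0))
```

In the second variant, changing the behavior means changing the dataset or the search procedure rather than editing the rule itself, which is the shift in working mode that the quote describes.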

It is important to note that Karpathy's argument concerns only the ML component in an otherwise normal software system. It does not consider the different ways in which an ML component can be brought into the system (as a commercial off-the-shelf (COTS) component, as a pre-trained model, etc.). Still, his argument has convinced many scholars and practitioners that altering the structure of a software system by feeding it data is significantly different from what most software engineers have been doing for decades. It is on this realization that we base our collection of different high-level approaches to the lifecycle and its management.

The ML lifecycle

As with the software development lifecycle (SDLC), there is a large diversity of definitions of the ML lifecycle (MLLC). A commonly accepted definition was stated by EDUCBA, Asia’s largest online learning platform. According to them, the ML lifecycle is a cyclical process through which a machine learning model (the artifact) is developed, trained, and deployed, utilizing the large amount of data available from the corresponding applications.
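The cyclical character of this definition can be sketched in a few lines of Python. The three activity names come from the definition above; the `Activity` enum and the `lifecycle_iterations` helper are hypothetical names introduced only for this illustration, and real lifecycle models contain more (and more finely grained) steps.

```python
from enum import Enum
from itertools import cycle, islice


class Activity(Enum):
    # The three activities named in the definition, in order.
    DEVELOP = "develop"
    TRAIN = "train"
    DEPLOY = "deploy"


def lifecycle_iterations(n_steps: int) -> list:
    """Return the first n_steps activities of the endlessly repeating cycle."""
    return list(islice(cycle(Activity), n_steps))


# After deployment the process starts over with development.
print([step.value for step in lifecycle_iterations(5)])
# → ['develop', 'train', 'deploy', 'develop', 'train']
```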

The goal of the ML lifecycle is to support planning and to mitigate challenges when developing an AI project and bringing it into production. Considering the ML lifecycle during project definition and execution helps to plan all the important steps that need to be taken. These steps guide the team through the process of getting the system to work effectively while preventing unintended outcomes and surprises.

Elements of the ML lifecycle

Most lifecycle models, whether they come from an academic context or from practitioners, focus on structuring a set of activities and phases that must be addressed when developing ML-infused systems. In some ML lifecycle models, these activities and phases are clustered into high-level stages.

Stages of ML lifecycles