This guide is an attempt to share the experience of ML practitioners in the enterprise sector. Here we discuss and elaborate on the challenges companies face in the real world. Maintained by the appliedAI Initiative, the guide reflects a commitment from the partners of appliedAI to share their experiences and best practices when moving beyond the PoC. The purpose of the guide is to build shared wisdom.
This also means that this wiki is never complete and never finished. It is a living artifact that will be expanded over time and is often based on anecdotal evidence and practitioners' opinions. Reach out to appliedAI to contribute or comment. Note that our purpose here is not to reason about a specific model architecture or to highlight the latest metric-pushing advances in academia. This guide is designed to address problems that tend to arise outside of academic ML research, most notably how to build probability-based systems, how to architect them, and how to scale and manage this work in large teams of hundreds of developers. We shed light on the ML lifecycle in the enterprise, the software architectures that support that lifecycle, and the platforms that implement those architectures. This guide was built with the help of numerous individuals, all representing companies that are pushing the boundaries of applied ML.
Structure of this guide
The topics are presented in hierarchical order, each informing the next and ultimately answering the question of which tool to choose. We move from the view of the ML lifecycle to the architectures supporting that view and ultimately to the tools that implement these architectures.
- Topic 1: The ML lifecycle (see The ML lifecycle)
- The ML lifecycle supports the planning process of every ML project. This topic examines the ML lifecycle from the perspective of different technology players, condensing the typical phases and activities carried out during an AI project. It also includes the ML lifecycle phases, activities, and lessons learned at appliedAI over the past few years. We discuss questions concerning the operational process and the artifact management that support this lifecycle, and how certain activities are embedded in the process.
- Topic 2: ML architectures (see ML software architecture)
- This topic focuses on the requirements and abstract solution designs for the outlined lifecycles. The discussion will largely center on a collection of approaches to solving common problems encountered along the ML lifecycle, but we will also attempt to shed light on when the approaches work and when they do not.
- Topic 3: Platforms for ML (see Platforms and services for ML)
- This topic finally answers the question of which tools can be used to implement particular architectures. The solutions here may be tools that implement clearly specified solutions discussed in Topic 2, or they may be platforms that implement a set of such solutions, ultimately supporting parts of the lifecycle from Topic 1.
- Topic 4: Challenges and best practices (see Challenges and Best Practices)
- This topic condenses the deep technical and non-technical insights we received from our partners and enriches them with appliedAI's internal experience gained over the last few years. In particular, it covers practitioners' technical challenges and best practices along the ML lifecycle, the management of AI projects, teamwork in AI teams, and platform usage.
Authors
In alphabetical order:
- Alexander Machado is the Head of MLOps Processes at appliedAI, with several years of experience developing, consulting on, and leading AI projects from experimentation to production. Before joining appliedAI, he worked in data science and data engineering at the Max Planck Society and BMW. He studied Electrical and Communications Engineering at the Technical University of Munich and holds an Honours Degree in Technology Management from the Center for Digital Technology and Management (LMU-TUM).
- Alexander Waldmann is the former Managing Director of Technology and Operations at appliedAI. He is currently a Senior Technical Product Manager for AI and an ML Strategist at the AWS Machine Learning Solutions Lab. He is a computer scientist with a background in software engineering and software architecture, specializing in machine learning.
Partners and companies that contributed to this guide
This guide was written with the help of many German industrial companies, as well as a number of tech companies that shape the domain, as part of a working group at appliedAI. All partners of appliedAI have contributed in one way or another, and we wish to thank these companies.
Additionally, a number of individuals from these companies have put in extra effort to make this guide available to the public. We want to thank these individuals and their companies (recognizing that some of these individuals and some companies wish to remain anonymous and cannot be named here).
License
All content on these pages is licensed under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); full details are provided at https://creativecommons.org/licenses/by-nc-sa/4.0/.
You are free to: