MLOps-A journey in the lifecycle of a machine learning model

This blog post and the ones that follow capture my understanding of the software that makes a machine learning engineer's life easier.

Hello there, interesting people!

This blog post is a brief introduction to MLOps: the processes involved, the pipelines, the infrastructure, and the software required to take models to production. The subsequent posts will describe the software used to deploy ML models in industry in detail. A simple ML model goes through various stages before it is production-ready. We will go through each stage and understand why it is necessary in an ML workflow.

So, let's dive in!


What is machine learning operations?

Machine learning operations (MLOps) is a vast, emerging field that connects the world of model creation with model deployment in industry through a pipeline of stages. MLOps is less a single tool than a set of processes, a sequence of channeled stages that takes an ML model into production. It is a core component of machine learning engineering that focuses on streamlining the deployment of machine learning models and then actively managing (maintaining and monitoring) them. These processes are software design practices that make our production-ready models reproducible, reliable, and robust.

Machine learning operations consists of processes that aid in designing quality machine learning and AI products. Data scientists and machine learning engineers embrace MLOps techniques so that they can collaborate effectively and improve both the pace and the quality of model development and production, by executing continuous integration and continuous deployment (CI/CD) operations with appropriate model monitoring, validation, and governance.

The ML model development to production life-cycle

  • Data tracking and addition: ML models need huge amounts of data to produce better, more accurate results, and such volumes of data cannot be handled by humans alone.
  • Model tracking and advance developments: Pareto's principle suggests that roughly 80% of the result comes from 20% of the effort, so a decent model can be reached with relatively little work. Hence, engineers prefer to write a baseline model first and then develop it iteratively.
  • Feature store: A feature store is a storage unit where entities and their corresponding features are stored for model training. Several key components are required when building a feature store: the capacity to ingest and update data from multiple sources, the option to define entities and their associated features, the ability to retrieve historical features for training or re-training purposes, and the flexibility to serve features for inference from low-latency sources.
  • Developing: A production-level ML system should be built so that it works for everyone, across varying systems. The model should be packaged so that it runs in any specified environment, and a configuration script should be provided to ease installation. Logging is the practice of capturing critical events in our programs for inspection, debugging, and other purposes. Logging statements are more useful than simple print statements because they let us route particular chunks of information to specified destinations using specific formatting, shared interfaces, and so on. As a result, logging is an important component in uncovering useful insights from our application's internal operations. Documenting a short description of each function, along with detailed descriptions of the variables used, is another important part of developing an ML application.
  • Reproducing: Whether we're working alone or in a group, it's vital to have a system in place to track changes in our projects so that we can roll back to earlier iterations and enable others to reproduce and improve our work. Git is a distributed version control system that helps us track changes in our ML project, and an online Git host such as GitHub, GitLab, or Bitbucket can be used for collaboration with others.
  • Serving: When serving ML models, we must expose the application's capabilities to ourselves, team members, and, eventually, end users. API endpoints (REST or gRPC) should allow anybody to interact with the ML model through a simple request. End users may go through UI/UX components that submit queries to our API rather than interacting with it directly. The model can be served to end users either in batches or in real time.
  • Managing CI/CD workflows: Continuous integration (CI) lets our team build, test, and integrate code in a systematic way, enabling more frequent development since the processes are interconnected. Continuous delivery (CD) ships our integrated code to the many applications that depend on it. Through CI/CD pipelines we can build and deploy systems that adapt and function promptly.
  • Managing deployment infrastructures: For complete reproducibility, our application must be reproducible on any system, irrespective of external dependencies or other system-level specifics. There are many system-level reproducibility solutions, such as virtual machines, container engines, servers, and serverless platforms. Docker helps us deploy locally in a scalable and reproducible environment.
  • Monitoring: With machine learning we haven't explicitly specified how anything works; we have used data to construct a probabilistic solution. Production data differs from the training data, which can cause a decline in performance. This behavior of the model must be recognized and mitigated, so the major task in monitoring ML systems is tracking performance loss to detect data drift.
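The feature store described above can be sketched as a minimal in-memory store. This is purely illustrative: the class and method names are hypothetical, and a real feature store would sit on durable, low-latency storage.

```python
import time

class FeatureStore:
    """Minimal in-memory feature store sketch (illustrative, not production code)."""

    def __init__(self):
        # entity_id -> list of (timestamp, feature dict), oldest first
        self._rows = {}

    def ingest(self, entity_id, features, timestamp=None):
        """Ingest or update features for an entity from any source."""
        ts = time.time() if timestamp is None else timestamp
        self._rows.setdefault(entity_id, []).append((ts, dict(features)))

    def get_online(self, entity_id):
        """Latest features for an entity, as needed at inference time."""
        history = self._rows.get(entity_id)
        return history[-1][1] if history else None

    def get_historical(self, entity_id, as_of):
        """Point-in-time lookup for training/re-training without leaking future data."""
        rows = [f for ts, f in self._rows.get(entity_id, []) if ts <= as_of]
        return rows[-1] if rows else None

store = FeatureStore()
store.ingest("user_42", {"avg_session_mins": 3.5}, timestamp=100)
store.ingest("user_42", {"avg_session_mins": 7.1}, timestamp=200)
print(store.get_online("user_42"))           # latest features, for serving
print(store.get_historical("user_42", 150))  # features as they were at t=150
```

The point-in-time lookup is what lets the same store feed both training and serving: training reads features as of the label's timestamp, while inference reads the latest values.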
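The logging point from the "Developing" stage can be illustrated with Python's standard logging module: each message carries a level, a timestamp, and a logger name, so specific information can be routed, filtered, and formatted in ways a bare print statement cannot. The `predict` function here is a hypothetical stand-in for a real model.

```python
import logging

# Configure a shared format and a minimum level for the whole application.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
logger = logging.getLogger("ml_app")

def predict(features):
    logger.info("received request with %d features", len(features))
    if not features:
        # Unlike print(), this is leveled, timestamped, and routable.
        logger.error("empty feature vector, returning fallback prediction")
        return 0.0
    return sum(features) / len(features)  # stand-in for a real model

predict([1.0, 2.0, 3.0])
predict([])
```

Because handlers and levels are configured in one place, the same statements can later be redirected to a file or a log aggregator without touching the model code.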
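A crude version of the drift detection mentioned in the "Monitoring" stage can be sketched by comparing a summary statistic of live traffic against the training distribution. The two-standard-deviation threshold here is an arbitrary illustrative choice; production systems use proper statistical tests.

```python
import statistics

def drifted(train_values, live_values, threshold=2.0):
    """Flag drift when the live mean moves more than `threshold`
    training standard deviations away from the training mean."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    live_mu = statistics.mean(live_values)
    return abs(live_mu - mu) > threshold * sigma

train = [10.0, 11.0, 9.0, 10.5, 9.5]   # feature values seen at training time
stable = [10.2, 9.8, 10.1]             # production traffic that looks similar
shifted = [14.0, 15.0, 14.5]           # production traffic that has drifted

print(drifted(train, stable))   # False: within the threshold
print(drifted(train, shifted))  # True: the mean has moved substantially
```

Running such a check per feature on a schedule is one simple way to turn "the test data differs from the train data" into an alert rather than a silent performance decline.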

What is the need for MLOps?

In today's world, many startups and companies build ML models from scratch and provide them as a service. Such a model is used by researchers, engineers, and students; it may run in the cloud or on personal systems, across various operating systems and virtual environments. For a model to be reproducible, reliable, robust, fast, and accurate, it has to go through a number of processes before it can be served to a large population.

A recent study by NewVantage Partners found that only 15% of the 70 prominent companies surveyed have fully deployed machine learning capabilities into mass production. If you don't deploy your ML models to generate value, the time and money spent on them is wasted. These experiments are complex engineering achievements, but on their own they don't translate into return on investment (ROI). With MLOps, companies can easily deploy, monitor, and update models in production, opening the road to ML with ROI.

What are the benefits of MLOps that negate the risks involved in the ML production lifecycle?

Productionizing machine learning models is a very arduous process. The machine learning lifecycle, comprising data ingestion, data preparation, model training, model tuning, model deployment, model monitoring, and much more, is a complex procedure. It requires collaboration across data engineers, data scientists, and machine learning engineers. Reproducibility and robustness of the model are required at each point of the lifecycle, and keeping all of these processes synchronized and running in unison demands high operational rigor. MLOps encompasses the exploration, iteration, and continual improvement stages of the machine learning lifecycle.

Let us see what challenges we face at each stage of a machine learning pipeline.

  • Data tracking and addition: The processes of adding data, labeling data, and augmenting data have to be automated to make the pipeline smooth, fast, and reliable. A well-defined architecture that can automate all of these processes is required; this reduces the engineer's burden and makes the work reproducible.
  • Model tracking and advance developments: Data cleaning, revisiting feature assumptions, improving accuracy, managing data, and so on can be driven by tuning the models' hyperparameters. Hence, tracking the results of model runs and making the required adjustments needs to be automated.
  • Feature store: A feature store provides a unified source from which training can occur. This prevents duplication of work and provides a structured pipeline that can be used for both training and serving. Feature values can be updated through online as well as offline processing, preventing data leaks.
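The model-tracking challenge above can be reduced to its essence: record each run's hyperparameters alongside its metric, then select the configuration worth iterating on. This is a minimal sketch with hypothetical parameter names and results; real projects would use a dedicated experiment tracker.

```python
runs = []

def log_run(params, accuracy):
    """Record one training run's hyperparameters and its result."""
    runs.append({"params": params, "accuracy": accuracy})

def best_run():
    """Return the run with the highest accuracy."""
    return max(runs, key=lambda r: r["accuracy"])

# Hypothetical results from three hyperparameter configurations.
log_run({"lr": 0.1,  "depth": 3}, accuracy=0.81)
log_run({"lr": 0.01, "depth": 5}, accuracy=0.88)
log_run({"lr": 0.01, "depth": 8}, accuracy=0.84)

print(best_run()["params"])  # the configuration to develop iteratively
```

Automating exactly this loop, logging every run and surfacing the best one, is what turns ad hoc tuning into a reproducible process.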

Some of the major advantages of following MLOps approaches are:

  • Efficiency: MLOps helps data teams accelerate model development, improve model quality, and speed up deployment to production.
  • Ease of debugging: We must keep track of the features the model interacts with; feature engineering contributes significantly to model correctness. Debugging across the entire ML production system is difficult, and MLOps offers an easier path for debugging ML models.
  • Scalability: MLOps also offers scalability and administration, allowing a large number of models to be supervised, controlled, managed, and monitored for continuous integration, continuous delivery, and continuous deployment. In particular, MLOps enables repeatable ML pipelines, which allows tighter cooperation across data teams, decreases friction with DevOps and IT, and speeds up releases.
  • Risk reduction: Machine learning models frequently require regulatory monitoring and drift checking. MLOps offers greater transparency and faster responses to such requests, as well as improved compliance with an organization's or industry's rules.

What is the difference between MLOps and DevOps?

MLOps is a collection of machine learning-specific design approaches that borrow from the more commonly used DevOps ideas in software engineering. While DevOps emphasizes a quick, iterative approach to continuous integration, MLOps applies the same ideas to bring machine learning models to production. In both circumstances, the end result is improved software quality, shorter patching and release cycles, and increased customer satisfaction.

Well, now that you know the entire pipeline of the ML ecosystem and why it matters, you can read my following blog posts in any order; each will focus on the software used for various MLOps tasks. We will see how to scale, reproduce, monitor, debug, and deploy an ML model in the posts that follow.

CREDITS

  1. Most of the inspiration for this blog post comes from madewithml.com/#mlops
  2. I used Krita for the art illustrations; the GIFs are from Google Images.