MLOps — Azure Machine Learning (AML) (2023)

Published in Dev Genius · 8 min read · Mar 26


Machine Learning Operations (MLOps) is the practice of applying DevOps principles to machine learning workflows, with the goal of streamlining the deployment and management of machine learning models in production environments. MLOps involves a combination of automation, collaboration, and monitoring to ensure that machine learning models are scalable, reliable, and maintainable.

Azure Machine Learning service is a cloud-based platform that helps organizations implement MLOps best practices with engineering rigor. It brings together people, process, and platform to automate ML-infused software delivery and add continuous value to users. It provides a range of tools and services for data scientists, machine learning engineers, and IT professionals to collaborate on building, training, deploying, and managing machine learning models at scale.

Azure Machine Learning service supports a variety of popular machine learning frameworks and programming languages, and provides tools for automating model training, hyperparameter tuning, and model evaluation. It also integrates with DevOps tools like Azure DevOps, GitHub, and Jenkins, enabling teams to implement continuous integration and continuous deployment (CI/CD) pipelines for machine learning models.

With Azure Machine Learning service, organizations can apply engineering rigor to their machine learning workflows, ensuring that models are built and deployed with the same level of quality and reliability as traditional software applications. By implementing MLOps best practices, organizations can accelerate the time-to-value of their machine learning investments and maximize the impact of their data science initiatives.

In this blog we will learn about training ML models with the people in mind (data engineers, data scientists, machine learning engineers) and, through a simple example, explore the power of AML and its capabilities for building and delivering scalable ML solutions.

“This blog focuses on ML Training.” GitHub Reference []


AML supports several kinds of compute for training ML models: compute instances (single VMs), compute clusters (VM clusters), and attached computes in the form of Synapse Spark pools and Kubernetes clusters.

We begin by provisioning two Azure Machine Learning workspaces (amlws and amlws2) and an instance of AML Registry (amlregistry).

Azure Machine Learning Registry is a service (currently in Preview) provided by Microsoft Azure that allows users to create, manage, and deploy machine learning models at scale. It is a centralized repository for storing and versioning machine learning models, pipelines, and other artifacts related to the machine learning workflow.
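With the v2 CLI, a registry can be declared in YAML and created with `az ml registry create`. This is a sketch only; the region and replication settings below are illustrative assumptions, not taken from the post:

```yaml
# registry.yml — illustrative definition for the amlregistry instance
name: amlregistry
location: eastus            # assumed region
replication_locations:
  - location: eastus
```

It would then be created with something like `az ml registry create --resource-group mlops --file registry.yml`.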


The Azure Machine Learning Registry enables data scientists and machine learning engineers to collaborate and share models with their team members securely. It also provides features such as version control, auditing, and access control to manage the lifecycle of the models. With the registry, users can easily discover and reuse pre-built models, which can help accelerate the development and deployment of machine learning applications.

Furthermore, the Azure Machine Learning Registry integrates with other Azure services such as Azure Kubernetes Service (AKS) and Azure Container Instances (ACI), making it easy to deploy and manage models in production environments.

amlws acts as our pre-production workspace. It hosts all four types of compute, the registry connection, tracking, workflows (pipelines), a unique MLflow URI (azureml://<subscriptionId>/resourceGroups/mlops/providers/Microsoft.MachineLearningServices/workspaces/amlws), and model artifacts. The operations team maintains this workspace and provisions the compute required to enable data science experimentation. The data engineering community uses this workspace (amlws) to build data preparation pipelines, and the data scientist community uses it for model training. The machine learning engineering team then promotes the outcomes from this workspace to the AML registry and eventually hosts the artifacts for serving. To reiterate, this instance also hosts the central MLflow instance, a model tracking server for all model artifacts: all local or federated trainings are captured on this MLflow instance for central calibration, evaluation, and consideration for promotion to production.


amlws2 is an example of an instance for federated training: it is merely a cloud replacement for the local training environment, used by data scientists to train ML solutions while pointing to amlws for central tracking.

Illustration 1: A data scientist develops a training script on amlws2 while pointing to amlws for compute, model tracking, and training.


The training is conducted as a pipeline step that submits a script to train an Iris model using sparkcompute (a Synapse Spark pool) attached to the amlws workspace. The MLflow tracking server is also that of amlws.
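A CLI v2 sketch of such a pipeline: a single Spark step bound to the attached sparkcompute. The script name, display name, and Spark resource settings are assumptions for illustration:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: amlws2-iris-v1
jobs:
  train_iris:
    type: spark
    code: ./src                  # assumed folder holding the training script
    entry:
      file: train_iris.py        # assumed script name
    conf:
      spark.driver.cores: 1
      spark.driver.memory: 2g
      spark.executor.cores: 2
      spark.executor.memory: 2g
      spark.executor.instances: 2
    compute: azureml:sparkcompute  # the attached Synapse Spark pool
```

Submitted against amlws (for example with `az ml job create --file pipeline.yml -g mlops -w amlws`), the run and its artifacts land on the central workspace.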


For validation, we confirm that no pipeline was created on amlws2.


The amlws workspace shows a pipeline run for the experiment AMLWS2-IRIS-V1.


From the run artifacts we are now positioned to register the model in the amlws workspace.


In this illustration we have seen how to designate one AML instance as the hub and use a spoke AML instance to train while tracking centrally.


Illustration 2: Training locally on local compute with central tracking.

In this example a data scientist trains a model on a local machine (local compute) while leveraging the central amlws MLflow server to record the experimentation details.


We observe that the trained experiment has been recorded in the amlws workspace.


The workspace has also recorded the model generated by the training.


We illustrated a local training and central tracking scenario.


Illustration 3: Azure Kubernetes Service as training compute, with tracking on the workspace MLflow server.

This is an interesting one: we leverage an attached AKS cluster to train our model. AML does an excellent job of out-of-the-box containerization, deploying the code with all needed dependencies for training on the attached AKS cluster. Refer to the Azure documentation for the steps to attach AKS as compute to an AML workspace instance.
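As a sketch, attaching an existing AKS cluster with the v2 CLI looks roughly like the following. The compute name is an assumption, the subscription and cluster placeholders are left unfilled, and the AzureML extension must already be installed on the cluster:

```shell
az ml compute attach \
  --resource-group mlops \
  --workspace-name amlws \
  --type Kubernetes \
  --name k8s-compute \
  --resource-id "/subscriptions/<subscriptionId>/resourceGroups/mlops/providers/Microsoft.ContainerService/managedClusters/<aks-name>"
```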


We have trained the model on Azure Kubernetes Service, which provides a scalable service for training demanding models.


Illustration 4: Training on a VM compute cluster with MLflow tracking.
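A VM compute cluster of this kind can be declared in CLI v2 YAML; the cluster name, VM size, and scale settings below are assumptions for illustration:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cpu-cluster
type: amlcompute
size: Standard_DS3_v2        # assumed VM size
min_instances: 0             # scale to zero when idle
max_instances: 4
idle_time_before_scale_down: 120
```

It would be created with something like `az ml compute create --file compute.yml -g mlops -w amlws`; jobs then target it via `compute: azureml:cpu-cluster`.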


All Models Trained So Far:


AML Registry: Let's register some models to the AML registry.

In this section we evaluate the training metrics for all the experiments and based on our evaluation register the qualified model into the AML Registry for inferencing.


Let's suppose we are satisfied with the AKS and VM cluster training outcomes; we now register them into the AML registry.
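With the v2 CLI, a model already registered in a workspace can be shared into a registry with `az ml model share`. The model name and version below are assumptions for illustration:

```shell
az ml model share \
  --name iris-aks-model --version 1 \
  --registry-name amlregistry \
  --share-with-name iris-aks-model --share-with-version 1 \
  --resource-group mlops --workspace-name amlws
```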


Pipeline Scheduling as Workflow Management

AML also enables efficient workflow management. A pipeline that requires recurrent training can be scheduled for re-training and re-runs very efficiently through the Python SDK v2, the UI, and the Azure CLI.
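As a sketch, a recurrence schedule for a pipeline can be declared in CLI v2 YAML; the schedule name, cadence, and pipeline file name are assumptions:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: iris_weekly_retrain
trigger:
  type: recurrence
  frequency: week        # retrain once a week
  interval: 1
  schedule:
    hours: [6]
    minutes: [0]
create_job: ./train-pipeline.yml   # assumed pipeline job definition
```

It would be enabled with something like `az ml schedule create --file schedule.yml -g mlops -w amlws`.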


AML pipeline is a sequence of connected steps or stages that together perform a machine learning workflow. A pipeline can consist of multiple steps such as data preparation, model training, evaluation, and deployment. Each step in a pipeline can be a job or a set of jobs. Pipelines can be created and managed through the Azure Machine Learning SDK or the Azure portal.

An AML job is a unit of work that runs a specified script or command on a specified compute target. Jobs can be used for various tasks such as training a machine learning model, running batch inferencing on a large dataset, or tuning hyperparameters of a model. Jobs can be submitted through the Azure Machine Learning SDK or through the Azure portal.
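A minimal command job in CLI v2 YAML might look like the following; the script, environment, and compute names are assumptions for illustration:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: ./src                      # assumed folder holding the script
command: python train_iris.py
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster     # assumed compute target
experiment_name: amlws2-iris-v1
```

Submitting it with `az ml job create --file job.yml -g mlops -w amlws` packages the code, runs it on the target compute, and records the run in the workspace.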

Reference []


Complex Pipeline Demonstration:


Lastly, we will create a complex pipeline with a slightly optimized workflow: we use three computes for three respective steps and run Step 1, Step 2, and Step 3 in parallel to demonstrate this on AML. AML pipelines provide a robust capability to create the desired workflows.
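A sketch of such a pipeline in CLI v2 YAML: three command steps with no data dependencies between them, each bound to a different compute, so AML is free to run them concurrently. All step, script, environment, and compute names are assumptions:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: three-compute-parallel
jobs:
  step1:
    type: command
    code: ./src
    command: python step1.py
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
    compute: azureml:cpu-cluster       # VM compute cluster
  step2:
    type: command
    code: ./src
    command: python step2.py
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
    compute: azureml:k8s-compute       # attached Kubernetes compute
  step3:
    type: command
    code: ./src
    command: python step3.py
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
    compute: azureml:cpu-cluster-2     # a second VM cluster
```

Because no step consumes another step's output, the scheduler has no ordering constraint and dispatches all three at once.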


Similarly, we could also set up dependencies between the pipeline steps so that they run sequentially.
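Sequencing falls out of data dependencies: binding one step's output to another step's input forces the second step to wait for the first. A sketch, with all names assumed:

```yaml
jobs:
  prep:
    type: command
    code: ./src
    command: python prep.py --out ${{outputs.prepared_data}}
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
    compute: azureml:cpu-cluster
    outputs:
      prepared_data:
        type: uri_folder
  train:
    type: command
    code: ./src
    command: python train.py --data ${{inputs.training_data}}
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
    compute: azureml:cpu-cluster
    inputs:
      # Consuming prep's output makes train run strictly after prep.
      training_data: ${{parent.jobs.prep.outputs.prepared_data}}
```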

Overall, in this blog we have learned about Azure Machine Learning and its MLOps capabilities. We unpacked several facets of the service through the training lens; it opens the door for you to explore much more, such as the feature store, environments, and DevOps for CI/CD releases. I hope you have drawn value from this article and that it gets you one step closer to provisioning your first AML workspace for your organization's ML-infused software development and solutions!

As a pre-read to the deployment and inferencing blog, I recommend one of the best-articulated vlogs on this topic, from our very own Nishant Thacker. He narrates a deeply knowledgeable summary of AML deployment and inferencing; hope you enjoy it!
