:docinfo:
:toc:
:toc-title: PML User Guide
= ProActive Machine Learning
include::../common-settings.adoc[]
include::../all-doc-links.adoc[]
=== What is ProActive Machine Learning (PML)?

include::references/Overview.adoc[]

=== Glossary

include::references/Glossary.adoc[]
== Get Started
To submit your first Machine Learning (ML) workflow to *ProActive Scheduler*, link:../admin/ProActiveAdminGuide.html#_run_the_proactive_scheduler[install] it in
your environment (default credentials: admin/admin) or just use our demo platform https://try.activeeon.com[try.activeeon.com^].

*ProActive Scheduler* provides comprehensive interfaces that allow you to:

- +++Create workflows using <a class="studioUrl" href="/studio" target="_blank">ProActive Workflow Studio</a>+++
- +++Submit workflows, monitor their execution and retrieve the tasks results using <a class="schedulerUrl" href="/scheduler" target="_blank">ProActive Scheduler Portal</a>+++
- +++Add resources and monitor them using <a class="rmUrl" href="/rm" target="_blank">ProActive Resource Manager Portal</a>+++
- +++Version and share various objects using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/catalog-portal" target="_blank">ProActive Catalog Portal</a>+++
- +++Provide an end-user workflow submission interface using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/workflow-automation" target="_blank">Workflow Execution Portal</a>+++
- +++Generate metrics of multiple job executions using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/job-analytics" target="_blank">Job Analytics Portal</a>+++
- +++Plan workflow executions over time using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/job-planner-execution-planning" target="_blank">Job Planner Portal</a>+++
- +++Add services using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/service-automation" target="_blank">Service Automation Portal</a>+++
- +++Perform event based scheduling using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/event-orchestration" target="_blank">Event Orchestration Portal</a>+++
- +++Control manual workflows validation steps using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/notification-portal" target="_blank">Notification Portal</a>+++
We also provide a +++<a class="restUrl" href="/rest" target="_blank">REST API</a>+++ and <<../user/ProActiveUserGuide.adoc#_scheduler_command_line,command line interfaces>> for advanced users.


== Create a First Predictive Solution

Suppose you need to predict house prices based on the following information (features) provided by the estate agency:

- *CRIM* per capita crime rate by town
- *ZN* proportion of residential land zoned for lots over 25,000 sq. ft.
- *INDUS* proportion of non-retail business acres per town
- *CHAS* Charles River dummy variable
- *NOX* nitric oxides concentration
- *RM* average number of rooms per dwelling
- *AGE* proportion of owner-occupied units built prior to 1940
- *DIS* weighted distances to five Boston Employment centres
- *RAD* index of accessibility to radial highways
- *TAX* full-value property-tax rate per $10 000
- *PTRATIO* pupil-teacher ratio by town
- *B* 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- *LSTAT* % lower status of the population
- *MDEV* median value of owner-occupied homes in $1000s

Predicting house prices is a complex problem, but we can simplify it a bit for this step-by-step example. We'll show you how you can easily create a predictive analytics solution using PML.

=== Manage the Canvas

To use PML, you need to add the *Machine Learning Bucket* as main catalog in the ProActive Studio. This bucket contains a set of generic tasks that enables you to upload and prepare data, train a model and test it.
A. Open +++<a class="studioUrl" href="/studio" target="_blank">ProActive Workflow Studio</a>+++ home page.
B. Create a new workflow.
C. Change palette preset to `Machine Learning`.
D. Click on `machine-learning` catalog and pin it open, and same for the `data-visualization` catalog.
E. Organize your canvas.
NOTE: Changing the palette preset lets you display a different set of catalogs in the studio.
image::manage_canvas.gif[100000,2000]

=== Upload Data

To upload data into the Workflow, you need to use a dataset stored in a CSV file.

A. Once the dataset has been converted to *CSV* format, upload it to a cloud storage service, for example https://aws.amazon.com/s3[Amazon S3^].
For this tutorial, we will use the Boston house prices dataset, available at this link:
https://s3.eu-west-2.amazonaws.com/activeeon-public/datasets/boston-houses-prices.csv

B. Drag and drop the <<Import_Data>> task from the *machine-learning* bucket of ProActive Machine Learning.
C. Click on the task and click `General Parameters` on the left to change the default parameters of this task.
D. Set the *FILE_URL* variable to the S3 link of your dataset.
E. Set the other parameters according to your dataset format.

This task uploads the data into the workflow, where we can use it for model training and testing.

If you want to skip these steps, you can directly use the <<Load_Boston_Dataset>> Task by a simple drag and drop.

image::upload_data.gif[100000,2000]

=== Prepare Data

This step consists of preparing the data for the training and testing of the predictive model. In this example, we will simply split our dataset into two separate datasets: one for training and one for testing.

To do this, we use the <<Split_Data>> Task in the *machine-learning* bucket.

A. Drag and drop the <<Split_Data>> Task into the canvas, and connect it to the <<Import_Data>> or <<Load_Boston_Dataset>> Task.
B. By default, the ratio is 0.7, which means that 70% of the dataset will be used for training the model and 30% for testing it.
C. Click the <<Split_Data>> Task and set the *TRAIN_SIZE* variable to 0.6.
image::prepare_data.gif[100000,2000]
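
The effect of a train/test ratio such as *TRAIN_SIZE* can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the implementation of the <<Split_Data>> Task: the function name `split_dataset`, the `seed` parameter and the shuffling strategy are assumptions made for this example.

```python
import random


def split_dataset(rows, train_size=0.7, seed=0):
    """Shuffle the rows and split them into (train, test) using the given ratio."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    shuffled = list(rows)
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_size))
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder rows split with a ratio of 0.6, as in step C above.
rows = list(range(10))
train, test = split_dataset(rows, train_size=0.6)
print(len(train), len(test))  # 6 4
```

Every row ends up in exactly one of the two subsets, so the test set stays unseen during training.
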
=== Train a Predictive Model
Using PML, you can easily create different ML models in a single experiment and compare their results. This type of experimentation helps you find the best solution for your problem.
You can also enrich the `machine-learning` bucket by adding new ML algorithms and publish or customize an existing task according to your requirements as the tasks are open source.

NOTE: To change the code of a task click on it and click the `Task Implementation`. You can also add new variables to a specific task.

In this step, we will create two different types of models and then compare their scores to decide which algorithm is most suitable for our problem. Because the Boston dataset used in this example requires predicting house prices (a continuous label), we are dealing with a regression problem.
To solve this problem, we have to choose a regression algorithm to train the predictive model. To see the regression algorithms available in PML, see the *ML Regression* section in the *machine-learning* bucket.

For this example, we will use <<Linear_Regression>> Task and <<Support_Vector_Regression>> Task.

A. Find the <<Linear_Regression>> Task and <<Support_Vector_Regression>> Task and drag them into the canvas.
B. Find the <<Train_Model>> Task and drag it twice into the canvas and set its LABEL_COLUMN variable to LABEL.
C. Connect the <<Split_Data>> Task to the two <<Train_Model>> Tasks in order to give them access to the training data. Then connect the <<Linear_Regression>> Task to the first <<Train_Model>> Task and <<Support_Vector_Regression>> to the second <<Train_Model>> Task.
D. To be able to download the model learned by each algorithm, drag two <<Download_Model>> Tasks and connect them to each <<Train_Model>> Task.
image::train_a_predictive_model.png[100000,2000]

=== Test the Predictive Model

To evaluate the two learned predictive models, we will use the testing data that was separated out by the <<Split_Data>> Task to score our trained models. We can then compare the results of the two models to see which generated better results.

A. Find the <<Predict_Model>> Task and drag and drop it twice into the canvas and set its LABEL_COLUMN variable to LABEL.
B. Connect the first <<Predict_Model>> Task to the <<Train_Model>> Task that is connected to <<Support_Vector_Regression>> Task.
C. Connect the second <<Predict_Model>> Task to the <<Train_Model>> Task that is connected to <<Linear_Regression>> Task.
D. Connect both <<Predict_Model>> Tasks to the <<Split_Data>> Task.
E. Find the <<Preview_Results>> Task in the ML bucket and drag and drop it twice into the canvas.
F. Connect each <<Preview_Results>> Task with <<Predict_Model>>.

image::test_the_predictive_model.png[100000,2000]
NOTE: if you have a pickled file (.pkl) containing a predictive model that you have learned using another platform, and you need to test it in the PML, you can load it using *Import_Model* Task.
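
Comparing two regression models on the same test data usually comes down to computing an error metric for each of them. The sketch below shows two common choices, mean squared error and the R² score, in plain Python. The metrics actually reported by the <<Preview_Results>> Task may differ, and the names `model_a` and `model_b` as well as all numeric values are placeholders invented for this illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder labels and predictions (not taken from the Boston dataset).
y_true = [10.0, 20.0, 30.0, 40.0]
predictions = {
    "model_a": [12.0, 18.0, 33.0, 39.0],
    "model_b": [15.0, 15.0, 35.0, 35.0],
}

scores = {name: mean_squared_error(y_true, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # the lower the error, the better
print(scores, best)
```

A lower mean squared error and an R² closer to 1 both indicate predictions that are closer to the true labels.
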

=== Run the Experiment and Preview the Results

Now that the workflow is complete, execute it as follows:
A. Click the *Execute* button on the menu to run the workflow.
B. Click the *Scheduling & Orchestration* button to track the workflow execution progress.
C. Click the Visualization tab and track the progress of your workflow execution (a green check mark appears on each Task when its execution is finished).
D. Visualize the output logs by clicking on the output tab and checking the streaming check box.
E. Click the *Tasks* tab, select a *Preview_Results* task and click on the *Preview* tab, then click either on *Open in browser* to preview the results on your browser or on *Save as file* to download the results locally.
image::execute.gif[100000,2000]
The `auto-ml-optimization` bucket contains the `Distributed_Auto_ML` workflow that can be easily used to find the operating parameters for any system whose performance can be measured as a function of adjustable parameters.
It is an estimator that minimizes the posterior expected value of a loss function.
This bucket also comes with a set of example workflows that demonstrate how to optimize mathematical functions, PML workflows and machine/deep learning algorithms from scripts using AutoML tuners.
In the following subsections, several tables present the main variables that characterize the AutoML workflows.
In addition to the variables mentioned below, there is a set of generic variables that are common to all workflows,
which can be found in the subsection <<AI Workflows Common Variables>>.
image::AutoML_1.png[align=center]

The `Distributed_Auto_ML` workflow proposes six algorithms for distributed hyperparameter optimization. The choice of the
sampling/search strategy depends strongly on the problem being tackled.
`Distributed_Auto_ML` workflow comes with specific pipelines (parallel or sequential) and visualization tools
(https://github.com/fossasia/visdom[Visdom^] or https://www.tensorflow.org/tensorboard/[TensorBoard^]) as described in the subsections below.
image::AutoML_2.png[align=center]

[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
| `TUNING_ALGORITHM`
| Specifies the tuner algorithm that will be used for hyperparameter optimization.
| List [Bayes, Grid, Random, QuasiRandom, CMAES, MOCMAES] (default=Random)
|  Specifies the number of maximum iterations. It should be an integer number higher than zero. Set `-1` for an infinite loop.
| `PARALLEL_EXECUTIONS_PER_ITERATION`
|  Specifies the number of parallel executions per iteration. It should be an integer number higher than zero.
| Int (default=2)
| `NUMBER_OF_REPETITIONS`
|  Specifies the number of hyperparameter sampling repetitions. Ensures every experiment is repeated a given number of times. It should be an integer number higher than one. Set `-1` to never see repetitions.
| Int (default=-1)
| `PAUSE_AFTER_EVERY_ITERATIONS`
|  If higher than zero, pause the workflow after every specified number of iterations. Set `-1` to disable.
| Int (default=-1)
| `STOP_IF_LOSS_IS_LOWER_THAN`
|  If higher than zero, stop the workflow execution if loss is lower than the specified value. Set `-1` to disable.
| Int (default=-1)
| `TARGET_WORKFLOW`
| Specifies the workflow path from the catalog that should be optimized.
| String (default=auto-ml-optimization/Himmelblau_Function)
| `TARGET_NATIVE_SCHEDULER`
| Name of the native scheduler node source to use on the target workflow tasks when deployed inside a cluster such as SLURM, LSF, etc.
| String (default=empty)
| `TARGET_NATIVE_SCHEDULER_PARAMS`
| Parameters given to the native scheduler (SLURM, LSF, etc) while requesting a ProActive node used to deploy the target workflow tasks.
| String (default=empty)
| `TARGET_NODE_ACCESS_TOKEN`
| If not empty, the target workflow tasks will be run only on nodes that contain the specified token.
| String (default=empty)
| `TARGET_NODE_SOURCE_NAME`
| If not empty, the target workflow tasks will be run only on nodes belonging to the specified node source.
| String (default=empty)
| `TARGET_CONTAINER_PLATFORM`
| Specifies the container platform to be used for executing the target workflow tasks.
| List [no-container, docker, podman, singularity] (default=empty)
| `TARGET_CONTAINER_IMAGE`
| Specifies the name of the container image that will be used to run the target workflow tasks.
| List [docker://activeeon/dlm3, docker://activeeon/cuda, docker://activeeon/cuda2, docker://activeeon/rapidsai, docker://activeeon/nvidia:rapidsai, docker://activeeon/nvidia:pytorch, docker://activeeon/nvidia:tensorflow, docker://activeeon/tensorflow:latest, docker://activeeon/tensorflow:latest-gpu] (default=empty)
| `TARGET_CONTAINER_GPU_ENABLED`
| If True, it will activate the use of GPU for the target workflow tasks on the selected container platform.
| Boolean (default=empty)
| `TARGET_NVIDIA_RAPIDS_ENABLED`
| If True, it will activate the use of NVIDIA RAPIDS for the target workflow tasks on the selected container platform.
| Boolean (default=empty)
| `VISDOM_ENABLED`
| If True, the Visdom service is started allowing the user to visualize the hyperparameter optimization using the Visdom web interface.
| Boolean (default=False)
| `VISDOM_PROXYFIED`
| If True, requests to Visdom are sent via a proxy server.
| Boolean (default=False)
| `TENSORBOARD_ENABLED`
| If True, the TensorBoard service is started allowing the user to visualize the hyperparameter optimization using the TensorBoard web interface.
| Boolean (default=False)
| `TENSORBOARD_PROXYFIED`
| If True, requests to TensorBoard are sent via a proxy server.
| Boolean (default=False)
|===
image::AutoML_Full.png[align=center]
*How to define the search space:*

This subsection describes common building blocks to define a search space:

    - uniform: Uniform continuous distribution.
    - quantized_uniform: Uniform discrete distribution.
    - log: Logarithmic uniform continuous distribution.
    - quantized_log: Logarithmic uniform discrete distribution.
    - choice: Uniform choice distribution between non-numeric samples.
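
To make these building blocks more concrete, the sketch below draws samples from each kind of distribution using only the Python standard library. It illustrates the sampling concepts and is not the syntax expected by the `SEARCH_SPACE` variable: the helper functions, the `step` argument used for quantization, and the hyperparameter names are all assumptions made for this example.

```python
import math
import random


def uniform(rng, low, high):
    """Uniform continuous value in [low, high]."""
    return rng.uniform(low, high)


def quantized_uniform(rng, low, high, step):
    """Uniform value snapped to the nearest multiple of step."""
    return round(rng.uniform(low, high) / step) * step


def log_uniform(rng, low, high):
    """Value whose logarithm is uniformly distributed (low and high must be > 0)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def quantized_log(rng, low, high, step):
    """Log-uniform value snapped to the nearest multiple of step."""
    return round(log_uniform(rng, low, high) / step) * step


def choice(rng, options):
    """Uniform pick among (possibly non-numeric) options."""
    return rng.choice(options)


# Hypothetical search space: each entry maps a name to a sampling function.
search_space = {
    "dropout": lambda rng: uniform(rng, 0.0, 0.5),
    "batch_size": lambda rng: quantized_uniform(rng, 16, 128, 16),
    "learning_rate": lambda rng: log_uniform(rng, 1e-4, 1e-1),
    "hidden_units": lambda rng: quantized_log(rng, 8, 512, 8),
    "optimizer": lambda rng: choice(rng, ["sgd", "adam"]),
}


def sample(space, rng):
    """Draw one configuration from the search space."""
    return {name: draw(rng) for name, draw in space.items()}


rng = random.Random(0)
print(sample(search_space, rng))
```

Logarithmic distributions are typically preferred for parameters such as learning rates, whose useful values span several orders of magnitude.
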

*Which tuner algorithm to choose?*

The choice of the tuner depends on the following aspects:

    - Time required to evaluate the model.
    - Number of hyperparameters to optimize.
    - Type of variable.
    - The size of the search space.

In the following, we briefly describe the different tuners proposed by the `Distributed_Auto_ML` workflow:
    - *Grid sampling* applies when all variables are discrete, and the number of possibilities is low. A grid search is a naive approach that will simply try all possibilities, making the search extremely long even for medium-sized problems.
    - *Random sampling* is an alternative to grid search when the number of discrete parameters to optimize and the time required for each evaluation are high. Random search picks points randomly from the configuration space.
    - *QuasiRandom sampling* ensures a much more uniform exploration of the search space than traditional pseudo-random sampling. Thus, quasi-random sampling is preferable when not all variables are discrete, the number of dimensions is high, and the time required to evaluate a solution is high.
    - *Bayes search* models the search space using Gaussian process regression, which allows an estimation of the loss function, and of the uncertainty on that estimate, at every point of the search space. Modeling the search space suffers from the curse of dimensionality, which makes this method more suitable when the number of dimensions is low.
    - *CMAES search* (Covariance Matrix Adaptation Evolution Strategy) is one of the most powerful black-box optimization algorithms.
      However, it requires a significant number of model evaluations (in the order of 10 to 50 times the number of dimensions) to converge to an optimal solution. This search method is more suitable when the time required for a model evaluation is relatively low.
    - *MOCMAES search* (Multi-Objective Covariance Matrix Adaptation Evolution Strategy) is a multi-objective algorithm optimizing multiple tradeoffs simultaneously. To do that, MOCMAES employs a number of CMAES algorithms.
Here is a table that summarizes when to use each algorithm.
|===
| *Algorithm* | *Time* | *Dimensions* | *Continuity* | *Conditions* | *Multi-objective*
| `Grid`
| `Low`
| `Low`
| `Discrete`
| `Yes`
| `No`
| `Random`
| `High`
| `High`
| `Discrete`
| `Yes`
| `No`
| `QuasiRandom`
| `High`
| `High`
| `Mixed`
| `Yes`
| `No`
| `Bayes`
| `High`
| `Medium`
| `Mixed`
| `Yes`
| `No`
| `CMAES`
| `Low`
| `Low`
| `Mixed`
| `No`
| `No`
| `MOCMAES`
| `Low`
| `Low`
| `Mixed`
| `No`
| `Yes`
|===
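
The reason grid sampling is reserved for small, discrete search spaces becomes obvious once the grid is enumerated: the number of evaluations is the product of the number of values per hyperparameter. The following sketch, which is independent of the `Distributed_Auto_ML` implementation, uses `itertools.product` to list every combination. The hyperparameter names and values are hypothetical.

```python
import itertools


def grid_points(grid):
    """Yield every combination of the values listed for each hyperparameter."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical discrete search space: 3 x 2 x 2 candidate values.
grid = {
    "max_depth": [2, 4, 8],
    "n_estimators": [50, 100],
    "criterion": ["gini", "entropy"],
}

points = list(grid_points(grid))
print(len(points))  # 12
```

Adding a fourth hyperparameter with five candidate values would already raise the count to 60 evaluations.
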

The following workflows represent some mathematical functions that can be optimized by the `Distributed_Auto_ML` tuners.
*Himmelblau_Function:* is a multi-modal function containing four identical local minima. It's used to test the performance of optimization algorithms. For more info, please click https://en.wikipedia.org/wiki/Himmelblau%27s_function[here].
image::Himmelblau_Function.png[448,336,align=center]
https://al-roomi.org/benchmarks/unconstrained/2-dimensions/56-himmelblau-s-function[Mathematical Expression]
image::himmelblau_math.png[948,736,align=center]
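
As an illustration, Himmelblau's function can be written in one line and minimized with a basic random search. This is a minimal sketch and not the code of the `Himmelblau_Function` workflow or of the `Random` tuner: the `random_search` helper, its arguments and the search bounds of [-5, 5] are assumptions. The `stop_below` argument only loosely mirrors the behavior described for `STOP_IF_LOSS_IS_LOWER_THAN`.

```python
import random


def himmelblau(x, y):
    """Himmelblau's function: (x^2 + y - 11)^2 + (x + y^2 - 7)^2."""
    return (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2


def random_search(loss, bounds, max_iterations, stop_below=-1, seed=0):
    """Evaluate random points within bounds and keep the best one found."""
    rng = random.Random(seed)
    best_point, best_loss = None, float("inf")
    for _ in range(max_iterations):
        point = tuple(rng.uniform(low, high) for low, high in bounds)
        value = loss(*point)
        if value < best_loss:
            best_point, best_loss = point, value
        # Early stop once the loss is good enough (disabled when stop_below <= 0).
        if stop_below > 0 and best_loss < stop_below:
            break
    return best_point, best_loss


best_point, best_loss = random_search(
    himmelblau, bounds=[(-5.0, 5.0), (-5.0, 5.0)], max_iterations=2000, stop_below=0.01
)
print(best_point, best_loss)
print(himmelblau(3, 2))  # 0, since (3, 2) is one of the four minima
```
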

*Kursawe_Multiobjective_Function:* is a multiobjective function proposed by http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.8050[Frank Kursawe]. It has two objectives (f1, f2) to minimize. For more info, please click https://deap.readthedocs.io/en/master/api/benchmarks.html#deap.benchmarks.kursawe[here].

image::Kursawe_Multiobjective_Function.png[648,536,align=center]

https://al-roomi.org/benchmarks/multi-objective/unconstrained-list/322-kursawe-s-function-kur[Mathematical Expression]

image::kursawe_math.png[548,436,align=center]
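
The two objectives of the Kursawe function can be sketched as follows, using the same formulation as the DEAP benchmark linked above. The `dominates` helper is an addition for this illustration only: it shows the Pareto-dominance test that multi-objective optimizers typically rely on when both objectives are minimized, and it is not taken from the `MOCMAES` tuner.

```python
import math


def kursawe(x):
    """Return the two Kursawe objectives (f1, f2) for a sequence of reals."""
    f1 = sum(
        -10.0 * math.exp(-0.2 * math.sqrt(a * a + b * b))
        for a, b in zip(x[:-1], x[1:])
    )
    f2 = sum(abs(a) ** 0.8 + 5.0 * math.sin(a ** 3) for a in x)
    return f1, f2


def dominates(p, q):
    """True if p is at least as good as q on every objective and strictly better on one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


print(kursawe([0.0, 0.0, 0.0]))  # (-20.0, 0.0)
print(dominates((-20.0, 0.0), (-10.0, 1.0)))  # True
```

Because the two objectives conflict, the result of a multi-objective search is a set of non-dominated tradeoffs rather than a single best point.
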
=== Hyperparameter Optimization
The following workflows represent some machine learning and deep learning algorithms that can be optimized.
These workflows share several variables with `Distributed_Auto_ML`. Some workflows also have
a few additional variables.
*CIFAR_10_Image_Classification:* trains a simple deep CNN on the CIFAR10 images dataset using the Keras library.
.CIFAR_10_Image_Classification Variables
[cols="2,5,5"]
|===
| *Variable name* | *Description* | *Type*
| `NUM_EPOCHS`
| The number of times data is passed forward and backward through the training algorithm.
| `INPUT_VARIABLES`
| A set of specific variables (usecase-related) that are used in the model training process.
| JSON format
| `SEARCH_SPACE`
| Specifies the representation of the search space which has to be defined using dictionaries or by entering the path of a json file stored in the catalog.
| `INSTANCE_NAME`
| Specifies the name to be provided for the instance.
| String (default=tensorboard-server)
| `CONTAINER_LOG_PATH`
| Specifies the path where the docker logs are created and stored on the docker container.
| String (default=/graphs/$INSTANCE_NAME)
| `CONTAINER_ROOTLESS_ENABLED`
| If True, the user will be able to run the workflow in a rootless mode.
|===
The following workflows have variables in common with the workflows illustrated above.

*CIFAR_10_Image_Classification:* trains a simple deep CNN on the CIFAR10 images dataset using the Keras library.

*CIFAR_100_Image_Classification:* trains a simple deep CNN on the CIFAR100 images dataset using the Keras library.

*Image_Object_Detection:* trains a YOLO model on the COCO dataset using PML deep learning generic tasks.

*Digits_Classification:* Python script illustrating an example of multiple machine learning models optimization.

*Text_Generation:* trains a simple Long Short-Term Memory (LSTM) to learn sequences of characters from 'The Alchemist' book. It's a novel by Brazilian author Paulo Coelho that was first published in 1988.

=== Neural Architecture Search
The following workflows define a search space made of a set of possible neural network architectures that can be used by `Distributed_Auto_ML` to automatically find the best combinations of neural architectures within the search space.

*Single_Handwritten_Digit_Classification:* trains a simple deep CNN on the MNIST dataset using the PyTorch library. This example allows searching for two types of neural architectures defined in the Handwritten_Digit_Classification_Search_Space.json file.

*Multiple_Objective_Handwritten_Digit_Classification:* trains a simple deep CNN on the MNIST dataset using the PyTorch library. This example allows optimizing multiple losses, such as accuracy, number of parameters, and memory access cost (MAC) measure.

=== Distributed Training

The following workflows illustrate some examples of multi-node and multi-GPU distributed learning.

*TensorFlow_Keras_Multi_Node_Multi_GPU:* is a TensorFlow + Keras workflow template for distributed training (multi-node multi-GPU) with AutoML support.

*TensorFlow_Keras_Multi_GPU_Horovod:* is a Horovod workflow template that supports multi-GPU and AutoML.

The following workflows represent Python and R templates that can be used to implement a generic machine learning task.

*Python_Task:* is a simple Python task template pre-configured to run with `Distributed_Auto_ML`.

*R_Task:* is a simple R task template pre-configured to run with `Distributed_Auto_ML`.

[[_FL]]

== Federated Learning (FL)

https://ai.googleblog.com/2017/04/federated-learning-collaborative.html[Federated Learning (FL)] makes it possible to train an algorithm across multiple decentralized devices (or servers) holding local data samples, without exchanging them.
The `federated-learning` bucket contains a few examples of Federated Learning workflows that can be easily used to build a common and robust machine learning model without sharing data, thus addressing critical issues such as data privacy, data security, data access rights and access to heterogeneous data.
This bucket uses the https://flower.dev/[Flower] library to implement federated learning workflows.
The https://flower.dev/[Flower] library is a friendly federated learning framework that presents a unified approach for federated learning.
It helps federate any workload using any ML framework and any programming language.

image::FlowerArchitecture.png[align=center]
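
The core idea of training without exchanging data can be illustrated with federated averaging: each client trains locally and sends only its model weights, and the server combines them into a weighted average. The sketch below is a simplified, framework-free illustration of that aggregation step. It does not use the Flower API, and the `federated_average` function, the flat weight lists and the example numbers are assumptions made for this example.

```python
def federated_average(client_updates):
    """Average client weights, weighting each client by its number of examples.

    client_updates is a list of (weights, num_examples) pairs, where weights
    is a flat list of floats with the same length for every client.
    """
    total = sum(num_examples for _, num_examples in client_updates)
    size = len(client_updates[0][0])
    averaged = [0.0] * size
    for weights, num_examples in client_updates:
        for i, w in enumerate(weights):
            averaged[i] += w * num_examples / total
    return averaged


# Two hypothetical clients: only weights and example counts reach the server.
updates = [
    ([1.0, 2.0], 100),
    ([3.0, 4.0], 300),
]
print(federated_average(updates))  # [2.5, 3.5]
```

The client holding more examples has a proportionally larger influence on the aggregated model, while the raw data never leaves the clients.
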

=== PyTorch Federated Learning Tasks

The following workflows represent client/server templates that can be used to implement a Federated Learning workflow using PyTorch.

*PyTorch_FL_Client_Task:* is a Federated Learning Client task template using PyTorch.

*PyTorch_FL_Server_Task:* is a Federated Learning Server task template using PyTorch.

=== TensorFlow Federated Learning Tasks

The following workflows represent client/server templates that can be used to implement a Federated Learning workflow using TensorFlow/Keras.

*TensorFlow_FL_Client_Task:* is a Federated Learning Client task template using TensorFlow/Keras.

*TensorFlow_FL_Server_Task:* is a Federated Learning Server task template using TensorFlow/Keras.

=== Federated Learning Workflows

The following workflows use federated learning to train a deep Convolutional Neural Network (ConvNet/CNN) on the https://www.cs.toronto.edu/~kriz/cifar.html[CIFAR10 images dataset] using the https://flower.dev/[Flower] library.

*PyTorch_Federated_Learning_Example:* shows an example of Federated Learning workflow using PyTorch.

*TensorFlow_Federated_Learning_Example:* shows an example of Federated Learning workflow using TensorFlow/Keras.


References:

1. http://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf[Communication-Efficient Learning of Deep Networks from Decentralized Data]

2. https://arxiv.org/pdf/2007.14390.pdf[Flower: A Friendly Federated Learning Research Framework]

== Model as a Service for Machine Learning (MaaS_ML)
Once a predictive model is built, tested and validated, you can easily use it in real world production pipelines by deploying it as a REST Web Service via the MaaS_ML service.
MaaS_ML is designed to make deployments of lightweight machine learning (ML) models simple, portable, and scalable, and to easily manage their lifetimes. This is particularly useful for engineering or business teams that want to take advantage of this model.
The life cycle of any MaaS_ML instance (i.e., from starting the generic service instance, deploying an AI specific model to pausing or deleting the instance) can be managed in three different ways in PML:

- Using the *Studio Portal* and more specifically the bucket *model-as-a-service* where specific generic tasks are provided to process all the possible actions (i.e., MaaS_ML_Service_Start, MaaS_ML_Deploy_Model, MaaS_ML_Call_Prediction, MaaS_ML_Actions[Finish/Pause/Resume]).
These tasks can be easily integrated to your AI pipelines/workflows as you can see in this <<Deployment Pipeline Example>>.
- Using the *Service Automation Portal* by executing the different actions associated with MaaS_ML (i.e., Deploy_ML_Model, Pause_MaaS_ML, Update_MaaS_ML, Finish_MaaS_ML).
- Using the *Swagger UI* which is accessible once the MaaS_ML instance is up and running.

Once a MaaS_ML instance is up and running, it can be used for:
- *AI Model Deployment or Update*: the user has to provide a valid specific AI Model identifier in order to deploy the model of their choice.
- *Call of Predictions*: when a specific AI model is running, the user can request predictions for a specific payload. The payload has to be converted into JSON data in order to get prediction values.
- *Deploy a New Specific AI Model*: the running generic AI model can be used to deploy a new specific AI model.
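
Converting a prediction payload into JSON data can be done with the Python standard library, as sketched below. The payload shape (a list of records keyed by feature name) and the feature values are assumptions made for this illustration. The exact structure expected by a MaaS_ML instance is described by the API specification of the running service, for example in its Swagger UI.

```python
import json


def to_json_payload(records):
    """Serialize a list of feature records into a JSON string."""
    return json.dumps(records)


# Hypothetical record using a few feature names from the Boston dataset;
# the values are placeholders, not real observations.
records = [{"CRIM": 0.1, "RM": 6.5, "AGE": 65.0}]

payload = to_json_payload(records)
print(payload)

# The service response, also JSON, can be decoded back with json.loads.
decoded = json.loads(payload)
```
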

Using MaaS_ML, you can easily deploy and use any machine learning model as a REST Web Service on a physical or a virtual compute host on which there is an available ProActive Node. Going through the ProActive Scheduler,
you can also trigger the deployment of a specific VM using the Resource Manager elastic policies, and, eventually, deploy a Model-Service on that specific node.

In the following subsections, we will illustrate the MaaS_ML instance life cycle, from starting the generic service instance,
deploying a specific model and pausing it, to deleting the instance. We will also describe how the MaaS_ML instance life cycle
can be managed in four different ways in PML:
. <<MaaS_ML Via Workflow Execution Portal>>
. <<MaaS_ML Via Studio Portal>>
. <<MaaS_ML Via Service Automation Portal>>
. <<MaaS_ML Via Swagger UI>>
In the description below, multiple tables present the main variables that characterize the MaaS_ML workflows.
In addition to the variables mentioned below, there is a set of generic variables that are common to all workflows,
which can be found in the subsection <<AI Workflows Common Variables>>.
The management of the MaaS_ML life cycle is detailed in the next subsections.

=== MaaS_ML Via Workflow Execution Portal

Open the link:https://try.activeeon.com/automation-dashboard/#/portal/workflow-execution[Workflow Execution Portal].

Click on the button *Submit a Job* and then search for *MaaS_ML_Service* workflow as described in the image below.

image::MaaS_ML_Search.png[align=center]

Check the service parameters and click on the *Submit* button to start a MaaS_ML service instance.

To get more information about the parameters of the service, please check the section <<Start a Generic Service Instance>>.

image::MaaS_ML_Submit.png[align=center]
You can now monitor the service status, access its endpoint and execute its different actions:

- Deploy_ML_Model : enables you to deploy a trained ML model in one click.
- Update_MaaS_ML_Parameters : enables you to update the parameters of the service instance.
- Finish_MaaS_ML : stops and deletes the service instance.

image::MaaS_ML_Workflow_Management.png[align=center]

When you are done with the service instance, you can terminate it by clicking on *Terminate_Job_and_Service* button as shown in the image below.

image::Terminate_MaaS_ML.png[align=center]

=== MaaS_ML Via Studio Portal
==== Start a Generic Service Instance
Open the link:https://try.activeeon.com/studio[Studio Portal].

Create a new workflow.

Add the `model_as_a_service` bucket by clicking the `View` menu > `Add Bucket Menu to the Palette` > `model_as_a_service`.
Drag and drop the `MaaS_ML_Service_Start` task from the bucket.
Execute the workflow after setting the workflow variables as described in the table below.
.MaaS_ML_Service_Start variables
[cols="2,5,2"]
|===
|*Variable name* | *Description* | *Type*
3+^|*Workflow variables*
| `MODEL_SERVICE_INSTANCE_NAME`
| Service instance name.
| String (default="maas_ml-${PA_JOB_ID}").
| `MODEL_SERVICE_PROXYFIED`
| Allows access to the endpoint through an HTTP(S) proxy.
| Boolean (default=False).
| `MODEL_SERVICE_ENTRYPOINT`
| This entry script starts the service and defines the different functions to deploy the model, scores the prediction requests based on the deployed model, and returns the results. This script is specific to your model. This file should be stored in the Catalog under the `model_as_service_resources` bucket. More information about this file can be found in the <<_customize_the_service>> section.
| String (default="ml_service").
| `MODEL_SERVICE_YAML_FILE`
| A YAML file that describes the OpenAPI Specification ver. 2 (known as Swagger Spec) of the service. This file should be stored in the catalog under the `model_as_service_resources` bucket. More information about the structure of this file can be found in the section <<_customize_the_service>>.
| String (default="ml_service-api").
| `MODEL_SERVICE_USER_NAME`
| A valid user name having the needed privileges to execute this action.
| String (default="user").
| `MODEL_SERVICE_NODE_NAME`
| The name of the node where the service will be deployed. If empty, the service will be deployed on an available node selected randomly.
| String (default=Empty)
| `USE_NVIDIA_RAPIDS`
| If True, the service will be configured to use the GPU and the Nvidia Rapids library.
| Boolean (default=False)
3+^|*Task variables*
| `SERVICE_ID`
| The name of the service. Please keep the default value for this variable.
| String (default="MaaS_ML")
| `INSTANCE_NAME`
| The name of the service that will be deployed.
| String (default="maas-ml-${PA_JOB_ID}")
| `ENGINE`
| Container engine.
| String (default="$CONTAINER_PLATFORM")
| `GPU_ENABLED`
| If True, the service will be configured to use the GPU and the Nvidia Rapids library.
| Boolean (default=False)
| `PROXIFIED`
| It takes by default the value of  `MODEL_SERVICE_PROXYFIED` workflow variable.
| String (default="$MODEL_SERVICE_PROXYFIED")
| `PYTHON_ENTRYPOINT`
| It takes by default the value of  `MODEL_SERVICE_ENTRYPOINT` workflow variable.
| String (default="$MODEL_SERVICE_ENTRYPOINT")
| `YAML_FILE`
| It takes by default the value of  `MODEL_SERVICE_YAML_FILE` workflow variable.
| String (default="$MODEL_SERVICE_YAML_FILE")
| `USER_NAME`
| It takes by default the value of  `MODEL_SERVICE_USER_NAME` workflow variable.
| String (default="$MODEL_SERVICE_USER_NAME")
| `NODE_NAME`
| It takes by default the value of  `MODEL_SERVICE_NODE_NAME` workflow variable.
| String (default="$MODEL_SERVICE_NODE_NAME")
| `GPU_ENABLED`
| If True, the service will be configured to use the GPU and the Nvidia Rapids library.
| Boolean (default=$USE_NVIDIA_RAPIDS)
|===
==== Deploy a Specific ML Model
You can also deploy a specific ML model directly from the link:https://try.activeeon.com/studio[Studio Portal].

Drag and drop the `MaaS_ML_Deploy_Model` task from the Model-As-A-Service bucket.

Execute the workflow and set the workflow variables as follows:

[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
3+^|*Workflow variables*
| `CONTAINER_PLATFORM`
| Specifies the type of container platform to be used (no container, docker, singularity, or podman).
| String (default=docker)
| `CONTAINER_GPU_ENABLED`
| If True, containers will run based on images containing libraries that are compatible with GPU.
| Boolean (default=False)
| `CONTAINER_IMAGE`
| Specifies the name of the image that will be used to run the different workflow tasks.
| String (default=Empty).
| `SERVICE_TOKEN`
| A valid token generated by the MaaS_ML Service for user authentication.
| String (default=Empty).
3+^|*Task variables*
| `DEPLOY_MODEL_ENDPOINT`
| A URL endpoint defined by the user where the AI Model was deployed.
| URL (default=Empty).
| `API_EXTENSION`
| The base path to access the deployment endpoint.
| String (default="/api/deploy")
| `MODEL_URL`
| A valid URL specified by the user referencing the model that needs to be deployed.
| URL (default=  \https://activeeon-public.s3.eu-west-2.amazonaws.com/models[] )
| A valid token generated by the MaaS_ML Service for user authentication.
| `DRIFT_DETECTION_WINDOW_SIZE`
| The size of the data to be extracted from the old training dataset to be used as a baseline data for the drift detection.
| Integer (default=50).
| `MODEL_NAME`
| The name of the model to be deployed.
| String
| `MODEL_VERSION`
| The version number of the model that will be deployed.
| Integer (default=1)
| `BASELINE_DATA_URL`
| URL of the dataset to be deployed and used in the data drift detection process.
| URL (default=\https://activeeon-public.s3.eu-west-2.amazonaws.com/datasets/baseline_data.csv)
|===
==== Call the Service for Prediction
Once the model is deployed, you can also call the service for prediction directly from the link:https://try.activeeon.com/studio[Studio Portal].
Drag and drop the `MaaS_ML_Call_Prediction` task from the Model-As-A-Service bucket.

Execute the workflow and set the workflow variables as follows:

[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
3+^|*Workflow variables*
| `CONTAINER_PLATFORM`
| Specifies the type of container platform to be used (no container, docker, singularity, or podman).
| String (default=docker)
| `CONTAINER_GPU_ENABLED`
| If True, containers will run based on images containing libraries that are compatible with GPU.
| Boolean (default=False)
| `CONTAINER_IMAGE`
| Specifies the name of the image that will be used to run the different workflow tasks.
| String (default=Empty).
| `SERVICE_TOKEN`
| A valid token generated by the MaaS_ML Service for user authentication.
3+^|*Task variables*
| `PREDICT_MODEL_ENDPOINT`
| The endpoint of the started service.
| A valid token generated by the MaaS_ML Service for user authentication.
| `PREDICT_EXTENSION`
| The base path to access the prediction endpoint.
| String (default="/api/predict")
| `INPUT_DATA`
| Entry data that needs to be scored by the deployed model.
| JSON (default=Empty)
| `LABEL_COLUMN`
| Name of the label column. It needs to be set if data is labeled.
| String (default=Empty)
| `DATA_DRIFT_DETECTOR`
| Name of the data drift detector to be used in the drift detection process.
| List [HDDM, Page Hinkley, ADWIN] (default="HDDM")
| `MODEL_NAME`
| The name of the model to be deployed.
| String
| `MODEL_VERSION`
| The version number of the model that will be deployed.
| Integer (default=1)
| `SAVE_PREDICTIONS`
| Save the resulting predictions so that they can be displayed in the data analytics dashboard.
| Boolean (default=False)
| `DRIFT_ENABLED`
| True if a detector is needed to detect data drifts in the input data based on the baseline data.
| Boolean (default=False)
| `DRIFT_NOTIFICATION`
| True if the user needs to be notified through ProActive when a data drift is detected.
| Boolean (default=False)
|===
==== Delete/Finish the Service
You can also delete the service instance using the link:https://try.activeeon.com/studio[Studio Portal].

Drag and drop the `MaaS_ML_Actions` task from the Model-As-A-Service bucket.

Execute the workflow and set the workflow variables as follows:

[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
3+^|*Task variables*
| `ACTION`
| The action that will be processed regarding the service status.
| List [Pause_MaaS_ML, Resume_MaaS_ML, Finish_MaaS_ML] (default="Finish_MaaS_ML")
| `INSTANCE_NAME`
| The name of the service that the action will be processed on.
| String (default="maas-ml-${PA_JOB_ID}")
| `INSTANCE_ID`
| String (default=Empty)
|===
=== MaaS_ML Via Service Automation Portal
==== Start a Generic Service Instance

Open the link:https://try.activeeon.com/automation-dashboard/#/portal/service-automation[Service Automation Portal].

Search for `MaaS_ML` in Services Workflows List.

Set the following variables:

[#table_MaaS_ML]
.MaaS_ML variables
[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
| Pull and build the singularity image if the Singularity Image File (SIF) file is not available.
| `DEBUG_ENABLED`
| If True, the user will be able to examine the stream of output results of each task.
| Boolean (default=True)
| `DOCKER_IMAGE`
| Specifies the name of the Docker image that will be used to run the different workflow tasks.
| String (default="activeeon/maas_ml")
| `ENDPOINT_ID`
| The endpoint_id that will be used if `PROXYFIED` is set to True.
| String (default="maas-ml-gui")
| List (default="docker")
| `GPU_ENABLED`
| If True, the service will be configured to use the GPU and the Nvidia Rapids library.
| Boolean (default=False)
| `HTTPS_ENABLED`
| True if the HTTPS protocol is required for the defined model service.
| Boolean (default=False)
| `INSTANCE_NAME`
| The name of the service that will be deployed.
| String (default="maas-ml")
| `NODE_NAME`
| The name of the node where the service will be deployed. If empty, the service will be deployed on an available node selected randomly.
| String (default=Empty)
| `PROXYFIED`
| True if a proxy is needed to protect the access to this model-service endpoint.
| Boolean (default=False)
| `PYTHON_ENTRYPOINT`
| This entry script starts the service and defines the functions that deploy the model, score the prediction requests against the deployed model, and return the results. This script is specific to your model. This file should be stored in the Catalog under the `model_as_service_resources` bucket. More information about this file can be found in the <<_customize_the_service>> section.
| String (default="ml_service").
| Controls the port used to start the Model Service from Service Automation Portal. -1 for random port allocation.
| Integer (default="-1").
| `SINGULARITY_IMAGE_PATH`
| Location of the singularity image on the node file system (this path will be used to either store the singularity image or the image will be directly used if the file is present).
| String (default="/tmp/maas_ml.sif")
| `TRACE_ENABLED`
| True if the user wants to keep a trace on the different changes occurring in the service.
| Boolean (default=True)
| `YAML_FILE`
| A YAML file that describes the OpenAPI Specification version 2 (known as the Swagger Spec) of the service. This file should be stored in the Catalog under the `model_as_service_resources` bucket. More information about the structure of this file can be found in the <<_customize_the_service>> section.
|===

Click on `Execute Action` and follow the progress of the service creation.

image::MAAS_ML_Service_PSA.PNG[500,500,align=center]
==== Deploy a Specific ML Model
Once the status of your generic model service is displayed as `RUNNING` on `Service Automation`, you can deploy your model by following the steps below:

Select and execute the `Deploy_ML_Model` action from Actions to deploy your model.

Set the following variables:

[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
| `BASELINE_DATA_URL`
| URL of the dataset to be deployed and used in the data drift detection process.
| URL (default= \https://activeeon-public.s3.eu-west-2.amazonaws.com/datasets/baseline_data.csv)
| `MODEL_NAME`
| The name of the model to be deployed.
| String (default="iris_flowers_classifier")
| `MODEL_URL`
| A valid URL specified by the user referencing the model that needs to be deployed.
| URL (default= \https://activeeon-public.s3.eu-west-2.amazonaws.com/models/model.pkl)
| `MODEL_VERSION`
| The version number of the model that will be deployed.
| Integer (default=1)
| `USER_NAME`
| A valid username having the needed privileges to execute this action.
| String (default="user")
|===

Click on `Execute Action` and follow the progress of the model deployment.
Check that the status correctly evolves to `AI_MODEL_DEPLOYED`.
==== Delete/Finish or Update the Service Instance
You can delete the launched service instance directly from the Service Automation Portal:

Open the link:https://try.activeeon.com/automation-dashboard/#/portal/cloud-automation[Service Automation Portal].

Set the action `Finish` under Actions and click on `Execute Action`.

image::MAAS_ML_Delete_Service.PNG[align=center]

One more action can be executed from the Service Automation Portal:

- *Update_MaaS_ML_Parameters*: This action enables you to update the variable values associated with the MaaS_ML instance according to your new preferences.
==== MaaS_ML Analytics
When the MaaS_ML service instance is running, the user can access the MaaS_ML Analytics page by clicking on the instance's endpoint. The page contains four tabs:

** Audit and Traceability
** Dataset Analytics
** Data Drift Analytics
** Predictions Preview

===== Audit and Traceability

When clicking on the endpoint, the user is redirected to a web page with four tabs. By default, the Audit and Traceability tab is opened. On this page, the user can check the values chosen for the MaaS_ML instance variables. In addition, the MaaS_ML traceability information and warnings are listed in a table where each row records an event (initialization, deployment, prediction, etc.) together with its date and time. The figure below shows an overview of the Audit and Traceability tab.

image::traceability1.png[align=center]

===== Dataset Analytics

As MaaS_ML supports versioning, you are able to deploy multiple model versions for the same model type. When deploying different model versions, you have the possibility to associate each version with a subset of the data used to train the model i.e. the baseline data. The main job of the baseline data is to help in detecting drifts in the future input datasets. Data drift detection is detailed in <<_data_drift_detection_ddd>> subsection. Using the several baseline datasets, optionally, deployed with the different model versions, you are able to compare the changes occurring from one model version to another, specifically regarding the datasets used to train them.

As shown in the figure below, using the three dropdowns on the top of this tab page, you can choose the model name, the feature (or column) name you would like to monitor and the metric which is based on some data statistical functions (Mean, Minimum, Maximum, Variance, Standard Deviation). By choosing these three values, the first graph will show the evolution of the values (according to the chosen statistical function) of the chosen feature relative to the different model versions. You also have the possibility to monitor multiple features at the same time by choosing multiple feature names in the second dropdown. You can add or remove any of the displayed graphical lines using the features dropdown. Details about the obtained values are displayed by hovering over the markers on each graphical line.

If you click on one of these markers, a histogram will appear in the second graph of this tab page. The displayed histogram compares the probability density distributions of the values of the selected feature across all the deployed model versions. By clicking on the entries of the legend, you can include any of the model versions in the comparison or exclude them from it.

image::data_analytics_tab.png[align=center]
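
The per-version statistics plotted in the first graph can be reproduced offline to cross-check what the dashboard displays. The following is only a minimal sketch: it assumes the baseline values of one feature are available as plain lists keyed by model version. The `baselines` dictionary, its values, and the `feature_metrics` helper are hypothetical and not part of MaaS_ML, and the use of population (rather than sample) variance is also an assumption.

```python
import statistics

# Hypothetical baseline values of a single feature, keyed by model version.
baselines = {
    1: [5.1, 4.9, 4.7, 5.0],
    2: [5.4, 5.6, 5.2, 5.8],
}

def feature_metrics(values):
    """Return the five statistics offered in the metric dropdown."""
    return {
        "Mean": statistics.fmean(values),
        "Minimum": min(values),
        "Maximum": max(values),
        # Assumption: population variance and standard deviation.
        "Variance": statistics.pvariance(values),
        "Standard Deviation": statistics.pstdev(values),
    }

# One entry per model version, i.e. one marker per version on a graph line.
evolution = {version: feature_metrics(values) for version, values in baselines.items()}

for version, metrics in sorted(evolution.items()):
    print(version, round(metrics["Mean"], 3), round(metrics["Standard Deviation"], 3))
```

Comparing the entries of `evolution` from one version to the next gives the same kind of reading as following a line in the first graph.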

===== Data Drift Analytics
Coming soon!

===== Predictions Preview
When calling a deployed model of a specific version to obtain predictions, the user can choose to save the resulting predictions. The saved predictions can be previewed in the Predictions Preview tab page. You can choose the model name and the model version using the dropdowns at the top of the page, and the corresponding predictions dataframe is then previewed. The figure below shows an example of the previewed predictions.

image::predictions_tab1.png[align=center]
=== MaaS_ML Via Swagger UI
To access the Swagger UI, click on the "GO TO SWAGGER UI" button at the top of the Audit and Traceability tab in the MaaS_ML Analytics page.

Through this Swagger UI, you can:

** Ask for an api_token
** Deploy a model
** List the deployed models
** Make predictions
** Return the stored traceability information
** Remove a deployed model
** Update the service parameters
==== Deploy/Delete a Specific ML Model Version
You can also deploy a specific ML model using the Swagger UI:

Open the Swagger UI.

Select the `get_token` operation and get an api_token by entering your username (default value is `user`).

Select the `deploy` operation, set the provided token, and upload the model version that needs to be deployed.

image::MAAS_Deploy_Swagger.png[align=center]

Select `list_saved_models` to return the list of all already deployed models.

Select `delete_deployed_model` to remove a specific model version.
==== Call the Service for Predictions
Once the model is deployed, you can call the service for predictions using the Swagger UI:
Open the Swagger UI.

Select the `get_token` operation and get an api_token by entering your username (default value is `user`).

Select the `predict` operation and set the provided token, the distinct parameters (drift_enabled, drift_notification, detector, model_name, model_version, etc.) and the data that you need to score.

image::MAAS_Predict_Swagger.png[align=center]
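
The same prediction call can also be scripted outside the Swagger UI. The following is a minimal sketch, not the service's documented client. It assumes that the service is reachable at a hypothetical base URL, that the prediction path is `/api/predict` (the default value of `PREDICT_EXTENSION`), and that the token and parameters are sent together as a JSON body. The field names `api_token` and `dataframe_json` are assumptions; check the operation description in the Swagger UI for the actual parameter names and locations.

```python
import json
import urllib.request

def build_predict_request(base_url, api_token, rows, model_name, model_version=1,
                          drift_enabled=False, drift_notification=False,
                          detector="HDDM"):
    """Build (but do not send) a POST request for the prediction endpoint."""
    payload = {
        "api_token": api_token,        # assumed field name
        "dataframe_json": rows,        # assumed field name for the data to score
        "model_name": model_name,
        "model_version": model_version,
        "drift_enabled": drift_enabled,
        "drift_notification": drift_notification,
        "detector": detector,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_predict_request(
    "http://localhost:9090",           # hypothetical service address
    "my-token",                        # placeholder for the value returned by get_token
    [[5.1, 3.5, 1.4, 0.2]],
    model_name="iris_flowers_classifier",
)
print(request.get_method(), request.full_url)
# To actually send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before calling a running instance.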
=== Deployment Pipeline Examples
You can connect the different tasks in a single workflow to get the full pipeline from the model training step to the model deployment and consumption steps. Each task will propagate the acquired variables to its children tasks.
The following workflows are available on the `model_as_a_service` bucket:

- *Diabetics_Deploy_Predict_Classifier_Model:* trains a Diabetics Classifier based on a Random Forest Algorithm and then deploys this classifier in a MaaS_ML service instance.
- *IRIS_Deploy_Predict_Flower_Classifier_Model_Interactive:* trains an Iris Flower Classifier, starts a service instance where the trained model is deployed, and the input data is scored by consuming the endpoints exposed by the MaaS_ML service. The figure below describes this workflow.
- *IRIS_Deploy_Flower_Classifier_Model:* trains an Iris Flower Classifier and deploys it in a new service instance. This instance is stopped when the user triggers the signal through the Workflow Execution portal.

image::MAAS_ML_IRIS_Workflow_Example_Interactive.png[align=center]
=== Customize the Service

You can customize the default model service and adapt it to your specific needs by changing the following elements:

* The file specified in the PYTHON_ENTRYPOINT variable
* The file specified in the YAML_FILE variable
* The docker image specified in the DOCKER_IMAGE variable

The content of each element is described below:

*PYTHON_ENTRYPOINT file*: This Python script is the `ml_service.py` file stored in the catalog under the `model_as_a_service_resources` bucket.
This script defines the different functions needed to deploy the model, score data and generate tokens.
You can edit this script to tailor it to your model. The entry script must take into account the:

* List of users that are allowed to consume the service endpoints
* Format of the model expected by the deployment, and the prediction functions (e.g., pickle, joblib, etc.)
* Format of the incoming data (e.g., JSON, Array, Matrix, etc.)
* Data format expected by the model (e.g., JSON, Array, Matrix, etc.)
[NOTE]
====
The `model_as_a_service_resources` bucket can be found under the *Catalog* section in the *Automation Dashboard* portal.
====
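
As an illustration of the data-format points in the list above, an entry script usually has to convert the incoming payload into the structure the model expects before scoring it. The sketch below assumes that the request body is a JSON array of records and that the model expects a row-major matrix with a fixed column order. `FEATURE_ORDER` and `json_to_matrix` are hypothetical names introduced for this example; they are not functions of `ml_service.py`.

```python
import json

# Hypothetical column order the model was trained with.
FEATURE_ORDER = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

def json_to_matrix(body, feature_order=FEATURE_ORDER):
    """Convert a JSON array of records into a list of rows ordered by feature."""
    records = json.loads(body)
    matrix = []
    for index, record in enumerate(records):
        # Reject incomplete records early instead of scoring partial rows.
        missing = [name for name in feature_order if name not in record]
        if missing:
            raise ValueError(f"record {index} is missing features: {missing}")
        matrix.append([float(record[name]) for name in feature_order])
    return matrix

body = ('[{"petal_width": 0.2, "sepal_length": 5.1, '
        '"sepal_width": 3.5, "petal_length": 1.4}]')
matrix = json_to_matrix(body)
print(matrix)
```

Keeping this conversion in one function means that a change in the incoming format only affects one place in the entry script.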

*YAML_FILE file*: This YAML file is the `ml_service-api.yml` file stored in the catalog under the `model_as_a_service_resources` bucket.
It defines the OpenAPI specification describing the entire API built once a model service is started.
You can adapt and edit this script in order to customize your service.

*DOCKER_IMAGE name*: Choose your own image containing the different dependencies required to run your PYTHON_ENTRYPOINT script.
Activeeon provides a pre-built image https://github.com/ow2-proactive/docker/blob/master/dlm3/Dockerfile[activeeon/model_as_a_service] including different machine learning and deep learning libraries.
If you need to use your own Docker image to start the service, you need to install the following libraries in your image:

[source,bash]
----
# install java
apt-get update && apt-get install -y openjdk-11-jdk
apt-get install -y ca-certificates-java && update-ca-certificates -f
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
apt-get clean

# install python libraries (quotes keep the shell from expanding the brackets)
pip install "connexion[swagger-ui]" # <1>
pip install py4j # <2>

# install your dependent libraries
...
----
<1> _Connexion_ allows you to write an OpenAPI specification, then maps the endpoints to your Python functions.
<2> _py4j_ enables Python programs running in a Python interpreter to dynamically access Java objects in a Java Virtual Machine.
=== Data Drift Detection (DDD)
Data evolves over time, and these changes can degrade the
intrinsic characteristics and behavior of the learning model.
Data drift is one of the main reasons why model accuracy degrades over time.
Therefore, it is important that the model is able to adapt to these changes.
Monitoring data drift makes it possible to detect drops in model performance (as in the figure below) and to take
actions accordingly. To deal with this problem, we have integrated a data drift