:docinfo:
:toc:
:toc-title: PML User Guide
= ProActive Machine Learning

include::../common-settings.adoc[]
include::../all-doc-links.adoc[]


== Overview

=== What is ProActive Machine Learning (PML)?

include::references/Overview.adoc[]

=== Glossary

include::references/Glossary.adoc[]

== Get Started

To submit your first Machine Learning (ML) workflow to *ProActive Scheduler*, link:../admin/ProActiveAdminGuide.html#_run_the_proactive_scheduler[install] it in
your environment (default credentials: admin/admin) or just use our demo platform https://try.activeeon.com[try.activeeon.com^].

*ProActive Scheduler* provides comprehensive interfaces that allow users to:

- +++Create workflows using <a class="studioUrl" href="/studio" target="_blank">ProActive Workflow Studio</a>+++
- +++Submit workflows, monitor their execution and retrieve task results using <a class="schedulerUrl" href="/scheduler" target="_blank">ProActive Scheduler Portal</a>+++
- +++Add resources and monitor them using <a class="rmUrl" href="/rm" target="_blank">ProActive Resource Manager Portal</a>+++
- +++Version and share various objects using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/catalog-portal" target="_blank">ProActive Catalog Portal</a>+++
- +++Provide an end-user workflow submission interface using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/workflow-automation" target="_blank">Workflow Execution Portal</a>+++
- +++Generate metrics of multiple job executions using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/job-analytics" target="_blank">Job Analytics Portal</a>+++
- +++Plan workflow executions over time using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/job-planner-execution-planning" target="_blank">Job Planner Portal</a>+++
- +++Add services using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/service-automation" target="_blank">Service Automation Portal</a>+++
- +++Perform event-based scheduling using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/event-orchestration" target="_blank">Event Orchestration Portal</a>+++
- +++Control manual workflow validation steps using <a class="automationDashboardUrl" href="/automation-dashboard/#/portal/notification-portal" target="_blank">Notification Portal</a>+++

We also provide a +++<a class="restUrl" href="/rest" target="_blank">REST API</a>+++ and <<../user/ProActiveUserGuide.adoc#_scheduler_command_line,command line interfaces>> for advanced users.


== Create a First Predictive Solution

Suppose you need to predict house prices based on the following information (features) provided by the estate agency:

- *CRIM* per capita crime rate by town
- *ZN* proportion of residential land zoned for lots over 25,000 sq.ft.
- *INDUS* proportion of non-retail business acres per town
- *CHAS* Charles River dummy variable
- *NOX* nitric oxides concentration
- *RM* average number of rooms per dwelling
- *AGE* proportion of owner-occupied units built prior to 1940
- *DIS* weighted distances to five Boston employment centres
- *RAD* index of accessibility to radial highways
- *TAX* full-value property-tax rate per $10,000
- *PTRATIO* pupil-teacher ratio by town
- *B* 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- *LSTAT* % lower status of the population
- *MEDV* median value of owner-occupied homes in $1000's

Predicting house prices is a complex problem, but we can simplify it a bit for this step-by-step example. We'll show you how you can easily create a predictive analytics solution using PML.

=== Manage the Canvas

To use PML, you need to add the *Machine Learning Bucket* as the main catalog in the ProActive Studio. This bucket contains a set of generic tasks that enable you to upload and prepare data, train a model and test it.

A. Open the +++<a class="studioUrl" href="/studio" target="_blank">ProActive Workflow Studio</a>+++ home page.

B. Create a new workflow.

C. Change the palette preset to `Machine Learning`.

D. Click on the `machine-learning` catalog and pin it open, and do the same for the `data-visualization` catalog.

E. Organize your canvas.

NOTE: Changing the palette preset allows the user to visualise a different set of catalogs in the studio.

image::manage_canvas.gif[100000,2000]

=== Upload Data

To upload data into the Workflow, you need to use a dataset stored in a CSV file.

A. Once the dataset has been converted to *CSV* format, upload it into a cloud storage service, for example https://aws.amazon.com/s3[Amazon S3^].
For this tutorial, we will use the Boston house prices dataset available at this link:
https://s3.eu-west-2.amazonaws.com/activeeon-public/datasets/boston-houses-prices.csv

B. Drag and drop the <<Import_Data>> task from the *machine-learning* bucket into the ProActive Machine Learning canvas.

C. Click on the task and click `General Parameters` on the left to change the default parameters of this task.

D. Put in the *FILE_URL* variable the S3 link to your dataset.

E. Set the other parameters according to your dataset format.

This task uploads the data into the workflow so that we can use it for model training and testing.

If you want to skip these steps, you can directly use the <<Load_Boston_Dataset>> Task by a simple drag and drop.

image::upload_data.gif[100000,2000]
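
As a mental model, the <<Import_Data>> task parses the CSV into a tabular structure. The following standalone sketch does the same with Python's standard `csv` module on a one-row excerpt shaped like the Boston file (the column names, including `LABEL`, are illustrative assumptions; check the actual file header):

```python
import csv
import io

# One-row excerpt shaped like the Boston house prices CSV above.
# Import_Data conceptually parses the full file the same way.
sample = io.StringIO(
    "CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,LABEL\n"
    "0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0\n"
)
rows = list(csv.DictReader(sample))  # each row becomes a column-name -> value mapping
print(rows[0]["RM"])  # 6.575 (values are read as strings)
```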

=== Prepare Data

This step consists of preparing the data for the training and testing of the predictive model. In this example, we will simply split our dataset into two separate datasets: one for training and one for testing.

To do this, we use the <<Split_Data>> Task in the *machine-learning* bucket.

A. Drag and drop the <<Split_Data>> Task into the canvas, and connect it to the <<Import_Data>> or <<Load_Boston_Dataset>> Task.

B. By default, the ratio is 0.7, meaning that 70% of the dataset will be used for training the model and 30% for testing it.

C. Click the <<Split_Data>> Task and set the *TRAIN_SIZE* variable to 0.6.

image::prepare_data.gif[100000,2000]
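
Conceptually, the <<Split_Data>> Task shuffles the rows and slices them by *TRAIN_SIZE*. A minimal pure-Python sketch (PML's actual implementation may differ):

```python
import random

# Conceptual sketch of Split_Data: shuffle the rows, then slice by TRAIN_SIZE.
def split_data(rows, train_size=0.6, seed=42):
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_size)
    return rows[:cut], rows[cut:]

dataset = list(range(100))  # stand-in for 100 dataset rows
train, test = split_data(dataset, train_size=0.6)
print(len(train), len(test))  # 60 40
```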

=== Train a Predictive Model

Using PML, you can easily create different ML models in a single experiment and compare their results. This type of experimentation helps you find the best solution for your problem.
You can also enrich the `machine-learning` bucket by adding new ML algorithms, and publish or customize existing tasks according to your requirements, as the tasks are open source.

NOTE: To change the code of a task, click on it and then on `Task Implementation`. You can also add new variables to a specific task.

In this step, we will create two different types of models and then compare their scores to decide which algorithm is most suitable for our problem. The Boston dataset used in this example consists of predicting the price of houses (a continuous label), so we need to deal with a regression predictive problem.

To solve this problem, we have to choose a regression algorithm to train the predictive model. To see the regression algorithms available in PML, see the *ML Regression* section in the *machine-learning* bucket.

For this example, we will use the <<Linear_Regression>> Task and the <<Support_Vector_Regression>> Task.

A. Find the <<Linear_Regression>> Task and <<Support_Vector_Regression>> Task and drag them into the canvas.

B. Find the <<Train_Model>> Task, drag it twice into the canvas and set its *LABEL_COLUMN* variable to LABEL.

C. Connect the <<Split_Data>> Task to the two <<Train_Model>> Tasks in order to give them access to the training data. Then connect the <<Linear_Regression>> Task to the first <<Train_Model>> Task and the <<Support_Vector_Regression>> Task to the second <<Train_Model>> Task.

D. To be able to download the model learned by each algorithm, drag two <<Download_Model>> Tasks and connect them to each <<Train_Model>> Task.

image::train_a_predictive_model.png[100000,2000]
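
Outside PML, the two training branches boil down to fitting two regression estimators on the same training split. A minimal sketch, assuming scikit-learn is installed (the actual PML tasks may configure their estimators differently):

```python
# Fit a linear regression and a support vector regression on toy data,
# mirroring the two Train_Model branches of the workflow.
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

X_train = [[1.0], [2.0], [3.0], [4.0]]  # stand-in feature column
y_train = [2.0, 4.0, 6.0, 8.0]          # stand-in continuous label

lin_model = LinearRegression().fit(X_train, y_train)
svr_model = SVR(kernel="rbf").fit(X_train, y_train)

print(round(float(lin_model.predict([[5.0]])[0]), 1))  # 10.0 on this linear toy data
```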

=== Test the Predictive Model

To evaluate the two learned predictive models, we will use the testing data that was separated out by the <<Split_Data>> Task to score our trained models. We can then compare the results of the two models to see which generated better results.

A. Find the <<Predict_Model>> Task, drag and drop it twice into the canvas and set its *LABEL_COLUMN* variable to LABEL.

B. Connect the first <<Predict_Model>> Task to the <<Train_Model>> Task that is connected to the <<Support_Vector_Regression>> Task.

C. Connect the second <<Predict_Model>> Task to the <<Train_Model>> Task that is connected to the <<Linear_Regression>> Task.

D. Connect both <<Predict_Model>> Tasks to the <<Split_Data>> Task.

E. Find the <<Preview_Results>> Task in the ML bucket and drag and drop it twice into the canvas.

F. Connect each <<Preview_Results>> Task with a <<Predict_Model>> Task.

image::test_the_predictive_model.png[100000,2000]

NOTE: If you have a pickled file (.pkl) containing a predictive model that you have learned using another platform, and you need to test it in PML, you can load it using the *Import_Model* Task.
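
As a hedged sketch of what scoring on the held-out split looks like outside PML, and of what *Import_Model* conceptually does with a .pkl file (scikit-learn assumed; the PML tasks themselves may work differently):

```python
import pickle

from sklearn.linear_model import LinearRegression

X_train, y_train = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
X_test, y_test = [[4.0], [5.0]], [8.0, 10.0]

model = LinearRegression().fit(X_train, y_train)
blob = pickle.dumps(model)       # contents of a .pkl model file
restored = pickle.loads(blob)    # Import_Model conceptually performs this load

print(round(restored.score(X_test, y_test), 2))  # 1.0 — R^2 on this exact toy data
```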

=== Run the Experiment and Preview the Results

Now that the workflow is complete, let's execute it:

A. Click the *Execute* button on the menu to run the workflow.

B. Click the *Scheduling & Orchestration* button to track the workflow execution progress.

C. Click the *Visualization* tab and track the progress of your workflow execution (a green check mark appears on each Task when its execution is finished).

D. Visualize the output logs by clicking on the *Output* tab and checking the *Streaming* check box.

E. Click the *Tasks* tab, select a *Preview_Results* task and click on the *Preview* tab, then click either on *Open in browser* to preview the results in your browser or on *Save as file* to download the results locally.

image::execute.gif[100000,2000]

[[_AutoML]]
== Automated Machine Learning (AutoML)

The `auto-ml-optimization` bucket contains the `Distributed_Auto_ML` workflow that can be easily used to find the operating parameters for any system whose performance can be measured as a function of adjustable parameters.
It is an estimator that minimizes the posterior expected value of a loss function.
This bucket also comes with a set of workflow examples that demonstrate how to optimize mathematical functions, PML workflows and machine/deep learning algorithms from scripts using AutoML tuners.

In the following subsections, several tables present the main variables that characterize the AutoML workflows.
In addition to the variables mentioned below, there is a set of generic variables that are common to all workflows,
which can be found in the subsection <<AI Workflows Common Variables>>.

image::AutoML_1.png[align=center]

=== Distributed AutoML

The `Distributed_Auto_ML` workflow proposes six algorithms for distributed hyperparameter optimization. The choice of the
sampling/search strategy depends strongly on the tackled problem.
The `Distributed_Auto_ML` workflow comes with specific pipelines (parallel or sequential) and visualization tools
(https://github.com/fossasia/visdom[Visdom^] or https://www.tensorflow.org/tensorboard/[TensorBoard^]) as described in the subsections below.

image::AutoML_2.png[align=center]

*Variables:*

.Distributed_Auto_ML Variables
[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
| `TUNING_ALGORITHM`
| Specifies the tuner algorithm that will be used for hyperparameter optimization.
| List [Bayes, Grid, Random, QuasiRandom, CMAES, MOCMAES] (default=Random)
| `MAX_ITERATIONS`
| Specifies the maximum number of iterations. It should be an integer number higher than zero. Set `-1` for an infinite loop.
| Int (default=2)
| `PARALLEL_EXECUTIONS_PER_ITERATION`
| Specifies the number of parallel executions per iteration. It should be an integer number higher than zero.
| Int (default=2)
| `NUMBER_OF_REPETITIONS`
| Specifies the number of hyperparameter sampling repetitions. Ensures every experiment is repeated a given number of times. It should be an integer number higher than one. Set `-1` to never see repetitions.
| Int (default=-1)
| `PAUSE_AFTER_EVERY_ITERATIONS`
| If higher than zero, pauses the workflow after every specified number of iterations. Set `-1` to disable.
| Int (default=-1)
| `TARGET_WORKFLOW`
| Specifies the path in the catalog of the workflow that should be optimized.
| String (default=auto-ml-optimization/Himmelblau_Function)
| `TARGET_NATIVE_SCHEDULER`
| Name of the native scheduler node source to use on the target workflow tasks when deployed inside a cluster such as SLURM, LSF, etc.
| String (default=empty)
| `TARGET_NATIVE_SCHEDULER_PARAMS`
| Parameters given to the native scheduler (SLURM, LSF, etc.) while requesting a ProActive node used to deploy the target workflow tasks.
| String (default=empty)
| `TARGET_NODE_ACCESS_TOKEN`
| If not empty, the target workflow tasks will be run only on nodes that contain the specified token.
| String (default=empty)
| `TARGET_NODE_SOURCE_NAME`
| If not empty, the target workflow tasks will be run only on nodes belonging to the specified node source.
| String (default=empty)
| `TARGET_CONTAINER_PLATFORM`
| Specifies the container platform to be used for executing the target workflow tasks.
| List [no-container, docker, podman, singularity] (default=empty)
| `TARGET_CONTAINER_IMAGE`
| Specifies the name of the container image that will be used to run the target workflow tasks.
| List [docker://activeeon/dlm3, docker://activeeon/cuda, docker://activeeon/cuda2, docker://activeeon/rapidsai, docker://activeeon/nvidia:rapidsai, docker://activeeon/nvidia:pytorch, docker://activeeon/nvidia:tensorflow, docker://activeeon/tensorflow:latest, docker://activeeon/tensorflow:latest-gpu] (default=empty)
| `TARGET_CONTAINER_GPU_ENABLED`
| If True, activates the use of GPU for the target workflow tasks on the selected container platform.
| Boolean (default=empty)
| `TARGET_NVIDIA_RAPIDS_ENABLED`
| If True, activates the use of NVIDIA RAPIDS for the target workflow tasks on the selected container platform.
| Boolean (default=empty)
| `VISDOM_ENABLED`
| If True, the Visdom service is started, allowing the user to visualize the hyperparameter optimization using the Visdom web interface.
| Boolean (default=False)
| `VISDOM_PROXYFIED`
| If True, requests to Visdom are sent via a proxy server.
| Boolean (default=False)
| `TENSORBOARD_ENABLED`
| If True, the TensorBoard service is started, allowing the user to visualize the hyperparameter optimization using the TensorBoard web interface.
| Boolean (default=False)
| `TENSORBOARD_PROXYFIED`
| If True, requests to TensorBoard are sent via a proxy server.
| Boolean (default=False)
|===

image::AutoML_Full.png[align=center]

*How to define the search space:*

This subsection describes common building blocks to define a search space:

    - uniform: Uniform continuous distribution.
    - quantized_uniform: Uniform discrete distribution.
    - log: Logarithmic uniform continuous distribution.
    - quantized_log: Logarithmic uniform discrete distribution.
    - choice: Uniform choice distribution between non-numeric samples.
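
For illustration, a search space combining these building blocks might look as follows. This is a hypothetical sketch of the syntax, not the documented format; refer to the workflow examples shipped in the `auto-ml-optimization` bucket for the exact format expected by `Distributed_Auto_ML`:

```
{
  "x": uniform(-6, 6),
  "y": uniform(-6, 6),
  "optimizer": choice(["adam", "sgd", "rmsprop"])
}
```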

*Which tuner algorithm to choose?*

The choice of the tuner depends on the following aspects:

    - Time required to evaluate the model.
    - Number of hyperparameters to optimize.
    - Type of variable.
    - The size of the search space.

In the following, we briefly describe the different tuners proposed by the `Distributed_Auto_ML` workflow:

    - *Grid sampling* applies when all variables are discrete, and the number of possibilities is low. A grid search is a naive approach that will simply try all possibilities, making the search extremely long even for medium-sized problems.

    - *Random sampling* is an alternative to grid search when the number of discrete parameters to optimize and the time required for each evaluation are high. Random search picks points randomly from the configuration space.

    - *QuasiRandom sampling* ensures a much more uniform exploration of the search space than traditional pseudorandom sampling. Thus, quasi-random sampling is preferable when not all variables are discrete, the number of dimensions is high, and the time required to evaluate a solution is high.

    - *Bayes search* models the search space using Gaussian process regression, which allows an estimation of the loss function, and the uncertainty on that estimate, at every point of the search space. Modeling the search space suffers from the curse of dimensionality, which makes this method more suitable when the number of dimensions is low.

    - *CMAES search* (Covariance Matrix Adaptation Evolution Strategy) is one of the most powerful black-box optimization algorithms.
      However, it requires a significant number of model evaluations (in the order of 10 to 50 times the number of dimensions) to converge to an optimal solution. This search method is more suitable when the time required for a model evaluation is relatively low.

    - *MOCMAES search* (Multi-Objective Covariance Matrix Adaptation Evolution Strategy) is a multi-objective algorithm optimizing multiple tradeoffs simultaneously. To do that, MOCMAES employs several CMAES algorithms.

Here is a table that summarizes when to use each algorithm.
|===
| *Algorithm* | *Time* | *Dimensions* | *Continuity* | *Conditions* | *Multi-objective*
| `Grid`
| `Low`
| `Low`
| `Discrete`
| `Yes`
| `No`
| `Random`
| `High`
| `High`
| `Discrete`
| `Yes`
| `No`
| `QuasiRandom`
| `High`
| `High`
| `Mixed`
| `Yes`
| `No`
| `Bayes`
| `High`
| `Medium`
| `Mixed`
| `Yes`
| `No`
| `CMAES`
| `Low`
| `Low`
| `Mixed`
| `No`
| `No`
| `MOCMAES`
| `Low`
| `Low`
| `Mixed`
| `No`
| `Yes`
|===
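
To make the sampling idea concrete, here is an illustrative random-search loop on a toy one-dimensional objective. This is not `Distributed_Auto_ML`'s code, only the principle the Random tuner applies at a much larger scale:

```python
import random

# Draw candidate configurations, evaluate the loss, keep the best one.
def loss(x):  # toy objective standing in for a target workflow's score
    return (x - 3.0) ** 2

rng = random.Random(0)
candidates = [rng.uniform(-10.0, 10.0) for _ in range(1000)]
best = min(candidates, key=loss)
print(abs(best - 3.0) < 0.5)  # True — enough draws land near the optimum
```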

=== Objective Functions

The following workflows represent some mathematical functions that can be optimized by the `Distributed_Auto_ML` tuners.

*Himmelblau_Function:* is a multi-modal function containing four identical local minima. It is used to test the performance of optimization algorithms. For more info, please click https://en.wikipedia.org/wiki/Himmelblau%27s_function[here].

image::Himmelblau_Function.png[448,336,align=center]
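
For reference, Himmelblau's function is easy to write down; its minima all evaluate to zero, for example at (3, 2):

```python
def himmelblau(x, y):
    """Himmelblau's function: (x**2 + y - 11)**2 + (x + y**2 - 7)**2."""
    return (x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2

print(himmelblau(3.0, 2.0))  # 0.0 — one of the four minima
```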

=== Hyperparameter Optimization

The following workflows represent some machine learning and deep learning algorithms that can be optimized.
These workflows share several variables with `Distributed_Auto_ML`. Some workflows are characterized
by a few additional variables.

*CIFAR_10_Image_Classification:* trains a simple deep CNN on the CIFAR10 images dataset using the Keras library.

.CIFAR_10_Image_Classification Variables
[cols="2,5,5"]
|===
| *Variable name* | *Description* | *Type*
| `NUM_EPOCHS`
| The number of times data is passed forward and backward through the training algorithm.
| Integer (default=3)
| `INPUT_VARIABLES`
| A set of specific variables (usecase-related) that are used in the model training process.
| JSON format
| `SEARCH_SPACE`
| Specifies the representation of the search space, which has to be defined using dictionaries or by entering the path of a JSON file stored in the catalog.
| JSON format
| `INSTANCE_NAME`
| Specifies the name to be provided for the instance.
| String (default=tensorboard-server)
| `CONTAINER_LOG_PATH`
| Specifies the path where the docker logs are created and stored on the docker container.
| String (default=/graphs/$INSTANCE_NAME)
| `CONTAINER_ROOTLESS_ENABLED`
| If True, the user will be able to run the workflow in rootless mode.
| Boolean (default=True)
|===

The following workflows have common variables with the above illustrated workflows.

*CIFAR_10_Image_Classification:* trains a simple deep CNN on the CIFAR10 images dataset using the Keras library.

*CIFAR_100_Image_Classification:* trains a simple deep CNN on the CIFAR100 images dataset using the Keras library.

*Image_Object_Detection:* trains a YOLO model on the COCO dataset using PML deep learning generic tasks.

*Digits_Classification:* a Python script illustrating an example of multiple machine learning models optimization.

*Text_Generation:* trains a simple Long Short-Term Memory (LSTM) network to learn sequences of characters from 'The Alchemist', a novel by Brazilian author Paulo Coelho first published in 1988.

=== Neural Architecture Search

The following workflows define a search space containing a set of possible neural network architectures that can be used by `Distributed_Auto_ML` to automatically find the best combinations of neural architectures within the search space.

*Handwritten_Digit_Classification:* trains a simple deep CNN on the MNIST dataset using the PyTorch library. This example allows searching for two types of neural architectures defined in the Handwritten_Digit_Classification_Search_Space.json file.


=== Distributed Training

The following workflows illustrate some examples of multi-node and multi-GPU distributed learning.

*TensorFlow_Keras_Multi_Node_Multi_GPU:* is a TensorFlow + Keras workflow template for distributed training (multi-node, multi-GPU) with AutoML support.

*TensorFlow_Keras_Multi_GPU_Horovod:* is a Horovod workflow template that supports multi-GPU training and AutoML.

=== Templates

The following workflows represent Python and R templates that can be used to implement a generic machine learning task.

*Python_Task:* is a simple Python task template pre-configured to run with `Distributed_Auto_ML`.

*R_Task:* is a simple R task template pre-configured to run with `Distributed_Auto_ML`.

== Model as a Service for Machine Learning (MaaS_ML)

Once a predictive model is built, tested and validated, you can easily use it in real world production pipelines by deploying it as a REST Web Service via the MaaS_ML service.
MaaS_ML is dedicated to making deployments of lightweight machine learning (ML) models simple, portable and scalable, and to easily managing their lifetimes. This will be particularly useful for engineering or business teams that want to take advantage of this model.

The life cycle of any MaaS_ML instance (i.e., from starting the generic service instance, deploying an AI-specific model to pausing or deleting the instance) can be managed in three different ways in PML:

- Using the *Studio Portal*, and more specifically the bucket *model-as-a-service*, where specific generic tasks are provided to process all the possible actions (i.e., MaaS_ML_Service_Start, MaaS_ML_Deploy_Model, MaaS_ML_Call_Prediction, MaaS_ML_Actions[Finish/Pause/Resume]).
These tasks can be easily integrated into your AI pipelines/workflows, as you can see in this <<Deployment Pipeline Example>>.
- Using the *Service Automation Portal* by executing the different actions associated to MaaS_ML (i.e., Deploy_ML_Model, Pause_MaaS_ML, Update_MaaS_ML, Finish_MaaS_ML).
- Using the *Swagger UI*, which is accessible once the MaaS_ML instance is up and running.

Once a MaaS_ML instance is up and running, it can be used for:

- *AI Model Deployment or Update*: the user has to provide a valid specific AI model identifier in order to deploy the model of his/her choice.

- *Call of Predictions*: when a specific AI model is running, the user can request predictions for a specific payload. The latter has to be converted into JSON data in order to get prediction values.

- *Deploy a New Specific AI Model*: the running generic AI model can be used to deploy a new specific AI model.
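
For illustration, a prediction call from a client script might look like the following sketch. The endpoint path, header name and payload layout here are hypothetical assumptions, not the documented API; the exact routes are described by the Swagger UI of your running instance:

```python
import json

# Hypothetical values — copy the real endpoint and token from your running
# MaaS_ML instance (e.g. from the Swagger UI); they are NOT fixed by PML.
ENDPOINT = "https://try.activeeon.com/model-service/api/predict"
TOKEN = "<service-token>"

# Rows to score; MaaS_ML expects the payload converted to JSON data.
rows = [[0.02, 18.0, 2.31, 0, 0.54, 6.5, 65.2, 4.09, 1, 296, 15.3, 396.9, 4.98]]
payload = json.dumps({"data": rows})

# The actual HTTP call could then be made with any client, e.g. `requests`:
#   resp = requests.post(ENDPOINT, data=payload,
#                        headers={"Content-Type": "application/json",
#                                 "token": TOKEN})
#   predictions = resp.json()

print(json.loads(payload)["data"][0][0])  # 0.02 — the payload round-trips as valid JSON
```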

Using MaaS_ML, you can easily deploy and use any machine learning model as a REST Web Service on a physical or a virtual compute host on which there is an available ProActive Node. Going through the ProActive Scheduler,
you can also trigger the deployment of a specific VM using the Resource Manager elastic policies, and, eventually, deploy a Model-Service on that specific node.

In the following subsections, we will illustrate the MaaS_ML instance life cycle, from starting the generic service instance,
deploying a specific model, pausing it, to deleting the instance. We will also describe how the MaaS_ML instance life cycle
can be managed via four different ways in PML:

. <<MaaS_ML Via Workflow Execution Portal>>
. <<MaaS_ML Via Studio Portal>>
. <<MaaS_ML Via Service Automation Portal>>
. <<MaaS_ML Via Swagger UI>>

In the description below, multiple tables present the main variables that characterize the MaaS_ML workflows.
In addition to the variables mentioned below, there is a set of generic variables that are common to all workflows,
which can be found in the subsection <<AI Workflows Common Variables>>.

The management of the life cycle of MaaS_ML will be detailed in the next subsections.

=== MaaS_ML Via Workflow Execution Portal

Open the link:https://try.activeeon.com/automation-dashboard/#/portal/workflow-execution[Workflow Execution Portal].

Click on the *Submit a Job* button and then search for the *MaaS_ML_Service* workflow as described in the image below.

image::MaaS_ML_Search.png[align=center]

Check the service parameters and click on the *Submit* button to start a MaaS_ML service instance.

To get more information about the parameters of the service, please check the section <<Start a Generic Service Instance>>.

image::MaaS_ML_Submit.png[align=center]

You can now monitor the service status, access its endpoint and execute its different actions:

- Deploy_ML_Model: enables you to deploy a trained ML model in one click.
- Update_MaaS_ML_Parameters: enables you to update the parameters of the service instance.
- Finish_MaaS_ML: stops and deletes the service instance.

image::MaaS_ML_Workflow_Management.png[align=center]

When you are done with the service instance, you can terminate it by clicking on the *Terminate_Job_and_Service* button as shown in the image below.

image::Terminate_MaaS_ML.png[align=center]

=== MaaS_ML Via Studio Portal
==== Start a Generic Service Instance

Open the link:https://try.activeeon.com/studio[Studio Portal].

Create a new workflow.

Add the `model_as_a_service` bucket by clicking in the `View` menu field > `Add Bucket Menu to the Palette` > `model_as_a_service`.

Drag and drop the `MaaS_ML_Service_Start` task from the bucket.

Execute the workflow, setting the different workflow variables as described in the table below.

.MaaS_ML_Service_Start variables
487
488
489
490
[cols="2,5,2"]
|===
|*Variable name* | *Description* | *Type*
3+^|*Workflow variables*
491
492
| `MODEL_SERVICE_INSTANCE_NAME`
| Service instance name.
493
| String (default="maas_ml-${PA_JOB_ID}").
494
| `MODEL_SERVICE_PROXIFIED`
495
| Allows access to the endpoint through a HTTP(s) Proxy.
496
497
| Boolean (default=False).
| `MODEL_SERVICE_ENTRYPOINT`
498
| This entry script starts the service and defines the different functions to deploy the model, scores the prediction requests based on the deployed model, and returns the results. This script is specific to your model. This file should be stored in the Catalog under the `model_as_service_resources` bucket. More information about this file can be found in the <<_customize_the_service>> section.
499
| String (default="ml_service").
500
| `MODEL_SERVICE_YAML_FILE`
501
| A YAML file that describes the OpenAPI Specification ver. 2 (known as Swagger Spec) of the service. This file should be stored in the catalog under the `model_as_service_resources` bucket. More information about the structure of this file can be found in the section <<_customize_the_service>>.
502
| String (default="ml_service-api").
503
504
505
506
| `MODEL_SERVICE_USER_NAME`
| A valid user name having the needed privileges to execute this action.
| String (default="user").
| `MODEL_SERVICE_NODE_NAME`
507
508
| The name of the node where the service will be deployed. If empty, the service will be deployed on an available node selected randomly.
| String (default=Empty)
509
510
511
512
513
514
| `USE_NVIDIA_RAPIDS`
| If True, the service will be configured to use the GPU and the Nvidia Rapids library.
| Boolean (default=False)
| `DRIFT_ENABLED`
| True if a detector is needed to check for drifts in the input datasets compared to the training datasets.
| Boolean (default=True)
515
516
517
3+^|*Task variables*
| `SERVICE_ID`
| The name of the service. Please keep the default value for this variable.
518
| String (default="MaaS_ML")
519
520
| `INSTANCE_NAME`
| The name of the service that will be deployed.
521
| String (default="maas-ml-${PA_JOB_ID}")
522
523
524
| `ENGINE`
| Container engine.
| String (default="$CONTAINER_PLATFORM")
525
526
527
| `GPU_ENABLED`
| If True, the service will be configured to use the GPU and the Nvidia Rapids library.
| Boolean (default=False)
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
| `PROXIFIED`
| Takes by default the value of the `MODEL_SERVICE_PROXYFIED` workflow variable.
| String (default="$MODEL_SERVICE_PROXYFIED")
| `PYTHON_ENTRYPOINT`
| Takes by default the value of the `MODEL_SERVICE_ENTRYPOINT` workflow variable.
| String (default="$MODEL_SERVICE_ENTRYPOINT")
| `YAML_FILE`
| Takes by default the value of the `MODEL_SERVICE_YAML_FILE` workflow variable.
| String (default="$MODEL_SERVICE_YAML_FILE")
| `USER_NAME`
| Takes by default the value of the `MODEL_SERVICE_USER_NAME` workflow variable.
| String (default="$MODEL_SERVICE_USER_NAME")
| `NODE_NAME`
| Takes by default the value of the `MODEL_SERVICE_NODE_NAME` workflow variable.
| String (default="$MODEL_SERVICE_NODE_NAME")
| `GPU_ENABLED`
| If True, the service will be configured to use the GPU and the Nvidia Rapids library.
| Boolean (default=$USE_NVIDIA_RAPIDS)
|===

==== Deploy a Specific ML Model

You can also deploy a specific ML model directly from the link:https://try.activeeon.com/studio[Studio Portal].

Drag and drop the `MaaS_ML_Deploy_Model` task from the Model-As-A-Service bucket.

Execute the workflow and set the different workflow's variables as follows:

.MaaS_ML_Deploy_Model variables
[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
3+^|*Workflow variables*
| `CONTAINER_PLATFORM`
| Specifies the type of container platform to be used (no container, docker, singularity, or podman).
| String (default=docker)
| `CONTAINER_GPU_ENABLED`
| If True, containers will run based on images containing libraries that are compatible with GPU.
| Boolean (default=False)
| `CONTAINER_IMAGE`
| Specifies the name of the image that will be used to run the different workflow tasks.
| String (default=Empty).
| `SERVICE_TOKEN`
| A valid token generated by the MaaS_ML Service for user authentication.
| String (default=Empty).
3+^|*Task variables*
| `DEPLOY_MODEL_ENDPOINT`
| A URL endpoint defined by the user where the AI Model was deployed.
| URL (default=Empty).
| `API_EXTENSION`
| The base path to access the deployment endpoint.
| String (default="/api/deploy")
| `MODEL_URL`
| A valid URL specified by the user referencing the model that needs to be deployed.
| URL (default=\https://activeeon-public.s3.eu-west-2.amazonaws.com/models)
| `SERVICE_TOKEN`
| A valid token generated by the MaaS_ML Service for user authentication.
| String (default=Empty).
| `DRIFT_DETECTION_WINDOW_SIZE`
| The size of the data to be extracted from the old training dataset to be used as a baseline data for the drift detection.
| Integer (default=50).
| `LOG_PREDICTIONS`
| If True, the predictions made by the model will be displayed in the traceability page.
| Boolean (default=True).
|===
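In essence, the `MaaS_ML_Deploy_Model` task issues an HTTP POST to `DEPLOY_MODEL_ENDPOINT` + `API_EXTENSION`, passing the model URL and the service token. A minimal client-side sketch of composing that request (the payload field names here are illustrative assumptions, not the service's exact wire format):

```python
# Sketch of composing a deployment request like MaaS_ML_Deploy_Model does.
# Field names "model_url" and "api_token" are assumptions for illustration.


def build_deploy_request(endpoint, api_extension, model_url, token):
    """Compose the URL and payload for a model deployment call."""
    url = endpoint.rstrip("/") + api_extension
    payload = {"model_url": model_url, "api_token": token}
    return url, payload


url, payload = build_deploy_request(
    "http://localhost:9090",  # DEPLOY_MODEL_ENDPOINT (example value)
    "/api/deploy",            # API_EXTENSION
    "https://activeeon-public.s3.eu-west-2.amazonaws.com/models/model.pkl",
    "my-service-token",       # SERVICE_TOKEN
)
# Sending it would then look like:  requests.post(url, data=payload)
```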

==== Call the Service for Prediction

Once the model is deployed, you can also call the service for prediction directly from the link:https://try.activeeon.com/studio[Studio Portal].

Drag and drop the `MaaS_ML_Call_Prediction` task from the Model-As-A-Service bucket.

Execute the workflow and set the different workflow's variables as follows:

.MaaS_ML_Call_Prediction variables
[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
3+^|*Workflow variables*
| `CONTAINER_PLATFORM`
| Specifies the type of container platform to be used (no container, docker, singularity, or podman).
| String (default=docker)
| `CONTAINER_GPU_ENABLED`
| If True, containers will run based on images containing libraries that are compatible with GPU.
| Boolean (default=False)
| `CONTAINER_IMAGE`
| Specifies the name of the image that will be used to run the different workflow tasks.
| String (default=Empty).
| `SERVICE_TOKEN`
| A valid token generated by the MaaS_ML Service for user authentication.
| String (default=Empty).
3+^|*Task variables*
| `PREDICT_MODEL_ENDPOINT`
| The endpoint of the started service.
| URL (default=Empty)
| `SERVICE_TOKEN`
| A valid token generated by the MaaS_ML Service for user authentication.
| String (default=Empty).
| `PREDICT_EXTENSION`
| The base path to access the prediction endpoint.
| String (default="/api/predict")
| `INPUT_DATA`
| Entry data that needs to be scored by the deployed model.
| JSON (default=Empty)
| `LABEL_COLUMN`
| Name of the label column. It needs to be set if data is labeled.
| String (default=Empty)
| `DATA_DRIFT_DETECTOR`
| Name of the data drift detector to be used in the drift detection process.
| List [HDDM, Page Hinkley, ADWIN] (default="HDDM")
|===
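Conceptually, this task sends `INPUT_DATA` as JSON to the started service at `PREDICT_MODEL_ENDPOINT` + `PREDICT_EXTENSION`; when the data is labeled, `LABEL_COLUMN` identifies the column to separate from the features before scoring. A minimal sketch of that client-side preparation (the payload layout is an assumption for illustration, not the exact API):

```python
import json

# Sketch of preparing INPUT_DATA for the /api/predict endpoint.
# The {"data": ..., "labels": ...} layout is an illustrative assumption.


def prepare_prediction_payload(records, label_column=None):
    """Split labeled records into features and labels, JSON-encoded.

    `records` is a list of dicts (one per row). If `label_column` is set,
    that column is removed from the features, mirroring how LABEL_COLUMN
    is used when the input data is labeled.
    """
    labels = None
    if label_column:
        labels = [r[label_column] for r in records]
        records = [{k: v for k, v in r.items() if k != label_column}
                   for r in records]
    return json.dumps({"data": records, "labels": labels})


payload = prepare_prediction_payload(
    [{"sepal_length": 5.1, "species": "setosa"}],
    label_column="species",
)
```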

==== Delete/Finish the Service

You can also delete the service instance using the link:https://try.activeeon.com/studio[Studio Portal].

Drag and drop the `MaaS_ML_Actions` task from the Model-As-A-Service bucket.

Execute the workflow and set the different workflow's variables as follows:

.MaaS_ML_Actions variables
[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
3+^|*Task variables*
| `ACTION`
| The action that will be processed regarding the service status.
| List [Pause_MaaS_ML, Resume_MaaS_ML, Finish_MaaS_ML] (default="Finish_MaaS_ML")
| `INSTANCE_NAME`
| The name of the service that the action will be processed on.
| String (default="maas-ml-${PA_JOB_ID}")
| `INSTANCE_ID`
| The service instance ID.
| String (default=Empty)
|===

=== MaaS_ML Via Service Automation Portal
==== Start a Generic Service Instance


Open the link:https://try.activeeon.com/automation-dashboard/#/portal/service-automation[Service Automation Portal].

Search for `MaaS_ML` in the Services Workflows list.

Set the following variables:

[#table_MaaS_ML]
.MaaS_ML variables
[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
| `BUILD_IMAGE_IF_NOT_EXISTS`
| Pull and build the singularity image if the Singularity Image File (SIF) is not available.
| Boolean (default=True)
| `DEBUG_ENABLED`
| If True, the user will be able to examine the stream of output results of each task.
| Boolean (default=True)
| `DOCKER_IMAGE`
| Specifies the name of the Docker image that will be used to run the different workflow tasks.
| String (default="activeeon/maas_ml")
| `DRIFT_ENABLED`
| True if a detector is needed to check for drifts in the input datasets compared to the training datasets.
| Boolean (default=True)
| `DRIFT_THRESHOLD`
| The level or point at which the data drift is detected, and the user is notified.
| Float (default=1.9)
| `ENDPOINT_ID`
| The endpoint_id that will be used if `PROXYFIED` is set to True.
| String (default="maas-ml-gui")
| `ENGINE`
| Container engine.
| List (default="docker")
| `GPU_ENABLED`
| If True, the service will be configured to use the GPU and the Nvidia Rapids library.
| Boolean (default=False)
| `HTTPS_ENABLED`
| True if the protocol https is needed for the defined model-service.
| Boolean (default=False)
| `INSTANCE_NAME`
| The name of the service that will be deployed.
| String (default="maas-ml")
| `NODE_NAME`
| The name of the node where the service will be deployed. If empty, the service will be deployed on an available node selected randomly.
| String (default=Empty)
| `PROXYFIED`
| True if a proxy is needed to protect the access to this model-service endpoint.
| Boolean (default=False)
| `PYTHON_ENTRYPOINT`
| This entry script starts the service and defines the different functions to deploy the model, scores the prediction requests based on the deployed model, and returns the results. This script is specific to your model. This file should be stored in the Catalog under the `model_as_service_resources` bucket. More information about this file can be found in the <<_customize_the_service>> section.
| String (default="ml_service").
| `SERVICE_PORT`
| Controls the port used to start the Model Service from Service Automation Portal. -1 for random port allocation.
| Integer (default="-1").
| `SINGULARITY_IMAGE_PATH`
| Location of the singularity image on the node file system (this path will be used to either store the singularity image or the image will be directly used if the file is present).
| String (default="/tmp/maas_ml.sif")
| `TRACE_ENABLED`
| True if the user wants to keep a trace on the different changes occurring in the service.
| Boolean (default=True)
| `YAML_FILE`
| A YAML file that describes the OpenAPI Specification ver. 2 (known as Swagger Spec) of the service. This file should be stored in the catalog under the `model_as_service_resources` bucket. More information about the structure of this file can be found in the section <<_customize_the_service>>.
| String (default="ml_service-api").
|===

Click on `Execute Action` and follow the progress of the service creation.

image::MAAS_ML_Service_PSA.PNG[500,500,align=center]

==== Deploy a Specific ML Model

Once the status of your generic model service is displayed as `RUNNING` on `Service Automation`, you can deploy your model by following the steps below:

Select and execute the `Deploy_ML_Model` from Actions to deploy your model.

Set the following variables:

.Deploy_ML_Model variables
[cols="2,5,2"]
|===
| *Variable name* | *Description* | *Type*
| `BASELINE_DATA_URL`
| URL of the dataset to be deployed and used in the data drift detection process.
| URL (default= \https://activeeon-public.s3.eu-west-2.amazonaws.com/datasets/baseline_data.csv)
| `DEVIATION_DETECTION`
| If True, the data drift will be detected, and the user will be notified about the drift.
| Boolean (default=True)
| `DEVIATION_THRESHOLD`
| It represents the data drift threshold, the level or point at which the data drift is detected, and the user is notified.
| Float (default=1.9)
| `LOGGING_PREDICTION`
| If True, the predictions will be stored, and the user will be able to preview them.
| Boolean (default=True)
| `MODEL_URL`
| A valid URL specified by the user referencing the model that needs to be deployed.
| URL (default= \https://activeeon-public.s3.eu-west-2.amazonaws.com/models/model.pkl)
| `MODEL_METADATA`
| This variable contains statistical features, such as the mean and the standard deviation, extracted from the training data used to build the model to be deployed. This metadata is used for drift detection.
| Numerical vector (default= [[5.8216666667,3.0658333333,3.695,1.1766666667],[0.8128364419,0.4385797999,1.7614380107,0.7581194484]])
| `USER_NAME`
| A valid username having the needed privileges to execute this action.
| String (default="user")
|===
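`MODEL_METADATA` packs per-feature means and standard deviations of the training data into a 2×n vector, and `DEVIATION_THRESHOLD` bounds how far incoming data may deviate before a drift is flagged. A simplified sketch of both sides using plain z-scores (the service's actual detectors, such as HDDM, are more elaborate):

```python
import statistics

# Illustrative sketch: a z-score drift check is an assumption here, not
# the exact algorithm used by the MaaS_ML drift detectors.


def model_metadata(training_rows):
    """Per-column [means, standard deviations], MODEL_METADATA-style."""
    columns = list(zip(*training_rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [means, stds]


def drifted(batch, metadata, threshold=1.9):
    """Flag drift when any column mean is > threshold std devs away."""
    means, stds = metadata
    for col, mu, sigma in zip(zip(*batch), means, stds):
        batch_mu = statistics.mean(col)
        if sigma > 0 and abs(batch_mu - mu) / sigma > threshold:
            return True
    return False
```

The default `DEVIATION_THRESHOLD` of 1.9 corresponds to `threshold` above: batches whose column means stay within 1.9 baseline standard deviations are considered drift-free.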

Click on `Execute Action` and follow the progress of the model deployment.

Check that the status correctly evolves to `AI_MODEL_DEPLOYED`.


==== Delete/Finish or Update the Service Instance


You can delete the launched service instance directly from Service Automation Portal:

Open the link:https://try.activeeon.com/automation-dashboard/#/portal/cloud-automation[Service Automation Portal].

Set the action `Finish` under Actions and click on `Execute Action`.


image::MAAS_ML_Delete_Service.PNG[align=center]

One more action can be executed from the Service Automation Portal:

- *Update_MaaS_ML_Parameters*: This action enables you to update the variable values associated with the MaaS_ML instance according to your new preferences.

==== Audit and Traceability
To access the Audit and Traceability page, click on the endpoint under the Endpoint list.