Commit 4b0be27e authored by Fabien Viale, committed by GitHub

Merge pull request #803 from fviale/master

update housekeeping documentation
parents 8c023069 7f26eb2f
......@@ -2759,7 +2759,7 @@ proactive-node -Dpas.launcher.forkas.method=key
----
A SSH key can be tied to the user's account and used to impersonate the user when running a task on a given machine (using SSH).
Additionnally:
Additionally:
- The `.ssh/authorized_keys` files of all machines must be configured to accept this SSH key.
- The SSH key must not contain a *passphrase*.
......@@ -3288,35 +3288,6 @@ that automatically adds indexes on foreign keys, it is strongly recommended to
ask your DBA to identify and delete duplicate indexes created by the database
system since they may hurt performances.
==== Housekeeping
The Scheduler provides a housekeeping mechanism that periodically removes finished jobs
from the Scheduler Portal. It can also remove jobs and all its data from the database to save space.
This mechanism has two phases:
. Once a job is finished, it sets its scheduled time for removal (the job expiration date)
. Actual cleaning of expired jobs from the scheduler and/or the database will be periodically triggered and performed in a bulk operation
You can configure housekeeping with the following parameters located in <<Scheduler Properties,the scheduler config file>>:
* `pa.scheduler.core.automaticremovejobdelay` : delay in seconds after job termination to remove the job.
* `pa.scheduler.core.removejobdelay` : instead of using job termination time to initiate the delay, use the result retrieval time (for example, when a user opens the job results in the scheduler portal).
TIP: Additionnally, you can configure a remove delay for individual job execution using thee `REMOVE_DELAY` <<../user/ProActiveUserGuide.adoc#_remove_delay,generic information>>.
If the `pa.scheduler.core.automaticremovejobdelay` setting is set to `0` (the default), the Housekeeping mechanism is *disabled* (no occurrence of housekeeping).
IMPORTANT: Please note that the housekeeping mechanism is not applied to the jobs
which were already in the database before housekeeping was switched on.
Similarly, a modification of the `automaticremovejobdelay` or `removejobdelay` settings will not be retroactively applied to already finished jobs.
The housekeeping mechanism is periodically triggered through the cron
expression `pa.scheduler.core.automaticremovejobcronexpression`.
As the boolean `pa.scheduler.job.removeFromDataBase` property is set to `true` by default,
a query is executed in the scheduler database to remove all jobs that qualify for the bulk removal.
IMPORTANT: The support of `pa.scheduler.job.removeFromDataBase=false` has been discontinued in ProActive Scheduler version `10.1.0`.
==== Database impact on job submission performance
The choice of the database provider can influence the Scheduler and Resource Manager performance overall.
......@@ -3382,6 +3353,40 @@ nofile soft/hard 65536
In addition, we recommend to have at least 16GB RAM for 15.000 nodes.
=== Housekeeping
The Scheduler provides a housekeeping mechanism that periodically removes finished jobs
from the Scheduler and Workflow Execution portals. It also removes associated logs and all job data from the database to save space.
This mechanism has two phases:
. Once a job is finished, the mechanism sets the job's scheduled time for removal (the job expiration date)
. Expired jobs are then periodically cleaned from the scheduler and/or the database in a bulk operation
You can configure housekeeping with the following parameters located in <<Scheduler Properties,the scheduler config file>>:
* `pa.scheduler.core.automaticremovejobdelay` : delay in seconds to remove a terminated job.
* `pa.scheduler.core.automaticremove.errorjob.delay` : delay in seconds to remove a job terminated with errors.
* `pa.scheduler.core.removejobdelay` : instead of using job termination time to initiate the delay, use the result retrieval time (for example, when a user opens the job results in the scheduler portal).
TIP: Additionally, you can configure a remove delay for individual job execution using the <<../user/ProActiveUserGuide.adoc#_remove_delay,`REMOVE_DELAY`>> or <<../user/ProActiveUserGuide.adoc#_remove_delay_on_error,`REMOVE_DELAY_ON_ERROR`>> generic information.
If the `pa.scheduler.core.automaticremovejobdelay` setting is set to `0` (the default), the Housekeeping mechanism is *disabled* (no occurrence of housekeeping).
If the `pa.scheduler.core.automaticremove.errorjob.delay` setting is set to `0` (the default), removal of finished jobs containing errors uses the `pa.scheduler.core.automaticremovejobdelay` configuration.
The `errorjob.delay` setting is generally used in production environments for audit purposes. It keeps jobs containing errors in the database longer.
IMPORTANT: Please note that the housekeeping mechanism is not applied to the jobs
which were already in the database before housekeeping was switched on.
Similarly, a modification of the `automaticremovejobdelay`, `automaticremove.errorjob.delay` or `removejobdelay` settings will not be retroactively applied to already finished jobs.
The housekeeping mechanism is periodically triggered through the cron
expression `pa.scheduler.core.automaticremovejobcronexpression`.
As the boolean `pa.scheduler.job.removeFromDataBase` property is set to `true` by default,
a query is executed in the scheduler database to remove all jobs that qualify for the bulk removal.
IMPORTANT: The support of `pa.scheduler.job.removeFromDataBase=false` has been discontinued in ProActive Scheduler version `10.1.0`.
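
The interaction between the two delay settings described above can be summarized in a short sketch. This is an illustration only, not the Scheduler's implementation: the function name `effective_remove_delay` and its parameters are hypothetical, and the `removejobdelay` variant (delay counted from result retrieval) is deliberately left out.

```python
from typing import Optional


def effective_remove_delay(automatic_remove_job_delay: int,
                           error_job_delay: int,
                           job_has_errors: bool) -> Optional[int]:
    """Return the removal delay in seconds, or None if housekeeping is disabled.

    Hypothetical helper mirroring the documented rules for
    pa.scheduler.core.automaticremovejobdelay and
    pa.scheduler.core.automaticremove.errorjob.delay.
    """
    # A delay of 0 disables housekeeping; the error delay is then ignored.
    if automatic_remove_job_delay == 0:
        return None
    # A non-zero error delay applies only to jobs that terminated with errors.
    if job_has_errors and error_job_delay > 0:
        return error_job_delay
    # Otherwise (including an error delay of 0) the general delay is used.
    return automatic_remove_job_delay
```

For example, with a general delay of one day (`86400`) and an error delay of one week (`604800`), a job that finished with errors would be kept for a week under this sketch, while a successful job would be kept for a day.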
== Nodes and Task Recovery
......
......@@ -161,6 +161,12 @@ pa.scheduler.stax.job.cache=5000
# Size of the cache used to ensure that delayed jobs or tasks are scheduled at the precise date (without skipping seconds)
pa.scheduler.startat.cache=5000
# Expiration period in seconds of cache used to download workflows
pa.scheduler.download.cache.expiration=60
# Expiration period in seconds of cache used to store session ids
pa.scheduler.method.session.cache.expiration=300
# Period of the HSQLDB monitoring thread (in seconds)
pa.scheduler.hsqldb.monitor.period=10
......@@ -176,6 +182,12 @@ pa.scheduler.core.removejobdelay=0
# Set this time to 0 if you don't want the job to be removed automatically.
pa.scheduler.core.automaticremovejobdelay=0
# Automatic remove error job delay (in seconds). (The time between the termination of a job which contains errors and removing it from the scheduler)
# A job is considered to have errors if its number of failed or faulty tasks is greater than 0.
# This setting is ignored if automaticremovejobdelay is set to 0
# Set this time to 0 if you want to apply the same configuration as automaticremovejobdelay.
pa.scheduler.core.automaticremove.errorjob.delay=0
# Remove job in database when removing it from the scheduler.
# This housekeeping feature can be replaced by a stored procedure
# that runs at the desired period of time (e.g. non-business hours)
......
......@@ -52,7 +52,11 @@ The following table describes all available generic information:
|job-level
|JOB_PENDING_TO_RUNNING, JOB_RUNNING_TO_FINISHED
|<<_remove_delay,REMOVE_DELAY>>
|once the job is terminated, this setting controls the delay after which the will be removed from the scheduler database
|once the job is terminated, this setting controls the delay after which it will be removed from the scheduler database
|job-level
|3d 12h
|<<_remove_delay_on_error,REMOVE_DELAY_ON_ERROR>>
|once the job is terminated with errors, this setting controls the delay after which it will be removed from the scheduler database. This generic information should be set in addition to REMOVE_DELAY when there is a need to keep the job longer in the scheduler database in case of error.
|job-level
|3d 12h
|<<_earliest_deadline_first_policy,JOB_DDL>>
......@@ -249,7 +253,7 @@ See <<../user/ProActiveUserGuide.adoc#_get_notifications_on_job_events,Get Notif
==== REMOVE_DELAY
The `REMOVE_DELAY` Generic Information can be used to control when a job is removed from the scheduler database after its termination.
The `REMOVE_DELAY` generic information can be used to control when a job is removed from the scheduler database after its termination.
The <<../admin/ProActiveAdminGuide.adoc#_housekeeping,housekeeping mechanism>> must be configured to allow usage of `REMOVE_DELAY`.
......@@ -266,6 +270,25 @@ The format allows flexible combinations of the elements:
`REMOVE_DELAY` can be defined at *job-level* only.
==== REMOVE_DELAY_ON_ERROR
The `REMOVE_DELAY_ON_ERROR` generic information can be used to control when a job is removed from the scheduler database after its termination, if the job has terminated with errors.
The <<../admin/ProActiveAdminGuide.adoc#_housekeeping,housekeeping mechanism>> must be configured to allow usage of `REMOVE_DELAY_ON_ERROR`.
`REMOVE_DELAY_ON_ERROR` overrides the global `pa.scheduler.core.automaticremove.errorjob.delay` setting for a particular job.
It allows a job to be removed either *before* or *after* the delay configured globally on the server.
The general format of the `REMOVE_DELAY_ON_ERROR` generic information is `VVd XXh YYm ZZs`, where VV is the number of days, XX the number of hours, YY the number of minutes and ZZ the number of seconds.
The format allows flexible combinations of the elements:
* `12d 1h 10m` : 12 days, 1 hour and 10 minutes.
* `26h` : 26 hours.
* `120m 12s` : 120 minutes and 12 seconds.
`REMOVE_DELAY_ON_ERROR` can be defined at *job-level* only.
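
To make the `VVd XXh YYm ZZs` format concrete, the following sketch converts such a value into seconds. It is not the Scheduler's own parser: the function name `parse_remove_delay` is hypothetical, and the sketch assumes that elements are separated by whitespace and that each element is an integer followed by one of `d`, `h`, `m` or `s`. The real parser may accept or reject other forms.

```python
import re

# Seconds per unit letter used in the delay format.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
# One element: an integer immediately followed by a unit letter.
_ELEMENT = re.compile(r"(\d+)([dhms])")


def parse_remove_delay(value: str) -> int:
    """Convert a delay such as '12d 1h 10m' into a number of seconds."""
    elements = value.split()
    if not elements:
        raise ValueError("empty delay")
    total = 0
    for element in elements:
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"invalid delay element: {element!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total
```

Under these assumptions, `parse_remove_delay("12d 1h 10m")` evaluates to `1041000` seconds, `parse_remove_delay("26h")` to `93600`, and `parse_remove_delay("120m 12s")` to `7212`.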
==== Earliest Deadline First Policy
The <<../user/ProActiveUserGuide.html#_earliest_deadline_first_edf_policy,Earliest Deadline First Policy>> is a <<../user/ProActiveUserGuide.html#_scheduling_policies,Scheduling Policy>> which can be enabled in the ProActive Scheduler server.
......