1. 17 Feb, 2021 1 commit
  2. 15 Feb, 2021 1 commit
  3. 16 May, 2019 9 commits
    • Emit event (#3549) · 65ef1e41
      Mykhailenko authored
    • Housekeeping performance · d6799e4e
      fviale authored
      - Improve housekeeping time by reducing the number of requests to the database.
      - This change also reduces the number of threads created during housekeeping (very important when a large number of jobs must be removed).
      - Functional tests are still missing (no test covers housekeeping).
      - removeFromContext is still too slow, probably because of the successive "job removed" notifications (a group notification mechanism might improve it).
    • Jetty log retain days property · 74a81416
      fviale authored
      This change introduces a property to control the retention of Jetty log files.
      The previous setting was hardcoded to 90 days, which could produce many large files. The default is now 5 days.
    • Add property to configure hsqldb catalog location · ea87ac45
      fviale authored
      This change introduces a new property to configure where the HSQLDB catalog is located (defaults to scheduler_home/data/db). A relative or absolute path can be used.
    • AvoidDeleteOnExit in rrdj temp files · e410779b
      fviale authored
      Commit 106177da could not be applied due to conflicts, so the same change is applied on the 8.4.X branch.
    • Fix memory leak in LiveJobs · de71d0f1
      fviale authored
      - When a job is finished, it must be removed from the LiveJobs structures. Otherwise, jobs accumulate over time and are never cleaned up.
    • AvoidDeleteOnExit in rrdj temp files · a96d2e3a
      fviale authored
      - deleteOnExit() must not be used on volatile temporary files that are created many times, as it creates a memory leak.
      - Also, explicitly call multipart.close() once the multipart request is handled (avoids depending on the garbage collector).
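The pattern described in this commit can be illustrated with a minimal sketch. It is not the code from the commit: the class and method names are hypothetical, and it only shows the general idea of deleting a short-lived temporary file explicitly instead of registering it with deleteOnExit().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not taken from the scheduler code base.
class TempFileCleanupSketch {

    // Writes the payload to a short-lived temp file, reads it back,
    // and deletes the file explicitly in a finally block.
    static int processWithTempFile(byte[] payload) {
        try {
            Path tmp = Files.createTempFile("sketch-", ".tmp");
            try {
                Files.write(tmp, payload);
                return Files.readAllBytes(tmp).length;
            } finally {
                // Explicit deletion: nothing is registered with the JVM,
                // unlike File.deleteOnExit(), whose entries are kept until shutdown.
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Many iterations do not grow any JVM-wide bookkeeping structure.
        for (int i = 0; i < 1000; i++) {
            processWithTempFile("data".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("done");
    }
}
```

The same reasoning applies to any resource with an explicit close method: releasing it as soon as the request is handled avoids waiting for the garbage collector.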
    • upgrade rest easy · 838ebdf2
      fviale authored
      - Version 3.0.19 fixes a memory leak where a temporary file was created and deleteOnExit() was called for each multipart request handled. Successive deleteOnExit() calls accumulate in a LinkedHashSet inside the JVM which is never cleaned.
    • upgrade hsqldb version · f487e2b0
      fviale authored
      - Version 2.4.1 handles LOB files better and avoids LOB file growth.
  4. 30 Apr, 2019 1 commit
  5. 23 Apr, 2019 1 commit
  6. 19 Apr, 2019 2 commits
  7. 02 Apr, 2019 1 commit
  8. 07 Mar, 2019 4 commits
    • Install tools needed by the scripts automatically · 189a0f72
      fviale authored
      The installation script needs a few system tools (rsync, zip, unzip, git) which are not always installed by default.

      This change asks the user to confirm that these tools will be installed.
    • Add fixed serial uid to NodeSet · 5703e915
      fviale authored
      As NodeSet is stored inside the scheduler database, a fixed serialVersionUID on the class is needed to keep the database usable when upgrading the scheduler server version.
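As a minimal sketch of what a fixed serial UID looks like in Java: the class below is a hypothetical stand-in, not the real NodeSet, and the value 1L is an arbitrary choice for illustration. Without an explicit declaration, the JVM derives the UID from the class shape, so recompiling a modified class can make previously stored objects unreadable.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the scheduler's NodeSet class.
class SerialUidSketch {

    // Stand-in for a class whose instances are persisted in a database.
    static class StoredNodes implements Serializable {
        // Fixed value (arbitrary here): the stream identity no longer
        // depends on the compiled shape of the class.
        private static final long serialVersionUID = 1L;

        private final List<String> nodeUrls = new ArrayList<>();
    }

    // Returns the UID that Java serialization will write for StoredNodes.
    static long declaredUid() {
        return ObjectStreamClass.lookup(StoredNodes.class).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(declaredUid());
    }
}
```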
    • Allow configuration of a global domain name · 13a9d468
      fviale authored
      For Active Directory domains, it is convenient to configure the default domain for users globally at the scheduler level.
      This change adds a dedicated property. Whenever a task is deployed, this domain name is automatically added to the task credentials (which are used by runasme).
    • Fix for #3494 : In RMNodeStarter, after reconnecting to a resource manager... · 459f7cf5
      fviale authored
      Fix for #3494 : In RMNodeStarter, after reconnecting to a resource manager which has been restarted, JMX authentication does not work

      This fix does the following:
      - shut down and restart JMX monitoring upon resource manager reconnection
      - store various information such as nodes and resource manager interface as attributes
      - sanitize reconnection log messages
  9. 06 Mar, 2019 1 commit
  10. 01 Mar, 2019 1 commit
    • Make some tasks fields transient (#3485) · 489f19a0
      Mykhailenko authored
      EligibleTaskDescriptorImpl had two fields, parents and children, which were not transient.
      This caused a StackOverflowError during serialization, because the Java serialization mechanism is recursive: given a data structure that is too deep, it fails with a StackOverflowError.
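The effect of marking link fields transient can be sketched as follows. The TaskNode class is a hypothetical simplification, not EligibleTaskDescriptorImpl. Because default serialization skips transient fields, the writer never walks the parent/child graph, so even a very long chain serializes with constant stack depth.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the scheduler's task descriptor.
class TransientLinksSketch {

    static class TaskNode implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        // transient: skipped by default serialization, so the graph is not traversed
        transient List<TaskNode> parents = new ArrayList<>();
        transient List<TaskNode> children = new ArrayList<>();

        TaskNode(int id) {
            this.id = id;
        }
    }

    // Builds a linear chain head -> ... -> tail of the given length.
    static TaskNode buildChain(int length) {
        TaskNode head = new TaskNode(0);
        TaskNode current = head;
        for (int i = 1; i < length; i++) {
            TaskNode next = new TaskNode(i);
            current.children.add(next);
            next.parents.add(current);
            current = next;
        }
        return head;
    }

    // Serializes a single node and returns the number of bytes written.
    static int serializedSize(TaskNode node) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(node);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // The head of a long chain serializes to the same size as a lone node.
        System.out.println(serializedSize(buildChain(100_000)) == serializedSize(buildChain(1)));
    }
}
```

One consequence of this design is that transient fields are null after deserialization, so the links must be rebuilt from other data if they are needed again.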
  11. 22 Feb, 2019 2 commits
    • Fix node locking (not actually) (#3482) · ce16887a
      Mykhailenko authored
      * Safer nodes locking/unlocking
    • Delay shutdown when remove/undeploy busy NS (#3480) · 6e943c23
      Mykhailenko authored
      When we order NS removal/undeploy while the NS has some busy nodes, we should delay the NS shutdown.

      We delay this shutdown as long as there is at least one busy node.
      As soon as the task executing on the last busy node is terminated, we call the shutdown method.
      Task termination includes task finishing, task killing, or even task failing: all these events mean that the task is terminated. The shutdown method of the NS destroys the infrastructure and stops the AO of the policy.
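The delayed-shutdown logic described above can be sketched as a small state holder. This is a simplified, hypothetical model: the class, its fields, and its method names are assumptions for illustration and do not mirror the actual node source implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of delaying shutdown while nodes are busy.
class DelayedShutdownSketch {
    private final Set<String> busyNodes = new HashSet<>();
    private boolean removalRequested = false;
    private boolean shutDown = false;

    synchronized void markBusy(String nodeUrl) {
        busyNodes.add(nodeUrl);
    }

    // Removal or undeploy is ordered: shut down now only if no node is busy.
    synchronized void requestRemoval() {
        removalRequested = true;
        shutdownIfIdle();
    }

    // Finished, killed, and failed tasks all count as terminated.
    synchronized void taskTerminated(String nodeUrl) {
        busyNodes.remove(nodeUrl);
        if (removalRequested) {
            shutdownIfIdle();
        }
    }

    private void shutdownIfIdle() {
        if (busyNodes.isEmpty() && !shutDown) {
            // A real implementation would release infrastructure resources here.
            shutDown = true;
        }
    }

    synchronized boolean isShutDown() {
        return shutDown;
    }

    public static void main(String[] args) {
        DelayedShutdownSketch source = new DelayedShutdownSketch();
        source.markBusy("node-1");
        source.requestRemoval();
        System.out.println(source.isShutDown()); // still busy, shutdown delayed
        source.taskTerminated("node-1");
        System.out.println(source.isShutDown()); // last busy node released
    }
}
```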
  12. 15 Feb, 2019 5 commits
    • fix for #3477 NPE during HOUSEKEEPING · d574130f
      Fabien Viale authored
      - added some defensive programming
      - changed the log message displayed by lockJob
      - added a check inside HouseKeepingRunner to prevent printing unnecessary error messages
    • SchedulerStateRest:submitFromUrl, Allow submission from non-http url · 616087b9
      fviale authored
      The submitFromUrl method only supported http(s) URLs, which is an unnecessary limitation.
      This change allows using other kinds of URLs (ftp, file, etc.) by simply using url.openStream().
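A minimal sketch of protocol-agnostic reading with url.openStream() is shown below. It is not the submitFromUrl implementation; the class and helper names are hypothetical, and the demo uses a file: URL built from a temporary file so that it runs without network access.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not the scheduler's REST code.
class UrlReadSketch {

    // Reads the full content behind any URL whose protocol the JVM supports.
    static byte[] readAll(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }

    // Demo helper: writes content to a temp file and reads it back via a file: URL.
    static String readViaFileUrl(String content) {
        try {
            Path tmp = Files.createTempFile("job-", ".xml");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                return new String(readAll(tmp.toUri().toURL()), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readViaFileUrl("<job/>"));
    }
}
```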
    • Reduce scheduler and node log space · 47451faa
      Fabien Viale authored
      Scheduler and node logs can take a large amount of disk space; the purpose of this change is to reduce it.

      - Display JobContent at TRACE level: currently, the content of all jobs appears in the logs. This used to be mandatory for debugging, but recent developments now allow exporting an already submitted job and seeing its content (with instantiated variables). For debugging, it is still possible to enable the job logger in trace mode to output the job content.

      - Use log4j extra to compress log files: the log4j extra library can automatically compress log files when a size-based rollover is executed. As the log compression rate is very high, this converts a 100MB file to 10MB.
    • Replace TopologicalTaskSorter recursion with iteration (#3472) · c1402617
      Mykhailenko authored
      * Replace recursion with loop
      * Add diamond test
      * Uncomment bigGraph test case
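To illustrate why a loop avoids the stack depth problem, here is a sketch of an iterative topological sort using Kahn's algorithm. This is an assumption about one possible approach, not the actual TopologicalTaskSorter code; the class, the integer node ids, and the diamond and chain helpers are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration using Kahn's algorithm, not the scheduler's sorter.
class IterativeTopoSortSketch {

    // children maps each node to the nodes that depend on it.
    static List<Integer> sort(Map<Integer, List<Integer>> children) {
        Map<Integer, Integer> inDegree = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : children.entrySet()) {
            inDegree.putIfAbsent(entry.getKey(), 0);
            for (Integer child : entry.getValue()) {
                inDegree.merge(child, 1, Integer::sum);
            }
        }
        Deque<Integer> ready = new ArrayDeque<>();
        for (Map.Entry<Integer, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<Integer> order = new ArrayList<>();
        // The work queue lives on the heap, so graph depth does not consume call stack.
        while (!ready.isEmpty()) {
            Integer node = ready.poll();
            order.add(node);
            for (Integer child : children.getOrDefault(node, Collections.emptyList())) {
                if (inDegree.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("cycle detected");
        }
        return order;
    }

    // Diamond: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4.
    static Map<Integer, List<Integer>> diamond() {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(1, Arrays.asList(2, 3));
        graph.put(2, Collections.singletonList(4));
        graph.put(3, Collections.singletonList(4));
        return graph;
    }

    // Linear chain 0 -> 1 -> ... -> length-1, deep enough to stress recursion.
    static Map<Integer, List<Integer>> chain(int length) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        for (int i = 0; i < length - 1; i++) {
            graph.put(i, Collections.singletonList(i + 1));
        }
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(sort(diamond()));
        System.out.println(sort(chain(200_000)).size());
    }
}
```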
    • Catch Throwable in SchedulerStateRecoverHelp · 1c56cfdc
      fviale authored
      SchedulerStateRecoverHelp can throw StackOverflowError, which is not a subclass of Exception. If this error is not caught properly, the scheduler fails to start at all, with no possible recovery. By catching Throwable, we allow the scheduler to recover by cancelling the problematic job(s).
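The difference between catching Exception and catching Throwable can be sketched as below. This is a hypothetical simplification, not the recovery helper itself: job recovery is modeled as a map of Runnable tasks, and the StackOverflowError is thrown directly to simulate a recursion that went too deep.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the scheduler's recovery code.
class RecoverAllSketch {

    // Runs every recovery task and returns the ids of those that failed,
    // so that the caller can cancel the problematic jobs and keep starting up.
    static List<String> recoverAll(Map<String, Runnable> recoveryTasks) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Runnable> task : recoveryTasks.entrySet()) {
            try {
                task.getValue().run();
            } catch (Throwable t) {
                // Throwable also covers Error subclasses such as StackOverflowError,
                // which a catch (Exception e) clause would let escape.
                failed.add(task.getKey());
            }
        }
        return failed;
    }

    static List<String> demo() {
        Map<String, Runnable> tasks = new LinkedHashMap<>();
        tasks.put("job-1", () -> { });
        tasks.put("job-2", () -> { throw new StackOverflowError("simulated deep recursion"); });
        tasks.put("job-3", () -> { throw new IllegalStateException("simulated failure"); });
        return recoverAll(tasks);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Catching Throwable is usually discouraged because it can hide unrecoverable errors; it is a reasonable trade-off here only because the alternative, as the commit notes, is a scheduler that cannot start.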
  13. 24 Jan, 2019 3 commits
  14. 23 Jan, 2019 1 commit
  15. 22 Jan, 2019 3 commits
  16. 21 Jan, 2019 3 commits
  17. 17 Jan, 2019 1 commit