2016.09 Update 8

New Features and Enhancements

Products Internal Number Case Number Description
All 7474   Fixed cases where replacing or following symlinks was not supported on Windows.
All 7387 20602 Added an updatecrontab_vovdir.csh script to the project.swd/autostart directory that automatically updates the scripts/vovdir.csh file when the server starts. As a result, after an upgrade to a new release, the version used by crontab is automatically kept consistent with the new release.
All 7088 20035 Added logic to check whether the .stdstart (headLog) and .stdend (footLog) files are in use; if they are, they are deleted when their size is 0.
All 5525 12768 The environment variable HOSTNAME was previously transferred to slaves via SNAPSHOT or SNAPPROP, which caused HOSTNAME on the slave to show an incorrect value. This transfer is now suppressed.
FlowTracer 7553 20870 Added a PAUSED state for slaves that receive SIGTSTP; receiving this signal suspends the slave and all of its child jobs. Resuming a slave resumes all of its child jobs. A queryable HEARTBEAT field has also been added to the "slaves" table.
FlowTracer 7673   Changed the default value of nfsdelay from 0 to 60 seconds to protect against job invalidation caused by filesystem caching, such as NFS attribute and directory caching. Changed the default value of timeTolerance from 0 to 1 second to protect against job invalidation caused by clocks that are not synchronized across hosts in the network. Both changes are made in the policy.tcl file in the server working directory.
LicenseAllocator 6499 20229 Historical metrics can be loaded upon server restart by adding the command "LA::ReloadHistoricalMetrics" to LA's "config.tcl" AFTER all the sites and resources have been declared. The command requires a parameter specifying the duration (a Vov time specification) for which to load metrics, measured back from the current time.
LicenseMonitor 7507 20235 Added a new -replaceImages option to ftlm_batch_report that replaces the dynamic image elements of a batch report HTML file with static PNG images. The usage syntax for the new option is:
ftlm_batch_report -replaceImages <INFILE> [OUTFILE]
If the OUTFILE option is not passed, the utility generates a new file named INFILE-static.html. OUTFILE can be the same as INFILE, but this is not recommended.
LicenseMonitor 7472 20744 The PID is now shown in certain LM reports when it is available.
NetworkComputer 7455 20674 The optional live_keepfor_jobs.tcl task script has been improved to reduce the load on the NC vovserver.
NetworkComputer 7433   Added the job field XDURPERCENT, which can be used in preemption rules.
NetworkComputer 5579 20837 The server now logs an entry when a slave fails to send its required heartbeat and enters the sick state. The log entry resembles:
vovserver(1323) May 05 11:51:19 Slave parser on host maiden (client 1) is sick due to a missing heartbeat
The server log also contains a message when the slave becomes healthy again:
vovserver(1323) May 05 12:10:18 Sick=1 slave parser on maiden (client 1) is healthy again
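A minimal Python sketch of how an administrator might pair these two messages to measure how long each slave stayed sick. The regular expressions are modeled only on the two sample lines above and are an assumption about the general format; other message variants are not handled. Because the log omits the year, the sketch takes an assumed one through a hypothetical year parameter.

```python
import re
from datetime import datetime

# Patterns modeled only on the two sample log lines; other variants are not handled.
SICK_RE = re.compile(
    r"^vovserver\(\d+\) (\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"Slave (\S+) on host (\S+) \(client \d+\) is sick due to a missing heartbeat"
)
HEALTHY_RE = re.compile(
    r"^vovserver\(\d+\) (\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"Sick=\d+ slave (\S+) on (\S+) \(client \d+\) is healthy again"
)


def parse_stamp(text, year):
    # The log omits the year, so the caller supplies an assumed one.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def sick_intervals(lines, year=2017):
    """Return (slave, host, seconds_sick) for each sick/healthy pair found."""
    went_sick = {}  # (slave, host) -> time of the "is sick" entry
    results = []
    for line in lines:
        m = SICK_RE.match(line)
        if m:
            stamp, slave, host = m.groups()
            went_sick[(slave, host)] = parse_stamp(stamp, year)
            continue
        m = HEALTHY_RE.match(line)
        if m:
            stamp, slave, host = m.groups()
            start = went_sick.pop((slave, host), None)
            if start is not None:
                seconds = (parse_stamp(stamp, year) - start).total_seconds()
                results.append((slave, host, int(seconds)))
    return results


log = [
    "vovserver(1323) May 05 11:51:19 Slave parser on host maiden (client 1) "
    "is sick due to a missing heartbeat",
    "vovserver(1323) May 05 12:10:18 Sick=1 slave parser on maiden (client 1) "
    "is healthy again",
]
print(sick_intervals(log))  # [('parser', 'maiden', 1139)]
```

A "healthy again" line with no earlier "is sick" line for the same slave and host is ignored, so a log excerpt that starts mid-episode does not produce a bogus interval.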
NetworkComputer 7535 20833 The proc VovGetRevokeDelay can now be redefined in vovresourced/config.tcl in the server working directory (SWD) to customize the revoke delay used by vovreconciled. This allows revoke delays set in job classes to override the default value of RESD(revokeDelay). The proc definition has been added to the documentation.
NetworkComputer 7537 20834 Added the ability to specify slot count as an adjustment to the core count. The following capacity specification forms have been added: CORES+N, CORES-N, CORES*N, and CORES/N. The word CORES is required, followed by a single-character operator and then a whole or decimal number. These forms work in addition to the traditional numeric capacity setting and are supported in the following commands, where XXXXX is a capacity specifier as described above:
vtk_slave_define -capacity XXXXX
vtk_slave_set_defaults -capacity XXXXX
vovslave -T XXXXX
vovslavemgr configure -capacity XXXXX
Note: Capacity cannot be less than 0 slots or greater than 1000 slots.
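A minimal Python sketch of how a CORES-relative capacity specifier could be evaluated for a host with a given core count. The function name slots_from_spec is hypothetical. The release note does not say how fractional results are rounded or whether out-of-range results are clamped or rejected; this sketch assumes results are rounded down and rejects anything outside 0 to 1000 slots.

```python
import math
import re

MIN_SLOTS = 0     # limits stated in the release note
MAX_SLOTS = 1000

# CORES, one operator character, then a whole or decimal number.
SPEC_RE = re.compile(r"^CORES([+\-*/])(\d+(?:\.\d+)?)$")


def slots_from_spec(spec, cores):
    """Return the slot count for a capacity spec on a host with `cores` cores."""
    spec = spec.strip()
    if spec.isdigit():
        slots = int(spec)  # traditional plain numeric capacity
    else:
        m = SPEC_RE.match(spec)
        if not m:
            raise ValueError(f"bad capacity specifier: {spec!r}")
        op, num = m.group(1), float(m.group(2))
        if op == "+":
            value = cores + num
        elif op == "-":
            value = cores - num
        elif op == "*":
            value = cores * num
        else:
            if num == 0:
                raise ValueError("division by zero in capacity specifier")
            value = cores / num
        # Assumption: fractional results are rounded down to whole slots.
        slots = math.floor(value)
    # Assumption: out-of-range results are rejected rather than clamped.
    if not MIN_SLOTS <= slots <= MAX_SLOTS:
        raise ValueError(f"capacity {slots} outside {MIN_SLOTS}..{MAX_SLOTS}")
    return slots


print(slots_from_spec("CORES*1.5", 16))  # 24
print(slots_from_spec("CORES-2", 8))     # 6
```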
NetworkComputer 7561   Improved the debuggability of preemption rules by adding a DEBUG property.
NetworkComputer 7642   Slave grouping was added in 2016.09, causing nc gui -slaves to show groups when there were more than 20 slaves. This change restores the 2016.03 behavior: nc gui -slaves shows individual slaves.

Resolved Issues

Products Internal Number Case Number Description
All 7328 20517 The sets page now reloads after invalidating the set via the invalidate icon.
All 7456 20713 Calls made by vovslave to the w command have been reduced to once per minute per host, shared across all slaves on that host.
All 7473   Fixed cases where symlink depth on Linux was limited to 5 or 8.
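A minimal Python sketch of the per-host throttling idea: one cache per host holds the output of the expensive command for 60 seconds, so several slaves asking within that minute trigger only one invocation. The class name PerHostThrottle is hypothetical, and this in-process cache is a simplification; separate vovslave processes would need shared state (for example, a file) to coordinate, which the sketch does not model.

```python
import time


class PerHostThrottle:
    """Run an expensive per-host command at most once per interval.

    Hypothetical model: every slave on a host asks the same instance, so one
    cached result is shared instead of each slave running the command itself.
    """

    def __init__(self, run_command, interval=60.0, clock=time.monotonic):
        self._run = run_command
        self._interval = interval
        self._clock = clock
        self._cache = {}  # host -> (time of last run, output)

    def get(self, host):
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and now - cached[0] < self._interval:
            return cached[1]  # reuse the result from within the last interval
        output = self._run(host)
        self._cache[host] = (now, output)
        return output


# Demo with a fake clock and a fake "w" runner that records each invocation.
calls = []


def fake_w(host):
    calls.append(host)
    return f"w output for {host}"


fake_now = [0.0]
throttle = PerHostThrottle(fake_w, clock=lambda: fake_now[0])
for _slave in range(3):   # three slaves on the same host ask at once
    throttle.get("maiden")
fake_now[0] = 61.0        # more than a minute later, the command runs again
throttle.get("maiden")
print(calls)  # ['maiden', 'maiden']
```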
All 7506 20517 The sets page now reloads after invalidating the set via the invalidate icon.
All 7503 20635 Fixed a possible memory corruption issue that occurred when defining equivalences.
All 7557   Enhanced sanity to create resources for any FairShare groups that do not have a corresponding resource, such as "Group:time_users". Because these resources can be deleted with vovforget -allresources, this provides a mechanism to recreate them.
FlowTracer 7459   vovconsole can now be resized to a smaller size. Previously, the minimum size was fixed at the optimal size.
FlowTracer 7467 20481 vovfileready now works with paths that did not exist, or symlinks pointing to paths that did not exist, at build time.
FlowTracer 7511   Fixed random invalidation caused by the rejection of valid timestamp updates.
FlowTracer 7003 20731 The NFS directory cache is now flushed when a job starts on a vovslave, to prevent chdir failures.
FlowTracer 7463   Fixed a bug in vsx output that caused a job's name to appear next to the command without a space in between.
FlowTracer 7611   Status color is properly updated on Navigator and Alert window after a row is removed.
FlowTracer 7659   bkill failures are now ignored when removing a slave object.
LicenseAllocator 7450 20230 Fixed an issue that caused significant memory bloat, resulting in a large process size and gradual slowdown over a period of days or weeks.
LicenseAllocator 7451 20691 LA will now catch errors in stopping and forgetting old probes before creating new ones. It will wait for up to 60 seconds for the probes to stop, and if they still don't stop, it will raise an alert and not create new probes.
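A minimal Python sketch of the stop-wait-or-alert sequence described above. The probe interface (stop() and is_stopped()), the 1-second polling interval, and the function name replace_probes are hypothetical; the "forget" step is not modeled. The clock and sleep functions are injectable so the 60-second wait can be simulated.

```python
import time


def replace_probes(old_probes, create_new, raise_alert, timeout=60.0,
                   poll=1.0, clock=time.monotonic, sleep=time.sleep):
    """Stop old probes, wait up to `timeout` seconds, then create new ones.

    Hypothetical model: each probe has stop() and is_stopped(). Returns True
    if new probes were created, False if an alert was raised instead.
    """
    for probe in old_probes:
        try:
            probe.stop()
        except Exception as exc:  # catch the error instead of aborting
            raise_alert(f"error stopping probe {probe!r}: {exc}")
    deadline = clock() + timeout
    while True:
        still_running = [p for p in old_probes if not p.is_stopped()]
        if not still_running:
            create_new()
            return True
        if clock() >= deadline:
            raise_alert(f"{len(still_running)} probe(s) did not stop within "
                        f"{timeout:g}s; not creating new probes")
            return False
        sleep(poll)


class FakeProbe:
    def __init__(self, stuck=False):
        self.stuck = stuck
        self.stopped = False

    def stop(self):
        self.stopped = not self.stuck  # a stuck probe ignores the request

    def is_stopped(self):
        return self.stopped


class FakeClock:
    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake_clock = FakeClock()
alerts, created = [], []
ok = replace_probes([FakeProbe(), FakeProbe(stuck=True)],
                    lambda: created.append("new"), alerts.append,
                    clock=fake_clock.time, sleep=fake_clock.sleep)
print(ok, created, alerts)  # False: the stuck probe blocks creation
```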
LicenseAllocator 7494   Allocations are now reset before a new distribution begins, instead of when a new NC sample is received.
LicenseAllocator 7265 20460 LA will now convert fully qualified host names into short names before performing matching.
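A minimal Python sketch of reducing fully qualified host names to short names before matching. The function names are hypothetical, and leaving dotted IPv4 addresses untouched is an assumption of this sketch, not something the release note states.

```python
def short_hostname(name):
    """Reduce a fully qualified host name to its short form."""
    parts = name.split(".")
    # Assumption: leave dotted IPv4 addresses untouched.
    if len(parts) == 4 and all(p.isdigit() for p in parts):
        return name
    return parts[0]


def hosts_match(a, b):
    """Compare two host names after reducing both to short names."""
    return short_hostname(a) == short_hostname(b)


print(hosts_match("maiden.example.com", "maiden"))  # True
```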
LicenseMonitor 7523 20803 Fixed issue that caused the rlmstat parser to fail if the license server host was changed in the configuration for a live monitor.
LicenseMonitor 7565 20878 It is now possible to choose whether or not to plot the average usage line on the usage-over-time graph of the Feature Detailed Plots page.
LicenseMonitor 7527   Fixed ftlm_agent on Linux and macOS to prevent an error when it tried to change into a non-existent LMSWD directory while executing ControlCenter jobs.
LicenseMonitor 7533   Fixed issue that caused tag renaming and site assignment to use an empty string value.
NetworkComputer 7632 21004 Fixed an issue with NC starting jobs from a 2016.09u7 vovserver on 2015.03 clients.
NetworkComputer 7383 20576 Start time, end time, and duration values are now validated in calls to vtk_slave_reserve to prevent values of 0 from being applied to a reservation. This prevents confusing reservation property entries in the /system/slaves/reservations FairShare group, if enabled. In 2016.09, this group and the /system/slaves/messages FairShare group, along with their properties, are disabled unless the following configuration item is added to the policy.tcl file:
set config(slave.props.enable) 1
NetworkComputer 6042   Fixed balloon error in nc gui -slaves.
NetworkComputer 7461   Fixed formatting issue on system recovery setup web UI page for failover server candidates information.
NetworkComputer 5276 11584 Improved life support mode for license-based resources when the connection to LM is interrupted. Life support is now activated when an HTTP update fails, in addition to when the event monitor is closed. External resource data, such as capacity and used-by-others numbers, will be held at the value last obtained from LM, and will be updated immediately upon reconnection to LM.
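A minimal Python sketch of the hold-last-value behavior described above, for a single license-based resource. The class and method names are hypothetical, and a fetch that raises ConnectionError stands in for a failed HTTP update; the real data exchange with LM is not modeled.

```python
class LicenseResourceState:
    """Hypothetical model of life support for one license-based resource."""

    def __init__(self):
        self.capacity = None        # last capacity obtained from LM
        self.used_by_others = None  # last used-by-others count from LM
        self.life_support = False

    def on_event_monitor_closed(self):
        self.life_support = True    # hold the last-known values

    def on_http_update(self, fetch):
        """`fetch` returns (capacity, used_by_others) or raises ConnectionError."""
        try:
            capacity, used_by_others = fetch()
        except ConnectionError:
            self.life_support = True  # a failed HTTP update also triggers life support
            return
        # Connected again: apply fresh values immediately.
        self.capacity, self.used_by_others = capacity, used_by_others
        self.life_support = False


def lm_down():
    raise ConnectionError("LM unreachable")


state = LicenseResourceState()
state.on_http_update(lambda: (10, 3))
state.on_http_update(lm_down)
print(state.capacity, state.used_by_others, state.life_support)  # 10 3 True
state.on_http_update(lambda: (12, 1))
print(state.capacity, state.used_by_others, state.life_support)  # 12 1 False
```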
NetworkComputer 7566 20787 Improved the performance and configurability of the nc list utility when listing by job name: the -J option no longer uses a smart set, help text now clarifies the impact of the -J option, the administrator can now disable -J usage, and documentation has been added for vnclist.config.tcl.
NetworkComputer 7562 20821 Made some fixes to input and output declarations to address "Server is operating on a non-internal object" error.
NetworkComputer 7505 20774 Added non-Admin visibility to NC fair share graphs showing running and queued job totals.
NetworkComputer 7528   Fixed an issue where the preemption method was lost at some point after a server restart.
NetworkComputer 7442   The LSF emulation scripts bjobs and bsub were modified so that bsub -J jobname works reliably. Job names with embedded blanks are no longer allowed.
NetworkComputer 7638 21054 Fixed HW resource accounting issue that caused slaves to report higher-than-actual numbers when suspended jobs were stopped instead of being resumed.
NetworkComputer 7547 20475 Added ability for the administrator to configure the maximum environment size for job submissions. This is done via the $VOVDIR/local/vncrun.config.tcl file, using the following configuration variable:
set VOV_JOB_DESC(maxEnvSize) 10000
The value is specified in bytes.
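A minimal Python sketch of checking an environment against a byte limit such as the maxEnvSize setting above. How the product actually measures environment size is not documented here; this sketch assumes each variable counts as the UTF-8 bytes of NAME=VALUE plus one terminating NUL byte, as in a C environ block. The function names are hypothetical.

```python
MAX_ENV_SIZE = 10000  # bytes, matching the example setting above


def env_size_bytes(env):
    """Size of an environment as NAME=VALUE strings, each NUL-terminated.

    Assumption: mirrors the layout of a C environ block; the product's exact
    accounting may differ.
    """
    return sum(len(f"{name}={value}".encode("utf-8")) + 1
               for name, value in env.items())


def check_env(env, limit=MAX_ENV_SIZE):
    """Return the environment size, or raise if it exceeds `limit` bytes."""
    size = env_size_bytes(env)
    if size > limit:
        raise ValueError(f"environment is {size} bytes; limit is {limit}")
    return size


print(check_env({"A": "1", "BB": "22"}))  # 10
```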
NetworkComputer 7633 20983 Added space between job name and command to correct format in NC.
NetworkComputer 7677   The vovfsgroup create command is now more efficient when copying parent group ACLs.
NetworkComputer 7678 21174 Previously, if there was no vnc_logs directory and nc run was invoked with the -l logfile.log option, the snapshot was saved to a file named "vnc_logs". Now, if there is no vnc_logs directory, the snapshot capturing module tries to create one; if it cannot, it saves the snapshot to an env$hashcode.env file instead of a vnc_logs file.
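A minimal Python sketch of the create-or-fall-back logic described above. The file name used inside vnc_logs, the location of the fallback file (the working directory), and the way the hash code is computed (a truncated SHA-1 of the sorted environment) are all assumptions; snapshot_path is a hypothetical name.

```python
import hashlib
import os
import tempfile


def snapshot_path(workdir, env):
    """Choose where to save an environment snapshot (hypothetical model).

    Try to use (creating if needed) a vnc_logs directory; if that fails,
    fall back to an env<hash>.env file in workdir.
    """
    log_dir = os.path.join(workdir, "vnc_logs")
    try:
        os.makedirs(log_dir, exist_ok=True)
        return os.path.join(log_dir, "env.snapshot")  # hypothetical file name
    except OSError:
        # e.g. a regular file named vnc_logs is in the way, or no write permission
        text = "\n".join(f"{k}={v}" for k, v in sorted(env.items()))
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
        return os.path.join(workdir, f"env{digest}.env")


env = {"USER": "alice", "SHELL": "/bin/sh"}
with tempfile.TemporaryDirectory() as ok_dir:
    normal = snapshot_path(ok_dir, env)
    print(os.path.basename(os.path.dirname(normal)))  # vnc_logs
with tempfile.TemporaryDirectory() as blocked_dir:
    # A regular file named vnc_logs prevents the directory from being created.
    open(os.path.join(blocked_dir, "vnc_logs"), "w").close()
    fallback = snapshot_path(blocked_dir, env)
    print(os.path.basename(fallback))
```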