2019.01 Update 4 Release

New Features and Enhancements

The following new features and enhancements were introduced in this software release:

Product Internal Number Case Number Description
All VOV-10969   The command vovclientmgr show no longer incorrectly labels clients without "nicknames" as HTTP clients. In addition, the nc command now properly sets a client nickname in all scenarios to allow it to be more easily identified in the output of both vovclientmgr show and vovshow -clients.
All VOV-10864   Added a new trace parameter, enterpriselicense.burst, to enable burst licensing for NC/WX in Auto mode (1=enabled, 0=disabled; defaults to disabled).

Cleaned up various presentation errors in the web UI license page when switching modes. The page now shows more details on current usage and availability for all modes.

Disabled the Full and N choices for the licensing mode in the web UI for non-NC/WX servers.

Accelerator Plus VOV-9520 23779 wxmgr stop -freeze will now force shutdown of WXLauncher if it does not complete a graceful shutdown within 60s to support upgrade operations which require WXLauncher to restart.
Accelerator Plus VOV-10444   The behavior of crash recovery timing has changed. In previous updates, a single server parameter crashRecoveryPeriod dictated the crash recovery period. Crash recovery completed after the crashRecoveryPeriod and the server began normal operation.
Three changes were made:
  1. A bug was fixed that prevented crash recovery from ending when all jobs were recovered.
  2. The upper limit on the crashRecoveryPeriod parameter was changed to 30m.
  3. Two new server parameters were added to enable a more flexible approach to crash recovery timing.

At any stage, if all jobs are recovered, crash recovery will end.

If a vovslave reconnects during a 'quiet time' before crash recovery ends, the crash recovery deadline will be extended by this 'quiet time'. The quiet time is specified by the crashRecoveryQuietTime server parameter.

The crashRecoveryMaxExtension server parameter specifies an upper limit on the amount by which the deadline is extended. The parameters can be set in policy.tcl. The ranges and default values are as follows:

# min 30s, max 1800s
VovServerConfig crashRecoveryPeriod 60

# min 0s, max 300s
VovServerConfig crashRecoveryQuietTime 30

# min 0s, max 1800s
VovServerConfig crashRecoveryMaxExtension 60

If desired, the original crash recovery behavior can be restored by setting the crashRecoveryMaxExtension parameter to zero. Appropriate settings for these parameters will depend on the particular site configuration and needs.
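
The timing rules above can be illustrated with a small model. This is only a sketch, not the vovserver implementation. It assumes that the "quiet time" window is the final crashRecoveryQuietTime seconds before the current deadline, and that crashRecoveryMaxExtension caps the total extension beyond the original period. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CrashRecovery:
    """Toy model of the crash recovery deadline; all times are in seconds."""
    period: float = 60.0         # crashRecoveryPeriod
    quiet_time: float = 30.0     # crashRecoveryQuietTime
    max_extension: float = 60.0  # crashRecoveryMaxExtension
    start: float = 0.0           # time at which the server restarted

    def __post_init__(self):
        self.deadline = self.start + self.period

    def on_reconnect(self, now: float) -> float:
        """Extend the deadline if a reconnect lands in the final quiet window."""
        in_quiet_window = self.deadline - self.quiet_time <= now < self.deadline
        if in_quiet_window:
            # Assumption: the cap applies to the total extension past start + period.
            cap = self.start + self.period + self.max_extension
            self.deadline = min(self.deadline + self.quiet_time, cap)
        return self.deadline

    def is_over(self, now: float, jobs_pending: int) -> bool:
        """Recovery ends at the deadline, or earlier once every job is recovered."""
        return jobs_pending == 0 or now >= self.deadline
```

In this model, constructing CrashRecovery(max_extension=0) never moves the deadline, which matches the statement that a zero crashRecoveryMaxExtension restores the original behavior.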

Allocator VOV-10360 24466 Improved performance of NRU matching. Also, changed the NRU bailout message to clarify the numbers in the message.
Allocator VOV-9411 23688 Added support for hierarchical Altair Allocators. This is an experimental feature. Please contact support for details.
Accelerator VOV-10520   Accelerator will now support the use of burst licenses. If the license file is provisioned with nc_slots_burst type licenses, Accelerator will allocate licenses first out of the base nc_slots licenses, and then allocate additional slots as required from the nc_slots_burst license pool.
Accelerator VOV-4900 21070 TIMEVAR time slot specifications are expanded to allow the second time item in the range HH:MM-HH:MM to be earlier than the first, as is the case when a range spans midnight. For example, 6 PM to 6 AM may now be specified using a 24-hour clock range as follows: 18:00-6:00.
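
The overnight case can be illustrated with a short check. This is a sketch of the wrap-around idea only, not the TIMEVAR parser. The function names are hypothetical, and treating the start as inclusive and the end as exclusive is an assumption.

```python
from datetime import time


def parse_hhmm(text: str) -> time:
    """Parse an H:MM or HH:MM string on a 24-hour clock."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_time_range(spec: str, now: time) -> bool:
    """Return True if `now` falls inside a range written as HH:MM-HH:MM."""
    start_text, end_text = spec.split("-")
    start, end = parse_hhmm(start_text), parse_hhmm(end_text)
    if start <= end:
        # Ordinary same-day range.
        return start <= now < end
    # The end is earlier than the start, so the range wraps past midnight.
    return now >= start or now < end
```

With the range 18:00-6:00, both 23:00 and 03:00 fall inside the slot, while 12:00 does not.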
Accelerator VOV-8188 21251 Previously, when vovgetgroups timed out or did not return groups info correctly, the job would run with the incorrect groups. Following this change, the job will fail under those conditions. In addition, the VOV_ALARM timeout for vovgetgroups was previously capped at 60 seconds. That cap has been removed.
Accelerator VOV-10812   The show/hide cgroups link on the Slave Resources web UI page is no longer required to show the CGROUP:RAM slave resource, and therefore has been removed.
Accelerator VOV-9555   Modified vovslave log messages to be clearer and more actionable.
Accelerator VOV-10905 24801 The nc why output for DP jobs now omits the confusing internal DP:SLOTS_N resource and shows subjob IDs and statuses.
Accelerator VOV-5294   A new capability to improve job RAM and CPUTIME accounting for jobs with detached processes is implemented on Linux systems. In addition to collecting PIDs that share a PGID or are within the process tree for a job, various types of detached processes are found if they are in the same Session ID or if the VOV_JOBID environment variable matches the values for the running job.
Accelerator VOV-10795   Support for hourly charging in the cloud has been implemented. If a slave is started with the environment variable VOV_INSTANCE_LAUNCH_TS set to the launch time of the instance on which the slave is running, then the slave will be kept alive until a few minutes before the next hour boundary.

This is an unsupported feature.
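
The hour-boundary arithmetic can be sketched as follows. This is an illustration only. It assumes that VOV_INSTANCE_LAUNCH_TS holds a timestamp in epoch seconds, that billing hours are counted from the launch time, and that "a few minutes" means a 300-second margin. The function name and the margin value are hypothetical.

```python
def seconds_until_release(launch_ts: float, now: float, margin: float = 300.0) -> float:
    """Return how long an idle slave could be kept before the next billed hour."""
    # Position inside the current billing hour, measured from the launch time.
    elapsed_in_hour = (now - launch_ts) % 3600.0
    remaining = 3600.0 - elapsed_in_hour
    # Release `margin` seconds early; never return a negative wait.
    return max(0.0, remaining - margin)
```

For example, ten minutes after launch the function returns 2700 seconds, and inside the final five minutes of an hour it returns 0.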

Allocator VOV-9737 24038 The VOV_LICMON environment variable now supports a comma-separated list of hosts rather than a single host.

Resolved Issues

The following issues were resolved in the 2019.01 Update 4 software release.
Product Internal Number Case Number Description
All VOV-10999   Fixed issue that prevented the show all rows link from working on the buckets web UI page. Previously, using this link would result in an empty table as opposed to showing all available rows.
All VOV-10350   All installers/SFDs now reject installation paths that contain spaces.
All VOV-10427 24510 This ticket addressed three issues that affected crash recovery.

The first was a race condition that occurred when a vovslave connected to a restarted server. If the vovslave license authorization happened to be checked during a very small interval, the vovslave was destroyed.

The second was that 'hog protection' was inadvertently applied to vovslaves during crash recovery, with the result that reconnection of vovslaves after a server restart could be delayed until the crash recovery period had ended. (This compounded the first issue during crash recovery.)

The third issue was cosmetic and resulted in a Tcl stack trace if the vovserver took too long to respond while restarting. The database query (vtk_select_loop) parameters were adjusted to lengthen the response period.

Also see the release notes for VOV-10444 for pertinent crash recovery parameters.

All VOV-10221 24265, 24417 vovserver failover recovery has been enhanced to attempt recovery on all configured server candidates.
All VOV-9902   Prevent vovserver and child processes from exiting when Ctrl-C is pressed in the Windows command prompt from which the server was started.
All VOV-11126 24961 The description of the RAMUSED slave resource was updated for better clarity on usage.
Accelerator Plus VOV-10862   Behavioral change: remaining slaves in base queues that have been removed will not be filtered from wait reasons.
Accelerator Plus VOV-10557 24557 The Linux priority/"nice level" of jobs running via Accelerator Plus will now have the same priority as jobs running directly on Accelerator for the same Accelerator/Accelerator Plus designated execution priority. Use nc/wx run -p <scheduling priority>.<execution priority> ... to set the execution priority.
Accelerator Plus VOV-10117 24291 Fixed race condition when a job arrives while a slave is shutting down due to exceeding its maxIdle setting. The job will now be rescheduled instead of failing.
Accelerator Plus VOV-10273   If the server configuration parameter failover.usefailoverslavegrouponly is set (default 0), then only failover slaves participate in server election. By default all slaves participate, which may cause excessive file traffic with many slaves (particularly exacerbated by Accelerator Plus).

The server election 'voting' period in seconds can be overridden by the server configuration parameter failover.maxdelaytovote (default 120).

Accelerator Plus VOV-10033 24060 Jobs using shared memory should no longer see incorrect RAM usage spikes when child processes terminate.
Accelerator Plus VOV-10705 24632 Fixed bug that masked the number of queued slave requests when Accelerator Plus was calculating how many more slaves to request and under some conditions resulted in more slaves requested than there were jobs in the bucket. Also fixed the use of quota with slave launching via arrays so that the array parameter correctly applies the quota.
Accelerator Plus VOV-11113   vovwxd will now log the time for a service loop at log level 3. The time of the latest loop will be updated in the property WXLoopTime.
Accelerator Plus VOV-11112   The WX_BUCKET_SERVICE_TS property will be updated more frequently to show activity on heavily loaded Accelerator Plus queues.
FlowTracer VOV-10990   Fixed an issue that could cause vovlsfd to fail due to errors updating reservations.
FlowTracer VOV-7913 21527 Fixed issue that caused the login link to be shown even after logging into the web UI for users possessing the READONLY security level.
FlowTracer VOV-10654 24609 Fixed unflattening of sets (Unflatten Sets in the context menu) that were flattened recursively in vovconsole.
FlowTracer VOV-10114 24290 A job (transition) may now have the "Failed to get user" error code.
Accelerator VOV-10833 24749 Fixed an issue where vovreconciled did not revoke a component resource according to the revocation delay set for the summary resource when the component resource revocation delay was not set.
Accelerator VOV-9123 23250 Issuing "stop" from the Accelerator web interface now sends the correct exit signals.
Accelerator VOV-9194 24593, 24922 Interactive jobs now use the fully-qualified domain name, if available, of the submission host to ensure the execution host can find and connect to the submission host.
Accelerator VOV-9557 23850 Added new vovslavemgr stop -sick <TIMESPEC> function that can be used to forget slaves that are older than the specified timespec-based threshold.
Accelerator VOV-10028 24238
  1. In back-compat mode, increase verbosity level to 3 for the message relating to using the signal list and delay obtained from the NC_STOP_* properties.
  2. Do not show <default> in the level 3 Job Control message when using properties.
  3. In that same message, add an indicator of the signal/delay origin when not <default>. Example: "delay=4 (from property)". The indicator will be one of: property, environment, or option.
Accelerator VOV-7862 21237 Corrected some edge cases in the job CPU utilization graph that produced phantom CPU usage spikes.
Accelerator VOV-11046 24897 Fixed issue where the vovslave would continuously log "Killing subslave with pid = <pid>" leading to eventual exhaustion of disk space. This fix requires a restart of all vovslaves.
Accelerator VOV-10393   Slaves now start automatically on Windows.
Accelerator VOV-9794   Fixed an issue where the terminal appeared to freeze when the output of an interactive Accelerator job (NC -Ir) was piped to tee, cat, etc. (tee, for example, would report "tee:write error").
Accelerator VOV-10108 24207 A new API, containerHooksRunDir, is available to specify the location in which to run container hook scripts. The requested job running directory will be passed to the Enter hook script as env(VOV_CONTAINER_JOB_RUNDIR). Please see the sample files at /etc/config/containers.
Accelerator VOV-10341   Added a description of the RESV_<license> job property, which is a counter of how many times <license> has been revoked by the vovreconciled daemon (if configured).
Monitor VOV-10908 24235 When monitoring a remote instance of Monitor, ensure that a remote dropped feature is detected and results in the deletion of the local feature. This allows the local feature's capacity to be set to 0 upon the next capacity snapshot.
Monitor VOV-7082 20029 Enhanced the parsers for MathLM, LMX, and HASP. Fixed a Green Hills error for large IDs.
Monitor VOV-11062 24869 Improved the help for vovslavemgr config setenv to instruct Windows users to quote the "name=value" parameter.