2016.09 Update 16

New Features and Enhancements

Products Internal Number Case Number Description
All 8641 22235 When a job is modified, do not mark the job unsafe when the host has been changed. Also, do not invalidate the job when the resources, aux resources, jobclass, job project, or FairShare group have changed.
NetworkComputer 7878 21366 Added option -leaf to vovfsgroup genconfig to also include weights for the leaf nodes of the complete FairShare tree.
NetworkComputer 8644 22239 To prevent excessive server load due to too many OR clauses in resource map sums, calls to vtk_resourcemap_sum from <swd>/vovresourced/config.tcl are now limited to expressions of up to 5 ORs (by default). The default limit can be adjusted by setting RESD(maxORsInResourceSum) in <swd>/vovresourced/config.tcl. If a resource sum with excessive ORs is seen, an alert is generated and an error message is written once per day to the vovresourced logfile.
NetworkComputer 8749   A new option, -writeprdir <directory_path>, has been added to the ncmgr stop command. This permits the PR file to be written, uncompressed, to the specified directory instead of the trace.db directory upon server shutdown.
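
For case 22239 above, the OR limit is controlled by one array element in <swd>/vovresourced/config.tcl. The fragment below is a minimal sketch, assuming the element is set with ordinary Tcl array syntax. The value 10 is only an illustration and is not a recommended setting.

```tcl
# <swd>/vovresourced/config.tcl
# Raise the maximum number of ORs accepted in a resource map sum.
# The default is 5; the value 10 here is a hypothetical example.
set RESD(maxORsInResourceSum) 10
```

A higher limit allows more complex resource sums but gives up some of the protection against excessive server load that the default provides.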

Resolved Issues

Products Internal Number Case Number Description
All 8658 22287 Fixed issue that randomly caused a "page too large" error to be displayed for certain pages in the web UI that contain a significant amount of data.
FlowTracer 8709   Fixed vovfileready bug that existed in 2016.09u12-2016.09u15. The bug caused the vovfileready job to become invalid instead of becoming valid and running the downcone immediately. Now, it behaves correctly. This is a 1-line Tcl change and is a trivial patch to any release.
FlowTracer 8626   An alert for not enough file descriptors from vovlsfd.tcl was confusing users and has been removed. Warnings about this are still issued in the logfile.
FlowTracer 8773   Fixed issue in Tcl procedures that attempt to validate nodes (waive exit code, force run, force validate). These procedures were broken in recent 2016.09 versions (validation did not always occur and error messages were seen). This fix allows these functions to successfully validate the requested node, which also includes the output of jobs.
LicenseMonitor 8162 21794 Made sure the data series has the same order as the legend series so that there is no mismatch between them.
LicenseMonitor 8610   Pie charts now use a bold font for the label and legend, and bar/histogram charts use a bold font for the ticks and legend. This change should improve the look of presentation views.
LicenseMonitor 8611   Adopted a new pie chart color selection algorithm that produces a range of lighter colors that look good with black fonts on top of them in most cases.
LicenseMonitor 8209   Fixed issue that prevented the lmmgr reset function from working.
LicenseMonitor 8571   Made sure the correct WHERE clause is used in the SQL when retrieving data from the database to produce reports.
NetworkComputer 8492 22142 Fixed issue with vovslaveroot reconnecting after a 30+ minute network interruption. Now, the slaves will properly reconnect and recover the jobs when the network connection to the server is re-established, even after a delay of over 30 minutes. Fixed issue with vovslaveroot exiting with an error: the message is now logged (previously the "fatal error" message was not shown or logged anywhere). Fixed the sleep time between reconnection attempts so that it is accurate with respect to wall clock time. Previously, when a vovslave was reconnecting, the timeout would sometimes be very far off from the requested wall clock time.
NetworkComputer 8739 22515 In certain cases, when a job finished at the same time as an internal alarm, the job could hang or be auto-killed, and so appeared to fail. This problem has been resolved.
NetworkComputer 8753 22291 Fixed error in jobclass.cgi, where a jobclass using the variable VOV_JOB_DESC(jobclass) would trigger a Tcl error.
NetworkComputer 7343 20113 When there is an error opening, writing, or closing a dailylog file, such as the log used for vovresourced, catch the error and report it. The error message, including the decoded string from C++, will get printed, as well as the message that was being written. In addition, generate an alert when such an error occurs with the same information (except for the message being written).
NetworkComputer 8744 22529 Format the value of the WX_BUCKET_LINK property when viewed from within node.cgi.
NetworkComputer 7562 20821 Made some fixes to input and output declarations to address "Server is operating on a non-internal object" error.
NetworkComputer 7911 21526 The command vovslavemgr restart now behaves identically to vovslavemgr stop followed by vovslavemgr start, allowing vovslavemgr restart to properly restart busy slaves.
NetworkComputer 8623 22166
  • Added vtk_fsgroup_update. Calling this function triggers the server to update FairShare statistics such as target share, actual share, and excess share.
  • The timestamp of the last FairShare statistics update is now shown on the fairshare.cgi page.
  • The sanity function no longer restarts the vovresourced daemon.
  • If the running count is wrong and gets fixed by sanity, the server log will contain a message like "Fixing count of running jobs old=1 new=2".
  • Fixed a case in which the running count became wrong when a job was invalidated.
NetworkComputer 8625 22226 Queries for requested-resource fields will now return correct values for jobs with non-running statuses.
NetworkComputer 8673 22267 Added "USERXDUR" and "USERXDURPP" fields to vovselect (for FlowTracer) and nc getfields (for NetworkComputer) to reflect the expected duration of the job as specified by the user. The existing fields "XDUR" and "XDURPP" will continue to be updated to reflect actual duration when the job completes successfully.
NetworkComputer 8678 22273 A slave that is in the process of stopping but is still running jobs will now have _stopped_<timestamp> appended to its name, whether stopped from the command line or through the server's browser-based UI. This allows a slave to be restarted multiple times. In addition, a request to a slave to stop can only be canceled if a replacement slave hasn't already started.
NetworkComputer 8697 21608 Improved the performance and reliability of the FREE_SLAVES preemption rule behavior.