Failover Server Candidates

If a server crashes suddenly, VOV can start a replacement server on a pre-selected host. This requires that the pre-selected host be configured as a failover server.

The configuration instructions follow.
Note: The vovserverdir command only works from a VOV-enabled shell when the project server is running.
  1. Edit or create the file servercandidates.tcl in the server configuration directory. Use the vovserverdir command with the -p option to find the pathname to this file.
    % vovserverdir -p servercandidates.tcl
    /home/john/vov/myProject.swd/servercandidates.tcl
    The servercandidates.tcl file should set the Tcl variable ServerCandidates to a list of possible failover hosts. This list may include the original host on which the server was started.
    set ServerCandidates {
        host1
        host2
        host3
    }
  2. Install the autostart/failover.sh script as follows:
    % cd `vovserverdir -p .`
    % mkdir autostart
    % cp $VOVDIR/etc/autostart/failover.sh autostart/failover.sh
    % chmod a+x autostart/failover.sh
  3. Activate the failover facility by running vovautostart.
    % vovautostart
    To confirm that a failover tasker is running, list the taskers and their groups. For example:
    % vovtaskermgr show -taskergroups
    ID         taskername        hostname         taskergroup
    000404374  localhost-2      titanus          g1
    000404375  localhost-1      titanus          g1
    000404376  localhost-5      titanus          g1
    000404377  localhost-3      titanus          g1
    000404378  localhost-4      titanus          g1
    000404391  failover         titanus          failover
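
The check on this listing can also be scripted. The following is a minimal sketch, not part of VOV: the function name has_failover_tasker is hypothetical, and it assumes the four-column layout shown in the sample output, with the taskergroup in the fourth column.

```shell
# Sketch (hypothetical helper): read a tasker listing on stdin and succeed
# only if some row has "failover" in the 4th (taskergroup) column.
# Assumes the column layout of the sample output shown in this section.
has_failover_tasker() {
    awk '$4 == "failover" { found = 1 } END { exit found ? 0 : 1 }'
}

# Intended use, from a VOV-enabled shell:
#   vovtaskermgr show -taskergroups | has_failover_tasker \
#       && echo "failover tasker present"
```

The function only inspects text, so it reports what the listing says at that moment; it does not validate the failover configuration itself.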
    Note: Each machine listed as a server candidate must be a vovtasker machine; the vovtasker running on that machine acts as its agent in selecting a new server host. Taskers can be configured as dedicated failover candidates, which are not allowed to run jobs, by using the -failover option in the tasker definition.

    Preventing jobs from running on the candidate machine removes the risk that demanding jobs destabilize it. The -failover option also enables some failover configuration validation checks. Failover taskers are started before the regular queue taskers, which helps ensure that a failover tasker is available as soon as possible for future failover events.

    The -failover option keeps taskers up and running even when no jobs are running (in fact, jobs are not allowed to run on them). This allows them to participate in the server election process and start vovserver without competing for resources.

    Without -failover, the taskers are ordinary taskers that can run jobs. In fact, they must be running jobs at the moment vovserver is killed or crashes, because a tasker with no running jobs that is not in the failover group exits:

    if ( ss_activeTransitions <= 0  && (! isInFailoverGroup ) ) { 
       addLog( "No jobs running: no need to keep running." );  
       goto tasker_exit;
    }

    The easiest way to ensure the tasker is in the failover group is to use the -failover option.

    Refer to the tasker definition documentation for details on the -failover option.
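
The file installation in step 2 can be wrapped in a small shell function so that it is safe to repeat. This is a sketch under stated assumptions rather than a supplied VOV utility: the function name install_failover_script is hypothetical, and both directories are passed in explicitly so the function does not depend on the environment.

```shell
# Sketch (hypothetical helper): copy failover.sh from a VOV installation
# into the autostart directory of a server configuration directory.
#   $1  server configuration directory (e.g. the output of `vovserverdir -p .`)
#   $2  VOV installation root (e.g. the value of $VOVDIR)
install_failover_script() {
    swd=$1
    vovdir=$2
    src="$vovdir/etc/autostart/failover.sh"

    # Stop early if the source script is not where step 2 expects it.
    if [ ! -f "$src" ]; then
        echo "failover.sh not found at $src" >&2
        return 1
    fi

    # mkdir -p succeeds even if autostart already exists, so reruns are safe.
    mkdir -p "$swd/autostart" || return 1
    cp "$src" "$swd/autostart/failover.sh" || return 1
    chmod a+x "$swd/autostart/failover.sh"
}

# Intended use, from a VOV-enabled shell with the project server running:
#   install_failover_script "$(vovserverdir -p .)" "$VOVDIR"
```

Unlike the plain mkdir in step 2, the function does not fail when the autostart directory already exists. It still has to be followed by vovautostart, as in step 3.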