Black Hole Detection
A black hole is a tasker that appears healthy but is unable to execute jobs. Every job sent to that tasker fails quickly, so the tasker immediately appears ready to execute the next job in the queue, which then fails as well.
On a given tasker, if a number (blackholeFailedJobs) of consecutive jobs fail within a relatively short time (blackholeDiscardTime), the tasker is potentially a black hole. The tasker is certainly a black hole only when we know that a large fraction (blackholeFailRate) of those jobs succeed on other taskers. When a tasker is a potential black hole, it is suspended for a short amount of time (blackholeMaybeTime), typically around 10 seconds. When a tasker is a black hole, it is suspended for a longer period (blackholeSuspendTime), typically around 10 minutes.
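The decision rule described above can be sketched in Python. This is only an illustration of the logic as worded in this section, not the server's actual implementation. The class BlackHolePolicy, the function classify, and all default values except the two suspension times are hypothetical. The sketch also assumes that "within a relatively short time" means the time between the first and last of the consecutive failures does not exceed blackholeDiscardTime.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlackHolePolicy:
    # Field names mirror the policy parameters named in the text.
    # The first three values are assumptions chosen for illustration only.
    failed_jobs: int = 5          # blackholeFailedJobs (assumed value)
    discard_time: float = 60.0    # blackholeDiscardTime, seconds (assumed value)
    fail_rate: float = 0.5        # blackholeFailRate, fraction 0..1 (assumed value)
    maybe_time: float = 10.0      # blackholeMaybeTime, about 10 seconds
    suspend_time: float = 600.0   # blackholeSuspendTime, about 10 minutes


def classify(
    history: List[Tuple[float, bool]],
    elsewhere_ok: Optional[float],
    policy: BlackHolePolicy,
) -> Tuple[str, float]:
    """Return (state, suspension in seconds) for one tasker.

    history      -- (finish_time, succeeded) per job on this tasker, oldest first
    elsewhere_ok -- fraction of the failed jobs known to have succeeded on
                    other taskers, or None if not yet known
    """
    recent = history[-policy.failed_jobs:]
    # Not enough jobs yet to judge.
    if len(recent) < policy.failed_jobs:
        return ("healthy", 0.0)
    # Any success breaks the run of consecutive failures.
    if any(ok for _, ok in recent):
        return ("healthy", 0.0)
    # The failures must be close together in time (assumed interpretation).
    if recent[-1][0] - recent[0][0] > policy.discard_time:
        return ("healthy", 0.0)
    # Confirmed only when enough of the jobs succeed on other taskers.
    if elsewhere_ok is not None and elsewhere_ok >= policy.fail_rate:
        return ("black_hole", policy.suspend_time)
    return ("potential", policy.maybe_time)
```

For example, with the assumed defaults, five failures one second apart and no information from other taskers give a potential black hole with a 10 second suspension. The same history with 80% of those jobs succeeding on other taskers gives a confirmed black hole with a 600 second suspension.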
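To enable black hole detection:
% nc cmd vovsh -x 'vtk_server_config blackholedetection 1'
To disable black hole detection:
% nc cmd vovsh -x 'vtk_server_config blackholedetection 0'
To view the current values of the black hole parameters:
% nc cmd vovsh -x 'vtk_generic_get policy a; parray a' | grep blackhole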