Integrate Difficult Tools
Tools that Run in Place
It may happen that a file is declared both as an input and as an output.
The order in which the declarations occur is of great importance, because it determines the structure of the resulting dependency graph.
If a file is declared first as an output, all subsequent input declarations for the same file are ignored, the logic being that a job can do anything it wants with its own outputs, including reading and writing them multiple times.
If a file is declared first as an input and later as an output, FlowTracer assumes that the tool modifies its input. This adds some complication to the management of the flow, because now we have two places with the same name, one to represent the input and one to represent the output of the tool. If a tool modifies one or more of its inputs, we say that the tool runs in place.
The two places that identify the file being modified are said to be in a chain of places. A minimum chain contains 2 places, but if a file is modified in place by many tools, the chain for that file can get arbitrarily long. Chains of places are managed automatically by FlowTracer, and most of the time you need not worry about them.
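The ordering rule described above can be illustrated with a small toy model. This is only a sketch in POSIX sh, not FlowTracer code: the function name classify_file and the letters I and O (standing for one input or output declaration of a single file, in the order they occur) are assumptions made for illustration.

```shell
#!/bin/sh
# Toy model (hypothetical, not part of FlowTracer): given the sequence of
# declarations made for ONE file, report how the rule above treats it.
#   I = declared as input, O = declared as output
classify_file() {
    first=$1
    case $first in
        O) result=output ;;   # output first: later input declarations are ignored
        I) result=input ;;    # input first: may still become in-place
        *) echo "classify_file: expected I or O" >&2; return 2 ;;
    esac
    shift
    for decl in "$@"; do
        # An output declaration after an initial input declaration means
        # the tool modifies its input, that is, it runs in place.
        if [ "$first" = I ] && [ "$decl" = O ]; then
            result=in-place
        fi
    done
    echo "$result"
}
```

For example, `classify_file O I I` prints `output`, while `classify_file I O` prints `in-place`, the case in which two places with the same name form a chain.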
Start a Subflow Before a Job is Completely Done with vovfileready
Some jobs run for a long time and produce several outputs along the way. If one of the outputs is known to be "ready to be used", it would be useful to process that output before the current job is complete. This violates the normal rule that an output of a job becomes VALID only when the job that creates it completes successfully.
The utility vovfileready can be used during the job runtime to make the output available for immediate processing. This utility changes the dependency tree so that the output becomes the VALID output of a new special job, and therefore is immediately usable.
vovfileready
If you want to use vovfileready, make sure that you first call vovfileready -clean before any other call to vovfileready <output_name>. See the example in the usage message below.
vovfileready: Usage Message
USAGE:
% vovfileready <NAME_OF_FILE_THAT_IS_READY> ...
OLD USAGE: the following options are now ignored:
% vovfileready -start
% vovfileready -clean
This script can only be executed as part of a running job.
From within the running job, when you know that one or more
output files are ready to be used, you can call
vovfileready NAME_OF_FILE_THAT_IS_READY ...
and this utility will mark all the files specified
on the command line as VALID.
EXAMPLE: PSEUDO-CODE
-- #!/bin/csh -f
-- # This could be a script called ./my_tool
-- DoSomeInitialization
-- DoSomeWorkFor5Minutes
-- vovfileready my_first_output ; ### The flow can start processing
-- ### my_first_output even
-- ### if ./my_tool is not quite done.
--
-- DoSomeOtherWorkFor1Hour
-- vovfileready another_output and_another_one
-- MoreWorkToFinish
-- exit 0
Then invoke the script with a wrapper, e.g.
% vov ./my_tool arg1 arg2 ...
Aggressive Retrace is Required
Due to the use of barriers in the current implementation of vovfileready, aggressive retrace is required for it to work. This disrupts the normal use of barriers, so flows that already use barriers will not work as intended if aggressive retrace is always enabled.
To enable aggressive retrace from the GUI, select the aggressive option in the Retrace Priority & Flags dialog. When retracing from the command line, use vsr -aggressive. To make the setting apply to every FlowTracer console for a particular project, add the following to $VOV_SERVER_DIR/gui.tcl:
set ::VovGUI::retrace(mode) "aggressive"
Example
#!/bin/csh -f
# Clean vovfileready from previous run(s)
vovfileready -clean
#
# Do some stuff
#
# extraction_corners is assumed to have been set in the section above
foreach extraction_corner ($extraction_corners)
    #
    # Complete extraction for this corner; could take a while
    #
    # When files are complete, call vovfileready on them
    vovfileready data/${extraction_corner}.spef.gz
end
#
# Do more stuff
#
# Full job completes, potentially much later
exit 0
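A job script may want to guard against marking a file ready before it actually exists. The following is a minimal sketch of such a guard, written in POSIX sh rather than csh because sh supports functions. The helper name mark_ready and the VOVFILEREADY override variable are hypothetical and not part of FlowTracer; the override exists only so that the helper can be dry-run outside a running job.

```shell
#!/bin/sh
# Hypothetical helper: call vovfileready only for files that exist and are
# non-empty. VOVFILEREADY is an assumed override for dry runs, for example
# VOVFILEREADY=echo; by default the real vovfileready utility is called.
mark_ready() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "mark_ready: $f is missing or empty, not marking it ready" >&2
            return 1
        fi
    done
    # All files passed the check: mark them VALID in a single call.
    "${VOVFILEREADY:-vovfileready}" "$@"
}
```

Inside a job, a call such as `mark_ready data/typical.spef.gz` would then replace the direct call to vovfileready; the file name here is only an example.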