renku run
Track provenance of data created by executing programs.
Description
Track the execution of your command line scripts. This will enable detection of:
- arguments (flags),
- string and integer options,
- input files or directories if linked to existing paths in the repository,
- output files or directories if modified or created while running the command.
It creates a Plan (Workflow Template) that can be reused and a Run, which is a record of a past workflow execution for provenance purposes. Refer to the renku workflow documentation for more details on this distinction.
Commands and options
renku run
Tracking work on a specific problem.
renku run [OPTIONS] COMMAND_LINE...
Options
- --name <name>
A name for the workflow step.
- --description <description>
Workflow step’s description.
- --keyword <keyword>
List of keywords for the workflow.
- --input <explicit_inputs>
Force a path to be considered as an input.
- --output <explicit_outputs>
Force a path to be considered an output.
- --param <explicit_parameters>
Force a string to be considered a parameter.
- --no-output
Allow command without output files.
- --no-input-detection
Disable auto-detection of inputs.
- --no-output-detection
Disable auto-detection of outputs.
- --success-code <success_codes>
Allowed command exit-code.
- --isolation
Invoke the given command in isolation.
- --verbose
Print generated plan after the execution.
Arguments
- COMMAND_LINE
Required argument(s)
Examples
$ renku run --name <plan name> -- <console command>
Note
If there are uncommitted changes in the repository, the renku run command fails. See git status for details.
Warning
If the executed command/script has arguments similar to those of renku run (e.g. --input), they will be treated as renku run arguments. To avoid this, put a -- separator between renku run and the command/script.
Warning
Input and output paths can only be detected if they are passed as arguments to renku run.
Warning
Circular dependencies are not supported for renku run. See Circular Dependencies for more details.
Warning
When using output redirection in renku run on Windows (with ``> file`` or ``2> file``), all Renku errors and messages are redirected as well, and renku run produces no output on the terminal. On Linux, this is detected by Renku and only the output of the command being run is actually redirected. Renku-specific messages such as errors are printed to the terminal as usual and are not redirected.
Detecting input paths
Any path passed as an argument to renku run that was not changed during the execution is identified as an input path. The identification only works if the path associated with the argument matches an existing file or directory in the repository.
The detection might not work as expected if:
- a file is modified during the execution; in this case, it is stored as an output;
- a path is not passed as an argument to renku run.
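The detection rules above can be modelled with the following simplified Python sketch. It is an illustration under stated assumptions, not Renku's actual implementation: the function classify_arguments and its changed_paths argument (the set of repository-relative paths that the run modified or created) are hypothetical. It classifies only command-line arguments, whereas Renku also records changed paths that were never passed as arguments.

```python
from pathlib import Path


def classify_arguments(repo_root, arguments, changed_paths):
    """Split command-line arguments into inputs, outputs and parameters.

    Hypothetical, simplified model of the rules described above:
    - changed_paths holds repository-relative POSIX paths that were
      modified or created while the command ran;
    - an argument naming such a path is an output;
    - an argument naming an existing, unchanged path in the repository
      is an input;
    - everything else is kept as a plain parameter.
    """
    root = Path(repo_root).resolve()
    inputs, outputs, parameters = [], [], []
    for arg in arguments:
        candidate = (root / arg).resolve()
        try:
            relative = candidate.relative_to(root).as_posix()
        except ValueError:
            # Points outside the repository: not tracked as a file.
            parameters.append(arg)
            continue
        if relative in changed_paths:
            outputs.append(arg)
        elif candidate.exists():
            inputs.append(arg)
        else:
            parameters.append(arg)
    return inputs, outputs, parameters
```

For example, with data.csv unchanged and out.txt rewritten by the run, the arguments data.csv, out.txt and hello would be classified as an input, an output and a parameter respectively.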
Specifying auxiliary inputs (--input)
You can specify extra inputs to your program explicitly by using the --input option. This is useful for specifying hidden dependencies that don't appear on the command line. Explicit inputs must exist before the execution of the renku run command. This option is not a replacement for the arguments that are passed on the command line: files or directories specified with this option will not be passed as input arguments to the script.
You can specify --input name=path or just --input path; the former also sets the name of the input on the resulting Plan.
For example, renku run --input inputfile=data.csv -- python script.py data.csv outfile would force Renku to detect data.csv as an input file and set the name of the input to inputfile.
Similarly, renku run --input inputfile=data.csv -- python script.py would let Renku know that script.py reads the file data.csv even though it does not show up on the command line.
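As an illustration, a hypothetical script.py with such a hidden dependency might contain a function like the one below. It reads data.csv from a hard-coded path, so the file never appears on the command line and Renku can only learn about it through --input. The constant DATA_FILE and the row-counting logic are assumptions made for this sketch.

```python
import csv

# Hypothetical hard-coded path: this dependency never appears on the
# command line, which is why it has to be declared with --input.
DATA_FILE = "data.csv"


def count_rows(path=DATA_FILE):
    """Return the number of data rows (header excluded) in a CSV file."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    return max(len(rows) - 1, 0)
```

If script.py calls count_rows() with its default argument, running it via renku run --input inputfile=data.csv -- python script.py records data.csv as an input named inputfile, as described above.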
Specifying auxiliary parameters (--param)
You can specify extra parameters to your program explicitly by using the --param option. This is useful for getting Renku to treat a parameter as a plain string even if it matches a file name in the project. This option is not a replacement for the arguments that are passed on the command line.
You can specify --param name=value or just --param value; the former also sets the name of the parameter on the resulting Plan.
For example, renku run --param myparam=hello -- python script.py hello outfile would force Renku to detect hello as the value of a string parameter named myparam, even if there is a file called hello present on the filesystem.
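A hypothetical script.py matching this example is sketched below. It only ever uses its first argument as text, so recording it as a string parameter rather than an input file reflects what the program actually does. The argument names greeting and outfile are assumptions made for the illustration.

```python
import argparse


def main(argv=None):
    """Write GREETING to OUTFILE, e.g. python script.py hello outfile."""
    parser = argparse.ArgumentParser(description="Write a greeting to a file.")
    # Plain text: the value is never opened as a file, even if a file
    # with the same name happens to exist in the project.
    parser.add_argument("greeting")
    parser.add_argument("outfile")
    args = parser.parse_args(argv)
    with open(args.outfile, "w") as handle:
        handle.write(args.greeting + "\n")
    return args.greeting
```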
Disabling input detection (--no-input-detection)
Input path detection can be disabled by passing the --no-input-detection flag to renku run. In this case, only the directories/files that are passed as explicit inputs are considered to be file inputs; those passed via command arguments are ignored unless they are in the explicit inputs list. This only affects files and directories; command options and flags are still treated as inputs.
Note
renku run prints the generated plan after execution if you pass --verbose to it. You can check the generated plan to verify that the execution was done as you intended. The plan is always printed to stderr, even if standard output is redirected to a file.
Detecting output paths
Any path modified or created during the execution will be added as an output. Because output path detection is based on the Git repository state after the execution of the renku run command, it is good to have a basic understanding of the underlying principles and limitations of tracking files in Git.
Git tracks not only the paths in a repository, but also the content stored in those paths. Therefore:
- a recreated file with the same content is not considered an output file, but is instead kept as an input;
- file moves are detected based on their content and can cause problems;
- directories cannot be empty, because Git does not track empty directories.
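The content-based behaviour can be illustrated with a short sketch that computes the object ID Git gives to a file's content in the default SHA-1 object format (repositories using the newer SHA-256 format hash differently). The function name git_blob_id is hypothetical; Git itself exposes the same computation through git hash-object.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git assigns to file content (SHA-1 object format).

    Git hashes the header b"blob <size>" plus a NUL byte, followed by the raw
    bytes. The file's name and timestamps play no part, so a file that is
    deleted and recreated with identical bytes keeps the same ID and Git
    reports no change.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

Comparing git_blob_id of a file's bytes before and after a run shows whether Git, and therefore Renku, will see the file as changed.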
Note
When in doubt whether the outputs will be detected, remove all outputs using git rm <path> followed by git commit before running the renku run command.
Command does not produce any files (--no-output)
If the program does not produce any outputs, the execution ends with an error:
Error: There are not any detected outputs in the repository.
You can specify the --no-output option to force tracking of such an execution.
Specifying outputs explicitly (--output)
You can specify expected outputs of your program explicitly by using the --output option. These outputs must exist after the execution of the renku run command. However, they do not need to be modified by the command.
You can specify --output name=path or just --output path; the former also sets the name of the output on the resulting Plan.
For instance, renku run --output result=result.txt -- python script.py -o result.txt would force Renku to treat the file result.txt as an output of the workflow and set the name of the output to result.
Similarly, renku run --output result=result.txt -- python script.py would let Renku know about result.txt created by script.py, even though it does not show up on the command line. Under normal circumstances, however, Renku should detect such cases automatically.
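For illustration, the hypothetical script.py in this example could create result.txt from a hard-coded path, as in the sketch below. The constant RESULT_FILE and the function write_result are assumptions, not part of Renku.

```python
from pathlib import Path

# Hypothetical hard-coded output path that never appears on the command line.
RESULT_FILE = "result.txt"


def write_result(value, path=RESULT_FILE):
    """Write one result value to the output file and return the file path."""
    target = Path(path)
    target.write_text(f"{value}\n")
    return target
```

Because the file is created during the run, Renku would normally detect it anyway; declaring it with --output result=result.txt additionally guarantees it is recorded and gives it the name result.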
Disabling output detection (--no-output-detection)
Output path detection can be disabled by passing the --no-output-detection flag to renku run. When disabled, only the directories/files that are passed as explicit outputs are considered to be outputs, and those passed via command arguments are ignored.
Detecting standard streams
Programs often expect input on the standard input stream. This is detected and recorded in the tool specification when the program is invoked as renku run cat < A.
Similarly, standard output and standard error can both be redirected when invoking a command:
$ renku run grep "test" B > C 2> D
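A program of your own can use the standard streams in the same way. The hypothetical filter below, a minimal Python sketch that is not part of Renku, behaves roughly like the grep example: it reads lines from standard input, writes matching lines to standard output and a summary to standard error, so it could be invoked through renku run with < A, > C and 2> D redirections.

```python
import sys


def filter_lines(pattern, source=None, out=None, err=None):
    """Copy lines containing pattern from source to out; report a count on err.

    By default the three standard streams are used, so the program can be
    run with redirections such as < A, > C and 2> D.
    """
    source = sys.stdin if source is None else source
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    matches = 0
    for line in source:
        if pattern in line:
            out.write(line)
            matches += 1
    err.write(f"{matches} matching line(s)\n")
    return matches
```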
Warning
Detecting inputs and outputs from pipes (|) is not supported.
Specifying inputs and outputs programmatically
Sometimes the list of inputs and outputs is not known before the execution of the program. For example, a program might accept a date range as input and access all files within that range during its execution.
To address this issue, the program can write a mapping of the input and output files it accesses to inputs.yml and outputs.yml. These YAML files should have the following format:

.. code-block:: YAML

   name1: path1
   name2: path2

where name is the user-defined name of the input or output and path is its path. When the program finishes, Renku checks whether these two files exist and adds their contents to the lists of explicit inputs and outputs. Renku then deletes both files.
By default, Renku looks for these two files in the .renku/tmp directory. You can change this default location by setting the RENKU_INDIRECT_PATH environment variable. When set, it points to a sub-directory within the .renku/tmp directory where inputs.yml and outputs.yml reside.
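A program could produce these files with a helper like the minimal Python sketch below. It assumes the program runs from the project root and writes the simple name: path format by hand instead of using a YAML library, so names and paths must not need YAML quoting. The function names indirect_dir and write_mapping are hypothetical; only the .renku/tmp location and the RENKU_INDIRECT_PATH behaviour come from the description above.

```python
import os
from pathlib import Path


def indirect_dir(project_root="."):
    """Directory in which Renku looks for inputs.yml and outputs.yml.

    Defaults to .renku/tmp; if RENKU_INDIRECT_PATH is set, it names a
    sub-directory of .renku/tmp.
    """
    base = Path(project_root) / ".renku" / "tmp"
    sub = os.environ.get("RENKU_INDIRECT_PATH")
    return base / sub if sub else base


def write_mapping(filename, mapping, project_root="."):
    """Write a flat name-to-path mapping as simple "name: path" YAML lines.

    Assumes names and paths need no YAML quoting (no ':' or '#', etc.).
    """
    target_dir = indirect_dir(project_root)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text("".join(f"{name}: {path}\n" for name, path in mapping.items()))
    return target
```

In the date-range example, the program would call write_mapping("inputs.yml", ...) with the files it actually read and write_mapping("outputs.yml", ...) with the files it wrote, before exiting.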
Exit codes
All Unix commands return a number between 0 and 255, called an “exit code”. If other numbers are returned, they are treated modulo 256 (-10 is equivalent to 246, 257 is equivalent to 1). An exit code of 0 indicates success and a non-zero exit code indicates failure.
Therefore, the command specified after renku run is expected to return exit code 0. If the command returns a different exit code, you can mark it as successful with the --success-code=<INT> parameter.
$ renku run --success-code=1 --no-output fail
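The modulo rule can be checked with the following Python sketch; the helper names are hypothetical. observed_exit_code starts a child Python process that exits with a given number; on Unix-like systems only the low 8 bits of the status reach the parent, so it returns the same value that normalize_exit_code computes (Windows uses wider exit codes and behaves differently).

```python
import subprocess
import sys


def normalize_exit_code(code: int) -> int:
    """Map an integer onto the 0-255 range of Unix exit codes (modulo 256)."""
    # Python's % with a positive divisor never returns a negative number.
    return code % 256


def observed_exit_code(code: int) -> int:
    """Exit a child Python process with `code`; return the status the parent sees."""
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode
```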
Circular Dependencies
Circular dependencies are not supported in renku run. This means you cannot use the same file or directory as both an input and an output in the same step; for instance, reading from a file as input and then appending to it is not allowed. Renku records all steps of an analysis workflow in a dependency graph and lets you update outputs when an input changes, so a circular dependency would cause problems: an update would change the input again, Renku would then see it as a changed input and run the update again, and so on, without ever stopping.
For this reason, the Renku dependency graph has to be acyclic. Instead of appending to an input file, create a new file; instead of writing an output file to the same directory that was used as an input directory, write to a different directory.
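The "create a new file" approach can be sketched in Python as follows. The function append_entry is hypothetical: rather than appending to its input file, it writes the input's content plus the new entry to a separate output file, so the step has distinct input and output paths and no cycle arises.

```python
from pathlib import Path


def append_entry(input_path, output_path, entry):
    """Write the input's content plus a new entry to a separate output file.

    Appending to input_path itself would make it both an input and an output
    of the same step (a circular dependency); writing a new file avoids that.
    The input file is left unchanged.
    """
    content = Path(input_path).read_text()
    if content and not content.endswith("\n"):
        content += "\n"
    Path(output_path).write_text(content + entry + "\n")
```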