Track provenance of data created by executing programs.
Track the execution of your command line scripts. This will enable detection of:
- string and integer options,
- input files or directories if linked to existing paths in the repository,
- output files or directories if modified or created while running the command.
It will create a
Plan (Workflow Template) that can be reused and a
which is a record of a past workflow execution for provenance purposes. Refer
to the renku workflow documentation for more details on this distinction.
Commands and options
Tracking work on a specific problem.
renku run [OPTIONS] <COMMAND> or <WORKFLOW FILE>
- --name <name>
A name for the workflow step.
- --description <description>
Workflow step’s description.
- --keyword <keyword>
List of keywords for the workflow.
- --input <explicit_inputs>
Force a path to be considered as an input.
- --output <explicit_outputs>
Force a path to be considered an output.
- --param <explicit_parameters>
Force a string to be considered a parameter.
Allow command without output files.
Disable auto-detection of inputs.
Disable auto-detection of outputs.
Disable auto-detection of parameters.
- --success-code <success_codes>
Allowed command exit-code.
Invoke the given command in isolation.
Force running of a workflow file.
Print generated plan after the execution.
- --creator <creators>
Creator’s name, email, and affiliation. Accepted format is ‘Forename Surname <email> [affiliation]’.
Show what would have been executed in a workflow file.
Don’t update metadata after the execution and don’t create a commit.
- --provider <provider>
The workflow engine to use for executing workflow files.
toil | local | cwltool
- <COMMAND> or <WORKFLOW FILE>
$ renku run --name <plan name> -- <console command>
If there are uncommitted changes in the repository, the
renku run command fails. See git status for details.
If the executed command or script has arguments similar to those of
renku run (e.g. --input), they will be treated as
renku run arguments. To
avoid this, put a
-- separator between
renku run and the command or script.
Input and output paths can only be detected if they are passed as arguments to renku run.
Circular dependencies are not supported for
renku run. See
Circular Dependencies for more details.
When using output redirection in
renku run on Windows (with > file or 2> file), all Renku errors and messages are redirected
as well, and
renku run produces no output on the terminal. On Linux,
renku detects this and redirects only the output of the command being run.
Renku-specific messages such as errors are printed to
the terminal as usual and are not redirected.
Detecting input paths
Any path passed as an argument to
renku run, which was not changed during
the execution, is identified as an input path. The identification only works if
the path associated with the argument matches an existing file or directory
in the repository.
The detection might not work as expected if:
- a file is modified during the execution, in which case it will be stored as an output;
- a path is not passed as an argument to renku run.
renku run prints the generated plan after execution if you pass
--verbose to it. You can check the generated plan to verify that the
execution was done as you intended. The plan is always printed to
stderr, even if the command's output is redirected to a file.
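The input rule above (an existing path passed as an argument and left unchanged is an input, a changed path is an output) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Renku's implementation: the function classify_arguments and its changed argument (the set of paths modified by the run) are hypothetical, option syntax such as --flag=value is ignored, and Python 3.9+ is needed for Path.is_relative_to.

```python
from pathlib import Path


def classify_arguments(args, repo_root, changed=()):
    """Split command-line arguments into inputs, outputs, and plain parameters.

    args      -- arguments of the tracked command, e.g. ["-n", "10", "data/a.csv"]
    repo_root -- the repository directory
    changed   -- paths (relative to repo_root) modified or created by the run
    """
    root = Path(repo_root).resolve()
    changed_paths = {(root / p).resolve() for p in changed}
    inputs, outputs, parameters = [], [], []
    for arg in args:
        candidate = (root / arg).resolve()
        if candidate in changed_paths:
            # Modified or created during the run: recorded as an output.
            outputs.append(arg)
        elif candidate.exists() and candidate.is_relative_to(root):
            # Unchanged, existing file or directory inside the repository: an input.
            inputs.append(arg)
        else:
            # Anything else: options, numbers, paths outside the repository.
            parameters.append(arg)
    return inputs, outputs, parameters
```

Note that a path that is never passed as an argument simply never reaches this function, which is why such paths go undetected.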
Detecting output paths
Any path modified or created during the execution will be added as an output.
Because the output path detection is based on the Git repository state after
the execution of the
renku run command, it is good to have a basic
understanding of the underlying principles and limitations of tracking
files in Git.
Git tracks not only the paths in a repository, but also the content stored in those paths. Therefore:
- a recreated file with the same content is not considered an output file, but instead is kept as an input;
- file moves are detected based on their content and can cause problems;
- directories cannot be empty, because Git does not track empty directories.
When in doubt whether the outputs will be detected, remove all outputs using
git rm <path> followed by
git commit before running the
renku run command.
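Because detection depends on content, a file recreated with identical bytes is indistinguishable from the untouched original. The Python sketch below makes this concrete by hashing every file before and after a run, using the same blob id that `git hash-object` computes (SHA-1 over a `blob <size>\0` header plus the content). The helper names are hypothetical and this is not Renku's code; like Git, it only sees files, so an empty directory leaves no trace.

```python
import hashlib
import os
from pathlib import Path


def blob_id(data: bytes) -> str:
    """SHA-1 object id Git assigns to a file's content."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def snapshot(root):
    """Map every file under root (skipping .git) to the id of its content."""
    root = Path(root)
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # prune Git's own data
        for name in filenames:
            path = Path(dirpath, name)
            state[path.relative_to(root).as_posix()] = blob_id(path.read_bytes())
    return state


def changed_paths(before, after):
    """Paths that are new or whose content differs: the candidate outputs."""
    return {path for path, oid in after.items() if before.get(path) != oid}
```

Deleting and recreating a file with the same content yields the same id, so changed_paths does not report it, which matches the first limitation listed above.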
Detecting standard streams
Often a program expects its input as a standard input stream. This is detected
and recorded in the tool specification when the program is invoked as follows:
$ renku run cat < A
Similarly, standard output and standard error can both be redirected when invoking a command:
$ renku run grep "test" B > C 2> D
Detecting inputs and outputs from pipes
Detecting inputs and outputs through the pipe symbol (|) is not supported.
Specifying inputs and outputs programmatically
Sometimes the list of inputs and outputs is not known before the program runs. For example, a program might accept a date range as input and access all files within that range during its execution.
To address this issue, the program can dump a mapping of input and output files
that it is accessing in
outputs.yml. This YAML file
should be of the format:
name1: path1
name2: path2
where name is the user-defined name of the input or output and path is its path. When the program finishes, Renku looks for these two files and adds their content to the list of explicit inputs and outputs. Renku then deletes both files.
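On the program side, writing such a mapping takes only a few lines. The Python sketch below is a minimal illustration under stated assumptions: it writes into Renku's default .renku/tmp directory, assumes the program runs from the repository root, and only demonstrates outputs.yml (the input mapping would be written the same way to its own file). The program summarize_range and the data/<date>.csv layout are hypothetical stand-ins for the date-range example above.

```python
import json
from pathlib import Path


def write_mapping(mapping, filename, directory=".renku/tmp"):
    """Write a {name: path} mapping as one 'name: path' YAML line per entry."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    # json.dumps yields double-quoted strings, which are also valid YAML scalars,
    # so names or paths containing ':' or spaces remain unambiguous.
    lines = [f"{json.dumps(name)}: {json.dumps(str(path))}" for name, path in mapping.items()]
    target = target_dir / filename
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


def summarize_range(start, end, data_dir="data", out_path="summary.txt"):
    """Hypothetical program: summarize every <ISO date>.csv file between start and end."""
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    read = sorted(p for p in Path(data_dir).glob("*.csv") if start <= p.stem <= end)
    Path(out_path).write_text(f"{len(read)} files\n", encoding="utf-8")
    # Declare the output whose name was only known at run time.
    write_mapping({"summary": out_path}, "outputs.yml")
    return read
```

Quoting every entry is a defensive choice; for simple paths the unquoted name1: path1 form shown above is equally valid YAML.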
By default, Renku looks for these two files in the
.renku/tmp directory. One
can change this default location by setting
environment variable. When set, it points to a sub-directory within the
.renku/tmp directory where Renku looks for the two files.
All Unix commands return a number between 0 and 255, called an “exit code”. If other numbers are returned, they are treated modulo 256 (-10 is equivalent to 246, 257 is equivalent to 1). An exit code of 0 represents success, and a non-zero exit code indicates failure.
Therefore the command specified after
renku run is expected to return
exit code 0. If the command returns a different exit code on success, you can specify the allowed codes with the --success-code option:
$ renku run --success-code=1 --no-output fail
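The modulo rule is ordinary integer arithmetic, as this short Python sketch shows. The helper is_allowed is hypothetical and only illustrates what listing extra codes with --success-code means; it is not Renku's code.

```python
def normalize_exit_code(code: int) -> int:
    """Reduce any integer to the 0-255 range a Unix process can report."""
    # Python's % always returns a non-negative result for a positive divisor.
    return code % 256


def is_allowed(code: int, success_codes=(0,)) -> bool:
    """Hypothetical check in the spirit of --success-code."""
    return normalize_exit_code(code) in set(success_codes)
```

For example, normalize_exit_code(-10) returns 246 and normalize_exit_code(257) returns 1, while is_allowed(1) is False but is_allowed(1, success_codes=(1,)) is True, mirroring the --success-code=1 invocation above.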
Circular Dependencies
Circular dependencies are not supported in
renku run. This means you cannot
use the same file or directory as both an input and an output in the same step;
for instance, reading from a file as input and then appending to it is not
allowed. Since renku records all steps of an analysis workflow in a dependency
graph and it allows you to update outputs when an input changes, this would
lead to problems with circular dependencies. An update command would change the
input again, leading to renku seeing it as a changed input, which would run
update again, and so on, without ever stopping.
Due to this, the renku dependency graph has to be acyclic. So instead of appending to an input file or writing an output file to the same directory that was used as an input directory, create new files or write to other directories, respectively.
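The acyclicity requirement can be checked mechanically. The Python sketch below (a hypothetical helper, not part of Renku; it needs Python 3.9+ for graphlib) builds a graph with an edge from every input to every output of each step and reports a cycle if one exists. It also treats an output written inside an input directory as modifying that directory, which covers the second case described above.

```python
from graphlib import CycleError, TopologicalSorter
from pathlib import PurePosixPath


def find_cycle(steps):
    """Return the paths forming a dependency cycle, or None if there is none.

    steps -- iterable of (inputs, outputs) pairs of repository-relative paths
    """
    sorter = TopologicalSorter()
    for inputs, outputs in steps:
        for out in outputs:
            for inp in inputs:
                sorter.add(out, inp)  # the output depends on the input
                if out != inp and PurePosixPath(out).is_relative_to(inp):
                    # Writing into an input directory changes that directory.
                    sorter.add(inp, out)
    try:
        sorter.prepare()  # raises CycleError if the graph is not acyclic
    except CycleError as err:
        return err.args[1]  # graphlib stores the cycle's nodes in args[1]
    return None
```

A step that reads log.txt and appends to log.txt is reported as a cycle, while writing to a new file such as log-new.txt is not.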
Workflow Definition File
Instead of using
renku run to track your workflows, you can pass a workflow
definition file to renku for execution and tracking. A workflow definition file
or workflow file contains the definition of each individual command as an execution
step. A step’s definition includes the command that will be executed along
with lists of all its inputs, outputs, and parameters that are used in the
command. The following shows a workflow file with one step:
name: workflow-file
steps:
  head:
    command: head -n 10 data/collection/models.csv data/collection/colors.csv > intermediate
    inputs:
      - models:
          path: data/collection/models.csv
      - colors:
          path: data/collection/colors.csv
    outputs:
      temporary-result:
        path: intermediate
    parameters:
      n:
        prefix: -n
        value: 10
The step head in this workflow file has two inputs, one output, and one parameter. Each of these arguments is given a name that makes its purpose clear. The same workflow file can be simplified to the following format:
name: workflow-file
steps:
  head:
    command: head -n 10 data/collection/models.csv data/collection/colors.csv > intermediate
    inputs:
      - data/collection/models.csv
      - data/collection/colors.csv
    outputs:
      - intermediate
    parameters:
      - -n
      - 10
Although the latter format is more concise, the former is recommended because it is more readable and more explicit. You can provide a description for each element in the workflow file, and you can attach a set of keywords to each step and to the workflow file as a whole. The following listing shows a more complete definition of the same workflow file:
name: workflow-file
description: A sample workflow file used for testing
keywords:
  - workflow file
  - v1
steps:
  head:
    command: head -n 10 data/collection/models.csv data/collection/colors.csv > intermediate
    description: first stage of the pipeline
    success_codes:
      - 0
      - 127
    keywords:
      - preprocessing
      - first step
    inputs:
      - models:
          description: all available model numbers
          path: data/collection/models.csv
      - colors:
          path: data/collection/colors.csv
    outputs:
      temporary-result:
        description: temporary intermediate result that won't be saved
        path: intermediate
    parameters:
      n:
        description: number of lines to print
        prefix: -n
        value: 10
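Since every input, output, and parameter declared for a step refers to part of its command, a quick consistency check can catch typos in a hand-written workflow file. The Python sketch below is a hypothetical lint, not a Renku feature: it takes a step already parsed into a dictionary (for example with a YAML library) and reports declarations that do not appear in the command. It accepts both the list-of-mappings form used for inputs and the plain mapping form used for outputs in the examples above.

```python
import shlex


def _named_entries(section):
    """Yield (name, spec) pairs from a mapping or from a list of one-key mappings."""
    if isinstance(section, dict):
        yield from section.items()
    else:
        for item in section:
            yield from item.items()


def check_step(step):
    """Return human-readable problems found in one named-format workflow step."""
    tokens = shlex.split(step["command"])
    problems = []
    for kind in ("inputs", "outputs"):
        for name, spec in _named_entries(step.get(kind, {})):
            if spec["path"] not in tokens:
                problems.append(f"{kind[:-1]} {name!r}: {spec['path']} is not in the command")
    for name, spec in _named_entries(step.get("parameters", {})):
        prefix, value = spec.get("prefix"), str(spec["value"])
        if prefix is None:
            found = value in tokens
        else:
            # The prefix must be immediately followed by the value, as in "-n 10".
            found = any(a == prefix and b == value for a, b in zip(tokens, tokens[1:]))
        if not found:
            expected = value if prefix is None else f"{prefix} {value}"
            problems.append(f"parameter {name!r}: {expected} is not in the command")
    return problems


# The head step from the first workflow file, as a YAML parser would return it.
head_step = {
    "command": "head -n 10 data/collection/models.csv data/collection/colors.csv > intermediate",
    "inputs": [
        {"models": {"path": "data/collection/models.csv"}},
        {"colors": {"path": "data/collection/colors.csv"}},
    ],
    "outputs": {"temporary-result": {"path": "intermediate"}},
    "parameters": {"n": {"prefix": "-n", "value": 10}},
}
```

For head_step, check_step returns an empty list; changing the command to head -n 5 data/collection/models.csv > intermediate would report both the missing colors input and the mismatched n parameter.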