Renku Workflows and Provenance
One of the most important ideas behind Renku is capturing the provenance of the
analysis process. Let's assume we are working with input data, code, and results.
If you write a piece of code that takes some input data, processes it and writes some output to disk, the provenance graph would look something like this:
Naturally, a result may also be used as input data to a subsequent step:
In Renku, we provide tools for building such workflows to record and show how data and code are connected. By encoding these relationships, your project is easier for you to manage and faster for others to read and reuse! No more reading through multiple files to understand how they are connected - workflows make the connections between code and data files easy to understand by listing each workflow step and its inputs and outputs.
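The relationship between input data, code, and results described above can be sketched as a small graph in Python. This is only an illustration of the idea, not how Renku stores provenance: the Step class, the lineage function, and the second step (process_output.py producing summary.csv) are hypothetical names introduced for this example, and the sketch assumes the steps contain no cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded execution: code that turns input files into output files."""
    name: str
    code: str
    inputs: tuple
    outputs: tuple


def lineage(path, steps):
    """Return the names of the steps that contributed to `path`, earliest first."""
    for step in steps:
        if path in step.outputs:
            upstream = []
            for source in step.inputs:
                for name in lineage(source, steps):
                    if name not in upstream:
                        upstream.append(name)
            return upstream + [step.name]
    return []  # raw input data: no recorded step produced it


# Hypothetical two-step project: the result of the first step feeds the second.
steps = [
    Step("run-analysis", "run_analysis.py", ("input_file.csv",), ("output_file.csv",)),
    Step("process-output", "process_output.py", ("output_file.csv",), ("summary.csv",)),
]

print(lineage("summary.csv", steps))
```

Asking for the lineage of a file walks backwards from the result through every step whose output it depends on, which is the kind of question recorded provenance makes answerable without reading the code.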
Each time you track an execution with Renku, you create a workflow step. Encoding a workflow step makes it easier for you to rerun it without retyping long commands. Recording workflow steps in Renku also records metadata that you and others can use to understand how an output was generated.
To take full advantage of workflows, join individual steps together into multi-step workflows. When your code pipeline is encoded as a workflow, you can easily re-run all or portions of your workflow with simple commands, test your code with different parameters and compare the results, or send it to different execution backends.
Working with Workflows
To track your code execution as a Renku workflow, simply prepend renku run to
your command. You may also identify the workflow step’s inputs and outputs via
the -i and -o flags, as shown here:
$ renku run --name run-analysis -- python run_analysis.py -i input_file.csv -o output_file.csv
This command creates a workflow step called run-analysis. You can inspect the
workflow with renku workflow show:
$ renku workflow show run-analysis
Id: /plans/76d73efb94964e9aac3635176ea57a36
Name: run-analysis
Creators: John Doe <example@renku.ch>
Command: python run_analysis.py -i input_file.csv -o output_file.csv
Success Codes:
Inputs:
    - input-1:
        Default Value: run_analysis.py
        Position: 1
    - i-2:
        Default Value: input_file.csv
        Position: 2
        Prefix: -i
Outputs:
    - o-3:
        Default Value: output_file.csv
        Position: 3
        Prefix: -o
Once the workflow is recorded, you can execute it again with renku workflow execute:
$ renku workflow execute run-analysis
Similarly, you can re-execute the workflow with modified parameters, for example:
$ renku workflow execute run-analysis --set i-2=other_input_file.csv
This runs the workflow on other_input_file.csv instead of the original
input_file.csv. You can also specify an execution backend with --provider,
e.g. toil for execution on an HPC cluster (you need to install renku with the
toil extra for this to be available).
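To make the effect of --set concrete, the following Python sketch rebuilds the command line from the parameters listed in the renku workflow show output and applies NAME=VALUE overrides. It is a rough model under stated assumptions, not Renku's implementation: build_command is a hypothetical helper, and treating python as the base command followed by the parameters in position order is an assumption read off the output above.

```python
import shlex

# Parameters copied from the `renku workflow show run-analysis` output.
# Assumption: "python" is the base command and parameters follow by position.
BASE_COMMAND = "python"
PARAMETERS = [
    {"name": "input-1", "default": "run_analysis.py", "position": 1, "prefix": None},
    {"name": "i-2", "default": "input_file.csv", "position": 2, "prefix": "-i"},
    {"name": "o-3", "default": "output_file.csv", "position": 3, "prefix": "-o"},
]


def build_command(overrides=()):
    """Rebuild the command line, applying NAME=VALUE overrides (hypothetical helper)."""
    values = {p["name"]: p["default"] for p in PARAMETERS}
    for item in overrides:
        name, _, value = item.partition("=")
        if name not in values:
            raise KeyError(f"unknown parameter: {name}")
        values[name] = value
    parts = [BASE_COMMAND]
    for p in sorted(PARAMETERS, key=lambda p: p["position"]):
        if p["prefix"]:
            parts.append(p["prefix"])
        parts.append(values[p["name"]])
    return shlex.join(parts)


print(build_command())
print(build_command(["i-2=other_input_file.csv"]))
```

Only the value attached to i-2 changes; the script and the output path keep their recorded defaults, which is why a single override is enough to rerun the step on a different file.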
Composing Workflows
To create a workflow my-workflow out of multiple steps, use renku workflow compose:
$ renku workflow compose --link-all my-workflow run-analysis process-output
This example assumes you have two steps named run-analysis and process-output.
The --link-all flag tells Renku to automatically infer dependencies between
steps for you. The newly created my-workflow can also be executed with
renku workflow execute.
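One plausible way to picture dependency inference is matching each step's output paths against the other steps' input paths. The Python sketch below does exactly that and then derives an execution order. It is an assumption for illustration only: this guide does not describe how --link-all works internally, and the contents of the process-output step (process_output.py, summary.csv) as well as the infer_links and execution_order helpers are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical step descriptions; only the run-analysis files come from this guide.
STEPS = {
    "run-analysis": {
        "inputs": ["run_analysis.py", "input_file.csv"],
        "outputs": ["output_file.csv"],
    },
    "process-output": {
        "inputs": ["process_output.py", "output_file.csv"],
        "outputs": ["summary.csv"],
    },
}


def infer_links(steps):
    """Link every output to each other step's input that uses the same path."""
    links = []
    for producer, produced in steps.items():
        for path in produced["outputs"]:
            for consumer, consumed in steps.items():
                if consumer != producer and path in consumed["inputs"]:
                    links.append((producer, path, consumer))
    return links


def execution_order(steps):
    """Order steps so that each one runs after the steps it depends on."""
    graph = {name: set() for name in steps}
    for producer, _path, consumer in infer_links(steps):
        graph[consumer].add(producer)
    return list(TopologicalSorter(graph).static_order())


print(infer_links(STEPS))
print(execution_order(STEPS))
```

Because output_file.csv is written by run-analysis and read by process-output, the sketch links the two steps and schedules run-analysis first.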
Inspecting Workflows
You can see workflows on RenkuLab by going to a project and opening the Workflows tab:
There you can view, filter and navigate all workflows and steps used in the project. Selecting a workflow or step shows you its details (parameters, dependent steps etc.) and allows you to navigate between steps.
The step detail page shows the command used, the inputs and outputs, the parameters, and other related metadata:
When an input or an output is available in the project’s latest commit, you will notice a link icon that will bring you to the file browser to get a preview or download the content.