Tracking Workflows Interactively
Renku provides a simple set of commands for recording your data processing steps as you run them. You can also stitch these steps together into workflows.
Tracking a Workflow Step with renku run
To track your code execution as a Renku workflow, prepend renku run
to your command. Assigning a name makes it easier to re-run this step later:
$ renku run --name run-analysis python run_analysis.py input_file.csv output_file.csv
If you need to distinguish
renku arguments from your script’s arguments, use
-- to mark where the Renku arguments end and the script command starts.
$ renku run --name run-analysis -- python run_analysis.py -i input_file.csv -o output_file.csv
This command creates a workflow step called run-analysis.
You can inspect the workflow with
renku workflow show:
$ renku workflow show run-analysis
Id: /plans/76d73efb94964e9aac3635176ea57a36
Name: run-analysis
Creators: John Doe <email@example.com>
Command: python run_analysis.py -i input_file.csv -o output_file.csv
Success Codes:
Inputs:
  - input-1:
      Default Value: run_analysis.py
      Position: 1
  - i-2:
      Default Value: input_file.csv
      Position: 2
      Prefix: -i
Outputs:
  - o-3:
      Default Value: output_file.csv
      Position: 3
      Prefix: -o
Once the workflow is recorded, you can execute it again with
renku workflow execute:
$ renku workflow execute run-analysis
Similarly, you can re-execute the workflow with modified parameters, for example:
$ renku workflow execute run-analysis --set i-2=other_input_file.csv
This runs the step on other_input_file.csv instead of the original
input_file.csv. You can also select toil as the execution backend to run
the workflow on an HPC cluster (you need to install renku with the
toil extra for this to be available).
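To make the effect of --set concrete, the sketch below models the parameters listed by renku workflow show as plain Python data and rebuilds the command line after an override. It is an illustration only, not Renku's implementation: the Parameter class and the render_command function are hypothetical names, and Renku's real plan model is richer than this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Parameter:
    """One input or output of a recorded step (hypothetical model)."""
    name: str
    default: str
    position: int
    prefix: Optional[str] = None


# Parameters as listed in the `renku workflow show run-analysis` output.
RUN_ANALYSIS: List[Parameter] = [
    Parameter("input-1", "run_analysis.py", 1),
    Parameter("i-2", "input_file.csv", 2, prefix="-i"),
    Parameter("o-3", "output_file.csv", 3, prefix="-o"),
]


def render_command(base: str, params: List[Parameter],
                   overrides: Optional[Dict[str, str]] = None) -> str:
    """Build the command line, replacing defaults with any overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - {p.name for p in params}
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    parts = [base]
    # Emit arguments in positional order, each preceded by its prefix if any.
    for param in sorted(params, key=lambda p: p.position):
        if param.prefix:
            parts.append(param.prefix)
        parts.append(overrides.get(param.name, param.default))
    return " ".join(parts)


if __name__ == "__main__":
    # Equivalent in spirit to: --set i-2=other_input_file.csv
    print(render_command("python", RUN_ANALYSIS, {"i-2": "other_input_file.csv"}))
```

Only the value of i-2 changes; every parameter that is not overridden keeps its recorded default.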
Updating a Workflow
Now that the workflow step is tracked, Renku knows when upstream files in
a workflow have changed and downstream files need to be updated. You can use
renku update to make sure an
output file is up-to-date. If there have been no changes to the upstream files,
Renku will not re-execute the workflow.
$ renku update output_file.csv
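The decision behind renku update can be pictured as a comparison between the current state of the input files and the state recorded at the last run. The sketch below is a simplified stand-in that assumes changes are detected by content hash; Renku's own change detection may work differently, and the names file_digest, snapshot and needs_update are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(inputs: Iterable[Path]) -> Dict[str, str]:
    """Record the digest of every input, e.g. right after a step has run."""
    return {str(path): file_digest(path) for path in inputs}


def needs_update(inputs: Iterable[Path], recorded: Dict[str, str]) -> bool:
    """True if any input is new or differs from its recorded digest."""
    return any(recorded.get(str(path)) != file_digest(path) for path in inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "input_file.csv"
        data.write_text("a,b\n1,2\n")
        recorded = snapshot([data])
        print(needs_update([data], recorded))  # False: input unchanged
        data.write_text("a,b\n3,4\n")
        print(needs_update([data], recorded))  # True: input modified
```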
If you’d like to rerun the workflow and re-generate a file, regardless of
whether upstream files have changed, use renku rerun:
$ renku rerun output_file.csv
By default, Renku recognizes when workflow steps created by
renku run are related.
For example, suppose workflow step A uses the data file
initial.txt to generate the file
intermediate.txt, which is in turn an input to
workflow step B that yields
final.txt. When you run renku update
final.txt, Renku will check for updates in workflow steps A and B, since they
are linked through intermediate.txt.
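The A and B example can be expressed as a small dependency walk: starting from final.txt, follow each file back to the step that produces it and then to that step's inputs. This is a conceptual sketch only; the Step tuple and the upstream_steps function are hypothetical and do not reflect Renku's internal data model.

```python
from typing import Dict, List, NamedTuple, Set


class Step(NamedTuple):
    name: str
    inputs: List[str]
    outputs: List[str]


STEPS = [
    Step("A", inputs=["initial.txt"], outputs=["intermediate.txt"]),
    Step("B", inputs=["intermediate.txt"], outputs=["final.txt"]),
]


def upstream_steps(target: str, steps: List[Step]) -> List[str]:
    """Return the names of all steps that target depends on, upstream first."""
    producer: Dict[str, Step] = {out: s for s in steps for out in s.outputs}
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(path: str) -> None:
        step = producer.get(path)
        if step is None or step.name in seen:
            return  # a raw data file, or a step already handled
        seen.add(step.name)
        for dependency in step.inputs:
            visit(dependency)
        ordered.append(step.name)  # appended after its own dependencies

    visit(target)
    return ordered


if __name__ == "__main__":
    print(upstream_steps("final.txt", STEPS))  # ['A', 'B']
```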
To make this linkage between workflow steps explicit, you can compose workflow
steps into a named multi-step workflow. To create a workflow called
my-workflow out of multiple steps that were created by renku run, use
renku workflow compose:
$ renku workflow compose --link-all my-workflow run-analysis process-output
Here, run-analysis and process-output are the names of two previously recorded
steps, and the --link-all flag
tells Renku to automatically infer dependencies between steps for you. The newly
created my-workflow can also be executed with
renku workflow execute.
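One way to picture what automatic linking involves is to match each step's outputs against the inputs of the other steps. The sketch below does this by comparing file paths. That matching rule is an assumption made for illustration, and infer_links, the step dictionaries, and the files attached to process-output are hypothetical rather than part of Renku.

```python
from typing import Dict, List, Tuple

# Hypothetical step descriptions; the files listed for process-output
# are invented for illustration.
STEPS: Dict[str, Dict[str, List[str]]] = {
    "run-analysis": {"inputs": ["input_file.csv"], "outputs": ["output_file.csv"]},
    "process-output": {"inputs": ["output_file.csv"], "outputs": ["summary.csv"]},
}


def infer_links(steps: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str, str]]:
    """Return (producer, consumer, path) for every output reused as an input."""
    links = []
    for producer, spec in steps.items():
        for path in spec["outputs"]:
            for consumer, other in steps.items():
                if consumer != producer and path in other["inputs"]:
                    links.append((producer, consumer, path))
    return links


if __name__ == "__main__":
    for producer, consumer, path in infer_links(STEPS):
        print(f"{producer} -> {consumer} via {path}")
```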
For more information about working with workflows using the Renku CLI, see