Tracking Workflows Interactively

Renku provides a simple set of commands for recording your data processing steps as you run them. You can also stitch these steps together into workflows.

Tracking a Workflow Step with renku run

To track a command's execution as a Renku workflow step, prepend renku run to the command. Assigning a name with --name makes it easier to re-run the step later.

$ renku run --name run-analysis python run_analysis.py input_file.csv output_file.csv

If you need to distinguish Renku's arguments from your script’s arguments, use -- to mark where the Renku arguments end and your command begins.

$ renku run --name run-analysis -- python run_analysis.py input_file.csv output_file.csv

This command creates a workflow step called run-analysis.

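You can inspect the workflow with renku workflow show. The example output below comes from a step whose script takes its input and output files through -i and -o options: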

$ renku workflow show run-analysis
Id: /plans/76d73efb94964e9aac3635176ea57a36
Name: run-analysis
Creators: John Doe <example@renku.ch>
Command: python run_analysis.py -i input_file.csv -o output_file.csv
Success Codes:
Inputs:
        - input-1:
                Default Value: run_analysis.py
                Position: 1
        - i-2:
                Default Value: input_file.csv
                Position: 2
                Prefix: -i
Outputs:
        - o-3:
                Default Value: output_file.csv
                Position: 3
                Prefix: -o

Once the workflow is recorded, you can execute it again with renku workflow execute:

$ renku workflow execute run-analysis

Similarly, you can re-execute the workflow with modified parameters by passing --set with a parameter name listed by renku workflow show, for example:

$ renku workflow execute run-analysis --set i-2=other_input_file.csv

This runs the workflow on other_input_file.csv instead of the original input_file.csv. You can also select an execution backend with --provider, e.g. toil for execution on an HPC cluster (this requires installing Renku with the toil extra).
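As a rough sketch of how --set could be used for batch runs, the helper below re-executes run-analysis once per input file. It assumes the parameter names i-2 and o-3 from the renku workflow show output above, and that --set may be repeated to override several parameters in one call. The function name execute_for_inputs, the _out.csv naming scheme, and the RENKU variable (which lets you substitute echo for a dry run) are illustrative and not part of Renku.

```shell
# Use the real CLI by default; set RENKU=echo to preview the commands.
RENKU="${RENKU:-renku}"

# Re-execute the recorded step once per input file (hypothetical helper).
# Usage: execute_for_inputs FILE.csv...
execute_for_inputs() {
    for input in "$@"; do
        # Derive a distinct output name so runs do not overwrite each other.
        output="${input%.csv}_out.csv"
        $RENKU workflow execute run-analysis \
            --set "i-2=$input" --set "o-3=$output" || return 1
    done
}
```

For example, RENKU=echo execute_for_inputs a.csv b.csv prints the two commands without running them; dropping RENKU=echo runs them inside your Renku project.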

Updating a Workflow

Now that the workflow step is tracked, Renku detects when upstream files in a workflow have changed and downstream files need to be updated. Use renku update to bring an output file up to date. If no upstream files have changed, Renku does not re-execute the workflow.

$ renku update output_file.csv

To re-run the workflow and regenerate a file regardless of whether upstream files have changed, use renku rerun.

$ renku rerun output_file.csv
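The difference between the two commands can be summarized in a small wrapper. This is only a sketch: refresh_output, its force argument, and the RENKU variable are hypothetical conveniences, and the function does nothing beyond calling renku update or renku rerun as described above.

```shell
# Use the real CLI by default; set RENKU=echo to preview the commands.
RENKU="${RENKU:-renku}"

# Bring one output file up to date (hypothetical helper).
# Usage: refresh_output FILE [force]
refresh_output() {
    if [ "${2:-}" = "force" ]; then
        # Regenerate the file even if no upstream file has changed.
        $RENKU rerun "$1"
    else
        # Re-execute only if an upstream file has changed.
        $RENKU update "$1"
    fi
}
```

Calling refresh_output output_file.csv corresponds to renku update output_file.csv, while refresh_output output_file.csv force corresponds to renku rerun output_file.csv.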

Composing Workflows

By default, Renku recognizes when workflow steps created by renku run are related.

For example, suppose workflow step A uses data file initial.txt to generate intermediate.txt, which workflow step B takes as input to produce final.txt. When you run renku update final.txt, Renku checks for updates in both steps A and B, since they are related.
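The A and B example could be recorded roughly as follows. This sketch assumes you are inside a Renku project where initial.txt already exists; cp merely stands in for real processing commands, and the step names step-a and step-b, the function record_chain, and the RENKU variable are made up for illustration.

```shell
# Use the real CLI by default; set RENKU=echo to preview the commands.
RENKU="${RENKU:-renku}"

# Record two related steps: A feeds B through intermediate.txt.
record_chain() {
    # Step A: initial.txt -> intermediate.txt
    $RENKU run --name step-a -- cp initial.txt intermediate.txt || return 1
    # Step B: intermediate.txt -> final.txt
    $RENKU run --name step-b -- cp intermediate.txt final.txt
}
```

After record_chain has run and initial.txt later changes, renku update final.txt would be expected to re-execute both steps, because intermediate.txt connects them.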

To make this linkage between workflow steps explicit, you can compose workflow steps into a named multi-step workflow. To create a workflow my-workflow from multiple steps that were created by renku run, use renku workflow compose:

$ renku workflow compose --link-all my-workflow run-analysis process-output

This example assumes you have two existing steps named run-analysis and process-output. --link-all tells Renku to automatically infer the dependencies between steps for you. The newly created my-workflow can also be executed with renku workflow execute.
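Putting these pieces together, a composed workflow might be created, inspected, and executed in one go. The sketch below uses only the commands mentioned in this section and assumes that renku workflow show accepts the name of a composed workflow in the same way as a single step; compose_and_execute and the RENKU variable are hypothetical.

```shell
# Use the real CLI by default; set RENKU=echo to preview the commands.
RENKU="${RENKU:-renku}"

# Compose existing steps into one workflow, inspect it, then execute it.
# Usage: compose_and_execute WORKFLOW STEP...
compose_and_execute() {
    workflow="$1"
    shift
    # --link-all lets Renku infer the dependencies between the steps.
    $RENKU workflow compose --link-all "$workflow" "$@" || return 1
    $RENKU workflow show "$workflow" || return 1
    $RENKU workflow execute "$workflow"
}
```

For instance, compose_and_execute my-workflow run-analysis process-output mirrors the compose command above and then runs the result.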

For more information about working with workflows using the Renku CLI, see renku workflow.