The Renku Workflow File¶
If your data processing workflow has many steps, you might find it convenient to create a workflow definition file. A workflow file helps organize your data processing flow and makes it easier to execute only portions of the pipeline at a time. A workflow file also makes your code easier for your collaborators to understand!
Adding a Workflow File to your Project¶
To create a workflow definition file in your project, create a file called
You may have more than one workflow file in your project! Workflow files may be named however you like, as long as they end with the .yml or .yaml file extension.
Do you already have a workflow you created on the Renku CLI that you’d like to convert to a Workflow File?
You can export it!
renku workflow export <workflow-name> --format renku --output workflow.yml
(Tip: Find your workflow’s name using
renku workflow ls)
Defining a Basic Workflow File¶
There are a few options for how you may define your workflow.
In the simplest version of the Renku workflow file, the command, inputs and outputs are simply listed, as in the example below:
name: data-pipeline
steps:
  filter:
    command: python src/filter.py data/input/flights.csv data/output/filtered.csv
    inputs:
      - src/filter.py
      - data/input/flights.csv
    outputs:
      - data/output/filtered.csv
Note that the script being run is always listed in the
inputs section (here,
This workflow file defines the workflow’s name and a sequence of steps. This
file only includes one step, which is named
filter. Within the
step, we list the
command to run, and then we tell Renku which parts of this command are inputs and
outputs by copying those paths into the relevant inputs and outputs sections.
To execute this workflow file, pass it to renku run:
$ renku run workflow.yml
Using Templating in a Workflow File¶
Renku provides a templating feature so that you never have to type the same path twice. There are a few different ways to use templating in your workflow file, and they can be mixed and matched depending on what works best for your command.
Templating by Group¶
In the example above, all of the inputs go together into the same place in the
command, followed by the output. So, rather than listing each input, output, and
parameter in the command individually, you can use the
$inputs, $outputs, and $parameters templates to tell Renku “put all the inputs here”.
name: data-pipeline
steps:
  filter:
    command: python $parameters $inputs $outputs
    inputs:
      - src/filter.py
      - data/input/flights.csv
    outputs:
      - data/output/filtered.csv
    parameters:
      - -n
      - 10
The inputs are filled in to the command at the
$inputs template in the order
in which they are specified in the
inputs section. The same goes for $outputs and $parameters.
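To make the ordering rule concrete, here is a minimal Python sketch of group templating. It is only an illustration of the behavior described above, not Renku's implementation; the function name `expand_group_templates` is hypothetical.

```python
def expand_group_templates(command, inputs, outputs, parameters):
    """Replace $inputs, $outputs and $parameters with space-joined values.

    Hypothetical helper for illustration; Renku's own code may differ.
    """
    groups = {
        "$inputs": inputs,
        "$outputs": outputs,
        "$parameters": parameters,
    }
    words = []
    for word in command.split():
        if word in groups:
            # Values keep the order in which they are listed in their section.
            words.extend(str(value) for value in groups[word])
        else:
            words.append(word)
    return " ".join(words)


expanded = expand_group_templates(
    "python $parameters $inputs $outputs",
    inputs=["src/filter.py", "data/input/flights.csv"],
    outputs=["data/output/filtered.csv"],
    parameters=["-n", 10],
)
print(expanded)
```

Swapping the two entries under inputs would swap them in the expanded command as well, which is why the order within each section matters.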
Templating by Argument Name¶
If the ordering of arguments in your command is more complex, you can reference
each argument individually by name. To do so, assign each input and output a
name (such as
raw) and a
path. Then, we reference those names in the command.
name: data-pipeline
steps:
  filter:
    command: python $n $filter-py $raw $filtered
    inputs:
      - filter-py:
          path: src/filter.py
      - raw:
          path: data/input/flights.csv
    outputs:
      - filtered:
          path: data/output/filtered.csv
    parameters:
      - n:
          prefix: -n
          value: 10
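The name-based substitution in the example above can be sketched in Python as follows. This is an illustrative approximation, not Renku's parser: the helper `expand_named_templates` is hypothetical, and both the set of characters allowed in a name and the joining of a parameter's prefix and value with a single space are assumptions.

```python
import re


def expand_named_templates(command, arguments):
    """Replace each $name in the command with the text registered for it.

    Hypothetical helper for illustration; Renku's own code may differ.
    """
    def substitute(match):
        name = match.group(1)
        if name not in arguments:
            raise KeyError(f"no argument named {name!r}")
        return arguments[name]

    # Assumption: names consist of letters, digits, underscores and hyphens.
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_-]*)", substitute, command)


arguments = {
    "filter-py": "src/filter.py",
    "raw": "data/input/flights.csv",
    "filtered": "data/output/filtered.csv",
    # Assumption: a parameter expands to its prefix, a space, then its value.
    "n": "-n 10",
}
named = expand_named_templates("python $n $filter-py $raw $filtered", arguments)
print(named)
```

Because each argument is looked up by name, the arguments can appear in the command in any order, independent of the order of the sections in the file.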
Renku uses basic YAML syntax for workflow definition files. Do not use advanced YAML features such as anchors, aliases, or schemas, since their behavior is undefined. Moreover, in the future we will implement a customized YAML parser that won’t allow these features.
If your command uses the
$ character, you can escape it by doing
A Multi-Step Workflow File¶
Below, you can see what a workflow file looks like for a two-step workflow.
name: data-pipeline
steps:
  filter:
    command: python $filter-py $raw $filtered
    inputs:
      - filter-py:
          path: src/filter.py
      - raw:
          path: data/input/flights.csv
    outputs:
      - filtered:
          path: data/output/filtered.csv
  count:
    command: python $count-py $filtered $counts
    inputs:
      - count-py:
          path: src/count.py
      - filtered:
          path: data/output/filtered.csv
    outputs:
      - counts:
          path: data/output/counts.csv
Executing a Workflow File¶
renku run workflow.yml will execute all steps
in the workflow file. Executing the workflow will commit all workflow inputs and
outputs, too, including the workflow file itself.
$ renku run workflow.yml
Executing step 'data-pipeline.filter': 'python src/filter.py data/input/flights.csv data/output/filtered.csv' ...
Executing step 'data-pipeline.count': 'python src/count.py data/output/filtered.csv data/output/counts.csv' ...
Do you have output files you don’t want to be committed, such as log files?
You have 2 options: (1) Do not list these outputs in the workflow definition
file, and Renku will ignore them. Or, (2) include the file in the workflow
file, but use the
persist: false flag to tell Renku not to commit the file.
Executing a Portion of a Workflow¶
Renku also helps you run only portions of your workflow at a time. For example, you can execute just one step of the workflow by referencing that step’s name:
$ renku run workflow.yml filter
You may specify more than one step to run:
$ renku run workflow.yml filter count
Workflow Step Execution Order¶
When you execute a workflow file, Renku builds an execution graph to determine how the steps in the workflow are related. Renku then executes the steps in that order. This means that only the data dependencies between steps determine the execution order, not the order of steps in the workflow file.
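The idea of deriving the order from data dependencies can be sketched with Python's standard graphlib module. This is a simplified illustration, not Renku's implementation; the `steps` dictionary and the `execution_order` helper are hypothetical, and the steps are deliberately listed in the "wrong" order to show that file order does not matter.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory form of the two-step workflow, with count listed first.
steps = {
    "count": {
        "inputs": ["src/count.py", "data/output/filtered.csv"],
        "outputs": ["data/output/counts.csv"],
    },
    "filter": {
        "inputs": ["src/filter.py", "data/input/flights.csv"],
        "outputs": ["data/output/filtered.csv"],
    },
}


def execution_order(steps):
    """Order steps so that each runs after the steps producing its inputs."""
    # Map every output path to the step that produces it.
    producers = {
        path: name for name, step in steps.items() for path in step["outputs"]
    }
    # A step depends on whichever steps produce one of its inputs.
    graph = {
        name: {producers[path] for path in step["inputs"] if path in producers}
        for name, step in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(steps)
print(order)  # ['filter', 'count']
```

Here count consumes data/output/filtered.csv, which filter produces, so filter is scheduled first even though it appears second in the dictionary.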
By passing the
--dry-run flag to the
renku run command, you can instruct
Renku to only print the order of execution of the steps without actually running
any of them.
The --no-commit flag causes Renku to run the workflow file without creating
a commit after the execution. Renku also won’t create any metadata in
this case. This is a great option to use when developing or verifying a workflow!
Adding more Information to a Workflow File¶
Implicit Input and Output Files¶
If your script consumes or generates an input or output that is not explicitly
passed in the command, you may still list the file in the workflow file so that
it is tracked by Renku. When doing so, also add the
implicit: true key;
otherwise, Renku will warn that the file is not used in the command string.
name: script-with-implicit-input
steps:
  filter:
    command: python $my-script
    inputs:
      - my-script:
          path: my-script.py
      - hidden-input:
          path: data/an-input.txt
          implicit: true
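The kind of check behind that warning can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not Renku's actual validation logic: the `unused_paths` helper is hypothetical, and it assumes the command has already had its templates expanded.

```python
def unused_paths(expanded_command, entries):
    """Return paths that are absent from the command and not marked implicit.

    Hypothetical helper for illustration; Renku's own check may differ.
    """
    words = expanded_command.split()
    return [
        entry["path"]
        for entry in entries
        if entry["path"] not in words and not entry.get("implicit", False)
    ]


inputs = [
    {"path": "my-script.py"},
    {"path": "data/an-input.txt", "implicit": True},
]
flagged = unused_paths("python my-script.py", inputs)
print(flagged)  # []

# Without the implicit marker, the hidden input would be flagged.
flagged_without_marker = unused_paths(
    "python my-script.py",
    [{"path": "my-script.py"}, {"path": "data/an-input.txt"}],
)
print(flagged_without_marker)  # ['data/an-input.txt']
```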
Descriptions and Keywords¶
You may provide further details in your workflow definition, such as a description of each parameter, and keywords that describe your workflow.
name: data-pipeline
description: The workflow in the Renku Tutorial
keywords:
  - tutorial
steps:
  filter:
    command: python $filter-py $raw $filtered
    description: Filter the raw flights data to only flights to the destination of interest
    inputs:
      - filter-py:
          path: src/filter.py
      - raw:
          description: The raw flights data
          path: data/input/flights.csv
    outputs:
      - filtered:
          description: Flights to the destination of interest
          path: data/output/filtered.csv
  count:
    command: python $count-py $filtered $counts
    description: Count the number of flights
    inputs:
      - count-py:
          path: src/count.py
      - filtered:
          description: Flights to the destination of interest
          path: data/output/filtered.csv
    outputs:
      - counts:
          description: Number of flights to the destination of interest
          path: data/output/counts.csv
Alternative Success Codes¶
By default, Renku considers a workflow step to have executed successfully if its command returns an exit code of 0. If the command is expected to return a different exit code on success, specify the acceptable codes in a success_codes key:
name: command-with-alternative-success-codes
steps:
  head:
    command: head -n 10 data/collection/models.csv data/collection/colors.csv > intermediate
    success_codes: [0, 127]
    ...
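Conceptually, the success check amounts to testing whether the command's exit code is in the accepted set. The Python sketch below illustrates this with the standard subprocess module; it is not how Renku runs steps, and the `step_succeeded` helper and the exit code 3 are made up for the example.

```python
import subprocess
import sys


def step_succeeded(command, success_codes=(0,)):
    """Run a command and report whether its exit code counts as success.

    Hypothetical helper for illustration; only exit code 0 is accepted
    unless other codes are listed explicitly.
    """
    completed = subprocess.run(command)
    return completed.returncode in success_codes


# A stand-in command that always exits with code 3.
exits_with_3 = [sys.executable, "-c", "import sys; sys.exit(3)"]

print(step_succeeded(exits_with_3))                        # False
print(step_succeeded(exits_with_3, success_codes=(0, 3)))  # True
```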
Viewing a Workflow Visually¶
After executing a workflow, you can view a diagram showing how any file produced by that workflow was created.
To view this diagram, run
renku workflow visualize
and pass the path to the file you would like to inspect:
$ renku workflow visualize data/output/counts.csv
[Diagram: workflows/workflow-flights-tutorial-3.yml, src/filter.py, and data/input/flights.csv appear as inputs at the top. The step "python src/filter.py..." produces data/output/filtered.csv, which together with src/count.py feeds the step "python src/count.py...", which in turn produces data/output/counts.csv.]