Define your Workflow in a Workflow File
The renku run command is great for tracking the use of a small number of
scripts or commands. However, if you are building a processing pipeline that
involves many steps, we recommend encoding your workflow in a workflow file.
Introducing the Renku Workflow File
In this example, we will use the same filter_flights and count_flights
scripts as in the prior parts of the tutorial, but this time we will encode our
workflow in a workflow definition file, rather than using the command line.
To create a workflow file in your Renku project, create a file called
workflow.yml.
We’ll start by creating the simplest version of the Renku workflow file:
name: flights-processing-pipeline
steps:
  filter:
    command: python src/filter_flights.py data/flight-data/2019-01-flights.csv.zip data/output/flights-filtered.csv
    inputs:
      - src/filter_flights.py
      - data/flight-data/2019-01-flights.csv.zip
    outputs:
      - data/output/flights-filtered.csv
This workflow file defines the workflow’s name and a sequence of steps. For now,
we’ve only included the first step of our workflow, which we’ve named filter.
Within the filter step, we define the command to run, and then we tell Renku
which parts of this command are inputs and outputs by copying those paths into
the relevant sections.
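Because the paths are copied by hand, a typo can leave a declared input or output that no longer matches the command. The following is a minimal sketch of a consistency check, assuming the step has already been loaded into a plain Python dictionary. The `step` dictionary and the `paths_missing_from_command` helper are hypothetical illustrations, not part of Renku.

```python
import shlex

# Hypothetical in-memory copy of the `filter` step from workflow.yml.
step = {
    "command": (
        "python src/filter_flights.py "
        "data/flight-data/2019-01-flights.csv.zip "
        "data/output/flights-filtered.csv"
    ),
    "inputs": [
        "src/filter_flights.py",
        "data/flight-data/2019-01-flights.csv.zip",
    ],
    "outputs": ["data/output/flights-filtered.csv"],
}


def paths_missing_from_command(step):
    """Return declared input/output paths that are not arguments of the command."""
    args = set(shlex.split(step["command"]))
    declared = step["inputs"] + step["outputs"]
    return [path for path in declared if path not in args]


missing = paths_missing_from_command(step)
print(missing)  # an empty list means every declared path appears in the command
```

A non-empty result points at paths that were declared but mistyped or never used in the command.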
To run this workflow file, run:
$ renku run workflow.yml
Using Templating in a Workflow File
Renku provides a templating feature so that you never have to type the same path
twice. In the command field, we can replace the paths to the inputs with the
$inputs template, and likewise replace the paths to the outputs with the
$outputs template.
name: flights-processing-pipeline
steps:
  filter:
    command: python $inputs $outputs
    inputs:
      - src/filter_flights.py
      - data/flight-data/2019-01-flights.csv.zip
    outputs:
      - data/output/flights-filtered.csv
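To make the substitution concrete, here is a rough sketch of what the templated command amounts to. It assumes that $inputs and $outputs expand to the listed paths, in order and separated by spaces, which is consistent with the two versions of the filter step shown in this tutorial. The `render_command` helper is hypothetical and only mimics the idea with Python's standard `string.Template`; it is not how Renku itself is implemented.

```python
from string import Template


def render_command(command, inputs, outputs):
    """Expand $inputs and $outputs into space-separated path lists (assumed behavior)."""
    return Template(command).substitute(
        inputs=" ".join(inputs),
        outputs=" ".join(outputs),
    )


rendered = render_command(
    "python $inputs $outputs",
    inputs=[
        "src/filter_flights.py",
        "data/flight-data/2019-01-flights.csv.zip",
    ],
    outputs=["data/output/flights-filtered.csv"],
)
print(rendered)
```

Under that assumption, the rendered string is the same command that was written out in full in the first version of the workflow file.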
A Multi-Step Workflow File
Below, you can see what the full workflow file looks like for the two-step workflow.
name: flights-processing-pipeline
steps:
  filter:
    command: python $inputs $outputs
    inputs:
      - src/filter_flights.py
      - data/flight-data/2019-01-flights.csv.zip
    outputs:
      - data/output/flights-filtered.csv
  count:
    command: python $inputs $outputs
    inputs:
      - src/count_flights.py
      - data/output/flights-filtered.csv
    outputs:
      - data/output/flights-count.csv
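The two steps are linked through data/output/flights-filtered.csv: it is an output of filter and an input of count, so filter has to run first. The sketch below shows how such an ordering can be derived from the declared inputs and outputs alone. It is an illustration under the assumption that the steps are available as a plain dictionary. The `execution_order` helper is hypothetical and does not claim to reproduce how Renku schedules steps internally. It uses `graphlib` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory copy of the steps; `count` is listed first on purpose
# to show that the order comes from the data dependencies, not the file order.
steps = {
    "count": {
        "inputs": ["src/count_flights.py", "data/output/flights-filtered.csv"],
        "outputs": ["data/output/flights-count.csv"],
    },
    "filter": {
        "inputs": [
            "src/filter_flights.py",
            "data/flight-data/2019-01-flights.csv.zip",
        ],
        "outputs": ["data/output/flights-filtered.csv"],
    },
}


def execution_order(steps):
    """Order steps so that each one runs after the steps producing its inputs."""
    # Map every output path to the step that produces it.
    producers = {
        path: name for name, step in steps.items() for path in step["outputs"]
    }
    # For each step, collect the steps whose outputs it consumes.
    graph = {
        name: {producers[path] for path in step["inputs"] if path in producers}
        for name, step in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(steps)
print(order)  # ['filter', 'count']
```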
Executing a Workflow File
Running renku run workflow.yml will execute all steps in the workflow file.
Renku also helps you run only portions of your workflow at a time. For example, you can execute just one step of the workflow by referencing that step’s name:
$ renku run workflow.yml filter
You may specify more than one step to run:
$ renku run workflow.yml filter count
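If you want to trigger these runs from a script rather than typing them by hand, the command line can be assembled programmatically. This is a small sketch that only builds the argument list for the invocations shown above. The `renku_run_args` helper is a hypothetical name, and actually executing the command (for example with `subprocess.run(args, check=True)`) assumes that renku is installed and that you are inside a Renku project.

```python
import shlex


def renku_run_args(workflow_file, *step_names):
    """Build the argument list for `renku run <workflow file> [step ...]`."""
    return ["renku", "run", workflow_file, *step_names]


all_steps = renku_run_args("workflow.yml")
selected = renku_run_args("workflow.yml", "filter", "count")

# shlex.join renders the list as a single shell-style command line.
print(shlex.join(all_steps))
print(shlex.join(selected))
```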
Want to learn more?
For much more information about writing and executing workflow files, see The Renku Workflow File.