Define your Workflow in a Workflow File

The renku run command works well for tracking a small number of scripts or commands. However, if you are building a processing pipeline that involves many steps, we recommend encoding your workflow in a workflow file.

Introducing the Renku Workflow File

In this example, we will use the same filter_flights and count_flights scripts as in the prior parts of the tutorial, but this time we will encode our workflow in a workflow definition file, rather than using the command line.

To add a workflow file to your Renku project, create a file called workflow.yml.

We’ll start by creating the simplest version of the Renku workflow file:

name: flights-processing-pipeline
steps:
  filter:
    command: python src/filter_flights.py data/flight-data/2019-01-flights.csv.zip data/output/flights-filtered.csv
    inputs:
      - src/filter_flights.py
      - data/flight-data/2019-01-flights.csv.zip
    outputs:
      - data/output/flights-filtered.csv

This workflow file defines the workflow’s name and a sequence of steps. For now, we’ve only included the first step of our workflow, which we’ve named filter. Within the filter step, we define the command to run, and then we tell Renku which parts of this command are inputs and outputs by copying those paths into the relevant sections.
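
As a quick sanity check on the copying rule described above, the following Python sketch confirms that every path listed under inputs and outputs also appears as an argument of the command. The step dictionary is a hand-written mirror of the filter step, and missing_from_command is a hypothetical helper written for this illustration; neither is part of Renku.

```python
import shlex

# Hand-written mirror of the `filter` step from workflow.yml (illustrative only).
step = {
    "command": (
        "python src/filter_flights.py "
        "data/flight-data/2019-01-flights.csv.zip "
        "data/output/flights-filtered.csv"
    ),
    "inputs": [
        "src/filter_flights.py",
        "data/flight-data/2019-01-flights.csv.zip",
    ],
    "outputs": ["data/output/flights-filtered.csv"],
}


def missing_from_command(step):
    """Return declared input/output paths that are not arguments of the command.

    Hypothetical helper for this tutorial, not a Renku function.
    """
    args = shlex.split(step["command"])
    declared = step["inputs"] + step["outputs"]
    return [path for path in declared if path not in args]


print(missing_from_command(step))  # prints []
```

An empty list means that every declared path was copied from the command exactly. A typo in either place would show up as a leftover path.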

To run this workflow file, run:

$ renku run workflow.yml

Using Templating in a Workflow File

Renku provides a templating feature so that you do not have to type the same path twice. In the command field, we can replace the input paths with the $inputs template and the output paths with the $outputs template.

name: flights-processing-pipeline
steps:
  filter:
    command: python $inputs $outputs
    inputs:
      - src/filter_flights.py
      - data/flight-data/2019-01-flights.csv.zip
    outputs:
      - data/output/flights-filtered.csv
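
To make the substitution concrete, here is a minimal Python sketch of the expansion. It assumes that each template is replaced by the listed paths, joined with spaces in the order they appear, which is what the templated command needs in order to match the original one. It uses Python's string.Template purely for illustration and is not Renku's implementation; expand is a hypothetical helper.

```python
from string import Template

# Paths copied from the `filter` step of the workflow file.
inputs = [
    "src/filter_flights.py",
    "data/flight-data/2019-01-flights.csv.zip",
]
outputs = ["data/output/flights-filtered.csv"]


def expand(command, inputs, outputs):
    """Fill $inputs and $outputs with space-joined paths.

    Assumption: paths are joined in listed order. Hypothetical helper,
    not Renku's own code.
    """
    return Template(command).substitute(
        inputs=" ".join(inputs),
        outputs=" ".join(outputs),
    )


expanded = expand("python $inputs $outputs", inputs, outputs)
print(expanded)
```

Under that assumption, the printed command is identical to the one written out in full in the first version of the workflow file.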

A Multi-Step Workflow File

The full workflow file for the two-step workflow is shown below. The count step takes data/output/flights-filtered.csv, the output of the filter step, as one of its inputs.

name: flights-processing-pipeline
steps:
  filter:
    command: python $inputs $outputs
    inputs:
      - src/filter_flights.py
      - data/flight-data/2019-01-flights.csv.zip
    outputs:
      - data/output/flights-filtered.csv

  count:
    command: python $inputs $outputs
    inputs:
      - src/count_flights.py
      - data/output/flights-filtered.csv
    outputs:
      - data/output/flights-count.csv
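
The two steps are linked through data/output/flights-filtered.csv, which filter writes and count reads. The Python sketch below shows one way to derive an execution order from the declared inputs and outputs alone. It illustrates the dependency and is not a description of how Renku schedules steps. The steps dictionary mirrors the workflow file by hand, execution_order is a hypothetical helper, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hand-written mirror of the two steps in workflow.yml (illustrative only).
steps = {
    "filter": {
        "inputs": [
            "src/filter_flights.py",
            "data/flight-data/2019-01-flights.csv.zip",
        ],
        "outputs": ["data/output/flights-filtered.csv"],
    },
    "count": {
        "inputs": [
            "src/count_flights.py",
            "data/output/flights-filtered.csv",
        ],
        "outputs": ["data/output/flights-count.csv"],
    },
}


def execution_order(steps):
    """Order steps so that each runs after the steps producing its inputs.

    Hypothetical helper for this tutorial, not a Renku function.
    """
    # Map each output path to the name of the step that produces it.
    producer = {
        path: name
        for name, step in steps.items()
        for path in step["outputs"]
    }
    # A step depends on every step that produces one of its inputs.
    graph = {
        name: {producer[path] for path in step["inputs"] if path in producer}
        for name, step in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(steps))  # prints ['filter', 'count']
```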

Executing a Workflow File

Running renku run workflow.yml will execute all steps in the workflow file.

Renku also lets you run only part of your workflow at a time. For example, you can execute just one step of the workflow by referencing that step’s name:

$ renku run workflow.yml filter

You may specify more than one step to run:

$ renku run workflow.yml filter count

Want to learn more?

For more information about writing and executing workflow files, see The Renku Workflow File.