Renku Command Line

The base command for interacting with the Renku platform.

renku (base command)

To list the available commands, either run renku with no parameters or execute renku help:

$ renku help
Usage: renku [OPTIONS] COMMAND [ARGS]...

Check common Renku commands used in various situations.


Options:
  --version                       Print version number.
  --global-config-path            Print global application's config path.
  --path <path>                   Location of a Renku repository.
                                  [default: (dynamic)]
  --external-storage / -S, --no-external-storage
                                  Use an external file storage service.
  -h, --help                      Show this message and exit.

Commands:
  # [...]

Configuration files

Depending on your system, you may find the configuration files used by the Renku command line in a different folder. By default, the following rules are used:

MacOS:

~/Library/Application Support/Renku

Unix:

~/.config/renku

Windows:

C:\Users\<user>\AppData\Roaming\Renku

If in doubt where to look for the configuration file, you can display its path by running renku --global-config-path.

renku init

Create an empty Renku project or reinitialize an existing one.

Start a Renku project

If you have an existing directory which you want to turn into a Renku project, you can type:

$ cd ~/my_project
$ renku init

or:

$ renku init ~/my_project

This creates a new subdirectory named .renku that contains all the necessary files for managing the project configuration.

Every project requires a name that can either be provided using --name or automatically taken from the target folder.

You can also provide a description for a project using --description.

If the provided directory does not exist, it will be created.
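For example, to turn a directory into a Renku project with an explicit name and description (the values here are illustrative):

$ renku init ~/my_project --name my-project \
    --description "Analysis of flight delay data"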

Use a different template

Renku is installed together with a specific set of templates that you can select from when you initialize a project. You can list them by typing:

$ renku init --list-templates

INDEX ID     DESCRIPTION                     PARAMETERS
----- ------ ------------------------------- -----------------------------
1     python The simplest Python-based [...] description: project des[...]
2     R      R-based renku project with[...] description: project des[...]

If you know which template you are going to use, you can provide either its id with --template-id or its index number with --template-index.
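For example, to skip the interactive template prompt by selecting the Python template directly (the ids and indices come from the listing above):

$ renku init --template-id python

or, equivalently:

$ renku init --template-index 1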

You can use a newer version of the templates, or even create your own, and provide it to the init command by specifying the target template repository with --template-source (both local paths and remote URLs are supported) and the reference with --template-ref (branch, tag or commit).

You can take inspiration from the official Renku template repository:

$ renku init --template-ref master --template-source \
https://github.com/SwissDataScienceCenter/renku-project-template

Fetching template from
https://github.com/SwissDataScienceCenter/renku-project-template@master
... OK

INDEX ID             DESCRIPTION                PARAMETERS
----- -------------- -------------------------- ----------------------
1     python-minimal Basic Python Project:[...] description: proj[...]
2     R-minimal      Basic R Project: The [...] description: proj[...]

Please choose a template by typing the index:

Provide parameters

Some templates require parameters to properly initialize a new project. You can check them by listing the templates with --list-templates.

To provide parameters, use the --parameter option and provide each parameter using --parameter "param1"="value1".

$ renku init --template-id python-minimal --parameter \
"description"="my new shiny project"

Initializing new Renku repository... OK

If you don’t provide the required parameters through the --parameter option, you will be asked to provide them. Empty values are allowed and passed to the template initialization function.

Note

The project’s name is treated as a special parameter and is automatically added to the list of parameters forwarded to the init command.

Provide custom metadata

Custom metadata can be added to the project’s knowledge graph by writing it to a JSON file and passing that via the --metadata option.

$ echo '{"@id": "https://example.com/id1", \
    "@type": "https://schema.org/Organization", \
    "https://schema.org/legalName": "ETHZ"}' > metadata.json

$ renku init --template-id python-minimal --parameter \
"description"="my new shiny project" --metadata metadata.json

Initializing new Renku repository... OK

Update an existing project

There are situations when the required structure of a Renku project needs to be recreated, or when you have an existing Git repository for a folder that you wish to turn into a Renku project. In these cases, Renku will warn you if any files need to be overwritten. README.md and README.rst are never overwritten. .gitignore is appended to, to prevent files from accidentally getting committed. Files that are not present in the template are left untouched by the command.

$ echo "# Example\nThis is a README." > README.md
$ echo "FROM python:3.7-alpine" > Dockerfile
$ renku init

INDEX  ID              PARAMETERS
-------  --------------  ------------
    1  python-minimal  description
    2  R-minimal       description
    3  bioc-minimal    description
    4  julia-minimal   description
    5  minimal
Please choose a template by typing the index: 1
The template requires a value for "description": Test Project
Initializing Git repository...
Warning: The following files exist in the directory and will be overwritten:
        Dockerfile
Proceed? [y/N]: y
Initializing new Renku repository...
Initializing file .dockerignore ...
Initializing file .gitignore ...
Initializing file .gitlab-ci.yml ...
Initializing file .renku/renku.ini ...
Initializing file .renkulfsignore ...
Overwriting file Dockerfile ...
Initializing file data/.gitkeep ...
Initializing file environment.yml ...
Initializing file notebooks/.gitkeep ...
Initializing file requirements.txt ...
Project initialized.
OK

If you initialize in an existing git repository, Renku will create a backup branch before overwriting any files and will print commands to revert the changes and to see what changes were made.

You can also enable the external storage system for output files, if it was not installed previously.

$ renku init --external-storage

renku clone

Clone a Renku project.

Cloning a Renku project

To clone a Renku project, use the renku clone command. This command is preferred over git clone because it sets up the required Git hooks and enables Git LFS automatically.

$ renku clone <repository-url> <destination-directory>

It creates a new directory with the same name as the project. You can change the directory name by passing another name on the command line.

By default, renku clone pulls data from Git-LFS after cloning. If you don’t need the LFS data, pass --no-pull-data option to skip this step.
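For example, to clone a project without fetching its LFS data (the URL is a placeholder):

$ renku clone --no-pull-data <repository-url>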

Note

To move a project to another Renku deployment you need to create a new empty project in the target deployment and push both the repository and Git-LFS objects to the new remote. Refer to Git documentation for more details.

$ git lfs fetch --all
$ git remote remove origin
$ git remote add origin <new-repository-url>
$ git push --mirror origin

To clone private repositories with an HTTPS address, you first need to log into a Renku deployment using the renku login command. renku clone will use the stored credentials when available.

renku config

Get and set Renku repository or global options.

Set values

You can set various Renku configuration options, for example the image registry URL, with a command like:

$ renku config set interactive.default_url "/tree"

By default, configuration is stored locally in the project’s directory. Use the --global option to store configuration for all projects in your home directory.
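For example, to apply the same setting to all your projects:

$ renku config set --global interactive.default_url "/tree"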

Remove values

To remove a specific key from configuration use:

$ renku config remove interactive.default_url

By default, only the local configuration is searched for removal. Use the --global option to remove a global configuration value.

Query values

You can display all configuration values with:

$ renku config show
[renku "interactive"]
default_url = /lab

Both local and global configuration files are read. Values in the local configuration take precedence over global ones. Use the --local or --global flag to read only the corresponding configuration.
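For example, to read a value from the global configuration only:

$ renku config show --global interactive.default_url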

You can provide a KEY to display only its value:

$ renku config show interactive.default_url
default_url = /lab

Available configuration values

The following values are available for the renku config command:

Name                        Description                                                          Default
--------------------------  -------------------------------------------------------------------  -------
show_lfs_message            Whether to show messages about files being added to git LFS or not    True
lfs_threshold               Threshold file size below which files are not added to git LFS        100kb
zenodo.access_token         Access token for Zenodo API                                            None
dataverse.access_token      Access token for Dataverse API                                         None
dataverse.server_url        URL for the Dataverse API server to use                                None
interactive.default_url     URL for interactive environments                                       None
interactive.cpu_request     CPU quota for environments                                             None
interactive.mem_request     Memory quota for environments                                          None
interactive.gpu_request     GPU quota for environments                                             None
interactive.lfs_auto_fetch  Whether to automatically fetch lfs files on environment startup        None
interactive.image           Pinned Docker image for environments                                   None

renku project

Renku CLI commands for handling of projects.

Showing project metadata

You can see the metadata of the current project by using renku project show:

$ renku project show
Id: /projects/john.doe/flights-tutorial
Name: flights-tutorial
Description: Flight tutorial project
Creator: John Doe <John Doe@datascience.ch>
Created: 2021-11-05T10:32:57+01:00
Keywords: keyword1, keyword2
Renku Version: 1.0.0
Project Template: python-minimal (1.0.0)

Editing projects

Users can edit some of a project’s metadata by using the renku project edit command.

The following options can be passed to this command to set various metadata for a project.

Option             Description
-----------------  --------------------------------------------------------------------
-d, --description  Project's description.
-c, --creator      Creator's name, email, and an optional affiliation. Accepted format
                   is 'Forename Surname <email> [affiliation]'.
-m, --metadata     Path to a JSON file containing custom metadata to be added to the
                   project knowledge graph.

renku dataset

Renku CLI commands for handling of datasets.

Manipulating datasets

Create a Dataset

Creating an empty dataset inside a Renku project:

$ renku dataset create my-dataset
Creating a dataset ... OK

You can pass the following options to this command to set various metadata for the dataset.

Option             Description
-----------------  --------------------------------------------------------------------
-t, --title        A human-readable title for the dataset.
-d, --description  Dataset's description.
-c, --creator      Creator's name, email, and an optional affiliation. Accepted format
                   is 'Forename Surname <email> [affiliation]'. Pass multiple times
                   for a list of creators.
-k, --keyword      Dataset's keywords. Pass multiple times for a list of keywords.
-m, --metadata     Path to a file containing custom JSON-LD metadata to be added to
                   the dataset.


Editing a Dataset

Use the edit sub-command to change metadata of a dataset. You can edit the same set of metadata as the create command by passing the options described in the table above.

$ renku dataset edit my-dataset --title 'New title'
Successfully updated: title.

Listing all datasets:

$ renku dataset ls
ID        NAME           TITLE          VERSION
--------  -------------  -------------  ---------
0ad1cb9a  some-dataset   Some Dataset
9436e36c  my-dataset     My Dataset

You can select which columns to display by using --columns to pass a comma-separated list of column names:

$ renku dataset ls --columns id,name,date_created,creators
ID        NAME           CREATED              CREATORS
--------  -------------  -------------------  ---------
0ad1cb9a  some-dataset   2020-03-19 16:39:46  sam
9436e36c  my-dataset     2020-02-28 16:48:09  sam

Displayed results are sorted based on the value of the first column.

You can specify output formats by passing --format with a value of tabular, json-ld or json.
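For example, to get the dataset list as JSON:

$ renku dataset ls --format json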

To inspect the state of the dataset at a given commit, use the --revision flag:

$ renku dataset ls --revision=1103a42bd3006c94ef2af5d6a5e03a335f071215
ID        NAME                 TITLE               VERSION
a1fd8ce2  201901_us_flights_1  2019-01 US Flights  1
c2d80abe  ds1                  ds1

Showing dataset details:

$ renku dataset show some-dataset
Name: some-dataset
Created: 2020-12-09 13:52:06.640778+00:00
Creator(s): John Doe<john.doe@example.com> [SDSC]
Keywords: Dataset, Data
Annotations:
[
  {...}
]
Title: Some Dataset
Description:
Just some dataset

Deleting a dataset:

$ renku dataset rm some-dataset
OK

Working with data

Add data to a Dataset

Adding data to the dataset:

$ renku dataset add my-dataset http://data-url

This will copy the contents of data-url to the dataset and add it to the dataset metadata.

You can create a dataset when you add data to it for the first time by passing --create flag to add command:

$ renku dataset add --create new-dataset http://data-url

To add data from a git repository, you can specify it via https or git+ssh URL schemes. For example,

$ renku dataset add my-dataset git+ssh://host.io/namespace/project.git

Sometimes you want to add just specific paths within the parent project. In this case, use the --source or -s flag:

$ renku dataset add my-dataset --source path/within/repo/to/datafile \
    git+ssh://host.io/namespace/project.git

The command above will result in a structure like

data/
  my-dataset/
    datafile

You can use shell-like wildcards (e.g. *, ?) when specifying paths to be added. Put wildcard patterns in quotes to prevent your shell from expanding them.

$ renku dataset add my-dataset --source 'path/**/datafile' \
    git+ssh://host.io/namespace/project.git

You can use the --destination or -d flag to set the location where the new data is copied to. This location will be under the dataset’s data directory and will be created if it does not exist. You will get an error message if the destination exists and is a file.

$ renku dataset add my-dataset \
    --source path/within/repo/to/datafile \
    --destination new-dir/new-subdir \
    git+ssh://host.io/namespace/project.git

will yield:

data/
  my-dataset/
    new-dir/
      new-subdir/
        datafile

To add a specific version of files, use --ref option for selecting a branch, commit, or tag. The value passed to this option must be a valid reference in the remote Git repository.
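For example, to add a file as it exists at a given tag (the tag name and paths are placeholders):

$ renku dataset add my-dataset --ref v1.0 \
    --source path/within/repo/to/datafile \
    git+ssh://host.io/namespace/project.git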

Adding external data to the dataset:

Sometimes you might want to add data to your dataset without copying the actual files into your repository. This is useful, for example, when external data is too large to store locally. The external data must exist (i.e. be mounted) on your filesystem. Renku creates a symbolic link to your data, and you can use this symbolic link in renku commands like a normal file. To add an external file, pass --external or -e when adding local data to a dataset:

$ renku dataset add my-dataset -e /path/to/external/file

Updating a dataset:

After adding files from a remote Git repository or importing a dataset from a provider like Dataverse or Zenodo, you can check for updates in those files by using the renku dataset update command.

For Git repositories, this command checks all remote files and copies over new content if there is any. It does not delete files from the local dataset if they are deleted from the remote Git repository; to force deletion, use the --delete argument. You can update to a specific branch, commit, or tag by passing the --ref option.

For datasets from providers like Dataverse or Zenodo, the whole dataset is updated to ensure consistency between the remote and local versions. Due to this limitation, the --include and --exclude flags are not compatible with those datasets. Modifying those datasets locally will prevent them from being updated.
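For example, to update my-dataset to a specific tag and also remove files that were deleted in the remote repository (the tag name is a placeholder):

$ renku dataset update --ref v2.0 --delete my-dataset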

The update command also checks for file changes in the project and updates datasets’ metadata accordingly.

You can limit the scope of updated files by specifying dataset names, using --include and --exclude to filter based on file names, or using --creators to filter based on creators. For example, the following command updates only CSV files from my-dataset:

$ renku dataset update -I '*.csv' my-dataset

Note that putting glob patterns in quotes is needed to tell Unix shell not to expand them.

External data are not updated automatically because they require a checksum calculation which can take a long time when data is large. To update external files pass --external or -e to the update command:

$ renku dataset update -e

Tagging a dataset:

A dataset can be tagged with an arbitrary tag to refer to the dataset at that point in time. A tag can be added like this:

$ renku dataset tag my-dataset 1.0 -d "Version 1.0 tag"

A list of all tags can be seen by running:

$ renku dataset ls-tags my-dataset
CREATED              NAME    DESCRIPTION      DATASET     COMMIT
-------------------  ------  ---------------  ----------  ----------------
2020-09-19 17:29:13  1.0     Version 1.0 tag  my-dataset  6c19a8d31545b...

A tag can be removed with:

$ renku dataset rm-tags my-dataset 1.0

Importing data from other Renku projects:

To import all data files and their metadata from another Renku dataset use:

$ renku dataset import \
    https://renkulab.io/projects/<username>/<project>/datasets/<dataset-id>

or

$ renku dataset import \
    https://renkulab.io/datasets/<dataset-id>

You can get the link to a dataset from the UI, or you can construct it if you know the dataset’s ID.

Importing data from an external provider:

$ renku dataset import 10.5281/zenodo.3352150

This will import the dataset with the DOI (Digital Object Identifier) 10.5281/zenodo.3352150 and make it locally available. Dataverse and Zenodo are supported, with DOIs (e.g. 10.5281/zenodo.3352150 or doi:10.5281/zenodo.3352150) and full URLs (e.g. http://zenodo.org/record/3352150). A tag with the remote version of the dataset is automatically created.

Exporting data to an external provider:

$ renku dataset export my-dataset zenodo

This will export the dataset my-dataset to zenodo.org as a draft, allowing for publication later on. If the dataset has any tags set, you can choose whether the repository HEAD version or one of the tags should be exported. The remote version will be set to the local tag that is being exported.

To export to a Dataverse provider you must pass the Dataverse server’s URL and the name of the parent dataverse where the dataset will be exported to. The server’s URL is stored in your Renku settings, so you don’t need to pass it every time.

To export a dataset to OLOS you must pass the OLOS server’s base URL and supply your access token when prompted for it. You must also choose which organizational unit to export the dataset to from the list shown during the export. The export does not map contributors from Renku to OLOS and also doesn’t map license information. Additionally, all file categories default to Primary/Derived. This has to be adjusted manually in the OLOS interface after the export is done.

Listing all files in the project associated with a dataset:

$ renku dataset ls-files
DATASET NAME         ADDED                PATH                           LFS
-------------------  -------------------  -----------------------------  ----
my-dataset           2020-02-28 16:48:09  data/my-dataset/add-me         *
my-dataset           2020-02-28 16:49:02  data/my-dataset/weather/file1  *
my-dataset           2020-02-28 16:49:02  data/my-dataset/weather/file2
my-dataset           2020-02-28 16:49:02  data/my-dataset/weather/file3  *

You can select which columns to display by using --columns to pass a comma-separated list of column names:

$ renku dataset ls-files --columns name,creators,path
DATASET NAME         CREATORS   PATH
-------------------  ---------  -----------------------------
my-dataset           sam        data/my-dataset/add-me
my-dataset           sam        data/my-dataset/weather/file1
my-dataset           sam        data/my-dataset/weather/file2
my-dataset           sam        data/my-dataset/weather/file3

Displayed results are sorted based on the value of the first column.

You can specify output formats by passing --format with a value of tabular, json-ld or json.

Sometimes you want to filter the files. For this we use --dataset, --include and --exclude flags:

$ renku dataset ls-files --include "file*" --exclude "file3"
DATASET NAME        ADDED                PATH                           LFS
------------------- -------------------  -----------------------------  ----
my-dataset          2020-02-28 16:49:02  data/my-dataset/weather/file1  *
my-dataset          2020-02-28 16:49:02  data/my-dataset/weather/file2  *

Unlink a file from a dataset:

$ renku dataset unlink my-dataset --include file1
OK

Unlink all files within a directory from a dataset:

$ renku dataset unlink my-dataset --include "weather/*"
OK

Unlink all files from a dataset:

$ renku dataset unlink my-dataset
Warning: You are about to remove following from "my-dataset" dataset.
.../my-dataset/weather/file1
.../my-dataset/weather/file2
.../my-dataset/weather/file3
Do you wish to continue? [y/N]:

Note

The unlink command does not delete files, only the dataset record.

renku graph

Renku CLI commands for handling of Knowledge Graph data.

Exporting Knowledge Graph data

You can export part or all of the Renku Knowledge Graph metadata for the current project using the renku graph export command.

By default, this will export the metadata created in the last commit in the project. If that commit did not result from a renku command that creates metadata, this produces no output.

$ renku dataset create my-dataset
OK
$ renku graph export
 [
     {
         "@id": "https://localhost/datasets/850e74d6c0204e8c923457a1b9ce52d8",
         "@type": [
         "http://schema.org/Dataset",
         "http://www.w3.org/ns/prov#Entity"
         ],

         [... many more lines ...]

     }
 ]

Here we created a new dataset and then renku graph export exported the created metadata as JSON-LD, the default format.

If you want the Knowledge Graph data for the whole project, you can use renku graph export --full. Alternatively, you can get data for a single commit by using renku graph export --revision <git commit sha> or by specifying a range of commits like renku graph export --revision sha1..sha2.

renku graph export currently supports several export formats, such as json-ld, rdf, nt (for triples) and dot (for GraphViz graphs), which can be specified using the --format option. For instance,

$ renku graph export --full --format dot | dot -Tpng -o my_graph.png

would produce a PNG image of the whole Knowledge Graph for the project.

To run validation on the generated output, you can pass the --strict option, which will check that all the nodes and properties in the graph are correct and that there isn’t anything missing.

renku run

Track provenance of data created by executing programs.

Capture command line execution

Tracking the execution of your command line script is done by simply adding renku run before the actual command. This enables detection of:

  • arguments (flags),

  • string and integer options,

  • input files or directories if linked to existing paths in the repository,

  • output files or directories if modified or created while running the command.

It will create a Plan (Workflow Template) that can be reused and a Run which is a record of a past workflow execution for provenance purposes. Refer to the renku workflow documentation for more details on this distinction.

Basic usage is:

$ renku run --name <plan name> -- <console command>

Note

If there are uncommitted changes in the repository, the renku run command fails. See git status for details.

Warning

If the executed command/script has arguments similar to renku run’s (e.g. --input), they will be treated as renku run arguments. To avoid this, put a -- separator between renku run and the command/script.
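For example, with the separator in place, the --input flag below is passed to the script instead of being consumed by renku run (the script name is illustrative):

$ renku run -- ./analyze.py --input data.csv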

Warning

Input and output paths can only be detected if they are passed as arguments to renku run.

Warning

Circular dependencies are not supported for renku run. See Circular Dependencies for more details.

Warning

When using output redirection with renku run on Windows (with > file or 2> file), all Renku errors and messages are redirected as well, and renku run produces no output on the terminal. On Linux, renku detects this and only the output of the command being run is actually redirected; Renku-specific messages such as errors are printed to the terminal as usual and don’t get redirected.

Detecting input paths

Any path passed as an argument to renku run, which was not changed during the execution, is identified as an input path. The identification only works if the path associated with the argument matches an existing file or directory in the repository.

The detection might not work as expected if:

  • a file is modified during the execution. In this case it will be stored as an output;

  • a path is not passed as an argument to renku run.

Specifying auxiliary inputs (--input)

You can specify extra inputs to your program explicitly by using the --input option. This is useful for specifying hidden dependencies that don’t appear on the command line. Explicit inputs must exist before execution of the renku run command. This option is not a replacement for the arguments that are passed on the command line. Files or directories specified with this option will not be passed as input arguments to the script. You can specify --input name=path or just --input path; the former also sets the name of the input on the resulting Plan.
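For example, to record a configuration file that the script reads internally as a named input (the file names are illustrative):

$ renku run --input config=settings.yaml -- python analyze.py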

Specifying auxiliary parameters (--param)

You can specify extra parameters to your program explicitly by using the --param option. This is useful for getting Renku to treat a parameter as just a string even if it matches a file name in the project. This option is not a replacement for the arguments that are passed on the command line. You can specify --param name=value or just --param value; the former also sets the name of the parameter on the resulting Plan.
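For example, to record a value as a named string parameter (the names are illustrative):

$ renku run --param seed=42 -- python train.py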

Disabling input detection (--no-input-detection)

Input paths detection can be disabled by passing --no-input-detection flag to renku run. In this case, only the directories/files that are passed as explicit input are considered to be file inputs. Those passed via command arguments are ignored unless they are in the explicit inputs list. This only affects files and directories; command options and flags are still treated as inputs.

Detecting output paths

Any path modified or created during the execution will be added as an output.

Because the output path detection is based on the Git repository state after the execution of renku run command, it is good to have a basic understanding of the underlying principles and limitations of tracking files in Git.

Git tracks not only the paths in a repository, but also the content stored in those paths. Therefore:

  • a recreated file with the same content is not considered an output file, but instead is kept as an input;

  • file moves are detected based on their content and can cause problems;

  • directories cannot be empty.

Note

When in doubt whether the outputs will be detected, remove all outputs using git rm <path> followed by git commit before running the renku run command.

Command does not produce any files (--no-output)

If the program does not produce any outputs, the execution ends with an error:

Error: There are not any detected outputs in the repository.

You can specify the --no-output option to force tracking of such an execution.

Specifying outputs explicitly (--output)

You can specify the expected outputs of your program explicitly by using the --output option. These outputs must exist after the execution of the renku run command; however, they do not need to be modified by it. You can specify --output name=path or just --output path; the former also sets the name of the output on the resulting Plan.
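For example, to declare a log file that the command writes as a named output (the paths are illustrative):

$ renku run --output log=run.log -- ./process.sh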

Disabling output detection (--no-output-detection)

Output paths detection can be disabled by passing --no-output-detection flag to renku run. When disabled, only the directories/files that are passed as explicit output are considered to be outputs and those passed via command arguments are ignored.

Detecting standard streams

Often a program expects input on the standard input stream. This is detected and recorded in the tool specification when invoked as renku run cat < A.

Similarly, both redirects to standard output and standard error output can be done when invoking a command:

$ renku run grep "test" B > C 2> D

Warning

Detecting inputs and outputs from pipes | is not supported.

Specifying inputs and outputs programmatically

Sometimes the list of inputs and outputs is not known before execution of the program. For example, a program might accept a date range as input and access all files within that range during its execution.

To address this, the program can dump a list of the input and output files it accesses to inputs.txt and outputs.txt. Each line in these files is expected to be the path to an input or output file within the project’s directory. When the program finishes, Renku looks for these two files, adds their content to the list of explicit inputs and outputs, and then deletes the files.

By default, Renku looks for these two files in the .renku/tmp directory. You can change this default location by setting the RENKU_INDIRECT_PATH environment variable. When set, it points to a sub-directory within .renku/tmp where inputs.txt and outputs.txt reside.
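As a sketch, a program run under renku run could record a dynamically discovered input and output like this (the paths are illustrative):

$ mkdir -p .renku/tmp
$ echo "data/2020-01.csv" >> .renku/tmp/inputs.txt
$ echo "results/summary.csv" >> .renku/tmp/outputs.txt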

Exit codes

All Unix commands return a number between 0 and 255 called an “exit code”. If other numbers are returned, they are treated modulo 256 (-10 is equivalent to 246, and 257 is equivalent to 1). Exit code 0 represents success; a non-zero exit code indicates failure.

Therefore, the command specified after renku run is expected to return exit code 0. If the command returns a different exit code, you can declare it with the --success-code=<INT> parameter.

$ renku run --success-code=1 --no-output fail

Circular Dependencies

Circular dependencies are not supported in renku run. This means you cannot use the same file or directory as both an input and an output in the same step; for instance, reading from a file and then appending to it is not allowed. Since renku records all steps of an analysis workflow in a dependency graph and allows you to update outputs when an input changes, circular dependencies would cause problems: an update would change the input again, which renku would see as a changed input, triggering another update, and so on, without ever stopping.

Due to this, the renku dependency graph has to be acyclic. So instead of appending to an input file or writing an output file to the same directory that was used as an input directory, create new files or write to other directories, respectively.

renku log

Renku CLI command for showing the history of renku commands.

You can use renku log to get a history of renku commands. At the moment, it only shows workflow executions.

$ renku log
DATE                 TYPE  DESCRIPTION
-------------------  ----  -------------
2021-09-21 15:46:02  Run   cp A C
2021-09-21 10:52:51  Run   cp A B

renku login

Logging in to a Renku deployment.

You can use the renku login command to authenticate with a remote Renku deployment. This command opens a browser window where you can log in using your credentials. Renku CLI receives and stores a secure token that will be used for future authentications.

$ renku login <endpoint>

The endpoint parameter is the URL of the Renku deployment that you want to authenticate with (e.g. renkulab.io). You can either pass this parameter on the command line or set it once in the project’s configuration:

$ renku config set endpoint <endpoint>

Note

The secure token is stored in plain text in Renku’s global configuration file in your home directory (~/.renku/renku.ini). Renku changes the access rights of this file so that it is readable only by you. The token exists only on your system and won’t be pushed to a remote server.

This command can also log you in to the GitLab server for private repositories; you can use this method instead of creating an SSH key. Passing --git will change the repository’s remote URL to an endpoint in the deployment that adds authentication to GitLab requests.

Note

The project’s remote URL will be changed when using the --git option. The change is undone when logging out from renku in the CLI. The original remote URL is stored in a remote named renku-backup-<remote-name>.

Logging out from Renku removes the secure token from your system:

$ renku logout <endpoint>

If you don’t specify an endpoint when logging out, credentials for all endpoints are removed.

renku status

Show status of data files created in the repository.

Inspecting a repository

The renku status command can be used to check whether there are output files in a repository that are outdated and need to be re-generated. Output files become outdated due to changes in input data or source code (i.e. their dependencies).

This command shows a list of output files that need to be updated, along with a list of modified inputs for each file. It also displays deleted input files, if any.

To check specific input or output files, pass them to this command:

$ renku status path/to/file1 path/to/file2

In this case, renku only checks if the specified path or paths are modified or outdated and need an update, instead of checking all inputs and outputs.

If you are working in a subdirectory, the paths in the output are shown relative to the current directory. This is on purpose, to make them easy to cut and paste into other commands.

renku update

Update outdated files created by the “run” command.

Update outdated files

Recreating outdated files

The information about the dependencies of each file in a Renku project is stored in the project’s metadata.

When an update command is executed, Renku looks at the most recent execution of each workflow (a Run and Plan combination) and checks which ones are outdated (i.e. at least one of their inputs has been modified). It generates a minimal dependency graph for each outdated file stored in the repository, meaning that only the necessary steps will be executed.

Assume that the following history for the file H exists.

      C---D---E
     /         \
A---B---F---G---H

The first example shows the situation where D is modified, making files E and H outdated.

      C--*D*--(E)
     /          \
A---B---F---G---(H)

** - modified
() - needs update

In this situation, you can effectively do one of three things:

  • Update all files

    $ renku update --all
    
  • Update only E

    $ renku update E
    
  • Update E and H

    $ renku update H
    

Note

If there are uncommitted changes in the repository, the command fails. Check git status to see the details.
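The propagation of "outdated" status through the diagram above can be illustrated with a small dependency graph. This is a hypothetical sketch, not Renku's actual implementation:

```python
# Edges map each file to the files generated from it,
# following the history diagram above.
edges = {
    "A": ["B"],
    "B": ["C", "F"],
    "C": ["D"],
    "D": ["E"],
    "E": ["H"],
    "F": ["G"],
    "G": ["H"],
}

def outdated(modified, edges):
    """Return all files downstream of the modified ones."""
    stale, stack = set(), list(modified)
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child not in stale:
                stale.add(child)
                stack.append(child)
    return stale

# Modifying D makes exactly E and H outdated:
print(sorted(outdated({"D"}, edges)))  # ['E', 'H']
```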

Pre-update checks

In the next example, files A and B are modified, hence the majority of dependent files must be recreated.

        (C)--(D)--(E)
       /            \
*A*--*B*--(F)--(G)--(H)

To avoid excessive recreation of a large portion of files that could have been affected by a simple change of an input file, consider specifying a single file (e.g. renku update G). See also renku status.

Update siblings

If a workflow step produces multiple output files, these outputs will always be updated together.

               (B)
              /
*A*--[step 1]--(C)
              \
               (D)

An attempt to update a single file will update its siblings as well.

The following commands therefore produce the same result.

$ renku update C
$ renku update B C D

renku rerun

Recreate files created by the “run” command.

Recreating files

Rerun workflow

Assume you have run a step 2 that uses a stochastic algorithm, so each run is slightly different. The goal is to regenerate output C several times to compare the outputs. In this situation it is not possible to simply call renku update, since the input file A has not been modified after the execution of step 2.

A-[step 1]-B-[step 2*]-C

Recreate a specific output file by running:

$ renku rerun C

If you do not want step 1 to also be rerun, you can specify a starting point using the --from parameter:

$ renku rerun --from B C

Note that all other outputs of the executed workflow will be recreated as well. If the output didn’t change, it will be removed from git and re-added to ensure that the re-execution is properly tracked.

renku rm

Remove a file, a directory, or a symlink.

Removing a file that belongs to a dataset will update the dataset’s metadata. It will also attempt to update tracking information for files stored in an external storage (using Git LFS).

renku mv

Move or rename a file, a directory, or a symlink.

Moving a file that belongs to a dataset will update its metadata to include its new path and commit. Moreover, tracking information in an external storage (e.g. Git LFS) will be updated. The move operation fails if a destination already exists in the repo; use the --force flag to overwrite it.

If you want to move files to another dataset, use --to-dataset along with the destination dataset’s name. This removes the source paths from the metadata of all datasets that include them (if any) and adds them to the destination dataset’s metadata.

The following command moves data/src and README to the data/dst directory and adds them to target-dataset’s metadata. If the source files belong to one or more datasets, they will be removed from those datasets’ metadata.

$ renku mv data/src README data/dst --to-dataset target-dataset

renku workflow

Manage the set of CWL files created by renku commands.

Runs and Plans

Renku records two different kinds of metadata when a workflow is executed: Runs and Plans. Plans describe a recipe for a command; they function as templates that can be used directly or combined with other workflow templates to create more complex recipes. These Plans can be run in various ways: on creation with renku run, when doing a renku rerun or renku update, or manually using renku workflow execute.

Each time a Plan is run, we track that instance of it as a Run. Runs track workflow execution through time: which Plan was run, at what time, and with which specific values. This gives insight into what steps were taken in a repository, how they were taken, and what results they produced.

The renku workflow group of commands contains most of the commands used to interact with Plans and Runs.

Working with Plans

Listing Plans

List Plans
$ renku workflow ls
ID                                       NAME
---------------------------------------  ---------------
/plans/11a3702184394b93ac422df760e40999  cp-B-C-ca4da
/plans/96642cac86d9435e8abce2384f8618b9  cat-A-C-fa017
/plans/96c70626575c41c5a13853b070eaaaf5  my-other-run
/plans/9a0961844fcc46e1816fde00f57e24a8  my-run

Each entry corresponds to a recorded Plan/workflow template. You can also show additional columns using the --columns parameter, which takes any combination of values from id, name, keywords and description.

Showing Plan Details

Show Plan

You can see the details of a plan by using renku workflow show:

$ renku workflow show my-run
Id: /plans/9a0961844fcc46e1816fde00f57e24a8
Name: my-run
Command: cp A B
Success Codes:
Inputs:
        - input-1:
                Default Value: A
                Position: 1
Outputs:
        - output-2:
                Default Value: B
                Position: 2

This shows the unique Id of the Plan, its name, the full command of the Plan if it was run without any modifications (more on that later), which exit codes should be considered successful executions (defaults to 0) as well as its inputs, outputs and parameters.

Executing Plans

Execute Plans

Plans can be executed using renku workflow execute. They can be run as-is or their parameters can be modified as needed. Renku has a plugin architecture to allow execution using various execution backends.

$ renku workflow execute --provider cwltool --set input-1=file.txt my-run

Parameters can be set using the --set keyword or by specifying them in a values YAML file passed via --values. When passing a file, the YAML should follow this structure:

learning_rate: 0.9
dataset_input: dataset.csv
chart_output: mychart.png
myworkflow:
    lr: 0.8
    lookuptable: lookup.xml
    myotherworkflow:
        language: en

In addition to being passed on the command line and being available to renku.api.* classes in Python scripts, parameters are also set as environment variables when executing the command, in the form of RENKU_ENV_<parameter name>.
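A script executed by the plan could therefore read its parameters from the environment; this is an illustrative sketch, with learning_rate taken from the YAML example above:

```python
import os

def get_renku_param(name, default=None):
    """Read a workflow parameter exposed as RENKU_ENV_<name>."""
    return os.environ.get(f"RENKU_ENV_{name}", default)

# Falls back to a default when not running under renku workflow execute.
lr = float(get_renku_param("learning_rate", "0.1"))
```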

Provider specific settings can be passed as file using the --config parameter.

Iterate Plans

Iterate Plans

To execute a Plan with different parametrizations, renku workflow iterate can be used. This sub-command essentially conducts a ‘grid search’-like execution of a Plan, with parameter sets provided by the user.

$ renku workflow iterate --map parameter-1=[1,2,3] \
      --map parameter-2=[10,20] my-run

The set of possible values for a parameter can be given with the --map command line argument or by specifying them in a values YAML file passed via --mapping. The content of the mapping file for the above example would be:

parameter-1: [1,2,3]
parameter-2: [10,20]

By default renku workflow iterate will execute all combinations of the given parameters’ possible values. Sometimes it is desirable to execute only specific tuples of values instead of all combinations. This can be done by marking the parameters that should be bound together with a @tag suffix in their names.

$ renku workflow iterate --map parameter-1@tag1=[1,2,3] \
      --map parameter-2@tag1=[10,5,30] my-run

This results in only three distinct executions of the my-run Plan, with the following parameter combinations: [(1,10), (2,5), (3,30)]. Note that parameters sharing a tag must have the same number of possible values, i.e. their value lists must have the same length.
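The two iteration modes correspond to a Cartesian product and a position-wise zip, which can be sketched as:

```python
from itertools import product

# Untagged parameters: full Cartesian product of the value lists.
values = {"parameter-1": [1, 2, 3], "parameter-2": [10, 20]}
combos = list(product(*values.values()))
print(len(combos))  # 6 executions

# Tagged parameters (same tag): values are zipped position-wise,
# which is why the lists must have equal length.
tagged = list(zip([1, 2, 3], [10, 5, 30]))
print(tagged)  # [(1, 10), (2, 5), (3, 30)]
```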

There’s a special template variable for parameter values, {iter_index}, which can be used to insert each iteration’s index into a parameter value. The template variable is substituted with the iteration index (0, 1, 2, …).

$ renku workflow iterate --map parameter-1=[10,20,30] \
      --map output=output_{iter_index}.txt my-run

This would execute my-run three times, with parameter-1 values 10, 20 and 30, producing the output files output_0.txt, output_1.txt and output_2.txt, in this order.
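The filename templating amounts to a simple index substitution, for example:

```python
# {iter_index} is replaced with the 0-based iteration number.
template = "output_{iter_index}.txt"
params = [10, 20, 30]
outputs = [template.format(iter_index=i) for i in range(len(params))]
print(outputs)  # ['output_0.txt', 'output_1.txt', 'output_2.txt']
```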

Exporting Plans

You can export a Plan to a number of different workflow languages, such as CWL (Common Workflow Language) by using renku workflow export:

$ renku workflow export --format cwl my-run
baseCommand:
- cp
class: CommandLineTool
cwlVersion: v1.0
id: 63e3a2a8-5b40-49b2-a2f4-eecc37bc76b0
inputs:
- default: B
id: _plans_9a0961844fcc46e1816fde00f57e24a8_outputs_2_arg
inputBinding:
    position: 2
type: string
- default:
    class: File
    location: file:///home/user/my-project/A
id: _plans_9a0961844fcc46e1816fde00f57e24a8_inputs_1
inputBinding:
    position: 1
type: File
- default:
    class: Directory
    location: file:///home/user/my-project/.renku
id: input_renku_metadata
type: Directory
- default:
    class: Directory
    location: file:///home/user/my-project/.git
id: input_git_directory
type: Directory
outputs:
- id: _plans_9a0961844fcc46e1816fde00f57e24a8_outputs_2
outputBinding:
    glob: $(inputs._plans_9a0961844fcc46e1816fde00f57e24a8_outputs_2_arg)
type: File
requirements:
InitialWorkDirRequirement:
    listing:
    - entry: $(inputs._plans_9a0961844fcc46e1816fde00f57e24a8_inputs_1)
    entryname: A
    writable: false
    - entry: $(inputs.input_renku_metadata)
    entryname: .renku
    writable: false
    - entry: $(inputs.input_git_directory)
    entryname: .git
    writable: false

You can export into a file directly with -o <path>.

Composing Plans into larger workflows

Composing Plans

For more complex workflows consisting of several steps, you can use the renku workflow compose command. This creates a new workflow that has substeps.

The basic usage is:

$ renku run --name step1 -- cp input intermediate
$ renku run --name step2 -- cp intermediate output
$ renku workflow compose my-composed-workflow step1 step2

This would create a new workflow called my-composed-workflow that consists of step1 and step2 as steps. This new workflow is just like any other workflow in renku in that it can be executed, exported or composed with other workflows.

Workflows can also be composed based on past Runs and their inputs/outputs, using the --from and --to parameters. This finds chains of Runs from inputs to outputs and then adds them to the composed plan, applying mappings (see below) where appropriate to make sure the correct values for execution are used in the composite. This also means that all the parameters in the used plans are exposed on the composed plan directly. In the example above, this would be:

$ renku workflow compose --from input --to output my-composed-workflow

You can expose parameters of child steps on the parent workflow using --map/-m arguments followed by a mapping expression. Mapping expressions take the form of <name>=<expression> where name is the name of the property to be created on the parent workflow and expression points to one or more fields on the child steps that should be mapped to this property. The expressions come in two flavors, absolute references using the names of workflows and properties, and relative references specifying the position within a workflow.

An absolute expression in the example above could be step1.my_dataset to refer to the input, output or argument named my_dataset on the step step1. A relative expression could be @step2.@output1 to refer to the first output of the second step of the composed workflow.

Valid relative expressions are @input<n>, @output<n> and @param<n> for the nth input, output or argument of a step, respectively. For referring to steps inside a composed workflow, you can use @step<n>. For referencing a mapping on a composed workflow, you can use @mapping<n>. Of course, the names of the objects for all these cases also work.
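The relative reference syntax can be recognized with a simple pattern. This is an illustrative sketch, not Renku's actual parser:

```python
import re

# Matches @input<n>, @output<n>, @param<n>, @step<n> and @mapping<n>.
REF = re.compile(r"@(input|output|param|step|mapping)(\d+)$")

def parse_ref(token):
    """Return (kind, index) for a relative reference, else None."""
    m = REF.match(token)
    return (m.group(1), int(m.group(2))) if m else None

print(parse_ref("@output1"))    # ('output', 1)
print(parse_ref("@step2"))      # ('step', 2)
print(parse_ref("my_dataset"))  # None (an absolute name, not a relative ref)
```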

The expressions can also be combined using a comma (,) if a mapping should point to more than one parameter of a child step.

You can mix absolute and relative references in the same expression, as you see fit.

A full example of this would be:

$ renku workflow compose --map input_file=step1.@input2 \
      --map output_file=@step1.my-output,@step2.step2s_output \
      my-composed-workflow step1 step2

This would create a mapping called input_file on the parent workflow that points to the second input of step1 and a mapping called output_file that points to both the output my-output on step1 and step2s_output on step2.

You can also set default values for mappings, which override the default values of the parameters they’re pointing to by using the --set/-s parameter, for instance:

$ renku workflow compose --map input_file=step1.@input2 \
      --set input_file=data.csv \
      my-composed-workflow step1 step2

This would lead to data.csv being used for the second input of step1 when my-composed-workflow is executed (if it isn’t overridden at execution time).

You can add a description to the mappings to make them more human-readable by using the --describe-param/-p parameter, as shown here:

$ renku workflow compose --map input_file=step1.@input2 \
      -p input_file="The dataset to process" \
      my-composed-workflow step1 step2

You can also expose all inputs, outputs or parameters of child steps by using --map-inputs, --map-outputs or --map-params, respectively.

On execution, renku will automatically detect links between steps, if an input of one step uses the same path as an output of another step, and execute them in the correct order. Since this depends on what values are passed at runtime, you might want to enforce a certain order of steps by explicitly mapping outputs to inputs.

You can do that using the --link <source>=<sink> parameters, e.g. --link step1.@output1=step2.@input1. This gets recorded on the workflow template and forces step2.@input1 to always be set to the same path as step1.@output1, irrespective of which values are passed at execution time.

This way, you can ensure that the steps in your workflow are always executed in the correct order and that the dependencies between steps are modeled correctly.

Renku can also add links for you automatically based on the default values of inputs and outputs, where inputs/outputs that have the same path get linked in the composed run. To do this, pass the --link-all flag.

Warning

Because workflows have to be directed acyclic graphs, cycles in the dependencies are not allowed: e.g. step1 depending on step2, which in turn depends on step1, is not allowed. Additionally, information has to flow from outputs to inputs or parameters, so you cannot map an input to an output, only the other way around.

Values on inputs/outputs/parameters get set according to the following order of precedence (lower precedence first):

  • Default value on an input/output/parameter

  • Default value on a mapping to the input/output/parameter

  • Value passed to a mapping to the input/output/parameter

  • Value passed to the input/output/parameter

  • Value propagated to an input from the source of a workflow link
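The precedence order above amounts to taking the last value that is actually set, later sources overriding earlier ones. A hypothetical sketch:

```python
def resolve(default=None, mapping_default=None, mapping_value=None,
            direct_value=None, link_value=None):
    """Return the effective value; later sources override earlier ones."""
    result = None
    for candidate in (default, mapping_default, mapping_value,
                      direct_value, link_value):
        if candidate is not None:
            result = candidate
    return result

print(resolve(default="a.csv"))                             # 'a.csv'
print(resolve(default="a.csv", mapping_default="b.csv"))    # 'b.csv'
print(resolve(default="a.csv", link_value="linked.csv"))    # 'linked.csv'
```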

Editing Plans

Editing Plans

Plans can be edited in some limited fashion, but we do not allow structural changes, as that might cause issues with the reproducibility and provenance of the project. If you want to do structural changes (e.g. adding/removing parameters), we recommend you record a new plan instead.

You can change the name and description of Plans and of their parameters, as well as change the default values of parameters, using the renku workflow edit command:

$ renku workflow edit my-run --name new-run --description "my description" \
      --rename-param input-1=my-input --set my-input=other-file.txt \
      --describe-param my-input="My input parameter"

This would rename the Plan my-run to new-run, change its description, rename its parameter input-1 to my-input, set the default of this parameter to other-file.txt, and set the parameter’s description.

Removing Plans

Sometimes you might want to discard a recorded Plan or reuse its name for a new Plan. In these cases, you can delete the old plan using renku workflow remove <plan name>. Once a Plan is removed, it doesn’t show up in most renku workflow commands. renku update ignores deleted Plans, but renku rerun will still rerun them if needed, to ensure reproducibility.

Working with Runs

Listing Runs

To get a view of which commands have been executed in the project, you can use the renku log --workflows command:

$ renku log --workflows
DATE                 TYPE  DESCRIPTION
-------------------  ----  -------------
2021-09-21 15:46:02  Run   cp A C
2021-09-21 10:52:51  Run   cp A B

Refer to the documentation of the renku log command for more details.

Visualizing Executions

Visualizing Runs

You can visualize past Runs made with renku using the renku workflow visualize command. This will show a directed graph of executions and how they are connected. This way you can see exactly how a file was generated and what steps it involved. It also supports an interactive mode that lets you explore the graph in a more detailed way.

$ renku run echo "input" > input
$ renku run cp input intermediate
$ renku run cp intermediate output
$ renku workflow visualize
     ╔════════════╗
     ║echo > input║
     ╚════════════╝
             *
             *
             *
         ┌─────┐
         │input│
         └─────┘
             *
             *
             *
 ╔═════════════════════╗
 ║cp input intermediate║
 ╚═════════════════════╝
             *
             *
             *
     ┌────────────┐
     │intermediate│
     └────────────┘
             *
             *
             *
 ╔══════════════════════╗
 ║cp intermediate output║
 ╚══════════════════════╝
             *
             *
             *
         ┌──────┐
         │output│
         └──────┘

 $ renku workflow visualize intermediate
     ╔════════════╗
     ║echo > input║
     ╚════════════╝
         *
         *
         *
         ┌─────┐
         │input│
         └─────┘
         *
         *
         *
 ╔═════════════════════╗
 ║cp input intermediate║
 ╚═════════════════════╝
         *
         *
         *
     ┌────────────┐
     │intermediate│
     └────────────┘
 $ renku workflow visualize --from intermediate
     ┌────────────┐
     │intermediate│
     └────────────┘
             *
             *
             *
 ╔══════════════════════╗
 ║cp intermediate output║
 ╚══════════════════════╝
             *
             *
             *
         ┌──────┐
         │output│
         └──────┘

You can also run in interactive mode using the --interactive flag.

$ renku workflow visualize --interactive

This allows you to navigate between workflow executions and see details by pressing the <Enter> key.

Use renku workflow visualize -h to see all available options.

Input and output files

You can list the input and output files used and generated in the repository by running the renku workflow inputs and renku workflow outputs commands. Alternatively, you can check whether all paths specified as arguments are input or output files, respectively.

$ renku run wc < source.txt > result.wc
$ renku workflow inputs
source.txt
$ renku workflow outputs
result.wc
$ renku workflow outputs source.txt
$ echo $?  # last command finished with an error code
1

renku save

Convenience method to save local changes and push them to a remote server.

If you have local modifications to files, you can save them using:

$ renku save
Username for 'https://renkulab.io': my.user
Password for 'https://my.user@renkulab.io':
Successfully saved:
    file1
    file2
OK

Warning

The username and password for renku save are your GitLab user/password, not your renkulab login!

You can additionally supply a message that describes the changes that you made by using the -m or --message parameter followed by your message.

$ renku save -m "Updated file1 and 2."
Successfully saved:
    file1
    file2
OK

If no remote server has been configured, you will get an error; in that case, you can specify a remote using the -d or --destination parameter.

$ renku save
Error: No remote has been set up for the current branch

$ renku save -d https://renkulab.io/gitlab/my.user/my-project.git
Successfully saved:
    file1
    file2
OK

You can also specify which paths to save:

$ renku save file1
Successfully saved:
    file1
OK

renku storage

Manage an external storage.

Pulling files from git LFS

LFS works by checking small pointer files into git and storing the actual contents of a file in LFS. If, instead of your file’s content, you see something like this, it means the file is stored in Git LFS and its contents are not currently available locally (they have not been pulled):

version https://git-lfs.github.com/spec/v1
oid sha256:42b5c7fb2acd54f6d3cd930f18fee3bdcb20598764ca93bdfb38d7989c054bcf
size 12
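A pointer file can be recognized by its first line. A small heuristic sketch (the spec URL is the one shown in the pointer above):

```python
def looks_like_lfs_pointer(text: str) -> bool:
    """Heuristic check for Git LFS pointer-file content."""
    first_line = text.splitlines()[0] if text else ""
    return first_line.startswith("version https://git-lfs.github.com/spec/")

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:42b5c7fb2acd54f6d3cd930f18fee3bdcb20598764ca93bdfb38d7989c054bcf\n"
    "size 12\n"
)
print(looks_like_lfs_pointer(pointer))          # True
print(looks_like_lfs_pointer("hello world\n"))  # False
```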

You can manually pull the contents of the file(s) you want with:

$ renku storage pull file1 file2

Removing local content of files stored in git LFS

If you want to restore a file to its pointer-file state, for instance to free up space locally, you can run:

$ renku storage clean file1 file2

This removes any data cached locally for files tracked in git LFS.

Migrate large files to git LFS

If you accidentally checked a large file into git or are moving a non-LFS renku repo to git LFS, you can use the following command to migrate the files to LFS:

$ renku storage migrate --all

This will move all files that are bigger than the renku lfs_threshold config value and are not excluded by .renkulfsignore into git LFS.

To migrate only specific files, you can also pass their paths to the command:

$ renku storage migrate big_file other_big_file

renku doctor

Check your system and repository for potential problems.

renku migrate

Migrate project to the latest Renku version.

When the way Renku stores metadata changes or there are other changes to the project structure or data that are needed for Renku to work, renku migrate can be used to bring the project up to date with the current version of Renku. This does not usually affect how you use Renku and no data is lost.

In addition, renku migrate will update your Dockerfile to install the latest version of renku-python, if supported, making sure your renku version is up to date in interactive environments as well.

If you created your repository from a project template and the template has changed since you created the project, it will also update files with their newest version from the template, without overwriting local changes if there are any.

You can check whether a migration is necessary, and which migrations are available, by running:

$ renku migrate -c

renku rollback

Rollback project to a previous point in time.

If you want to undo actions taken using Renku in a project, you can use the renku rollback command. It shows a list of all actions done by renku and lets you pick one that you want to return to, discarding any changes done in the repo (by Renku or manually) after that point. Once you pick a checkpoint to return to, the command shows all files and Renku objects that would be affected by the rollback and how they would be affected. If you confirm, the project will be reset to that point in time, with anything done after that point being deleted/lost.

$ renku rollback
Select a checkpoint to roll back to:

[0] 2021-10-20 09:50:04         renku workflow edit cp-blabla-asdasf-0b535 --name test
[1] 2021-10-20 09:49:19         renku rerun asdasf
[2] 2021-10-20 09:48:59         renku run cp blabla asdasf
[3] 2021-10-20 08:37:00         renku dataset add e blabla
[4] 2021-10-20 08:31:16         renku dataset create m
Checkpoint ([q] to quit) [q]: 4
The following changes would be done:

Metadata:

    Modified ♻️:
        Dataset: e

    Removed 🔥:
        Plan: cp-blabla-asdasf-0b535
        Plan: test
        Run: /activities/cc3ab70952fc499e93e7e4075a076bf5 (Plan name: cp-blabla-asdasf-0b535)
        Run: /activities/48b89b22567d4282abe8a016fa91878f (Plan name: cp-blabla-asdasf-0b535)

Files:

    Restored ↻:
        blabla

    Removed 🔥:
        asdasf

Proceed? [y/N]: y

Note

This command was introduced in renku-python version 1.0.0. It cannot roll back to commands executed with previous versions of renku.

renku service

Commands to launch service components.

renku githooks

Install and uninstall Git hooks.

Prevent modifications of output files

The commit hooks are enabled by default to prevent situations where an output file is manually modified.

$ renku init
$ renku run echo hello > greeting.txt
$ edit greeting.txt
$ git commit greeting.txt
You are trying to update some output files.

Modified outputs:
  greeting.txt

If you are sure, use "git commit --no-verify".

Error Tracking

Renku is not bug-free, and you can help us find bugs.

GitHub

You can quickly open an issue on GitHub with a traceback and minimal system information when you hit an unhandled exception in the CLI.

Ahhhhhhhh! You have found a bug. 🐞

1. Open an issue by typing "open";
2. Print human-readable information by typing "print";
3. See the full traceback without submitting details (default: "ignore").

Please select an action by typing its name (open, print, ignore) [ignore]:

Sentry

When using renku as a hosted service, the Sentry integration can be enabled to help developers iterate faster by showing them where bugs happen, how often, and who is affected.

  1. Install sentry-sdk with python -m pip install sentry-sdk;

  2. Set the environment variable SENTRY_DSN=https://<key>@sentry.<domain>/<project>;

  3. Set the environment variable SENTRY_SAMPLE_RATE=0.2. This tracks 20% of all requests in Sentry performance monitoring; set it to 0 to disable.

Warning

User information might be sent to help resolve the problem. If you are not using your own Sentry instance, you should inform users that you are sending possibly sensitive information to a third-party service.