Renku Command Line¶
The base command for interacting with the Renku platform.
renku (base command)¶
To list the available commands, either run renku with no parameters or execute renku help:
$ renku help
Usage: renku [OPTIONS] COMMAND [ARGS]...
Check common Renku commands used in various situations.
Options:
--version Print version number.
--global-config-path Print global application's config path.
--path <path> Location of a Renku repository.
[default: (dynamic)]
--external-storage / -S, --no-external-storage
Use an external file storage service.
-h, --help Show this message and exit.
Commands:
# [...]
Configuration files¶
Depending on your system, you may find the configuration files used by the Renku command line in a different folder. By default, the following locations are used:
- MacOS:
~/Library/Application Support/Renku
- Unix:
~/.config/renku
- Windows:
C:\Users\<user>\AppData\Roaming\Renku
If in doubt where to look for the configuration file, you can display its path by running renku --global-config-path.
renku init¶
Create an empty Renku project or reinitialize an existing one.
Start a Renku project¶
If you have an existing directory which you want to turn into a Renku project, you can type:
$ cd ~/my_project
$ renku init
or:
$ renku init ~/my_project
This creates a new subdirectory named .renku that contains all the necessary files for managing the project configuration.
Every project requires a name that can either be provided using --name or automatically taken from the target folder. You can also provide a description for a project using --description.
If the provided directory does not exist, it will be created.
Use a different template¶
Renku is installed together with a specific set of templates you can select when you initialize a project. You can check them by typing:
$ renku template ls
INDEX ID DESCRIPTION PARAMETERS
----- ------ ------------------------------- -----------------------------
1 python The simplest Python-based [...] description: project des[...]
2 R R-based renku project with[...] description: project des[...]
If you know which template you are going to use, you can provide its id using --template-id.
You can use a newer version of the templates or even create your own and provide it to the init command by specifying the target template repository source --template-source (both local paths and remote URLs are supported) and the reference --template-ref (branch, tag or commit).
You can take inspiration from the official Renku template repository:
$ renku init --template-ref master --template-source \
https://github.com/SwissDataScienceCenter/renku-project-template
Fetching template from
https://github.com/SwissDataScienceCenter/renku-project-template@master
... OK
INDEX ID DESCRIPTION PARAMETERS
----- -------------- -------------------------- ----------------------
1 python-minimal Basic Python Project:[...] description: proj[...]
2 R-minimal Basic R Project: The [...] description: proj[...]
Please choose a template by typing the index:
Provide parameters¶
Some templates require parameters to properly initialize a new project. You can check them by listing the templates with renku template ls --verbose.
To provide parameters, use the --parameter option and provide each parameter as --parameter "param1"="value1".
$ renku init --template-id python-minimal --parameter \
"description"="my new shiny project"
Initializing new Renku repository... OK
If you don't provide the required parameters through the --parameter option, you will be asked to provide them. Empty values are allowed and passed to the template initialization function.
Note
The project's name is considered a special parameter and is automatically added to the list of parameters forwarded to the init command.
Provide custom metadata¶
Custom metadata can be added to the project's knowledge graph by writing it to a JSON file and passing that via the --metadata option.
$ echo '{"@id": "https://example.com/id1", \
"@type": "https://schema.org/Organization", \
"https://schema.org/legalName": "ETHZ"}' > metadata.json
$ renku init --template-id python-minimal --parameter \
"description"="my new shiny project" --metadata metadata.json
Initializing new Renku repository... OK
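Before pointing --metadata at a file, it can help to confirm the file parses as JSON at all. The check below uses Python's built-in json.tool and is not part of renku; it recreates the metadata.json from the example above:

```shell
# Optional sanity check (plain Python, not a renku command): recreate the
# example metadata file and verify that it is well-formed JSON.
echo '{"@id": "https://example.com/id1", "@type": "https://schema.org/Organization", "https://schema.org/legalName": "ETHZ"}' > metadata.json
python3 -m json.tool metadata.json > /dev/null && echo "metadata.json is valid JSON"
```

Any JSON validator works equally well here; json.tool is used only because it ships with Python.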
Update an existing project¶
There are situations when the required structure of a Renku project needs to be recreated, or when you have an existing Git repository for a folder that you wish to turn into a Renku project. In these cases, Renku will warn you if there are any files that need to be overwritten. README.md and README.rst will never be overwritten. .gitignore will be appended to, to prevent files from accidentally getting committed. Files that are not present in the template will be left untouched by the command.
$ echo "# Example\nThis is a README." > README.md
$ echo "FROM python:3.7-alpine" > Dockerfile
$ renku init
INDEX ID PARAMETERS
------- -------------- ------------
1 python-minimal description
2 R-minimal description
3 bioc-minimal description
4 julia-minimal description
5 minimal
Please choose a template by typing the index: 1
The template requires a value for "description": Test Project
Initializing Git repository...
Warning: The following files exist in the directory and will be overwritten:
Dockerfile
Proceed? [y/N]: y
Initializing new Renku repository...
Initializing file .dockerignore ...
Initializing file .gitignore ...
Initializing file .gitlab-ci.yml ...
Initializing file .renku/renku.ini ...
Initializing file .renkulfsignore ...
Overwriting file Dockerfile ...
Initializing file data/.gitkeep ...
Initializing file environment.yml ...
Initializing file notebooks/.gitkeep ...
Initializing file requirements.txt ...
Project initialized.
OK
If you initialize in an existing git repository, Renku will create a backup branch before overwriting any files and will print commands to revert the changes done and to see what changes were made.
You can also enable the external storage system for output files, if it was not installed previously.
$ renku init --external-storage
renku template¶
Manage project templates.
Renku projects are initialized using a project template. Renku has a set of built-in templates that you can use in your projects. These templates can be listed by using:
$ renku template ls
INDEX ID
----- --------------
1 python-minimal
2 R-minimal
3 julia-minimal
You can use other sources of templates that reside inside a git repository:
$ renku template ls --source https://github.com/SwissDataScienceCenter/contributed-project-templates
INDEX ID
----- --------------
1 python-minimal
2 R-minimal
3 julia-minimal
The renku template show <template-id> command can be used to see detailed information about a single template. If no template ID is passed, it shows the current project's template.
Set a template¶
You can change a project's template using the renku template set command:
$ renku template set <template-id>
or use a template from a different source:
$ renku template set <template-id> --source <template-repo-url>
This command fails if the project already has a template. Use the --force flag to force-change the template.
Note
Setting a template overwrites existing files in a project. Pass the --interactive flag to get a prompt for selecting which files to keep or overwrite.
Update a project’s template¶
A project's template can be updated using:
$ renku template update
If an update is available, this command updates all project files that are not modified locally. Pass the --interactive flag to select which files to keep or overwrite.
Passing the --dry-run flag shows the newest available template version and a list of files that will be updated.
Note
You can specify a template version for a project by passing a --reference when setting it (or when initializing a project). This approach only works for templates from sources other than Renku, because Renku templates are bound to the Renku version. Note that although a reference can be a git tag, branch or commit SHA, it's recommended to use only git tags as references.
Note
A template maintainer can disable updates for a template. In this case, renku template update refuses to update the project. Passing the --force flag causes Renku to update the template anyway.
Note
Renku always preserves the project's Renku version that is set in the Dockerfile, even if you overwrite the Dockerfile. The reason is that the project's metadata is not updated when setting/updating a template, and therefore the project won't work with a different Renku version. To update the Renku version you need to use the renku migrate command.
Validating a template repository¶
If you are developing your own templates in a template repository, there are some rules that templates have to follow. To assist in creating your own templates, you can check that everything is ok with:
$ renku template validate
Running this inside a template repository (not in a Renku project) will check that the manifest and individual templates are correct and follow Renku template conventions, printing warnings or errors if something needs to be changed.
renku clone¶
Clone a Renku project.
Cloning a Renku project¶
To clone a Renku project use the renku clone command. This command is preferred over git clone because it sets up the required Git hooks and enables Git-LFS automatically.
$ renku clone <repository-url> <destination-directory>
It creates a new directory with the same name as the project. You can change the directory name by passing another name on the command line.
By default, renku clone pulls data from Git-LFS after cloning. If you don't need the LFS data, pass the --no-pull-data option to skip this step.
Note
To move a project to another Renku deployment you need to create a new empty project in the target deployment and push both the repository and Git-LFS objects to the new remote. Refer to Git documentation for more details.
$ git lfs fetch --all
$ git remote remove origin
$ git remote add origin <new-repository-url>
$ git push --mirror origin
To clone private repositories with an HTTPS address, you first need to log into a Renku deployment using the renku login command. renku clone will use the stored credentials when available.
renku config¶
Get and set Renku repository or global options.
Set values¶
You can set various Renku configuration options, for example the image registry URL, with a command like:
$ renku config set interactive.default_url "/tree"
By default, configuration is stored locally in the project's directory. Use the --global option to store configuration for all projects in your home directory.
Remove values¶
To remove a specific key from configuration use:
$ renku config remove interactive.default_url
By default, only local configuration is searched for removal. Use the --global option to remove a global configuration value.
Query values¶
You can display all configuration values with:
$ renku config show
[renku "interactive"]
default_url = /lab
Both local and global configuration files are read. Values in local configuration take precedence over global values. Use the --local or --global flag to read only the corresponding configuration.
You can provide a KEY to display only its value:
$ renku config show interactive.default_url
default_url = /lab
Available configuration values¶
The following values are used by renku-python and available for the renku config command:
Name | Description | Default
---|---|---
show_lfs_message | Whether to show messages about files being added to git LFS or not |
lfs_threshold | Threshold file size below which files are not added to git LFS |
data_directory | Path to the data directory (read-only after project creation) |
zenodo.access_token | Access token for Zenodo API |
dataverse.access_token | Access token for Dataverse API |
dataverse.server_url | URL for the Dataverse API server to use |
See the section on renku.ini for more configuration values.
renku project¶
Renku CLI commands for handling of projects.
Showing project metadata¶
You can see the metadata of the current project by using renku project show:
$ renku project show
Id: /projects/john.doe/flights-tutorial
Name: flights-tutorial
Description: Flight tutorial project
Creator: John Doe <John Doe@datascience.ch>
Created: 2021-11-05T10:32:57+01:00
Keywords: keyword1, keyword2
Renku Version: 1.0.0
Project Template: python-minimal (1.0.0)
Editing projects¶
Users can edit some of a project's metadata using the renku project edit command.
The following options can be passed to this command to set various metadata for a project.
Option | Description
---|---
-d, --description | Project's description.
-c, --creator | Creator's name, email, and an optional affiliation. Accepted format is 'Forename Surname <email> [affiliation]'.
-m, --metadata | Path to JSON file containing custom metadata to be added to the project knowledge graph.
renku dataset¶
Renku CLI commands for handling of datasets.
Manipulating datasets¶
Creating an empty dataset inside a Renku project:
$ renku dataset create my-dataset
Creating a dataset ... OK
You can pass the following options to this command to set various metadata for the dataset.
Option | Description
---|---
-t, --title | A human-readable title for the dataset.
-d, --description | Dataset's description.
-c, --creator | Creator's name, email, and an optional affiliation. Accepted format is 'Forename Surname <email> [affiliation]'. Pass multiple times for a list of creators.
-k, --keyword | Dataset's keywords. Pass multiple times for a list of keywords.
-m, --metadata | Path to file containing custom JSON-LD metadata to be added to the dataset.
Editing a dataset’s metadata:
Use the edit sub-command to change the metadata of a dataset. You can edit the same set of metadata as with the create command by passing the options described in the table above.
$ renku dataset edit my-dataset --title 'New title'
Successfully updated: title.
Listing all datasets:
$ renku dataset ls
ID NAME TITLE VERSION
-------- ------------- ------------- ---------
0ad1cb9a some-dataset Some Dataset
9436e36c my-dataset My Dataset
You can select which columns to display by using --columns to pass a comma-separated list of column names:
$ renku dataset ls --columns id,name,date_created,creators
ID NAME CREATED CREATORS
-------- ------------- ------------------- ---------
0ad1cb9a some-dataset 2020-03-19 16:39:46 sam
9436e36c my-dataset 2020-02-28 16:48:09 sam
Displayed results are sorted based on the value of the first column.
You can specify output formats by passing --format with a value of tabular, json-ld or json.
Showing dataset details:
$ renku dataset show some-dataset
Name: some-dataset
Created: 2020-12-09 13:52:06.640778+00:00
Creator(s): John Doe<john.doe@example.com> [SDSC]
Keywords: Dataset, Data
Annotations:
[
{...}
]
Title: Some Dataset
Description:
Just some dataset
You can also show details for a specific tag using the --tag option.
Deleting a dataset:
$ renku dataset rm some-dataset
OK
Working with data¶
Adding data to the dataset:
$ renku dataset add my-dataset http://data-url
This will copy the contents of data-url to the dataset and add it to the dataset metadata.
You can create a dataset when you add data to it for the first time by passing the --create flag to the add command:
$ renku dataset add --create new-dataset http://data-url
To add data from a git repository, you can specify it via https or git+ssh URL schemes. For example,
$ renku dataset add my-dataset git+ssh://host.io/namespace/project.git
Sometimes you want to add just specific paths within the parent project. In this case, use the --source or -s flag:
$ renku dataset add my-dataset --source path/within/repo/to/datafile \
git+ssh://host.io/namespace/project.git
The command above will result in a structure like
data/
my-dataset/
datafile
You can use shell-like wildcards (e.g. *, ?) when specifying paths to be added. Put wildcard patterns in quotes to prevent your shell from expanding them.
$ renku dataset add my-dataset --source 'path/**/datafile' \
git+ssh://host.io/namespace/project.git
You can use the --destination or -d flag to set the location where the new data is copied to. This location will be under the dataset's data directory and will be created if it does not exist.
$ renku dataset add my-dataset \
--source path/within/repo/to/datafile \
--destination new-dir/new-subdir \
git+ssh://host.io/namespace/project.git
will yield:
data/
my-dataset/
new-dir/
new-subdir/
datafile
To add a specific version of files, use the --ref option to select a branch, commit, or tag. The value passed to this option must be a valid reference in the remote Git repository.
Adding external data to the dataset:
Sometimes you might want to add data to your dataset without copying the actual files to your repository. This is useful, for example, when external data is too large to store locally. The external data must exist (i.e. be mounted) on your filesystem. Renku creates a symbolic link to your data, and you can use this symbolic link in renku commands like a normal file. To add an external file, pass --external or -e when adding local data to a dataset:
$ renku dataset add my-dataset -e /path/to/external/file
Updating a dataset:
After adding files from a remote Git repository or importing a dataset from a provider like Dataverse or Zenodo, you can check for updates in those files by using the renku dataset update --all command. For Git repositories, this command checks all remote files and copies over new content if there is any. It does not delete files from the local dataset if they are deleted from the remote Git repository; to force the delete use the --delete argument. You can update to a specific branch, commit, or tag by passing the --ref option.
For datasets from providers like Dataverse or Zenodo, the whole dataset is updated to ensure consistency between the remote and local versions. Due to this limitation, the --include and --exclude flags are not compatible with those datasets. Moreover, deleted remote files are automatically deleted without requiring the --delete argument. Modifying those datasets locally will prevent them from being updated.
The update command also checks for file changes in the project and updates datasets’ metadata accordingly.
You can limit the scope of updated files by specifying dataset names, using --include and --exclude to filter based on file names, or using --creators to filter based on creators. For example, the following command updates only CSV files from my-dataset:
$ renku dataset update -I '*.csv' my-dataset
Note that putting glob patterns in quotes is needed to tell the Unix shell not to expand them.
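The quoting rule can be seen with plain shell commands (no renku involved; the file names below are made up for the demonstration):

```shell
# Unquoted patterns are expanded by the shell before the command runs;
# quoted patterns reach the command verbatim.
mkdir -p glob-demo && cd glob-demo && touch a.csv b.csv
echo *.csv      # the shell expands this to: a.csv b.csv
echo '*.csv'    # quoted: the literal pattern *.csv is passed through
```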
External data is also updated automatically. Since external files require a checksum calculation, which can take a long time when data is large, you can exclude them from an update by passing the --no-external flag to the update command:
$ renku dataset update --all --no-external
You can use the --dry-run flag to get a preview of which files/datasets will be updated by an update operation.
Tagging a dataset:
A dataset can be tagged with an arbitrary tag to refer to the dataset at that point in time. A tag can be added like this:
$ renku dataset tag my-dataset 1.0 -d "Version 1.0 tag"
A list of all tags can be seen by running:
$ renku dataset ls-tags my-dataset
CREATED NAME DESCRIPTION DATASET COMMIT
------------------- ------ --------------- ---------- ----------------
2020-09-19 17:29:13 1.0 Version 1.0 tag my-dataset 6c19a8d31545b...
A tag can be removed with:
$ renku dataset rm-tags my-dataset 1.0
Importing data from other Renku projects:
To import all data files and their metadata from another Renku dataset use:
$ renku dataset import \
https://renkulab.io/projects/<username>/<project>/datasets/<dataset-id>
or
$ renku dataset import \
https://renkulab.io/projects/<username>/<project>/datasets/<dataset-name>
or
$ renku dataset import \
https://renkulab.io/datasets/<dataset-id>
You can get the link to a dataset from the UI, or you can construct it if you know the dataset's ID.
By default, Renku imports the latest version of a dataset from the other project. If you want to import another version, pass the dataset version’s tag to the import command:
$ renku dataset import \
https://renkulab.io/datasets/<dataset-id> --tag <version>
Importing data from an external provider:
$ renku dataset import 10.5281/zenodo.3352150
This will import the dataset with the DOI (Digital Object Identifier) 10.5281/zenodo.3352150 and make it locally available.
Dataverse and Zenodo are supported, with DOIs (e.g. 10.5281/zenodo.3352150 or doi:10.5281/zenodo.3352150) and full URLs (e.g. http://zenodo.org/record/3352150). A tag with the remote version of the dataset is automatically created.
Exporting data to an external provider:
$ renku dataset export my-dataset zenodo
This will export the dataset my-dataset to zenodo.org as a draft, allowing for publication later on. If the dataset has any tags set, you can choose whether the repository HEAD version or one of the tags should be exported. The remote version will be set to the local tag that is being exported.
To export to a Dataverse provider you must pass the Dataverse server's URL and the name of the parent dataverse where the dataset will be exported to. The server's URL is stored in your Renku settings, so you don't need to pass it every time.
To export a dataset to OLOS you must pass the OLOS server's base URL and supply your access token when prompted for it. You must also choose which organizational unit to export the dataset to from the list shown during the export. The export does not map contributors from Renku to OLOS and also doesn't map license information. Additionally, all file categories default to Primary/Derived; this has to be adjusted manually in the OLOS interface after the export is done.
Listing all files in the project associated with a dataset:
$ renku dataset ls-files
DATASET NAME ADDED PATH LFS
------------------- ------------------- ----------------------------- ----
my-dataset 2020-02-28 16:48:09 data/my-dataset/add-me *
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file1 *
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file2
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file3 *
You can select which columns to display by using --columns to pass a comma-separated list of column names:
$ renku dataset ls-files --columns name,creators,path
DATASET NAME CREATORS PATH
------------------- --------- -----------------------------
my-dataset sam data/my-dataset/add-me
my-dataset sam data/my-dataset/weather/file1
my-dataset sam data/my-dataset/weather/file2
my-dataset sam data/my-dataset/weather/file3
Displayed results are sorted based on the value of the first column.
You can specify output formats by passing --format with a value of tabular, json-ld or json.
Sometimes you want to filter the files. For this, use the --dataset, --include and --exclude flags:
$ renku dataset ls-files --include "file*" --exclude "file3"
DATASET NAME ADDED PATH LFS
------------------- ------------------- ----------------------------- ----
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file1 *
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file2 *
Unlink a file from a dataset:
$ renku dataset unlink my-dataset --include file1
OK
Unlink all files within a directory from a dataset:
$ renku dataset unlink my-dataset --include "weather/*"
OK
Unlink all files from a dataset:
$ renku dataset unlink my-dataset
Warning: You are about to remove following from "my-dataset" dataset.
.../my-dataset/weather/file1
.../my-dataset/weather/file2
.../my-dataset/weather/file3
Do you wish to continue? [y/N]:
Note
The unlink command does not delete files, only the dataset record.
renku gc¶
Free up disk space by removing temporary files and caches in a Renku project.
renku graph¶
Renku CLI commands for handling of Knowledge Graph data.
Exporting Knowledge Graph data¶
You can export part or all of the Renku Knowledge Graph metadata for the current project using the renku graph export command.
By default, this will export the metadata created in the last commit in the project. If that commit was not a renku command that creates metadata, it will produce no output.
$ renku dataset create my-dataset
OK
$ renku graph export
[
{
"@id": "https://localhost/datasets/850e74d6c0204e8c923457a1b9ce52d8",
"@type": [
"http://schema.org/Dataset",
"http://www.w3.org/ns/prov#Entity"
],
[... many more lines ...]
}
]
Here we created a new dataset and then renku graph export exported the created metadata as JSON-LD, the default format.
If you want the Knowledge Graph data for the whole project, you can use renku graph export --full. Alternatively, you can get data for a single commit by using renku graph export --revision <git commit sha> or by specifying a range of commits like renku graph export --revision sha1..sha2.
renku graph export currently supports various formats for export, such as json-ld, rdf, nt (for triples) and dot (for GraphViz graphs), which can be specified using the --format option. For instance,
$ renku graph export --full --format dot | dot -Tpng -o my_graph.png
would produce a PNG image of the whole Knowledge Graph for the project.
To run validation on the generated output, you can pass the --strict option, which will check that all the nodes and properties in the graph are correct and that nothing is missing.
renku run¶
Track provenance of data created by executing programs.
Capture command line execution¶
Tracking execution of your command line script is done by simply adding the renku run command before the actual command. This will enable detection of:
- arguments (flags),
- string and integer options,
- input files or directories if linked to existing paths in the repository,
- output files or directories if modified or created while running the command.
It will create a Plan (Workflow Template) that can be reused and a Run, which is a record of a past workflow execution for provenance purposes. Refer to the renku workflow documentation for more details on this distinction.
Basic usage is:
$ renku run --name <plan name> -- <console command>
Note
If there are uncommitted changes in the repository, the renku run command fails. See git status for details.
Warning
If the executed command/script has arguments similar to those of renku run (e.g. --input), they will be treated as renku run arguments. To avoid this, put a -- separator between renku run and the command/script.
Warning
Input and output paths can only be detected if they are passed as arguments to renku run.
Warning
Circular dependencies are not supported for renku run. See Circular Dependencies for more details.
Warning
When using output redirection in renku run on Windows (with > file or 2> file), all Renku errors and messages are redirected as well, and renku run produces no output on the terminal. On Linux, this is detected by renku and only the output of the command being run is actually redirected. Renku-specific messages such as errors get printed to the terminal as usual and don't get redirected.
Detecting input paths¶
Any path passed as an argument to renku run which was not changed during the execution is identified as an input path. The identification only works if the path associated with the argument matches an existing file or directory in the repository.
The detection might not work as expected if:
- a file is modified during the execution (in this case it will be stored as an output);
- a path is not passed as an argument to renku run.
Specifying auxiliary inputs (--input)
You can specify extra inputs to your program explicitly by using the --input option. This is useful for specifying hidden dependencies that don't appear on the command line. Explicit inputs must exist before execution of the renku run command. This option is not a replacement for the arguments that are passed on the command line. Files or directories specified with this option will not be passed as input arguments to the script.
You can specify --input name=path or just --input path, the former of which also sets the name of the input on the resulting Plan.
Specifying auxiliary parameters (--param)
You can specify extra parameters to your program explicitly by using the --param option. This is useful for getting Renku to consider a parameter as just a string, even if it matches a file name in the project. This option is not a replacement for the arguments that are passed on the command line.
You can specify --param name=value or just --param value, the former of which also sets the name of the parameter on the resulting Plan.
Disabling input detection (--no-input-detection)
Input path detection can be disabled by passing the --no-input-detection flag to renku run. In this case, only the directories/files that are passed as explicit inputs are considered to be file inputs. Those passed via command arguments are ignored unless they are in the explicit inputs list. This only affects files and directories; command options and flags are still treated as inputs.
Note
renku run prints the generated plan after execution if you pass --verbose to it. You can check the generated plan to verify that the execution was done as you intended. The plan will always be printed to stderr, even if output is redirected to a file.
Detecting output paths¶
Any path modified or created during the execution will be added as an output.
Because the output path detection is based on the Git repository state after the execution of the renku run command, it is good to have a basic understanding of the underlying principles and limitations of tracking files in Git.
Git tracks not only the paths in a repository, but also the content stored in those paths. Therefore:
- a recreated file with the same content is not considered an output file, but instead is kept as an input;
- file moves are detected based on their content and can cause problems;
- directories cannot be empty.
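The first point can be demonstrated with plain git (no renku involved; the file name is arbitrary): deleting a file and recreating it with identical content leaves the working tree clean, so nothing is flagged as changed.

```shell
# Recreating a file with identical content leaves a clean working tree,
# which is why such a file is not detected as a new output.
git init -q content-demo && cd content-demo
git config user.email demo@example.com
git config user.name "Demo User"
echo "hello" > a.txt
git add a.txt && git commit -qm "add a.txt"
rm a.txt && echo "hello" > a.txt
git status --porcelain   # prints nothing: content is unchanged
```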
Note
When in doubt whether the outputs will be detected, remove all outputs using git rm <path> followed by git commit before running the renku run command.
Command does not produce any files (--no-output)
If the program does not produce any outputs, the execution ends with an error:
Error: There are not any detected outputs in the repository.
You can specify the --no-output option to force tracking of such an execution.
Specifying outputs explicitly (--output)
You can specify expected outputs of your program explicitly by using the --output option. These outputs must exist after the execution of the renku run command. However, they do not need to be modified by the command.
You can specify --output name=path or just --output path, the former of which also sets the name of the output on the resulting Plan.
Disabling output detection (--no-output-detection)
Output path detection can be disabled by passing the --no-output-detection flag to renku run. When disabled, only the directories/files that are passed as explicit outputs are considered to be outputs, and those passed via command arguments are ignored.
Detecting standard streams¶
Often a program expects input on the standard input stream. This is detected and recorded in the tool specification when invoked as renku run cat < A.
Similarly, both redirects to standard output and standard error output can be done when invoking a command:
$ renku run grep "test" B > C 2> D
Warning
Detecting inputs and outputs from pipes | is not supported.
Specifying inputs and outputs programmatically¶
Sometimes the list of inputs and outputs are not known before execution of the program. For example, a program might accept a date range as input and access all files within that range during its execution.
To address this issue, the program can dump a mapping of the input and output files that it is accessing into inputs.yml and outputs.yml. These YAML files should be of the format
name1: path1
name2: path2
where name is the user-defined name of the input/output and path is the path. When the program is finished, Renku will look for the existence of these two files, add their content to the list of explicit inputs and outputs, and then delete the two files.
By default, Renku looks for these two files in the .renku/tmp directory. You can change this default location by setting the RENKU_INDIRECT_PATH environment variable. When set, it points to a sub-directory within the .renku/tmp directory where inputs.yml and outputs.yml reside.
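As an illustration, a program could report the files it discovers at runtime like this. The helper name and the example paths are hypothetical; only the inputs.yml/outputs.yml locations follow the convention described above:

```python
import os

# Hypothetical program that discovers which files it reads/writes at
# runtime and reports them to Renku via inputs.yml / outputs.yml.
def report_files_to_renku(inputs, outputs):
    """Write the name -> path mappings that Renku picks up after the run."""
    # Renku reads these files from .renku/tmp (or the sub-directory
    # named by RENKU_INDIRECT_PATH) and deletes them afterwards.
    tmp_dir = os.path.join(".renku", "tmp",
                           os.environ.get("RENKU_INDIRECT_PATH", ""))
    os.makedirs(tmp_dir, exist_ok=True)
    for filename, mapping in (("inputs.yml", inputs), ("outputs.yml", outputs)):
        # Plain "name: path" lines are valid YAML for a flat mapping.
        with open(os.path.join(tmp_dir, filename), "w") as f:
            for name, path in mapping.items():
                f.write(f"{name}: {path}\n")

report_files_to_renku(
    inputs={"data-2021": "data/2021/measurements.csv"},
    outputs={"summary": "results/summary.csv"},
)
```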
Exit codes¶
All Unix commands return a number between 0 and 255 which is called an “exit code”. In case other numbers are returned, they are treated modulo 256 (-10 is equivalent to 246, 257 is equivalent to 1). The exit-code 0 represents a success and non-zero exit-code indicates a failure.
Therefore, the command specified after renku run is expected to return exit-code 0. If the command returns a different exit code, you can specify it with the --success-code=<INT> parameter.
$ renku run --success-code=1 --no-output fail
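The modulo-256 behavior of exit codes can be verified with a few lines of Python:

```python
# Exit codes are interpreted modulo 256, so out-of-range values wrap.
def effective_exit_code(code):
    # Python's % operator already yields a value in [0, 255] for
    # a positive modulus, matching the Unix behavior described above.
    return code % 256

print(effective_exit_code(-10))  # 246
print(effective_exit_code(257))  # 1
print(effective_exit_code(0))    # 0, i.e. success
```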
Circular Dependencies¶
Circular dependencies are not supported in renku run
. This means you cannot
use the same file or directory as both an input and an output in the same step,
for instance reading from a file as input and then appending to it is not
allowed. Since renku records all steps of an analysis workflow in a dependency
graph and it allows you to update outputs when an input changes, this would
lead to problems with circular dependencies. An update command would change the
input again, leading to renku seeing it as a changed input, which would run
update again, and so on, without ever stopping.
Due to this, the renku dependency graph has to be acyclic. So instead of appending to an input file or writing an output file to the same directory that was used as an input directory, create new files or write to other directories, respectively.
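As a sketch of why this restriction exists, a cycle in a dependency graph can be detected with a standard depth-first search. This is illustrative only, not Renku's actual implementation:

```python
# Sketch: detect whether a set of workflow dependencies forms a cycle.
# Each edge maps an input file to a file generated from it.
def has_cycle(edges):
    """DFS-based cycle detection over a dependency graph."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: we returned to an ancestor
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in list(graph) if color.get(n, WHITE) == WHITE)

# Reading from a file and appending to it makes it both input and output:
print(has_cycle([("data.txt", "data.txt")]))  # True: rejected by renku
print(has_cycle([("A", "B"), ("B", "C")]))    # False: a valid chain
```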
renku log
¶
Show a history of renku commands.
You can use renku log
to get a history of renku commands.
At the moment, it shows workflow executions and dataset changes.
$ renku log
Activity /activities/be60896d8d984a0bb585e53f7a3146dc
Start Time: 2022-02-03T13:56:27+01:00
End Time: 2022-02-03T13:56:28+01:00
User: John Doe <John.Doe@example.com>
Renku Version: renku 1.0.5
Plan:
Id: /plans/c00826a571a246e79b0a3d77712e6f3b
Name: python-test-fc2ec
Command: python test.py
Inputs:
input-1: test.py
Parameters:
text: hello
Dataset testset
Date: 2022-02-03T11:26:55+01:00
Changes: created
Title set to: testset
Creators modified:
+ John Doe <John.Doe@example.com>
To show only dataset entries, use -d
, to show only workflows, use -w
.
You can select a format using the --format <format>
argument.
renku login
¶
Logging in to a Renku deployment.
You can use renku login
command to authenticate with a remote Renku
deployment. This command will bring up a browser window where you can log in
using your credentials. Renku CLI receives and stores a secure token that will
be used for future authentications.
$ renku login <endpoint>
Parameter endpoint
is the URL of the Renku deployment that you want to
authenticate with (e.g. renkulab.io
). You can either pass this parameter on
the command-line or set it once in project’s configuration:
$ renku config set endpoint <endpoint>
Note
The secure token is stored in plain-text in Renku’s global configuration
file on your home directory (~/.renku/renku.ini
). Renku changes access
rights of this file to be readable only by you. This token exists only on
your system and won’t be pushed to a remote server.
This command also allows you to log into gitlab server for private repositories.
You can use this method instead of creating an SSH key. Passing --git
will
change the repository’s remote URL to an endpoint in the deployment that adds
authentication to gitlab requests.
Note
Project’s remote URL will be changed when using --git
option. Changes
are undone when logging out from renku in the CLI. Original remote URL will
be stored in a remote with name renku-backup-<remote-name>
.
Logging out from Renku removes the secure token from your system:
$ renku logout <endpoint>
If you don’t specify an endpoint when logging out, credentials for all endpoints are removed.
renku status
¶
Show status of data files created in the repository.
Inspecting a repository¶
renku status
command can be used to check if there are output files in
a repository that are outdated and need to be re-generated. Output files get
outdated due to changes in input data or source code (i.e. dependencies).
This command shows a list of output files that need to be updated, along with a list of modified inputs for each file. It also displays deleted input files, if any.
To check specific input or output files, you can pass them to this command:
$ renku status path/to/file1 path/to/file2
In this case, renku only checks whether the specified paths are modified or outdated and need an update, instead of checking all inputs and outputs.
The paths in the output are made relative to the current directory if you are working in a subdirectory (this is intentional, to make cutting and pasting into other commands easier).
renku update
¶
Update outdated files created by the “run” command.
Recreating outdated files¶
The information about the dependencies of each file in a Renku project is stored in the project's metadata.
When an update command is executed, Renku looks at the most recent execution of each workflow (Plan and Activity combination) and checks which ones are outdated (i.e. at least one of their inputs has been modified). It then generates a minimal dependency graph for each outdated file stored in the repository, so that only the necessary steps are executed.
Assume that the following history for the file H
exists.
C---D---E
/ \
A---B---F---G---H
The first example shows situation when D
is modified and files E
and
H
become outdated.
C--*D*--(E)
/ \
A---B---F---G---(H)
** - modified
() - needs update
In this situation, you can effectively do three things:
Update all files
$ renku update --all
Update only E
$ renku update E
Update E and H
$ renku update H
Note
If there are uncommitted changes, the command fails. Check git status to see details.
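The update logic can be sketched as a downstream traversal of the dependency graph. This is illustrative only; the node names follow the example history above:

```python
# Sketch: given dependency edges (input -> output) and a set of modified
# files, find every downstream file that needs to be regenerated.
def outdated(edges, modified):
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    stale, stack = set(), list(modified)
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in stale:
                stale.add(nxt)
                stack.append(nxt)
    return stale

# The history A---B---F---G---H with the branch C---D---E from B to H:
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "H"),
         ("B", "F"), ("F", "G"), ("G", "H")]
print(sorted(outdated(edges, {"D"})))  # ['E', 'H'], as in the example
```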
Pre-update checks¶
In the next example, files A and B are modified, and hence the majority of dependent files must be recreated.
(C)--(D)--(E)
/ \
*A*--*B*--(F)--(G)--(H)
To avoid excessive recreation of a large portion of files that could be affected by a simple change of an input file, consider specifying a single file (e.g. renku update G). See also renku status.
Update siblings¶
If a workflow step produces multiple output files, these outputs will always be updated together.
(B)
/
*A*--[step 1]--(C)
\
(D)
An attempt to update a single file would update its siblings as well.
The following commands will produce the same result.
$ renku update C
$ renku update B C D
Ignoring deleted paths¶
The update command will regenerate any deleted files/directories. If you don't want to regenerate deleted paths, pass --ignore-deleted to the update command. You can make this the default behavior by setting the update_ignore_delete config value for the project or globally:
$ renku config set [--global] update_ignore_delete True
Note that deleted paths will always be regenerated if they have siblings or downstream dependencies that aren't deleted.
renku rerun
¶
Recreate files created by the “run” command.
Recreating files¶
Assume you have run step 2, which uses a stochastic algorithm, so each run will be slightly different. The goal is to regenerate the output C several times to compare the results. In this situation it is not possible to simply call renku update, since the input file A has not been modified after the execution of step 2.
A-[step 1]-B-[step 2*]-C
Recreate a specific output file by running:
$ renku rerun C
If you do not want step 1 to also be rerun, you can specify a starting point
using the --from
parameter:
$ renku rerun --from B C
Note that all other outputs of the executed workflow will be recreated as well. If the output didn’t change, it will be removed from git and re-added to ensure that the re-execution is properly tracked.
renku rm
¶
Remove a file, a directory, or a symlink.
Removing a file that belongs to a dataset will update its metadata. It will also attempt to update tracking information for files stored in an external storage (using Git LFS).
renku mv
¶
Move or rename a file, a directory, or a symlink.
Moving a file that belongs to a dataset will update its metadata to include its new path and commit. Moreover, tracking information in an external storage (e.g. Git LFS) will be updated. The move operation fails if a destination already exists in the repo; use the --force flag to overwrite it.
If you want to move files to another dataset use --to-dataset
along with
destination’s dataset name. This removes source paths from all datasets’
metadata that include them (if any) and adds them to the destination’s dataset
metadata.
The following command moves data/src and README to the data/dst directory and adds them to target-dataset's metadata. If the source files belong to one or more datasets, they will be removed from those datasets' metadata.
$ renku mv data/src README data/dst --to-dataset target-dataset
renku workflow
¶
Manage the set of CWL files created by renku
commands.
Runs and Plans¶
Renku records two different kinds of metadata when a workflow is executed,
Run
and Plan
.
Plans describe a recipe for a command. They function as a template that
can be used directly or combined with other workflow templates to create more
complex recipes.
These Plans can be run in various ways, on creation with renku run
,
doing a renku rerun
or renku update
or manually using renku workflow
execute
.
Each time a Plan
is run, we track that instance of it as a Run
.
Runs track workflow execution through time. They track which Plan was
run, at what time, with which specific values. This gives an insight into what
were the steps taken in a repository, how they were taken and what results they
produced.
The renku workflow group of commands contains most of the commands used to interact with Plans and Runs.
Working with Plans¶
Listing Plans¶
$ renku workflow ls
ID NAME
--------------------------------------- ---------------
/plans/11a3702184394b93ac422df760e40999 cp-B-C-ca4da
/plans/96642cac86d9435e8abce2384f8618b9 cat-A-C-fa017
/plans/96c70626575c41c5a13853b070eaaaf5 my-other-run
/plans/9a0961844fcc46e1816fde00f57e24a8 my-run
Each entry corresponds to a recorded Plan/workflow template. You can also
show additional columns using the --columns
parameter, which takes any
combination of values from id
, name
, keywords
and description
.
Showing Plan Details¶
You can see the details of a plan by using renku workflow show
:
$ renku workflow show my-run
Id: /plans/9a0961844fcc46e1816fde00f57e24a8
Name: my-run
Command: cp A B
Success Codes:
Inputs:
- input-1:
Default Value: A
Position: 1
Outputs:
- output-2:
Default Value: B
Position: 2
This shows the unique Id of the Plan, its name, the full command of the Plan
if it was run without any modifications (more on that later), which exit codes
should be considered successful executions (defaults to 0
) as well as its
inputs, outputs and parameters.
Executing Plans¶
Plans can be executed using renku workflow execute
. They can be run as-is
or their parameters can be modified as needed. Renku has a plugin architecture
to allow execution using various execution backends.
$ renku workflow execute --provider cwltool --set input-1=file.txt my-run
Parameters can be set using the --set keyword or by specifying them in a values YAML file and passing it with --values. When passing a file, the YAML should follow this structure:
learning_rate: 0.9
dataset_input: dataset.csv
chart_output: mychart.png
myworkflow:
lr: 0.8
lookuptable: lookup.xml
myotherworkflow:
language: en
In addition to being passed on the command line and being available to
renku.ui.api.*
classes in Python scripts, parameters are also set as
environment variables when executing the command, in the form of
RENKU_ENV_<parameter name>
.
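For example, a script executed by the workflow could read a parameter such as learning_rate from its environment variable. This is a sketch with a local fallback; the variable is only set when the command runs via renku:

```python
import os

# Sketch: read a workflow parameter from the environment variable that
# renku sets for it (RENKU_ENV_<parameter name>), with a fallback for
# running the script outside of a renku execution.
def get_parameter(name, default=None):
    return os.environ.get(f"RENKU_ENV_{name}", default)

# Falls back to the default when not running under renku:
learning_rate = float(get_parameter("learning_rate", "0.9"))
print(learning_rate)  # 0.9 outside a renku execution
```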
Provider-specific settings can be passed as a file using the --config parameter.
Iterate Plans¶
To execute a Plan with different parametrizations, use renku workflow iterate. This sub-command performs a 'grid search'-like execution of a Plan, with parameter sets provided by the user.
$ renku workflow iterate --map parameter-1=[1,2,3] --map parameter-2=[10,20] my-run
The set of possible values for a parameter can be given with the --map command line argument or by specifying them in a values YAML file and passing it with --mapping. The content of the mapping file for the above example should be:
parameter-1: [1,2,3]
parameter-2: [10,20]
By default, renku workflow iterate will execute all combinations of the given parameters' possible values. Sometimes, instead of all combinations, only specific tuples of values should be executed. This can be done by marking the parameters that should be bound together with an @tag suffix in their names.
$ renku workflow iterate --map parameter-1@tag1=[1,2,3] --map parameter-2@tag1=[10,5,30] my-run
This will result in only three distinct executions of the my-run Plan, with the following parameter combinations: [(1,10), (2,5), (3,30)]. It is important to note that parameters with the same tag must have the same number of possible values, i.e. their value lists must have the same length.
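The difference between untagged and tagged parameters can be sketched with Python's itertools: untagged parameters are combined as a Cartesian product, while parameters sharing a tag are paired position-wise (illustrative only, not renku's implementation):

```python
from itertools import product

# Untagged parameters: every combination of values is executed
# (as with --map parameter-1=[1,2,3] --map parameter-2=[10,20]).
untagged = list(product([1, 2, 3], [10, 20]))
print(len(untagged))  # 6 executions
print(untagged)       # [(1, 10), (1, 20), (2, 10), (2, 20), (3, 10), (3, 20)]

# Tagged parameters (e.g. parameter-1@tag1, parameter-2@tag1): values
# are paired position-wise, so the value lists must have equal length.
tagged = list(zip([1, 2, 3], [10, 5, 30]))
print(tagged)         # [(1, 10), (2, 5), (3, 30)] -> only 3 executions
```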
There’s a special template variable for parameter values {iter_index}
, which
can be used to mark each iteration’s index in a value of a parameter. The template
variable is going to be substituted with the iteration index (0, 1, 2, …).
$ renku workflow iterate --map parameter-1=[10,20,30] --map output=output_{iter_index}.txt my-run
This would execute my-run three times with parameter-1 values 10, 20 and 30, producing the output files output_0.txt, output_1.txt and output_2.txt, in this order.
Exporting Plans¶
You can export a Plan to a number of different workflow languages, such as CWL
(Common Workflow Language) by using renku workflow export
:
$ renku workflow export --format cwl my-run
baseCommand:
- cp
class: CommandLineTool
cwlVersion: v1.0
id: 63e3a2a8-5b40-49b2-a2f4-eecc37bc76b0
inputs:
- default: B
id: _plans_9a0961844fcc46e1816fde00f57e24a8_outputs_2_arg
inputBinding:
position: 2
type: string
- default:
class: File
location: file:///home/user/my-project/A
id: _plans_9a0961844fcc46e1816fde00f57e24a8_inputs_1
inputBinding:
position: 1
type: File
- default:
class: Directory
location: file:///home/user/my-project/.renku
id: input_renku_metadata
type: Directory
- default:
class: Directory
location: file:///home/user/my-project/.git
id: input_git_directory
type: Directory
outputs:
- id: _plans_9a0961844fcc46e1816fde00f57e24a8_outputs_2
outputBinding:
glob: $(inputs._plans_9a0961844fcc46e1816fde00f57e24a8_outputs_2_arg)
type: File
requirements:
InitialWorkDirRequirement:
listing:
- entry: $(inputs._plans_9a0961844fcc46e1816fde00f57e24a8_inputs_1)
entryname: A
writable: false
- entry: $(inputs.input_renku_metadata)
entryname: .renku
writable: false
- entry: $(inputs.input_git_directory)
entryname: .git
writable: false
You can export into a file directly with -o <path>
.
Composing Plans into larger workflows¶
For more complex workflows consisting of several steps, you can use the
renku workflow compose
command. This creates a new workflow that has
substeps.
The basic usage is:
$ renku run --name step1 -- cp input intermediate
$ renku run --name step2 -- cp intermediate output
$ renku workflow compose my-composed-workflow step1 step2
This would create a new workflow called my-composed-workflow
that
consists of step1
and step2
as steps. This new workflow is just
like any other workflow in renku in that it can be executed, exported
or composed with other workflows.
Workflows can also be composed based on past Runs and their
inputs/outputs, using the --from
and --to
parameters. This finds
chains of Runs from inputs to outputs and then adds them to the
composed plan, applying mappings (see below) where appropriate to make
sure the correct values for execution are used in the composite. This
also means that all the parameters in the used plans are exposed on the
composed plan directly.
In the example above, this would be:
$ renku workflow compose --from input --to output my-composed-workflow
You can expose parameters of child steps on the parent workflow using
--map
/-m
arguments followed by a mapping expression. Mapping expressions
take the form of <name>=<expression>
where name
is the name of the
property to be created on the parent workflow and expression points to one
or more fields on the child steps that should be mapped to this property.
The expressions come in two flavors, absolute references using the names
of workflows and properties, and relative references specifying the
position within a workflow.
An absolute expression in the example above could be step1.my_dataset
to refer to the input, output or argument named my_dataset
on the step
step1
. A relative expression could be @step2.@output1
to refer
to the first output of the second step of the composed workflow.
Valid relative expressions are @input<n>
, @output<n>
and @param<n>
for the nth input, output or argument of a step, respectively. For referring
to steps inside a composed workflow, you can use @step<n>
. For referencing
a mapping on a composed workflow, you can use @mapping<n>
. Of course, the
names of the objects for all these cases also work.
The expressions can also be combined using ,
if a mapping should point
to more than one parameter of a child step.
You can mix absolute and relative reference in the same expression, as you see fit.
A full example of this would be:
$ renku workflow compose --map input_file=step1.@input2 --map output_file=@step1.my-output,@step2.step2s_output my-composed-workflow step1 step2
This would create a mapping called input_file
on the parent workflow that
points to the second input of step1
and a mapping called output_file
that points to both the output my-output
on step1
and
step2s_output
on step2
.
You can also set default values for mappings, which override the default values
of the parameters they’re pointing to by using the --set
/-s
parameter, for
instance:
$ renku workflow compose --map input_file=step1.@input2 --set input_file=data.csv
my-composed-workflow step1 step2
This would lead to data.csv
being used for the second input of
step1
when my-composed-workflow
is executed (if it isn’t overridden
at execution time).
You can add a description to the mappings to make them more human-readable
by using the --describe-param
/-p
parameter, as shown here:
$ renku workflow compose --map input_file=step1.@input2 -p input_file="The dataset to process"
my-composed-workflow step1 step2
You can also expose all inputs, outputs or parameters of child steps by
using --map-inputs
, --map-outputs
or --map-params
, respectively.
On execution, renku will automatically detect links between steps, if an input of one step uses the same path as an output of another step, and execute them in the correct order. Since this depends on what values are passed at runtime, you might want to enforce a certain order of steps by explicitly mapping outputs to inputs.
You can do that using the --link <source>=<sink>
parameters, e.g.
--link step1.@output1=step2.@input1
. This gets recorded on the
workflow template and forces step2.@input1
to always be set to the same
path as step1.@output1
, irrespective of which values are passed at
execution time.
This way, you can ensure that the steps in your workflow are always executed in the correct order and that the dependencies between steps are modeled correctly.
Renku can also add links for you automatically based on the default values
of inputs and outputs, where inputs/outputs that have the same path get
linked in the composed run. To do this, pass the --link-all
flag.
Warning
Due to workflows having to be directed acyclic graphs, cycles in the dependencies are not allowed. E.g. step1 depending on step2 depending on step1 is not allowed. Additionally, the flow of information has to be from outputs to inputs or parameters, so you cannot map an input to an output, only the other way around.
Values on inputs/outputs/parameters get set according to the following order of precedence (lower precedence first):
Default value on a input/output/parameter
Default value on a mapping to the input/output/parameter
Value passed to a mapping to the input/output/parameter
Value passed to the input/output/parameter
Value propagated to an input from the source of a workflow link
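This precedence order can be sketched as picking the highest-precedence value that is actually set (illustrative only, not Renku's implementation):

```python
# Sketch: pick the effective value for an input/output/parameter.
# Candidates are ordered from lowest to highest precedence; the last
# one that is actually set (not None) wins.
def resolve(plan_default, mapping_default=None, mapping_value=None,
            direct_value=None, link_value=None):
    value = plan_default
    for candidate in (mapping_default, mapping_value,
                      direct_value, link_value):
        if candidate is not None:
            value = candidate
    return value

print(resolve("A"))                           # 'A' (plan default)
print(resolve("A", mapping_default="B"))      # 'B' (mapping default wins)
print(resolve("A", mapping_default="B",
              direct_value="C"))              # 'C' (explicit value wins)
```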
Editing Plans¶
Plans can be edited in some limited fashion, but we do not allow structural changes, as that might cause issues with the reproducibility and provenance of the project. If you want to do structural changes (e.g. adding/removing parameters), we recommend you record a new plan instead.
You can change the name and description of Plans and of their parameters, as
well as changing default values of the parameters using the renku workflow
edit
command:
$ renku workflow edit my-run --name new-run --description "my description"
--rename-param input-1=my-input --set my-input=other-file.txt
--describe-param my-input="My input parameter" my-run
This would rename the Plan my-run
to new-run
, change its description,
rename its parameter input-1
to my-input
and set the default of this
parameter to other-file.txt
and set its description.
Option | Description
---|---
--name | Plan's name.
--description | Plan's description.
--set | Set default value for a parameter. Accepted format is '<name>=<value>'.
--map | Add a new mapping on the Plan. Accepted format is '<name>=<name or expression>'.
--rename-param | Rename a parameter. Accepted format is '<name>="new name"'.
--describe-param | Add a description for a parameter. Accepted format is '<name>="description"'.
--metadata | Path to file containing custom JSON-LD metadata to be added to the Plan.
Removing Plans¶
Sometimes you might want to discard a recorded Plan or reuse its name with a
new Plan. In these cases, you can delete the old plan using renku workflow
remove <plan name>
. Once a Plan is removed, it doesn’t show up in most renku
workflow commands.
renku update
ignores deleted Plans, but renku rerun
will still rerun
them if needed, to ensure reproducibility.
Working with Runs¶
Listing Runs¶
To get a view of what commands have been executed in the project, you can use
the renku log --workflows
command:
$ renku log --workflows
DATE TYPE DESCRIPTION
------------------- ---- -------------
2021-09-21 15:46:02 Run cp A C
2021-09-21 10:52:51 Run cp A B
Refer to the documentation of the renku log command for more details.
Visualizing Executions¶
You can visualize past Runs made with renku using the renku workflow
visualize
command.
This will show a directed graph of executions and how they are connected. This
way you can see exactly how a file was generated and what steps it involved.
It also supports an interactive mode that lets you explore the graph in a more
detailed way.
$ renku run echo "input" > input
$ renku run cp input intermediate
$ renku run cp intermediate output
$ renku workflow visualize
╔════════════╗
║echo > input║
╚════════════╝
*
*
*
┌─────┐
│input│
└─────┘
*
*
*
╔═════════════════════╗
║cp input intermediate║
╚═════════════════════╝
*
*
*
┌────────────┐
│intermediate│
└────────────┘
*
*
*
╔══════════════════════╗
║cp intermediate output║
╚══════════════════════╝
*
*
*
┌──────┐
│output│
└──────┘
$ renku workflow visualize intermediate
╔════════════╗
║echo > input║
╚════════════╝
*
*
*
┌─────┐
│input│
└─────┘
*
*
*
╔═════════════════════╗
║cp input intermediate║
╚═════════════════════╝
*
*
*
┌────────────┐
│intermediate│
└────────────┘
$ renku workflow visualize --from intermediate
┌────────────┐
│intermediate│
└────────────┘
*
*
*
╔══════════════════════╗
║cp intermediate output║
╚══════════════════════╝
*
*
*
┌──────┐
│output│
└──────┘
You can also run in interactive mode using the --interactive
flag.
$ renku workflow visualize --interactive
This will allow you to navigate between workflow executions and see details by pressing the <Enter> key.
Use renku workflow visualize -h
to see all available options.
Input and output files¶
You can list the input and output files tracked in the repository by running the renku workflow inputs and renku workflow outputs commands. Alternatively, you can check whether the paths specified as arguments are input or output files, respectively.
$ renku run wc < source.txt > result.wc
$ renku workflow inputs
source.txt
$ renku workflow outputs
result.wc
$ renku workflow outputs source.txt
$ echo $? # last command finished with an error code
1
renku save
¶
Convenience method to save local changes and push them to a remote server.
If you have local modifications to files, you can save them using
$ renku save
Username for 'https://renkulab.io': my.user
Password for 'https://my.user@renkulab.io':
Successfully saved:
file1
file2
OK
Warning
The username and password for renku save are your gitlab user/password, not your renkulab login!
You can additionally supply a message that describes the changes that you
made by using the -m
or --message
parameter followed by your
message.
$ renku save -m "Updated file1 and 2."
Successfully saved:
file1
file2
OK
If no remote server has been configured, you can specify one by using the
-d
or --destination
parameter. Otherwise you will get an error.
$ renku save
Error: No remote has been set up for the current branch
$ renku save -d https://renkulab.io/gitlab/my.user/my-project.git
Successfully saved:
file1
file2
OK
You can also specify which paths to save:
$ renku save file1
Successfully saved:
file1
OK
renku storage
¶
Manage an external storage.
Pulling files from git LFS¶
LFS works by checking small pointer files into git and saving the actual contents of a file in LFS. If, instead of your file's content, you see something like the following, it means the file is stored in Git LFS and its contents are not currently available locally (they have not been pulled):
version https://git-lfs.github.com/spec/v1
oid sha256:42b5c7fb2acd54f6d3cd930f18fee3bdcb20598764ca93bdfb38d7989c054bcf
size 12
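Such a pointer file can be recognized by its first line; a minimal sketch (not part of the renku CLI):

```python
# Sketch: check whether a local file is a Git LFS pointer rather than
# the real content. Pointer files begin with the LFS spec version line.
def is_lfs_pointer(path):
    try:
        with open(path, "rb") as f:
            first_line = f.readline(100)
    except OSError:
        return False
    return first_line.startswith(b"version https://git-lfs.github.com/spec/")
```

A file in this state still needs its content pulled (see below) before it can be used.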
You can manually pull contents of file(s) you want with:
$ renku storage pull file1 file2
Removing local content of files stored in git LFS¶
If you want to restore a file back to its pointer file state, for instance to free up space locally, you can run:
$ renku storage clean file1 file2
This removes any data cached locally for files tracked in Git LFS.
Migrate large files to git LFS¶
If you accidentally checked a large file into git or are moving a non-LFS renku repo to git LFS, you can use the following command to migrate the files to LFS:
$ renku storage migrate --all
This will move all files that are not excluded by .renkulfsignore into git LFS.
Note
Recent versions of Git LFS don't support filtering files based on their size. Therefore, Renku ignores the lfs_threshold config value when migrating files to LFS using this command.
To only migrate specific files, you can also pass their paths to the command like:
$ renku storage migrate big_file other_big_file
renku doctor
¶
Check your system and repository for potential problems.
renku mergetool
¶
Custom git merge tool for renku metadata.
Support merging Renku metadata¶
Renku stores all of its metadata in the repository, in compressed form. When working in multiple branches with Renku, this metadata needs to be merged when a git merge is made. To support users when doing this, Renku provides a custom merge tool that takes care of merging the metadata.
The merge tool is set up automatically when creating a new project or when using renku clone
to clone a Renku
project.
You can manually set up the merge tool by running renku mergetool install
.
renku migrate
¶
Migrate project to the latest Renku version.
When the way Renku stores metadata changes or there are other changes to the
project structure or data that are needed for Renku to work, renku migrate
can be used to bring the project up to date with the current version of Renku.
This does not usually affect how you use Renku and no data is lost.
In addition, renku migrate will update your Dockerfile to install the latest version of renku-python, if supported, making sure your renku version is up to date in interactive environments as well.
If you created your repository from a project template and the template has changed since you created the project, it will also update files with their newest version from the template, without overwriting local changes if there are any.
You can check if a migration is necessary and what migrations are available by running
$ renku migrate -c
renku rollback
¶
Rollback project to a previous point in time.
If you want to undo actions taken using Renku in a project, you can use the
renku rollback
command to do so.
This command shows a list of all actions done by renku and lets you pick one that you want to return to, discarding any changes made in the repo (by Renku or manually) after that point. Once you pick a checkpoint to return to, the command shows all files and Renku objects that would be affected by the rollback and how they would be affected. If you confirm, the project will be reset to that point in time, with anything done after that point being deleted/lost.
$ renku rollback
Select a checkpoint to roll back to:
[0] 2021-10-20 09:50:04 renku workflow edit cp-blabla-asdasf-0b535 --name test
[1] 2021-10-20 09:49:19 renku rerun asdasf
[2] 2021-10-20 09:48:59 renku run cp blabla asdasf
[3] 2021-10-20 08:37:00 renku dataset add e blabla
[4] 2021-10-20 08:31:16 renku dataset create m
Checkpoint ([q] to quit) [q]: 4
The following changes would be done:
Metadata:
Modified ♻️:
Dataset: e
Removed 🔥:
Plan: cp-blabla-asdasf-0b535
Plan: test
Run: /activities/cc3ab70952fc499e93e7e4075a076bf5 (Plan name: cp-blabla-asdasf-0b535)
Run: /activities/48b89b22567d4282abe8a016fa91878f (Plan name: cp-blabla-asdasf-0b535)
Files:
Restored ↻:
blabla
Removed 🔥:
asdasf
Proceed? [y/N]: y
Note
This command was introduced in renku-python version 1.0.0. Commands executed with previous versions of renku can’t be rolled back to.
renku service
¶
Commands to launch service components.
renku githooks
¶
Install and uninstall Git hooks.
Prevent modifications of output files¶
The commit hooks are enabled by default to prevent situations where an output file is manually modified.
$ renku init
$ renku run echo hello > greeting.txt
$ edit greeting.txt
$ git commit greeting.txt
You are trying to update some output files.
Modified outputs:
greeting.txt
If you are sure, use "git commit --no-verify".
renku session
¶
Manage interactive sessions.
Interactive sessions can be started via the command line interface with different
providers. Currently two providers are supported: docker
and renkulab
.
Docker provider¶
The docker
provider will take the current state of the repository, build a docker
image (if one does not already exist) and then launch a session with this image. In
addition to this the docker
provider will mount the local repository inside
the docker
container so that changes made in the session are immediately reflected
on the host where the session was originally started from.
Please note that in order to use this provider Docker is expected to be installed and available on your computer. In addition, using this command from within a Renku interactive session started from the Renku website is not possible. This command is envisioned as a means for users to quickly test and check their sessions locally without going to a Renku deployment and launching a session there, or in the case where they simply have no access to a Renku deployment.
$ renku session start -p docker
Renkulab provider¶
The renkulab
provider will launch a regular interactive session
in the Renku deployment that hosts the current project. If the project has not
been uploaded/created in a Renku deployment then this provider will not be able
to launch a session. This provider is identical to going through the Renku website
and launching a session “manually” by selecting the project, commit, branch, etc.
Please note that there are a few limitations with the renkulab provider:
- If the user is not logged in (using the renku login command), then sessions can only be launched if the specific Renku deployment supports anonymous sessions.
- When launching anonymous sessions, local changes cannot be reflected in them, and changes made inside the session cannot be saved or downloaded locally. This feature should be used only for ad hoc exploration or work that can be discarded when the session is closed. The CLI prints a warning every time an anonymous session is launched.
- Changes made inside the interactive session are not immediately reflected locally; users should git pull any changes made inside an interactive session to get the same changes locally.
- Local changes can only be reflected in the interactive session if they are committed and pushed to the git repository. When launching a session with uncommitted or unpushed changes present, the user will be prompted to confirm whether Renku should commit and push the changes before the session is launched. The session will launch only if the changes are committed and pushed.
$ renku session start -p renkulab
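Putting the last two limitations together, a typical round trip between the local clone and a renkulab session looks like this (a sketch; the commit message is a placeholder):
```shell
$ git add -A && git commit -m "wip" && git push   # make local changes visible to the session
$ renku session start -p renkulab
$ git pull                                        # afterwards: fetch changes saved inside the session
```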
Managing active sessions¶
The session command can also be used to list, stop, and open active sessions.
In order to see active sessions (from any provider) run the following command:
$ renku session ls
ID STATUS URL
------------------- -------- -------------------------------------------------
renku-test-e4fe76cc running https://dev.renku.ch/sessions/renku-test-e4fe76cc
An active session can be opened by using its ID from the list above. For example, the open command below will open the single active session in the browser.
$ renku session open renku-test-e4fe76cc
An active session can be stopped by using the stop command and the ID from the list of active sessions.
$ renku session stop renku-test-e4fe76cc
The command renku session stop --all will stop all active sessions regardless of the provider.
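For example, to stop every active session at once:
```shell
$ renku session stop --all
```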
Error Tracking¶
Renku is not bug-free, and you can help us find the bugs.
GitHub¶
You can quickly open an issue on GitHub with a traceback and minimal system information when you hit an unhandled exception in the CLI.
Ahhhhhhhh! You have found a bug. 🐞
1. Open an issue by typing "open";
2. Print human-readable information by typing "print";
3. See the full traceback without submitting details (default: "ignore").
Please select an action by typing its name (open, print, ignore) [ignore]:
Sentry¶
When using renku as a hosted service, the Sentry integration can be enabled to help developers iterate faster by showing them where bugs happen, how often, and who is affected.
- Install Sentry-SDK with python -m pip install sentry-sdk;
- Set the environment variables SENTRY_ENABLED=true and SENTRY_DSN=https://<key>@sentry.<domain>/<project>;
- Set the environment variable SENTRY_SAMPLE_RATE=0.2. This tracks 20% of all requests in Sentry performance monitoring. Set it to 0 to disable.
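These variables can be exported in the shell before running renku; SENTRY_ENABLED is assumed here as the name of the on/off switch, and the DSN value is a placeholder to fill in from your Sentry project settings:
```shell
export SENTRY_ENABLED=true
export SENTRY_DSN=https://<key>@sentry.<domain>/<project>
export SENTRY_SAMPLE_RATE=0.2   # trace 20% of requests; 0 disables
```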
Warning
User information might be sent to help resolve the problem. If you are not using your own Sentry instance, you should inform users that you are sending possibly sensitive information to a third-party service.