Renku Python API

The following sections describe the Renku Python API. If you work with the R programming language, you can also use this API through the reticulate package. For more information, visit our dedicated tutorial.

Activity

Renku API Activity.

Activity represents executed workflows in a Renku project. You can get a list of all activities in a project by calling its list method:

from renku.api import Activity

activities = Activity.list()

The Activity class provides a static filter method that returns a subset of activities. It can filter activities based on their inputs, outputs, parameter names, and parameter values. For each of these fields you can pass a literal value, a list of values, or a function predicate:

from numbers import Number
from renku.api import Activity

# Return activities that use ``path/to/an/input``
Activity.filter(inputs="path/to/an/input")

# Return activities that use ``input-1`` or ``input-2`` AND generate
# output files whose names start with ``data-``
Activity.filter(inputs=["input-1", "input-2"], outputs=lambda path: path.startswith("data-"))

# Return activities that use values between ``0.5`` and ``1.5`` for the
# parameter ``lr``
Activity.filter(parameters="lr", values=lambda value: isinstance(value, Number) and 0.5 <= value <= 1.5)
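
The three kinds of criteria that filter accepts (a literal, a list of values, or a predicate) can be illustrated with a small standalone sketch. The matches helper and the sample paths below are hypothetical and are not part of renku.api; they only show one way such criteria might be evaluated.

```python
from typing import Any


def matches(criterion: Any, value: Any) -> bool:
    """Return True if ``value`` satisfies a literal, list, or predicate criterion.

    Hypothetical helper for illustration only; not Renku's implementation.
    """
    if callable(criterion):
        # Function predicate: the caller decides what counts as a match.
        return bool(criterion(value))
    if isinstance(criterion, (list, tuple, set, frozenset)):
        # Collection of values: any member may match.
        return value in criterion
    # Literal value: require equality.
    return value == criterion


# Made-up sample paths used only to exercise the helper.
paths = ["input-1", "data-train.csv", "other.txt"]

literal_hits = [p for p in paths if matches("input-1", p)]
list_hits = [p for p in paths if matches(["input-1", "input-2"], p)]
predicate_hits = [p for p in paths if matches(lambda path: path.startswith("data-"), p)]
```

Checking for a callable first keeps the three cases unambiguous, since neither a string literal nor a list is callable.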

Dataset

Renku API Dataset.

The Dataset class allows listing datasets and files inside a Renku project and accessing their metadata.

To get a list of available datasets in a Renku project, use the list method:

from renku.api import Dataset

datasets = Dataset.list()

You can then access the metadata of a dataset, such as its name, title, and keywords. To get the list of files inside a dataset, use the files property:

for dataset in datasets:
    for dataset_file in dataset.files:
        print(dataset_file.path)

Inputs, Outputs, and Parameters

Renku API Workflow Models.

The Input and Output classes can be used to define the inputs and outputs of a script within the script itself. Paths defined with these classes are added to the explicit inputs and outputs in the workflow’s metadata. For example, the following marks data/data.csv as an input of the script named my-input:

from renku.api import Input

with open(Input("my-input", "data/data.csv")) as input_data:
    for line in input_data:
        print(line)
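
Passing an object straight to open, as above, works only if the object can be converted to a file system path. In Python this is typically done by implementing os.PathLike. The sketch below shows that mechanism with a hypothetical TrackedPath class; it is an assumption about how such a wrapper could behave, not Renku's actual implementation.

```python
import os
import tempfile


class TrackedPath(os.PathLike):
    """Hypothetical stand-in for a named path; not part of renku.api."""

    def __init__(self, name: str, path: str) -> None:
        self.name = name
        self.path = path

    def __fspath__(self) -> str:
        # open() and os.fspath() call this to obtain the real path.
        return self.path


with tempfile.TemporaryDirectory() as tmp:
    # Create a small throwaway file so the example is self-contained.
    csv_path = os.path.join(tmp, "data.csv")
    with open(csv_path, "w") as handle:
        handle.write("a,b\n1,2\n")

    tracked = TrackedPath("my-input", csv_path)
    with open(tracked) as input_data:
        lines = [line.strip() for line in input_data]
```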

Users can track parameter values in a workflow by defining them with the Parameter function:

from renku.api import Parameter

nc = Parameter(name="n_components", value=10)

print(nc.value)  # 10

Once a Parameter is tracked like this, its value can be overridden with the --set option in commands such as renku workflow execute.
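
The idea of a default declared in the script that an externally supplied setting can replace can be sketched without Renku. The resolve_parameter helper and the overrides mapping are hypothetical; they stand in for whatever mechanism delivers the overriding values to the script and do not describe how Renku implements it.

```python
from typing import Any, Mapping


def resolve_parameter(name: str, default: Any, overrides: Mapping[str, Any]) -> Any:
    """Return the override for ``name`` if one was supplied, else the default.

    Hypothetical helper; ``overrides`` stands in for externally supplied values.
    """
    return overrides.get(name, default)


# No override supplied: the value declared in the script is used.
n_components = resolve_parameter("n_components", 10, overrides={})

# An override supplied from outside replaces the declared value.
n_components_overridden = resolve_parameter(
    "n_components", 10, overrides={"n_components": 20}
)
```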

Plan, CompositePlan

Renku API Plan.

The Plan and CompositePlan classes represent Renku workflow plans executed in a project. Each class has a static list method that returns all active plans or composite plans in the project:

from renku.api import CompositePlan, Plan

plans = Plan.list()

composite_plans = CompositePlan.list()

Project

Renku API Project.

The Project class acts as a context for other Renku entities such as Dataset, Input, and Output. It gives these entities access to the internals of a Renku project.

Normally, you do not need to create an instance of the Project class directly unless you want to access the project's metadata (e.g. its path) or get its status. To separate the parts of your script that use Renku entities, you can create a Project context manager and interact with Renku inside it:

from renku.api import Project, Input

with Project():
    input_1 = Input("input_1", "path_1")
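
How a context manager can serve as an ambient context for objects created inside it can be sketched as follows. The ProjectContext and TrackedInput classes are hypothetical illustrations of the pattern; they are not Renku's implementation and make no claim about its internals.

```python
from typing import List, Optional


class ProjectContext:
    """Hypothetical context that collects entities created inside it."""

    # Stack of active contexts; the last element is the innermost one.
    _stack: List["ProjectContext"] = []

    def __init__(self) -> None:
        self.entities: List["TrackedInput"] = []

    def __enter__(self) -> "ProjectContext":
        ProjectContext._stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        ProjectContext._stack.pop()

    @classmethod
    def current(cls) -> Optional["ProjectContext"]:
        """Return the innermost active context, or None if there is none."""
        return cls._stack[-1] if cls._stack else None


class TrackedInput:
    """Hypothetical entity that registers itself with the active context."""

    def __init__(self, name: str, path: str) -> None:
        self.name = name
        self.path = path
        context = ProjectContext.current()
        if context is not None:
            context.entities.append(self)


with ProjectContext() as project:
    input_1 = TrackedInput("input_1", "path_1")

# Created with no active context, so it is not registered anywhere.
outside = TrackedInput("input_2", "path_2")
```

Keeping the active contexts on a stack lets nested with blocks restore the outer context on exit.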

You can use the status method of Project to get information about outdated outputs and activities, and about modified or deleted inputs:

from renku.api import Project

outdated_generations, outdated_activities, modified_inputs, deleted_inputs = Project().status()

RDF Graph

Renku RDF Graph API.

The RDFGraph class allows for the quick creation of a searchable graph object based on the project’s metadata.

To create the graph and query it:

from rdflib import URIRef
from renku.ui.api import RDFGraph

g = RDFGraph()
# get a list of contributors to the project
list(g.subjects(object=URIRef("http://schema.org/Person")))

For more information on querying the graph, see the RDFLib documentation.