Core Business Logic

renku.core contains the business logic of renku-python. Functionality is split into subfolders based on topic, such as dataset or workflow.

Command Builder

Most renku commands require context (database/git/etc.) to be set up for them. The command builder pattern makes this easy by wrapping commands in factory methods.

Renku Command Builder.

class renku.command.command_builder.Command[source]

Base renku command builder.

__init__ of Command.

add_injection_pre_hook(order, hook)[source]

Add a pre-execution hook for dependency injection.

Parameters:
  • order (int) – Determines the order of executed hooks, lower numbers get executed first.

  • hook (Callable) – The hook to add.

add_post_hook(order, hook)[source]

Add a post-execution hook.

Parameters:
  • order (int) – Determines the order of executed hooks, higher numbers get executed first.

  • hook (Callable) – The hook to add.

add_pre_hook(order, hook)[source]

Add a pre-execution hook.

Parameters:
  • order (int) – Determines the order of executed hooks, lower numbers get executed first.

  • hook (Callable) – The hook to add.

any_builder_is_instance_of(cls)[source]

Check if any ‘chained’ command builder is an instance of a specific command builder class.

build()[source]

Build (finalize) the command.

Returns:

Finalized command that cannot be modified.

Return type:

Command

command(operation)[source]

Set the wrapped command.

Parameters:

operation (Callable) – The function to wrap in the command builder.

Returns:

This command.

Return type:

Command
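The builder idea behind command() and build() can be illustrated with a minimal, self-contained sketch. SketchCommand is a hypothetical stand-in written for this example, not renku's Command class; it only mimics command(), build(), finalized and execute(). Requiring build() before execute() is an assumption of the sketch.

```python
from typing import Any, Callable, Optional


class SketchCommand:
    """Hypothetical stand-in for a command builder (not renku's implementation)."""

    def __init__(self) -> None:
        self._operation: Optional[Callable[..., Any]] = None
        self._finalized = False

    @property
    def finalized(self) -> bool:
        """Whether build() has already been called."""
        return self._finalized

    def command(self, operation: Callable[..., Any]) -> "SketchCommand":
        """Set the wrapped operation and return the builder for chaining."""
        if self._finalized:
            raise RuntimeError("Cannot modify a finalized command.")
        self._operation = operation
        return self

    def build(self) -> "SketchCommand":
        """Finalize the builder; an operation must have been set first."""
        if self._operation is None:
            raise RuntimeError("No operation was set.")
        self._finalized = True
        return self

    def execute(self, *args: Any, **kwargs: Any) -> Any:
        """Run the wrapped operation (assumption: only allowed after build())."""
        if not self._finalized or self._operation is None:
            raise RuntimeError("Call build() before execute().")
        return self._operation(*args, **kwargs)


def add(a: int, b: int) -> int:
    """Toy operation wrapped by the builder."""
    return a + b


# Builder methods return the builder itself, so calls can be chained.
built = SketchCommand().command(add).build()
result = built.execute(2, 3)
```

Returning the builder from each method is what allows the chained style; once build() has run, any further call to command() raises in this sketch.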

execute(*args, **kwargs)[source]

Execute the wrapped operation.

First executes pre_hooks in ascending order, passing a read/write context between them. It then calls the wrapped operation. The result of the operation is then passed to all the post_hooks, in descending order. Finally, it returns the result, or the error if there was one.

Returns:

Result of execution of command.

Return type:

CommandResult
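The hook ordering described for execute() can be sketched as follows. HookRunner is a hypothetical class written for this illustration, not part of renku; it models only the ordering (pre-hooks ascending, post-hooks descending) and leaves out error handling and CommandResult.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class HookRunner:
    """Hypothetical illustration of ordered pre- and post-hooks."""

    def __init__(self, operation: Callable[..., Any]) -> None:
        self._operation = operation
        self._pre_hooks: DefaultDict[int, List[Callable]] = defaultdict(list)
        self._post_hooks: DefaultDict[int, List[Callable]] = defaultdict(list)

    def add_pre_hook(self, order: int, hook: Callable[[Dict[str, Any]], None]) -> None:
        self._pre_hooks[order].append(hook)

    def add_post_hook(self, order: int, hook: Callable[[Any], None]) -> None:
        self._post_hooks[order].append(hook)

    def execute(self, *args: Any, **kwargs: Any) -> Any:
        context: Dict[str, Any] = {}  # read/write context shared by pre-hooks
        for order in sorted(self._pre_hooks):  # lower numbers first
            for hook in self._pre_hooks[order]:
                hook(context)
        result = self._operation(*args, **kwargs)
        for order in sorted(self._post_hooks, reverse=True):  # higher numbers first
            for hook in self._post_hooks[order]:
                hook(result)
        return result


calls: List[str] = []


def operation() -> str:
    calls.append("operation")
    return "done"


runner = HookRunner(operation)
runner.add_pre_hook(2, lambda ctx: calls.append("pre-2"))
runner.add_pre_hook(1, lambda ctx: calls.append("pre-1"))
runner.add_post_hook(1, lambda res: calls.append("post-1"))
runner.add_post_hook(2, lambda res: calls.append("post-2"))
outcome = runner.execute()
# calls is now: pre-1, pre-2, operation, post-2, post-1
```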

property finalized

Whether this builder is still being constructed or has been finalized.

lock_dataset()[source]

Acquire a lock for a dataset.

lock_project()[source]

Acquire a lock for the whole project.

require_clean()[source]

Check that the repository is clean.

require_migration()[source]

Check if a migration is needed.

track_std_streams()[source]

Whether to track STD streams or not.

Returns:

This command.

Return type:

Command

property will_write_to_database

Will running the command write anything to the metadata store.

with_commit(message=None, commit_if_empty=False, raise_if_empty=False, commit_only=None, skip_staging=False, skip_dirty_checks=False)[source]

Create a commit.

Parameters:
  • message (str, optional) – The commit message. Auto-generated if left empty (Default value = None).

  • commit_if_empty (bool, optional) – Whether to commit if there are no modified files (Default value = False).

  • raise_if_empty (bool, optional) – Whether to raise an exception if there are no modified files (Default value = False).

  • commit_only (bool, optional) – Only commit the supplied paths (Default value = None).

  • skip_staging (bool) – Don’t commit staged files.

  • skip_dirty_checks (bool) – Don’t check if paths are dirty or staged.

with_communicator(communicator)[source]

Create a communicator.

Parameters:

communicator (CommunicationCallback) – Communicator to use for writing to user.

with_database(write=False, path=None, create=False)[source]

Provide an object database connection.

Parameters:
  • write (bool, optional) – Whether or not to persist changes to the database (Default value = False).

  • path (str, optional) – Location of the database (Default value = None).

  • create (bool, optional) – Whether the database should be created if it doesn’t exist (Default value = False).

with_git_isolation()[source]

Whether to run in git isolation or not.

working_directory(directory)[source]

Set the working directory for the command.

WARNING: Should not be used in the core service.

Parameters:

directory (str) – The working directory to work in.

Returns:

This command.

Return type:

Command

JSON-LD Schemes

Schema classes used to serialize domain models to JSON-LD.

Activity JSON-LD schema.

class renku.command.schema.activity.ActivitySchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Activity schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Activity

removes_ms(objs, many, **kwargs)[source]

Remove milliseconds from datetimes.

Note: since DateField uses strftime as format, which only supports timezone info without a colon e.g. +0100 instead of +01:00, we have to deal with milliseconds manually instead of using a format string.
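The strftime limitation mentioned in the note can be reproduced with the standard library alone. The snippet below is only an illustration of the formatting difference; how removes_ms itself is implemented is not shown in this reference.

```python
from datetime import datetime, timedelta, timezone

# A timestamp with a sub-second part and a +01:00 offset.
moment = datetime(2023, 1, 2, 3, 4, 5, 678000, tzinfo=timezone(timedelta(hours=1)))

# %z renders the offset without a colon.
with_strftime = moment.strftime("%Y-%m-%dT%H:%M:%S%z")

# Dropping the sub-second part by hand and using isoformat() keeps the colon.
without_ms = moment.replace(microsecond=0).isoformat()

print(with_strftime)  # 2023-01-02T03:04:05+0100
print(without_ms)     # 2023-01-02T03:04:05+01:00
```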

class renku.command.schema.activity.AssociationSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Association schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Association

class renku.command.schema.activity.GenerationSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Generation schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Generation

class renku.command.schema.activity.ParameterValueSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

ParameterValue schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of ParameterValue

class renku.command.schema.activity.UsageSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Usage schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Usage

class renku.command.schema.activity.WorkflowFileActivityCollectionSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

WorkflowFileActivityCollection schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of WorkflowFileActivityCollection

Agents JSON-LD schemes.

class renku.command.schema.agent.PersonSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Person schema.

class Meta[source]

Bases: object

Meta class.

model

alias of Person

fix_affiliation(data, **kwargs)[source]

Fix affiliation to be a string.

class renku.command.schema.agent.SoftwareAgentSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

SoftwareAgent schema.

class Meta[source]

Bases: object

Meta class.

model

alias of SoftwareAgent

Annotation JSON-LD schema.

class renku.command.schema.annotation.AnnotationSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Annotation schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Annotation

Classes for integration with Calamus.

class renku.command.schema.calamus.DateTimeList(*args, **kwargs)[source]

Bases: DateTime

A DateTime field that might be a list when deserializing.

Create an instance.

class renku.command.schema.calamus.JsonLDSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Base schema class for Renku.

Create an instance.

class renku.command.schema.calamus.Nested(*args, **kwargs)[source]

Bases: Nested

Nested field that passes along commit info.

Init method.

property schema

The nested calamus.Schema object.

This method was copied from marshmallow and modified to support multiple different nested schemes.

class renku.command.schema.calamus.StringList(*args, **kwargs)[source]

Bases: String

A String field that might be a list when deserializing.

Create an instance.

class renku.command.schema.calamus.Uri(*args, **kwargs)[source]

Bases: _JsonLDField, String, Dict

A Dict/String field.

Create an instance.

Represent a group of run templates.

class renku.command.schema.composite_plan.CompositePlanSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Plan schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of CompositePlan

removes_ms(objs, many, **kwargs)[source]

Remove milliseconds from datetimes.

Note: since DateField uses strftime as format, which only supports timezone info without a colon e.g. +0100 instead of +01:00, we have to deal with milliseconds manually instead of using a format string.

Datasets JSON-LD schemes.

class renku.command.schema.dataset.DatasetFileSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

DatasetFile schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of DatasetFile

removes_ms(objs, many, **kwargs)[source]

Remove milliseconds from datetimes.

Note: since DateField uses strftime as format, which only supports timezone info without a colon e.g. +0100 instead of +01:00, we have to deal with milliseconds manually instead of using a format string.

class renku.command.schema.dataset.DatasetSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Dataset schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Dataset

removes_ms(objs, many, **kwargs)[source]

Remove milliseconds from datetimes.

Note: since DateField uses strftime as format, which only supports timezone info without a colon e.g. +0100 instead of +01:00, we have to deal with milliseconds manually instead of using a format string.

class renku.command.schema.dataset.DatasetTagSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

DatasetTag schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of DatasetTag

removes_ms(objs, many, **kwargs)[source]

Remove milliseconds from datetimes.

Note: since DateField uses strftime as format, which only supports timezone info without a colon e.g. +0100 instead of +01:00, we have to deal with milliseconds manually instead of using a format string.

class renku.command.schema.dataset.ImageObjectSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

ImageObject schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of ImageObject

class renku.command.schema.dataset.LanguageSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Language schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Language

class renku.command.schema.dataset.RemoteEntitySchema(*args, **kwargs)[source]

Bases: JsonLDSchema

RemoteEntity schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of RemoteEntity

class renku.command.schema.dataset.UrlSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Url schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Url

renku.command.schema.dataset.dump_dataset_as_jsonld(dataset)[source]

Return JSON-LD representation of a dataset.

Parameters:

dataset (Dataset) – The dataset to convert.

Returns:

JSON-LD data of dataset.

Return type:

dict

Entities JSON-LD schemes.

class renku.command.schema.entity.CollectionSchema(*args, **kwargs)[source]

Bases: EntitySchema

Entity Schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Collection

class renku.command.schema.entity.EntitySchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Entity Schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Entity

Parameters JSON-LD schemes.

class renku.command.schema.parameter.CommandInputSchema(*args, **kwargs)[source]

Bases: CommandParameterBaseSchema

CommandInput schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of CommandInput

class renku.command.schema.parameter.CommandOutputSchema(*args, **kwargs)[source]

Bases: CommandParameterBaseSchema

CommandOutput schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of CommandOutput

class renku.command.schema.parameter.CommandParameterBaseSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

CommandParameterBase schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of CommandParameterBase

class renku.command.schema.parameter.CommandParameterSchema(*args, **kwargs)[source]

Bases: CommandParameterBaseSchema

CommandParameter schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of CommandParameter

class renku.command.schema.parameter.MappedIOStreamSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

MappedIOStream schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of MappedIOStream

class renku.command.schema.parameter.ParameterLinkSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

ParameterLink schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of ParameterLink

class renku.command.schema.parameter.ParameterMappingSchema(*args, **kwargs)[source]

Bases: CommandParameterBaseSchema

ParameterMapping schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of ParameterMapping

Represent run templates.

class renku.command.schema.plan.PlanSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Plan schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Plan

removes_ms(objs, many, **kwargs)[source]

Remove milliseconds from datetimes.

Note: since DateField uses strftime as format, which only supports timezone info without a colon e.g. +0100 instead of +01:00, we have to deal with milliseconds manually instead of using a format string.

Project JSON-LD schema.

class renku.command.schema.project.ProjectSchema(*args, **kwargs)[source]

Bases: JsonLDSchema

Project Schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of Project

removes_ms(objs, many, **kwargs)[source]

Remove milliseconds from datetimes.

Note: since DateField uses strftime as format, which only supports timezone info without a colon e.g. +0100 instead of +01:00, we have to deal with milliseconds manually instead of using a format string.

Represent workflow file run templates.

class renku.command.schema.workflow_file.WorkflowFileCompositePlanSchema(*args, **kwargs)[source]

Bases: CompositePlanSchema

Plan schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of WorkflowFileCompositePlan

fix_ids(objs, many, **kwargs)[source]

Renku up to 2.4.1 had a bug that created wrong ids for workflow file entities; this fixes those on export.

class renku.command.schema.workflow_file.WorkflowFilePlanSchema(*args, **kwargs)[source]

Bases: PlanSchema

WorkflowFilePlan schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of WorkflowFilePlan

fix_ids(objs, many, **kwargs)[source]

Renku up to 2.4.1 had a bug that created wrong ids for workflow file entities; this fixes those on export.

Datasets

Dataset business logic.

renku.core.dataset.dataset.add_datadir_files_to_dataset(dataset)[source]

Add all files in a dataset’s data directory to the dataset.

Parameters:

dataset (Dataset) – The dataset to add data dir files to.

renku.core.dataset.dataset.create_dataset(name, title=None, description=None, creators=None, keywords=None, images=None, update_provenance=True, custom_metadata=None, storage=None, datadir=None)[source]

Create a dataset.

Parameters:
  • name (str) – Name of the dataset

  • title (Optional[str], optional) – Dataset title (Default value = None).

  • description (Optional[str], optional) – Dataset description (Default value = None).

  • creators (Optional[List[Person]], optional) – Dataset creators (Default value = None).

  • keywords (Optional[List[str]], optional) – Dataset keywords (Default value = None).

  • images (Optional[List[ImageRequestModel]], optional) – Dataset images (Default value = None).

  • update_provenance (bool, optional) – Whether to add this dataset to dataset provenance (Default value = True).

  • custom_metadata (Optional[Dict[str, Any]], optional) – Custom JSON-LD metadata (Default value = None).

  • storage (Optional[str], optional) – Backend storage’s URI (Default value = None).

  • datadir (Optional[Path]) – Dataset’s data directory (Default value = None).

Returns:

The created dataset.

Return type:

Dataset

renku.core.dataset.dataset.download_file(file, storage)[source]

Download a dataset file and retrieve its missing metadata (if any).

Parameters:
  • file (DatasetFile) – Dataset file to download.

  • storage – Dataset’s cloud storage (an instance of IStorage).

Returns:

A list with the updated file if its metadata was missing; an empty list otherwise.

Return type:

List[DatasetFile]

renku.core.dataset.dataset.edit_dataset(name, title, description, creators, keywords=<object object>, images=<object object>, custom_metadata=<object object>, custom_metadata_source=<object object>)[source]

Edit dataset metadata.

Parameters:
  • name (str) – Name of the dataset to edit

  • title (Optional[Union[str, NoValueType]]) – New title for the dataset.

  • description (Optional[Union[str, NoValueType]]) – New description for the dataset.

  • creators (Optional[Union[List[Person], NoValueType]]) – New creators for the dataset.

  • keywords (Optional[Union[List[str], NoValueType]]) – New keywords for dataset (Default value = NO_VALUE).

  • images (Optional[Union[List[ImageRequestModel], NoValueType]]) – New images for dataset (Default value = NO_VALUE).

  • custom_metadata (Optional[Union[Dict, List[Dict], NoValueType]]) – Custom JSON-LD metadata (Default value = NO_VALUE).

  • custom_metadata_source (Optional[Union[str, NoValueType]]) – The custom metadata source (Default value = NO_VALUE).

Returns:

True if updates were performed.

Return type:

bool
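The NO_VALUE defaults in the signature above follow the sentinel pattern: a dedicated object distinguishes "argument not passed" from an explicit None. The sketch below redefines NoValueType and NO_VALUE locally for illustration; edit_keywords is a hypothetical helper, and treating None as "clear the value" is an assumption of this sketch rather than documented behaviour.

```python
from typing import List, Optional, Union


class NoValueType:
    """Type of the sentinel that marks an argument as 'not provided'."""

    def __repr__(self) -> str:
        return "NO_VALUE"


NO_VALUE = NoValueType()


def edit_keywords(
    current: List[str],
    keywords: Union[Optional[List[str]], NoValueType] = NO_VALUE,
) -> List[str]:
    """Hypothetical helper: return the keywords after an edit."""
    if isinstance(keywords, NoValueType):
        return current  # argument omitted: keep the existing value
    if keywords is None:
        return []  # explicit None: clear the value (assumption)
    return list(keywords)  # any other value replaces the existing one


unchanged = edit_keywords(["a"])
cleared = edit_keywords(["a"], None)
replaced = edit_keywords(["a"], ["b", "c"])
```

A plain None default could not express all three cases, which is why a separate sentinel object is used.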

renku.core.dataset.dataset.export_dataset(name, provider_name, tag, **kwargs)[source]

Export data to 3rd party provider.

Parameters:
  • name (str) – Name of dataset to export.

  • provider_name (str) – Provider to use for export.

  • tag (str) – Dataset tag from which to export.

Remove matching files from a dataset.

Parameters:
  • name (str) – Dataset name.

  • include (Optional[List[str]]) – Include filter for files (Default value = None).

  • exclude (Optional[List[str]]) – Exclude filter for files (Default value = None).

  • yes (bool) – Whether to skip user confirmation or not (Default value = False).

  • dataset_files (Optional[List[DatasetFile]]) – Files to remove; ignore include and exclude if passed (Default value = None).

Returns:

List of files that were removed.

Return type:

List[DynamicProxy]

renku.core.dataset.dataset.filter_dataset_files(dataset_gateway, names=None, tag=None, creators=None, include=None, exclude=None, ignore=None, immutable=False, check_data_directory=False)[source]

Filter dataset files by specified filters.

Parameters:
  • dataset_gateway (IDatasetGateway) – Injected dataset gateway.

  • names (Optional[List[str]]) – Filter by specified dataset names (Default value = None).

  • tag (Optional[str]) – Filter by specified tag (Default value = None).

  • creators (Optional[Union[str, List[str], Tuple[str]]]) – Filter by creators (Default value = None).

  • include (Optional[List[str]]) – Patterns that files must match to be included in the result (Default value = None).

  • exclude (Optional[List[str]]) – Patterns that cause matching files to be excluded from the result (Default value = None).

  • ignore (Optional[List[str]]) – Ignored datasets (Default value = None).

  • immutable (bool) – Return immutable copies of dataset objects (Default value = False).

  • check_data_directory (bool) – Whether to check for new files in dataset’s data directory that aren’t in the dataset yet (Default value = False).

Returns:

List of filtered files sorted by date added.

Return type:

List[DynamicProxy]
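Include and exclude filters of this kind can be sketched with glob-style matching from the standard library. filter_paths is a hypothetical helper, and the use of fnmatch-style patterns is an assumption; the matching rules renku actually applies are not specified in this reference.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def filter_paths(
    paths: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep paths matching any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        # With include patterns given, a path must match at least one of them.
        if include and not any(fnmatchcase(path, p) for p in include):
            continue
        # A path matching any exclude pattern is dropped.
        if exclude and any(fnmatchcase(path, p) for p in exclude):
            continue
        selected.append(path)
    return selected


paths = ["data/a.csv", "data/b.txt", "data/raw/c.csv"]
kept = filter_paths(paths, include=["*.csv"], exclude=["data/raw/*"])
# kept == ["data/a.csv"]
```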

renku.core.dataset.dataset.import_dataset(uri, name='', extract=False, yes=False, datadir=None, previous_dataset=None, delete=False, gitlab_token=None, **kwargs)[source]

Import data from a 3rd party provider or another renku project.

Parameters:
  • uri (str) – DOI or URL of dataset to import.

  • name (str) – Name to give imported dataset (Default value = “”).

  • extract (bool) – Whether to extract compressed dataset data (Default value = False).

  • yes (bool) – Whether to skip user confirmation (Default value = False).

  • datadir (Optional[Path]) – Dataset’s data directory (Default value = None).

  • previous_dataset (Optional[Dataset]) – Previously imported dataset version (Default value = None).

  • delete (bool) – Whether to delete files that don’t exist anymore (Default value = False).

  • gitlab_token (Optional[str]) – Gitlab OAuth2 token (Default value = None).

renku.core.dataset.dataset.list_dataset_files(datasets=None, tag=None, creators=None, include=None, exclude=None)[source]

List dataset files.

Parameters:
  • datasets (Optional[List[str]]) – Datasets to list files for (Default value = None).

  • tag (str) – Tag to filter by (Default value = None).

  • creators (Optional[Union[str, List[str], Tuple[str]]]) – Creators to filter by (Default value = None).

  • include (Optional[List[str]]) – Include filters for file paths (Default value = None).

  • exclude (Optional[List[str]]) – Exclude filters for file paths (Default value = None).

Returns:

Filtered dataset files.

Return type:

List[DynamicProxy]

renku.core.dataset.dataset.list_datasets()[source]

List all datasets.

renku.core.dataset.dataset.mount_cloud_storage(name, existing, yes)[source]

Mount a cloud storage to a dataset’s data directory.

Parameters:
  • name (str) – Name of the dataset

  • existing (Optional[Path]) – An existing mount point to use instead of actually mounting the backend storage.

  • yes (bool) – Don’t prompt when removing non-empty dataset’s data directory.

renku.core.dataset.dataset.move_files(dataset_gateway, files, to_dataset_name=None)[source]

Move files and their metadata from one or more datasets to a target dataset.

Parameters:
  • dataset_gateway (IDatasetGateway) – Injected dataset gateway.

  • files (Dict[Path, Path]) – Files to move

  • to_dataset_name (Optional[str], optional) – Target dataset (Default value = None)

renku.core.dataset.dataset.pull_cloud_storage(name, location=None)[source]

Pull/copy data from a cloud storage to a dataset’s data directory or a specified location.

Parameters:
  • name (str) – Name of the dataset

  • location (Optional[Path]) – A directory to copy data to (Default value = None).

renku.core.dataset.dataset.read_dataset_data_location(dataset)[source]

Read data location for a dataset in the config file.

renku.core.dataset.dataset.remove_dataset(name)[source]

Delete a dataset.

Parameters:

name (str) – Name of dataset to delete.

renku.core.dataset.dataset.search_datasets(name)[source]

Get all the datasets whose name starts with the given string.

Parameters:

name (str) – Beginning of dataset name to search for.

Returns:

List of found dataset names.

Return type:

List[str]
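The prefix search described above amounts to a starts-with filter over dataset names. search_names below is a hypothetical, self-contained illustration that works on a plain list instead of a renku project.

```python
from typing import Iterable, List


def search_names(names: Iterable[str], prefix: str) -> List[str]:
    """Return all names that start with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


found = search_names(["flights", "flights-2019", "weather"], "fli")
# found == ["flights", "flights-2019"]
```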

renku.core.dataset.dataset.set_dataset_images(dataset, images)[source]

Set a dataset’s images.

Parameters:
Returns:

True if images were set/modified.

renku.core.dataset.dataset.show_dataset(name, tag=None)[source]

Show detailed dataset information.

Parameters:
  • name (str) – Name of dataset to show details for.

  • tag (str, optional) – Tags for which to get the metadata (Default value = None).

Returns:

JSON dictionary of dataset details.

Return type:

dict

renku.core.dataset.dataset.store_dataset_data_location(dataset, location)[source]

Store data location for a dataset in the config file.

renku.core.dataset.dataset.unmount_cloud_storage(name)[source]

Unmount a cloud storage from a dataset’s data directory.

Parameters:

name (str) – Name of the dataset

renku.core.dataset.dataset.update_dataset_custom_metadata(dataset, custom_metadata, custom_metadata_source)[source]

Update custom metadata on a dataset.

Parameters:
  • dataset (Dataset) – The dataset to update.

  • custom_metadata (Dict) – Custom JSON-LD metadata to set.

  • custom_metadata_source (str) – The source field for the custom metadata.

renku.core.dataset.dataset.update_datasets(names, creators, include, exclude, ref, delete, no_local, no_remote, check_data_directory, update_all, dry_run, plain, dataset_gateway)[source]

Update dataset files.

Parameters:
  • names (List[str]) – Names of datasets to update.

  • creators (Optional[str]) – Creators to filter dataset files by.

  • include (Optional[List[str]]) – Include filter for paths to update.

  • exclude (Optional[List[str]]) – Exclude filter for paths to update.

  • ref (Optional[str]) – Git reference to use for update.

  • delete (bool) – Whether to delete files that don’t exist on remote anymore.

  • no_local (bool) – Whether to exclude local files from the update.

  • no_remote (bool) – Whether to exclude remote files from the update.

  • check_data_directory (bool) – Whether to check the dataset’s data directory for new files.

  • update_all (bool) – Whether to update all datasets.

  • dry_run (bool) – Whether to return a preview of what would be updated.

  • plain (bool) – Whether plain output should be produced.

  • dataset_gateway (IDatasetGateway) – Injected dataset gateway.

renku.core.dataset.dataset.update_linked_files(records, dry_run)[source]

Update files linked to other files in the project.

Parameters:
  • records (List[DynamicProxy]) – File records to update.

  • dry_run (bool) – Whether to return a preview of what would be updated.

Dataset add business logic.

renku.core.dataset.dataset_add.add_files_to_repository(dataset, files)[source]

Track files in project’s repository.

renku.core.dataset.dataset_add.add_to_dataset(dataset_name, urls, *, importer=None, force=False, create=False, overwrite=False, sources=None, destination='', revision=None, extract=False, clear_files_before=False, total_size=None, datadir=None, storage=None, **kwargs)[source]

Import the data into the data directory.

renku.core.dataset.dataset_add.copy_file(file, dataset, storage)[source]

Copy/move/link a file to dataset’s data directory.

renku.core.dataset.dataset_add.copy_files_to_dataset(dataset, files)[source]

Copy/Move files into a dataset’s directory.

renku.core.dataset.dataset_add.filter_files(dataset, files, force, overwrite)[source]

Filter ignored and overwritten files.

renku.core.dataset.dataset_add.get_cloud_dataset_from_path(path, dataset_gateway, missing_ok=False)[source]

Check the path against datasets’ storage and return a dataset (if any).

renku.core.dataset.dataset_add.get_dataset_file_path_within_dataset(dataset, entity_path)[source]

Return a dataset file’s path relative to the dataset’s datadir.

NOTE: Dataset files can reside outside the dataset’s datadir.

renku.core.dataset.dataset_add.get_files_metadata(*, urls, importer=None, dataset, destination, extract, revision, sources, force=False, **kwargs)[source]

Process file URLs for adding to a dataset.

renku.core.dataset.dataset_add.get_upload_uri(dataset, entity_path)[source]

Return the remote storage path where a dataset file would be located.

Parameters:
  • dataset (Dataset) – Dataset with a backend storage.

  • entity_path (Union[Path, str]) – Dataset file’s path (entity path); it is relative to the project’s root.

Returns:

URI within remote storage.

Return type:

str

renku.core.dataset.dataset_add.has_cloud_storage(dataset_gateway)[source]

Return whether a project has any dataset with cloud storage whose data directory is mounted or pulled.

renku.core.dataset.dataset_add.update_dataset_metadata(dataset, files, clear_files_before)[source]

Add newly-added files to the dataset’s metadata.

Dataset context managers.

class renku.core.dataset.context.DatasetContext(name, create=False, commit_database=False, creator=None, datadir=None, storage=None)[source]

Bases: object

Dataset context manager for metadata changes.

Datasets Provenance.

class renku.core.dataset.datasets_provenance.DatasetsProvenance[source]

Bases: object

A set of datasets.

add_or_update(dataset, date=None, creator=None)[source]

Add/update a dataset according to its new content.

NOTE: This function always mutates the dataset.

add_tag(dataset, tag)[source]

Add a tag to a dataset.

property datasets

Return an iterator of datasets.

get_all_tags(dataset)[source]

Return the list of all tags for a dataset.

get_by_id(id, immutable=False)[source]

Return a dataset by its id.

get_by_name(name, immutable=False, strict=False)[source]

Return a dataset by its name.

Parameters:
  • name (str) – Name of the dataset

  • immutable (bool) – Whether the dataset will be used as an immutable instance or will be modified (Default value = False).

  • strict (bool) – Whether to raise an exception if the dataset doesn’t exist or not (Default value = False)

Returns:

Dataset with the specified name if it exists.

Return type:

Optional[Dataset]
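The effect of the strict flag can be sketched with a dictionary lookup. Both get_by_name as written here and the DatasetNotFound exception are hypothetical stand-ins; the exception type renku raises is not named in this reference, and the immutable flag is not modelled.

```python
from typing import Dict, Optional


class DatasetNotFound(Exception):
    """Hypothetical error raised by the strict lookup in this sketch."""


def get_by_name(datasets: Dict[str, dict], name: str, strict: bool = False) -> Optional[dict]:
    """Return the dataset with the given name, or None when it is missing."""
    dataset = datasets.get(name)
    if dataset is None and strict:
        raise DatasetNotFound(name)
    return dataset


store = {"flights": {"name": "flights"}}
present = get_by_name(store, "flights")
missing = get_by_name(store, "weather")  # None, because strict is False
```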

get_previous_version(dataset)[source]

Return the previous version of a dataset if any.

get_provenance_tails()[source]

Return the provenance for all datasets.

remove(dataset, date=None, creator=None)[source]

Remove a dataset.

remove_tag(dataset, tag)[source]

Remove a tag from a dataset.

update_during_migration(dataset, commit_sha, date=None, tags=None, remove=False, replace=False, preserve_identifiers=False)[source]

Add, update, remove, or replace a dataset in migration.

Pointer file business logic.

renku.core.dataset.pointer_file.create_external_file(target, path, checksum=None)[source]

Create a new external file.

renku.core.dataset.pointer_file.create_pointer_file(target, checksum=None)[source]

Create a new pointer file.

renku.core.dataset.pointer_file.delete_external_file(dataset_file)[source]

Delete an external file.

renku.core.dataset.pointer_file.get_pointer_file(path)[source]

Return pointer file from an external file.

renku.core.dataset.pointer_file.is_linked_file_updated(path)[source]

Check if an update to a linked file is available.

renku.core.dataset.pointer_file.update_linked_file(path, checksum)[source]

Delete existing linked file and create a new one.

Renku management dataset request models.

class renku.core.dataset.request_model.ImageRequestModel(content_url, position, mirror_locally=False, safe_image_paths=None)[source]

Bases: object

Model for passing image information to dataset use-cases.

to_image_object(dataset)[source]

Convert request model to ImageObject.

Tag management for dataset.

renku.core.dataset.tag.add_dataset_tag(dataset_name, tag, description='', force=False)[source]

Adds a new tag to a dataset.

Checks whether the tag already exists and whether it follows the same rules as Docker tags. See https://docs.docker.com/engine/reference/commandline/tag/ for documentation of the Docker tag syntax.

Raises:

errors.ParameterError – If tag is too long or contains invalid characters.
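The Docker tag rules referred to above (ASCII letters, digits, underscores, periods and hyphens; no leading period or hyphen; at most 128 characters) can be checked with a short function. validate_tag is a hypothetical sketch, not renku's implementation, and it raises ValueError where renku raises errors.ParameterError.

```python
import re

MAX_TAG_LENGTH = 128
# First character: letter, digit or underscore; then also '.' and '-'.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]*")


def validate_tag(tag: str) -> None:
    """Raise ValueError if the tag breaks the Docker tag rules."""
    if len(tag) > MAX_TAG_LENGTH:
        raise ValueError(f"Tags can be at most {MAX_TAG_LENGTH} characters long.")
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"Tag '{tag}' is empty or contains invalid characters.")


def is_valid_tag(tag: str) -> bool:
    """Convenience wrapper returning a boolean instead of raising."""
    try:
        validate_tag(tag)
    except ValueError:
        return False
    return True
```

For example, is_valid_tag("v1.0-rc_1") is True, while a tag starting with a period or one longer than 128 characters is rejected.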

renku.core.dataset.tag.get_dataset_by_tag(dataset, tag)[source]

Return a version of dataset that has a specific tag.

Parameters:
  • dataset (Dataset) – The dataset whose tagged version should be returned.

  • tag (str) – Tag name to search for.

Returns:

The dataset pointed to by the tag or None if nothing found.

Return type:

Optional[Dataset]

renku.core.dataset.tag.list_dataset_tags(dataset_name, format)[source]

List all tags for a dataset.

renku.core.dataset.tag.prompt_access_token(exporter)[source]

Prompt user for an access token for a provider.

Returns:

The new access token

renku.core.dataset.tag.prompt_tag_selection(tags)[source]

Prompt user to choose a tag or <HEAD>.

renku.core.dataset.tag.remove_dataset_tags(dataset_name, tags)[source]

Removes tags from a dataset.

Dataset Providers

Providers for dataset import and export

API for providers.

class renku.core.dataset.providers.api.AddProviderInterface[source]

Bases: ABC

Interface defining providers that can add data to a dataset.

static get_add_parameters()[source]

Returns parameters that can be set for add.

abstract get_metadata(uri, destination, **kwargs)[source]

Get metadata of files that will be added to a dataset.

abstract update_files(files, dry_run, delete, context, **kwargs)[source]

Update dataset files from the remote provider.

class renku.core.dataset.providers.api.CloudStorageProviderType(*args, **kwargs)[source]

Bases: Protocol

Intersection type for mypy hinting in storage classes.

abstract convert_to_storage_uri(uri)[source]

Convert backend-specific URI to a URI that is usable by the IStorage implementation.

property uri

Return provider’s URI.

class renku.core.dataset.providers.api.ExportProviderInterface[source]

Bases: ABC

Interface defining export providers.

static get_export_parameters()[source]

Returns parameters that can be set for export.

abstract get_exporter(dataset, *, tag, **kwargs)[source]

Get export manager.

class renku.core.dataset.providers.api.ExporterApi(dataset)[source]

Bases: ABC

Interface defining exporter methods.

property dataset

Return the dataset to be exported.

abstract export(**kwargs)[source]

Execute export process.

abstract get_access_token_url()[source]

Endpoint for creation of access token.

static requires_access_token()[source]

Return if export requires an access token.

abstract set_access_token(access_token)[source]

Set access token.
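
The exporter methods above suggest a simple calling sequence: ask whether a token is needed, point the user at the token URL, store the token, then export. The following is a minimal sketch of that sequence using stand-in classes; Exporter, DemoExporter, run_export, the placeholder URL and the default of requires_access_token returning True are all assumptions for illustration, not the real ExporterApi.

```python
from abc import ABC, abstractmethod


class Exporter(ABC):
    """Stand-in with the same method names as the interface above."""

    def __init__(self, dataset):
        self._dataset = dataset
        self._access_token = None

    @property
    def dataset(self):
        return self._dataset

    @staticmethod
    def requires_access_token() -> bool:
        # Assumed default; a subclass can override this to return False.
        return True

    @abstractmethod
    def get_access_token_url(self) -> str:
        ...

    def set_access_token(self, access_token):
        self._access_token = access_token

    @abstractmethod
    def export(self, **kwargs):
        ...


class DemoExporter(Exporter):
    def get_access_token_url(self) -> str:
        return "https://example.org/tokens"  # placeholder URL

    def export(self, **kwargs):
        if self.requires_access_token() and not self._access_token:
            raise RuntimeError("access token missing")
        return f"exported {self.dataset}"


def run_export(exporter, ask_token):
    """Hypothetical driver: obtain a token only when the exporter needs one."""
    if exporter.requires_access_token():
        exporter.set_access_token(ask_token(exporter.get_access_token_url()))
    return exporter.export()
```

Here ask_token is any callable that receives the token URL and returns a token, for example a function that prompts the user.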

class renku.core.dataset.providers.api.ImportProviderInterface[source]

Bases: ABC

Interface defining import providers.

static get_import_parameters()[source]

Returns parameters that can be set for import.

abstract get_importer(**kwargs)[source]

Get import manager.

class renku.core.dataset.providers.api.ImporterApi(uri, original_uri)[source]

Bases: ABC

Interface defining importer methods.

abstract copy_extra_metadata(new_dataset)[source]

Copy provider specific metadata once the dataset is created.

abstract download_files(destination, extract)[source]

Download dataset files from the remote provider.

abstract fetch_provider_dataset()[source]

Deserialize this record to a ProviderDataset.

abstract is_latest_version()[source]

Check if record is at last possible version.

is_version_equal_to(dataset)[source]

Check if a dataset has the same version as the record.

property latest_uri

Get URI of the latest version.

property original_uri

Return original URI of this record without any conversion to DOI.

property provider_dataset

Return the remote dataset. This is only valid after a call to fetch_provider_dataset.

property provider_dataset_files

Return list of dataset files. This is only valid after a call to fetch_provider_dataset.

abstract tag_dataset(name)[source]

Create a tag for the dataset name if the remote dataset has a tag/version.

property uri

Return url of this record.

property version

Get record version.
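
Because provider_dataset and provider_dataset_files are only valid after fetch_provider_dataset has been called, callers need to respect that ordering. One way to make the contract explicit is sketched below with a stand-in class; LazyImporter, its dictionary payload and the choice to raise RuntimeError are assumptions and may differ from how the real ImporterApi behaves.

```python
class LazyImporter:
    """Stand-in importer that guards access to not-yet-fetched data."""

    def __init__(self, uri):
        self.uri = uri
        self._provider_dataset = None

    def fetch_provider_dataset(self):
        # A real importer would contact the remote provider here.
        self._provider_dataset = {"name": "demo", "files": ["a.csv", "b.csv"]}
        return self._provider_dataset

    @property
    def provider_dataset(self):
        if self._provider_dataset is None:
            raise RuntimeError("call fetch_provider_dataset() first")
        return self._provider_dataset

    @property
    def provider_dataset_files(self):
        # Reuses the guard above, so it is also only valid after fetching.
        return self.provider_dataset["files"]
```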

class renku.core.dataset.providers.api.ProviderApi(uri, **kwargs)[source]

Bases: IDatasetProviderPlugin

Interface defining provider methods.

classmethod dataset_provider()[source]

The definition of the provider.

abstract static supports(uri)[source]

Whether or not this provider supports a given URI.

property uri

Return provider’s URI.
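
A static supports(uri) method lets calling code pick a provider for a URI without instantiating every candidate first. The sketch below shows that dispatch pattern with stand-in classes; Provider, WebLike, LocalLike and pick_provider are hypothetical names, and the scheme checks are simplified assumptions rather than the rules used by the real providers.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class Provider(ABC):
    """Stand-in base class exposing the same ``supports`` hook."""

    def __init__(self, uri):
        self.uri = uri

    @staticmethod
    @abstractmethod
    def supports(uri: str) -> bool:
        ...


class WebLike(Provider):
    @staticmethod
    def supports(uri: str) -> bool:
        return urlparse(uri).scheme in ("http", "https")


class LocalLike(Provider):
    @staticmethod
    def supports(uri: str) -> bool:
        return urlparse(uri).scheme in ("", "file")


def pick_provider(uri, candidates=(WebLike, LocalLike)):
    """Return an instance of the first candidate that supports ``uri``."""
    for cls in candidates:
        if cls.supports(uri):
            return cls(uri)
    raise ValueError(f"No provider supports {uri!r}")
```

The order of candidates matters when more than one provider could accept the same URI, so more specific providers would normally be listed first.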

class renku.core.dataset.providers.api.ProviderCredentials(provider)[source]

Bases: ABC, UserDict

Credentials of a provider.

NOTE: An empty string, “”, is a valid value. NO_VALUE means that the value for a key is not set.

get_canonical_credentials_names()[source]

Return canonical credentials names that can be used as config keys.

get_canonical_credentials_names_with_no_value()[source]

Return canonical credentials names that can be used as config keys for keys with no valid value.

abstract static get_credentials_names()[source]

Return a tuple of the required credentials for a provider.

get_credentials_names_with_no_value()[source]

Return a tuple of credential keys that don’t have a valid value.

get_credentials_section_name()[source]

Get section name for storing credentials.

NOTE: This method should be overridden by subclasses to allow multiple credentials per provider if needed.

NOTE: Values used in this method shouldn’t depend on ProviderCredentials attributes since we don’t have those attributes when reading credentials. It’s OK to use ProviderApi attributes.

property provider

Return the associated provider instance.

read()[source]

Read credentials from the config and return them. Set non-existing values to None.

store()[source]

Store credentials globally.
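
The distinction between an empty string and NO_VALUE is the usual sentinel pattern: a dedicated object marks "not set" so that every ordinary value, including "", stays usable. A minimal sketch follows, assuming a stand-in sentinel and a hypothetical Credentials class; renku defines its own NO_VALUE, and the credential names here are made up.

```python
from collections import UserDict

NO_VALUE = object()  # stand-in sentinel, not renku's own NO_VALUE


class Credentials(UserDict):
    """Stand-in mapping in which every key starts without a value."""

    def __init__(self, names):
        super().__init__({name: NO_VALUE for name in names})

    def names_with_no_value(self):
        # Identity check: "" and None are values, only the sentinel is not.
        return tuple(key for key, value in self.items() if value is NO_VALUE)


creds = Credentials(("access-key", "secret"))
creds["secret"] = ""  # an empty string counts as a set value
```

After the assignment, only "access-key" is still reported by names_with_no_value().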

class renku.core.dataset.providers.api.StorageProviderInterface[source]

Bases: ABC

Interface defining backend storage providers.

abstract convert_to_storage_uri(uri)[source]

Convert backend-specific URI to a URI that is usable by the IStorage implementation.

abstract get_credentials()[source]

Return an instance of provider’s credential class.

abstract get_storage(credentials=None)[source]

Return the storage manager for the provider.

abstract on_create(dataset)[source]

Hook to perform provider-specific actions on a newly-created dataset.

abstract static supports_storage(uri)[source]

Whether or not this provider supports a given URI storage.

update_files(files, dry_run, delete, context, **kwargs)[source]

Update dataset files from the remote provider.

Dataverse API integration.

class renku.core.dataset.providers.dataverse.DataverseExporter(*, dataset, server_url=None, dataverse_name=None, publish=False)[source]

Bases: ExporterApi

Dataverse export manager.

export(**kwargs)[source]

Execute export process.

get_access_token_url()[source]

Endpoint for creation of access token.

set_access_token(access_token)[source]

Set access token.

class renku.core.dataset.providers.dataverse.DataverseFileSerializer(*, content_size=None, content_url=None, description=None, file_format=None, id=None, identifier=None, name=None, parent_url=None, type=None, encoding_format=None)[source]

Bases: object

Dataverse record file.

property remote_url

Get remote URL as urllib.parse.ParseResult.

class renku.core.dataset.providers.dataverse.DataverseImporter(uri, original_uri, json)[source]

Bases: RepositoryImporter

Dataverse record serializer.

fetch_provider_dataset()[source]

Deserialize a Dataset.

get_files()[source]

Get Dataverse files metadata as DataverseFileSerializer.

is_latest_version()[source]

Check if record is at last possible version.

property latest_uri

Get URI of latest version.

property version

Get the major and minor version of this dataset.

class renku.core.dataset.providers.dataverse.DataverseProvider(uri, is_doi=False)[source]

Bases: ProviderApi, ExportProviderInterface, ImportProviderInterface

Dataverse API provider.

static get_export_parameters()[source]

Returns parameters that can be set for export.

get_exporter(dataset, *, tag, dataverse_server=None, dataverse_name=None, publish=False, **kwargs)[source]

Create export manager for given dataset.

get_importer(**kwargs)[source]

Get importer for a record from Dataverse.

Returns:

The found record

Return type:

DataverseImporter

static record_id(uri)[source]

Extract record id from URI.

static supports(uri)[source]

Check if provider supports a given URI.

renku.core.dataset.providers.dataverse.check_dataverse_doi(doi)[source]

Check if a DOI points to a dataverse dataset.

renku.core.dataset.providers.dataverse.check_dataverse_uri(url)[source]

Check if a URL points to a dataverse instance.

renku.core.dataset.providers.dataverse.make_file_url(file_id, base_url)[source]

Create URL to access a file by ID.

renku.core.dataset.providers.dataverse.make_records_url(record_id, base_url)[source]

Create URL to access record by ID.

renku.core.dataset.providers.dataverse.make_versions_url(record_id, base_url)[source]

Create URL to access the versions of a record.

Dataverse metadata templates.

DOI API integration.

class renku.core.dataset.providers.doi.DOIImporter(id, doi, url, abstract=None, author=None, categories=None, container_title=None, contributor=None, copyright=None, issued=None, language=None, publisher=None, title=None, type=None, version=None)[source]

Bases: ImporterApi

Response from doi.org for DOI metadata.

copy_extra_metadata(new_dataset)[source]

Copy provider specific metadata once the dataset is created.

download_files(destination, extract)[source]

Download dataset files from the remote provider.

fetch_provider_dataset()[source]

Deserialize this record to a ProviderDataset.

is_latest_version()[source]

Check if record is at last possible version.

property latest_uri

Get URI of the latest version.

tag_dataset(name)[source]

Create a tag for the dataset name if the remote dataset has a tag/version.

property version

Get record version.

class renku.core.dataset.providers.doi.DOIProvider(uri, headers=None, timeout=3)[source]

Bases: ProviderApi, ImportProviderInterface

doi.org registry API provider.

get_importer(**kwargs)[source]

Get import manager.

static supports(uri)[source]

Whether or not this provider supports a given URI.

renku.core.dataset.providers.doi.make_doi_url(doi)[source]

Create URL to access DOI metadata.

Git dataset provider.

class renku.core.dataset.providers.git.GitProvider(uri, **kwargs)[source]

Bases: ProviderApi, AddProviderInterface

Git provider.

static get_add_parameters()[source]

Returns parameters that can be set for add.

get_metadata(uri, destination, *, sources=None, revision=None, **kwargs)[source]

Get metadata of files that will be added to a dataset.

static supports(uri)[source]

Whether or not this provider supports a given URI.

update_files(files, dry_run, delete, context, ref=None, **kwargs)[source]

Update dataset files from the remote provider.

Local provider for local filesystem.

class renku.core.dataset.providers.local.LocalExporter(dataset, tag, path)[source]

Bases: ExporterApi

Local filesystem export manager.

export(**kwargs)[source]

Execute entire export process.

get_access_token_url()[source]

Endpoint for creation of access token.

static requires_access_token()[source]

Return if export requires an access token.

set_access_token(access_token)[source]

Set access token.

class renku.core.dataset.providers.local.LocalProvider(uri)[source]

Bases: ProviderApi, AddProviderInterface, ExportProviderInterface

Local filesystem provider.

static get_add_parameters()[source]

Returns parameters that can be set for add.

static get_export_parameters()[source]

Returns parameters that can be set for export.

get_exporter(dataset, *, tag, path=None, **kwargs)[source]

Create export manager for given dataset.

get_importer(uri, **kwargs)[source]

Get import manager.

get_metadata(uri, destination, *, move=False, copy=False, link=False, force=False, **kwargs)[source]

Get metadata of files that will be added to a dataset.

static supports(uri)[source]

Whether or not this provider supports a given URI.

update_files(files, dry_run, delete, context, check_data_directory=False, **kwargs)[source]

Update dataset files from the remote provider.

Models for providers.

class renku.core.dataset.providers.models.DatasetAddAction(value)[source]

Bases: Enum

Types of action when adding a file to a dataset.

class renku.core.dataset.providers.models.DatasetAddMetadata(entity_path, url, action, source, destination, provider=None, based_on=None, size=None)[source]

Bases: object

Metadata for a new file that will be added to a dataset.

property from_cloud_storage

Returns if file is from a cloud storage.

get_absolute_commit_path(project_path)[source]

Return path of the file in the repository.

property has_action

Returns if file’s action is not NONE.

property metadata_only

Returns if file should be added to a remote storage.

class renku.core.dataset.providers.models.DatasetUpdateAction(value)[source]

Bases: Enum

Types of action when updating a file in a dataset.

class renku.core.dataset.providers.models.DatasetUpdateMetadata(entity, action)[source]

Bases: object

Metadata for updating dataset files.

class renku.core.dataset.providers.models.ProviderDataset(*args, **kwargs)[source]

Bases: Dataset

A Dataset that is imported from a provider.

property files

Return list of existing files.

classmethod from_dataset(dataset)[source]

Create an instance from a Dataset.

classmethod from_jsonld(data, schema_class=None)[source]

Create an instance from JSON-LD data.

property tag

Return dataset’s tag.

class renku.core.dataset.providers.models.ProviderDatasetFile(source, filename, checksum, filesize, filetype, path)[source]

Bases: object

Store metadata for dataset files that will be downloaded from a provider.

class renku.core.dataset.providers.models.ProviderDatasetSchema(*args, **kwargs)[source]

Bases: DatasetSchema

ProviderDataset schema.

Create an instance.

class Meta[source]

Bases: object

Meta class.

model

alias of ProviderDataset

class renku.core.dataset.providers.models.ProviderParameter(name, default=None, flags=[], help='', is_flag=False, multiple=False, type=None, metavar=None)[source]

Bases: NamedTuple

Provider-specific parameters.

Create new instance of ProviderParameter(name, default, flags, help, is_flag, multiple, type, metavar)

default

Alias for field number 1

flags

Alias for field number 2

help

Alias for field number 3

is_flag

Alias for field number 4

metavar

Alias for field number 7

multiple

Alias for field number 5

name

Alias for field number 0

type

Alias for field number 6
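
The "Alias for field number N" entries reflect that a NamedTuple is both a tuple and a record: each field can be read by name or by position. The sketch below uses a stand-in class with the same field names and order as the signature above; Parameter is a hypothetical name, the annotations are guesses, and flags defaults to an empty tuple here to avoid sharing one mutable list between instances.

```python
from typing import Any, NamedTuple, Optional, Tuple


class Parameter(NamedTuple):
    """Stand-in with the same field order as ProviderParameter."""

    name: str
    default: Any = None
    flags: Tuple[str, ...] = ()
    help: str = ""
    is_flag: bool = False
    multiple: bool = False
    type: Any = None
    metavar: Optional[str] = None


param = Parameter("revision", flags=("-r", "--revision"), help="Revision to use.")

# Positional and named access refer to the same fields.
same_name = param[0] == param.name
same_flags = param[2] == param.flags
metavar_index = Parameter._fields.index("metavar")
```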

OLOS API integration.

class renku.core.dataset.providers.olos.OLOSExporter(*, dataset, server_url=None)[source]

Bases: ExporterApi

OLOS export manager.

export(**kwargs)[source]

Execute export process.

get_access_token_url()[source]

Endpoint for creation of access token.

set_access_token(access_token)[source]

Set access token.

class renku.core.dataset.providers.olos.OLOSProvider(uri, is_doi=False)[source]

Bases: ProviderApi, ExportProviderInterface

Provider for OLOS integration.

static get_export_parameters()[source]

Returns parameters that can be set for export.

get_exporter(dataset, *, tag, dlcm_server=None, **kwargs)[source]

Create export manager for given dataset.

static supports(uri)[source]

Check if provider supports a given URI for importing.

Renku dataset provider.

class renku.core.dataset.providers.renku.RenkuImporter(uri, name, identifier, tag, latest_version_uri, project_url_ssh, project_url_http, gitlab_token, renku_token)[source]

Bases: ImporterApi

Renku record serializer.

Create a RenkuImporter from a Dataset.

copy_extra_metadata(new_dataset)[source]

Copy provider specific metadata once the dataset is created.

property datadir_exists

Whether the dataset data directory exists (might be missing in git if empty).

download_files(destination, extract)[source]

Download dataset files from the remote provider.

fetch_provider_dataset()[source]

Return encapsulated dataset instance.

is_latest_version()[source]

Check if dataset is at last possible version.

is_version_equal_to(dataset)[source]

Check if a dataset has the same identifier as the record.

property latest_uri

Get URI of the latest version.

property project_url

URL of the Renku project in Gitlab.

property repository

The cloned repository that contains the dataset.

tag_dataset(name)[source]

Create a tag for the dataset name if the remote dataset has a tag/version.

property version

Get record version.

class renku.core.dataset.providers.renku.RenkuProvider(uri, **_)[source]

Bases: ProviderApi, ImportProviderInterface

Renku API provider.

static get_import_parameters()[source]

Returns parameters that can be set for import.

get_importer(tag=None, gitlab_token=None, **kwargs)[source]

Retrieves a dataset import manager from Renku.

Parameters:
  • tag (Optional[str]) – Dataset version to import.

  • gitlab_token (Optional[str]) – Gitlab access token.

Returns:

A Renku import manager.

Return type:

RenkuImporter

static supports(uri)[source]

Whether or not this provider supports a given URI.

Base class for online repository data providers.

class renku.core.dataset.providers.repository.RepositoryImporter(uri, original_uri)[source]

Bases: ImporterApi, ABC

Online repository importer.

copy_extra_metadata(new_dataset)[source]

Copy provider specific metadata once the dataset is created.

download_files(destination, extract)[source]

Download dataset files from the remote provider.

tag_dataset(name)[source]

Create a tag for the dataset name if the remote dataset has a tag/version.

renku.core.dataset.providers.repository.make_request(url, accept='application/json')[source]

Execute network request.

Web dataset provider.

class renku.core.dataset.providers.web.WebProvider(uri, **kwargs)[source]

Bases: ProviderApi, AddProviderInterface

A provider for downloading data from web URLs.

get_metadata(uri, destination, *, extract=False, filename=None, multiple=False, **kwargs)[source]

Get metadata of files that will be added to a dataset.

static supports(uri)[source]

Whether or not this provider supports a given URI.

update_files(files, dry_run, delete, context, **kwargs)[source]

Update dataset files from the remote provider.

renku.core.dataset.providers.web.download_file(uri, filename=None, *, project_path, destination, extract=False, multiple=False)[source]

Download a file from a URI and return its metadata.

renku.core.dataset.providers.web.download_files(urls, destination, names, extract)[source]

Download multiple files and return their metadata.

Zenodo API integration.

class renku.core.dataset.providers.zenodo.ZenodoDeposition(exporter, id=None)[source]

Bases: object

Zenodo record for a deposit.

attach_metadata(dataset, tag)[source]

Attach metadata to deposition on Zenodo.

property attach_metadata_url

Return URL for attaching metadata.

property deposit_at

Return deposit at URL.

property new_deposit_url

Return URL for creating new deposit.

new_deposition()[source]

Create new deposition on Zenodo.

publish_deposition()[source]

Publish existing deposition.

property publish_url

Returns publish URL.

property published_at

Return published at URL.

upload_file(filepath, path_in_repo)[source]

Upload and attach a file to existing deposition on Zenodo.

property upload_file_url

Return URL for uploading file.

class renku.core.dataset.providers.zenodo.ZenodoExporter(dataset, publish, tag)[source]

Bases: ExporterApi

Zenodo export manager.

dataset_to_request()[source]

Prepare dataset metadata for request.

property default_params

Create request default parameters.

export(**kwargs)[source]

Execute entire export process.

get_access_token_url()[source]

Endpoint for creation of access token.

set_access_token(access_token)[source]

Set access token.

property zenodo_url

Returns correct Zenodo URL based on environment.

class renku.core.dataset.providers.zenodo.ZenodoFileSerializer(*, id=None, checksum=None, links=None, filename=None, filesize=None)[source]

Bases: object

Zenodo record file.

property remote_url

Get remote URL as urllib.parse.ParseResult.

property type

Get file type.

class renku.core.dataset.providers.zenodo.ZenodoImporter(*, uri, original_uri, json)[source]

Bases: RepositoryImporter

Zenodo importer.

fetch_provider_dataset()[source]

Deserialize a Dataset.

get_files()[source]

Get Zenodo files metadata as ZenodoFileSerializer.

get_jsonld()[source]

Get record metadata as jsonld.

is_latest_version()[source]

Check if this record is the latest version.

property latest_uri

Get URI of latest version.

property version

Get record version.

class renku.core.dataset.providers.zenodo.ZenodoMetadataSerializer(*, access_right=None, communities=None, contributors=None, creators=None, description=None, doi=None, extras=None, grants=None, image_type=None, journal_issue=None, journal_pages=None, journal_title=None, journal_volume=None, keywords=None, language=None, license=None, notes=None, prereserve_doi=None, publication_date=None, publication_type=None, references=None, related_identifiers=None, title=None, upload_type=None, version=None)[source]

Bases: object

Zenodo metadata.

classmethod from_metadata(metadata)[source]

Create an instance from a metadata dict.

Parameters:

metadata – The dict data to convert.

Returns:

Serializer containing data in deserialized form.

Return type:

ZenodoMetadataSerializer

class renku.core.dataset.providers.zenodo.ZenodoProvider(uri, is_doi=False)[source]

Bases: ProviderApi, ExportProviderInterface, ImportProviderInterface

Zenodo registry API provider.

static get_export_parameters()[source]

Returns parameters that can be set for export.

get_exporter(dataset, *, tag, publish=False, **kwargs)[source]

Create export manager for given dataset.

get_importer(**kwargs)[source]

Get importer for a record from Zenodo.

static get_record_id(uri)[source]

Extract record id from URI.

static supports(uri)[source]

Whether this provider supports a given URI.

renku.core.dataset.providers.zenodo.make_records_url(record_id, uri)[source]

Create URL to access record by ID.

Parameters:

record_id – The id of the record.

Returns:

Full URL for the record.

Return type:

str

Workflows

Activity management.

class renku.core.workflow.activity.ModifiedActivitiesEntities(modified, deleted, hidden_modified)[source]

Bases: NamedTuple

A class containing sets of modified/deleted activities and entities for both normal and hidden entities.

Create new instance of ModifiedActivitiesEntities(modified, deleted, hidden_modified)

deleted

Set of deleted activity and entity tuples.

hidden_modified

Set of modified activity and entity tuples for hidden entities.

modified

Set of modified activity and entity tuples.

renku.core.workflow.activity.add_activity_if_recent(activity, activities)[source]

Add activity to activities if it’s not in the set or is the latest executed instance.

Remove existing activities that were executed earlier.

renku.core.workflow.activity.create_activity_graph(activities, remove_overridden_parents=True, with_inputs_outputs=False, with_hidden_dependencies=False)[source]

Create a dependency DAG from activities.

renku.core.workflow.activity.filter_overridden_activities(activities)[source]

Filter out overridden activities from a list of activities.

renku.core.workflow.activity.get_activities_until_paths(paths, sources, activity_gateway, revision=None)[source]

Get all current activities leading to paths, from sources.

renku.core.workflow.activity.get_all_modified_and_deleted_activities_and_entities(repository, activity_gateway, check_hidden_dependencies=False)[source]

Return latest activities with at least one modified or deleted input along with the modified/deleted input entity.

An activity can be repeated if more than one of its inputs is modified.

Parameters:
  • repository – The current Repository.

  • activity_gateway (IActivityGateway) – The injected Activity gateway.

Returns:

Modified and deleted activities and entities.

Return type:

ModifiedActivitiesEntities

renku.core.workflow.activity.get_downstream_generating_activities(starting_activities, paths, ignore_deleted, project_path, activity_gateway)[source]

Return activities downstream of passed activities that generate at least one path in paths.

Parameters:
  • starting_activities (Set[Activity]) – Activities to use as starting/upstream nodes.

  • paths (List[str]) – Optional generated paths to end downstream chains at.

  • ignore_deleted (bool) – Whether to ignore deleted generations.

  • project_path (Path) – Path to project’s root directory.

  • activity_gateway (IActivityGateway) – The injected Activity gateway.

Returns:

All activities and their downstream activities.

Return type:

Set[Activity]

renku.core.workflow.activity.get_latest_activity(activities)[source]

Return the activity that was executed after all other activities.

renku.core.workflow.activity.get_latest_activity_before(activities, activity)[source]

Return the latest activity that was executed before the passed activity.
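
Both lookups amount to taking a maximum over execution times, with the second one first discarding everything that did not finish before the reference activity. A minimal sketch under stated assumptions: Run is a stand-in for an activity, ended_at_time is an assumed attribute name, and latest and latest_before are hypothetical helpers rather than the functions documented here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    """Stand-in for an activity with an assumed end timestamp."""

    id: str
    ended_at_time: datetime


def latest(activities: Iterable[Run]) -> Optional[Run]:
    """Return the run that ended last, or None for an empty input."""
    return max(activities, key=lambda a: a.ended_at_time, default=None)


def latest_before(activities: Iterable[Run], activity: Run) -> Optional[Run]:
    """Return the last run that ended strictly before ``activity`` ended."""
    earlier = (a for a in activities if a.ended_at_time < activity.ended_at_time)
    return latest(earlier)
```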

renku.core.workflow.activity.get_modified_activities(activities, repository, check_hidden_dependencies)[source]

Get lists of activities that have modified/deleted usage entities.

renku.core.workflow.activity.is_activity_valid(activity)[source]

Return whether the activity’s plan has not been deleted.

Parameters:

activity (Activity) – The Activity whose Plan should be checked.

Returns:

True if the activity’s Plan is still valid, False otherwise.

Return type:

bool

renku.core.workflow.activity.revert_activity(*, activity_gateway, activity_id, delete_plan, force, metadata_only)[source]

Revert an activity.

Parameters:
  • activity_gateway (IActivityGateway) – The injected activity gateway.

  • activity_id (str) – ID of the activity to be reverted.

  • delete_plan (bool) – Delete the plan if it’s not used by any other activity.

  • force (bool) – Revert the activity even if it has some downstream activities.

  • metadata_only (bool) – Only revert the metadata and don’t touch generated files.

Returns:

The deleted activity.

renku.core.workflow.activity.sort_activities(activities, remove_overridden_parents=True)[source]

Return a sorted list of activities based on their dependencies and execution order.
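
Sorting by dependencies is a topological sort: every activity appears after the activities it depends on. The sketch below illustrates the idea with the standard library's graphlib and plain strings as nodes; it is not the implementation used by sort_activities, and the node names and the sort_by_dependencies helper are made up.

```python
from graphlib import TopologicalSorter

# Maps each node to the set of nodes it depends on (hypothetical names).
dependencies = {
    "preprocess": {"download"},
    "train": {"preprocess"},
    "report": {"train"},
}


def sort_by_dependencies(graph):
    """Return the nodes so that each one comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


ordered = sort_by_dependencies(dependencies)
```

For this linear chain the only valid order is download, preprocess, train, report. With branching graphs several orders can be valid, which is why a secondary key such as execution time is useful.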

Plan management.

renku.core.workflow.plan.compose_workflow(name, description, mappings, defaults, links, param_descriptions, map_inputs, map_outputs, map_params, link_all, keywords, steps, sources, sinks, creators, activity_gateway, plan_gateway, project_gateway)[source]

Compose workflows into a CompositePlan.

Parameters:
  • name (str) – Name of the new composed Plan.

  • description (Optional[str]) – Description for the Plan.

  • mappings (Optional[List[str]]) – Mappings between parameters of this and child Plans.

  • defaults (Optional[List[str]]) – Default values for parameters.

  • links (Optional[List[str]]) – Links between parameters of child Plans.

  • param_descriptions (Optional[List[str]]) – Descriptions of parameters.

  • map_inputs (bool) – Whether or not to automatically expose child inputs.

  • map_outputs (bool) – Whether or not to automatically expose child outputs.

  • map_params (bool) – Whether or not to automatically expose child parameters.

  • link_all (bool) – Whether or not to automatically link child steps’ parameters.

  • keywords (Optional[List[str]]) – Keywords for the Plan.

  • steps (Optional[List[str]]) – Child steps to include.

  • sources (Optional[List[str]]) – Starting files when automatically detecting child Plans.

  • sinks (Optional[List[str]]) – Ending files when automatically detecting child Plans.

  • creators (Optional[List[Person]]) – Creator(s) of the composite plan.

  • activity_gateway (IActivityGateway) – Injected activity gateway.

  • plan_gateway (IPlanGateway) – Injected plan gateway.

  • project_gateway (IProjectGateway) – Injected project gateway.

Returns:

The newly created CompositePlan.

renku.core.workflow.plan.edit_workflow(name, new_name, description, set_params, map_params, rename_params, describe_params, creators, keywords, plan_gateway, custom_metadata=None)[source]

Edit a workflow’s details.

Parameters:
  • name (str) – Name of the Plan to edit.

  • new_name (Optional[str]) – New name of the Plan.

  • description (Optional[str]) – New description of the Plan.

  • set_params (List[str]) – New default values for parameters.

  • map_params (List[str]) – New mappings for Plan.

  • rename_params (List[str]) – New names for parameters.

  • describe_params (List[str]) – New descriptions for parameters.

  • creators (Union[List[Person], NoValueType]) – Creators of the workflow.

  • keywords (Union[List[str], NoValueType]) – New keywords for the workflow.

  • plan_gateway (IPlanGateway) – Injected plan gateway.

  • custom_metadata (Dict, optional) – Custom JSON-LD metadata (Default value = None).

Returns:

Details of the modified Plan.

renku.core.workflow.plan.export_workflow(name_or_id, plan_gateway, format, output, values, basedir, resolve_paths, nest_workflows)[source]

Export a workflow to a given format.

Parameters:
  • name_or_id – name or id of the Plan to export

  • plan_gateway (IPlanGateway) – The injected Plan gateway.

  • format (str) – Format to export to.

  • output (Optional[str]) – Output path to store result at.

  • values (Optional[Dict[str,Any]]) – Parameter names and values to apply before export.

  • basedir (Optional[str]) – The base path prepended to all paths in the exported workflow; if None, it defaults to the absolute path of the renku project.

  • resolve_paths (Optional[bool]) – Resolve all symlinks and make paths absolute, defaults to True.

  • nest_workflows (Optional[bool]) – Whether to try to nest all workflows into one specification and file or not, defaults to False.

Returns:

The exported workflow as a string.

renku.core.workflow.plan.get_activities(plan, activity_gateway)[source]

Return all valid activities that use the plan or one of its parent/child derivatives.

renku.core.workflow.plan.get_composite_plans_by_child(plan, plan_gateway)[source]

Return all composite plans that contain a child plan.

renku.core.workflow.plan.get_derivative_chain(plan, plan_gateway)[source]

Return all plans in the derivative chain of a given plan including its parents/children and the plan itself.

renku.core.workflow.plan.get_initial_id(plan, plan_gateway)[source]

Return the id of the first plan in the derivative chain.

renku.core.workflow.plan.get_latest_plan(plan, plan_gateway)[source]

Return the latest version of a given plan in its derivative chain.

renku.core.workflow.plan.get_plan(plan_gateway, name_or_id_or_path=None, workflow_file=None)[source]

Return the latest version of a given plan in its derivative chain.

renku.core.workflow.plan.get_plans_with_metadata(activity_gateway, plan_gateway)[source]

Get all plans in the project with additional metadata.

Adds information about last execution, number of executions and whether the plan was used to create files currently existing in the project.

renku.core.workflow.plan.is_plan_removed(plan)[source]

Return true if the plan or any plan in its derivative chain is deleted.

renku.core.workflow.plan.list_workflows(plan_gateway, format, columns)[source]

List or manage workflows with subcommands.

Parameters:
  • plan_gateway (IPlanGateway) – The injected Plan gateway.

  • format (str) – The output format.

  • columns (List[str]) – The columns to show for tabular output.

Returns:

List of workflows formatted by format.

renku.core.workflow.plan.remove_plan(name_or_id, force, plan_gateway)[source]

Remove the workflow by its name or id.

Parameters:
  • name_or_id (str) – The name or id of the Plan to remove.

  • force (bool) – Whether to force removal or not.

  • plan_gateway (IPlanGateway) – The injected Plan gateway.

  • when (datetime) – Time of deletion (Default value = current local date/time).

Raises:

errors.ParameterError – If the Plan doesn’t exist or was already deleted.

renku.core.workflow.plan.search_workflows(name, plan_gateway)[source]

Get all the workflows whose Plan.name start with the given name.

Parameters:
  • name (str) – The name to search for.

  • plan_gateway (IPlanGateway) – Injected Plan gateway.

Returns:

All Plans whose name starts with name.

renku.core.workflow.plan.show_workflow(name_or_id_or_path, activity_gateway, with_metadata=False)[source]

Show the details of a workflow.

Parameters:
  • name_or_id_or_path (str) – Name or id of the Plan to show or path to a workflow file.

  • activity_gateway (IActivityGateway) – The injected Activity gateway.

  • with_metadata (bool) – Whether to get additional calculated metadata for the plan.

Returns:

Details of the Plan.

renku.core.workflow.plan.visualize_graph(sources, targets, show_files, activity_gateway, revision=None)[source]

Visualize an activity graph.

Parameters:
  • sources (List[str]) – Input paths to start the visualized graph at.

  • targets (List[str]) – Output paths to end the visualized graph at.

  • show_files (bool) – Whether or not to show file nodes.

  • activity_gateway (IActivityGateway) – The injected activity gateway.

  • revision (Optional[str], optional) – Revision or revision range to show the graph for (Default value = None)

Returns:

Graph visualization view model.

renku.core.workflow.plan.workflow_inputs(activity_gateway, paths=None)[source]

Get inputs used by workflows.

Parameters:
  • activity_gateway (IActivityGateway) – The injected activity gateway.

  • paths (List[str], optional) – List of paths to consider as inputs (Default value = None).

Returns:

Set of input file paths.

Return type:

Set[str]

renku.core.workflow.plan.workflow_outputs(activity_gateway, paths=None)[source]

Get outputs generated by workflows.

Parameters:
  • activity_gateway (IActivityGateway) – The injected activity gateway.

  • paths (List[str], optional) – List of paths to consider as outputs (Default value = None).

Returns:

Set of output file paths.

Return type:

Set[str]

Plan execution.

renku.core.workflow.execute.check_for_cycles(graph)[source]

Check for cycles in the graph and raise an error if there are any.
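As a rough illustration of what a cycle check on an execution graph involves, the sketch below runs a depth-first search over a plain adjacency dictionary. It is not renku's implementation: the real function receives a graph object, while check_for_cycles_sketch and CycleError are hypothetical names chosen for this example.

```python
from typing import Dict, Hashable, List


class CycleError(Exception):
    """Stand-in error type for this sketch (not a renku class)."""


def check_for_cycles_sketch(graph: Dict[Hashable, List[Hashable]]) -> None:
    """Raise CycleError if the directed graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node: Hashable) -> None:
        color[node] = GRAY
        for successor in graph.get(node, []):
            state = color.get(successor, WHITE)
            if state == GRAY:
                # The successor is already on the current path: a back edge.
                raise CycleError(f"cycle detected through {successor!r}")
            if state == WHITE:
                visit(successor)
        color[node] = BLACK

    for node in list(graph):
        if color[node] == WHITE:
            visit(node)


# An acyclic graph passes silently.
check_for_cycles_sketch({"a": ["b"], "b": ["c"], "c": []})
```

The recursive search is fine for small graphs; a very deep graph would need an iterative variant to avoid Python's recursion limit.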

renku.core.workflow.execute.execute_workflow(name_or_id, set_params, provider, config, values, plan_gateway)[source]

Execute a plan with specified values.

Parameters:
  • name_or_id (str) – Name or id of the Plan to execute.

  • set_params (List[str]) – List of values specified for workflow parameters.

  • provider (str) – Name of the workflow provider backend to use for execution.

  • config (Optional[str]) – Path to config for the workflow provider.

  • values (Optional[str]) – Path to YAML file containing values specified for workflow parameters.

  • plan_gateway (IPlanGateway) – The plan gateway.

renku.core.workflow.execute.execute_workflow_graph(dag, activity_gateway, plan_gateway, provider='toil', config=None, workflow_file_plan=None)[source]

Execute a Run with/without subprocesses.

Parameters:
  • dag (DiGraph) – The workflow graph to execute.

  • activity_gateway (IActivityGateway) – The injected activity gateway.

  • plan_gateway (IPlanGateway) – The injected plan gateway.

  • provider – Provider to run the workflow with (Default value = “toil”).

  • config – Path to config for the workflow provider (Default value = None).

  • workflow_file_plan (Optional[WorkflowFileCompositePlan]) – If passed, a workflow file is being executed, so related metadata is stored.

renku.core.workflow.execute.iterate_workflow(name_or_id, mapping_path, mappings, dry_run, provider, config, plan_gateway)[source]

Iterate a workflow repeatedly with differing values.

Parameters:
  • name_or_id (str) – Name or id of the Plan to iterate.

  • mapping_path (str) – Path to file defining workflow mappings.

  • mappings (List[str]) – List of workflow mappings.

  • dry_run (bool) – Whether to preview execution or actually run it.

  • provider (str) – Name of the workflow provider backend to use for execution.

  • config (Optional[str]) – Path to config for the workflow provider.

  • plan_gateway (IPlanGateway) – The plan gateway.

Build an execution graph for a workflow.

class renku.core.workflow.model.concrete_execution_graph.ExecutionGraph(workflows, virtual_links=False)[source]

Bases: object

Represents an execution graph for one or more workflow steps.

static are_paths_linked(*, output, input)[source]

Return True if input has a relation to output (i.e. can be generated by it).
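One plausible reading of "can be generated by it" is that the input path is either the output path itself or lies inside an output directory. The sketch below encodes only that assumption; the exact rule used by are_paths_linked is not given in this document, and paths_linked_sketch is a hypothetical name.

```python
from pathlib import PurePosixPath


def paths_linked_sketch(*, output: str, input: str) -> bool:
    """Return True if ``input`` equals ``output`` or lies inside it.

    Assumption for illustration only: a directory output can generate
    any file located below it.
    """
    output_path = PurePosixPath(output)
    input_path = PurePosixPath(input)
    return input_path == output_path or output_path in input_path.parents


print(paths_linked_sketch(output="data", input="data/a.csv"))
```

Comparing path components rather than raw strings avoids treating "database/a.csv" as being inside "data".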

calculate_concrete_execution_graph(virtual_links=False)[source]

Create an execution DAG between Plans showing dependencies between them.

Resolve ParameterLinks involving ParameterMappings to the underlying actual parameters and potentially also virtual links determined by parameter values.

property cycles

Get potential cycles in execution graph.

property workflow_graph

Return a subgraph with only workflows and their dependencies.

Models to represent a workflow definition file.

class renku.core.workflow.model.workflow_file.BaseParameter(description=None, implicit=False, name=None, name_set_by_user=None, position=None, prefix=None)[source]

Bases: object

Base class for Input, Output, and Parameter.

class renku.core.workflow.model.workflow_file.BasePath(path, mapped_to=None, persist=True, **kwargs)[source]

Bases: BaseParameter

Base for workflow Input/Output.

class renku.core.workflow.model.workflow_file.HiddenParameter(value, **kwargs)[source]

Bases: Parameter

A parameter that isn’t defined by user and is created by Renku.

class renku.core.workflow.model.workflow_file.Input(path, mapped_to=None, persist=True, **kwargs)[source]

Bases: BasePath

An input to a workflow file.

to_command_input(plan_id, index)[source]

Convert to a Plan input.

class renku.core.workflow.model.workflow_file.Output(path, mapped_to=None, persist=True, **kwargs)[source]

Bases: BasePath

An output from a workflow file.

to_command_output(plan_id, index)[source]

Convert to a Plan output.

class renku.core.workflow.model.workflow_file.Parameter(value, **kwargs)[source]

Bases: BaseParameter

A parameter for a workflow file.

to_command_parameter(plan_id, index)[source]

Convert to a Plan parameter.

class renku.core.workflow.model.workflow_file.Step(*, command, date_created=None, description=None, inputs=None, keywords=None, name, original_command=None, outputs=None, parameters=None, path, success_codes=None, workflow_file_name)[source]

Bases: object

A single step in a workflow file.

to_plan(project_id)[source]

Convert a step to a WorkflowFilePlan.

class renku.core.workflow.model.workflow_file.StepCommandParser(step)[source]

Bases: nodevisitor

A class to parse a workflow file step’s command and figure out position of its arguments.

find_and_process_argument(*, cls, collection, prefix, value, exclude_prefix, are_values_equal)[source]

Find the given input/output/parameter in the collection.

find_argument(prefix, value, search_outputs, exclude_prefix)[source]

Check the argument against inputs/outputs/parameters to find its type.

Parameters:
  • prefix (Optional[str]) – Argument prefix.

  • value (str) – Argument value.

  • search_outputs (bool) – Whether to search outputs or not. When no part of the command has been found yet, we shouldn’t search outputs.

  • exclude_prefix (bool) – Whether prefix must be excluded in search or not. When the command hasn’t been fully found, the prefix might be part of the command and not the argument.

Returns:

An Input, Output, or Parameter instance if a match is found; None otherwise.

Return type:

Optional[BaseParameter]

find_argument_by_name(name, collection=None)[source]

Find an input/output/parameter with the given name.

find_input(prefix, path, exclude_prefix)[source]

Search for a given path in inputs.

find_output(prefix, path, exclude_prefix)[source]

Search for a given path in outputs.

find_parameter(prefix, value, exclude_prefix)[source]

Find a parameter based on its value and prefix; adjust parameter’s prefix if needed.

get_node_as_string(node)[source]

Return user-friendly error message for bashlex nodes.

parse_command()[source]

Parse the command and assign some properties (e.g. position) for parameters and the command.

static unescape_dollar_sign(word)[source]

Unescape $ in the command.

visitcommand(node, parts)[source]

Process a full command.

NOTE: There is exactly one command node since we don’t allow piping or multiple commands. Therefore, this method is called only once.

visitnode(node)[source]

Start of processing a single command token.

visitredirect(node, input, type, output, heredoc)[source]

Process a redirect.

class renku.core.workflow.model.workflow_file.WorkflowFile(path, steps, name, description=None, keywords=None)[source]

Bases: object

A workflow definition file.

set_missing_names()[source]

Set missing names for attributes.

to_plan(project_gateway)[source]

Convert a workflow file to a CompositePlan.

validate()[source]

Validate a workflow file.

renku.core.workflow.model.workflow_file.calculate_derivatives(plan, plan_gateway, sequence=None)[source]

Check for existing plans and set derivative chain if needed.

Parameters:
  • plan (Union[WorkflowFileCompositePlan, WorkflowFilePlan]) – The potential derivative plan.

  • plan_gateway (IPlanGateway) – PlanGateway instance.

  • sequence (Optional[itertools.count]) – Used to generate deterministic IDs in case a previous ID cannot be used for a plan (e.g. when a plan was deleted and re-executed) (Default value = None).

Returns:

An existing plan, if one is found and is the same; otherwise, the passed-in plan.

Return type:

Union[WorkflowFileCompositePlan, WorkflowFilePlan]

renku.core.workflow.model.workflow_file.generate_qualified_plan_name(workflow_file_name, step_name)[source]

Generate name for WorkflowFile and Step.

Represent a PlanFactory for tracking workflows.

class renku.core.workflow.plan_factory.PlanFactory(command_line, explicit_inputs=None, explicit_outputs=None, explicit_parameters=None, directory=None, working_dir=None, no_input_detection=False, no_output_detection=False, no_parameter_detection=False, success_codes=None, stdin=None, stdout=None, stderr=None)[source]

Bases: object

Factory for creating a plan from a command line call.

add_command_input(default_value, prefix=None, position=None, postfix=None, name=None, encoding_format=None)[source]

Create a CommandInput.

add_command_output(default_value, prefix=None, position=None, postfix=None, encoding_format=None, name=None, id=None, mapped_to=None)[source]

Create a CommandOutput.

add_command_output_from_input(input, name)[source]

Create a CommandOutput from an input.

add_command_output_from_parameter(parameter, name)[source]

Create a CommandOutput from a parameter.

add_command_parameter(default_value, prefix=None, position=None, name=None)[source]

Create a CommandParameter.

add_explicit_inputs()[source]

Add explicit inputs.

add_explicit_parameters()[source]

Add explicit parameters.

add_indirect_inputs()[source]

Read indirect inputs list and add them to explicit inputs.

add_indirect_outputs()[source]

Read indirect outputs list and add them to explicit outputs.

add_inputs_and_parameters(arguments)[source]

Yield command input parameters.

add_outputs(candidates)[source]

Yield detected output and changed command input parameter.

get_stream_mapping_for_value(value)[source]

Return a stream mapping if value is a path mapped to a stream.

guess_type(value, ignore_filenames=None)[source]

Return new value and CWL parameter type.

iter_input_files(basedir)[source]

Yield tuples with input id and path.

split_command_and_args()[source]

Return tuple with command and args from command line arguments.

to_plan(project_gateway, name=None, description=None, keywords=None, creators=None, date_created=None)[source]

Return an instance of Plan based on this factory.

watch(no_output=False)[source]

Watch a Renku repository for changes to detect outputs.

renku.core.workflow.plan_factory.add_indirect_parameter(working_dir, name, value)[source]

Add a parameter to indirect parameters.

renku.core.workflow.plan_factory.add_to_files_list(file_list_path, name, path)[source]

Add a name and path pair to a files list.

renku.core.workflow.plan_factory.delete_indirect_files_list(working_dir)[source]

Remove indirect inputs, outputs, and parameters list.

renku.core.workflow.plan_factory.get_indirect_inputs_path(project_path)[source]

Return path to file that contains indirect inputs list.

renku.core.workflow.plan_factory.get_indirect_outputs_path(project_path)[source]

Return path to file that contains indirect outputs list.

renku.core.workflow.plan_factory.get_indirect_parameters_path(project_path)[source]

Return path to file that contains indirect parameters list.

renku.core.workflow.plan_factory.read_files_list(files_list)[source]

Read files list yaml containing name:path pairs.

renku.core.workflow.plan_factory.read_indirect_parameters(working_dir)[source]

Read and return indirect parameters.

Represent the Common Workflow Language types.

class renku.core.workflow.types.Directory(path, listing=None)[source]

Represent a directory.

class renku.core.workflow.types.File(path, mime_type=None)[source]

Represent a file.

Resolution of Workflow execution values precedence.

class renku.core.workflow.value_resolution.CompositePlanValueResolver(plan, values=None)[source]

Bases: ValueResolver

Value resolution class for a CompositePlan.

Applies values and default_values to a nested workflow.

Order of precedence is as follows (from lowest to highest):
  • Default value on a parameter

  • Default value on a mapping to the parameter

  • Value passed to a mapping to the parameter

  • Value passed to the parameter

  • Value propagated to a parameter from the source of a ParameterLink

apply()[source]

Applies values and default_values to a CompositePlan.

Returns:

A CompositePlan with values applied.
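The precedence order given in the CompositePlanValueResolver description can be sketched as a simple "last set value wins" loop. This is an illustration under stated assumptions, not the resolver's actual code: the source names in PRECEDENCE and the function resolve_value are hypothetical labels for the five levels.

```python
from typing import Any, Dict, Optional

# Sources ordered from lowest to highest precedence, following the order
# given for CompositePlanValueResolver. The key names are hypothetical.
PRECEDENCE = (
    "parameter_default",  # default value on a parameter
    "mapping_default",    # default value on a mapping to the parameter
    "mapping_value",      # value passed to a mapping to the parameter
    "parameter_value",    # value passed to the parameter
    "link_value",         # value propagated through a ParameterLink
)


def resolve_value(candidates: Dict[str, Optional[Any]]) -> Optional[Any]:
    """Return the value from the highest-precedence source that is set."""
    resolved = None
    for source in PRECEDENCE:
        value = candidates.get(source)
        if value is not None:
            resolved = value  # later (higher) sources override earlier ones
    return resolved


print(resolve_value({"parameter_default": 1, "mapping_value": 5}))
```

Treating None as "not set" is a simplification; a real resolver would need a way to distinguish an unset value from an explicit None.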

class renku.core.workflow.value_resolution.PlanValueResolver(plan, values)[source]

Bases: ValueResolver

Value resolution class for a Plan.

Applies values and default_values to a workflow.

apply()[source]

Applies values and default_values to a Plan.

Returns:

A Plan with values applied.

class renku.core.workflow.value_resolution.TemplateVariableFormatter[source]

Bases: Formatter

Template variable formatter for CommandParameterBase.

apply(param, parameters=None)[source]

Renders the parameter template into its final value.

get_value(key, args, kwargs)[source]

Ignore some special keys when formatting the variable.

static to_map(parameters)[source]

Converts a list of CommandParameterBase into parameter name-value dictionary.
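TemplateVariableFormatter derives from Formatter and overrides get_value to ignore some special keys. The sketch below shows that general technique with the standard library's string.Formatter. The class name LenientFormatter and the ignored key iter_index are assumptions made for illustration; the keys renku actually ignores are not listed in this document.

```python
from string import Formatter


class LenientFormatter(Formatter):
    """Formatter that leaves selected keys unrendered instead of failing."""

    # Hypothetical set of keys to skip; chosen only for this example.
    IGNORED_KEYS = {"iter_index"}

    def get_value(self, key, args, kwargs):
        if key in self.IGNORED_KEYS:
            # Return the placeholder text itself so it survives formatting.
            return "{" + str(key) + "}"
        return super().get_value(key, args, kwargs)


rendered = LenientFormatter().format("{name}-{iter_index}.csv", name="result")
print(rendered)
```

Leaving a placeholder in place lets a later formatting pass fill it in once its value is known.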

class renku.core.workflow.value_resolution.ValueResolver(plan, values)[source]

Bases: ABC

Value resolution class for an AbstractPlan.

abstract apply()[source]

Applies values and default_values to a potentially nested workflow.

Returns:

The AbstractPlan with the user provided values set.

Return type:

AbstractPlan

static get(plan, values)[source]

Factory method to obtain the specific ValueResolver for a workflow.

Parameters:
  • plan (AbstractPlan) – a workflow.

  • values (Dict[str, Any]) – user defined dictionary of runtime values for the provided workflow.

Returns:

A ValueResolver object.

Return type:

ValueResolver
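The relationship between the abstract ValueResolver, its static get factory, and the two concrete resolvers can be sketched with stand-in classes. All class names ending in Sketch are hypothetical, and the dispatch rule (composite plans get the composite resolver) is an assumption inferred from the class names, not confirmed behaviour.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class PlanSketch:
    """Stand-in for a single-step plan (hypothetical)."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}


class CompositePlanSketch(PlanSketch):
    """Stand-in for a plan that groups other plans (hypothetical)."""


class ResolverSketch(ABC):
    def __init__(self, plan: PlanSketch, values: Dict[str, Any]) -> None:
        self.plan = plan
        self.values = values

    @abstractmethod
    def apply(self) -> PlanSketch:
        """Apply the values and return the plan."""

    @staticmethod
    def get(plan: PlanSketch, values: Dict[str, Any]) -> "ResolverSketch":
        # Dispatch on the plan type; the composite check must come first
        # because CompositePlanSketch is also a PlanSketch.
        if isinstance(plan, CompositePlanSketch):
            return CompositeResolverSketch(plan, values)
        return SimpleResolverSketch(plan, values)


class SimpleResolverSketch(ResolverSketch):
    def apply(self) -> PlanSketch:
        self.plan.values.update(self.values)
        return self.plan


class CompositeResolverSketch(ResolverSketch):
    def apply(self) -> PlanSketch:
        # A real resolver would descend into child plans; this only records values.
        self.plan.values.update(self.values)
        return self.plan


resolver = ResolverSketch.get(CompositePlanSketch(), {"learning_rate": 0.1})
print(type(resolver).__name__)
```

Callers only depend on the abstract interface, so adding a new plan type means adding one resolver class and one branch in the factory.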

Apply values from parameter links.

Parameters:

workflow (CompositePlan) – The workflow whose links values should be applied on.

Workflow file core logic.

renku.core.workflow.workflow_file.filter_steps(workflow, steps)[source]

Return a subset of workflow file steps.

renku.core.workflow.workflow_file.get_workflow_file_inputs_and_outputs(workflow_file, steps)[source]

Return a list of all inputs and outputs that must be committed.

renku.core.workflow.workflow_file.run_workflow_file(path, steps, dry_run, workflow_file, provider, plan_gateway)[source]

Run a workflow file.

Sessions

Docker based interactive session provider.

class renku.core.session.docker.DockerSessionProvider[source]

Bases: ISessionProvider

A docker based interactive session provider.

build_image(image_descriptor, image_name, config)[source]

Builds the container image.

docker_client()[source]

Get the docker client.

Note

This is not a @property, even though it should be, because pluggy will call it in that case in unrelated parts of the code.

Raises:

errors.DockerError – Exception when docker is not available.

Returns:

The docker client.

find_image(image_name, config)[source]

Find the given container image.

force_build_image(force_build=False, **kwargs)[source]

Whether we should force build the image directly or check for an existing image first.

get_open_parameters()[source]

Returns parameters that can be set for session open.

get_start_parameters()[source]

Returns parameters that can be set for session start.

is_remote_provider()[source]

Return True for remote providers (i.e. not local Docker).

property name

Return session provider’s name.

session_list(project_name)[source]

Lists all the sessions currently run by the given session provider.

Returns:

a list of sessions.

Return type:

list

session_open(project_name, session_name, **kwargs)[source]

Open a given interactive session.

Parameters:
  • project_name (str) – Renku project name.

  • session_name (Optional[str]) – The unique id of the interactive session.

session_provider()[source]

Supported session provider.

Returns:

A reference to self.

session_start(image_name, project_name, config, cpu_request=None, mem_request=None, disk_request=None, gpu_request=None, **kwargs)[source]

Creates an interactive session.

Returns:

Provider message and a possible warning message.

Return type:

Tuple[str, str]

session_stop(project_name, session_name, stop_all)[source]

Stops all or a given interactive session.

session_url(session_name)[source]

Get the URL of the interactive session.

Interactive session business logic.

class renku.core.session.session.SessionList(sessions, all_local, warning_messages)[source]

Bases: NamedTuple

Session list return.

Create new instance of SessionList(sessions, all_local, warning_messages)

all_local

Alias for field number 1

sessions

Alias for field number 0

warning_messages

Alias for field number 2
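Because SessionList is a NamedTuple, its fields can be read by name, by index, or by unpacking. The sketch below mirrors the three documented fields in a stand-in class; the field types and the sample values are assumptions, since this document does not state them.

```python
from typing import List, NamedTuple


class SessionListSketch(NamedTuple):
    """Stand-in with the same field order as the documented SessionList."""

    sessions: List[str]          # field number 0 (element type is assumed)
    all_local: bool              # field number 1
    warning_messages: List[str]  # field number 2


result = SessionListSketch(sessions=["session-1"], all_local=True, warning_messages=[])

# Tuple unpacking follows the documented field order.
sessions, all_local, warnings = result
print(sessions, all_local, warnings)
```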

renku.core.session.session.search_session_providers(name)[source]

Get all session providers whose name starts with the given name.

Parameters:

name (str) – The name to search for.

Returns:

All session providers whose name starts with name.

renku.core.session.session.search_sessions(name, provider=None)[source]

Get all sessions whose name starts with the given name.

Parameters:
  • name (str) – The name to search for.

  • provider (Optional[str]) – Name of the session provider to use (Default value = None).

Returns:

All sessions whose name starts with name.

renku.core.session.session.session_list(*, provider=None)[source]

List interactive sessions.

Parameters:

provider (Optional[str]) – Name of the session provider to use (Default value = None).

Returns:

The list of sessions, whether they’re all local sessions and potential warnings raised.

renku.core.session.session.session_open(session_name, provider=None, **kwargs)[source]

Open interactive session in the browser.

Parameters:
  • session_name (Optional[str]) – Name of the session to open.

  • provider (Optional[str]) – Name of the session provider to use.

renku.core.session.session.session_start(config_path, provider, image_name=None, cpu_request=None, mem_request=None, disk_request=None, gpu_request=None, **kwargs)[source]

Start interactive session.

Parameters:
  • config_path (str, optional) – Path to config YAML.

  • provider (str, optional) – Name of the session provider to use.

  • image_name (str, optional) – Image to start.

  • cpu_request (float, optional) – Number of CPUs to request.

  • mem_request (str, optional) – Size of memory to request.

  • disk_request (str, optional) – Size of disk to request (if supported by provider).

  • gpu_request (str, optional) – Number of GPUs to request.

renku.core.session.session.session_stop(session_name, stop_all=False, provider=None)[source]

Stop interactive session.

Parameters:
  • session_name (Optional[str]) – Name of the session to stop.

  • stop_all (bool) – Whether to stop all sessions or just the specified one.

  • provider (Optional[str]) – Name of the session provider to use.

renku.core.session.session.ssh_setup(existing_key=None, force=False)[source]

Setup SSH keys for SSH connections to sessions.

Parameters:
  • existing_key (Path, optional) – Existing private key file to use instead of generating new ones.

  • force (bool) – Whether to prompt before overwriting keys or not.

Templates

Template management.

class renku.core.template.template.EmbeddedTemplates(path, source, reference, version, skip_validation=False)[source]

Bases: TemplatesSource

Represent templates that are bundled with Renku.

For embedded templates, source is “renku”. In the old versioning scheme, version is set to the installed Renku version and reference is not set. In the new scheme, both version and reference are set to the template version.

classmethod fetch(source, reference)[source]

Fetch embedded Renku templates.

get_all_references(id)[source]

Return all available references for a template id.

get_latest_reference_and_version(id, reference, version)[source]

Return latest reference and version number of a template.

get_template(id, reference)[source]

Return a template at a specific reference.

class renku.core.template.template.FileAction(value)[source]

Bases: IntEnum

Types of operation when copying a template to a project.

class renku.core.template.template.RepositoryTemplates(path, source, reference, version, repository, skip_validation=False)[source]

Bases: TemplatesSource

Represent a local/remote template repository.

A template repository is checked out at a specific Git reference if one is provided. However, it’s still possible to get available versions of templates.

For these templates, reference is set to whatever user passed as a reference (defaults to remote HEAD if not passed) and version is set to the commit SHA of the reference commit.

classmethod fetch(source, reference)[source]

Fetch a template repository.

get_all_references(id)[source]

Return a list of git tags that are valid SemVer and include a template id.

get_latest_reference_and_version(id, reference, version)[source]

Return latest reference and version number of a template.

get_template(id, reference)[source]

Return a template at a specific reference.
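RepositoryTemplates.get_all_references is described as returning git tags that are valid SemVer. A simplified sketch of that filtering step is shown below. It works on a plain list of tag names rather than a repository, accepts only the MAJOR.MINOR.PATCH core (pre-release and build metadata are ignored for brevity), and does not check whether a tag contains a given template id. The function name semver_tags is hypothetical.

```python
import re
from typing import Iterable, List

# Core SemVer pattern only: MAJOR.MINOR.PATCH without leading zeros.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def semver_tags(tags: Iterable[str]) -> List[str]:
    """Return the tags that look like SemVer, ordered from oldest to newest."""
    versions = []
    for tag in tags:
        match = SEMVER_CORE.match(tag)
        if match:
            # Compare numerically so that 1.10.0 sorts after 1.2.0.
            versions.append((tuple(int(part) for part in match.groups()), tag))
    return [tag for _, tag in sorted(versions)]


print(semver_tags(["1.10.0", "1.2.0", "latest", "0.3.1"]))
```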

class renku.core.template.template.TemplateAction(value)[source]

Bases: Enum

Types of template rendering.

renku.core.template.template.copy_template_to_project(rendered_template, project, actions, cleanup=True)[source]

Update project files and metadata from a template.

renku.core.template.template.fetch_templates_source(source, reference)[source]

Fetch a template.

renku.core.template.template.get_file_actions(rendered_template, template_action, interactive)[source]

Render a template regarding files in a project.

renku.core.template.template.get_sorted_actions(actions)[source]

Return a sorted actions list.

renku.core.template.template.has_template_checksum()[source]

Return if project has a templates checksum file.

renku.core.template.template.is_renku_template(source)[source]

Return if template comes from Renku.

renku.core.template.template.read_template_checksum()[source]

Read templates checksum file for a project.

renku.core.template.template.set_template_parameters(template, template_metadata, input_parameters, interactive=False)[source]

Set and verify template parameters’ values in the template_metadata.

renku.core.template.template.write_template_checksum(checksums)[source]

Write templates checksum file for a project.

Template use cases.

renku.core.template.usecase.check_for_template_update(project)[source]

Check if the project can be updated to a newer version of the project template.

renku.core.template.usecase.does_dockerfile_contain_only_version_change()[source]

Return True if Dockerfile only contains Renku version changes.

renku.core.template.usecase.is_dockerfile_updated_by_user()[source]

Return if user modified the Dockerfile.

renku.core.template.usecase.list_templates(source, reference)[source]

Return available templates from a source.

renku.core.template.usecase.select_template(templates_source, id=None)[source]

Select a template from a template source.

renku.core.template.usecase.set_template(source, reference, id, force, interactive, input_parameters, dry_run)[source]

Set template for a project.

renku.core.template.usecase.show_template(source, reference, id)[source]

Show template details.

renku.core.template.usecase.update_dockerfile_checksum(new_checksum)[source]

Update Dockerfile template checksum if possible.

renku.core.template.usecase.update_template(force, interactive, dry_run)[source]

Update project’s template if possible. Return corresponding viewmodel if updated.

renku.core.template.usecase.validate_templates(source=None, reference=None)[source]

Validate a template repository.

Parameters:
  • source (str, optional) – Remote repository URL to clone and check (Default value = None).

  • reference (str, optional) – Git commit/branch/tag to check (Default value = None).

Returns:

Dictionary containing errors and warnings for manifest and templates, along with a valid field telling if all checks passed.

Return type:

Dict[str, Union[str, Dict[str, List[str]]]]

Errors

Errors that can be raised by renku.core.

Renku exceptions.

exception renku.core.errors.ActivityDownstreamNotEmptyError(activity)[source]

Bases: RenkuException

Raised when an activity cannot be deleted because its downstream is not empty.

exception renku.core.errors.AuthenticationError[source]

Bases: RenkuException

Raise when there is a problem with authentication.

exception renku.core.errors.ChildWorkflowNotFoundError(child, workflow)[source]

Bases: WorkflowError

Raised when a child could not be found on a composite workflow.

Embed exception and build a custom message.

exception renku.core.errors.CommandFinalizedError[source]

Bases: RenkuException

Raised when trying to modify a finalized command builder.

exception renku.core.errors.CommandNotFinalizedError[source]

Bases: RenkuException

Raised when a non-finalized command is executed.
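The two errors above guard the life cycle of a command builder: it can be modified only before build() and executed only after it. The sketch below reproduces that rule with stand-in classes. It is not the renku Command class; all names ending in Sketch are hypothetical.

```python
class CommandFinalizedSketch(Exception):
    """Stand-in for an error raised when a finalized builder is modified."""


class CommandNotFinalizedSketch(Exception):
    """Stand-in for an error raised when a non-finalized builder is executed."""


class CommandBuilderSketch:
    def __init__(self) -> None:
        self._operation = None
        self._finalized = False

    def command(self, operation):
        """Set the wrapped operation; only allowed before build()."""
        if self._finalized:
            raise CommandFinalizedSketch("cannot modify a finalized command")
        self._operation = operation
        return self

    def build(self):
        """Finalize the builder so that it can be executed."""
        self._finalized = True
        return self

    def execute(self, *args, **kwargs):
        """Run the wrapped operation; only allowed after build()."""
        if not self._finalized:
            raise CommandNotFinalizedSketch("call build() before execute()")
        return self._operation(*args, **kwargs)


# Returning self from command() and build() allows chained calls.
total = CommandBuilderSketch().command(lambda a, b: a + b).build().execute(2, 3)
print(total)
```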

exception renku.core.errors.CommitMessageEmpty[source]

Bases: RenkuException

Raise invalid commit message.

Build a custom message.

exception renku.core.errors.CommitProcessingError[source]

Bases: RenkuException

Raised when a commit couldn’t be processed during graph build.

exception renku.core.errors.ConfigurationError[source]

Bases: RenkuException

Raise in case of misconfiguration; use GitConfigurationError for git-related configuration errors.

exception renku.core.errors.DatasetException[source]

Bases: RenkuException

Base class for all dataset-related exceptions.

exception renku.core.errors.DatasetExistsError(name)[source]

Bases: DatasetException

Raise when trying to create an existing dataset.

exception renku.core.errors.DatasetImageError[source]

Bases: DatasetException

Raised when a local dataset image is not accessible.

exception renku.core.errors.DatasetImportError[source]

Bases: DatasetException

Raised when a dataset cannot be imported/pulled from a remote source.

exception renku.core.errors.DatasetNotFound(*, name=None, message=None)[source]

Bases: DatasetException

Raise when dataset is not found.

Build a custom message.

exception renku.core.errors.DatasetProviderNotFound(*, name=None, uri=None, message=None)[source]

Bases: DatasetException, ParameterError

Raised when a dataset provider cannot be found based on a URI or a provider name.
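Since DatasetProviderNotFound lists two bases, a handler written for either base class will catch it. The sketch below demonstrates this with stand-in classes. The names ending in Sketch are hypothetical, and the assumption that the parameter error shares the common base exception is made only to keep the example self-contained.

```python
class RenkuExceptionSketch(Exception):
    """Stand-in for the common base exception."""


class ParameterErrorSketch(RenkuExceptionSketch):
    """Stand-in for a parameter error (base class assumed for this sketch)."""


class DatasetExceptionSketch(RenkuExceptionSketch):
    """Stand-in for the dataset exception base class."""


class DatasetProviderNotFoundSketch(DatasetExceptionSketch, ParameterErrorSketch):
    """Stand-in for an exception that derives from both bases."""


def caught_as(handler_type: type) -> bool:
    """Raise the combined exception and report whether the handler catches it."""
    try:
        raise DatasetProviderNotFoundSketch("no provider for the given URI")
    except handler_type:
        return True


print(caught_as(DatasetExceptionSketch), caught_as(ParameterErrorSketch))
```

This lets dataset-specific code and generic parameter validation code each handle the error without knowing about the other.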

exception renku.core.errors.DatasetTagNotFound(tag)[source]

Bases: DatasetException

Raise when a tag can’t be found.

exception renku.core.errors.DirectoryNotEmptyError(path)[source]

Bases: RenkuException

Raised when a directory passed as output is not empty.

Build a custom message.

exception renku.core.errors.DirtyRenkuDirectory(repository)[source]

Bases: RenkuException

Raise when a directory in the renku repository is dirty.

Build a custom message.

exception renku.core.errors.DirtyRepository(repository)[source]

Bases: RenkuException

Raise when trying to work with dirty repository.

Build a custom message.

exception renku.core.errors.DockerAPIError(reason)[source]

Bases: DockerError

Raised when an error is returned from the Docker API.

Embed exception and build a custom message.

exception renku.core.errors.DockerError(reason)[source]

Bases: RenkuException

Raised when an error occurs while executing a docker command.

Embed exception and build a custom message.

exception renku.core.errors.DockerfileUpdateError[source]

Bases: RenkuException

Raised when the renku version in the Dockerfile couldn’t be updated.

exception renku.core.errors.DuplicateWorkflowNameError[source]

Bases: WorkflowError

Raised when a workflow name already exists.

exception renku.core.errors.ExportError[source]

Bases: DatasetException

Raised when a dataset cannot be exported.

exception renku.core.errors.ExternalFileNotFound(path)[source]

Bases: DatasetException

Raise when an external file is not found.

Build a custom message.

exception renku.core.errors.ExternalStorageDisabled[source]

Bases: RenkuException

Raise when trying to use the disabled repository storage API.

Build a custom message.

exception renku.core.errors.ExternalStorageNotInstalled[source]

Bases: RenkuException

Raise when LFS is required but not found or installed in the repository.

Build a custom message.

exception renku.core.errors.FailedMerge(repository, branch, merge_args)[source]

Bases: RenkuException

Raise when automatic merge failed.

Build a custom message.

exception renku.core.errors.FileNotFound(path, checksum=None, revision=None)[source]

Bases: RenkuException

Raise when a file is not found.

Build a custom message.

exception renku.core.errors.GitCommandError(message='Git command failed.', command=None, stdout=None, stderr=None, status=None)[source]

Bases: GitError

Raised when a Git command fails.

Build a custom message.

exception renku.core.errors.GitCommitNotFoundError[source]

Bases: GitError

Raised when a commit cannot be found in a Repository.

exception renku.core.errors.GitConfigurationError[source]

Bases: GitError

Raised when a git configuration cannot be accessed.

exception renku.core.errors.GitError[source]

Bases: RenkuException

Raised when a Git operation fails.

exception renku.core.errors.GitLFSError[source]

Bases: RenkuException

Raised when a Git LFS operation fails.

exception renku.core.errors.GitMissingEmail(message=None)[source]

Bases: GitConfigurationError

Raise when the email is not configured.

Build a custom message.

exception renku.core.errors.GitMissingUsername(message=None)[source]

Bases: GitConfigurationError

Raise when the username is not configured.

Build a custom message.

exception renku.core.errors.GitReferenceNotFoundError[source]

Bases: GitError

Raised when a branch or a reference cannot be found.

exception renku.core.errors.GitRemoteNotFoundError[source]

Bases: GitError

Raised when a remote cannot be found.

exception renku.core.errors.GraphCycleError(cycles, message=None)[source]

Bases: RenkuException

Raised when a graph contains cycles.

Embed exception and build a custom message.

exception renku.core.errors.IncompatibleParametersError(first_param=None, second_param=None)[source]

Bases: ParameterError

Raise in case of incompatible parameters/flags.

Build a custom message.

exception renku.core.errors.InvalidAccessToken[source]

Bases: RenkuException

Raise when access token is incorrect.

Build a custom message.

exception renku.core.errors.InvalidFileOperation[source]

Bases: RenkuException

Raise when trying to perform invalid file operation.

exception renku.core.errors.InvalidGitURL[source]

Bases: GitError

Raise when a Git URL is not valid.

exception renku.core.errors.InvalidInputPath[source]

Bases: RenkuException

Raise when input path does not exist or is not in the repository.

exception renku.core.errors.InvalidOutputPath[source]

Bases: RenkuException

Raise when trying to work with an invalid output path.

exception renku.core.errors.InvalidSuccessCode(return_code, success_codes=None, message=None)[source]

Bases: RenkuException

Raise when the exit code is not 0 or one of the redefined success codes.

Build a custom message.
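The condition behind InvalidSuccessCode can be sketched as a small check: a return code is accepted if it is 0 or, when success codes are redefined, if it is among them. The exception class and the check_return_code helper below are hypothetical stand-ins, not renku code.

```python
from typing import Iterable, Optional


class InvalidSuccessCodeSketch(Exception):
    """Stand-in error carrying the offending code and the accepted ones."""

    def __init__(self, return_code: int, success_codes: Iterable[int]) -> None:
        self.return_code = return_code
        self.success_codes = sorted(success_codes)
        super().__init__(
            f"exit code {return_code} is not one of {self.success_codes}"
        )


def check_return_code(return_code: int, success_codes: Optional[Iterable[int]] = None) -> None:
    """Raise if ``return_code`` is not an accepted success code."""
    # Without redefined success codes, only 0 counts as success.
    allowed = set(success_codes) if success_codes else {0}
    if return_code not in allowed:
        raise InvalidSuccessCodeSketch(return_code, allowed)


check_return_code(0)                         # default: 0 is accepted
check_return_code(1, success_codes=[0, 1])   # redefined: 1 is accepted too
```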

exception renku.core.errors.InvalidTemplateError[source]

Bases: TemplateError

Raised when using a non-valid template.

exception renku.core.errors.KeyNotFoundError[source]

Bases: RenkuException

Raise when an SSH private or public key couldn’t be found.

exception renku.core.errors.LockError[source]

Bases: RenkuException

Raise when a project cannot be locked.

exception renku.core.errors.MappingExistsError(existing_mappings)[source]

Bases: WorkflowError

Raised when a parameter mapping exists already.

Embed exception and build a custom message.

exception renku.core.errors.MappingNotFoundError(mapping, workflow)[source]

Bases: WorkflowError

Raised when a parameter mapping does not exist.

Embed exception and build a custom message.

exception renku.core.errors.MetadataCorruptError(path)[source]

Bases: RenkuException

Raised when metadata is corrupt and couldn’t be loaded.

exception renku.core.errors.MetadataMergeError[source]

Bases: RenkuException

Raised when merging of metadata failed.

exception renku.core.errors.MigrationError[source]

Bases: RenkuException

Raised when something went wrong during migrations.

exception renku.core.errors.MigrationRequired[source]

Bases: RenkuException

Raise when migration is required.

Build a custom message.

exception renku.core.errors.MinimumVersionError(current_version, minimum_version)[source]

Bases: RenkuException

Raised when accessing a project whose minimum version is larger than the current renku version.

exception renku.core.errors.NodeNotFoundError[source]

Bases: RenkuException

Raised when Node.js is not installed on the system.

Build a custom message.

exception renku.core.errors.NotFound[source]

Bases: RenkuException

Raise when an object is not found in KG.

exception renku.core.errors.NotebookSessionImageNotExistError[source]

Bases: RenkuException

Raised when a user attempts to start a session with an image that does not exist.

exception renku.core.errors.NotebookSessionNotReadyError[source]

Bases: RenkuException

Raised when a user attempts to open a session that is not ready.

exception renku.core.errors.NothingToCommit[source]

Bases: RenkuException

Raise when there is nothing to commit.

Build a custom message.

exception renku.core.errors.NothingToExecuteError[source]

Bases: RenkuException

Raised when a rerun/update command does not execute any workflows.

exception renku.core.errors.ObjectNotFoundError(filename)[source]

Bases: RenkuException

Raised when an object is not found in the storage.

Embed exception and build a custom message.

exception renku.core.errors.OperationError[source]

Bases: RenkuException

Raised when an operation at runtime raises an error.

exception renku.core.errors.OutputsNotFound[source]

Bases: RenkuException

Raise when there are not any detected outputs in the repository.

Build a custom message.

exception renku.core.errors.ParameterError(message, param_hint=None, show_prefix=True)[source]

Bases: RenkuException

Raise in case of invalid parameter.

Build a custom message.

exception renku.core.errors.ParameterLinkError(reason)[source]

Bases: RenkuException

Raised when a parameter link cannot be created.

Embed exception and build a custom message.

exception renku.core.errors.ParameterNotFoundError(parameter, workflow)[source]

Bases: WorkflowError

Raised when a parameter reference cannot be resolved to a parameter.

Embed exception and build a custom message.

exception renku.core.errors.ParseError[source]

Bases: RenkuException

Raise when a workflow file command has invalid format.

exception renku.core.errors.ProjectContextError[source]

Bases: RenkuException

Raise when no project context is pushed or there is a project context-related error.

exception renku.core.errors.ProjectNotFound[source]

Bases: RenkuException

Raise when one or more projects couldn’t be found in the KG.

exception renku.core.errors.ProjectNotSupported[source]

Bases: RenkuException

Raise when project version is newer than the supported version.

Build a custom message.

exception renku.core.errors.ProtectedFiles(ignored)[source]

Bases: RenkuException

Raise when trying to work with protected files.

Build a custom message.

exception renku.core.errors.RCloneException[source]

Bases: DatasetException

Base class for all rclone-related exceptions.

exception renku.core.errors.RenkuException[source]

Bases: Exception

A base class for all Renku-related exceptions.

You can catch all errors raised by Renku SDK by using except RenkuException:.
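Because every exception listed here derives from RenkuException, a single handler can act as a catch-all while more specific handlers still run first. The sketch below uses local stand-in classes that mirror only the base-class relationships shown in this reference, so it runs without renku installed; the find_workflow helper and its message text are hypothetical.

```python
class RenkuException(Exception):
    """Stand-in for renku.core.errors.RenkuException."""


class WorkflowError(RenkuException):
    """Stand-in for renku.core.errors.WorkflowError."""


class WorkflowNotFoundError(WorkflowError):
    """Stand-in for renku.core.errors.WorkflowNotFoundError."""


def find_workflow(name_or_id):
    """Hypothetical helper that always fails, for illustration only."""
    raise WorkflowNotFoundError(f"Workflow '{name_or_id}' not found")


def describe_failure(name_or_id):
    """Return a string naming the handler that caught the error."""
    try:
        find_workflow(name_or_id)
    except WorkflowError as error:  # most specific handler first
        return f"workflow problem: {error}"
    except RenkuException as error:  # catch-all for any Renku error
        return f"renku problem: {error}"
    return "ok"
```

With the real package, the same structure would import the classes from renku.core.errors instead of defining them locally.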

exception renku.core.errors.RenkuSaveError[source]

Bases: RenkuException

Raised when renku save doesn’t work.

exception renku.core.errors.RenkulabSessionError[source]

Bases: SessionStartError

Raised when an error occurs trying to start sessions with the notebook service.

exception renku.core.errors.RenkulabSessionGetUrlError[source]

Bases: RenkuException

Raised when the Renku deployment’s URL cannot be determined from the project’s remotes or configured remotes.

exception renku.core.errors.RequestError[source]

Bases: RenkuException

Raise when a requests call fails.

exception renku.core.errors.SHACLValidationError[source]

Bases: RenkuException

Raised when SHACL validation of the graph fails.

exception renku.core.errors.SSHNotFoundError[source]

Bases: RenkuException

Raised when SSH client is not installed on the system.

Build a custom message.

exception renku.core.errors.SSHNotSetupError[source]

Bases: RenkuException

Raised when SSH is not set up correctly on the system.

Build a custom message.

exception renku.core.errors.SessionStartError[source]

Bases: RenkuException

Raised when an error occurs trying to start sessions.

exception renku.core.errors.StorageObjectNotFound(error=None)[source]

Bases: RCloneException

Raised when a file or directory cannot be found in the remote storage.

exception renku.core.errors.StorageProviderNotFound(uri)[source]

Bases: DatasetException, ParameterError

Raised when a storage provider cannot be found based on a URI.

exception renku.core.errors.TemplateError[source]

Bases: RenkuException

Base class for template-related exceptions.

exception renku.core.errors.TemplateMissingReferenceError[source]

Bases: TemplateError

Raised when a template is missing a reference.

exception renku.core.errors.TemplateNotFoundError[source]

Bases: TemplateError

Raised when a template cannot be found in a template source or at a specific reference.

exception renku.core.errors.TemplateUpdateError[source]

Bases: TemplateError

Raised when a project couldn’t be updated from its template.

exception renku.core.errors.TerminalSizeError[source]

Bases: RenkuException

Raised when terminal is too small for a command.

exception renku.core.errors.UninitializedProject(repo_path)[source]

Bases: RenkuException

Raise when a project does not seem to have been initialized yet.

Build a custom message.

exception renku.core.errors.UnmodifiedOutputs(repository, unmodified)[source]

Bases: RenkuException

Raise when there are unmodified outputs in the repository.

Build a custom message.

exception renku.core.errors.UsageError[source]

Bases: RenkuException

Raise in case of unintended usage of certain function calls.

exception renku.core.errors.WorkflowError[source]

Bases: RenkuException

Base class for workflow-related errors.

exception renku.core.errors.WorkflowExecuteError(fail_reason=None, show_prefix=True)[source]

Bases: WorkflowError

Raised when a workflow execution fails.

Build a custom message.

exception renku.core.errors.WorkflowExportError[source]

Bases: WorkflowError

Raised when a workflow cannot be exported.

exception renku.core.errors.WorkflowNotFoundError(name_or_id)[source]

Bases: WorkflowError

Raised when a workflow could not be found.

Embed exception and build a custom message.

exception renku.core.errors.WorkflowRerunError(cwl_file)[source]

Bases: WorkflowError

Raised when a workflow re-execution fails.

Build a custom message.

Utilities

Communicator classes for printing output.

class renku.core.util.communication.CommunicationCallback[source]

Bases: object

Base communication callback class.

busy(msg)[source]

Indicate a busy status.

For instance, show a spinner in the CLI.

confirm(msg, abort=False, warning=False, default=False)[source]

Get confirmation for an action.

echo(msg, end='\n')[source]

Write a message.

error(msg)[source]

Write an error message.

finalize_progress(name)[source]

End a progress tracker.

has_prompt()[source]

Return True if communicator provides a direct prompt to users.

info(msg)[source]

Write an info message.

prompt(msg, type=None, default=None, **kwargs)[source]

Show a message prompt.

start_progress(name, total, **kwargs)[source]

Create a new progress tracker.

update_progress(name, amount)[source]

Update a progress tracker.

warn(msg)[source]

Write a warning message.

renku.core.util.communication.confirm(msg, abort=False, warning=False, default=False)[source]

Get confirmation for an action from all listeners.

renku.core.util.communication.disable()[source]

Disable all outputs; by default everything is enabled.

renku.core.util.communication.enable()[source]

Enable all outputs.

renku.core.util.communication.error(msg)[source]

Write an error message to all listeners.

renku.core.util.communication.finalize_progress(name)[source]

End a progress tracker on all listeners.

renku.core.util.communication.has_prompt()[source]

Return True if communicator provides a direct prompt to users.

renku.core.util.communication.info(msg)[source]

Write an info message to all listeners.

renku.core.util.communication.prompt(msg, type=None, default=None, **kwargs)[source]

Show a message prompt.

renku.core.util.communication.start_progress(name, total, **kwargs)[source]

Start a progress tracker on all listeners.

renku.core.util.communication.subscribe(listener)[source]

Subscribe a communication listener.

renku.core.util.communication.unsubscribe(listener)[source]

Unsubscribe a communication listener.

renku.core.util.communication.update_progress(name, amount)[source]

Update a progress tracker on all listeners.

renku.core.util.communication.warn(msg)[source]

Write a warning message to all listeners.
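The module-level functions above forward each message to every subscribed listener. A minimal sketch of that fan-out pattern follows; it is not renku's implementation. The Recorder class and the _listeners registry are hypothetical names, and only info and warn are shown.

```python
class CommunicationCallback:
    """Stand-in base class with no-op methods, mirroring the names above."""

    def info(self, msg):
        pass

    def warn(self, msg):
        pass


class Recorder(CommunicationCallback):
    """Hypothetical listener that stores messages instead of printing them."""

    def __init__(self):
        self.messages = []

    def info(self, msg):
        self.messages.append(("info", msg))

    def warn(self, msg):
        self.messages.append(("warn", msg))


_listeners = []  # hypothetical registry of subscribed listeners


def subscribe(listener):
    """Register a listener once."""
    if listener not in _listeners:
        _listeners.append(listener)


def unsubscribe(listener):
    """Remove a listener if it is registered."""
    if listener in _listeners:
        _listeners.remove(listener)


def info(msg):
    """Forward an info message to every listener."""
    for listener in _listeners:
        listener.info(msg)


def warn(msg):
    """Forward a warning message to every listener."""
    for listener in _listeners:
        listener.warn(msg)
```

A listener subclass only needs to override the methods it cares about; the rest fall back to the no-op base implementations.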

Implement various context managers.

class renku.core.util.contexts.Isolation(**kwargs)[source]

Bases: ExitStack

Isolate execution.

Create a context manager.

renku.core.util.contexts.Lock(filename, timeout=0, mode='shared', blocking=False)[source]

A file-based lock context manager.

renku.core.util.contexts.chdir(path)[source]

Change the current working directory.
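A directory-changing context manager is typically written as below. This is a sketch, assuming the helper restores the previous working directory on exit; the real implementation may differ.

```python
import os
from contextlib import contextmanager


@contextmanager
def chdir(path):
    """Switch to ``path`` for the duration of the block, then switch back."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the block raised.
        os.chdir(previous)
```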

renku.core.util.contexts.measure(message='TOTAL')[source]

Measure execution time of enclosing code block.

class renku.core.util.contexts.redirect_stdin(new_target)[source]

Bases: ContextDecorator

Implement missing redirect stdin based on contextlib.py.

Keep the original stream.

renku.core.util.contexts.renku_project_context(path, check_git_path=True)[source]

Provide a project context with repo path injected.

renku.core.util.contexts.wait_for(delay)[source]

Make sure that at least delay seconds have passed during the execution of the wrapped code block.
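The behaviour described for wait_for can be sketched with a monotonic clock: measure how long the block took and sleep for the remainder. This is an illustration of the idea, not renku's code.

```python
import time
from contextlib import contextmanager


@contextmanager
def wait_for(delay):
    """Sleep after the block so that at least ``delay`` seconds elapse in total."""
    start = time.monotonic()
    try:
        yield
    finally:
        remaining = delay - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)
```

A block that already takes longer than delay is not slowed down further.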

renku.core.util.contexts.with_project_metadata(project_gateway, database_gateway, read_only=False, name=None, namespace=None, description=None, keywords=None, custom_metadata=None)[source]

Yield an editable metadata object.

Parameters:
  • project_gateway (IProjectGateway) – Injected project gateway.

  • database_gateway (IDatabaseGateway) – Injected database gateway.

  • read_only (bool) – Whether to save changes or not (Default value = False).

  • name (Optional[str]) – Name of the project (when creating a new one) (Default value = None).

  • namespace (Optional[str]) – Namespace of the project (when creating a new one) (Default value = None).

  • description (Optional[str]) – Project description (when creating a new one) (Default value = None).

  • keywords (Optional[List[str]]) – Keywords for the project (when creating a new one) (Default value = None).

  • custom_metadata (Optional[Dict]) – Custom JSON-LD metadata (when creating a new project) (Default value = None).

Renku datetime utilities.

renku.core.util.datetime8601.fix_datetime(value)[source]

Fix timezone of non-aware datetime objects and remove microseconds.
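As a rough illustration of what fixing a non-aware datetime can look like, the sketch below attaches a timezone to naive values and drops microseconds. Interpreting naive values as local time is an assumption made here; the reference does not say which timezone the real helper applies.

```python
from datetime import datetime, timezone


def fix_datetime(value):
    """Return a timezone-aware datetime without microseconds."""
    if value is None:
        return None
    if value.tzinfo is None:
        # Assumption: naive values are interpreted as local time.
        value = value.astimezone()
    return value.replace(microsecond=0)
```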

renku.core.util.datetime8601.local_now(remove_microseconds=False)[source]

Return current datetime in local timezone.

renku.core.util.datetime8601.match_iso8601(string, pos=0, endpos=9223372036854775807)

Matches zero or more characters at the beginning of the string.

renku.core.util.datetime8601.parse_date(value)[source]

Convert date to datetime.

renku.core.util.datetime8601.validate_iso8601(str_val)[source]

Check if datetime string is in ISO8601 format.

Helper utilities for handling DOIs.

renku.core.util.doi.extract_doi(uri)[source]

Return the DOI in a string if there is one.

renku.core.util.doi.get_doi_url(identifier)[source]

Return DOI URL for a given id.

renku.core.util.doi.is_doi(uri)[source]

Check if URI is DOI.
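The three DOI helpers can be approximated with a regular expression. The pattern below is an assumption (a "10." prefix, a numeric registrant code, a slash and a suffix) and is likely looser or stricter than the one renku uses; returning None when no DOI is found is also an assumption.

```python
import re

# Assumed DOI shape: "10.", 4 to 9 digits, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(uri):
    """Return the DOI contained in ``uri``, or None if there is none."""
    match = DOI_PATTERN.search(uri)
    return match.group(0) if match else None


def is_doi(uri):
    """Return True if ``uri`` contains something shaped like a DOI."""
    return extract_doi(uri) is not None


def get_doi_url(identifier):
    """Build the resolver URL for a DOI."""
    return f"https://doi.org/{identifier}"
```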

Git utility functions.

renku.core.util.git.check_global_git_user_is_configured()[source]

Check that git user information is configured.

renku.core.util.git.clone_renku_repository(url, path, gitlab_token=None, deployment_hostname=None, depth=None, install_githooks=False, install_lfs=True, skip_smudge=True, recursive=True, progress=None, config=None, raise_git_except=False, checkout_revision=None, use_renku_credentials=False, reuse_existing_repository=False)[source]

Clone a Renku Repository.

Parameters:
  • url (str) – The Git URL to clone.

  • path (Union[Path, str]) – The path to clone into.

  • gitlab_token – The gitlab OAuth2 token (Default value = None).

  • deployment_hostname – The hostname of the current renku deployment (Default value = None).

  • depth (Optional[int], optional) – The clone depth, number of commits from HEAD (Default value = None).

  • install_githooks – Whether to install git hooks (Default value = False).

  • install_lfs – Whether to install Git LFS (Default value = True).

  • skip_smudge – Whether to skip pulling files from Git LFS (Default value = True).

  • recursive – Whether to clone recursively (Default value = True).

  • progress – The GitProgress object (Default value = None).

  • config (Optional[dict], optional) – Set configuration for the project (Default value = None).

  • raise_git_except – Whether to raise git exceptions (Default value = False).

  • checkout_revision – The revision to check out after clone (Default value = None).

  • use_renku_credentials (bool, optional) – Whether to use Renku provided credentials (Default value = False).

  • reuse_existing_repository (bool, optional) – Whether to clone over an existing repository (Default value = False).

Returns:

The cloned repository.

renku.core.util.git.clone_repository(url, path=None, install_githooks=True, install_lfs=True, skip_smudge=True, recursive=True, depth=None, progress=None, config=None, raise_git_except=False, checkout_revision=None, no_checkout=False, clean=False, clone_options=None)[source]

Clone a Git repository and install Git hooks and LFS.

Parameters:
  • url – The Git URL to clone.

  • path (Union[Path, str], optional) – The path to clone into (Default value = None).

  • install_githooks – Whether to install git hooks (Default value = True).

  • install_lfs – Whether to install Git LFS (Default value = True).

  • skip_smudge – Whether to skip pulling files from Git LFS (Default value = True).

  • recursive – Whether to clone recursively (Default value = True).

  • depth – The clone depth, number of commits from HEAD (Default value = None).

  • progress – The GitProgress object (Default value = None).

  • config (Optional[dict], optional) – Set configuration for the project (Default value = None).

  • raise_git_except – Whether to raise git exceptions (Default value = False).

  • checkout_revision – The revision to check out after clone (Default value = None).

  • no_checkout (bool, optional) – Whether to skip the checkout after cloning (Default value = False).

  • clean (bool, optional) – Whether to require the target folder to be clean (Default value = False).

  • clone_options (List[str], optional) – Additional clone options (Default value = None).

Returns:

The cloned repository.

renku.core.util.git.commit_changes(*paths, repository, message=None)[source]

Commit paths to the repository.

Parameters:
  • *paths (Union[Path, str]) – The paths to commit.

  • repository (Repository) – The repository to commit to.

  • message – The commit message (Default value = None).

Raises:

errors.GitError – If paths couldn’t be committed.

Returns:

List of paths that were committed.

renku.core.util.git.create_backup_remote(repository, remote_name, url)[source]

Create a backup for remote_name and set its URL to url.

Parameters:
  • repository (Repository) – The current repository.

  • remote_name (str) – The name of the backup remote.

  • url (str) – The remote URL.

Returns:

Tuple of backup remote name, whether it existed already and the created remote if successful.

Return type:

Tuple[str, bool, Optional[Remote]]

renku.core.util.git.finalize_commit(*, diff_before, repository, transaction_id, commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True, skip_staging=False)[source]

Commit modified/added paths.

renku.core.util.git.get_cache_directory_for_repository(url)[source]

Return a path to project’s cache directory.

Parameters:

url – The repository URL.

Returns:

The path of the cache.

Return type:

Path

renku.core.util.git.get_committer_agent(commit)[source]

Return committer SoftwareAgent.

Parameters:

commit (Commit) – The commit to check.

Returns:

The agent responsible for the commit.

Return type:

SoftwareAgent

renku.core.util.git.get_dirty_paths(repository)[source]

Get paths of dirty files in the repository.

renku.core.util.git.get_entity_from_revision(repository, path, revision=None, bypass_cache=False, checksum=None)[source]

Return an Entity instance from given path and revision.

Parameters:
  • repository (Repository) – The current repository.

  • path (Union[Path, str]) – The path of the entity.

  • revision (str, optional) – The revision to check at (Default value = None).

  • bypass_cache (bool) – Whether to ignore cached entries and get information from disk (Default value = False).

  • checksum (str, optional) – Pre-calculated checksum for performance reasons, will be calculated if not set.

Returns:

The Entity for the given path and revision.

Return type:

Entity

renku.core.util.git.get_file_size(repository_path, path)[source]

Return file size for a file inside a git repository.

renku.core.util.git.get_full_repository_path(url)[source]

Extract hostname/path of a git repository from its URL.

Parameters:

url (Optional[str]) – The URL.

Returns:

The hostname plus path extracted from the URL.

renku.core.util.git.get_git_path(path='.')[source]

Return the repository path.

renku.core.util.git.get_git_progress_instance()[source]

Return a GitProgress object.

renku.core.util.git.get_git_repository(path='.')[source]

Get Git repository from the current path or any of its parents.

Parameters:

path – Path to start from (Default value = “.”).

Raises:

ValueError – If not inside a git repository.

Returns:

Git repository

renku.core.util.git.get_git_user(repository)[source]

Return git user.

Parameters:

repository (Optional[Repository]) – The Git repository.

Returns:

The person associated with the repository.

Return type:

Optional[Person]

renku.core.util.git.get_hook_path(path, name)[source]

Return path to the given named hook in the given repository.

Parameters:
  • path (Path) – The current Git repository’s path.

  • name (str) – The name of the hook.

Returns:

Path to the hook.

Return type:

Path

renku.core.util.git.get_in_submodules(repository, commit, path)[source]

Resolve filename in submodules.

renku.core.util.git.get_oauth_url(url, gitlab_token)[source]

Format URL with a username and password.

Parameters:
  • url – The URL to format.

  • gitlab_token – The Gitlab OAuth2 Token.

Returns:

The URL with credentials added.

renku.core.util.git.get_remote(repository, *, name=None, url=None)[source]

Return repository’s remote using its name or url or return default remote if any.

Parameters:
  • repository (Optional[Repository]) – The Git repository.

  • name (str, optional) – The name of the remote (Default value = None).

  • url (str, optional) – The remote URL (Default value = None).

Returns:

The remote, if found.

Return type:

Optional[Remote]

renku.core.util.git.get_renku_repo_url(remote_url, deployment_hostname=None, access_token=None)[source]

Return a repo url that can be authenticated by renku.

Parameters:
  • remote_url – The repository URL.

  • deployment_hostname – The host name used by this deployment (Default value = None).

  • access_token – The OAuth2 access token (Default value = None).

Returns:

The Renku repository URL with credentials.

renku.core.util.git.get_repository_name(url)[source]

Extract name of a git repository from its URL.

Parameters:

url (str) – The URL to get the repository name from.

Returns:

The repository name.

Return type:

str

renku.core.util.git.have_same_remote(url1, url2)[source]

Checks if two git urls point to the same remote repo ignoring protocol and credentials.

Parameters:
  • url1 – The first URL.

  • url2 – The second URL.

Returns:

True if both URLs point to the same repository.

Return type:

bool
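One way to compare two Git URLs while ignoring protocol and credentials is to reduce each to a (host, path) pair. The sketch below does that with the standard library; treating a trailing ".git" as insignificant and ignoring ports are assumptions of this illustration, and _host_and_path is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

# scp-like syntax such as "git@host:group/repo.git"
SCP_LIKE = re.compile(r"^(?:[^@/]+@)?(?P<host>[^:/]+):(?P<path>[^/].*)$")


def _host_and_path(url):
    """Reduce a Git URL to a (host, path) pair without scheme or credentials."""
    match = SCP_LIKE.match(url)
    if match and "://" not in url:
        host, path = match.group("host"), match.group("path")
    else:
        parts = urlsplit(url)
        host, path = parts.hostname or "", parts.path
    path = path.strip("/")
    if path.endswith(".git"):  # assumption: the suffix does not matter
        path = path[:-4]
    return host.lower(), path


def have_same_remote(url1, url2):
    """Return True if both URLs reduce to the same host and path."""
    return _host_and_path(url1) == _host_and_path(url2)
```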

renku.core.util.git.is_path_safe(path)[source]

Check if the path should be used in output.

Parameters:

path (Union[Path, str]) – The path to check.

Returns:

True if the path is safe else False.

Return type:

bool

renku.core.util.git.is_valid_git_repository(repository)[source]

Return whether this is a Git repository with a valid HEAD.

Parameters:

repository (Optional[Repository]) – The repository to check.

Returns:

Whether this is a valid Git repository.

Return type:

bool

renku.core.util.git.parse_git_url(url)[source]

Return parsed git url.

Parameters:

url (Optional[str]) – The URL to parse.

Raises:

errors.InvalidGitURL – If url is empty.

Returns:

The parsed GitURL.

Return type:

GitURL

renku.core.util.git.prepare_commit(*, repository, commit_only=None, skip_dirty_checks=False, skip_staging=False)[source]

Gather information about repo needed for committing later on.

renku.core.util.git.push_changes(repository, remote=None, reset=True)[source]

Push to a remote branch. If the remote branch is protected a new remote branch will be created and pushed to.

Parameters:
  • repository (Repository) – The current repository.

  • remote (str, optional) – The remote to push to (Default value = None).

  • reset (bool, optional) – Whether to reset active branch to its upstream branch, used if changes get pushed to a temporary branch (Default value = True).

Raises:

errors.GitError – If there’s no remote or the push fails.

Returns:

Name of the branch that was pushed to.

Return type:

str

renku.core.util.git.run_command(command, *paths, separator=None, **kwargs)[source]

Execute command by splitting paths into batches so that the argument list stays within OS limits.

Parameters:
  • command – A list or tuple containing command and its arguments.

  • *paths – Paths to run on.

  • separator – Separator for paths if they need to be passed as string. (Default value = None)

Raises:

errors.GitError – If a Git subcommand failed.

Returns:

Result of last invocation.
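The batching idea behind run_command can be sketched as follows: group paths so that each invocation stays under a length budget, run the command once per batch, and return the last result. The names chunk_paths and run_in_batches, the run callable and the 4096-character budget are all hypothetical; the real function derives its limit from the operating system.

```python
def chunk_paths(paths, max_length):
    """Yield lists of paths whose combined length stays within ``max_length``."""
    batch, size = [], 0
    for path in paths:
        cost = len(path) + 1  # one extra character for the argument separator
        if batch and size + cost > max_length:
            yield batch
            batch, size = [], 0
        batch.append(path)
        size += cost
    if batch:
        yield batch


def run_in_batches(run, command, paths, max_length=4096):
    """Call ``run`` once per batch and return the result of the last call."""
    result = None
    for batch in chunk_paths(paths, max_length):
        result = run([*command, *batch])
    return result
```

A single path longer than the budget still gets its own batch, so no path is ever dropped.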

renku.core.util.git.set_git_credential_helper(repository, hostname)[source]

Set up credential helper for renku git.

renku.core.util.git.shorten_message(message, line_length=100, body_length=65000)[source]

Wraps and shortens a commit message.

Parameters:
  • message (str) – message to adjust.

  • line_length (int, optional) – maximum line length before wrapping. 0 for infinite (Default value = 100).

  • body_length (int, optional) – maximum body length before cut. 0 for infinite (Default value = 65000).

Raises:

ParameterError – If line_length or body_length < 0

Returns:

message wrapped and trimmed.
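A simplified sketch of wrapping and trimming a commit message with textwrap is shown below. It raises ValueError as a stand-in for ParameterError so that it runs without renku, and the order of operations (trim first, then wrap) is an assumption.

```python
import textwrap


def shorten_message(message, line_length=100, body_length=65000):
    """Trim ``message`` to ``body_length`` and wrap lines at ``line_length``."""
    if line_length < 0 or body_length < 0:
        # Stand-in for renku's ParameterError.
        raise ValueError("line_length and body_length must be non-negative")
    if body_length and len(message) > body_length:
        message = message[:body_length]
    if not line_length:  # 0 means no wrapping
        return message
    wrapped = []
    for line in message.split("\n"):
        # textwrap.wrap returns [] for an empty line; keep blank lines intact.
        wrapped.extend(textwrap.wrap(line, width=line_length) or [""])
    return "\n".join(wrapped)
```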

renku.core.util.git.with_commit(*, repository, transaction_id, commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True, skip_dirty_checks=False)[source]

Automatic commit.

JWT utilities.

renku.core.util.jwt.is_token_expired(token)[source]

Return True if the given token is expired.
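Checking expiry only requires decoding the payload segment of the token and comparing its exp claim with the current time. The sketch below uses the standard library and does not verify the signature; the now parameter is a hypothetical addition to make the check testable, and the real helper may rely on a JWT library instead.

```python
import base64
import json
import time


def is_token_expired(token, now=None):
    """Return True if the token's ``exp`` claim is not in the future.

    The signature is NOT verified; only the payload is decoded.
    """
    payload_segment = token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore it.
    padding = "=" * (-len(payload_segment) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_segment + padding))
    current = time.time() if now is None else now
    return payload["exp"] <= current
```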

Helper functions for metadata management/parsing.

renku.core.util.metadata.construct_creator(creator, ignore_email)[source]

Parse input and return an instance of Person.

renku.core.util.metadata.construct_creators(creators, ignore_email=False)[source]

Parse input and return a list of Person.

renku.core.util.metadata.get_canonical_key(key)[source]

Make a consistent configuration key.

renku.core.util.metadata.is_external_file(path, project_path)[source]

Checks if a path is an external file.

renku.core.util.metadata.is_linked_file(path, project_path)[source]

Return True if a dataset file is a linked file.

renku.core.util.metadata.is_protected_path(path)[source]

Checks if a path is a protected path.

renku.core.util.metadata.is_renku_project()[source]

Check if repository is a renku project.

renku.core.util.metadata.make_project_temp_dir(project_path)[source]

Create a temporary directory inside project’s temp path.

renku.core.util.metadata.prompt_for_credentials(provider_credentials)[source]

Prompt for provider credentials if needed and update and store them.

renku.core.util.metadata.read_credentials(section, key)[source]

Read provider’s credentials.

renku.core.util.metadata.read_renku_version_from_dockerfile(path=None)[source]

Read RENKU_VERSION from the content of path if a valid version is available.

renku.core.util.metadata.replace_renku_version_in_dockerfile(dockerfile_content, version)[source]

Replace Renku version in the Dockerfile.

renku.core.util.metadata.store_credentials(section, key, value)[source]

Write provider’s credentials.

OS utility functions.

renku.core.util.os.are_paths_equal(a, b)[source]

Returns if two paths are the same.

Return True if paths are equal or one is the parent of the other.

renku.core.util.os.bytes_to_unit(size_in_bytes, unit)[source]

Return size in the provided unit.

Create a symlink that points from symlink_path to target.

renku.core.util.os.delete_dataset_file(filepath, ignore_errors=True, follow_symlinks=False)[source]

Remove a file/symlink and its pointer file (for external files).

renku.core.util.os.delete_path(path)[source]

Delete a file/directory/symlink.

renku.core.util.os.expand_directories(paths)[source]

Expand directory with all files it contains.

renku.core.util.os.get_absolute_path(path, base=None, resolve_symlinks=False, expand=True)[source]

Return absolute normalized path.

Parameters:
  • path (Union[Path, str]) – Path to get its absolute.

  • base (Union[Path, str]) – Base path to get absolute path from it.

  • resolve_symlinks (bool) – Whether to keep or resolve symlinks.

  • expand (bool) – Whether to expand ~ or not (Default value = True)

Returns:

Absolute path.

Return type:

str

renku.core.util.os.get_expanded_user_path(path)[source]

Expand the path if it starts with ~.

renku.core.util.os.get_file_size(path, follow_symlinks=True)[source]

Return size of a file in bytes.

renku.core.util.os.get_files(path)[source]

Return all files from a starting file/directory.

renku.core.util.os.get_relative_path(path, base, strict=False)[source]

Return a relative path to the base if path is within base without resolving symlinks.

renku.core.util.os.get_relative_path_to_cwd(path)[source]

Get a relative path to current working directory.

renku.core.util.os.get_relative_paths(paths, base)[source]

Return a list of paths relative to a base path.

renku.core.util.os.get_safe_relative_path(path, base)[source]

Return a relative path to the base and check path is within base with all symlinks resolved.

NOTE: This is used to prevent path traversal attack.
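The path traversal check can be illustrated with pathlib: resolve both paths fully, then require the result to lie under the base. This is a sketch of the technique only; raising ValueError on failure is an assumption, as the reference does not state what the real helper does when the path escapes the base.

```python
from pathlib import Path


def get_safe_relative_path(path, base):
    """Return ``path`` relative to ``base`` after resolving all symlinks."""
    base = Path(base).resolve()
    resolved = (base / path).resolve()
    try:
        return resolved.relative_to(base)
    except ValueError:
        # The resolved path escaped the base directory, e.g. via "..".
        raise ValueError(f"Path '{path}' is not within '{base}'") from None
```

Resolving before comparing is what defeats inputs such as "../outside" or symlinks that point outside the base.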

renku.core.util.os.hash_file(path, hash_type='sha256')[source]

Calculate the hash of a file (sha256 by default).

renku.core.util.os.hash_file_descriptor(file, hash_type='sha256')[source]

Hash content of a file descriptor.

renku.core.util.os.hash_string(content, hash_type='sha256')[source]

Hash a string.
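The three hashing helpers share one idea: feed content to a hashlib object selected by hash_type. A sketch under the assumptions that the result is a hex digest, that strings are encoded as UTF-8, and that files are read in chunks (chunk_size is a hypothetical parameter):

```python
import hashlib


def hash_file_descriptor(file, hash_type="sha256", chunk_size=65536):
    """Hash a binary file object chunk by chunk to bound memory use."""
    digest = hashlib.new(hash_type)
    for chunk in iter(lambda: file.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hash_file(path, hash_type="sha256"):
    """Hash the content of the file at ``path``."""
    with open(path, "rb") as file:
        return hash_file_descriptor(file, hash_type)


def hash_string(content, hash_type="sha256"):
    """Hash a string after encoding it as UTF-8."""
    return hashlib.new(hash_type, content.encode("utf-8")).hexdigest()
```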

renku.core.util.os.is_ascii(data)[source]

Check if provided string contains only ascii characters.

renku.core.util.os.is_path_empty(path)[source]

Check if path contains files.

Parameters:

path – The target path.

renku.core.util.os.is_subpath(path, base)[source]

Return True if path is within or same as base.

renku.core.util.os.matches(path, pattern)[source]

Check if a path matches a given pattern.

renku.core.util.os.normalize_to_ascii(input_string, sep='-')[source]

Convert a string to contain only ASCII characters, with non-ASCII substrings replaced with sep.

renku.core.util.os.parse_file_size(size_str)[source]

Parse a human readable file size to bytes.
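Parsing a human-readable size amounts to splitting the number from the unit and multiplying. The sketch below assumes binary multiples (1 KB = 1024 bytes) and a small set of unit names, and raises ValueError on bad input; the accepted units, the multiplier convention and the error type of the real helper are not stated in this reference.

```python
import re

# Assumption: binary multiples; the real helper may use decimal ones.
UNIT_FACTORS = {"": 1, "b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*$")


def parse_file_size(size_str):
    """Convert strings such as "1.5 KB" to a number of bytes."""
    match = SIZE_PATTERN.match(size_str)
    if not match:
        raise ValueError(f"Invalid file size: {size_str!r}")
    number, unit = match.groups()
    factor = UNIT_FACTORS.get(unit.lower())
    if factor is None:
        raise ValueError(f"Unknown unit: {unit!r}")
    return int(float(number) * factor)
```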

renku.core.util.os.safe_read_yaml(path)[source]

Parse a YAML file.

Returns:

In case of success a dictionary of the YAML’s content, otherwise raises a ParameterError exception.

renku.core.util.os.unmount_path(path)[source]

Unmount the given path and ignore all errors.

Utility for working with HTTP session.

This module provides wrapper functions around the requests library. It sets a timeout and converts exception types where needed. Use this module instead of calling requests directly.

renku.core.util.requests.check_response(response)[source]

Check for expected response status code.

renku.core.util.requests.delete(url, headers=None)[source]

Send a DELETE request.

renku.core.util.requests.download_file(base_directory, url, filename, extract, chunk_size=16384)[source]

Download a URL to a given location.

renku.core.util.requests.get(url, headers=None, params=None)[source]

Send a GET request.

renku.core.util.requests.get_filename_from_headers(response)[source]

Extract filename from content-disposition headers if available.

renku.core.util.requests.get_redirect_url(url)[source]

Return redirect URL if any; otherwise, return the original URL.

renku.core.util.requests.head(url, *, allow_redirects=False, headers=None)[source]

Send a HEAD request.

renku.core.util.requests.post(url, *, data=None, files=None, headers=None, json=None, params=None)[source]

Send a POST request.

renku.core.util.requests.put(url, *, data=None, files=None, headers=None, params=None)[source]

Send a PUT request.

JSON-LD SHACL validations.

renku.core.util.shacl.validate_graph(graph, shacl_path=None, format='nquads')[source]

Validate the current graph with a SHACL schema.

Uses default schema if not supplied.

SSH utility functions.

class renku.core.util.ssh.SSHKeyPair(private_key, public_key)[source]

Bases: NamedTuple

A public/private key pair for SSH.

Create new instance of SSHKeyPair(private_key, public_key)

private_key

Alias for field number 0

public_key

Alias for field number 1
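Since SSHKeyPair is a NamedTuple, its fields can be read by name, by index or by unpacking. The stand-in definition below mirrors the documented field order; the str field types and the placeholder key text are assumptions.

```python
from typing import NamedTuple


class SSHKeyPair(NamedTuple):
    """Stand-in for renku.core.util.ssh.SSHKeyPair."""

    private_key: str  # field number 0
    public_key: str  # field number 1


pair = SSHKeyPair(private_key="<private key text>", public_key="<public key text>")

# Tuple unpacking follows the field order.
private_key, public_key = pair
```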

class renku.core.util.ssh.SystemSSHConfig[source]

Bases: object

Class to manage system SSH config.

Initialize class and calculate paths.

property is_configured

Check if the system is already configured correctly.

is_session_configured(session_name)[source]

Check if a session is configured for SSH.

Parameters:

session_name (str) – The name of the session.

property public_key_string

Get the public key string, ready for authorized_keys.

session_config_path(project_name, session_name)[source]

Get path to a session config.

Parameters:
  • project_name (str) – The name of the project, without the owner name.

  • session_name (str) – The name of the session to setup a connection to.

Returns:

The path to the SSH connection file.

setup_session_config(project_name, session_name)[source]

Setup local SSH config for connecting to a session.

Parameters:
  • project_name (str) – The name of the project, without the owner name.

  • session_name (str) – The name of the session to setup a connection to.

Returns:

The name of the created SSH host config.

setup_session_keys()[source]

Add a user’s key to a project.

renku.core.util.ssh.generate_ssh_keys()[source]

Generate an SSH key pair.

Returns:

Private/public key pair.

Print a collection as a table.

renku.core.util.tabulate.format_cell(cell, datetime_fmt=None)[source]

Format a cell.

renku.core.util.tabulate.tabulate(collection, headers, datetime_fmt='%Y-%m-%d %H:%M:%S', **kwargs)[source]

Pretty-print a collection.
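The datetime_fmt default shown in the signature is a strftime pattern. A minimal sketch of how a cell formatter might apply it, using a hypothetical helper; how renku's own format_cell treats non-datetime values is an assumption here:

```python
from datetime import datetime


def format_cell_sketch(cell, datetime_fmt=None):
    """Hypothetical helper: format datetimes with datetime_fmt, else use str()."""
    if isinstance(cell, datetime) and datetime_fmt:
        return cell.strftime(datetime_fmt)
    # Assumption: every other value is rendered with its plain string form.
    return str(cell)


created = datetime(2024, 1, 2, 3, 4, 5)
formatted = format_cell_sketch(created, datetime_fmt="%Y-%m-%d %H:%M:%S")
plain = format_cell_sketch(42)
```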

Helper utilities for handling URLs.

renku.core.util.urls.check_url(url)[source]

Check if a url is local/remote and if it contains a git repository.

renku.core.util.urls.get_host(use_project_context=True)[source]

Return the hostname for the resource URIs.

Default is localhost. If RENKU_DOMAIN is set, it overrides the host from remote.

renku.core.util.urls.get_path(url)[source]

Return path part of a url.

renku.core.util.urls.get_scheme(uri)[source]

Return scheme of a URI.
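For orientation, the scheme and path parts that these helpers refer to are the ones the standard library's urllib.parse exposes. The sketch below uses only the standard library and an example URL chosen for illustration; it does not show how the renku helpers are implemented:

```python
from urllib.parse import urlparse

# Example URL chosen for illustration only.
parsed = urlparse("https://example.com/group/project.git?ref=main")

scheme = parsed.scheme  # the part before "://"
path = parsed.path      # the part after the host, without the query string
```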

renku.core.util.urls.get_slug(name, invalid_chars=None, lowercase=True)[source]

Create a slug from name.

renku.core.util.urls.is_uri_subfolder(uri, subfolder_uri)[source]

Check if one uri is a ‘subfolder’ of another.
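One way to read the ‘subfolder’ relation is that both URIs share scheme and host, and one path is nested under the other. A minimal sketch under that assumption, with a hypothetical helper name; whether renku treats identical URIs as subfolders is not stated above and is assumed here:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def is_subfolder_sketch(uri, subfolder_uri):
    """Hypothetical helper: True if subfolder_uri lies at or below uri."""
    parent, child = urlsplit(uri), urlsplit(subfolder_uri)
    # Different scheme or host means the URIs cannot be nested.
    if (parent.scheme, parent.netloc) != (child.scheme, child.netloc):
        return False
    parent_path = PurePosixPath(parent.path or "/")
    child_path = PurePosixPath(child.path or "/")
    # Compare whole path segments so "/data" does not match "/database".
    return parent_path == child_path or parent_path in child_path.parents


nested = is_subfolder_sketch("https://example.com/data", "https://example.com/data/raw/file.csv")
sibling = is_subfolder_sketch("https://example.com/data", "https://example.com/database")
```

Comparing path segments rather than raw string prefixes avoids false positives for names that merely share a prefix.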

renku.core.util.urls.parse_authentication_endpoint(endpoint=None, use_remote=False)[source]

Return a parsed url.

If an endpoint is provided then use it, otherwise, look for a configured endpoint. If no configured endpoint exists then try to use project’s remote url.

renku.core.util.urls.remove_credentials(url)[source]

Remove username and password from a URL.
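As an illustration of the idea, credentials can be dropped from the network-location part of a URL with the standard library. This is a sketch with a hypothetical helper name, not renku's implementation:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Hypothetical helper: drop any "user:password@" prefix from the host part."""
    parts = urlsplit(url)
    # Everything after the last "@" in the netloc is host[:port].
    netloc = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=netloc))


cleaned = strip_credentials("https://user:secret@example.com:8443/group/project.git")
unchanged = strip_credentials("https://example.com/group/project.git")
```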

renku.core.util.urls.resolve_uri(uri)[source]

Resolve path part of a URI if it’s a local URI.

renku.core.util.urls.url_to_string(url)[source]

Convert url from list or ParseResult to string.

General utility functions.

renku.core.util.util.is_test_session_running()[source]

Return whether the code is being executed in a test rather than called by a user.

renku.core.util.util.is_uuid(value)[source]

Check if value is UUID4.

Copied from https://stackoverflow.com/questions/19989481/
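A common way to perform such a check with the standard uuid module is sketched below. The helper name is hypothetical, and the comparison against the canonical string form is an assumption that makes this sketch stricter than a plain parse; it is not necessarily what is_uuid does:

```python
import uuid


def looks_like_uuid4(value):
    """Hypothetical helper: True if value is a canonical UUID4 string."""
    text = str(value)
    try:
        # version=4 forces the version bits, so non-v4 input parses to a different UUID.
        parsed = uuid.UUID(text, version=4)
    except ValueError:
        return False
    # Only accept input that already was the canonical, hyphenated v4 form.
    return str(parsed) == text.lower()


valid = looks_like_uuid4(str(uuid.uuid4()))
invalid = looks_like_uuid4("not-a-uuid")
version_one = looks_like_uuid4("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
```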

renku.core.util.util.parallel_execute(function, *data, rate=1, **kwargs)[source]

Execute the function using multiple threads.

Parameters:
  • function (Callable[..., Any]) – Function to parallelize. Must accept at least one parameter and return a list.

  • data (Union[Tuple[Any], List[Any]]) – Data whose elements are each passed to one execution of the function.

  • rate (float) – Number of executions per thread per second.

Returns:

A list of return results of all executions.

Return type:

List[Any]
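The general pattern described here (run a list-returning function over the data in several threads and gather all results into one list) can be sketched with the standard library. The helper below is hypothetical, passes one element per call, and omits the rate limiting controlled by rate; it is not renku's implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Any, Callable, List, Sequence


def run_in_threads(
    function: Callable[..., List[Any]], data: Sequence[Any], max_workers: int = 4, **kwargs
) -> List[Any]:
    """Hypothetical helper: call function(item, **kwargs) per item and merge the lists."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(function, item, **kwargs) for item in data]
        # Collect in submission order so the merged output is deterministic.
        return list(chain.from_iterable(future.result() for future in futures))


def square_and_tag(value, tag="n"):
    # Each execution returns a list, as the parameter description requires.
    return [f"{tag}{value * value}"]


results = run_in_threads(square_and_tag, [1, 2, 3], tag="sq")
```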

renku.core.util.util.to_semantic_version(value)[source]

Convert value to SemVer.

renku.core.util.util.to_string(value, strip=False)[source]

Return a string representation of value, or an empty string if value is None.
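The None handling can be pictured with a small sketch. The helper name is hypothetical, and the assumption that strip removes surrounding whitespace is inferred from the parameter name only:

```python
def to_string_sketch(value, strip=False):
    """Hypothetical helper: str(value), with None mapped to an empty string."""
    if value is None:
        return ""
    text = str(value)
    # Assumption: strip=True trims leading and trailing whitespace.
    return text.strip() if strip else text


empty = to_string_sketch(None)
trimmed = to_string_sketch("  renku  ", strip=True)
number = to_string_sketch(3)
```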

Support JSON-LD context in models.

class renku.core.util.yaml.NoDatesSafeLoader(stream)[source]

Bases: CSafeLoader

Used to safely load basic python objects but ignore datetime strings.

classmethod remove_implicit_resolver(tag_to_remove)[source]

Remove implicit resolvers for a particular tag.

Takes care not to modify resolvers in super classes.

We want to load datetimes as strings, not dates, because we go on to serialize as JSON, which lacks YAML’s advanced types; loading them as dates leads to incompatibilities down the track.

renku.core.util.yaml.dumps_yaml(data)[source]

Convert YAML data to a YAML string.

renku.core.util.yaml.load_yaml(data)[source]

Load YAML data and return its content as a dict.

renku.core.util.yaml.read_yaml(path)[source]

Load YAML file and return its content as a dict.

renku.core.util.yaml.write_yaml(path, data)[source]

Store data to a YAML file.

Git Internals

Git repository management.

renku.core.git.ensure_clean(ignore_std_streams=False)[source]

Make sure the repository is clean.

renku.core.git.finalize_worktree(isolation, path, branch_name, delete, new_branch, merge_args=('--ff-only',), exception=None)[source]

Clean up and merge a previously created Git worktree.

renku.core.git.get_mapped_std_streams(lookup_paths, streams=('stdin', 'stdout', 'stderr'))[source]

Get a mapping of standard streams to given paths.

renku.core.git.prepare_worktree(path=None, branch_name=None, commit=None)[source]

Set up a Git worktree to provide isolation.

renku.core.git.with_worktree(path=None, branch_name=None, commit=None, merge_args=('--ff-only',))[source]

Create new worktree.

Git utilities.

class renku.domain_model.git.GitURL(href, path=None, scheme='ssh', hostname='localhost', username=None, password=None, port=None, owner=None, name=None, slug=None, regex=None)[source]

Parser for common Git URLs.

Method generated by attrs for class GitURL.

property image

Return image name.

property instance_url

Get the url of the git instance.

classmethod parse(href)[source]

Derive URI components.
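To illustrate what deriving components from a Git URL can look like, the sketch below matches two common URL shapes with regular expressions. The patterns, the helper name and the returned dictionary are all assumptions made for this example; the actual GitURL class uses its own expressions and returns a GitURL instance:

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: URLs with an explicit scheme, then scp-like SSH URLs.
_PATTERNS = [
    re.compile(
        r"^(?P<scheme>https?|ssh)://(?:(?P<username>[^@/]+)@)?(?P<hostname>[^:/]+)"
        r"(?::(?P<port>\d+))?/(?P<owner>.+)/(?P<name>[^/]+?)(?:\.git)?/?$"
    ),
    re.compile(
        r"^(?:(?P<username>[^@]+)@)?(?P<hostname>[^:/]+):"
        r"(?P<owner>.+)/(?P<name>[^/]+?)(?:\.git)?$"
    ),
]


def parse_git_url(href: str) -> Dict[str, Optional[str]]:
    """Hypothetical helper: split a Git URL into named components."""
    for pattern in _PATTERNS:
        match = pattern.match(href)
        if match:
            # Defaults mirror the signature above: scheme "ssh", no port.
            components: Dict[str, Optional[str]] = {"scheme": "ssh", "port": None, "username": None}
            components.update({k: v for k, v in match.groupdict().items() if v is not None})
            return components
    raise ValueError(f"Unrecognized Git URL: {href}")


https_url = parse_git_url("https://gitlab.example.com/group/sub/project.git")
scp_url = parse_git_url("git@example.com:owner/repo.git")
```

In this sketch nested groups end up in owner, so "group/sub/project.git" yields the owner "group/sub" and the name "project".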