Gateways
Renku uses several gateways to abstract away dependencies on external systems such as the database or git.
Interfaces
Interfaces that the Gateways implement.
Renku activity gateway interface.
- class renku.core.interface.activity_gateway.IActivityGateway[source]
Bases:
ABC
Interface for the ActivityGateway.
- get_activities_by_generation(path, checksum=None)[source]
Return the list of all activities that generate a path.
- get_activities_by_usage(path, checksum=None)[source]
Return the list of all activities that use a path.
- get_downstream_activities(activity, max_depth=None)[source]
Get downstream activities that depend on this activity.
- get_downstream_activity_chains(activity)[source]
Get a list of tuples of all downstream paths of this activity.
- get_upstream_activities(activity, max_depth=None)[source]
Get upstream activities that this activity depends on.
- get_upstream_activity_chains(activity)[source]
Get a list of tuples of all upstream paths of this activity.
- remove(activity, keep_reference=True, force=False)[source]
Remove an activity from the storage.
- Parameters:
activity (Activity) – The activity to be removed.
keep_reference (bool) – Whether to keep the activity in the activities index or not.
force (bool) – Force-delete the activity even if it has downstream activities.
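The traversal methods above can be illustrated with a small, self-contained sketch. It does not use Renku's classes: `IToyActivityGateway` and `InMemoryActivityGateway` are hypothetical stand-ins, activities are plain strings, and `max_depth` is assumed to count the number of hops from the starting activity.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Dict, List, Optional, Set


class IToyActivityGateway(ABC):
    """Hypothetical, reduced stand-in for an activity gateway interface."""

    @abstractmethod
    def get_downstream_activities(self, activity: str, max_depth: Optional[int] = None) -> Set[str]:
        """Get downstream activities that depend on this activity."""


class InMemoryActivityGateway(IToyActivityGateway):
    """Toy implementation backed by a dict instead of a database."""

    def __init__(self, downstream: Dict[str, List[str]]) -> None:
        # Maps an activity to the activities that directly depend on it.
        self._downstream = downstream

    def get_downstream_activities(self, activity: str, max_depth: Optional[int] = None) -> Set[str]:
        found: Set[str] = set()
        queue = deque([(activity, 0)])
        while queue:
            current, depth = queue.popleft()
            # Assumption: max_depth limits the number of hops that are followed.
            if max_depth is not None and depth >= max_depth:
                continue
            for child in self._downstream.get(current, []):
                if child not in found:
                    found.add(child)
                    queue.append((child, depth + 1))
        return found


gateway = InMemoryActivityGateway({"a": ["b"], "b": ["c"], "c": []})
direct = gateway.get_downstream_activities("a", max_depth=1)
everything = gateway.get_downstream_activities("a")
```

Code that depends only on the interface can be tested against such an in-memory gateway without touching a real database.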
Renku database gateway interface.
- class renku.core.interface.database_gateway.IDatabaseGateway[source]
Bases:
ABC
Gateway interface for basic database operations.
Renku dataset gateway interface.
- class renku.core.interface.dataset_gateway.IDatasetGateway[source]
Bases:
ABC
Interface for the DatasetGateway.
External storage interface.
- class renku.core.interface.storage.FileHash(uri, path, size, hash)[source]
Bases:
object
The hash for a file at a specific location.
- class renku.core.interface.storage.IStorage(storage_scheme, provider, credentials, provider_configuration)[source]
Bases:
ABC
Interface for the external storage handler.
- property credentials
Return the provider credentials for this storage handler.
- property provider
Return the dataset provider for this storage handler.
- property storage_scheme
Storage’s URI scheme.
- class renku.core.interface.storage.IStorageFactory[source]
Bases:
ABC
Interface to get a cloud storage.
- abstract static get_storage(storage_scheme, provider, credentials, configuration)[source]
Return a storage that handles provider.
- Parameters:
storage_scheme (str) – Storage name.
provider (CloudStorageProviderType) – The backend provider.
credentials (ProviderCredentials) – Credentials for the provider.
configuration (Dict[str, str]) – Storage-specific configuration that is passed to the IStorage implementation.
- Returns:
An instance of IStorage.
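The factory pattern described above can be sketched without Renku itself. All names below (`ToyStorage`, `IToyStorageFactory`, `ToyStorageFactory`) are hypothetical stand-ins; only the argument names are taken from the signatures listed here, and the provider and credentials are plain Python values rather than Renku types.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ToyStorage:
    """Hypothetical stand-in for an IStorage implementation."""

    def __init__(self, storage_scheme: str, provider: Any, credentials: Any, provider_configuration: Dict[str, str]):
        self._storage_scheme = storage_scheme
        self._provider = provider
        self._credentials = credentials
        self._provider_configuration = provider_configuration

    @property
    def storage_scheme(self) -> str:
        """Storage's URI scheme."""
        return self._storage_scheme

    @property
    def provider(self) -> Any:
        return self._provider

    @property
    def credentials(self) -> Any:
        return self._credentials


class IToyStorageFactory(ABC):
    """Hypothetical interface with an abstract static factory method."""

    @staticmethod
    @abstractmethod
    def get_storage(storage_scheme: str, provider: Any, credentials: Any, configuration: Dict[str, str]) -> ToyStorage:
        """Return a storage that handles provider."""


class ToyStorageFactory(IToyStorageFactory):
    @staticmethod
    def get_storage(storage_scheme: str, provider: Any, credentials: Any, configuration: Dict[str, str]) -> ToyStorage:
        # The configuration is forwarded unchanged to the storage implementation.
        return ToyStorage(
            storage_scheme=storage_scheme,
            provider=provider,
            credentials=credentials,
            provider_configuration=configuration,
        )


storage = ToyStorageFactory.get_storage("s3", "toy-provider", {"token": "example"}, {"region": "example"})
```

Because `get_storage` is abstract on the interface, the interface itself cannot be instantiated; only a concrete factory can.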
Renku plan gateway interface.
- class renku.core.interface.plan_gateway.IPlanGateway[source]
Bases:
ABC
Interface for the PlanGateway.
Renku project gateway interface.
Implementations
Implementation of Gateway interfaces.
Renku activity database gateway implementation.
- class renku.infrastructure.gateway.activity_gateway.ActivityGateway[source]
Bases:
IActivityGateway
Gateway for activity database operations.
- get_activities_by_generation(path, checksum=None)[source]
Return the list of all activities that generate a path.
- get_activities_by_usage(path, checksum=None)[source]
Return the list of all activities that use a path.
- get_downstream_activities(activity, max_depth=None)[source]
Get downstream activities that depend on this activity.
- get_downstream_activity_chains(activity)[source]
Get a list of tuples of all downstream paths of this activity.
- get_upstream_activities(activity, max_depth=None)[source]
Get upstream activities that this activity depends on.
- get_upstream_activity_chains(activity)[source]
Get a list of tuples of all upstream paths of this activity.
- remove(activity, keep_reference=True, force=False)[source]
Remove an activity from the storage.
- Parameters:
activity (Activity) – The activity to be removed.
keep_reference (bool) – Whether to keep the activity in the activities index or not.
force (bool) – Force-delete the activity even if it has downstream activities.
- renku.infrastructure.gateway.activity_gateway.reindex_catalog(database)[source]
Clear and re-create database’s activity-catalog and its relations.
Renku generic database gateway implementation.
- class renku.infrastructure.gateway.database_gateway.ActivityDownstreamRelation(downstream, upstream)[source]
Bases:
object
Implementation of Downstream interface.
- class renku.infrastructure.gateway.database_gateway.DatabaseGateway[source]
Bases:
IDatabaseGateway
Gateway for base database operations.
- renku.infrastructure.gateway.database_gateway.dump_activity(activity, catalog, cache)[source]
Get storage token for an activity.
- renku.infrastructure.gateway.database_gateway.dump_downstream_relations(relation, catalog, cache)[source]
Dump relation entry to database.
- renku.infrastructure.gateway.database_gateway.initialize_database(database)[source]
Initialize an empty database with all required metadata.
- renku.infrastructure.gateway.database_gateway.load_activity(token, catalog, cache)[source]
Load activity from storage token.
- renku.infrastructure.gateway.database_gateway.load_downstream_relations(token, catalog, cache)[source]
Load relation entry from database.
Renku dataset gateway interface.
- class renku.infrastructure.gateway.dataset_gateway.DatasetGateway[source]
Bases:
IDatasetGateway
Gateway for dataset database operations.
Storage factory implementation.
- class renku.infrastructure.storage.factory.StorageFactory[source]
Bases:
IStorageFactory
Return an external storage.
- static get_storage(storage_scheme, provider, credentials, configuration)[source]
Return a storage that handles provider.
- Parameters:
storage_scheme (str) – Storage name.
provider (CloudStorageProviderType) – The backend provider.
credentials (ProviderCredentials) – Credentials for the provider.
configuration (Dict[str, str]) – Storage-specific configuration that is passed to the IStorage implementation.
- Returns:
An instance of IStorage.
Base storage handler.
- class renku.infrastructure.storage.rclone.RCloneStorage(storage_scheme, provider, credentials, provider_configuration)[source]
Bases:
IStorage
External storage implementation that uses RClone.
- get_hashes(uri, hash_type='md5')[source]
Download hashes with rclone and parse them.
Returns a tuple containing a list of parsed hashes.
- Parameters:
uri (str) – Provider uri.
hash_type (str) – Type of hash to get from rclone (Default value = md5).
Example hashes_raw JSON:
[
  {
    "Path": "resources/hg19.window.masker.bed.gz.tbi",
    "Name": "hg19.window.masker.bed.gz.tbi",
    "Size": 578288,
    "MimeType": "application/x-gzip",
    "ModTime": "2022-02-07T18:45:52.000000000Z",
    "IsDir": false,
    "Hashes": {"md5": "e93ac5364e7799bbd866628d66c7b773"},
    "Tier": "STANDARD"
  }
]
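As an illustration of the parsing step, the following sketch turns JSON of the shape shown above into records with the same fields as FileHash (uri, path, size, hash). The `FileHash` class here is a local stand-in, `parse_hashes` is a hypothetical helper, and the way the URI is assembled from a base URI and the entry's path is an assumption, not necessarily what RCloneStorage does.

```python
import json
from typing import List, NamedTuple, Optional


class FileHash(NamedTuple):
    """Local stand-in mirroring the documented fields of FileHash."""

    uri: str
    path: str
    size: int
    hash: Optional[str]


def parse_hashes(base_uri: str, hashes_raw: str, hash_type: str = "md5") -> List[FileHash]:
    """Parse a JSON listing into FileHash records, skipping directories."""
    result = []
    for entry in json.loads(hashes_raw):
        if entry.get("IsDir"):
            continue
        result.append(
            FileHash(
                # Assumption: the file URI is the base URI joined with the entry's path.
                uri=f"{base_uri.rstrip('/')}/{entry['Path']}",
                path=entry["Path"],
                size=entry["Size"],
                # Some backends may not report the requested hash type.
                hash=entry.get("Hashes", {}).get(hash_type),
            )
        )
    return result


hashes_raw = """[{"Path": "resources/hg19.window.masker.bed.gz.tbi",
  "Name": "hg19.window.masker.bed.gz.tbi", "Size": 578288,
  "MimeType": "application/x-gzip", "ModTime": "2022-02-07T18:45:52.000000000Z",
  "IsDir": false, "Hashes": {"md5": "e93ac5364e7799bbd866628d66c7b773"},
  "Tier": "STANDARD"}]"""

# "s3://example-bucket" is a made-up base URI for the example.
hashes = parse_hashes("s3://example-bucket", hashes_raw)
```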
- is_directory(uri)[source]
Return True if URI points to a directory.
NOTE: This returns True for non-existing paths on bucket-based backends like S3 since listing non-existing paths won’t fail and there is no way to distinguish between empty directories and non-existing paths.
- run_command(command, *args, **kwargs)[source]
Run a RClone command with storage-specific configuration.
- renku.infrastructure.storage.rclone.get_rclone_env_var_name(provider_name, name)[source]
Get name of an RClone env var config.
- renku.infrastructure.storage.rclone.run_rclone_command(command, *args, env=None, **kwargs)[source]
Execute an RClone command.
- renku.infrastructure.storage.rclone.transform_args(*args)[source]
Transforms args to command line args.
- renku.infrastructure.storage.rclone.transform_kwargs(**kwargs)[source]
Transforms kwargs to command line args.
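The exact conversion rules are not spelled out here. The sketch below shows one common convention for mapping keyword arguments to command line flags; `toy_transform_kwargs` is a hypothetical helper, and its rules (underscores become dashes, `True` yields a bare flag, `False` and `None` are dropped) are assumptions rather than Renku's documented behavior.

```python
from typing import Any, List


def toy_transform_kwargs(**kwargs: Any) -> List[str]:
    """Convert keyword arguments to long-form command line flags."""
    args: List[str] = []
    for key, value in kwargs.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            # Boolean switches are passed without a value.
            args.append(flag)
        elif value is False or value is None:
            # Disabled or unset options are omitted entirely.
            continue
        else:
            args.append(f"{flag}={value}")
    return args


flags = toy_transform_kwargs(hash_type="md5", recursive=True, dry_run=False)
```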
Renku plan database gateway implementation.
- class renku.infrastructure.gateway.plan_gateway.PlanGateway[source]
Bases:
IPlanGateway
Gateway for plan database operations.
Renku project gateway interface.
Repository
Renku uses git repositories for tracking changes. To abstract away git internals,
we delegate all git calls to the Repository class.
An abstraction layer for the underlying VCS.
- class renku.infrastructure.repository.Actor(name, email)[source]
Bases:
NamedTuple
Author/creator of a commit.
Create new instance of Actor(name, email)
- email
Alias for field number 1
- name
Alias for field number 0
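The "Alias for field number" entries mean that each named field is also reachable by position. A minimal sketch with a locally defined NamedTuple that has the same fields as Actor (this is a stand-in, not the Renku class, and the values are made up):

```python
from typing import NamedTuple


class Actor(NamedTuple):
    """Local stand-in with the same fields as the documented Actor."""

    name: str
    email: str


author = Actor(name="Jane Doe", email="jane@example.com")

# Fields can be read by name, by index, or by unpacking.
name, email = author
```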
- class renku.infrastructure.repository.BaseRepository(path='.', repository=None)[source]
Bases:
object
Abstract Base repository.
- property active_branch
Return current checked out branch.
- property all_files
Return absolute paths of all files in the index and untracked files.
- property branches
Return all branches.
- commit(message, *, amend=False, author=None, committer=None, no_verify=False, no_edit=False, paths=None)[source]
Commit added files to the VCS.
- copy_content_to_file(path, *, revision=None, checksum=None, output_path=None, apply_filters=True)[source]
Get content of an object using its checksum, write it to a file, and return the file’s path.
- Parameters:
path (Union[Path, str]) – Relative or absolute path to the file.
revision (Optional[Union[Reference, str]]) – A commit/branch/tag to get the file from. This cannot be passed with checksum.
checksum (Optional[str]) – Git hash of the file to be retrieved. This cannot be passed with revision.
output_path (Optional[Union[Path, str]]) – A path to copy the content to. A temporary file is created if it is None.
apply_filters (bool) – Whether to apply Git filters on the retrieved object. Note that apply_filters still works if the repository is cloned with --skip-smudge or if GIT_LFS_SKIP_SMUDGE is set. It also works if there is no entry for the file in .gitattributes (e.g. when a file was deleted). The reason is that we use the git lfs smudge command to get the file content if this option is passed, and we also disable GIT_LFS_SKIP_SMUDGE.
- Returns:
The path to the created file.
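Two of the documented rules, that revision and checksum are mutually exclusive and that a temporary file is created when output_path is None, can be sketched as follows. `toy_copy_content` is a hypothetical helper that receives the content directly instead of reading it from a repository, and raising `ValueError` is an assumption about the error type.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Union


def toy_copy_content(
    content: bytes,
    *,
    revision: Optional[str] = None,
    checksum: Optional[str] = None,
    output_path: Optional[Union[Path, str]] = None,
) -> Path:
    """Write content to output_path (or a temporary file) and return the path."""
    if revision is not None and checksum is not None:
        # Assumption: the real method rejects this combination in some way.
        raise ValueError("Pass either 'revision' or 'checksum', not both")

    if output_path is None:
        # mkstemp returns an open file descriptor; close it before writing by path.
        descriptor, temporary_name = tempfile.mkstemp()
        os.close(descriptor)
        output_path = temporary_name

    path = Path(output_path)
    path.write_bytes(content)
    return path


copied = toy_copy_content(b"example content")
```

The caller is responsible for deleting the temporary file in this sketch.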
- create_worktree(path, reference, branch=None, checkout=True, detach=False)[source]
Create a git worktree.
- Parameters:
path (Path) – Target folder.
reference (Union[Branch, Commit, Reference, str]) – The reference to base the tree on.
branch (str, optional) – Optional new branch to create in the worktree.
checkout (bool, optional) – Whether to perform a checkout of the reference (Default value = True).
detach (bool, optional) – Whether to detach HEAD in worktree (Default value = False).
- fetch(remote=None, refspec=None, all=False, tags=False, unshallow=False, depth=None)[source]
Update remote branches.
- property files
Return a list of all files in the current version of the repository.
- get_attributes(*paths)[source]
Return a map from each path to its attributes.
NOTE: Dict keys are the same relative or absolute path as inputs.
- get_configuration(writable=False, scope=None)[source]
Return git configuration.
NOTE: Scope can be “global” or “local”.
- get_content(path, *, revision=None, checksum=None, binary=False)[source]
Get content of a file in a given revision as text or binary.
- get_existing_paths_in_revision(paths=None, revision='HEAD')[source]
List all paths that exist in a revision.
- get_ignored_paths(*paths)[source]
Return ignored paths matching the .gitignore file.
NOTE: This function returns paths in the same form as its inputs: if the input is an absolute path, the output is an absolute path. The same is true for relative paths.
NOTE: Relative paths should be relative to the current working directory and not the repository’s root.
- get_object_hash(path, revision=None)[source]
Return git hash of an object in a Repo or its submodule.
NOTE: path must be relative to the repo’s root regardless of whether this function is called from a subdirectory or not.
- get_object_hashes(paths, revision=None)[source]
Return git hashes of objects in a Repo or its submodule.
NOTE: paths must be relative to the repo’s root regardless of whether this function is called from a subdirectory or not.
- get_previous_commit(path, revision=None, first=False, full_history=True, submodule=False)[source]
Return a previous commit for a given path starting from revision.
- get_raw_content(*, path, revision=None, checksum=None)[source]
Get raw content of a file in a given revision as text without applying any filter on it.
- get_revisions_paths(*checksums)[source]
Return a revision:path tuple for each checksum so that revision contains the given blob with the checksum.
- static hash_object(path)[source]
Create a git hash for a path. The path doesn’t need to be in a repository.
- static hash_objects(paths)[source]
Create a git hash for a list of paths. The paths don’t need to be in a repository.
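For context, a git blob hash in a SHA-1 repository is the SHA-1 of a short header (`blob`, the content length in bytes, and a NUL byte) followed by the content, which is why it can be computed outside any repository. The sketch below reproduces that calculation with the standard library. `git_blob_hash` and `hash_file` are hypothetical helpers, not Renku's implementation, and they ignore any filters (such as end-of-line conversion or LFS) that git may apply when hashing a path.

```python
import hashlib
from pathlib import Path
from typing import Union


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 git blob hash of raw content."""
    # Git hashes "blob <size>\0" followed by the content itself.
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def hash_file(path: Union[Path, str]) -> str:
    """Hash a file's bytes as a git blob; the file need not be in a repository."""
    return git_blob_hash(Path(path).read_bytes())


# The empty blob has a well-known hash.
empty_hash = git_blob_hash(b"")
```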
- property head
HEAD of the repository.
- is_dirty(untracked_files=True)[source]
Return True if the repository has modified or untracked files ignoring submodules.
- iterate_commits(*paths, revision=None, reverse=False, full_history=False, max_count=-1)[source]
Return a list of commits.
- property lfs
Return a Git LFS manager.
- property path
Absolute path to the repository’s root.
- push(remote=None, refspec=None, *, no_verify=False, set_upstream=False, delete=False, force=False)[source]
Push local changes to a remote repository.
- property remotes
Return all remotes.
- remove(*paths, index=False, not_exists_ok=False, recursive=False, force=False)[source]
Remove paths from repository or index.
- property staged_changes
Return a list of staged changes.
NOTE: This can be implemented by git diff --cached --name-status -z.
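With `-z`, the output of `git diff --name-status` is a sequence of NUL-terminated fields: a status code followed by one path, or by two paths for renames and copies. The sketch below parses such output from a byte string. `parse_name_status_z` is a hypothetical helper and the sample data is made up; it illustrates the output format rather than how Renku builds staged_changes.

```python
from typing import List, Tuple


def parse_name_status_z(output: bytes) -> List[Tuple[str, ...]]:
    """Parse NUL-separated name-status output into (status, path, ...) tuples."""
    fields = output.decode("utf-8").split("\0")
    if fields and fields[-1] == "":
        # The output ends with a NUL, which leaves an empty trailing field.
        fields.pop()

    changes: List[Tuple[str, ...]] = []
    index = 0
    while index < len(fields):
        status = fields[index]
        # Renames (R) and copies (C) carry a source and a destination path.
        path_count = 2 if status[:1] in ("R", "C") else 1
        paths = fields[index + 1 : index + 1 + path_count]
        changes.append((status, *paths))
        index += 1 + path_count
    return changes


sample_fields = [b"M", b"README.md", b"A", b"data/new.csv", b"R100", b"old.txt", b"new.txt"]
sample_output = b"\0".join(sample_fields) + b"\0"
staged = parse_name_status_z(sample_output)
```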
- property submodules
Return a list of submodules.
- property tags
Return all available tags.
- property unmerged_blobs
Return a map of path to stage and blob for unmerged blobs in the current index.
- property unstaged_changes
Return a list of changes that are not staged.
- property untracked_files
Return the list of untracked files.
- class renku.infrastructure.repository.Branch(repository, path)[source]
Bases:
Reference
A git branch.
- property remote_branch
Return the remote branch if any.
- class renku.infrastructure.repository.BranchManager(repository)[source]
Bases:
object
Manage branches of a Repository.
- class renku.infrastructure.repository.Commit(repository, commit)[source]
Bases:
object
A VCS commit.
- property author
Author of the commit.
- property authored_datetime
Commit authored date.
- property committed_datetime
Commit date.
- property committer
Committer of the commit.
- get_changes(*paths, commit=None, patch=False)[source]
Return list of changes in a commit.
NOTE: This function can be implemented with git diff-tree.
NOTE: When patch is False, Diff.diff will be empty. We need to call Commit.diff twice when patch is True because GitPython won’t set Diff.change_type in this case.
- property hexsha
Commit sha.
- property message
Commit message.
- property parents
List of commit parents.
- property root
Return True if this commit is the root commit.
- property tree
Return all objects in the commit’s tree.
- class renku.infrastructure.repository.Configuration(repository=None, scope=None, writable=True)[source]
Bases:
object
Git configuration manager.
- class renku.infrastructure.repository.Diff(a_path, b_path, change_type, diff)[source]
Bases:
NamedTuple
A single diff object between two trees.
Create new instance of Diff(a_path, b_path, change_type, diff)
- a_path
Alias for field number 0
- property added
True if file was added.
- b_path
Alias for field number 1
- change_type
Alias for field number 2
- property deleted
True if file was deleted.
- diff
Alias for field number 3
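A NamedTuple can combine positional fields with derived properties such as added and deleted. The sketch below shows the idea with local stand-ins: `ToyDiff` mirrors the documented field order, while `ToyChangeType` and its members are hypothetical, since the actual enum values are not listed here.

```python
from enum import Enum
from typing import NamedTuple, Optional


class ToyChangeType(Enum):
    """Hypothetical change types; the real enum members may differ."""

    ADDED = "A"
    DELETED = "D"
    MODIFIED = "M"


class ToyDiff(NamedTuple):
    """Local stand-in with the same field order as the documented Diff."""

    a_path: str
    b_path: str
    change_type: ToyChangeType
    diff: Optional[str] = None

    @property
    def added(self) -> bool:
        """True if file was added."""
        return self.change_type == ToyChangeType.ADDED

    @property
    def deleted(self) -> bool:
        """True if file was deleted."""
        return self.change_type == ToyChangeType.DELETED


change = ToyDiff("data/new.csv", "data/new.csv", ToyChangeType.ADDED)
```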
- class renku.infrastructure.repository.DiffChangeType(value)[source]
Bases:
Enum
Type of change in a Diff.
- class renku.infrastructure.repository.DiffLine(text, change_type)[source]
Bases:
NamedTuple
A single line in a patch.
Create new instance of DiffLine(text, change_type)
- property added
True if line was added.
- change_type
Alias for field number 1
- property deleted
True if line was deleted.
- text
Alias for field number 0
- class renku.infrastructure.repository.DiffLineChangeType(value)[source]
Bases:
Enum
Type of change in a DiffLine.
- class renku.infrastructure.repository.Object(path, type, size, hexsha)[source]
Bases:
NamedTuple
Represent a git object.
Create new instance of Object(path, type, size, hexsha)
- hexsha
Alias for field number 3
- path
Alias for field number 0
- size
Alias for field number 2
- type
Alias for field number 1
- class renku.infrastructure.repository.Reference(repository, path)[source]
Bases:
object
A git reference.
- property commit
Commit pointed to by the reference.
- property name
Reference name.
- property path
Reference path.
- class renku.infrastructure.repository.Remote(repository, name)[source]
Bases:
object
Remote of a Repository.
- property head
The head commit of the remote.
- property name
Remote’s name.
- property references
Return a list of remote references.
- property url
Remote’s URL.
- class renku.infrastructure.repository.RemoteManager(repository)[source]
Bases:
object
Manage remotes of a Repository.
- class renku.infrastructure.repository.RemoteReference(repository, path)[source]
Bases:
Reference
A git remote reference.
- property remote
Return reference’s remote.
- class renku.infrastructure.repository.Repository(path='.', search_parent_directories=False, repository=None)[source]
Bases:
BaseRepository
Abstract Base repository.
- class renku.infrastructure.repository.Submodule(parent, name, path, url)[source]
Bases:
BaseRepository
A git submodule.
- property name
Return submodule’s name.
- property relative_path
Submodule’s path relative to its parent repository.
- property url
Return submodule’s url.
- class renku.infrastructure.repository.SubmoduleManager(repository)[source]
Bases:
object
Manage submodules of a Repository.
- class renku.infrastructure.repository.SymbolicReference(repository, path)[source]
Bases:
Reference
A git symbolic reference.
- property reference
Return the reference that this object points to.
- class renku.infrastructure.repository.Tag(repository, path)[source]
Bases:
Reference
A git tag.
- property commit
Return the commit the tag refers to.
- class renku.infrastructure.repository.TagManager(repository)[source]
Bases:
object
Manage tags of a Repository.