What is Renku?¶

Renku is a web platform (RenkuLab) and a command-line interface (Renku Client) built on top of open source components for researchers, data scientists, educators, and students to help manage their data, code, workflows, provenance, and computational environments.

Renku combines many widely-used open-source tools to equip every project on the platform with resources that aid reproducibility, reusability and collaboration. Version control for data and code, containerization for runtime environments and automatic workflow capture are the core pillars on which the platform is built.

RenkuLab¶

RenkuLab is a web platform for creating, storing, working on, and sharing your collaborative data analysis projects. Our public “flagship” deployment of RenkuLab can be found at renkulab.io and is free for anyone to use.

RenkuLab is the glue that makes it possible to develop and share your research entirely in the cloud. You can, directly from a project’s homepage on RenkuLab, launch JupyterLab and RStudio sessions (or anything else you might run from a Docker container) using pre-built templates. You can work on your project and when you are done push the changes back to the repository for safe storage. Our pre-built base images also contain the renku command-line tool so you don’t need to worry about installation and can benefit from provenance tracking right in the interactive session. RenkuLab automatically builds images for your interactive sessions so the computational environments you or your collaborators use are always up-to-date.

The main components that make up RenkuLab are GitLab for repository management, version control, and continuous-integration pipelines; Jupyter servers for interactive sessions; a Knowledge Graph for search and discovery; a few custom services for all Renku-specific tasks like handling datasets.

Please contact us if you would like to deploy your own instance or see Guides for Administrators.

Renku Client¶

The renku command-line client is the powerful complement of the hosted RenkuLab platform. It is a tool for easily capturing your data-science process as you work, by extending version control to encompass the complete research and data exploration life-cycle. It lets you manage versioned datasets, automatically create workflows and keep track of computational environments. Check out Get Started on RenkuLab tutorial for a complete example.

The command-line client is automatically installed in computational environments on RenkuLab, but you can you can follow these installation instructions if you need to use it elsewhere.

Datasets¶

The renku client can be used to create “datasets”, which are just collections of data files bundled together and enriched with metadata.

It is easy to create datasets

renku dataset create mydataset

add files to the dataset

renku dataset add mydataset datafile

and even import existing datasets from public repositories like Zenodo and Dataverse:

renku dataset import https://zenodo.org/record/3981451

The full metadata of the data repository is preserved and mirrored in the Knowledge Graph for easy retrieval and search.

Provenance of results¶

Capturing the provenance of results is critical for understanding what input data were used, what code was run, and what results were produced.

The renku client gives researchers and analysts a simple tool to automatically track provenance and iteratively develop a workflow.

Creating a workflow is done by invoking renku run in front of any shell command:

renku run echo "hello-world!" > hello.txt
renku run wc hello.txt > hello.wc