The Renku Knowledge Graph¶
The Knowledge Graph is the backbone of Renku. It is currently in beta and is used to record and query the relationships between data and code. In the future we will also use the Knowledge Graph to store and retrieve metadata and to provide search functionality. See the Workflows section for a high-level description. A simple Knowledge Graph in the current Renku web user interface looks something like this:
Every time you use the
renku run command we record some information about
the inputs, the outputs, and the actual execution of your script. While you are
working in your repository this information is simply stored in your project and
accessible via tools like the
renku log command. However, as soon as the
project is pushed to the server, the commits are interpreted by the graph
service and the data is fed into the database holding the Knowledge Graph. This
happens automatically and there is no action required on your part, apart from
activating the Knowledge Graph integration.
The Knowledge Graph is activated by default on all new Public or Internal projects. On projects that were in place before this feature was released, the user is prompted to enable the integration via a pop-up button:
For Private projects, there is an option to opt-out of the Knowledge Graph, if you do not want your metadata to be able to be queried without access.
You can then turn on Knowledge Graph integration at a later time if you wish.
File Lineage Display¶
To see the current provenance information of any file in your project, select the
file in your file browser and click either the
Lineage View drop down
or the graph icon on the right-hand side of the file preview pane:
In short: if you enable the Knowledge Graph integration on a private project, the metadata about your project will become publicly accessible. Important: your data is in no way affected by this - the contents of your project are still private. What is metadata then? Things like filenames, paths within the project, people working on the project, project directory structure etc. This is especially true if your private project references a public project - a lineage query will reveal the workflow steps taken within your private project using the public data. If sharing this level of metadata is not acceptable to you, then please do not enable the Knowledge Graph integration for now. For new private projects, you will be given an option when creating the project whether to enable the Graph or not. If you choose not to do so initially, you can change your mind later and we will happily add the project to the Graph for you. However, while possible, removing a project completely from the Graph is not easily supported at the moment. We are working on providing comprehensive access control on the graph queries that traverse projects, but at this stage they are not yet implemented.