Graph services

In Renku, the dependencies of research artifacts are recorded in a knowledge graph. Each project's local knowledge graph is stored in its repository; the graph services make it possible to build the global knowledge graph. When a project's repository is pushed to the server, a webhook is triggered. As a result, the changes represented by the pushed commits, together with all of their captured dependencies, are rendered as RDF triples and pushed to the triple store.

The graph services consist of four microservices: the webhook-service, triples-generator, token-repository and knowledge-graph. The knowledge graph data is stored in the triple store (currently Apache Jena). The basic architecture is illustrated below.

strict digraph architecture {
    compound=true;
    newrank=true;
    graph [fontname="Raleway", nodesep="0.8"];
    node [shape="rect", style="filled,rounded", fontname="Raleway"];
    edge [fontname="Raleway"];

    GitLab [fillcolor="lightblue"]
    UI [fillcolor="#f4d142"]
    CLI [fillcolor="#f4d142"]
    WHS [label="Webhook Service" fillcolor="#f4d142"]
    TG [label="Triples Generator" fillcolor="#f4d142"]
    KG [label="Knowledge Graph" fillcolor="#f4d142"]
    Gateway [fillcolor="#f4d142"]
    Jena [label="Apache Jena" fillcolor="lightblue"]
    Log [label="Event Log" fillcolor="#f4d142", shape="parallelogram", width=2.0]
    LogDB [label="Event Log DB" fillcolor="lightblue", shape="parallelogram", width=2.0]

    subgraph cluster_clients {
        label="Clients"
        UI
        CLI
        {rank=same; UI, CLI};
    }

    CLI -> GitLab [label=" git push"]
    WHS -> GitLab [label=" registers webhooks"]
    GitLab -> WHS [label=" sends Push Event\nwith information about pushed commits"]
    WHS -> Log [label=" writes Commit Events"]
    Log -> LogDB [label=" stores Commit Events"]
    TG -> Log [label=" subscribes for Events"]
    Log -> TG [label=" pushes Commit Events"]
    TG -> Jena [label=" generates RDF triples"]
    KG -> Jena [label=" SPARQL query"]
    UI -> Gateway [label=" interacts with Graph Services"]
    Gateway -> WHS [label=" asks to register webhooks,\nchecks Events processing status"]
    Gateway -> KG [label=" queries for metadata"]
}

Sequence diagrams of Graph Services APIs and processes.

POST <knowledge-graph>/knowledge-graph/graphql

An endpoint for running GraphQL queries against the Knowledge Graph data.
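The endpoint follows the common GraphQL-over-HTTP convention of a JSON body with a `query` field. The sketch below only builds the request with the standard library, assuming a JSON body with optional `variables`. The base URL `http://knowledge-graph:9004` is hypothetical, and authentication is deliberately left out. The query `{ __typename }` is used because it is valid against any GraphQL schema, so no Knowledge Graph field names are invented.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_graphql_request(
    base_url: str, query: str, variables: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build, without sending, a POST request for the GraphQL endpoint."""
    payload: Dict[str, Any] = {"query": query}
    if variables:
        # Variables travel next to the query in the same JSON body
        payload["variables"] = variables
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/knowledge-graph/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical deployment URL; "{ __typename }" works with any GraphQL schema
request = build_graphql_request("http://knowledge-graph:9004", "{ __typename }")
```

Passing `request` to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect or reuse with another HTTP client.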


POST <webhook-service>/projects/:id/webhooks

An endpoint to create a Graph Services webhook for a project in GitLab.
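A minimal sketch of a client call to this endpoint follows. It substitutes the numeric GitLab project id for `:id` and sends a bodyless POST. Two details are assumptions, not taken from the webhook-service: the `Authorization: Bearer <token>` header and the base URL `http://webhook-service:9001`. The real service may expect a different authentication header.

```python
import urllib.request


def build_create_webhook_request(
    base_url: str, project_id: int, access_token: str
) -> urllib.request.Request:
    """Build a POST <webhook-service>/projects/:id/webhooks request."""
    url = f"{base_url.rstrip('/')}/projects/{int(project_id)}/webhooks"
    return urllib.request.Request(
        url=url,
        # Hypothetical auth scheme: check the service's accepted headers
        headers={"Authorization": f"Bearer {access_token}"},
        method="POST",
    )


# Hypothetical host, project id and token
request = build_create_webhook_request("http://webhook-service:9001", 123, "secret")
```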


POST <webhook-service>/projects/:id/webhooks/validation

An endpoint to validate a project's webhook. It checks whether a relevant Graph Services webhook exists on the project's repository in GitLab, and whether the Graph Services hold an Access Token associated with the project, which they need to look up project-specific information in GitLab.
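Validation combines two independent checks, and the webhook counts as valid only when both pass. The sketch below models that rule and reports each check separately so that a client can tell which part is missing. It makes two assumptions. First, the list of hook URLs registered on the GitLab project is passed in directly (in practice it would come from GitLab). Second, a plain dictionary stands in for the stored access tokens, presumably held by the token-repository. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class HookValidation:
    webhook_exists: bool  # a Graph Services hook is registered in GitLab
    token_exists: bool    # an access token is stored for the project

    @property
    def is_valid(self) -> bool:
        # Both conditions are required, as described above
        return self.webhook_exists and self.token_exists


def validate_project_hook(
    project_id: int,
    gitlab_hook_urls: List[str],
    graph_services_hook_url: str,
    stored_tokens: Dict[int, str],  # hypothetical stand-in for the token store
) -> HookValidation:
    return HookValidation(
        webhook_exists=graph_services_hook_url in gitlab_hook_urls,
        token_exists=project_id in stored_tokens,
    )
```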


POST <webhook-service>/webhooks/events

An endpoint to which GitLab sends Push Events containing information about the commits pushed to GitLab.
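In GitLab's push event payload, `object_kind` is `"push"`, the top-level `project_id` and `project.path_with_namespace` identify the project, and `commits` lists the pushed commits by `id`. The sketch below turns such a payload into one Commit Event per commit. The `CommitEvent` shape is hypothetical, since the Event Log's actual record format is not described here.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CommitEvent:
    """Hypothetical shape of a Commit Event written to the Event Log."""
    project_id: int
    project_path: str
    commit_id: str


def to_commit_events(push_event_body: str) -> List[CommitEvent]:
    event = json.loads(push_event_body)
    if event.get("object_kind") != "push":
        raise ValueError("not a GitLab push event")
    path = event["project"]["path_with_namespace"]
    # One Commit Event per commit listed in the payload
    return [
        CommitEvent(event["project_id"], path, commit["id"])
        for commit in event.get("commits", [])
    ]
```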


GET <webhook-service>/projects/:id/events/status

An endpoint that returns information about the progress of event processing for a specific project.
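One plausible way to express processing progress is the share of a project's events that have reached a final state. The sketch below assumes this and uses hypothetical status names (`DONE` and `FAILED` as final). It also treats a project with no events as fully processed. The real response fields and status names of the endpoint may differ.

```python
from typing import Dict, Iterable, Union

FINAL_STATUSES = {"DONE", "FAILED"}  # hypothetical status names


def processing_status(statuses: Iterable[str]) -> Dict[str, Union[int, float]]:
    statuses = list(statuses)
    total = len(statuses)
    done = sum(1 for s in statuses if s in FINAL_STATUSES)
    # Design choice: a project without events counts as fully processed
    progress = 100.0 if total == 0 else round(100.0 * done / total, 2)
    return {"done": done, "total": total, "progress": progress}
```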


Subscription to unprocessed Commit Events

A process, initiated and maintained by Triples Generator instances, through which the Event Log sends them the Events that require triples generation.
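The subscription could be modelled on the Event Log side as a registry of subscriber URLs. The sketch below assumes two mechanisms that are not documented here. First, a lease: each Triples Generator must re-subscribe within `ttl_seconds`, which is one way to read "initiated and maintained". Second, the Event Log hands each Event to live subscribers in round-robin order. Class and method names are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional


class SubscriberRegistry:
    """Hypothetical Event Log-side registry of Triples Generator subscribers."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}
        self._next = 0  # round-robin cursor

    def subscribe(self, subscriber_url: str) -> None:
        # The first call registers an instance; later calls renew its lease
        self._last_seen[subscriber_url] = self._clock()

    def live_subscribers(self) -> List[str]:
        now = self._clock()
        return sorted(u for u, seen in self._last_seen.items() if now - seen <= self._ttl)

    def next_subscriber(self) -> Optional[str]:
        # Choose which live Triples Generator receives the next Event
        live = self.live_subscribers()
        if not live:
            return None
        url = live[self._next % len(live)]
        self._next += 1
        return url
```

With a lease, an instance that crashes without unsubscribing simply stops receiving Events once its lease expires.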


Commit Events to RDF Triples

A process that translates unprocessed Commit Events from the Event Log into RDF triples in the RDF Store. It runs continuously, polling the Event Log for unprocessed Commit Events.
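The polling loop can be sketched as repeated passes over the unprocessed events. Each pass generates triples for each event, appends them to the store, and records the outcome so the event is not picked up again. The in-memory Event Log, the `NEW`/`DONE`/`FAILED` statuses and the `generate_triples` callback are all hypothetical stand-ins. In Renku, triples generation relies on renku-python, which is not modelled here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class InMemoryEventLog:
    """Hypothetical stand-in for the Event Log: commit id -> status."""
    statuses: Dict[str, str] = field(default_factory=dict)

    def unprocessed(self) -> List[str]:
        return [cid for cid, status in self.statuses.items() if status == "NEW"]

    def mark(self, commit_id: str, status: str) -> None:
        self.statuses[commit_id] = status


def process_once(
    log: InMemoryEventLog,
    generate_triples: Callable[[str], List[Triple]],
    store: List[Triple],
) -> int:
    """Run one polling pass and return how many events were stored."""
    stored = 0
    for commit_id in log.unprocessed():  # a snapshot list, so marking is safe
        try:
            triples = generate_triples(commit_id)
        except Exception:
            log.mark(commit_id, "FAILED")  # excluded from later passes
            continue
        store.extend(triples)
        log.mark(commit_id, "DONE")
        stored += 1
    return stored


def poll_forever(log, generate_triples, store, interval_s: float = 5.0) -> None:
    while True:
        # Sleep only when a pass found nothing to do
        if process_once(log, generate_triples, store) == 0:
            time.sleep(interval_s)
```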


Missed commits synchronization job

A scheduled job that synchronizes the state of the Event Log with GitLab and generates the Commit Events missing from the Event Log. It runs periodically at a configured interval.
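At its core the job is an ordered set difference. Commits present in GitLab but absent from the Event Log become new Commit Events. The sketch below assumes the job works per project and receives its data through callbacks. `projects`, `fetch_gitlab_commits`, `fetch_logged_commits` and `add_events` are hypothetical names. A `threading.Event` serves as both the interval timer and the stop signal.

```python
import threading
from typing import Callable, Iterable, List


def find_missed_commits(gitlab_commit_ids: Iterable[str], event_log_commit_ids: Iterable[str]) -> List[str]:
    known = set(event_log_commit_ids)
    # Preserve GitLab's ordering of the commits that are missing
    return [cid for cid in gitlab_commit_ids if cid not in known]


def run_sync_job(
    projects: Callable[[], Iterable[int]],
    fetch_gitlab_commits: Callable[[int], List[str]],
    fetch_logged_commits: Callable[[int], List[str]],
    add_events: Callable[[int, List[str]], None],
    interval_s: float,
    stop: threading.Event,
) -> None:
    while not stop.is_set():
        for project_id in projects():
            missed = find_missed_commits(
                fetch_gitlab_commits(project_id), fetch_logged_commits(project_id)
            )
            if missed:
                add_events(project_id, missed)
        stop.wait(interval_s)  # sleeps, but returns early once stop is set
```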


Knowledge Graph re-provisioning process

A process executed on Triples Generator start-up that checks if triples in the RDF Store were generated with the version of renku-python currently set in the Triples Generator.
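The start-up check amounts to comparing two version strings. The sketch below assumes that a version marker is stored alongside the triples (the hypothetical `VersionRecord`). The actual re-provisioning work is not described here, so it is left to a caller-supplied callback.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VersionRecord:
    """Hypothetical marker of the renku-python version used for the stored triples."""
    renku_python_version: Optional[str]


def check_on_startup(
    record: VersionRecord, current_version: str, reprovision: Callable[[], None]
) -> bool:
    """Return True if re-provisioning was triggered."""
    if record.renku_python_version == current_version:
        return False  # triples already match the configured version
    reprovision()  # deployment-specific work, supplied by the caller
    record.renku_python_version = current_version
    return True
```

Updating the marker only after `reprovision()` returns means that a crash mid-way triggers the check again on the next start-up.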