[OCWR] Week 15 - OpenCitations Weekly Report

Week from Nov 09 to Nov 15

Introduction

This week I ran again an integration test of the oc_ocdm v3.0.0 against the CCC repository. Contextually, I wrote a document (in the italian language) in which I explain what should be changed in the BEE/SPACIN workflow in order to adapt the code with respect to the latest oc_ocdm’s version. Then, I created two text files which are compliant to the ShExC and the ShapeMap formats. They simply describe the rules stated by the OCDM in a formal way, enabling the possibility of an automated validation of any RDF graph which is required to adhere to those same rules. Finally, the add_ methods from GraphSet were modified: should the entity be already contained inside the GraphSet, the method will return it without instantiating a new object.

Report

The ShExC and the ShapeMap files for RDF validation

The ShExC (Shape Expression - compact) format is a formal language which permits to express the rules about the shape of an RDF graph. I had to write the ShExC file for the OCDM, alongside with its respective ShapeMap file (which associates nodes in the graph with their shape). Actually, the ShapeMap file is implemented in the QueryMap format, which allows to easily define queries used to programmatically obtain the nodes that should be associated to a particular shape. This complex work required a deep understanding of the OCDM: as a way to help myself, I created a PDF document which contains a sort of ER schema reflecting the shapes mandated by the OCDM.

The OCDM ER schema.pdf file is downloadable from here.

GraphEntity subclasses are now singletons within the context of a GraphSet

I modified the add_ methods from GraphSet (such as add_br, add_be, add_ci, etc.). They now check whether the entity with the required URIRef (if one was passed as argument) is already contained inside the GraphSet: in such a case, the already existing entity is returned instead of a newly instantiated one. This ensures that no duplicate objects are instantiated, leading to a singleton-like behaviour of the entities. The new implementation should also be lighter memory-wise.

Integration test with BEE and SPACIN

I ran again (see the fourth weekly report) a test to verify whether the oc_ocdm package could be used as a drop-in replacement for the original graphlib.py module. I forked the CCC repository and I opened a new branch in which I made all the modifications required to adapt the SPACIN and BEE code to the latest version of the oc_ocdm package (v3.0.0). Everything worked fine out of the box, even if I found (and fixed) a little bug related to the encoding of files persisted by scripts/script/bee/refstorer.py and scripts/script/ocdm/storer.py.