[OCWR] Week 15 - OpenCitations Weekly Report
Week from Nov 09 to Nov 15
Introduction
This week I re-ran an integration test of oc_ocdm v3.0.0 against the CCC repository. At the same time, I
wrote a document (in Italian) explaining what should be changed in the BEE/SPACIN workflow in order to adapt
the code to the latest version of oc_ocdm. Then, I created two text files that comply with the ShExC and the
ShapeMap formats. They describe the rules stated by the OCDM in a formal way, making it possible to
automatically validate any RDF graph that is required to adhere to those rules. Finally, the add_ methods of
GraphSet were modified: if the entity is already contained in the GraphSet, the method returns it without
instantiating a new object.
Report
The ShExC and the ShapeMap files for RDF validation
The ShExC (Shape Expressions Compact Syntax) format is a formal language for expressing rules about the shape of an RDF graph. I had to write the ShExC file for the OCDM, along with its ShapeMap file (which associates nodes in the graph with their shapes). More precisely, the ShapeMap file is written in the QueryMap format, which makes it easy to define queries that programmatically retrieve the nodes to be associated with a particular shape. This complex work required a deep understanding of the OCDM: to help myself, I created a PDF document containing a sort of ER schema that reflects the shapes mandated by the OCDM.
The OCDM ER schema.pdf file is downloadable from here.
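To make the relationship between shapes and the shape map more concrete, here is a minimal pure-Python sketch of the idea. It is not ShExC and it does not use any ShEx library: the graph, the class and predicate names (ex:Resource, ex:title), the ResourceShape rule and the helper functions are all hypothetical and are not taken from the OCDM. It only shows how a query selects the nodes to check and how each selected node is then tested against its associated shape.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"

# Toy graph with hypothetical data (not taken from the OCDM).
GRAPH: Set[Triple] = {
    ("ex:br/1", RDF_TYPE, "ex:Resource"),
    ("ex:br/1", "ex:title", "A title"),
    ("ex:br/2", RDF_TYPE, "ex:Resource"),
    # ex:br/2 has two titles, which the toy shape below forbids.
    ("ex:br/2", "ex:title", "First"),
    ("ex:br/2", "ex:title", "Second"),
    ("ex:ag/1", RDF_TYPE, "ex:Agent"),
}


def nodes_of_type(graph: Set[Triple], rdf_class: str) -> List[str]:
    """Plays the role of a shape map query: select the focus nodes."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == rdf_class)


def at_most_one(predicate: str) -> Callable[[Set[Triple], str], bool]:
    """Build a toy shape: the node has zero or one value for `predicate`."""
    def check(graph: Set[Triple], node: str) -> bool:
        return sum(1 for s, p, _ in graph if s == node and p == predicate) <= 1
    return check


# Hypothetical shape definitions, keyed by shape name.
SHAPES = {"ResourceShape": at_most_one("ex:title")}

# Hypothetical shape map: every node typed ex:Resource must match ResourceShape.
SHAPE_MAP = [("ex:Resource", "ResourceShape")]


def validate(graph, shape_map, shapes) -> Dict[str, bool]:
    """Check each selected node against the shape associated with it."""
    results: Dict[str, bool] = {}
    for rdf_class, shape_name in shape_map:
        for node in nodes_of_type(graph, rdf_class):
            results[node] = shapes[shape_name](graph, node)
    return results


if __name__ == "__main__":
    print(validate(GRAPH, SHAPE_MAP, SHAPES))
```

In this toy run, ex:br/1 conforms, ex:br/2 does not, and ex:ag/1 is never checked because no entry in the shape map selects it.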
GraphEntity subclasses are now singletons within the context of a GraphSet
I modified the add_ methods of GraphSet (such as add_br, add_be, add_ci, etc.). They now check whether an
entity with the required URIRef (if one was passed as an argument) is already contained in the GraphSet: if so,
the existing entity is returned instead of a newly instantiated one. This ensures that no duplicate objects are
created, giving the entities a singleton-like behaviour. The new implementation should also use less memory.
Integration test with BEE and SPACIN
I re-ran (see the fourth weekly report) a test to verify whether the oc_ocdm package can be used as a drop-in
replacement for the original graphlib.py module. I forked the CCC repository and opened a new branch in which I
made all the modifications required to adapt the SPACIN and BEE code to the latest version of the oc_ocdm
package (v3.0.0). Everything worked out of the box, although I found (and fixed) a minor bug related to the
encoding of the files persisted by scripts/script/bee/refstorer.py and scripts/script/ocdm/storer.py.